This lecture covers data perturbation as a method for protecting privacy in database systems. It explains how randomized databases introduce noise, focusing on retention-replacement perturbation, in which each value is either retained or, with a given probability, replaced by a value generated from a probability distribution defined for its column. The lecture also addresses aggregate reconstruction on perturbed data, estimating original values, and reconstructing multi-column queries. Additionally, it explores how perturbed data can be used to train data mining models while preserving privacy. It highlights the tradeoff between privacy guarantees and error rates, along with the implications of data perturbation for data integrity.
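
The retention-replacement idea and the aggregate reconstruction step can be illustrated with a minimal Python sketch. It is not the lecture's own code. It assumes a single integer column, a uniform replacement distribution over a known domain `[low, high]`, and a retention probability `p`. The function names `perturb` and `estimate_count` and the synthetic `ages` column are hypothetical, chosen only for illustration.

```python
import random

def perturb(values, p, low, high, rng):
    """Retain each value with probability p; otherwise replace it with a
    uniform draw from the integer domain [low, high] (assumed distribution)."""
    return [v if rng.random() < p else rng.randint(low, high) for v in values]

def estimate_count(perturbed, p, low, high, a, b):
    """Estimate how many ORIGINAL values lay in [a, b], given only the
    perturbed column. Assumes low <= a <= b <= high.

    E[observed] = p * true_count + (1 - p) * n * q, where q is the chance
    that a replacement value lands in [a, b]; solve for true_count.
    """
    n = len(perturbed)
    observed = sum(1 for v in perturbed if a <= v <= b)
    q = (b - a + 1) / (high - low + 1)
    return (observed - (1 - p) * n * q) / p

# Synthetic example data (not from the lecture).
rng = random.Random(0)
ages = [rng.randint(18, 90) for _ in range(100_000)]
noisy = perturb(ages, p=0.3, low=0, high=100, rng=rng)

true_count = sum(1 for v in ages if 30 <= v <= 40)
estimate = estimate_count(noisy, p=0.3, low=0, high=100, a=30, b=40)
print(f"true count: {true_count}, reconstructed estimate: {estimate:.0f}")
```

In this sketch, lowering `p` replaces more values, which hides individual records better but divides the sampling noise by a smaller number, so the reconstructed count becomes less accurate. This is one concrete form of the privacy versus error tradeoff mentioned above.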