Data Perturbation: Protecting Privacy in Database Systems

In course

This course is intended for students who want to understand modern large-scale data analysis systems and database systems. It covers a wide range of topics and technologies, and will prepare students

Description

This lecture covers data perturbation as a method for protecting privacy in database systems. It explains how randomized databases introduce noise, focusing on retention-replacement perturbation. The instructor discusses mechanisms for perturbing data: for each value, the mechanism either retains it or replaces it, with given probabilities, by a value generated from a probability distribution over the column. The lecture also addresses aggregate reconstruction on perturbed data, including estimating aggregates over the original values and reconstructing multi-column queries. It then explores how perturbed data can be used to train data mining models while preserving privacy. The tradeoff between privacy guarantees and error rates is highlighted, along with the implications of data perturbation for data integrity.
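The retention-replacement idea and a simple aggregate reconstruction can be illustrated with a minimal sketch. It is not taken from the lecture: the function names (perturb, estimate_fraction_in_range), the uniform replacement distribution over an integer domain, the synthetic "ages" column, and the retention probability of 0.3 are all assumptions chosen for illustration. The estimator inverts the relation observed = p * true + (1 - p) * r, where p is the retention probability and r is the probability that a replacement value lands in the queried range.

```python
import random

def perturb(values, p_retain, low, high, rng):
    """Retain each value with probability p_retain; otherwise replace it
    with a uniform draw from the integer domain [low, high] (assumed)."""
    return [v if rng.random() < p_retain else rng.randint(low, high)
            for v in values]

def estimate_fraction_in_range(perturbed, p_retain, low, high, a, b):
    """Estimate the fraction of ORIGINAL values in [a, b] using only the
    perturbed column and the public perturbation parameters."""
    observed = sum(a <= v <= b for v in perturbed) / len(perturbed)
    # Probability that a uniform replacement value falls in [a, b].
    replaced_hit = (b - a + 1) / (high - low + 1)
    # Invert: observed = p * true + (1 - p) * replaced_hit.
    return (observed - (1 - p_retain) * replaced_hit) / p_retain

rng = random.Random(0)
ages = [rng.randint(18, 40) for _ in range(50_000)]   # synthetic column
noisy = perturb(ages, p_retain=0.3, low=0, high=99, rng=rng)

true_frac = sum(20 <= v <= 29 for v in ages) / len(ages)
est_frac = estimate_fraction_in_range(noisy, 0.3, 0, 99, 20, 29)
print(f"true fraction: {true_frac:.3f}, estimate: {est_frac:.3f}")
```

In this sketch, lowering p_retain hides more of the individual values but divides the sampling error of the estimate by a smaller number, which is one way to see the privacy versus error tradeoff mentioned above.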

Instructor

Anastasia Ailamaki
