This lecture introduces synthetic data generation as a privacy-preserving technique for data publishing. It covers the challenges of anonymizing raw datasets, in particular attribute inference and identity disclosure. The promise of synthetic data lies in enabling cross-boundary data analytics without compromising privacy. Generative models, including Bayesian networks, are discussed, with emphasis on protecting customers' sensitive data. The lecture evaluates the privacy gain of publishing a synthetic dataset instead of the raw dataset, focusing on membership inference and attribute disclosure threats. It concludes that synthetic data offers some privacy protection but is not a foolproof defense against these threats.
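
To make the privacy-gain evaluation more concrete, here is a minimal Python sketch for the membership inference threat. It rests on assumptions that are not taken from the lecture itself: the attacker's advantage is measured as true positive rate minus false positive rate, and privacy gain is the drop in that advantage when the synthetic dataset is published instead of the raw one. The function names and the toy guesses are hypothetical and serve only as illustration.

```python
from typing import Sequence


def membership_advantage(is_member: Sequence[bool],
                         guessed_member: Sequence[bool]) -> float:
    """Attacker advantage, assumed here to be TPR - FPR.

    is_member[i] is the ground truth for target i (was the record in the
    training data?), guessed_member[i] is the attacker's guess.
    """
    if len(is_member) != len(guessed_member):
        raise ValueError("truth and guesses must have the same length")
    # Split the attacker's guesses by ground-truth membership.
    on_members = [g for m, g in zip(is_member, guessed_member) if m]
    on_non_members = [g for m, g in zip(is_member, guessed_member) if not m]
    if not on_members or not on_non_members:
        raise ValueError("need at least one member and one non-member")
    true_positive_rate = sum(on_members) / len(on_members)
    false_positive_rate = sum(on_non_members) / len(on_non_members)
    return true_positive_rate - false_positive_rate


def privacy_gain(adv_raw: float, adv_synthetic: float) -> float:
    """Assumed definition: reduction in attacker advantage when the
    synthetic dataset is published instead of the raw dataset."""
    return adv_raw - adv_synthetic


# Toy data (made up for illustration, not results from the lecture).
truth = [True, True, True, True, False, False, False, False]
# With the raw dataset the attacker can simply look each target up.
guesses_raw = list(truth)
# With the synthetic dataset the attacker makes some mistakes.
guesses_synthetic = [True, True, True, False, True, False, False, False]

adv_raw = membership_advantage(truth, guesses_raw)
adv_synthetic = membership_advantage(truth, guesses_synthetic)
gain = privacy_gain(adv_raw, adv_synthetic)
print(f"advantage (raw): {adv_raw}, advantage (synthetic): {adv_synthetic}, "
      f"privacy gain: {gain}")
```

Under these assumptions, a gain close to 0 would mean the synthetic dataset leaks membership almost as much as the raw data, which is in line with the lecture's conclusion that synthetic data is not a foolproof protection.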