This lecture covers deanonymization using two public datasets: one anonymized and published by Netflix, the other non-anonymized. The datasets, which contain random names and ratings, are loaded and displayed. The exercise involves matching users between the two datasets, sorting by rating, and finding the films missing from one of them. The lecture then moves on to larger datasets, evaluating the quality of user matches, and the challenges posed by real-world databases. Techniques such as frequency evaluation and probabilistic correlations are discussed, emphasizing how difficult accurate matching is and why probabilistic approaches are needed.
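The matching exercise described above might look roughly like the following Python sketch. It is not the lecture's actual code: the toy data, the function names (`ratings_by_user`, `best_matches`, `missing_films`), and the scoring rule (counting films with identical ratings) are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy data as (user, film, rating) triples.
# The real datasets used in the lecture are not reproduced here.
anonymized = [
    ("u1", "Film A", 5), ("u1", "Film B", 2), ("u1", "Film C", 4),
    ("u2", "Film A", 3), ("u2", "Film D", 1),
]
public = [
    ("alice", "Film A", 5), ("alice", "Film B", 2),
    ("bob", "Film A", 3), ("bob", "Film D", 1),
]


def ratings_by_user(rows):
    """Group triples into {user: {film: rating}}."""
    profiles = defaultdict(dict)
    for user, film, rating in rows:
        profiles[user][film] = rating
    return dict(profiles)


def best_matches(anon_rows, public_rows):
    """For each public user, pick the anonymized user sharing the most
    identical (film, rating) pairs. Ties go to the first candidate."""
    anon = ratings_by_user(anon_rows)
    pub = ratings_by_user(public_rows)
    matches = {}
    for name, pub_ratings in pub.items():
        scores = {
            anon_id: sum(
                1 for film, rating in pub_ratings.items()
                if anon_ratings.get(film) == rating
            )
            for anon_id, anon_ratings in anon.items()
        }
        best = max(scores, key=scores.get)
        matches[name] = (best, scores[best])
    return matches


def missing_films(anon_rows, public_rows, name, anon_id):
    """Films rated in the anonymized profile but absent from the public one."""
    anon = ratings_by_user(anon_rows)
    pub = ratings_by_user(public_rows)
    return sorted(set(anon[anon_id]) - set(pub[name]))


if __name__ == "__main__":
    print(best_matches(anonymized, public))
    print(missing_films(anonymized, public, "alice", "u1"))
```

Exact equality of ratings is a deliberately naive criterion. On real-world data, where ratings may differ slightly or be missing between sources, a match score like this one would likely need to be replaced by a probabilistic one, for example by giving more weight to rarely rated films.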