This lecture covers the challenges of entity resolution over data streams, focusing on the high cost and inefficiency of existing approaches. The instructor presents optimizations such as Locality Sensitive Hashing and Prefix Filtering to improve resolution throughput, and discusses blocking and workload balancing as ways to handle imbalanced distributions. The lecture also describes the experimental setup, which uses Flink, synthetic data, and sliding streaming windows. Multi-objective optimization strategies and real-time workload partitioning are explored as ways to enhance performance. The conclusion emphasizes optimizing entity resolution in a streaming fashion and proposes future work to further reduce comparisons and improve efficiency.
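To make the idea of blocking over a sliding window concrete, here is a minimal Python sketch. It is not the lecture's implementation: it swaps in simple token blocking with a Jaccard similarity check in place of Locality Sensitive Hashing or Prefix Filtering, uses a count-based window rather than a Flink window, and every name in it (`resolve_stream`, `window_size`, `threshold`) is hypothetical. It only illustrates how restricting comparisons to records that share a blocking key reduces the number of pairwise comparisons.

```python
from collections import defaultdict, deque


def tokens(record):
    """Lowercased whitespace tokens of a record (assumed blocking keys)."""
    return frozenset(record.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def resolve_stream(records, window_size=4, threshold=0.5):
    """Return (matches, comparisons) for a stream of string records.

    Each incoming record is compared only with windowed records that
    share at least one token with it (token blocking). The window keeps
    the `window_size` most recent records (count-based sliding window).
    """
    window = deque()           # indices of records currently in the window
    store = {}                 # index -> token set, for windowed records
    index = defaultdict(set)   # token -> indices of windowed records
    matches = []
    comparisons = 0

    for i, record in enumerate(records):
        toks = tokens(record)

        # Blocking step: collect candidates sharing at least one token.
        candidates = set()
        for t in toks:
            candidates |= index.get(t, set())

        # Comparison step: only candidates are compared in full.
        for j in sorted(candidates):
            comparisons += 1
            if jaccard(store[j], toks) >= threshold:
                matches.append((j, i))

        # Add the new record to the window and the inverted index.
        store[i] = toks
        window.append(i)
        for t in toks:
            index[t].add(i)

        # Slide the window: evict the oldest record if it overflows.
        if len(window) > window_size:
            old = window.popleft()
            for t in store.pop(old):
                index[t].discard(old)

    return matches, comparisons


if __name__ == "__main__":
    stream = [
        "Alice Smith Lausanne",
        "Bob Jones Geneva",
        "alice smith lausanne",
        "Carol White Zurich",
        "Bob Jones Bern",
    ]
    print(resolve_stream(stream))
```

In this toy stream, comparing every record against all earlier records in the window would take ten comparisons, while blocking performs a full comparison only for the two pairs that share a token. A streaming engine would additionally have to spread blocks across workers, which is where the workload balancing mentioned in the summary comes in.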