Lecture

Entity Resolution in Data Streams

Description

This lecture covers the challenges of entity resolution in data streams, focusing on the high cost and inefficiency of existing approaches. The instructor presents optimizations such as Locality-Sensitive Hashing and Prefix Filtering to improve resolution throughput. Blocking and workload balancing are discussed as ways to handle imbalanced distributions. The lecture also describes the experimental setup, which uses Apache Flink, synthetic data, and sliding windows over the stream. Multi-objective optimization strategies and real-time workload partitioning are explored to enhance performance. The conclusion emphasizes optimizing entity resolution in a streaming fashion and proposes future work to further reduce comparisons and improve efficiency.
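
To illustrate how prefix filtering can cut down comparisons, here is a minimal Python sketch. It is not the lecture's implementation: the whitespace tokenization, the Jaccard similarity measure, the lexicographic token order, the 0.5 threshold, and the names `prefix` and `resolve_stream` are all assumptions chosen for illustration. It also keeps every record seen so far instead of evicting records as a sliding window would.

```python
import math
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def prefix(tokens, threshold):
    """Return the prefix of the tokens under a fixed global (lexicographic) order.

    Two sets with Jaccard similarity >= threshold must share at least one
    token within their prefixes of length n - ceil(threshold * n) + 1.
    """
    ordered = sorted(tokens)
    n = len(ordered)
    # The small epsilon guards against floating-point round-up in ceil().
    keep = n - math.ceil(threshold * n - 1e-9) + 1
    return ordered[:keep]


def resolve_stream(records, threshold=0.5):
    """Yield (earlier_id, later_id, similarity) for records that match.

    `records` is any iterable of (id, text) pairs, consumed one at a time.
    """
    index = defaultdict(set)  # prefix token -> ids of earlier records
    seen = {}                 # id -> token set (no window eviction here)
    for rec_id, text in records:
        tokens = set(text.lower().split())
        candidates = set()
        for tok in prefix(tokens, threshold):
            candidates |= index[tok]  # earlier records sharing a prefix token
            index[tok].add(rec_id)
        # Only candidates are verified with the full similarity computation.
        for other in sorted(candidates):
            sim = jaccard(tokens, seen[other])
            if sim >= threshold:
                yield other, rec_id, sim
        seen[rec_id] = tokens


stream = [
    (1, "acme corp new york"),
    (2, "acme corp new york ny"),
    (3, "globex inc springfield"),
]
matches = list(resolve_stream(stream))
print(matches)
```

In this sketch, record 3 is never compared against records 1 and 2 because it shares no prefix token with them; only the pair (1, 2) is verified. A streaming deployment would additionally need to expire old entries from `index` and `seen` as the window slides.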
