Lecture

Temporality and Entity Resolution

Description

This lecture covers the challenges of handling temporality in data, in particular the distinction between the time a fact was entered and the time during which the recorded phenomenon is considered true. It then turns to entity resolution: identifying and merging duplicate entity profiles across datasets. Techniques discussed include fuzzy matching, deduplication, and similarity metrics such as Jaccard similarity. The lecture also examines why duplicate detection is computationally expensive and explains strategies for reducing that cost, such as blocking for candidate selection and q-gram set joins. The session concludes with a summary of entity resolution and data wrangling tasks, emphasizing the optimizations needed to make clustering efficient.
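As a rough illustration of how these pieces can fit together, the sketch below combines q-gram sets, Jaccard similarity, and a simple form of blocking in which only records that share at least one q-gram are compared. It is not taken from the lecture: the function names, the sample records, the choice of q = 2, and the 0.5 threshold are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def qgrams(text, q=2):
    """Return the set of q-grams of a lowercased, whitespace-normalized string."""
    s = " ".join(text.lower().split())
    if len(s) < q:
        return {s} if s else set()
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_pairs(records, q=2):
    """Blocking via an inverted index: pair only records sharing a q-gram."""
    index = defaultdict(set)
    for rid, name in records.items():
        for gram in qgrams(name, q):
            index[gram].add(rid)
    pairs = set()
    for ids in index.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def find_duplicates(records, threshold=0.5, q=2):
    """Score only the candidate pairs and keep those at or above the threshold."""
    grams = {rid: qgrams(name, q) for rid, name in records.items()}
    return sorted(
        (a, b)
        for a, b in candidate_pairs(records, q)
        if jaccard(grams[a], grams[b]) >= threshold
    )


# Hypothetical sample data, for illustration only.
records = {1: "Jon Smith", 2: "John Smith", 3: "Alice Wong"}
duplicates = find_duplicates(records)
print(duplicates)
```

On this sample, the pair (2, 3) shares no bigram and is never scored, which is the point of blocking: the quadratic all-pairs comparison is replaced by comparisons within groups of records that have at least one q-gram in common. Real systems typically use stricter blocking keys or filters, since sharing a single q-gram is a very loose condition on longer strings.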
