Lecture

Temporality and Entity Resolution

Description

This lecture covers the challenges of handling temporality in data, distinguishing the time at which a record is entered from the time during which the recorded phenomenon is considered true. It then turns to entity resolution: identifying and merging duplicate entity profiles across datasets. Techniques discussed include fuzzy matching, deduplication, and similarity metrics such as Jaccard similarity. The lecture also examines the computational cost of duplicate detection, since naively comparing every pair of records grows quadratically with the number of records, and explains strategies for reducing that cost, such as blocking for candidate selection and q-gram set joins. The session concludes with a summary of entity resolution and data wrangling tasks, emphasizing the optimizations that make clustering more efficient.
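To make the ideas of Jaccard similarity, q-grams, and blocking concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the lecture's own implementation: the function names (`qgrams`, `jaccard`, `candidate_pairs`, `find_duplicates`), the choice of q = 2, the 0.5 threshold, and the blocking rule "two records share at least one q-gram" are all hypothetical choices made for this example.

```python
from collections import defaultdict
from itertools import combinations


def qgrams(text, q=2):
    """Return the set of character q-grams of a lowercased, whitespace-normalized string."""
    s = " ".join(text.lower().split())
    if len(s) < q:
        return {s} if s else set()
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def candidate_pairs(records, q=2):
    """Blocking (assumed rule): only pairs sharing at least one q-gram become candidates."""
    index = defaultdict(set)  # inverted index: q-gram -> ids of records containing it
    for rid, text in records.items():
        for gram in qgrams(text, q):
            index[gram].add(rid)
    pairs = set()
    for ids in index.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def find_duplicates(records, threshold=0.5, q=2):
    """Score candidate pairs with Jaccard similarity and keep those at or above the threshold."""
    grams = {rid: qgrams(text, q) for rid, text in records.items()}
    matches = []
    for a, b in sorted(candidate_pairs(records, q)):
        sim = jaccard(grams[a], grams[b])
        if sim >= threshold:
            matches.append((a, b, sim))
    return matches


if __name__ == "__main__":
    # Hypothetical example records, keyed by an arbitrary id.
    records = {1: "John Smith", 2: "Jon Smith", 3: "Alice Wong"}
    for a, b, sim in find_duplicates(records):
        print(a, b, round(sim, 2))
```

In this sketch, records 1 and 3 share no bigram, so blocking never compares them. Records 2 and 3 share a bigram and are compared, but their similarity falls below the threshold, so only records 1 and 2 are reported as likely duplicates. A real q-gram set join would typically require more than one shared q-gram and apply further filtering, which prunes candidates more aggressively.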

