Record linkage (also known as data matching, data linkage, entity resolution, and many other terms) is the task of finding records in a data set that refer to the same entity across different data sources (e.g., data files, books, websites, and databases). Record linkage is necessary when joining different data sets based on entities that may or may not share a common identifier (e.g., database key, URI, National identification number), which may be due to differences in record shape, storage location, or curator style or preference. A data set that has undergone RL-oriented reconciliation may be referred to as being cross-linked. "Record linkage" is the term used by statisticians, epidemiologists, and historians, among others, to describe the process of joining records from one data source with another that describe the same entity. However, many other terms are used for this process. Unfortunately, this profusion of terminology has led to few cross-references between these research communities. Computer scientists often refer to it as "data matching" or as the "object identity problem". Commercial mail and database applications refer to it as "merge/purge processing" or "list washing". Other names used to describe the same concept include: "coreference/entity/identity/name/record resolution", "entity disambiguation/linking", "fuzzy matching", "duplicate detection", "deduplication", "record matching", "(reference) reconciliation", "object identification", "data/information integration" and "conflation". While they share similar names, record linkage and Linked Data are two separate approaches to processing and structuring data. Although both involve identifying matching entities across different data sets, record linkage standardly equates "entities" with human individuals; by contrast, Linked Data is based on the possibility of interlinking any web resource across data sets, using a correspondingly broader concept of identifier, namely a URI. The initial idea of record linkage goes back to Halbert L.

About this result
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
Related lectures (6)
Entity Resolution: Techniques and Applications
Explores entity resolution techniques for identifying and aggregating different entity profiles across datasets, covering challenges and solutions.
Entity Resolution Techniques
Explores entity resolution techniques, data deduplication, similarity metrics, computational cost, blocking techniques, and scaling out similarity joins.
Show more
Related publications (30)

A Practical Influence Approximation for Privacy-Preserving Data Filtering in Federated Learning

Boi Faltings, Ljubomir Rokvic, Panayiotis Danassis

Federated Learning by nature is susceptible to low-quality, corrupted, or even malicious data that can severely degrade the quality of the learned model. Traditional techniques for data valuation cannot be applied as the data is never revealed. We present ...
2023

Challenging the Assumptions: Rethinking Privacy, Bias, and Security in Machine Learning

Bogdan Kulynych

Predictive models based on machine learning (ML) offer a compelling promise: bringing clarity and structure to complex natural and social environments. However, the use of ML poses substantial risks related to the privacy of their training data as well as ...
EPFL2023

POSEIDON: Privacy-Preserving Federated Neural Network Learning

Jean-Pierre Hubaux, Juan Ramón Troncoso-Pastoriza, Jean-Philippe Léonard Bossuat, Apostolos Pyrgelis, David Jules Froelicher, Joao André Gomes de Sá e Sousa, Sinem Sav

In this paper, we address the problem of privacy-preserving training and evaluation of neural networks in an N-party, federated learning setting. We propose a novel system, POSEIDON, the first of its kind in the regime of privacy-preserving neural network ...
INTERNET SOC2021
Show more
Related concepts (2)
Master data management
Master data management (MDM) is a technology-enabled discipline in which business and information technology work together to ensure the uniformity, accuracy, stewardship, semantic consistency and accountability of the enterprise's official shared master data assets. Organisations, or groups of organisations, may establish the need for master data management when they hold more than one copy of data about a business entity. Holding more than one copy of this master data inherently means that there is an inefficiency in maintaining a "single version of the truth" across all copies.
Data warehouse
In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis and is considered a core component of business intelligence. Data warehouses are central repositories of integrated data from one or more disparate sources. They store current and historical data in one single place that are used for creating analytical reports for workers throughout the enterprise. This is beneficial for companies as it enables them to interrogate and draw insights from their data and make decisions.

Graph Chatbot

Chat with Graph Search

Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.

DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.