Record linkage

Record linkage (also known as data matching, data linkage, entity resolution, and many other terms) is the task of finding records that refer to the same entity across different data sources (e.g., data files, books, websites, and databases). Record linkage is necessary when joining different data sets based on entities that may or may not share a common identifier (e.g., database key, URI, national identification number), which may be due to differences in record shape, storage location, or curator style or preference. A data set that has undergone record-linkage-oriented reconciliation may be referred to as being cross-linked.

"Record linkage" is the term used by statisticians, epidemiologists, and historians, among others, to describe the process of joining records from one data source with records from another that describe the same entity. Many other terms are used for this process, and this profusion of terminology has led to few cross-references between the research communities involved. Computer scientists often refer to it as "data matching" or as the "object identity problem". Commercial mail and database applications refer to it as "merge/purge processing" or "list washing". Other names used to describe the same concept include: "coreference/entity/identity/name/record resolution", "entity disambiguation/linking", "fuzzy matching", "duplicate detection", "deduplication", "record matching", "(reference) reconciliation", "object identification", "data/information integration" and "conflation".

While they share similar names, record linkage and Linked Data are two separate approaches to processing and structuring data. Although both involve identifying matching entities across different data sets, record linkage standardly equates "entities" with human individuals; by contrast, Linked Data is based on the possibility of interlinking any web resource across data sets, using a correspondingly broader concept of identifier, namely a URI.
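As a rough illustration of linking records that share no common identifier, the following Python sketch pairs records from two sources by approximate name similarity. It is only a minimal example under stated assumptions: the record layout (`id` and `name` fields), the helper names `normalize`, `similarity`, and `link_records`, the sample data, and the 0.85 threshold are all hypothetical, and the standard-library `difflib.SequenceMatcher` stands in for whatever comparison method a real linkage system would use.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left, right, threshold=0.85):
    """Pair each left record with its best-scoring right record.

    A pair is kept only if its score reaches the (illustrative) threshold.
    """
    links = []
    for l in left:
        best = max(right, key=lambda r: similarity(l["name"], r["name"]), default=None)
        if best is not None and similarity(l["name"], best["name"]) >= threshold:
            links.append((l["id"], best["id"]))
    return links


# Hypothetical sample data: the two sources use unrelated identifiers.
source_a = [
    {"id": "a1", "name": "Smith, John"},
    {"id": "a2", "name": "Maria Garcia"},
]
source_b = [
    {"id": "b1", "name": "smith john"},
    {"id": "b2", "name": "Mariah Garcia"},
    {"id": "b3", "name": "Wei Chen"},
]

links = link_records(source_a, source_b)
print(links)  # [('a1', 'b1'), ('a2', 'b2')]
```

Comparing every left record with every right record is quadratic in the number of records, so this approach only suits small inputs; it is meant to show the idea of matching on noisy attributes rather than a production design.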
The initial idea of record linkage goes back to Halbert L.

POSEIDON: Privacy-Preserving Federated Neural Network Learning

Jean-Pierre Hubaux, Juan Ramón Troncoso-Pastoriza, David Jules Froelicher, Apostolos Pyrgelis, Joao André Gomes de Sá e Sousa, Jean-Philippe Léonard Bossuat, Sinem Sav

In this paper, we address the problem of privacy-preserving training and evaluation of neural networks in an N-party, federated learning setting. We propose a novel system, POSEIDON, the first of its kind in the regime of privacy-preserving neural network ...

INTERNET SOC, 2021

Storage Management in Smart Data Lake

Anastasia Ailamaki, Haoqiong Bian, Bikash Chandra, Ioannis Mytilinis

Data lakes are complex ecosystems where heterogeneity prevails. Raw data of diverse formats are stored and processed, while long and expensive ETL processes are avoided. Apart from data heterogeneity, data lakes also entail hardware heterogeneity. Typical ...

2021

Holistic, Efficient, and Real-time Cleaning of Heterogeneous Data
