Concept

Semantic heterogeneity

Semantic heterogeneity arises when database schemas or datasets for the same domain are developed by independent parties, resulting in differences in the meaning and interpretation of data values. Beyond structured data, the problem is compounded by the flexibility of semi-structured data and by the various tagging methods applied to documents or unstructured data. Semantic heterogeneity is one of the more important sources of differences in heterogeneous datasets. Yet for multiple data sources to interoperate with one another, these semantic differences must be reconciled. Decomposing the various sources of semantic heterogeneity provides a basis for understanding how to map and transform data to overcome these differences.

One of the first known classification schemes applied to data semantics is from William Kent, more than two decades ago. Kent's approach dealt more with structural mapping issues than with differences in meaning, for which he pointed to data dictionaries as a potential solution. One of the most comprehensive classifications is from Pluempitiwiriyawej and Hammer, "Classification Scheme for Semantic and Schematic Heterogeneities in XML Data Sources". They classify heterogeneities into three broad classes, including:

Structural conflicts arise when the schemas of the sources representing related or overlapping data exhibit discrepancies. Structural conflicts can be detected by comparing the underlying schemas. This class includes generalization conflicts, aggregation conflicts, internal path discrepancy, missing items, element ordering, constraint and type mismatch, and naming conflicts between the element types and attribute names.

Domain conflicts arise when the semantics of the data sources to be integrated exhibit discrepancies. Domain conflicts can be detected by looking at the information contained in the schemas and using knowledge about the underlying data domains. This class includes schematic discrepancy, scale or unit, precision, and data representation conflicts.
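As a rough illustration of the domain conflicts named above (scale or unit, precision, and data representation), the following sketch normalizes two records that describe the same measurement in different ways. The record layout, the field names, the conversion table, and the assumption that slash-separated dates are day-first are all hypothetical and chosen only for the example; they do not come from the classification itself.

```python
from datetime import datetime

# Hypothetical records from two independently designed sources.
source_a = {"weight": 2.5, "weight_unit": "kg", "measured_on": "2024-03-01"}
source_b = {"weight": 5.512, "weight_unit": "lb", "measured_on": "01/03/2024"}

# Assumed conversion factors to a common unit (kilograms).
KG_PER_UNIT = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}

# Assumed date representations; slash-separated dates are taken as day-first.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def to_kilograms(value, unit):
    """Resolve a scale/unit conflict by converting to kilograms."""
    return value * KG_PER_UNIT[unit]


def normalize_date(text):
    """Resolve a representation conflict by emitting ISO 8601 dates."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


def normalize(record, precision=2):
    """Map a source record to a common unit, precision, and date format."""
    weight_kg = to_kilograms(record["weight"], record["weight_unit"])
    return {
        # Rounding to a shared precision resolves the precision conflict.
        "weight_kg": round(weight_kg, precision),
        "measured_on": normalize_date(record["measured_on"]),
    }


if __name__ == "__main__":
    print(normalize(source_a))
    print(normalize(source_b))
```

After normalization the two records can be compared field by field. In practice the choice of target unit, precision, and date convention would come from knowledge of the underlying data domains, as the text notes.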

Official source

https://en.wikipedia.org/wiki/Semantic_heterogeneity

About this result

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

Related courses (1)

CS-423: Distributed information systems

This course introduces the foundations of information retrieval, data mining and knowledge bases, which constitute the foundations of today's Web-based distributed information systems.

Related publications (25)

Federated learning with uncertainty-based client clustering for fleet-wide fault diagnosis

Olga Fink, Hao Lu, Chao Hu

Operators from various industries have been pushing the adoption of wireless sensing nodes for industrial monitoring, and such efforts have produced sizeable condition monitoring datasets that can be used to build diagnosis algorithms capable of warning ma ...

2024

Distributed Optimization with Byzantine Robustness Guarantees

Lie He

As modern machine learning continues to achieve unprecedented benchmarks, the resource demands to train these advanced models grow drastically. This has led to a paradigm shift towards distributed training. However, the presence of adversaries, whether ma ...

EPFL, 2023

Cognitive twin construction for system of systems operation based on semantic integration and high-level architecture

Dimitrios Kyritsis, Jinzhi Lu, Han Li

With the increasing complexity of engineered systems, digital twins (DTs) have been widely used to support integrated modeling, simulation, and decision-making of the system of systems (SoS). However, when integrating DTs of each constituent system, it is ...

IOS PRESS, 2022

Related concepts (3)

Data mapping

In computing and data management, data mapping is the process of creating data element mappings between two distinct data models.
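A minimal sketch of what such element mappings can look like, assuming two hypothetical data models that name and type the same fields differently. The field names, the `FIELD_MAP` table, and the `map_record` helper are illustrative only.

```python
# Hypothetical mapping between two data models:
# target field -> (source field, conversion applied to the source value)
FIELD_MAP = {
    "full_name": ("name", str.strip),
    "email": ("mail", str.lower),
    "age": ("age_years", int),
}


def map_record(source, field_map=FIELD_MAP):
    """Build a target-model record from a source-model record."""
    target = {}
    for target_field, (source_field, convert) in field_map.items():
        if source_field in source:
            target[target_field] = convert(source[source_field])
        else:
            # The source has no counterpart for this element.
            target[target_field] = None
    return target


source_record = {"name": "  Ada Lovelace ", "mail": "Ada@Example.org", "age_years": "36"}
mapped = map_record(source_record)

if __name__ == "__main__":
    print(mapped)
```

Keeping the mapping in a table rather than in code paths makes it easier to review and to extend when the source or target model changes.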

Semantic integration

Semantic integration is the process of interrelating information from diverse sources, for example calendars and to-do lists, email archives, presence information (physical, psychological, and social), documents of all sorts, contacts (including social graphs), search results, and advertising and marketing relevance derived from them. In this regard, semantics focuses on the organization of and action upon information by acting as an intermediary between heterogeneous data sources, which may conflict not only in structure but also in context or value.

Data integration

Data integration involves combining data residing in different sources and providing users with a unified view of them. This process becomes significant in a variety of situations, in both commercial domains (such as when two similar companies need to merge their databases) and scientific domains (for example, combining research results from different bioinformatics repositories). Data integration appears with increasing frequency as the volume of data (that is, big data) and the need to share existing data explode.

Related lectures (7)

Distributed Information Systems: Overview and Challenges

Covers Distributed Information Systems challenges, including autonomy, heterogeneity, trust evaluation, and privacy protection.

Semantic Web: Overview and Tools

Explores the Semantic Web, ontologies, schema mapping, RDF, and knowledge graphs.

Data Warehouses and Decision Support Systems

Explores data warehouses, decision support systems, OLAP, data lakes, multidimensional data models, and query optimizations.