The current information landscape is characterised by a vast number of data silos that are relatively homogeneous semantically when observed in isolation but drastically fragmented when considered as a whole. Within each data silo, information can be harvested without the risk of misinterpretation, because all data conform to the same ontology, which formally defines the types and relations of the application domain. When data are retrieved from multiple heterogeneous data silos, however, special consideration is required to ensure a common and uniform interpretation. Establishing semantic bridges across semantically heterogeneous data silos, i.e., aligning the corresponding ontologies, thus becomes crucial. At the same time, both the volume of data and the number of heterogeneous data silos are increasing exponentially. This growth makes manual curation strategies impractical and underlines the importance of automatic computational approaches that rely less on human expertise and intervention. The focal point of this thesis is to build semantic bridges across heterogeneous data silos. We first focus on discovering equivalence relations between entities appearing in different ontologies. We propose to approach the problem of ontology alignment from a representation learning perspective. We demonstrate that, by exploiting transfer learning, we can overcome the main obstacles that hinder the application of machine learning to the problem, namely the small sample size and the serious class imbalance. Our approach is based on learning terminological embeddings that are implicitly tailored to the task of ontology alignment. We compare our proposed methods with state-of-the-art systems based on feature engineering on a wide range of evaluation benchmarks.
We present significant performance improvements and demonstrate the advantages that representation learning brings to the problem of ontology alignment. Subsequently, we focus on discovering general relations between entities appearing in the same ontology or knowledge base. This problem is known as knowledge base completion or link prediction. We examine the contribution of the geometrical space to this problem. We focus on the family of translational models, which, despite lagging in performance on certain datasets, can effectively represent certain families of rules. We extend these models to hyperbolic space to better reflect the topological properties of knowledge bases. We show empirically, on a variety of link prediction datasets, that hyperbolic space significantly narrows the performance gap between translational and bilinear models, which indicates that the lagging performance of translational models is not intrinsic to them. Finally, we demonstrate a promising new direction for developing models that, although not fully expressive, represent certain rules better. In summary, this thesis proposes new ways to approach the problems of ontology alignment and link prediction in the setting of representation learning. It advances beyond state-of-the-art methods in several ways. It also strengthens our understanding of the role of geometrical space in relation prediction and points to promising directions for performing more fine-grained reasoning tasks in the embedding space.
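
As a rough illustration of what moving a translational model into hyperbolic space can look like, the sketch below scores a triple (head, relation, tail) once in Euclidean space and once on the Poincaré ball. It is not the model proposed in the thesis. It assumes a ball of curvature -1, uses Möbius addition as the hyperbolic analogue of vector translation, and all function names (`mobius_add`, `poincare_distance`, `euclidean_score`, `hyperbolic_score`) are hypothetical names chosen for this example.

```python
import math


def mobius_add(x, y):
    """Möbius addition on the Poincaré ball (curvature -1, an assumption here)."""
    xy = sum(a * b for a, b in zip(x, y))
    x2 = sum(a * a for a in x)
    y2 = sum(b * b for b in y)
    denom = 1 + 2 * xy + x2 * y2
    cx = 1 + 2 * xy + y2  # coefficient applied to x
    cy = 1 - x2           # coefficient applied to y
    return [(cx * a + cy * b) / denom for a, b in zip(x, y)]


def poincare_distance(x, y):
    """Geodesic distance between two points strictly inside the unit ball."""
    diff2 = sum((a - b) ** 2 for a, b in zip(x, y))
    x2 = sum(a * a for a in x)
    y2 = sum(b * b for b in y)
    return math.acosh(1 + 2 * diff2 / ((1 - x2) * (1 - y2)))


def euclidean_score(h, r, t):
    """Translational score in Euclidean space: -||h + r - t||."""
    return -math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(h, r, t)))


def hyperbolic_score(h, r, t):
    """Hypothetical hyperbolic counterpart: -d(h (+) r, t) on the Poincaré ball."""
    return -poincare_distance(mobius_add(h, r), t)


# Toy embeddings (made up for illustration); all points lie inside the unit ball.
head = [0.3, 0.1]
relation = [0.2, -0.4]
tail = mobius_add(head, relation)  # a tail that matches the translation exactly

print(euclidean_score(head, relation, tail))
print(hyperbolic_score(head, relation, tail))
```

In both functions a higher (less negative) score means the tail lies closer to the translated head. Only the notion of translation and distance changes between the two, which is the part of the design that the geometry of the embedding space affects.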
Sarah Irene Brutton Kenderdine, Yumeng Hou