Publication

Towards better entity resolution techniques for Web document collections

Karl Aberer, Zoltán Miklós, Surender Reddy Yerva
2010
Report or working paper
Abstract

As person names are non-unique, the same name on different Web pages might or might not refer to the same real-world person. This entity identification problem is one of the most challenging issues in realizing the Semantic Web or entity-oriented search. We address this disambiguation problem, which is very similar to the entity resolution problem studied in relational databases, however there are also several differences. Most importantly Web pages often only contain partial or incomplete information about the persons, moreover the available information is very heterogeneous, thus we are only able to obtain some uncertain evidence about whether two names refer to the same person using similarity functions. These similarity functions capture some aspects of the similarities between Web-pages, where the names occur, thus they perform very differently for the different names. We analyze some data engineering techniques to cope with the limited accuracy of the similarity functions and to combine multiple functions. Even with our simple techniques we could demonstrate systematic performance improvements and produce comparable results to state-of-the-art methods.

About this result
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

Graph Chatbot

Chat with Graph Search

Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.

DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.