Publication

Quality-aware similarity assessment for entity matching in Web data

Related publications (72)

idMesh: graph-based disambiguation of linked data

Karl Aberer, Philippe Cudré-Mauroux, Parisa Haghani, Michael Jost

We tackle the problem of disambiguating entities on the Web. We propose a user-driven scheme where graphs of entities -- represented by globally identifiable declarative artifacts -- self-organize in a dynamic and probabilistic manner. Our solution has the ...

ACM, 2009

A Comparison of Techniques for Sampling Web Pages

Monika Henzinger, Eda Baykan

As the World Wide Web is growing rapidly, it is getting increasingly challenging to gather representative information about it. Instead of crawling the web exhaustively one has to resort to other techniques like sampling to determine the properties of the ...

2009

Web page language identification based on URLs

Monika Henzinger, Ingmar Weber, Eda Baykan

Given only the URL of a web page, can we identify its language? This is the question that we examine in this paper. Such a language classifier is, for example, useful for crawlers of web search engines, which frequently try to satisfy certain language quot ...

2008

From Web 1.0 to Web 2.0 and back - How did your Grandma use to tag?

Karl Aberer, Sebastian Michel, Gleb Skobeltsyn, Adriana Budura

We consider the applicability of terms extracted from anchortext as a source of Web page descriptions in the form of tags. With a relatively simple and easy-to-use method, we show that anchortext significantly overlaps with tags obtained from the popular t ...

2008

Distributed link-based ranking in P2P Web retrieval

Jie Wu

One of the main differences between modern search engines and traditional ones is the adoption of link-based ranking algorithms in ordering Web documents. Google has claimed that it is its link-based ranking algorithm, PageRank, that has made the quality of ...

EPFL, 2006

Indexation de Documents Manuscrits

Alessandro Vinciarelli

Automatic handwriting recognition systems make it possible to transform collections of handwritten documents into archives of digital documents. The advantage is not so much the reduction in the space needed to store the data, but ...

2006

Indexation de Documents Manuscrits

Alessandro Vinciarelli

IDIAP, 2006

Cosadoca, Consortium de sauvetage du patrimoine documentaire en cas de catastrophe : un site web pour la sauvegarde du patrimoine documentaire

In 2003, three institutions facing the challenges of safeguarding documentary heritage and located on the same geographic site joined forces within the Consortium de sauvetage du patrimoine documentaire en cas de catastrophe (COSADO ...

2005

Using a Layered Markov Model for Decentralized Web Ranking

Karl Aberer, Jie Wu

The link structure of the Web graph is used in algorithms such as Kleinberg's HITS and Google's PageRank to assign authoritative weights to Web pages and thus rank them. In HITS, a solid theoretical model is lacking and the algorithm often leads to non-uni ...

2004

Thematic Annotation: extracting concepts out of documents

Martin Rajman

Semantic document annotation may be useful for many tasks. In particular, in the framework of the MDM project (http://www.issco.unige.ch/projects/im2/mdm/), topical annotation -- i.e. the annotation of document segments with tags identifying the topics disc ...

2004