Quality-aware similarity assessment for entity matching in Web data
As the World Wide Web is growing rapidly, it is getting increasingly challenging to gather representative information about it. Instead of crawling the web exhaustively one has to resort to other techniques like sampling to determine the properties of the ...
We tackle the problem of disambiguating entities on the Web. We propose a user-driven scheme where graphs of entities -- represented by globally identifiable declarative artifacts -- self-organize in a dynamic and probabilistic manner. Our solution has the ...
We consider the applicability of terms extracted from anchortext as a source of Web page descriptions in the form of tags. With a relatively simple and easy-to-use method, we show that anchortext significantly overlaps with tags obtained from the popular t ...
Automatic handwriting recognition systems make it possible to transform collections of handwritten documents into archives of digital documents. The advantage is not so much the reduction in the space needed to store the data, but ...
The link structure of the Web graph is used in algorithms such as Kleinberg's HITS and Google's PageRank to assign authoritative weights to Web pages and thus rank them. In HITS, a solid theoretical model is lacking and the algorithm often leads to non-uni ...
In 2003, three institutions that faced the problems of safeguarding documentary heritage, and that were located on the same geographic site, joined together in the Consortium de sauvetage du patrimoine documentaire en cas de catastrophe (COSADO ...
Given only the URL of a web page, can we identify its language? This is the question that we examine in this paper. Such a language classifier is, for example, useful for crawlers of web search engines, which frequently try to satisfy certain language quot ...
One of the main differences between modern search engines and traditional ones is the adoption of a link-based ranking algorithm for ordering Web documents. Google has claimed that it is its link-based ranking algorithm, PageRank, that has made the quality of ...
Semantic document annotation may be useful for many tasks. In particular, in the framework of the MDM project (http://www.issco.unige.ch/projects/im2/mdm/), topical annotation -- i.e. the annotation of document segments with tags identifying the topics disc ...