Publication

Indexation de Documents Manuscrits (Indexing of Handwritten Documents)

Related publications (32)

Some Things You Always Wanted to Know About Web Pages (But Were Too Busy to Ask)

Willy Zwaenepoel, Simon Schubert

The organic growth of the web has led to web sites that exhibit a large variety of properties. We conduct a large-scale study to gain quantitative insights into the browser-side effects of the structure and behavior of thousands of the most popular web si ...

2012

An Agent-Based Focused Crawling Framework for Topic- and Genre-Related Web Document Discovery

Nikolaos Pappas

The discovery of web documents about certain topics is an important task for web-based applications including web document retrieval, opinion mining and knowledge extraction. In this paper, we propose an agent-based focused crawling framework able to retri ...

IEEE, 2012

Towards better entity resolution techniques for Web document collections

Karl Aberer, Zoltán Miklós, Surender Reddy Yerva

As person names are non-unique, the same name on different Web pages might or might not refer to the same real-world person. This entity identification problem is one of the most challenging issues in realizing the Semantic Web or entity-oriented search. W ...

1st International Workshop on Data Engineering meets the Semantic Web (DESWeb'2010) (co-located with ICDE'2010), 2010

Towards better entity resolution techniques for Web document collections

Karl Aberer, Zoltán Miklós, Surender Reddy Yerva

IEEE, 2010

Understanding the Web

Eda Baykan

The World Wide Web is one of the most widely used information resources. Understanding the web better will enable us to benefit more from it. In this thesis we develop techniques to learn the properties of web pages, such as language and topic, using only the ...

EPFL, 2009

Purely URL-based Topic Classification

Monika Henzinger, Ingmar Weber, Eda Baykan, Ludmila Marian

Given only the URL of a web page, can we identify its topic? This is the question that we examine in this paper. Usually, web pages are classified using their content, but a URL-only classifier is preferable (i) when speed is crucial, (ii) to enable conte ...

2009

A Comparison of Techniques for Sampling Web Pages

Monika Henzinger, Eda Baykan

As the World Wide Web is growing rapidly, it is getting increasingly challenging to gather representative information about it. Instead of crawling the web exhaustively one has to resort to other techniques like sampling to determine the properties of the ...

2009

Services surround you: Physical-virtual linkage with contextual bookmarks

Xavier Righetti

Our daily life is pervaded by digital information and devices, not least the common mobile phone. However, a seamless connection between our physical world, such as a movie trailer on a screen in the main rail station and its digital counterparts, such as ...

2008

Physical-virtual linkage with contextual bookmarks

Xavier Righetti

In our everyday life we often see objects or persons and are aware that there are related digital services, such as an online ticket service when seeing a poster advertising a concert. Currently it is a rather time-consuming activity to find the related inf ...

ACM Press, 2008

Web page language identification based on URLs

Monika Henzinger, Ingmar Weber, Eda Baykan

Given only the URL of a web page, can we identify its language? This is the question that we examine in this paper. Such a language classifier is, for example, useful for crawlers of web search engines, which frequently try to satisfy certain language quot ...

2008