Publication

A Multitask Learning Approach to Document Representation using Unlabeled Data

Related publications (38)

Machine Learning for Information Retrieval

David Grangier

In this thesis, we explore the use of machine learning techniques for information retrieval. More specifically, we focus on ad-hoc retrieval, which is concerned with searching large corpora to identify the documents relevant to user queries. This identifica ...

IDIAP, 2008

Machine learning approaches to text representation using unlabeled data

Mikaela Keller

With the rapid expansion in the use of computers for producing digitalized textual documents, the need for automatic systems for organizing and retrieving the information contained in large databases has become essential. In general, information retrieval s ...

EPFL, 2008

SOM-based Clustering of Multilingual Documents Using an Ontology

Minh Hai Pham

Clustering similar documents is a difficult task for text data mining. Difficulties stem especially from the way documents are translated into numerical vectors. In this chapter, we will present a method that uses Self Organizing Map (SOM) to cluster medic ...

Information Science Reference, 2008

Unsupervised Learning for Information Distillation

Kamand Kamangar

Current document archives are enormously large and constantly increasing and that makes it practically impossible to make use of them efficiently. To analyze and interpret large volumes of speech and text of these archives in multiple languages and produce ...

IDIAP, 2007

A Thousand Words in a Scene

Daniel Gatica-Perez, Jean-Marc Odobez, Pedro Manuel Da Silva Quelhas

This paper presents a novel approach for visual scene modeling and classification, investigating the combined use of text modeling methods and local invariant features. Our work attempts to elucidate (1) whether a text-like bag-of-visterms represent ...

2007

Machine Learning Approaches to Text Representation using Unlabeled Data

Mikaela Keller

Ecole Polytechnique Fédérale de Lausanne, 2006

Machine Learning Approaches to Text Representation using Unlabeled Data

Mikaela Keller

IDIAP, 2006

Automatic genre classification of music content

Giorgio Zoia, Nicolas Scaringella

This paper reviews the state-of-the-art in automatic genre classification of music collections through three main paradigms: expert systems, unsupervised classification, and supervised classification. The paper discusses the importance of music genres with ...

2006

A Thousand Words in a Scene

Daniel Gatica-Perez, Jean-Marc Odobez, Pedro Manuel Da Silva Quelhas

IDIAP, 2005

Noisy Text Categorization

Alessandro Vinciarelli

This work presents a system for the categorization of noisy texts. By noisy it is meant any text obtained through an extraction process (affected by errors) from media different than digital texts. We show that, even with an average Word Error Rate of arou ...

2004