Publication

Text Similarity in Vector Space Models: A Comparative Study

Related publications (52)

Natural Language Processing (Almost) from Scratch

Ronan Collobert, Michael Karlen

We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity recognition, and semantic role labeling. This versatility is a ...

2011

How Comparable are Parallel Corpora? Measuring the Distribution of General Vocabulary and Connectives

Andrei Popescu-Belis, Thomas Meyer

In this paper, we question the homogeneity of a large parallel corpus by measuring the similarity between various sub-parts. We compare results obtained using a general measure of lexical similarity based on χ² and by counting the number of discourse conne ...

2011

A Speech-based Just-in-Time Retrieval System using Semantic Search

Philip Neil Garner, Andrei Popescu-Belis, Majid Yazdani

The Automatic Content Linking Device is a just-in-time document retrieval system which monitors an ongoing conversation or a monologue and enriches it with potentially related documents, including multimedia ones, from local repositories or from the Intern ...

2011

Fisher Kernels and Probabilistic Latent Semantic Models

Emmanuel Eckard

Tasks that rely on semantic content of documents, notably Information Retrieval and Document Classification, can benefit from a good account of document context, i.e. the semantic association between documents. To this effect, the scheme of latent semantic ...

EPFL, 2010

Semantic Vector Machines

Vincent Etter

We first present our work in machine translation, during which we used aligned sentences to train a neural network to embed n-grams of different languages into a d-dimensional space, such that n-grams that are the translation of each other are close with ...

2009

Rôle de la matrice d'information et pondération des composantes dans les noyaux de Fisher pour PLSI

Jean-Cédric Chappelier, Emmanuel Eckard

ABSTRACT. An information-geometric approach for document similarities in the framework of “Probabilistic Latent Semantic Indexing” was first proposed by T. Hofmann (2000) and later extended (“revisited”) by Nyffenegger et al. (2006). This paper presents an ...

2009

PLSI: The True Fisher Kernel and beyond IID Processes, Information Matrix and Model Identification in PLSI

Jean-Cédric Chappelier, Emmanuel Eckard

The Probabilistic Latent Semantic Indexing model, introduced by T. Hofmann (1999), has engendered applications in numerous fields, notably document classification and information retrieval. In this context, the Fisher kernel was found to be an appropriate ...

Springer-Verlag New York, 2009

Machine learning for information retrieval

David Grangier

In this thesis, we explore the use of machine learning techniques for information retrieval. More specifically, we focus on ad-hoc retrieval, which is concerned with searching large corpora to identify the documents relevant to user queries. This identific ...

EPFL, 2008

Machine Learning for Information Retrieval

David Grangier

École Polytechnique Fédérale de Lausanne, 2008

Machine Learning for Information Retrieval

David Grangier

IDIAP, 2008