Using PLSI in Information Retrieval (Utilisation de PLSI en recherche d'information)

The PLSI model (“Probabilistic Latent Semantic Indexing”) offers a document indexing scheme based on probabilistic latent category models. It has found applications in diverse fields, notably in information retrieval (IR). Nevertheless, PLSI cannot process documents not seen during parameter inference, a major liability for handling queries in IR. A method known as “folding-in” circumvents this problem up to a point, but has weaknesses of its own. The present paper introduces a new document-query similarity measure for PLSI, based on language models, that entirely avoids the problem of query projection. We compare this similarity to Fisher kernels, the state-of-the-art similarities for PLSI. Moreover, we present an evaluation of PLSI on a particularly large training set of almost 7500 documents and over one million term occurrences, created from the TREC–AP collection.
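To make the “folding-in” step mentioned above concrete, here is a minimal sketch of how it is commonly described for PLSI: the topic-word distributions P(w|z) learned at training time are kept fixed, and a few EM iterations estimate the topic mixture P(z|q) of an unseen query. This illustrates only the generic folding-in idea, not the similarity measure proposed in the paper; the function name `fold_in`, its parameters, and the toy data are assumptions made for illustration.

```python
import numpy as np


def fold_in(query_counts, p_w_given_z, n_iter=50, seed=0):
    """Estimate P(z|q) for an unseen query by EM, holding P(w|z) fixed.

    query_counts : array of shape (V,), term counts n(q, w) of the query
    p_w_given_z  : array of shape (K, V), trained P(w|z); each row sums to 1
    (Hypothetical helper, written for illustration only.)
    """
    query_counts = np.asarray(query_counts, dtype=float)
    p_w_given_z = np.asarray(p_w_given_z, dtype=float)
    if query_counts.sum() == 0:
        raise ValueError("query must contain at least one term")

    rng = np.random.default_rng(seed)
    n_topics = p_w_given_z.shape[0]
    # Random starting point on the probability simplex.
    p_z_given_q = rng.dirichlet(np.ones(n_topics))

    for _ in range(n_iter):
        # E-step: P(z|q,w) is proportional to P(w|z) * P(z|q).
        joint = p_w_given_z * p_z_given_q[:, None]        # shape (K, V)
        denom = joint.sum(axis=0, keepdims=True)          # shape (1, V)
        posterior = joint / np.maximum(denom, 1e-12)
        # M-step: P(z|q) is proportional to sum_w n(q,w) * P(z|q,w).
        p_z_given_q = posterior @ query_counts
        p_z_given_q /= p_z_given_q.sum()

    return p_z_given_q


# Toy model (assumed data): 2 topics over a 4-word vocabulary.
toy_p_w_given_z = np.array([
    [0.5, 0.5, 0.0, 0.0],   # topic 0 only emits words 0 and 1
    [0.0, 0.0, 0.5, 0.5],   # topic 1 only emits words 2 and 3
])
toy_query = np.array([2, 1, 0, 0])  # the query uses only words 0 and 1
toy_mixture = fold_in(toy_query, toy_p_w_given_z)
```

In this toy setting the query contains only words generated by the first topic, so the estimated mixture concentrates on that topic. The sketch also shows where the weakness of folding-in lies: the query is projected by a separate, iterative estimation that was not part of the original parameter inference.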
