Automatic measurement of semantic text similarity is an important task in natural language processing. In this paper, we evaluate the performance of different vector space models on this task. We address the real-world problem of modeling patent-to-patent similarity and compare TFIDF (and related extensions), topic models (e.g., latent semantic indexing), and neural models (e.g., paragraph vectors). Contrary to expectations, the added computational cost of text embedding methods is justified only when: 1) the target text is condensed; and 2) the similarity comparison is trivial. In other cases, TFIDF performs surprisingly well, in particular for longer and more technical texts or for making finer-grained distinctions between nearest neighbors. Unexpectedly, extensions to the TFIDF method, such as adding noun phrases or calculating term weights incrementally, were not helpful in our context.
Jérôme Baudry, Nicolas Christophe Chachereau, Bhargav Srinivasa Desikan, Prakhar Gupta, Vinitra Swamy, Thiemo Wambsganss
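As a rough illustration of the TFIDF baseline discussed in the abstract, the sketch below embeds a few toy texts as TFIDF vectors and computes pairwise cosine similarity with scikit-learn. It is a minimal, assumption-laden example (invented texts, default scikit-learn settings), not the authors' patent-similarity pipeline.

```python
# Minimal sketch (not the paper's implementation): TFIDF + cosine similarity
# over a few invented toy abstracts, using scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

toy_abstracts = [
    "A method for storing electrical energy in a lithium-ion battery cell.",
    "An apparatus for charging a battery pack using photovoltaic panels.",
    "A process for brewing coffee with a pressurized water chamber.",
]

# Fit a TFIDF vocabulary on the corpus and represent each document as a sparse vector.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform(toy_abstracts)

# Pairwise cosine similarity between all documents (document-to-document similarity).
similarities = cosine_similarity(tfidf_matrix)
print(similarities.round(3))
```

In this toy run, the two battery-related texts score higher against each other than against the coffee text, which is the kind of nearest-neighbor ranking the paper evaluates at much larger scale against topic models and paragraph vectors.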