Publication

Text Similarity in Vector Space Models: A Comparative Study

Related publications (52)

This paper examines how the European press dealt with the no-vax reactions against the Covid-19 vaccine and the dis- and misinformation associated with this movement. Using a curated dataset of 1786 articles from 19 European newspapers on the anti-vaccine ...

ASSOC COMPUTING MACHINERY, 2023

Natural Language Processing (NLP) driven categorisation and detection of discourse in historical US patents

Jérôme Baudry, Nicolas Christophe Chachereau, Bhargav Srinivasa Desikan, Prakhar Gupta

Patents have traditionally been used in the history of technology as an indication of the thinking process of the inventors, of the challenges or “reverse salients” they faced, or of the social groups influencing the construction of technology. More recent ...

2022

Bias at a Second Glance: A Deep Dive into Bias for German Educational Peer-Review Data Modeling

Vinitra Swamy, Thiemo Wambsganss

Natural Language Processing (NLP) has become increasingly utilized to provide adaptivity in educational applications. However, recent research has highlighted a variety of biases in pre-trained language models. While existing studies investigate bias in di ...

2022

Robustness, replicability and scalability in topic modelling

Orion B Penner

Approaches for estimating the similarity between individual publications are an area of long-standing interest in the scientometrics and informetrics communities. Traditional techniques have generally relied on references and other metadata, while text mi ...

ELSEVIER, 2022

Learning computationally efficient static word and sentence representations

Prakhar Gupta

Most Natural Language Processing (NLP) algorithms use, in one way or another, distributed vector representations of linguistic units (primarily words and sentences), also known as embeddings. These embeddings come in two flavours, namely sta ...

EPFL, 2021

Further results on latent discourse models and word embeddings

Youssef Allouah

We discuss some properties of generative models for word embeddings. Namely, (Arora et al., 2016) proposed a latent discourse model implying the concentration of the partition function of the word vectors. This concentration phenomenon led to an asymptotic ...

MICROTOME PUBL, 2021

The Unstoppable Rise of Computational Linguistics in Deep Learning

James Henderson

In this paper, we trace the history of neural networks applied to natural language understanding tasks, and identify key contributions which the nature of language has made to the development of neural network architectures. We focus on the importance of v ...

Association for Computational Linguistics, 2020

Multi-scale sequential network for semantic text segmentation and localization

Jean-Marc Odobez, Olivier Canévet, Michael Villamizar

We present a novel method for semantic text document analysis which, in addition to localizing text, labels the text in user-defined semantic categories. More precisely, it consists of a fully-convolutional and sequential network that we apply to the part ...

2020

Evolution of Topics and Novelty in Science

Orion B Penner

Methods of estimating the similarity between individual publications are an area of long-standing interest in the scientometrics community. Traditional methods have generally relied on references and other metadata, while text mining approaches based on tit ...

INT SOC SCIENTOMETRICS & INFORMETRICS-ISSI, 2019

Crosslingual Document Embedding as Reduced-Rank Ridge Regression

Martin Jaggi, Robert West, Martin Josifoski, Ivan Paskov

There has recently been much interest in extending vector-based word representations to multiple languages, such that words can be compared across languages. In this paper, we shift the focus from words to documents and introduce a method for embedding doc ...

2019