In data analysis, cosine similarity is a measure of similarity between two non-zero vectors defined in an inner product space. Cosine similarity is the cosine of the angle between the vectors; that is, it is the dot product of the vectors divided by the product of their lengths. It follows that the cosine similarity does not depend on the magnitudes of the vectors, but only on their angle. The cosine similarity always belongs to the interval $[-1, 1]$: two proportional vectors have a cosine similarity of 1, two orthogonal vectors have a similarity of 0, and two opposite vectors have a similarity of -1. In some contexts, the component values of the vectors cannot be negative, in which case the cosine similarity is bounded in $[0, 1]$.

For example, in information retrieval and text mining, each word is assigned a different coordinate and a document is represented by the vector of the numbers of occurrences of each word in the document. Cosine similarity then gives a useful measure of how similar two documents are likely to be in terms of their subject matter, independently of the lengths of the documents. The technique is also used to measure cohesion within clusters in the field of data mining. One advantage of cosine similarity is its low complexity, especially for sparse vectors: only the non-zero coordinates need to be considered.

Other names for cosine similarity include Orchini similarity and Tucker coefficient of congruence; the Otsuka–Ochiai similarity is cosine similarity applied to binary data.

The cosine of two non-zero vectors can be derived from the Euclidean dot product formula:

$$\mathbf{A} \cdot \mathbf{B} = \|\mathbf{A}\| \, \|\mathbf{B}\| \cos\theta$$

Given two $n$-dimensional vectors of attributes, $\mathbf{A}$ and $\mathbf{B}$, the cosine similarity $\cos(\theta)$ is represented using a dot product and magnitudes as

$$\cos(\theta) = \frac{\mathbf{A} \cdot \mathbf{B}}{\|\mathbf{A}\| \, \|\mathbf{B}\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \; \sqrt{\sum_{i=1}^{n} B_i^2}},$$

where $A_i$ and $B_i$ are the $i$-th components of vectors $\mathbf{A}$ and $\mathbf{B}$, respectively. The resulting similarity ranges from -1, meaning exactly opposite, to 1, meaning exactly the same, with 0 indicating orthogonality or decorrelation; in-between values indicate intermediate similarity or dissimilarity.
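The formula translates directly into code. Below is a minimal Python sketch (the `cosine_similarity` function name, the dict-of-counts representation, and the toy sentences are illustrative choices, not from the source) that also shows the sparse-vector advantage mentioned above: the dot product only iterates over coordinates that are non-zero in both vectors.

```python
import math
from collections import Counter

def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine similarity of two sparse vectors stored as {coordinate: value} dicts."""
    # Dot product over the intersection of non-zero coordinates only.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

# Documents as word-count vectors; all counts are non-negative,
# so the similarity falls in [0, 1].
doc1 = Counter("the cat sat on the mat".split())
doc2 = Counter("the cat lay on the rug".split())
print(cosine_similarity(doc1, doc2))  # 0.75, driven by the shared "the", "cat", "on"
```

Because only shared non-zero coordinates contribute to the dot product, the cost scales with the overlap of the two documents' vocabularies rather than with the full vocabulary size.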

Related courses (3)
CS-423: Distributed information systems
This course introduces the foundations of information retrieval, data mining and knowledge bases, which underpin today's Web-based distributed information systems.
CS-401: Applied data analysis
This course teaches the basic techniques, methodologies, and practical skills required to draw meaningful insights from a variety of data, with the help of the most acclaimed software tools in the data science domain.
CS-431: Introduction to natural language processing
The objective of this course is to present the main models, formalisms and algorithms necessary for the development of applications in the field of natural language information processing.
Related lectures (34)
Word Embeddings: Lab Session
Covers the implementation of a basic search engine using word embeddings and cosine similarity.
Vector Space Retrieval Exercise
Covers TF-IDF computation, document vectors, cosine similarity, and precision formulas.
Word Embeddings: Modeling Word Context and Similarity
Covers word embeddings, modeling word context and similarity in a low-dimensional space.
Related concepts (4)
Word2vec
Word2vec is a technique for natural language processing (NLP) published in 2013. The word2vec algorithm uses a neural network model to learn word associations from a large corpus of text. Once trained, such a model can detect synonymous words or suggest additional words for a partial sentence. As the name implies, word2vec represents each distinct word with a particular list of numbers called a vector.
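To make the connection to cosine similarity concrete, here is a loose sketch using the gensim library (assuming gensim version 4 or later; the toy corpus and parameter values are invented for illustration). In word2vec models, word similarity is conventionally scored as the cosine similarity of the learned word vectors.

```python
from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of tokens.
corpus = [
    ["cat", "sat", "on", "the", "mat"],
    ["dog", "sat", "on", "the", "rug"],
    ["cat", "and", "dog", "are", "pets"],
]

# Train a small model; every distinct word becomes a 50-dimensional vector.
model = Word2Vec(sentences=corpus, vector_size=50, min_count=1, seed=1)

# wv.similarity returns the cosine similarity of the two word vectors.
print(model.wv.similarity("cat", "dog"))
```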
Latent semantic analysis
Latent semantic analysis (LSA) is a technique in natural language processing, in particular distributional semantics, of analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms. LSA assumes that words that are close in meaning will occur in similar pieces of text (the distributional hypothesis).
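A brief sketch of the LSA pipeline, assuming scikit-learn (the documents and the choice of two latent components are invented for illustration): documents are projected into a low-rank "concept" space, and cosine similarity is then used to compare them there.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "the cat sat on the mat",
    "a dog lay on the rug",
    "stock markets fell sharply today",
]

# Term-document matrix followed by a low-rank SVD projection: the core of LSA.
tfidf = TfidfVectorizer().fit_transform(docs)
concepts = TruncatedSVD(n_components=2).fit_transform(tfidf)

# Pairwise cosine similarities in the latent concept space.
print(cosine_similarity(concepts))
```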
Information retrieval
Information retrieval (IR) in computing and information science is the process of obtaining information system resources that are relevant to an information need from a collection of those resources. Searches can be based on full-text or other content-based indexing. Information retrieval is the science of searching for information in a document, searching for documents themselves, and also searching for the metadata that describes data, and for databases of texts, images or sounds.
