Concept

Word2vec

Word2vec is a technique for natural language processing (NLP) published in 2013. The word2vec algorithm uses a neural network model to learn word associations from a large corpus of text. Once trained, such a model can detect synonymous words or suggest additional words for a partial sentence. As the name implies, word2vec represents each distinct word with a particular list of numbers called a vector. The vectors are chosen carefully such that they capture the semantic and syntactic qualities of words; as such, a simple mathematical function (cosine similarity) can indicate the level of semantic similarity between the words represented by those vectors. Word2vec is a group of related models that are used to produce word embeddings. These models are shallow, two-layer neural networks that are trained to reconstruct linguistic contexts of words. Word2vec takes as its input a large corpus of text and produces a vector space, typically of several hundred dimensions, with each unique word in the corpus being assigned a corresponding vector in the space. Word2vec can utilize either of two model architectures to produce these distributed representations of words: continuous bag-of-words (CBOW) or continuous skip-gram. In both architectures, word2vec considers both individual words and a sliding window of context words surrounding individual words as it iterates over the entire corpus. In the continuous bag-of-words architecture, the model predicts the current word from the window of surrounding context words. The order of context words does not influence prediction (bag-of-words assumption). In the continuous skip-gram architecture, the model uses the current word to predict the surrounding window of context words. The skip-gram architecture weighs nearby context words more heavily than more distant context words. According to the authors' note, CBOW is faster while skip-gram does a better job for infrequent words.

Official source

https://en.wikipedia.org/wiki/Word2vec

About this result

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

Word2vec

Graph Chatbot

Chat with Graph Search

Examining European Press Coverage of the Covid-19 No-Vax Movement: An NLP Framework

Learnable latent embeddings for joint behavioural and neural analysis

4M: Massively Multimodal Masked Modeling

Examining European Press Coverage of the Covid-19 No-Vax Movement: An NLP Framework

Learnable latent embeddings for joint behavioural and neural analysis

4M: Massively Multimodal Masked Modeling