Concept

Word-sense induction

In computational linguistics, word-sense induction (WSI), also called word-sense discrimination, is an open problem in natural language processing that concerns the automatic identification of the senses (i.e. meanings) of a word. Because the output of word-sense induction is a set of senses for the target word (a sense inventory), the task is closely related to word-sense disambiguation (WSD), which relies on a predefined sense inventory and aims to resolve the ambiguity of words in context.

The output of a word-sense induction algorithm is a clustering of the contexts in which the target word occurs, or a clustering of words related to the target word. Three main methods have been proposed in the literature: context clustering, word clustering, and co-occurrence graphs.

The hypothesis underlying context clustering is that words are semantically similar if they appear in similar documents, within similar context windows, or in similar syntactic contexts. Each occurrence of a target word in a corpus is represented as a context vector. These context vectors can be either first-order vectors, which directly represent the context at hand, or second-order vectors, under which two contexts of the target word are considered similar if their words tend to co-occur with one another. The vectors are then clustered into groups, each identifying a sense of the target word. A well-known approach to context clustering is the Context-group Discrimination algorithm, which is based on large matrix computation methods.

Word clustering is a different approach to the induction of word senses. It consists of clustering words that are semantically similar and can thus convey a specific meaning. Lin’s algorithm is a prototypical example of word clustering: it uses syntactic dependency statistics gathered from a corpus to produce a set of words for each discovered sense of a target word.
The Clustering By Committee (CBC) algorithm also uses syntactic contexts, but it exploits a similarity matrix to encode the similarities between words and relies on the notion of committees to output the different senses of the word of interest.
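
The context-clustering idea described above can be illustrated with a minimal sketch. It is not the Context-group Discrimination algorithm, Lin's algorithm, or CBC: it builds first-order bag-of-words context vectors and groups them with a simple greedy, single-pass clustering based on cosine similarity. The function names (`context_vector`, `cosine`, `induce_senses`), the stopword list, the window size, the similarity threshold, and the toy sentences are all assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt

# Hypothetical stopword list, just large enough for the toy corpus below.
STOPWORDS = {"the", "a", "an", "at", "and", "from", "was", "after",
             "near", "she", "they", "into", "her"}


def context_vector(sentence, target, window=5):
    """First-order vector: counts of content words near each target occurrence."""
    tokens = sentence.lower().split()
    vec = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + window + 1]
        vec.update(w for w in neighbours if w not in STOPWORDS and w != target)
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[word] for word, count in u.items() if word in v)
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def induce_senses(sentences, target, threshold=0.2):
    """Greedy single-pass clustering (an assumption, not a published WSI method).

    Each context joins the most similar existing cluster if the cosine
    similarity to that cluster's summed vector reaches `threshold`;
    otherwise it starts a new cluster, i.e. a new induced sense.
    """
    centroids, labels = [], []
    for sentence in sentences:
        vec = context_vector(sentence, target)
        sims = [cosine(vec, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            centroids[best].update(vec)  # fold the context into the cluster
            labels.append(best)
        else:
            centroids.append(Counter(vec))
            labels.append(len(centroids) - 1)
    return labels


# Toy corpus: two financial and two riverside uses of "bank".
CONTEXTS = [
    "she deposited money at the bank and opened an account",
    "the bank transferred money into her savings account",
    "they fished from the river bank near the water",
    "the river bank was muddy after the water rose",
]

if __name__ == "__main__":
    print(induce_senses(CONTEXTS, "bank"))  # [0, 0, 1, 1]
```

Each distinct label stands for one induced sense. A real system would more likely use weighted or second-order vectors and a standard clustering algorithm; the greedy pass is used here only to keep the sketch dependency-free and deterministic.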

Official source

https://en.wikipedia.org/wiki/Word-sense_induction

Related courses (9)

ENV-367: Environmental and construction law

This course gives students the basic knowledge needed to understand the legal dimensions of their professional activity concerning land-use planning and the protection

CS-423: Distributed information systems

This course introduces the foundations of information retrieval, data mining and knowledge bases, which constitute the foundations of today's Web-based distributed information systems.

EE-724: Human language technology: applications to information access

The Human Language Technology (HLT) course introduces methods and applications for language processing and generation, using statistical learning and neural networks.

Related publications (17)

Climatic Affinities or Type As A Structure Of Relations, Based On Typological Transfer In The Work Of Lacaton & Vassal And Other Similar Precedents

Tiago André Pratas Borges

It is a generally accepted idea that typology is an essential element in the disciplinary dimension of architecture. The concept of typology, in its most common definition, is sufficiently malleable to cover a wide range of uses, but it is also this vaguen ...

2023

Climatic Affinities or Type As A Structure Of Relations Based On Typological Transfer In The Work Of Lacaton & Vassal And Other Similar Precedents

Tiago André Pratas Borges

2023

Multi-Robot 3D Gas Distribution Mapping: Coordination, Information Sharing and Environmental Knowledge

Alcherio Martinoli, Chiara Ercolani, Thomas Laurent Peeters

Environmental monitoring and mapping operations are an essential tool to combat climate change. An important branch of this domain concerns the construction of reliable gas maps. Adaptive navigation strategies coupled with multi-robot systems improve the o ...

2023

Related concepts (1)

Word-sense disambiguation

Word-sense disambiguation is the determination of the sense of a word in a sentence when that word can have several possible senses. In computational linguistics, word-sense disambiguation is an unsolved problem in natural language processing and computational ontology. Solving it would enable significant advances in other areas of computational linguistics, such as discourse analysis, improving the relevance of search engine results, anaphora resolution, coherence, inference, etc.

Related lectures (22)

Text processing: Matrix, Documents, Topics

Explores text handling, focusing on matrices, documents, and topics, including the challenges of document classification and advanced models such as BERT.

Latent semantic indexing

Covers latent semantic indexing, word embeddings, and the skip-gram model with negative sampling.

Lexical semantics

Explores lexical semantics, word senses, semantic relations, and WordNet, highlighting applications in language engineering and information retrieval.
