In computational linguistics, word-sense induction (WSI) or discrimination is an open problem of natural language processing, which concerns the automatic identification of the senses of a word (i.e. meanings). Given that the output of word-sense induction is a set of senses for the target word (sense inventory), this task is strictly related to that of word-sense disambiguation (WSD), which relies on a predefined sense inventory and aims to solve the ambiguity of words in context.
The output of a word-sense induction algorithm is a clustering of contexts in which the target word occurs or a clustering of words related to the target word. Three main methods have been proposed in the literature:
Context clustering
Word clustering
Co-occurrence graphs
The underlying hypothesis of this approach is that, words are semantically similar if they appear in similar documents, with in similar context windows, or in similar syntactic contexts. Each occurrence of a target word in a corpus is represented as a context vector. These context vectors can be either first-order vectors, which directly represent the context at hand, or second-order vectors, i.e., the contexts of the target word are similar if their words tend to co-occur together. The vectors are then clustered into groups, each identifying a sense of the target word. A well-known approach to context clustering is the Context-group Discrimination algorithm based on large matrix computation methods.
Word clustering is a different approach to the induction of word senses. It consists of clustering words, which are semantically similar and can thus bear a specific meaning. Lin’s algorithm is a prototypical example of word clustering, which is based on syntactic dependency statistics, which occur in a corpus to produce sets of words for each discovered sense of a target word. The Clustering By Committee (CBC) also uses syntactic contexts, but exploits a similarity matrix to encode the similarities between words and relies on the notion of committees to output different senses of the word of interest.
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
This course introduces the foundations of information retrieval, data mining and knowledge bases, which constitute the foundations of today's Web-based distributed information systems.
Ce cours donne aux étudiant-e-s les connaissances de base nécessaires pour comprendre les dimensions juridiques de leur activité professionnelle concernant l'aménagement du territoire et la protection
Word-sense disambiguation (WSD) is the process of identifying which sense of a word is meant in a sentence or other segment of context. In human language processing and cognition, it is usually subconscious/automatic but can often come to conscious attention when ambiguity impairs clarity of communication, given the pervasive polysemy in natural language. In computational linguistics, it is an open problem that affects other computer-related writing, such as discourse, improving relevance of search engines, anaphora resolution, coherence, and inference.
Explores the distinction between Eulerian and Lagrangian descriptions of fluid flow through velocity field concepts and different types of lines visualization.
SWICE (Sustainable Wellbeing for the Individual and the Collectivity in the Energy transition) aims to answer: how to improve wellbeing for all with a much lower energy use? Wellbeing is a state of thriving, which involves full participation in society, a ...
2023
It is a generally accepted idea that typology is an essential element in the disciplinary dimension of architecture. The concept of typology, in its most common definition, is sufficiently malleable to cover a wide range of uses, but it is also this vaguen ...
It is a generally accepted idea that typology is an essential element in the disciplinary dimension of architecture. The concept of typology, in its most common definition, is sufficiently malleable to cover a wide range of uses, but it is also this vaguen ...