In computational linguistics, word-sense induction (WSI), or word-sense discrimination, is an open problem of natural language processing that concerns the automatic identification of the senses (i.e., meanings) of a word. Because the output of word-sense induction is a set of senses for the target word (a sense inventory), the task is closely related to word-sense disambiguation (WSD), which relies on a predefined sense inventory and aims to resolve the ambiguity of words in context.

The output of a word-sense induction algorithm is a clustering of the contexts in which the target word occurs, or a clustering of words related to the target word. Three main methods have been proposed in the literature: context clustering, word clustering, and co-occurrence graphs.

The underlying hypothesis of context clustering is that words are semantically similar if they appear in similar documents, within similar context windows, or in similar syntactic contexts. Each occurrence of a target word in a corpus is represented as a context vector. These context vectors can be either first-order vectors, which directly represent the context at hand, or second-order vectors, under which two contexts of the target word are considered similar if their words tend to co-occur. The vectors are then clustered into groups, each identifying a sense of the target word. A well-known approach to context clustering is the Context-group Discrimination algorithm, which is based on large matrix computation methods.

Word clustering is a different approach to the induction of word senses. It consists of clustering words that are semantically similar and can thus bear a specific meaning. Lin’s algorithm is a prototypical example of word clustering: it uses syntactic dependency statistics collected from a corpus to produce sets of words for each discovered sense of a target word. The Clustering By Committee (CBC) algorithm also uses syntactic contexts, but exploits a similarity matrix to encode the similarities between words and relies on the notion of committees to output the different senses of the word of interest.
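
As a rough illustration of the context clustering idea described above, the following Python sketch builds first-order context vectors (bag-of-words counts around each occurrence of the target word) and groups them by cosine similarity. It is not the Context-group Discrimination algorithm: the greedy agglomerative merging, the stopword list, the window size, the toy sentences, and all function names (`first_order_vector`, `cosine`, `induce_senses`) are assumptions made for this example only.

```python
import math
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "in", "my", "we", "from"}


def first_order_vector(sentence, target, window=4):
    """Count the words within `window` positions of the target word.

    Assumes whitespace-tokenised text in which `target` occurs
    (tokens.index raises ValueError otherwise).
    """
    tokens = sentence.lower().split()
    pos = tokens.index(target)
    lo, hi = max(0, pos - window), pos + window + 1
    return Counter(
        tok
        for i, tok in enumerate(tokens[lo:hi], start=lo)
        if i != pos and tok not in STOPWORDS
    )


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


def induce_senses(contexts, target, k=2):
    """Merge the two most similar clusters repeatedly until k clusters remain.

    Each cluster is represented by the sum of its members' context vectors.
    Returns lists of context indices, one list per induced sense.
    """
    clusters = [[i] for i in range(len(contexts))]
    centroids = [first_order_vector(c, target) for c in contexts]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = cosine(centroids[a], centroids[b])
                if best is None or sim > best[0]:
                    best = (sim, a, b)
        _, a, b = best
        clusters[a] += clusters[b]
        centroids[a] = centroids[a] + centroids[b]
        del clusters[b]
        del centroids[b]
    return [sorted(c) for c in clusters]


# Toy corpus: four occurrences of the ambiguous word "bank".
CONTEXTS = [
    "deposit money in the bank account",
    "the bank approved my loan account",
    "we fished from the river bank",
    "the muddy river bank flooded",
]

print(induce_senses(CONTEXTS, "bank", k=2))  # [[0, 1], [2, 3]]
```

On this toy input the two financial contexts share the word "account" and the two river contexts share "river", so each induced cluster corresponds to one sense of "bank". Real systems would use much larger corpora, weighted or dimensionality-reduced vectors, and a principled way of choosing the number of senses, none of which is shown here.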

Related concepts (1)
Word-sense disambiguation
Word-sense disambiguation (WSD) is the process of identifying which sense of a word is meant in a sentence or other segment of context. In human language processing and cognition, it is usually subconscious and automatic, but it can come to conscious attention when ambiguity impairs clarity of communication, given the pervasive polysemy in natural language. In computational linguistics, it is an open problem that affects other computer-related tasks involving text, such as discourse analysis, improving the relevance of search engines, anaphora resolution, coherence, and inference.
