Concept

Part-of-speech tagging

In corpus linguistics, part-of-speech tagging (POS tagging or PoS tagging or POST), also called grammatical tagging is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context. A simplified form of this is commonly taught to school-age children, in the identification of words as nouns, verbs, adjectives, adverbs, etc. Once performed by hand, POS tagging is now done in the context of computational linguistics, using algorithms which associate discrete terms, as well as hidden parts of speech, by a set of descriptive tags. POS-tagging algorithms fall into two distinctive groups: rule-based and stochastic. E. Brill's tagger, one of the first and most widely used English POS-taggers, employs rule-based algorithms. Part-of-speech tagging is harder than just having a list of words and their parts of speech, because some words can represent more than one part of speech at different times, and because some parts of speech are complex. This is not rare—in natural languages (as opposed to many artificial languages), a large percentage of word-forms are ambiguous. For example, even "dogs", which is usually thought of as just a plural noun, can also be a verb: The sailor dogs the hatch. Correct grammatical tagging will reflect that "dogs" is here used as a verb, not as the more common plural noun. Grammatical context is one way to determine this; semantic analysis can also be used to infer that "sailor" and "hatch" implicate "dogs" as 1) in the nautical context and 2) an action applied to the object "hatch" (in this context, "dogs" is a nautical term meaning "fastens (a watertight door) securely"). Schools commonly teach that there are 9 parts of speech in English: noun, verb, article, adjective, preposition, pronoun, adverb, conjunction, and interjection. However, there are clearly many more categories and sub-categories. For nouns, the plural, possessive, and singular forms can be distinguished.

Official source

https://en.wikipedia.org/wiki/Part-of-speech_tagging

About this result

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

Related courses (2)

CS-431: Introduction to natural language processing

The objective of this course is to present the main models, formalisms and algorithms necessary for the development of applications in the field of natural language information processing. The concept

ME-390: Foundations of artificial intelligence

This course provides the students with 1) a set of theoretical concepts to understand the machine learning approach; and 2) a subset of the tools to use this approach for problems arising in mechanica

Related concepts (10)

Statistical classification

In statistics, classification is the problem of identifying which of a set of categories (sub-populations) an observation (or observations) belongs to. Examples are assigning a given email to the "spam" or "non-spam" class, and assigning a diagnosis to a given patient based on observed characteristics of the patient (sex, blood pressure, presence or absence of certain symptoms, etc.). Often, the individual observations are analyzed into a set of quantifiable properties, known variously as explanatory variables or features.

Word-sense disambiguation

Word-sense disambiguation (WSD) is the process of identifying which sense of a word is meant in a sentence or other segment of context. In human language processing and cognition, it is usually subconscious/automatic but can often come to conscious attention when ambiguity impairs clarity of communication, given the pervasive polysemy in natural language. In computational linguistics, it is an open problem that affects other computer-related writing, such as discourse, improving relevance of search engines, anaphora resolution, coherence, and inference.

Treebank

In linguistics, a treebank is a parsed text corpus that annotates syntactic or semantic sentence structure. The construction of parsed corpora in the early 1990s revolutionized computational linguistics, which benefitted from large-scale empirical data. The term treebank was coined by linguist Geoffrey Leech in the 1980s, by analogy to other repositories such as a seedbank or bloodbank. This is because both syntactic and semantic structure are commonly represented compositionally as a tree structure.

Official source

https://en.wikipedia.org/wiki/Part-of-speech_tagging

About this result

Related courses (2)

CS-431: Introduction to natural language processing

ME-390: Foundations of artificial intelligence

Related lectures (8)

Tagging (a.k.a. Sequence labeling)

Covers lemmatization, PoS tagging, sequence labeling, and probabilistic PoS tagging using HMMs for performance evaluation.

Part-of-Speech Tagging: Probabilistic Models

Explores Part-of-Speech tagging using probabilistic models like Hidden Markov Models and discusses the resolution of lexical ambiguities.

Named Entity Recognition: Applications and Techniques

Explores Named Entity Recognition, its uses, techniques, and applications in information extraction.

Related publications (21)

Interpreting Language Models Through Knowledge Graph Extraction

Martin Jaggi, Vinitra Swamy, Angeliki Romanou

Transformer-based language models trained on large text corpora have enjoyed immense popularity in the natural language processing community and are commonly used as a starting point for downstream tasks. While these models are undeniably useful, it is a c ...

2021

Partially-supervised Mention Detection

James Henderson

Learning to detect entity mentions without using syntactic information can be useful for integration and joint optimization with other tasks. However, it is common to have partially annotated data for this problem. Here, we investigate two approaches to de ...

ACL2020

Graph-to-Graph Transformer for Transition-based Dependency Parsing

James Henderson, Alireza Mohammadshahi

We propose the Graph2Graph Transformer architecture for conditioning on and predicting arbitrary graphs, and apply it to the challenging task of transition-based dependency parsing. After proposing two novel Transformer models of transition-based dependenc ...

Association for Computational Linguistics2020

Related concepts (10)

Statistical classification

Word-sense disambiguation

Treebank