Lexical Markup FrameworkLanguage resource management Lexical markup framework (LMF; ISO 24613:2008), is the International Organization for Standardization ISO/TC37 standard for natural language processing (NLP) and machine-readable dictionary (MRD) lexicons. The scope is standardization of principles and methods relating to language resources in the contexts of multilingual communication. The goals of LMF are to provide a common model for the creation and use of lexical resources, to manage the exchange of data between and among these resources, and to enable the merging of large number of individual electronic resources to form extensive global electronic resources.
Lexical resourceIn digital lexicography, natural language processing, and digital humanities, a lexical resource is a language resource consisting of data regarding the lexemes of the lexicon of one or more languages e.g., in the form of a database. Different standards for the machine-readable edition of lexical resources exist, e.g., Lexical Markup Framework (LMF) an ISO standard for encoding lexical resources, comprising an abstract data model and an XML serialization, and OntoLex-Lemon, an RDF vocabulary for publishing lexical resources as knowledge graphs on the web, e.
Speech perceptionSpeech perception is the process by which the sounds of language are heard, interpreted, and understood. The study of speech perception is closely linked to the fields of phonology and phonetics in linguistics and cognitive psychology and perception in psychology. Research in speech perception seeks to understand how human listeners recognize speech sounds and use this information to understand spoken language.
LanguageLanguage is a structured system of communication that consists of grammar and vocabulary. It is the primary means by which humans convey meaning, both in spoken and written forms, and may also be conveyed through sign languages. The vast majority of human languages have developed writing systems that allow for the recording and preservation of the sounds or signs of language. Human language is characterized by its cultural and historical diversity, with significant variations observed between cultures and across time.
Lexical analysisLexical tokenization is conversion of a text into (semantically or syntactically) meaningful lexical tokens belonging to categories defined by a "lexer" program. In case of a natural language, those categories include nouns, verbs, adjectives, punctuations etc. In case of a programming language, the categories include identifiers, operators, grouping symbols and data types. Lexical tokenization is not the same process as the probabilistic tokenization, used for large language model's data preprocessing, that encode text into numerical tokens, using byte pair encoding.
Lexical semanticsLexical semantics (also known as lexicosemantics), as a subfield of linguistic semantics, is the study of word meanings. It includes the study of how words structure their meaning, how they act in grammar and compositionality, and the relationships between the distinct senses and uses of a word. The units of analysis in lexical semantics are lexical units which include not only words but also sub-words or sub-units such as affixes and even compound words and phrases. Lexical units include the catalogue of words in a language, the lexicon.
Language deathIn linguistics, language death occurs when a language loses its last native speaker. By extension, language extinction is when the language is no longer known, including by second-language speakers, when it becomes known as an extinct language. A related term is linguicide, the death of a language from natural or political causes, and, rarely, glottophagy, the absorption or replacement of a minor language by a major language.
Language processing in the brainIn psycholinguistics, language processing refers to the way humans use words to communicate ideas and feelings, and how such communications are processed and understood. Language processing is considered to be a uniquely human ability that is not produced with the same grammatical understanding or systematicity in even human's closest primate relatives. Throughout the 20th century the dominant model for language processing in the brain was the Geschwind-Lichteim-Wernicke model, which is based primarily on the analysis of brain-damaged patients.
Word-sense disambiguationWord-sense disambiguation (WSD) is the process of identifying which sense of a word is meant in a sentence or other segment of context. In human language processing and cognition, it is usually subconscious/automatic but can often come to conscious attention when ambiguity impairs clarity of communication, given the pervasive polysemy in natural language. In computational linguistics, it is an open problem that affects other computer-related writing, such as discourse, improving relevance of search engines, anaphora resolution, coherence, and inference.
Extinct languageAn extinct language is a language that no longer has any speakers, especially if the language has no living descendants. In contrast, a dead language is one that is no longer the native language of any community, even if it is still in use, like Latin. A dormant language is a dead language that still serves as a symbol of ethnic identity to a particular group. These languages are often undergoing a process of revitalisation. Languages that currently have living native speakers are sometimes called modern languages to contrast them with dead languages, especially in educational contexts.