Polysemy: Polysemy (pəˈlɪsᵻmi or ˈpɒlᵻˌsiːmi) is the capacity for a sign (e.g. a symbol, a morpheme, a word, or a phrase) to have multiple related meanings. For example, a word can have several word senses. Polysemy is distinct from monosemy, where a word has a single meaning. It is also distinct from homonymy (or homophony), which is an accidental similarity between two or more words (such as bear the animal, and the verb bear); whereas homonymy is a mere linguistic coincidence, polysemy is not.
Europarl Corpus: The Europarl Corpus is a corpus (set of documents) that consists of the proceedings of the European Parliament from 1996 to 2012. In its first release in 2001, it covered eleven official languages of the European Union (Danish, Dutch, English, Finnish, French, German, Greek, Italian, Portuguese, Spanish, and Swedish). With the political expansion of the EU, the official languages of the ten new member states have been added to the corpus data.
Natural language processing: Natural language processing (NLP) is an interdisciplinary subfield of linguistics and computer science. It is primarily concerned with processing natural language datasets, such as text corpora or speech corpora, using either rule-based or probabilistic (i.e. statistical and, most recently, neural network-based) machine learning approaches. The goal is a computer capable of "understanding" the contents of documents, including the contextual nuances of the language within them.
Discourse: Discourse is a generalization of the notion of a conversation to any form of communication. Discourse is a major topic in social theory, with work spanning fields such as sociology, anthropology, continental philosophy, and discourse analysis. Following pioneering work by Michel Foucault, these fields view discourse as a system of thought, knowledge, or communication that constructs our experience of the world. Since control of discourse amounts to control of how the world is perceived, social theory often studies discourse as a window into power.
Word order: In linguistics, word order (also known as linear order) is the order of the syntactic constituents of a language. Word order typology studies it from a cross-linguistic perspective, and examines how different languages employ different orders. Correlations between orders found in different syntactic sub-domains are also of interest. The primary word orders of interest are the constituent order of a clause, namely the relative order of subject, object, and verb; the order of modifiers (adjectives, numerals, demonstratives, possessives, and adjuncts) in a noun phrase; and the order of adverbials.
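The constituent order of a clause can be illustrated with a small sketch. It assumes the clause has already been annotated by hand with role labels; the function name `constituent_order` and the labels "S", "V", "O" and "ADV" are hypothetical choices made for this example, and the sample clauses are schematic English glosses rather than data from any particular language.

```python
def constituent_order(clause):
    """Return the relative order of subject, verb and object, e.g. "SVO".

    `clause` is a list of (text, role) pairs. Roles other than
    "S", "V" and "O" (for instance adverbials) are ignored.
    """
    core = [role for _, role in clause if role in {"S", "V", "O"}]
    # Require exactly one subject, one verb and one object.
    if sorted(core) != ["O", "S", "V"]:
        raise ValueError("clause must contain exactly one S, one V and one O")
    return "".join(core)


# Schematic clauses, annotated by hand for illustration only.
svo_clause = [("the dog", "S"), ("chased", "V"), ("the cat", "O"), ("yesterday", "ADV")]
sov_clause = [("the dog", "S"), ("the cat", "O"), ("chased", "V")]

print(constituent_order(svo_clause))
print(constituent_order(sov_clause))
```

The sketch covers only the clause-level order; the order of modifiers in a noun phrase and the order of adverbials would need their own annotations.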
Minimal pair: In phonology, minimal pairs are pairs of words or phrases in a particular language, spoken or signed, that differ in only one phonological element, such as a phoneme, toneme or chroneme, and have distinct meanings. They are used to demonstrate that two phones represent two separate phonemes in the language. Many phonologists in the middle part of the 20th century had a strong interest in developing techniques for discovering the phonemes of unknown languages, and in some cases, they set up writing systems for the languages.
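The formal half of the minimal-pair definition, differing in exactly one phonological element, can be checked mechanically. The following is a minimal sketch under stated assumptions: each word is given as a list of already-segmented phoneme symbols, only substitutions at the same position are considered, and the requirement that the two forms have distinct meanings is taken for granted rather than tested. The function name `is_minimal_pair` is hypothetical.

```python
def is_minimal_pair(a, b):
    """Return True if two segment sequences differ in exactly one position.

    Simplification: sequences of different length are rejected, so pairs
    that differ by the presence or absence of a segment are not covered.
    """
    if len(a) != len(b):
        return False
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches == 1


pin = ["p", "ɪ", "n"]
bin_ = ["b", "ɪ", "n"]
bit = ["b", "ɪ", "t"]

print(is_minimal_pair(pin, bin_))  # differ only in the first segment
print(is_minimal_pair(pin, bit))   # differ in two segments
```

Because English "pin" and "bin" differ only in their first segment and mean different things, the pair is evidence that /p/ and /b/ are separate phonemes in English.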
Speech translation: Speech translation is the process by which conversational spoken phrases are instantly translated and spoken aloud in a second language. This differs from phrase translation, in which the system only translates a fixed and finite set of phrases that have been manually entered into the system. Speech translation technology enables speakers of different languages to communicate. It is therefore valuable for science, cross-cultural exchange and global business.
Entity linking: In natural language processing, entity linking, also referred to as named-entity linking (NEL), named-entity disambiguation (NED), named-entity recognition and disambiguation (NERD) or named-entity normalization (NEN), is the task of assigning a unique identity to entities (such as famous individuals, locations, or companies) mentioned in text. For example, given the sentence "Paris is the capital of France", the idea is to determine that "Paris" refers to the city of Paris and not to Paris Hilton or any other entity that could be referred to as "Paris".
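The "Paris" example can be sketched as a toy disambiguation step. This is an illustration under stated assumptions, not how any particular entity-linking system works: the candidate table `CANDIDATES`, its keyword sets, and the function `link_entity` are all hypothetical, and the scoring (counting context words shared with each candidate's keywords) is a deliberately simple stand-in for the statistical or neural models used in practice.

```python
import re

# Hypothetical candidate table: mention -> {entity identifier: context keywords}.
CANDIDATES = {
    "Paris": {
        "Paris (city)": {"capital", "france", "city", "seine"},
        "Paris Hilton": {"hilton", "heiress", "celebrity", "hotel"},
    }
}


def link_entity(mention, sentence, candidates=CANDIDATES):
    """Pick the candidate whose keywords overlap most with the sentence.

    Returns None when the mention has no known candidates. On a tie,
    the candidate listed first in the table wins.
    """
    options = candidates.get(mention)
    if not options:
        return None
    context = set(re.findall(r"\w+", sentence.lower()))
    return max(options, key=lambda entity: len(options[entity] & context))


print(link_entity("Paris", "Paris is the capital of France"))
```

In the sample sentence, the words "capital" and "france" match the keywords of the city candidate and none match the other candidate, so the mention is linked to the city.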
Compound (linguistics): In linguistics, a compound is a lexeme (less precisely, a word or sign) that consists of more than one stem. Compounding, composition or nominal composition is the process of word formation that creates compound lexemes. Compounding occurs when two or more words or signs are joined to make a longer word or sign. A compound that uses a space rather than a hyphen or concatenation is called an open compound or a spaced compound; the alternative is a closed compound.
Source text: A source text is a text (sometimes oral) from which information or ideas are derived. In translation, a source text is the original text that is to be translated into another language. In historiography, distinctions are commonly made between three kinds of source texts. Primary sources are firsthand written accounts made at the time of an event by someone who was present. They have been described as those sources closest to the origin of the information or idea under study.