Natural language

In neuropsychology, linguistics, and philosophy of language, a natural language or ordinary language is any language that occurs naturally in a human community by a process of use, repetition, and change, without conscious planning or premeditation. It can take different forms, typically either a spoken language or a sign language. Natural languages are distinguished from constructed and formal languages such as those used to program computers or to study logic. Natural language can thus be broadly defined as different from artificial and constructed languages, e.g., programming languages.
Natural language generation

Natural language generation (NLG) is a software process that produces natural language output. A widely cited survey of NLG methods describes NLG as "the subfield of artificial intelligence and computational linguistics that is concerned with the construction of computer systems that can produce understandable texts in English or other human languages from some underlying non-linguistic representation of information". While it is widely agreed that the output of any NLG process is text, there is some disagreement about whether the inputs of an NLG system need to be non-linguistic.
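To make the mapping from non-linguistic input to text concrete, here is a minimal sketch of template-based generation, one of the simplest NLG techniques. The record fields, the template, and the function name are illustrative assumptions, not part of any standard NLG system.

    # Minimal template-based NLG sketch: a structured, non-linguistic
    # data record is rendered as an English sentence.
    def realize_weather_report(record: dict) -> str:
        """Render a structured weather record as an English sentence."""
        template = "On {date}, the temperature in {city} reached {temp_c} degrees Celsius."
        return template.format(**record)

    print(realize_weather_report({"date": "3 May", "city": "Aberdeen", "temp_c": 21}))
    # On 3 May, the temperature in Aberdeen reached 21 degrees Celsius.

Real NLG systems add stages such as content selection, aggregation, and surface realization on top of this idea; the sketch shows only the final realization step.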
Subject–verb–object word order

In linguistic typology, subject–verb–object (SVO) is a sentence structure where the subject comes first, the verb second, and the object third. Languages may be classified according to the dominant sequence of these elements in unmarked sentences (i.e., sentences in which an unusual word order is not used for emphasis). English is included in this group; an example is "Sam ate yogurt." SVO is the second-most common order by number of known languages, after SOV. Together, SVO and SOV account for more than 87% of the world's languages.
Verb–object–subject word order

In linguistic typology, a verb–object–subject or verb–object–agent language, commonly abbreviated VOS or VOA, is one in which most sentences arrange their elements in that order. The equivalent in English would be "Drank cocktail Sam." This relatively rare default word order accounts for only about 3% of the world's languages, making it the fourth-most common of the six possible default orders.
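Since subject, verb, and object can be sequenced in exactly six ways, the typology above can be illustrated by permuting the constituents of the example sentence. The sketch below (Python, purely illustrative) keeps the word forms fixed; real languages also adjust morphology and prosody, which it ignores.

    # Enumerate the six logically possible orderings of subject (S),
    # verb (V), and object (O) for the example sentence used above.
    from itertools import permutations

    constituents = {"S": "Sam", "V": "ate", "O": "yogurt"}

    for order in permutations("SVO"):
        label = "".join(order)
        print(label, "->", " ".join(constituents[slot] for slot in order))
    # SVO -> Sam ate yogurt   (English)
    # VOS -> ate yogurt Sam   (cf. "Drank cocktail Sam")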
Grammaticality

In linguistics, grammaticality is determined by conformity to the rules of the grammar of a particular speech variety, as derived from language usage. The notion of grammaticality rose alongside the theory of generative grammar, the goal of which is to formulate rules that define well-formed (grammatical) sentences. These rules of grammaticality also provide explanations of ill-formed (ungrammatical) sentences.
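As a toy illustration of rules that define well-formed sentences, the sketch below encodes a tiny context-free grammar with NLTK (an assumption for illustration; the description above does not tie grammaticality to any particular toolkit) and treats a token sequence as grammatical exactly when the parser finds a derivation for it.

    # Toy generative grammar: a sentence counts as "grammatical" here
    # iff the chart parser can derive it from the start symbol S.
    import nltk

    grammar = nltk.CFG.fromstring("""
        S  -> NP VP
        NP -> 'Sam' | 'yogurt'
        VP -> V NP
        V  -> 'ate'
    """)
    parser = nltk.ChartParser(grammar)

    def is_grammatical(tokens):
        return any(True for _ in parser.parse(tokens))

    print(is_grammatical(["Sam", "ate", "yogurt"]))  # True  (well-formed)
    print(is_grammatical(["ate", "Sam", "yogurt"]))  # False (ill-formed for this grammar)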
Code-switching

In linguistics, code-switching or language alternation occurs when a speaker alternates between two or more languages, or language varieties, in the context of a single conversation or situation. Code-switching is different from plurilingualism in that plurilingualism refers to the ability of an individual to use multiple languages, while code-switching is the act of using multiple languages together. Multilinguals (speakers of more than one language) sometimes use elements of multiple languages when conversing with each other.
Linguistic performance

The term linguistic performance was used by Noam Chomsky in 1960 to describe "the actual use of language in concrete situations". It describes both the production of language, sometimes called parole, and its comprehension. Performance is defined in opposition to "competence"; the latter describes the mental knowledge that a speaker or listener has of language. Part of the motivation for the distinction between performance and competence comes from speech errors: despite having a perfect understanding of the correct forms, a speaker may unintentionally produce incorrect ones.
Sentence embedding

In natural language processing, a sentence embedding is a numeric representation of a sentence in the form of a vector of real numbers that encodes meaningful semantic information. State-of-the-art embeddings are based on the learned hidden-layer representations of dedicated sentence transformer models. BERT pioneered an approach in which a dedicated [CLS] token is prepended to the beginning of each sentence input to the model; the final hidden state vector of this token encodes information about the sentence and can be fine-tuned for use in sentence classification tasks.
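The [CLS] approach can be sketched with the Hugging Face Transformers library; the library choice and the bert-base-uncased checkpoint are assumptions for illustration, not something the description above prescribes.

    # Extract a [CLS]-based sentence vector from BERT's final hidden layer.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    inputs = tokenizer("Sam ate yogurt.", return_tensors="pt")  # [CLS] is prepended automatically
    with torch.no_grad():
        outputs = model(**inputs)

    cls_embedding = outputs.last_hidden_state[:, 0, :]  # position 0 holds the [CLS] token
    print(cls_embedding.shape)  # torch.Size([1, 768])

For a classification task, this vector would typically be passed to a task-specific head and fine-tuned jointly with the model, as noted above.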