Named-entity recognition
Named-entity recognition (NER), also known as (named) entity identification, entity chunking, and entity extraction, is a subtask of information extraction that seeks to locate named entities mentioned in unstructured text and classify them into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, and percentages. Most research on NER/NEE (named-entity extraction) systems has been structured as taking an unannotated block of text, such as "Jim bought 300 shares of Acme Corp.", and producing an annotated block of text that highlights the names of entities: "[Jim]Person bought 300 shares of [Acme Corp.]Organization".
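To make the input and output of the task concrete, here is a minimal sketch of a dictionary (gazetteer) lookup tagger that labels the example sentence with BIO tags. It is a toy, not how statistical NER systems work: the `GAZETTEER` contents, the tag names `PER` and `ORG`, and the `tokenize`/`tag` helpers are hypothetical choices made for this illustration.

```python
import re

# Hypothetical toy gazetteer: maps token sequences to entity types.
GAZETTEER = {
    ("Jim",): "PER",
    ("Acme", "Corp."): "ORG",
}
MAX_LEN = max(len(key) for key in GAZETTEER)

def tokenize(text):
    # Words keep one trailing period so abbreviations like "Corp." stay whole
    # (a simplification: a sentence-final period after a word is absorbed too).
    return re.findall(r"\w+\.?|[^\w\s]", text)

def tag(tokens):
    """Greedy longest-match lookup; returns one BIO tag per token."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            label = GAZETTEER.get(tuple(tokens[i:i + n]))
            if label:
                tags[i] = "B-" + label          # first token of the entity
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label      # tokens inside the entity
                i += n
                break
        else:
            i += 1                              # no entity starts here
    return tags

tokens = tokenize("Jim bought 300 shares of Acme Corp.")
print(list(zip(tokens, tag(tokens))))
# [('Jim', 'B-PER'), ('bought', 'O'), ('300', 'O'), ('shares', 'O'),
#  ('of', 'O'), ('Acme', 'B-ORG'), ('Corp.', 'I-ORG')]
```

A fixed lookup table cannot recognize names it has never seen, which is why practical NER systems replace it with a learned sequence model while keeping the same token-in, tag-out interface.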
Anaphora (linguistics)
In linguistics, anaphora (əˈnæfərə) is the use of an expression whose interpretation depends upon another expression in context (its antecedent or postcedent). In a narrower sense, anaphora is the use of an expression that depends specifically upon an antecedent expression and thus is contrasted with cataphora, which is the use of an expression that depends upon a postcedent expression. The anaphoric (referring) term is called an anaphor. For example, in the sentence Sally arrived, but nobody saw her, the pronoun her is an anaphor, referring back to the antecedent Sally.
Pro-form
In linguistics, a pro-form is a type of function word or expression that stands in for (expresses the same content as) another word, phrase, clause or sentence where the meaning is recoverable from the context. Pro-forms are used either to avoid repetitive expressions or in quantification (limiting the variables of a proposition). They are divided into several categories according to the part of speech they substitute for: a pronoun substitutes for a noun or a noun phrase, with or without a determiner (for example, it, this).
Recurrent neural network
A recurrent neural network (RNN) is one of the two broad types of artificial neural network, distinguished by the direction in which information flows between its layers. In contrast to a feedforward neural network, where information flows in one direction only, an RNN has recurrent (cyclic) connections, so the output from some nodes can affect subsequent input to the same nodes. Its ability to use this internal state (memory) to process input sequences of arbitrary length makes it applicable to tasks such as unsegmented, connected handwriting recognition and speech recognition.
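The recurrence can be sketched in plain Python. This assumes a simple Elman-style cell, h_t = tanh(W_x x_t + W_h h_{t-1} + b), with hypothetical hand-picked weights; it shows only the forward pass (no training), and the names `rnn_forward` and `matvec` are illustrative.

```python
import math

def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def rnn_forward(xs, W_x, W_h, b):
    """Run an Elman-style RNN over the sequence xs and return every hidden state.

    h_t = tanh(W_x @ x_t + W_h @ h_{t-1} + b), starting from h_0 = 0.
    The same weights are reused at every step, so xs may have any length.
    """
    h = [0.0] * len(W_h)  # initial hidden state (the "memory")
    states = []
    for x in xs:
        pre = [p + q + c for p, q, c in zip(matvec(W_x, x), matvec(W_h, h), b)]
        h = [math.tanh(z) for z in pre]  # the new state feeds back into the next step
        states.append(h)
    return states

# Hypothetical weights: 1-dimensional input, 2-dimensional hidden state.
W_x = [[0.5], [-0.3]]
W_h = [[0.1, 0.2], [0.0, 0.4]]
b = [0.0, 0.1]

# Same final input, different history, so the final states differ.
final_a = rnn_forward([[1.0], [0.0]], W_x, W_h, b)[-1]
final_b = rnn_forward([[-1.0], [0.0]], W_x, W_h, b)[-1]
print(final_a != final_b)  # True
```

The two runs end on the same input, yet their final states differ because the earlier input is carried forward through `W_h`; this feedback is what a feedforward network lacks.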
Document classification
Document classification or document categorization is a problem in library science, information science and computer science. The task is to assign a document to one or more classes or categories. This may be done "manually" (or "intellectually") or algorithmically. The intellectual classification of documents has mostly been the province of library science, while the algorithmic classification of documents belongs mainly to information science and computer science.
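One classic algorithmic approach, offered here only as an illustration, is nearest-centroid (Rocchio-style) classification over bag-of-words vectors. The sketch assumes whitespace tokenization, raw term counts, cosine similarity, and a hypothetical `threshold` parameter that lets a document receive more than one category; the toy training documents are invented for the example.

```python
import math
from collections import Counter

def bag_of_words(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def train_centroids(labelled_docs):
    """Sum term counts per category. Cosine ignores vector length, so the
    summed vector points the same way as the averaged centroid."""
    centroids = {}
    for text, category in labelled_docs:
        centroids.setdefault(category, Counter()).update(bag_of_words(text))
    return centroids

def classify(text, centroids, threshold=0.4):
    """Return every category whose similarity reaches threshold (one or more)."""
    vec = bag_of_words(text)
    scores = {c: cosine(vec, v) for c, v in centroids.items()}
    chosen = [c for c, s in scores.items() if s >= threshold]
    return chosen or [max(scores, key=scores.get)]  # always assign at least one class

# Hypothetical toy training data.
docs = [
    ("the striker scored a late goal", "sports"),
    ("the team won the match", "sports"),
    ("shares fell as the market closed", "finance"),
    ("the bank raised interest rates", "finance"),
]
centroids = train_centroids(docs)
print(classify("the team scored a goal", centroids))  # ['sports']
print(classify("the team shares fell", centroids))    # ['sports', 'finance']
```

Raw counts keep the sketch short; a realistic system would typically remove stopwords or use tf-idf weighting, since here the shared word "the" contributes to every similarity score.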
Statistical classification
In statistics, classification is the problem of identifying which of a set of categories (sub-populations) an observation (or observations) belongs to. Examples are assigning a given email to the "spam" or "non-spam" class, and assigning a diagnosis to a given patient based on observed characteristics of the patient (sex, blood pressure, presence or absence of certain symptoms, etc.). Often, the individual observations are analyzed into a set of quantifiable properties, known variously as explanatory variables or features.
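To illustrate observations reduced to quantifiable features, here is a k-nearest-neighbours classifier, one of many possible classification techniques, applied to a hypothetical spam example. The three features (link count, fraction of upper-case letters, whether the word "free" appears) and all training values are invented for the sketch; `knn_classify` and `k=3` are illustrative choices.

```python
import math
from collections import Counter

# Hypothetical toy training set. Each e-mail is reduced to three made-up features:
# (number of links, fraction of upper-case letters, contains "free" as 0/1).
TRAIN = [
    ((8, 0.40, 1), "spam"),
    ((5, 0.35, 1), "spam"),
    ((6, 0.20, 0), "spam"),
    ((0, 0.05, 0), "non-spam"),
    ((1, 0.02, 0), "non-spam"),
    ((1, 0.10, 1), "non-spam"),
]

def knn_classify(features, train=TRAIN, k=3):
    """Label an observation by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(features, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_classify((7, 0.30, 1)))  # spam
print(knn_classify((0, 0.04, 0)))  # non-spam
```

Because the features are not rescaled, the link count dominates the Euclidean distance; in practice features are usually standardized before distances are computed.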
Binding (linguistics)
In linguistics, binding is the phenomenon in which anaphoric elements such as pronouns are grammatically associated with their antecedents. For instance, in the English sentence "Mary saw herself", the anaphor "herself" is bound by its antecedent "Mary". Binding can be licensed or blocked in certain contexts or syntactic configurations; for example, the pronoun "her" cannot be bound by "Mary" in the English sentence "Mary saw her". While all languages have binding, restrictions on it vary even among closely related languages.
Plural
The plural (sometimes abbreviated as pl. or pl), in many languages, is one of the values of the grammatical category of number. The plural of a noun typically denotes a quantity greater than the default quantity represented by that noun. This default quantity is most commonly one (a form that represents this default quantity of one is said to be of singular number). Therefore, plurals most typically denote two or more of something, although they may also denote fractional, zero or negative amounts.
Self-supervised learning
Self-supervised learning (SSL) is a paradigm in machine learning in which a model is trained on supervisory signals (pseudo-labels) generated from the unlabeled data itself, rather than on labels supplied by humans. Self-supervised learning more closely imitates the way humans learn to classify objects. The typical SSL method is based on an artificial neural network or other model such as a decision list. The model learns in two steps. First, an auxiliary or pretext classification task is solved using pseudo-labels, which help to initialize the model parameters. Second, the actual task is performed with supervised or unsupervised learning.
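A pretext task with pseudo-labels can be sketched without a neural network. The example below assumes a masked-word pretext task: each word of an unlabeled sentence is hidden and becomes the pseudo-label, and its two neighbours form the input. A simple count table stands in for the model parameters that the first step would initialize; the function names and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict

def make_pretext_pairs(sentences):
    """Turn unlabeled sentences into (context, pseudo-label) pairs.

    The pseudo-label is the hidden word; the context is its left and right
    neighbours. No human annotation is involved: the labels come from the data.
    """
    pairs = []
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            pairs.append(((tokens[i - 1], tokens[i + 1]), tokens[i]))
    return pairs

def train_pretext_model(pairs):
    """Count-based 'model': for each context, tally which words filled the gap."""
    model = defaultdict(Counter)
    for context, label in pairs:
        model[context][label] += 1
    return model

def predict_masked(model, left, right):
    """Predict the most frequent filler for a context, or None if it was never seen."""
    counts = model.get((left, right))
    return counts.most_common(1)[0][0] if counts else None

# Hypothetical unlabeled corpus.
corpus = ["the cat sat on the mat", "the dog sat on the rug", "a cat sat on a chair"]
pairs = make_pretext_pairs(corpus)
model = train_pretext_model(pairs)
print(predict_masked(model, "cat", "on"))  # sat
```

In a real SSL pipeline the pretext model would be a neural network whose learned weights are then reused for the downstream task; the count table here only shows where the pseudo-labels come from.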
Grammatical number
In linguistics, grammatical number is a feature of nouns, pronouns, adjectives and verb agreement that expresses count distinctions (such as "one", "two" or "three or more"). English and other languages present number categories of singular or plural, both of which are cited by using the hash sign (#) or by the numero signs "No." and "Nos." respectively. Some languages also have dual, trial and paucal numbers, or other arrangements.