Knowledge extraction
Knowledge extraction is the creation of knowledge from structured (relational databases, XML) and unstructured (text, documents, images) sources. The resulting knowledge needs to be in a machine-readable and machine-interpretable format and must represent knowledge in a manner that facilitates inferencing. Although it is methodologically similar to information extraction (NLP) and ETL (data warehouse), the main criterion is that the extraction result goes beyond the creation of structured information or the transformation into a relational schema.
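As a rough illustration of moving from a relational source toward a representation that supports inferencing, the sketch below turns one database row into subject-predicate-object triples. The triple layout, the `row_to_triples` name, and the identifier scheme (`table/key`, `table#column`) are assumptions made for this example, not a standard mapping.

```python
def row_to_triples(table, key_column, row):
    """Turn one relational row into (subject, predicate, object) triples.

    Hypothetical naming scheme: the subject is "table/key-value" and each
    predicate is "table#column". Real mappings are usually configurable.
    """
    subject = f"{table}/{row[key_column]}"
    return [
        (subject, f"{table}#{column}", value)
        for column, value in row.items()
        if column != key_column  # the key is already encoded in the subject
    ]


# Illustrative row; the table and column names are made up for this example.
for triple in row_to_triples("person", "id", {"id": 1, "name": "Alice", "city": "Paris"}):
    print(triple)
```

Each triple states a single fact about one subject, which is the kind of unit a reasoner can combine with other facts.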
Information extraction
Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents and other electronically represented sources. In most cases this activity concerns processing human language texts by means of natural language processing (NLP). Recent activities in multimedia document processing, such as automatic annotation and content extraction from images, audio, video, and documents, can be seen as information extraction. Due to the difficulty of the problem, current approaches to IE (as of 2010) focus on narrowly restricted domains.
Entity linking
In natural language processing, entity linking, also referred to as named-entity linking (NEL), named-entity disambiguation (NED), named-entity recognition and disambiguation (NERD), or named-entity normalization (NEN), is the task of assigning a unique identity to entities (such as famous individuals, locations, or companies) mentioned in text. For example, given the sentence "Paris is the capital of France", the idea is to determine that "Paris" refers to the city of Paris and not to Paris Hilton or any other entity that could be referred to as "Paris".
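The "Paris" example can be sketched as a toy disambiguator that picks the candidate whose typical context words overlap most with the sentence. The knowledge base, the entity IDs, and the context word lists below are hypothetical and exist only for this illustration; practical systems use much richer candidate generation and ranking.

```python
import re

# Hypothetical mini knowledge base: each candidate entity has an ID and a
# set of words that tend to appear near mentions of it.
KNOWLEDGE_BASE = {
    "Paris": [
        {"id": "Paris_(city)", "context": {"capital", "france", "city", "seine"}},
        {"id": "Paris_Hilton", "context": {"hilton", "celebrity", "hotel", "heiress"}},
    ],
}


def link_entity(mention, sentence):
    """Return the ID of the candidate sharing the most context words with the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    candidates = KNOWLEDGE_BASE.get(mention, [])
    if not candidates:
        return None  # mention is unknown to the knowledge base
    best = max(candidates, key=lambda c: len(c["context"] & words))
    return best["id"]


print(link_entity("Paris", "Paris is the capital of France"))  # Paris_(city)
```

Word overlap is only one possible signal; it is used here because it keeps the sketch short.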
Coreference
In linguistics, coreference, sometimes written co-reference, occurs when two or more expressions refer to the same person or thing; they have the same referent. For example, in "Bill said Alice would arrive soon, and she did", the words "Alice" and "she" refer to the same person. Coreference is often non-trivial to determine. For example, in "Bill said he would come", the word "he" may or may not refer to Bill.
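A minimal sketch of why coreference is hard: the function below only lists the candidate antecedents for each pronoun, using gender agreement with preceding names. The name lexicon and the `candidate_antecedents` helper are assumptions for this example. Note that for "Bill said he would come" it can only propose "Bill"; it cannot tell whether "he" refers to someone outside the sentence.

```python
import re

# Hypothetical gender lexicon; a real system would not rely on a name list.
NAME_GENDER = {"Bill": "masculine", "Alice": "feminine"}
PRONOUN_GENDER = {"he": "masculine", "she": "feminine"}


def candidate_antecedents(sentence):
    """Map each pronoun to the preceding names that agree with it in gender."""
    tokens = re.findall(r"[A-Za-z]+", sentence)
    result = {}
    for i, token in enumerate(tokens):
        gender = PRONOUN_GENDER.get(token.lower())
        if gender is None:
            continue  # not a pronoun covered by this sketch
        result[token] = [t for t in tokens[:i] if NAME_GENDER.get(t) == gender]
    return result


print(candidate_antecedents("Bill said Alice would arrive soon, and she did"))
print(candidate_antecedents("Bill said he would come"))
```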
Personal pronoun
Personal pronouns are pronouns that are associated primarily with a particular grammatical person: first person (as "I"), second person (as "you"), or third person (as "he", "she", "it", "they"). Personal pronouns may also take different forms depending on number (usually singular or plural), grammatical or natural gender, case, and formality. The term "personal" is used here purely to signify the grammatical sense; personal pronouns are not limited to people and can also refer to animals and objects (as the English personal pronoun "it" usually does).
He (pronoun)
In Modern English, "he" is a singular, masculine, third-person pronoun. In Standard Modern English, "he" has four shapes representing five distinct word forms:
he: the nominative (subjective) form
him: the accusative (objective) form (also called the oblique case)
his: the dependent and independent genitive (possessive) forms
himself: the reflexive form
Old English had a single third-person pronoun (from the Proto-Germanic demonstrative base *khi-, from PIE *ko- "this"), which had a plural and three genders in the singular.
Named-entity recognition
Named-entity recognition (NER) (also known as (named) entity identification, entity chunking, and entity extraction) is a subtask of information extraction that seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc. Most research on NER/NEE systems has been structured as taking an unannotated block of text, such as this one: Jim bought 300 shares of Acme Corp.
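To make the locate-and-classify idea concrete, here is a small rule-based sketch applied to the example sentence. The label names, the regular expressions, and the tiny person-name list are assumptions for this illustration; NER systems in practice are typically statistical or neural rather than hand-written rules like these.

```python
import re

# Hypothetical hand-written rules, one per category.
PATTERNS = [
    # One or more capitalized words followed by a company suffix.
    ("ORGANIZATION", re.compile(r"\b(?:[A-Z][a-z]+ )+(?:Corp|Inc|Ltd)\.")),
    # A bare run of digits.
    ("QUANTITY", re.compile(r"\b\d+\b")),
    # A tiny, made-up list of known first names.
    ("PERSON", re.compile(r"\b(?:Jim|Alice|Bill)\b")),
]


def tag_entities(text):
    """Return (span_text, label) pairs in order of appearance in the text."""
    found = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            found.append((match.start(), match.group(), label))
    return [(span, label) for _, span, label in sorted(found)]


print(tag_entities("Jim bought 300 shares of Acme Corp."))
# [('Jim', 'PERSON'), ('300', 'QUANTITY'), ('Acme Corp.', 'ORGANIZATION')]
```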
Binding (linguistics)
In linguistics, binding is the phenomenon in which anaphoric elements such as pronouns are grammatically associated with their antecedents. For instance, in the English sentence "Mary saw herself", the anaphor "herself" is bound by its antecedent "Mary". Binding can be licensed or blocked in certain contexts or syntactic configurations; for example, the pronoun "her" cannot be bound by "Mary" in the English sentence "Mary saw her". While all languages have binding, restrictions on it vary even among closely related languages.
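The contrast between "Mary saw herself" and "Mary saw her" can be sketched as a toy check over three-word English clauses of the form subject, verb, object. The function name and the word lists are assumptions for this example, and the sketch deliberately ignores agreement, clause structure, and everything else a real account of binding needs.

```python
REFLEXIVES = {
    "myself", "yourself", "himself", "herself", "itself",
    "ourselves", "yourselves", "themselves",
}
OBJECT_PRONOUNS = {"me", "you", "him", "her", "it", "us", "them"}


def object_bound_by_subject(clause):
    """For a three-word "Subject verb object" clause, say whether the object
    is bound by the subject.

    Returns True for a reflexive object, False for a plain object pronoun,
    and None when the object is not a pronoun at all.
    """
    subject, verb, obj = clause.rstrip(".").split()
    obj = obj.lower()
    if obj in REFLEXIVES:
        return True   # a reflexive in object position is bound by the subject
    if obj in OBJECT_PRONOUNS:
        return False  # a plain pronoun here cannot be bound by the subject
    return None


print(object_bound_by_subject("Mary saw herself"))  # True
print(object_bound_by_subject("Mary saw her"))      # False
```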
Natural language processing
Natural language processing (NLP) is an interdisciplinary subfield of linguistics and computer science. It is primarily concerned with processing natural language datasets, such as text corpora or speech corpora, using either rule-based or probabilistic (i.e. statistical and, most recently, neural network-based) machine learning approaches. The goal is a computer capable of "understanding" the contents of documents, including the contextual nuances of the language within them.
Reflexive pronoun
A reflexive pronoun is a pronoun that refers to another noun or pronoun (its antecedent) within the same sentence. In the English language specifically, a reflexive pronoun will end in -self or -selves, and refer to a previously named noun or pronoun (myself, yourself, ourselves, themselves, etc.). English intensive pronouns, used for emphasis, take the same form. In generative grammar, a reflexive pronoun is an anaphor that must be bound by its antecedent (see binding).