Statistical machine translation (SMT) was a machine translation approach that superseded the earlier rule-based approach, which required an explicit description of each and every linguistic rule, was costly to build, and often did not generalize to other languages. Since 2003, the statistical approach itself has been gradually superseded by the deep learning-based neural network approach. The first ideas of statistical machine translation were introduced by Warren Weaver in 1949, including the idea of applying Claude Shannon's information theory.
General-purpose modeling (GPM) is the systematic use of a general-purpose modeling language to represent the various facets of an object or a system. Examples of GPM languages are: the Unified Modeling Language (UML), an industry standard for modeling software-intensive systems; EXPRESS, a data modeling language for product data, standardized as ISO 10303-11; IDEF, a group of languages from the 1970s that aimed to be neutral, generic and reusable; Gellish, an industry standard natural-language-oriented modeling language for the storage and exchange of data and knowledge, published in 2005; and XML, a data modeling language now beginning to be used to model code (MetaL, Microsoft .NET).
Predictive text is an input technology used where one key or button represents many letters, such as on the physical numeric keypads of mobile phones and in accessibility technologies. Each key press results in a prediction rather than repeatedly sequencing through the same group of "letters" it represents in the same, invariable order. Predictive text can allow an entire word to be input with a single keypress. Predictive text makes efficient use of fewer device keys to input writing into a text message, an e-mail, an address book, a calendar, and the like.
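As a concrete illustration, here is a minimal Python sketch of keypad-style ("T9"-like) prediction, assuming a small hard-coded dictionary; the digit-to-letters map is the standard phone keypad, and real systems add frequency ranking and user adaptation.

```python
# Minimal sketch of keypad ("T9"-style) predictive text over a toy dictionary.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}

def word_to_keys(word: str) -> str:
    """Map a word to the digit sequence that would type it (one press per letter)."""
    letter_to_key = {ch: digit for digit, letters in KEYPAD.items() for ch in letters}
    return "".join(letter_to_key[ch] for ch in word.lower())

def build_index(dictionary):
    """Group dictionary words by their digit sequence."""
    index = {}
    for word in dictionary:
        index.setdefault(word_to_keys(word), []).append(word)
    return index

def predict(index, keys: str):
    """Return candidate words predicted for a key sequence."""
    return index.get(keys, [])

if __name__ == "__main__":
    index = build_index(["home", "good", "gone", "hood", "hoof"])
    # All of these words share the key sequence 4-6-6-3.
    print(predict(index, "4663"))  # ['home', 'good', 'gone', 'hood', 'hoof']
```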
In physics and mathematics, a random field is a random function over an arbitrary domain (usually a multi-dimensional space such as R^n). That is, it is a function that takes on a random value at each point of R^n (or some other domain). It is also sometimes thought of as a synonym for a stochastic process with some restriction on its index set. That is, by modern definitions, a random field is a generalization of a stochastic process where the underlying parameter need no longer be real- or integer-valued "time" but can instead take values that are multidimensional vectors or points on some manifold.
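For readers who want to experiment, the following is a minimal Python sketch that draws one realization of a Gaussian random field on a one-dimensional grid; the squared-exponential covariance and its length scale are illustrative assumptions, not part of the general definition.

```python
# Minimal sketch: sample a zero-mean Gaussian random field at a set of points.
import numpy as np

def sample_gaussian_random_field(points, length_scale=0.5, variance=1.0, seed=0):
    """Draw one realization of a zero-mean Gaussian random field at `points`."""
    rng = np.random.default_rng(seed)
    d = np.abs(points[:, None] - points[None, :])             # pairwise distances
    cov = variance * np.exp(-0.5 * (d / length_scale) ** 2)   # covariance matrix
    cov += 1e-9 * np.eye(len(points))   # jitter keeps the matrix positive definite
    return rng.multivariate_normal(np.zeros(len(points)), cov)

if __name__ == "__main__":
    xs = np.linspace(0.0, 5.0, 50)
    field = sample_gaussian_random_field(xs)
    print(field[:5])   # random values indexed by points of the domain
```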
Text segmentation is the process of dividing written text into meaningful units, such as words, sentences, or topics. The term applies both to the mental processes humans use when reading text and to artificial processes implemented in computers, which are the subject of natural language processing. The problem is non-trivial: while some written languages have explicit word-boundary markers, such as the word spaces of written English and the distinctive initial, medial, and final letter shapes of Arabic, such signals are sometimes ambiguous and are not present in all written languages.
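One simple approach to word segmentation without explicit boundaries is dictionary-based greedy "maximum matching". The Python sketch below uses a toy, hypothetical lexicon; it is only one of many possible strategies and easily produces wrong splits when the lexicon is incomplete.

```python
# Minimal sketch of greedy maximum-matching word segmentation over a toy lexicon.
def max_match(text: str, lexicon: set) -> list:
    """Greedily take the longest lexicon word at each position."""
    words, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):     # try the longest candidate first
            if text[i:j] in lexicon:
                words.append(text[i:j])
                i = j
                break
        else:                                 # no match: emit a single character
            words.append(text[i])
            i += 1
    return words

if __name__ == "__main__":
    lexicon = {"the", "table", "down", "there"}
    print(max_match("thetabledownthere", lexicon))
    # ['the', 'table', 'down', 'there'] -- a different lexicon could yield
    # a different, equally plausible segmentation
```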
Random sample consensus (RANSAC) is an iterative method for estimating the parameters of a mathematical model from a set of observed data that contains outliers, when the outliers are to be accorded no influence on the values of the estimates. It can therefore also be interpreted as an outlier detection method. It is a non-deterministic algorithm in the sense that it produces a reasonable result only with a certain probability, and this probability increases as more iterations are allowed.
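A minimal sketch of the idea, assuming line fitting in the plane with a fixed iteration count and inlier threshold; production code usually derives the iteration count from a desired success probability.

```python
# Minimal sketch of RANSAC for fitting a line y = a*x + b to 2-D points.
import random

def ransac_line(points, n_iters=200, threshold=0.5, seed=0):
    """Return (a, b) for the line supported by the largest consensus set."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(n_iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)   # minimal sample: 2 points
        if x1 == x2:
            continue                                  # skip vertical candidates
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) < threshold]
        if len(inliers) > len(best_inliers):          # keep the best consensus set
            best_model, best_inliers = (a, b), inliers
    return best_model

if __name__ == "__main__":
    on_line = [(float(x), 2.0 * x + 1.0) for x in range(20)]
    outliers = [(3.0, 40.0), (7.0, -15.0), (12.0, 60.0)]
    print(ransac_line(on_line + outliers))   # approximately (2.0, 1.0)
```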
In computing, plain text is a loose term for data (e.g. file contents) that represent only characters of readable material but not its graphical representation nor other objects (floating-point numbers, images, etc.). It may also include a limited number of "whitespace" characters that affect simple arrangement of text, such as spaces, line breaks, or tabulation characters.
Connected-component labeling (CCL), connected-component analysis (CCA), blob extraction, region labeling, blob discovery, or region extraction is an algorithmic application of graph theory, where subsets of connected components are uniquely labeled based on a given heuristic. Connected-component labeling is not to be confused with segmentation. Connected-component labeling is used in computer vision to detect connected regions in binary digital images, although color images and data with higher dimensionality can also be processed.
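A minimal sketch of the idea on a small binary image, using a breadth-first flood fill with 4-connectivity; classic implementations more often use a two-pass union-find algorithm, and the example image is hypothetical.

```python
# Minimal sketch of connected-component labeling on a binary image (4-connectivity).
from collections import deque

def label_components(image):
    """Return a grid where each 4-connected region of 1s gets a unique label id."""
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] == 1 and labels[r][c] == 0:
                next_label += 1                      # start a new component
                queue = deque([(r, c)])
                labels[r][c] = next_label
                while queue:                         # flood-fill the component
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels

if __name__ == "__main__":
    img = [[1, 1, 0, 0],
           [0, 1, 0, 1],
           [0, 0, 0, 1]]
    for row in label_components(img):
        print(row)   # two components: label 1 (top-left blob), label 2 (right blob)
```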
In natural language processing, semantic role labeling (also called shallow semantic parsing or slot-filling) is the process of assigning labels to words or phrases in a sentence that indicate their semantic role in the sentence, such as that of an agent, goal, or result. It serves to find the meaning of the sentence. To do this, it detects the arguments associated with the predicate or verb of a sentence and how they are classified into their specific roles. A common example is the sentence "Mary sold the book to John": "Mary" is the agent, "sold" is the predicate, "the book" is the theme, and "John" is the recipient.
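Purely as an illustration of what labeled output might look like, the following hypothetical Python structure records the roles for that sentence; the role names follow the informal agent/theme/recipient terminology above rather than any specific annotation scheme such as PropBank or FrameNet.

```python
# Illustrative (hypothetical) representation of semantic-role-labeling output.
srl_output = {
    "sentence": "Mary sold the book to John.",
    "predicate": "sold",
    "arguments": {
        "agent": "Mary",
        "theme": "the book",
        "recipient": "John",
    },
}

for role, span in srl_output["arguments"].items():
    print(f"{role:>9}: {span}")
```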
In statistics and machine learning, when one wants to infer a random variable from a set of other variables, usually a subset of them is enough, and the remaining variables are uninformative. Such a subset that contains all the useful information is called a Markov blanket. If a Markov blanket is minimal, meaning that no variable can be dropped from it without losing information, it is called a Markov boundary. Identifying a Markov blanket or a Markov boundary helps to extract useful features. The terms Markov blanket and Markov boundary were coined by Judea Pearl in 1988.
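In a Bayesian network, the Markov blanket of a node consists of its parents, its children, and its children's other parents. The Python sketch below computes it for a small hypothetical DAG, given as a mapping from each node to its set of parents.

```python
# Minimal sketch: Markov blanket of a node in a Bayesian network (DAG).
def markov_blanket(dag, node):
    """Return the Markov blanket of `node` in a DAG given as {node: set_of_parents}."""
    parents = set(dag.get(node, set()))
    children = {n for n, ps in dag.items() if node in ps}
    co_parents = {p for child in children for p in dag.get(child, set())}
    return (parents | children | co_parents) - {node}

if __name__ == "__main__":
    dag = {
        "A": set(), "B": set(),
        "C": {"A", "B"},   # C has parents A and B
        "D": {"C"},        # D is a child of C
        "E": {"C", "F"},   # E is a child of C with co-parent F
        "F": set(),
    }
    print(sorted(markov_blanket(dag, "C")))   # ['A', 'B', 'D', 'E', 'F']
```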