Statistical machine translation
Statistical machine translation (SMT) was a machine translation approach that superseded the earlier rule-based approach. The rule-based approach required an explicit description of every linguistic rule, which was costly to produce and often did not generalize to other languages. Since 2003, the statistical approach itself has been gradually superseded by the deep learning-based neural network approach. The first ideas of statistical machine translation were introduced by Warren Weaver in 1949, including the idea of applying Claude Shannon's information theory.
General-purpose modeling
General-purpose modeling (GPM) is the systematic use of a general-purpose modeling language to represent the various facets of an object or a system. Examples of GPM languages are: the Unified Modeling Language (UML), an industry standard for modeling software-intensive systems; EXPRESS, a data modeling language for product data, standardized as ISO 10303-11; IDEF, a group of languages from the 1970s that aimed to be neutral, generic and reusable; Gellish, an industry-standard, natural-language-oriented modeling language for storage and exchange of data and knowledge, published in 2005; XML, a data modeling language now beginning to be used to model code (MetaL, Microsoft .
Predictive text
Predictive or intuitive text entry (French: saisie intuitive, from the English "predictive text"), also called the dictionary, is a technology designed to simplify text entry on telephone keypads. First patented in the United States in 1985 as a method of telephone communication with deaf people, predictive text saw only limited use before the advent of mobile telephony and its text messaging service, where it found its main application.
Random field
In physics and mathematics, a random field is a random function over an arbitrary domain (usually a multi-dimensional space such as $\mathbb{R}^n$). That is, it is a function that takes on a random value at each point of $\mathbb{R}^n$ (or of some other domain). It is also sometimes treated as a synonym for a stochastic process with some restriction on its index set. By modern definitions, a random field is a generalization of a stochastic process in which the underlying parameter need no longer be real- or integer-valued "time" but can instead take values that are multidimensional vectors or points on some manifold.
Text segmentation
Text segmentation is the process of dividing written text into meaningful units, such as words, sentences, or topics. The term applies both to the mental processes used by humans when reading text and to artificial processes implemented in computers, which are the subject of natural language processing. The problem is non-trivial: some written languages have explicit word boundary markers, such as the word spaces of written English and the distinctive initial, medial and final letter shapes of Arabic, but such signals are sometimes ambiguous and are not present in all written languages.
RANSAC
RANSAC, short for RANdom SAmple Consensus, is a method for estimating the parameters of certain mathematical models. More precisely, it is an iterative method used when the set of observed data may contain outliers. It is a non-deterministic algorithm in the sense that it produces a correct result only with a certain probability, and that probability increases as the number of iterations grows. The algorithm was first published by Fischler and Bolles in 1981.
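The iterative, randomized scheme described above can be sketched for the simple case of fitting a line to 2D points. This is a minimal illustration, not the published algorithm in full: the function names (`fit_line`, `ransac_line`), the parameters (`n_iters`, `threshold`, `seed`) and the sample data are all assumptions made for this example, and the usual final refit on the inlier set is omitted.

```python
import random


def fit_line(p, q):
    """Return (slope, intercept) of the line through p and q, or None if vertical."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        return None
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


def ransac_line(points, n_iters=200, threshold=0.5, seed=0):
    """Keep the candidate line that agrees with the most points.

    n_iters, threshold and seed are illustrative defaults, not canonical values.
    """
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(n_iters):
        # 1. Draw a minimal random sample: two points define a line.
        model = fit_line(*rng.sample(points, 2))
        if model is None:
            continue
        slope, intercept = model
        # 2. Collect the points within `threshold` (vertical distance) of the line.
        inliers = [(x, y) for x, y in points
                   if abs(y - (slope * x + intercept)) <= threshold]
        # 3. Keep the model supported by the largest consensus set so far.
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers


# Hypothetical data: ten points on y = 2x + 1 plus two gross outliers.
points = [(x, 2.0 * x + 1.0) for x in range(10)] + [(3, 40.0), (7, -25.0)]
model, inliers = ransac_line(points)
```

More iterations make it more likely that at least one random sample contains only inliers, which is why the probability of a correct result grows with the iteration count.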
Plain text
Plain text (French: texte brut, also texte pur or texte simple, a translation of the English term) is a notion relating to the representation of text exchanged between electronic devices.
Connected-component labeling
Connected-component labeling (CCL), connected-component analysis (CCA), blob extraction, region labeling, blob discovery, or region extraction is an algorithmic application of graph theory, where subsets of connected components are uniquely labeled based on a given heuristic. Connected-component labeling is not to be confused with segmentation. Connected-component labeling is used in computer vision to detect connected regions in binary digital images, although color images and data with higher dimensionality can also be processed.
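As a rough illustration of labeling connected regions in a binary image, the sketch below uses a breadth-first flood fill with 4-connectivity. This is one simple way to do it, not the classic two-pass union-find algorithm. The function name `label_components`, the choice of 4-connectivity and the sample image are assumptions made for this example.

```python
from collections import deque


def label_components(grid):
    """Label 4-connected foreground regions of a binary grid.

    Returns (labels, count): labels has the same shape as grid, with 0 for
    background and 1..count for the regions in scan order.
    """
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c] or labels[r][c]:
                continue  # background, or already labeled
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                # Visit the four edge-adjacent neighbours.
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and grid[nr][nc] and not labels[nr][nc]):
                        labels[nr][nc] = count
                        queue.append((nr, nc))
    return labels, count


# Hypothetical 4x4 binary image with three separate regions.
image = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
]
labels, count = label_components(image)
```

Switching to 8-connectivity would only require adding the four diagonal neighbours to the inner loop.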
Semantic role labeling
In natural language processing, semantic role labeling (also called shallow semantic parsing or slot-filling) is the process of assigning labels to words or phrases in a sentence that indicate their semantic role in the sentence, such as that of an agent, goal, or result. It serves to find the meaning of the sentence. To do this, it detects the arguments associated with the predicate or verb of a sentence and classifies them into their specific roles. A common example is the sentence "Mary sold the book to John."
Markov blanket
In machine learning, the Markov blanket of a node $A$ in a Bayesian network is the set of nodes made up of the parents of $A$, its children, and the parents of its children. In a Markov network, the Markov blanket of a node is the set of its neighbouring nodes. The Markov blanket can also be denoted $\partial A$. Every set of nodes in the network is conditionally independent of $A$ when conditioned on the set $\partial A$, that is, on the Markov blanket of node $A$.
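The Bayesian-network definition above (parents, children, and the children's other parents) translates directly into a few set operations. The following is a minimal sketch, assuming the network is given as a dictionary mapping each node to a list of its parents. The function name `markov_blanket` and the example network are hypothetical.

```python
def markov_blanket(parents, node):
    """Return the Markov blanket of `node` in a DAG.

    `parents` maps every node to an iterable of its parent nodes
    (an assumed representation, chosen for this sketch).
    """
    # Children are the nodes that list `node` among their parents.
    children = {n for n, ps in parents.items() if node in ps}
    blanket = set(parents.get(node, ()))  # the node's own parents
    blanket |= children                   # its children
    for child in children:
        blanket |= set(parents[child])    # the children's other parents
    blanket.discard(node)                 # the node is not in its own blanket
    return blanket


# Hypothetical network: A -> C <- B, C -> D, C -> E <- F, D -> G.
network = {
    "A": [], "B": [], "F": [],
    "C": ["A", "B"],
    "D": ["C"],
    "E": ["C", "F"],
    "G": ["D"],
}
blanket_c = markov_blanket(network, "C")
blanket_a = markov_blanket(network, "A")
```

In this example the blanket of `C` contains `F` (a co-parent of `E`) but not `G`, which is a grandchild and is therefore shielded from `C` by `D`.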