CaretCaret is the name used familiarly for the character , (the circumflex and a circumflex accent) provided on most QWERTY keyboards by typing . The symbol has a variety of uses in programming and mathematics. The name "caret" arose from its visual similarity to the original proofreader's caret, a mark used in proofreading to indicate where a punctuation mark, word, or phrase should be inserted into a document. The formal ASCII standard (X3.64.1977) calls it a "circumflex".
String operationsIn computer science, in the area of formal language theory, frequent use is made of a variety of string functions; however, the notation used is different from that used for computer programming, and some commonly used functions in the theoretical realm are rarely used when programming. This article defines some of these basic terms. A string is a finite sequence of characters. The empty string is denoted by . The concatenation of two string and is denoted by , or shorter by . Concatenating with the empty string makes no difference: .
Data extractionData extraction is the act or process of retrieving data out of (usually unstructured or poorly structured) data sources for further data processing or data storage (data migration). The import into the intermediate extracting system is thus usually followed by data transformation and possibly the addition of metadata prior to export to another stage in the data workflow. Usually, the term data extraction is applied when (experimental) data is first imported into a computer from primary sources, like measuring or recording devices.
Lemme d'itération pour les langages algébriquesLe lemme d'itération pour les langages algébriques, aussi connu sous le vocable lemme de Bar-Hillel, Perles et Shamir, donne une condition de répétition nécessaire pour les langages algébriques. Sa version simplifiée pour les langages rationnels est le lemme de l'étoile. Une version plus élaborée du lemme d'itération est le lemme d'Ogden. Le lemme indique donc que, dans un langage algébrique, certains facteurs de mots assez longs peuvent être itérés de concert.
Algorithme de ThompsonEn informatique théorique plus précisément en théorie des langages, l'algorithme de Thompson est un algorithme qui, étant donné une expression régulière, crée un automate fini qui reconnaît le langage décrit par cette expression. Il est nommé ainsi d'après Ken Thompson qui l'a décrit en 1968. Le théorème de Kleene affirme que l'ensemble des langages rationnels sur un alphabet est exactement l'ensemble des langages sur reconnaissables par automate fini. Il existe des algorithmes pour passer de l'un à l'autre.
Induction of regular languagesIn computational learning theory, induction of regular languages refers to the task of learning a formal description (e.g. grammar) of a regular language from a given set of example strings. Although E. Mark Gold has shown that not every regular language can be learned this way (see language identification in the limit), approaches have been investigated for a variety of subclasses. They are sketched in this article. For learning of more general grammars, see Grammar induction.
Recursive descent parserIn computer science, a recursive descent parser is a kind of top-down parser built from a set of mutually recursive procedures (or a non-recursive equivalent) where each such procedure implements one of the nonterminals of the grammar. Thus the structure of the resulting program closely mirrors that of the grammar it recognizes. A predictive parser is a recursive descent parser that does not require backtracking.
MetacharacterA metacharacter is a character that has a special meaning to a computer program, such as a shell interpreter or a regular expression (regex) engine. In POSIX extended regular expressions, there are 14 metacharacters that must be escaped (preceded by a backslash ()) in order to drop their special meaning and be treated literally inside an expression: opening and closing square brackets ([ and ]); backslash (); caret (^); dollar sign ($); period/full stop/dot (.
Construction par sous-ensemblesEn informatique théorique, et notamment en théorie des automates, l'algorithme appelé la construction par sous-ensembles, en anglais « powerset construction » ou « subset construction », est la méthode usuelle pour convertir un automate fini non déterministe (abrégé en « AFN ») en un automate fini déterministe (abrégé en « AFD ») équivalent, c'est-à-dire qui reconnaît le même langage rationnel. L'existence même d'une conversion, et l'existence d'un algorithme pour la réaliser, est remarquable et utile.
Lexical grammarIn computer science, a lexical grammar or lexical structure is a formal grammar defining the syntax of tokens. The program is written using characters that are defined by the lexical structure of the language used. The character set is equivalent to the alphabet used by any written language. The lexical grammar lays down the rules governing how a character sequence is divided up into subsequences of characters, each part of which represents an individual token. This is frequently defined in terms of regular expressions.