Ragel is a finite-state machine compiler and a parser generator. Initially Ragel supported output for C, C++ and Assembly source code, and was expanded to support several other languages including Objective C, D, Go, Ruby, and Java. Additional language support is also in development. It supports the generation of table or control flow driven state machines from regular expressions and/or state charts and can also build lexical analysers via the longest-match method. Ragel specifically targets text parsing and input validation.
Ragel supports the generation of table or control flow driven state machines from regular expressions and/or state charts and can also build lexical analysers via the longest-match method.
A unique feature of Ragel is that user actions can be associated with arbitrary state machine transitions using operators that are integrated into the regular expressions. Ragel also supports visualization of the generated machine via graphviz.
The above graph represents a state-machine that takes user input as a series of bytes representing ASCII characters and control codes. 48..57 is equivalent to the regular expression [0-9] (i.e. any digit), so only sequences beginning with a digit can be recognised. If 10 (line feed) is encountered, we're done. 46 is the decimal point ('.'), 43 and 45 are positive and negative signs ('+', '-') and 69/101 is uppercase/lowercase 'e' (to indicate a number in scientific format). As such it will recognize the following properly:
2
45
055
78.1
2e5
78.3e12
69.0e-3
3e+3
but not:
3
46.
5
3.e2
2e5.1
Ragel's input is a regular expression only in the sense that it describes a regular language; it is usually not written in a concise regular expression, but written out into multiple parts like in Extended Backus–Naur form. For example, instead of supporting POSIX character classes in regex syntax, Ragel implements them as built-in production rules. As with usual parser generators, Ragel allows for handling code for productions to be written with the syntax.
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
Ragel is a finite-state machine compiler and a parser generator. Initially Ragel supported output for C, C++ and Assembly source code, and was expanded to support several other languages including Objective C, D, Go, Ruby, and Java. Additional language support is also in development. It supports the generation of table or control flow driven state machines from regular expressions and/or state charts and can also build lexical analysers via the longest-match method. Ragel specifically targets text parsing and input validation.
Lexical tokenization is conversion of a text into (semantically or syntactically) meaningful lexical tokens belonging to categories defined by a "lexer" program. In case of a natural language, those categories include nouns, verbs, adjectives, punctuations etc. In case of a programming language, the categories include identifiers, operators, grouping symbols and data types. Lexical tokenization is not the same process as the probabilistic tokenization, used for large language model's data preprocessing, that encode text into numerical tokens, using byte pair encoding.