In computer-based language recognition, ANTLR (pronounced antler), or ANother Tool for Language Recognition, is a parser generator that uses an LL(*) algorithm for parsing. ANTLR is the successor to the Purdue Compiler Construction Tool Set (PCCTS), first developed in 1989, and is under active development. Its maintainer is Professor Terence Parr of the University of San Francisco.
PCCTS 1.00 was announced April 10, 1992.
ANTLR takes as input a grammar that specifies a language and generates as output source code for a recognizer of that language.
While Version 3 supported generating code in the programming languages Ada95, ActionScript, C, C#, Java, JavaScript, Objective-C, Perl, Python, Ruby, and Standard ML, Version 4 at present targets C#, C++, Dart, Java, JavaScript, Go, PHP, Python (2 and 3), and Swift.
A language is specified by a context-free grammar expressed in Extended Backus–Naur Form (EBNF).
ANTLR can generate lexers, parsers, tree parsers, and combined lexer-parsers. Parsers can automatically generate parse trees or abstract syntax trees, which can be further processed with tree parsers. ANTLR provides a single consistent notation for specifying lexers, parsers, and tree parsers.
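The division of labour between a lexer, a parser, and a tree parser can be illustrated with a small hand-written sketch. The code below is not ANTLR output and does not use the ANTLR runtime. The toy grammar and the function names lex, parse, and walk are assumptions chosen for illustration only.

```python
import re

# Hypothetical toy grammar (illustration only, not from ANTLR):
#   expr : INT ('+' INT)* ;
TOKEN_RE = re.compile(r"\s*(?:(\d+)|(\+))")

def lex(text):
    """Lexer stage: turn characters into (type, text) tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        if m.group(1) is not None:
            tokens.append(("INT", m.group(1)))
        else:
            tokens.append(("PLUS", m.group(2)))
        pos = m.end()
    return tokens

def parse(tokens):
    """Parser stage: check token order and build a tree of nested tuples."""
    def expect(i, kind):
        if i >= len(tokens) or tokens[i][0] != kind:
            raise SyntaxError(f"expected {kind} at token {i}")
        return tokens[i][1]

    tree = ("int", int(expect(0, "INT")))
    i = 1
    while i < len(tokens):
        expect(i, "PLUS")
        tree = ("add", tree, ("int", int(expect(i + 1, "INT"))))
        i += 2
    return tree

def walk(tree):
    """Tree-walking stage: process the tree, here by evaluating it."""
    if tree[0] == "int":
        return tree[1]
    return walk(tree[1]) + walk(tree[2])

result = walk(parse(lex("1 + 2 + 3")))
```

In this sketch each stage consumes the output of the previous one: characters become tokens, tokens become a tree, and the tree is processed separately from parsing. A generated tool chain follows the same outline, although the data structures it produces are considerably richer than the tuples used here.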
By default, ANTLR reads a grammar and generates a recognizer for the language defined by the grammar (i.e., a program that reads an input stream and reports an error if the input does not conform to the syntax specified by the grammar). If there are no syntax errors, the default action is simply to exit without printing any message. To do something useful with the language, actions can be attached to grammar elements. These actions are written in the programming language in which the recognizer is being generated. When the recognizer is generated, the actions are embedded in its source code at the appropriate points. In the case of a compiler, actions can be used to build and check symbol tables and to emit instructions in a target language.
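The difference between a bare recognizer and one with attached actions can be sketched by hand. The following Python code is not generated by ANTLR. The toy grammar, the function names, and the STORE instruction format are all hypothetical and serve only to show where an action would run.

```python
import re

# Hypothetical toy grammar in an EBNF-like notation (illustration only):
#   prog : stat* ;
#   stat : ID '=' INT ';' ;
TOKEN_RE = re.compile(
    r"\s*(?:(?P<ID>[A-Za-z_]\w*)|(?P<INT>\d+)|(?P<EQ>=)|(?P<SEMI>;))"
)

def tokenize(text):
    """Split the input into (type, text) tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

def recognize(text, on_assign=None):
    """Recognize `prog`; run the optional action after each matched `stat`."""
    tokens = tokenize(text)
    i = 0
    while i < len(tokens):
        kinds = [kind for kind, _ in tokens[i:i + 4]]
        if kinds != ["ID", "EQ", "INT", "SEMI"]:
            raise SyntaxError(f"malformed statement at token {i}")
        if on_assign is not None:
            # Embedded action: executed at the point where `stat` has matched.
            on_assign(tokens[i][1], int(tokens[i + 2][1]))
        i += 4

symbols = {}   # symbol table built by the action
code = []      # made-up target instructions emitted by the action

def record_and_emit(name, value):
    symbols[name] = value
    code.append(f"STORE {name}, {value}")

# Without an action: valid input is accepted silently.
recognize("x = 1; y = 42;")

# With an action: the same input also fills the table and emits code.
recognize("x = 1; y = 42;", on_assign=record_and_emit)
```

Called without an action, recognize only accepts or rejects its input, which mirrors the default behaviour described above. Passing record_and_emit turns the same recognizer into a very small translator, since the action runs each time a statement is matched.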