Parse tree

A parse tree (also called a parsing tree, derivation tree, or concrete syntax tree) is an ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar. The term parse tree is used primarily in computational linguistics; in theoretical syntax, the term syntax tree is more common. Concrete syntax trees reflect the syntax of the input language exactly, which distinguishes them from the abstract syntax trees used in computer programming. Unlike the Reed-Kellogg sentence diagrams used for teaching grammar, parse trees do not use distinct symbol shapes for different types of constituents.

Parse trees are usually constructed based on either the constituency relation of constituency grammars (phrase structure grammars) or the dependency relation of dependency grammars. They may be generated for sentences in natural languages (see natural language processing), as well as during the processing of computer languages, such as programming languages.
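The "concrete" in concrete syntax tree can be made tangible with a short sketch: a hand-written recursive-descent parser for a toy grammar (Expr -> Term ('+' Term)*, Term -> NUM, both invented here for illustration) that keeps every token of the input, operators included, as a leaf of the tree.

```python
# Minimal sketch of a concrete syntax tree (parse tree) for a toy grammar:
#   Expr -> Term ('+' Term)*
#   Term -> NUM
# Nodes are (label, children) tuples; every token, including '+',
# appears as a leaf, which is what makes the tree "concrete".

def tokenize(s):
    return s.replace("+", " + ").split()

def parse_term(tokens, pos):
    # Term -> NUM: wrap the number token in a Term node.
    return ("Term", [(tokens[pos], [])]), pos + 1

def parse_expr(tokens, pos=0):
    """Parse Expr and return (node, next_pos)."""
    node, pos = parse_term(tokens, pos)
    children = [node]
    while pos < len(tokens) and tokens[pos] == "+":
        children.append(("+", []))          # the operator stays in the tree as a leaf
        pos += 1
        term, pos = parse_term(tokens, pos)
        children.append(term)
    return ("Expr", children), pos

def leaves(node):
    label, children = node
    return [label] if not children else [w for c in children for w in leaves(c)]

tree, _ = parse_expr(tokenize("1 + 2 + 3"))
print(leaves(tree))   # ['1', '+', '2', '+', '3']
```

Reading the leaves left to right recovers the input string exactly; that round-trip property is precisely what abstract syntax trees give up by dropping tokens such as parentheses and operators into node labels.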
Related publications (2)

Modèles syntaxiques probabilistes non-génératifs (Non-generative probabilistic syntactic models)

Antoine Rozenknop

This work deals with models used, or usable, in Automatic Natural Language Processing when one seeks a syntactic interpretation of an utterance. This interpretation can serve as additional information for subsequent processing, for instance to produce a semantic representation of the utterance. It can also be used as a filter to select utterances belonging to a specific language from among several hypotheses, as is done in Automatic Speech Recognition. As the syntactic interpretation of an utterance is generally ambiguous in natural languages, probabilising the space of syntactic trees can help in the analysis task: when several analyses compete, one can extract the most probable interpretation, or rank interpretations by their probabilities. We are interested here in the probabilistic versions of Context-Free Grammars (PCFGs) and Tree Substitution Grammars (PTSGs). Syntactic treebanks, which account as closely as possible for the language we wish to model, serve as the basis for estimating the probabilistic parameters of such grammars.

First, this thesis exhibits some drawbacks of the usual learning paradigms, due to the use of arbitrary heuristics (the STSG DOP model), or to learning criteria that treat these grammars as generative (producing sentences from the grammar) rather than as dedicated to analysis (producing analyses from the sentence). Second, we propose new methods for training grammars, based on the traditional Maximum Entropy and Maximum Likelihood criteria. These criteria are instantiated so that they correspond to a syntactic analysis task rather than a language generation task. Specific training algorithms are necessary for their implementation, but traditional algorithms can handle these models for the task of syntactic analysis itself. Lastly, we investigate the time complexity of syntactic analysis, which is a real issue for the effective use of PTSGs. We describe classes of PTSGs that allow a sentence to be analysed in polynomial time, and a method that extracts such a PTSG from the set of subtrees of a treebank. The PTSG produced by this method allows us to test our non-generative learning criterion on "realistic" data, and to compare it statistically with the usual heuristic criterion in terms of analysis performance.
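The "most probable interpretation" the abstract refers to is classically computed with the Viterbi variant of the CKY algorithm over a PCFG in Chomsky normal form. The grammar, words, and probabilities below are invented for illustration; this is a sketch of the standard dynamic-programming algorithm, not of the thesis's non-generative training methods.

```python
# Viterbi/CKY sketch: probability of the best parse under a toy PCFG in
# Chomsky normal form. Rule probabilities here are made up for illustration.
from collections import defaultdict

LEXICAL = {("NP", "she"): 0.3, ("NP", "fish"): 0.7, ("V", "eats"): 1.0}
BINARY = {("S", ("NP", "VP")): 1.0, ("VP", ("V", "NP")): 1.0}

def viterbi_parse(words):
    """Return the probability of the most probable S-rooted parse of `words`."""
    n = len(words)
    best = defaultdict(float)     # (i, j, A) -> best prob. of A spanning words[i:j]
    for i, w in enumerate(words):                 # fill in lexical rules A -> w
        for (A, word), p in LEXICAL.items():
            if word == w:
                best[i, i + 1, A] = max(best[i, i + 1, A], p)
    for span in range(2, n + 1):                  # widen spans bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):             # split point between B and C
                for (A, (B, C)), p in BINARY.items():
                    prob = p * best[i, k, B] * best[k, j, C]
                    best[i, j, A] = max(best[i, j, A], prob)
    return best[0, n, "S"]

print(viterbi_parse(["she", "eats", "fish"]))  # 0.3 * 1.0 * 1.0 * 0.7 ≈ 0.21
```

Keeping backpointers alongside `best` would let one reconstruct the winning tree itself rather than just its probability, which is how ambiguous sentences are resolved to a single analysis in practice.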

Gramatron: Effective Grammar-Aware Fuzzing

Mathias Josef Payer

Fuzzers aware of the input grammar can explore deeper program states using grammar-aware mutations. Existing grammar-aware fuzzers are ineffective at synthesizing complex bug triggers because: (i) the structure of the grammar introduces a sampling bias during input generation, and (ii) the current mutation operators for parse trees perform only localized, small-scale changes. Gramatron uses grammar automatons in conjunction with aggressive mutation operators to synthesize complex bug triggers faster. We build grammar automatons to address the sampling bias: they restructure the grammar to allow unbiased sampling from the input state space. We redesign grammar-aware mutation operators to be more aggressive, i.e., to perform large-scale changes. Gramatron consistently generates complex bug triggers more efficiently than conventional grammars with parse trees. Inputs generated from scratch by Gramatron have higher diversity, achieving up to 24.2% more coverage than existing fuzzers. Gramatron makes input generation 98% faster and its input representations are 24% smaller. Our redesigned mutation operators are 6.4x more aggressive while still being 68% faster at performing these mutations. We evaluate Gramatron across three interpreters with 10 known bugs, consisting of three complex and seven simple bug triggers, against two Nautilus variants. Gramatron finds all the complex bug triggers reliably and faster. For the simple bug triggers, Gramatron outperforms Nautilus four out of seven times. To demonstrate Gramatron's effectiveness in the wild, we deployed it on three popular interpreters for a 10-day fuzzing campaign, during which it discovered 10 new vulnerabilities.
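The input-generation side of grammar-aware fuzzing can be sketched as a random walk over production rules with a depth bound. The toy grammar and expansion strategy below are illustrative assumptions only; Gramatron's actual contribution is encoding the grammar as an automaton and applying much more aggressive mutation operators, neither of which is shown here.

```python
# Sketch of grammar-based input generation for fuzzing: expand nonterminals
# by picking random productions, forcing the shortest rule near a depth
# limit so recursion always terminates. Grammar is a toy example.
import random

GRAMMAR = {  # uppercase keys are nonterminals; everything else is a terminal
    "EXPR": [["TERM"], ["EXPR", "+", "TERM"]],
    "TERM": [["NUM"], ["(", "EXPR", ")"]],
    "NUM":  [["1"], ["2"], ["3"]],
}

def generate(symbol="EXPR", depth=0, max_depth=8):
    if symbol not in GRAMMAR:              # terminal symbol: emit as-is
        return symbol
    rules = GRAMMAR[symbol]
    if depth >= max_depth:                 # force the shortest rule to terminate
        rules = [min(rules, key=len)]
    return "".join(generate(s, depth + 1, max_depth)
                   for s in random.choice(rules))

random.seed(0)
print(generate())   # prints one random, grammar-valid arithmetic expression
```

Because every output is derived from the grammar, each generated input passes the target's parser and exercises deeper program states; the sampling bias Gramatron targets comes from the fact that naive uniform rule choice like this over- or under-represents parts of the input space.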
Related concepts (21)
In linguistics, the grammar of a natural language is its set of structural rules on speakers' or writers' usage and creation of clauses, phrases, and words. The term can also refer to the study of such rules.
In linguistics, syntax (/ˈsɪntæks/) is the study of how words and morphemes combine to form larger units such as phrases and sentences. Central concerns of syntax include word order and grammatical relations.
Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, whether in natural language, computer languages, or data structures, conforming to the rules of a formal grammar.
Related courses (5)
CS-320: Computer language processing
We teach the fundamental aspects of analyzing and interpreting computer languages, including the techniques to build compilers. You will build a working compiler from an elegant functional language into the new web standard for portable binaries, WebAssembly.
CS-431: Introduction to natural language processing
The objective of this course is to present the main models, formalisms and algorithms necessary for the development of applications in the field of natural language information processing. The concepts introduced during the lectures will be applied during practical sessions.
EE-608: Deep Learning For Natural Language Processing
The Deep Learning for NLP course provides an overview of neural network based methods applied to text. The focus is on models particularly suited to the properties of human language, such as categorical, unbounded, and structured representations, and very large input and output vocabularies.