
Context-free grammar

Summary

In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules
can be applied to a nonterminal symbol regardless of its context.
In particular, in a context-free grammar, each production rule is of the form
A \to \alpha
with A a single nonterminal symbol and \alpha a string of terminals and/or nonterminals (\alpha may be empty). Regardless of which symbols surround it, the single nonterminal A on the left-hand side can always be replaced by \alpha on the right-hand side. This distinguishes it from a context-sensitive grammar, which can have production rules of the form \alpha A \beta \rightarrow \alpha \gamma \beta, with A a nonterminal symbol and \alpha, \beta, and \gamma strings of terminal and/or nonterminal symbols.
A formal grammar is essentially a set of production rules that describe all possible strings in a given formal language.
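As a concrete illustration of the definition above, here is a minimal sketch (illustrative code, not part of any EPFL material) of a context-free grammar for the classic language {aⁿbⁿ}, with the two productions S → aSb and S → ε. Note how rule application is context-free: any occurrence of the nonterminal S is rewritten regardless of the symbols around it.

```python
# A tiny CFG for {a^n b^n}: S -> "aSb" and S -> "" (the empty string).
# Nonterminals are uppercase letters, terminals are lowercase letters.
GRAMMAR = {"S": ["aSb", ""]}

def derive(sentential_form, depth):
    """Return all terminal strings derivable in at most `depth` rewrites."""
    if not any(ch.isupper() for ch in sentential_form):
        return {sentential_form}          # no nonterminal left: a terminal string
    if depth == 0:
        return set()
    results = set()
    # rewrite the leftmost nonterminal, independent of its context
    i = next(i for i, ch in enumerate(sentential_form) if ch.isupper())
    for rhs in GRAMMAR[sentential_form[i]]:
        results |= derive(sentential_form[:i] + rhs + sentential_form[i + 1:],
                          depth - 1)
    return results

print(sorted(derive("S", 4), key=len))  # -> ['', 'ab', 'aabb', 'aaabbb']
```

The generated strings all have matching counts of a's and b's, which no regular expression can enforce; this is exactly the extra power CFGs have over regular languages.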

Official source

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.



Related concepts (48)

Formal grammar

In formal language theory, a grammar (when the context is not given, often called a formal grammar for clarity) describes how to form strings from a language's alphabet that are valid according to the language's syntax.

Parsing

Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar.

Regular expression

A regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a match pattern in text. Usually such patterns are used by string-searching algorithms for "find" or "find and replace" operations on strings.
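A short illustrative sketch (hypothetical pattern and text, not from EPFL material) of a regular expression used as a match pattern, here finding ISO-style dates:

```python
import re

# A regular expression specifying a match pattern: four digits, a dash,
# two digits, a dash, two digits, bounded by word boundaries.
pattern = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
text = "Released on 2004-06-15, revised on 2005-01-03."
print(pattern.findall(text))  # -> ['2004-06-15', '2005-01-03']
```

Regular expressions describe regular languages, which are strictly weaker than context-free ones: no regex can match exactly the strings aⁿbⁿ that the CFG above generates.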

Related publications (6)

Grammar-based compression means to find a small grammar that generates a given object. Such a grammar reveals the structure of the object (according to the grammar formalism used); the main advantage of this compression method is that the resulting grammar can often be used in further computations without prior decompression. A linear time bottom-up algorithm is presented which transforms a tree into a particular context-free tree grammar. For common XML documents the algorithm performs well, compressing the tree structure to about 5 per cent of the original size. The validation of an XML document against an XML type can be done without decompression, in linear time w.r.t. the size of the grammar (for a fixed type). While the involved grammars can be double exponentially smaller than the represented trees, testing them for equivalence can be done in polynomial space w.r.t. the sum of their sizes.
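The core idea of grammar-based compression can be sketched on strings (the paper above works on trees; this is a simplified, hypothetical illustration in the spirit of the Re-Pair compressor, not the paper's algorithm): repeatedly replace the most frequent pair of adjacent symbols with a fresh nonterminal, so that repetition in the input becomes shared rules in the grammar.

```python
from collections import Counter

def repair(seq):
    """Compress a sequence into (start_sequence, rules) by pair replacement."""
    seq = list(seq)
    rules = {}                         # nonterminal -> the pair it abbreviates
    next_id = 0
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break                      # no repeated pair left: stop
        nt = f"R{next_id}"; next_id += 1
        rules[nt] = pair
        out, i = [], 0                 # replace every occurrence of the pair
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nt); i += 2
            else:
                out.append(seq[i]); i += 1
        seq = out
    return seq, rules

def expand(sym, rules):
    """Decompress one symbol by expanding its rule recursively."""
    if sym not in rules:
        return sym
    a, b = rules[sym]
    return expand(a, rules) + expand(b, rules)

start, rules = repair("abababababababab")
# decompression reproduces the original 16-character string from a smaller grammar
assert "".join(expand(s, rules) for s in start) == "abababababababab"
```

On the highly repetitive input above, the grammar (two start symbols plus three binary rules) is half the size of the original string, and as the abstract notes, many queries can be answered on the grammar without decompressing.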

2004

This work deals with models used, or usable, in the domain of automatic natural language processing when one seeks a syntactic interpretation of a statement. This interpretation can be used as additional information for subsequent treatments, which can aim, for instance, at producing a semantic representation of the statement. It can also be used as a filter to select utterances belonging to a specific language among several hypotheses, as done in automatic speech recognition. As the syntactic interpretation of a statement is generally ambiguous in natural languages, probabilising the space of syntactic trees can help in the analysis task: when several analyses are competing, one can extract the most probable interpretation, or rank interpretations according to their probabilities. We are interested here in the probabilistic versions of context-free grammars (PCFGs) and tree substitution grammars (PTSGs). Syntactic treebanks, which account as much as possible for the language we wish to model, serve as the basis for defining the probabilistic parameters of such grammars. First, we exhibit in this thesis some drawbacks of the usual learning paradigms, due to the use of arbitrary heuristics (the STSG DOP model), or to the use of learning criteria that treat these grammars as generative ones (creating sentences from the grammar) rather than as dedicated to analysis (creating analyses from the sentence). Second, we propose new methods for training grammars, based on the traditional maximum entropy and maximum likelihood criteria. These criteria are instantiated so that they correspond to a syntactic analysis task rather than a language generation task. Specific training algorithms are necessary for their implementation, but traditional algorithms can handle these models for the task of syntactic analysis. Lastly, we investigate the time complexity of syntactic analysis, which is a real issue for the effective use of PTSGs.
We describe classes of PTSGs that allow a sentence to be analysed in polynomial time. We finally describe a method that enables the extraction of such a PTSG from the set of subtrees of a treebank. The PTSG produced by this method allows us to test our non-generative learning criterion on "realistic" data, and to give a statistical comparison between this criterion and the usual heuristic criterion in terms of analysis performance.
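The probabilistic ranking of competing analyses mentioned above can be sketched in a few lines (hypothetical rule probabilities for illustration, not taken from the thesis): in a PCFG each production carries a probability, the probability of a derivation is the product of the probabilities of the rules it uses, and ambiguity is resolved by picking the most probable derivation.

```python
from math import prod

# Hypothetical PCFG rule probabilities; for each left-hand side they sum to 1.
RULE_PROB = {
    ("S",  ("NP", "VP")):       1.0,
    ("VP", ("V", "NP")):        0.7,
    ("VP", ("V", "NP", "PP")):  0.3,
    ("NP", ("N",)):             0.8,
    ("NP", ("N", "PP")):        0.2,
}

def derivation_prob(rules_used):
    """Probability of a derivation = product of its rule probabilities."""
    return prod(RULE_PROB[r] for r in rules_used)

# Two competing analyses of an ambiguous prepositional-phrase attachment:
attach_to_np = [("S", ("NP", "VP")), ("NP", ("N",)),
                ("VP", ("V", "NP")), ("NP", ("N", "PP"))]   # 1.0*0.8*0.7*0.2
attach_to_vp = [("S", ("NP", "VP")), ("NP", ("N",)),
                ("VP", ("V", "NP", "PP")), ("NP", ("N",))]  # 1.0*0.8*0.3*0.8

# the VP attachment wins under these (made-up) parameters
best = max([attach_to_np, attach_to_vp], key=derivation_prob)
```

Estimating the rule probabilities from a treebank, rather than assuming them as here, is precisely the learning problem the thesis addresses.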

This document gives information on parsing experiments applied to the standard Wall Street Journal corpus ("standard" meaning that this corpus has been widely used for parsing tests of various models). The tested syntactic models are: standard stochastic context-free grammars, standard tree substitution grammars, Gibbsian context-free grammars and Gibbsian tree substitution grammars. The parsing experiments are described in deep detail so as to enable the reader to easily redo the experiments from scratch (i.e. preparing the database, training and evaluating the models). The programs developed for these experiments are also described.

2004

Related courses (10)

CS-431: Introduction to natural language processing

The objective of this course is to present the main models, formalisms and algorithms necessary for the development of applications in the field of natural language information processing. The concepts introduced during the lectures will be applied during practical sessions.

CS-320: Computer language processing

We teach the fundamental aspects of analyzing and interpreting computer languages, including the techniques to build compilers. You will build a working compiler from an elegant functional language into the new web standard for portable binaries called WebAssembly (https://webassembly.org/).

CS-452: Foundations of software

The course introduces the foundations on which programs and programming languages are built. It introduces syntax, types and semantics as building blocks that together define the properties of a program part or a language. Students will learn how to apply these concepts in their reasoning.

Related lectures (26)