In computer science, in the area of formal language theory, frequent use is made of a variety of string functions; however, the notation used is different from that used for computer programming, and some commonly used functions in the theoretical realm are rarely used when programming. This article defines some of these basic terms.
A string is a finite sequence of characters.
The empty string is denoted by .
The concatenation of two string and is denoted by , or shorter by .
Concatenating with the empty string makes no difference: .
Concatenation of strings is associative: .
For example, .
A language is a finite or infinite set of strings.
Besides the usual set operations like union, intersection etc., concatenation can be applied to languages:
if both and are languages, their concatenation is defined as the set of concatenations of any string from and any string from , formally .
Again, the concatenation dot is often omitted for brevity.
The language consisting of just the empty string is to be distinguished from the empty language .
Concatenating any language with the former doesn't make any change: ,
while concatenating with the latter always yields the empty language: .
Concatenation of languages is associative: .
For example, abbreviating , the set of all three-digit decimal numbers is obtained as . The set of all decimal numbers of arbitrary length is an example for an infinite language.
The alphabet of a string is the set of all of the characters that occur in a particular string. If s is a string, its alphabet is denoted by
The alphabet of a language is the set of all characters that occur in any string of , formally:
For example, the set is the alphabet of the string ,
and the above is the alphabet of the above language as well as of the language of all decimal numbers.
Let L be a language, and let Σ be its alphabet. A string substitution or simply a substitution is a mapping f that maps characters in Σ to languages (possibly in a different alphabet).