In computer science, a suffix tree (also called PAT tree or, in an earlier form, position tree) is a compressed trie containing all the suffixes of the given text as their keys and positions in the text as their values. Suffix trees allow particularly fast implementations of many important string operations.
The construction of such a tree for a string S takes time and space linear in the length of S. Once constructed, several operations can be performed quickly, for instance locating a substring in S, locating a substring if a certain number of mistakes are allowed, and locating matches for a regular expression pattern. Suffix trees also provided one of the first linear-time solutions for the longest common substring problem. These speedups come at a cost: storing a string's suffix tree typically requires significantly more space than storing the string itself.
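To make the "locate a substring" operation concrete, here is a minimal Python sketch. It assumes a naive, uncompressed suffix trie rather than a true suffix tree, so it takes quadratic time and space to build and is not the linear-time construction described above. The names `Node`, `build_suffix_trie`, and `locate` are hypothetical and chosen only for illustration.

```python
class Node:
    """One trie node; illustration only, not a compressed suffix tree."""

    def __init__(self):
        self.children = {}   # character -> Node
        self.positions = []  # start positions of all suffixes passing through here


def build_suffix_trie(text):
    """Insert every suffix text[start:] character by character (O(n^2) overall)."""
    root = Node()
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.children.setdefault(ch, Node())
            node.positions.append(start)
    return root


def locate(root, pattern):
    """Return the start positions of a non-empty pattern, or [] if it is absent."""
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return []
    return list(node.positions)


trie = build_suffix_trie("banana")
print(locate(trie, "ana"))  # [1, 3]
```

Once the structure exists, a query walks one edge per pattern character, so its cost depends on the pattern length rather than on the text length. A real suffix tree keeps this property while merging chains of single-child nodes to reach linear size.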
The concept was first introduced by Weiner (1973).
Rather than the suffix S[i..n], where n is the length of S, Weiner stored in his trie the prefix identifier for each position, that is, the shortest string starting at i and occurring only once in S. His Algorithm D takes an uncompressed trie for S[k+1..n] and extends it into a trie for S[k..n]. This way, starting from the trivial trie for S[n..n], a trie for S[1..n] can be built by n-1 successive calls to Algorithm D; however, the overall run time is O(n^2). Weiner's Algorithm B maintains several auxiliary data structures to achieve an overall run time linear in the size of the constructed trie. The latter can still have O(n^2) nodes for some strings. Weiner's Algorithm C finally uses compressed tries to achieve linear overall storage size and run time.
Donald Knuth subsequently characterized the latter as "Algorithm of the Year 1973".
The textbook by Aho, Hopcroft and Ullman (1974) reproduced Weiner's results in a simplified and more elegant form, introducing the term position tree.
McCreight (1976) was the first to build a (compressed) trie of all suffixes of S. Although the suffix starting at i is usually longer than the prefix identifier, their path representations in a compressed trie do not differ in size.
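The prefix identifier of a position (the shortest string starting there that occurs only once in the text) can be illustrated with a brute-force Python sketch. This is not any of Weiner's algorithms: it simply tries longer and longer pieces until one is unique. The function names are hypothetical, positions are 0-indexed, and the text is assumed to end with a terminator character such as `$` that appears nowhere else, which guarantees that every position has an identifier.

```python
def count_occurrences(text, piece):
    """Count occurrences of piece in text, including overlapping ones."""
    return sum(
        1
        for j in range(len(text) - len(piece) + 1)
        if text.startswith(piece, j)
    )


def prefix_identifiers(text):
    """For each start position, find the shortest substring that occurs only once."""
    n = len(text)
    result = []
    for i in range(n):
        for length in range(1, n - i + 1):
            piece = text[i:i + length]
            if count_occurrences(text, piece) == 1:
                result.append(piece)
                break
        else:
            # Only reachable when the text lacks a unique terminator.
            result.append(None)
    return result


print(prefix_identifiers("banana$"))
# ['b', 'anan', 'nan', 'ana$', 'na$', 'a$', '$']
```

Each identifier is never longer than the suffix starting at the same position, which is the size relationship the text describes.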
In formal language theory and computer science, a substring is a contiguous sequence of characters within a string. For instance, "the best of" is a substring of "It was the best of times". In contrast, "Itwastimes" is a subsequence of "It was the best of times", but not a substring. Prefixes and suffixes are special cases of substrings. A prefix of a string S is a substring of S that occurs at the beginning of S; likewise, a suffix of a string S is a substring that occurs at the end of S.
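The distinction between substring and subsequence can be checked with a short Python sketch that uses the example sentence above. The helper names `is_substring` and `is_subsequence` are hypothetical; prefixes and suffixes are tested with the built-in `str.startswith` and `str.endswith`.

```python
def is_substring(s, t):
    """True if s occurs contiguously inside t."""
    return s in t


def is_subsequence(s, t):
    """True if the characters of s appear in t in order, gaps allowed."""
    remaining = iter(t)
    # Each membership test consumes the iterator up to the matched character.
    return all(ch in remaining for ch in s)


sentence = "It was the best of times"
print(is_substring("the best of", sentence))   # True
print(is_substring("Itwastimes", sentence))    # False
print(is_subsequence("Itwastimes", sentence))  # True
print(sentence.startswith("It was"))           # True: a prefix
print(sentence.endswith("times"))              # True: a suffix
```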
In computer science, string-searching algorithms, sometimes called string-matching algorithms, are an important class of string algorithms that try to find a place where one or several strings (also called patterns) are found within a larger string or text. A basic example of string searching is when the pattern and the searched text are arrays of elements of an alphabet (finite set) Σ. Σ may be a human language alphabet, for example the letters A through Z; other applications may use a binary alphabet (Σ = {0,1}) or, in bioinformatics, a DNA alphabet (Σ = {A,C,G,T}).
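As a baseline for the basic case, the following Python sketch compares the pattern against every alignment in the text. It is the naive method, not one of the optimized string-searching algorithms, and the name `find_all` is hypothetical. Because it relies only on slicing and equality, it works for any sequence type, so the same function covers a DNA alphabet given as a string and a binary alphabet given as a list.

```python
def find_all(pattern, text):
    """Return every index at which pattern occurs in text (naive O(n*m) scan)."""
    m, n = len(pattern), len(text)
    return [i for i in range(n - m + 1) if text[i:i + m] == pattern]


print(find_all("ACG", "TACGACGT"))         # [1, 4]
print(find_all([0, 1], [0, 1, 1, 0, 1]))   # [0, 3]
```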
In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.
We address the problem of substring searchable encryption. A single user produces a big stream of data and later on wants to learn the positions in the string that some patterns occur. Although current techniques exploit auxiliary data structures to achiev ...
Unsupervised template induction over email data is a central component in applications such as information extraction, document classification, and auto-reply. The benefits of automatically generating such templates are known for structured data, e.g. mach ...