In information theory, linguistics, and computer science, the Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. It is named after the Soviet mathematician Vladimir Levenshtein, who considered this distance in 1965.
Levenshtein distance may also be referred to as edit distance, although that term may also denote a larger family of distance metrics known collectively as edit distance. It is closely related to pairwise string alignments.
The Levenshtein distance between two strings (of length and respectively) is given by where
where the of some string is a string of all but the first character of , and is the th character of the string , counting from 0.
Note that the first element in the minimum corresponds to deletion (from to ), the second to insertion and the third to replacement.
This definition corresponds directly to the naive recursive implementation.
For example, the Levenshtein distance between "kitten" and "sitting" is 3, since the following 3 edits change one into the other, and there is no way to do it with fewer than 3 edits:
kitten → sitten (substitution of "s" for "k"),
sitten → sittin (substitution of "i" for "e"),
sittin → sitting (insertion of "g" at the end).
The Levenshtein distance has several simple upper and lower bounds. These include:
It is at least the absolute value of the difference of the sizes of the two strings.
It is at most the length of the longer string.
It is zero if and only if the strings are equal.
If the strings have the same size, the Hamming distance is an upper bound on the Levenshtein distance. The Hamming distance is the number of positions at which the corresponding symbols in the two strings are different.
The Levenshtein distance between two strings is no greater than the sum of their Levenshtein distances from a third string (triangle inequality).
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
A longest common subsequence (LCS) is the longest subsequence common to all sequences in a set of sequences (often just two sequences). It differs from the longest common substring: unlike substrings, subsequences are not required to occupy consecutive positions within the original sequences. The problem of computing longest common subsequences is a classic computer science problem, the basis of data comparison programs such as the diff utility, and has applications in computational linguistics and bioinformatics.
In computational linguistics and computer science, edit distance is a string metric, i.e. a way of quantifying how dissimilar two strings (e.g., words) are to one another, that is measured by counting the minimum number of operations required to transform one string into the other. Edit distances find applications in natural language processing, where automatic spelling correction can determine candidate corrections for a misspelled word by selecting words from a dictionary that have a low distance to the word in question.
In mathematics and computer science, a string metric (also known as a string similarity metric or string distance function) is a metric that measures distance ("inverse similarity") between two text strings for approximate string matching or comparison and in fuzzy string searching. A requirement for a string metric (e.g. in contrast to string matching) is fulfillment of the triangle inequality. For example, the strings "Sam" and "Samuel" can be considered to be close.
La Physique Générale I (avancée) couvre la mécanique du point et du solide indéformable. Apprendre la mécanique, c'est apprendre à mettre sous forme mathématique un phénomène physique, en modélisant l
L'objectif du cours est d'introduire les notions de base de l'algèbre linéaire et ses applications.
This course provides the students with 1) a set of theoretical concepts to understand the machine learning approach; and 2) a subset of the tools to use this approach for problems arising in mechanica
Covers error-correcting codes, Hamming distance, and polynomial multiplication in algebraic fields.
Covers Convolutional Codes, focusing on mapping symbols and paths.
Explores detailed modeling of ion channels and neuronal morphologies in in silico neuroscience, covering neuron classification, ion channel kinetics, and experimental observations.
The immobilization of molecular catalysts imposes spatial constraints on their active site. We reveal that in bifunctional catalysis such constraints can also be utilized as an appealing handle to boost intrinsic activity through judicious control of the a ...
2022
,
A hash proof system (HPS) is a form of implicit proof of membership to a language. Out of the very few existing post-quantum HPS, most are based on languages of ciphertexts of code-based or lattice-based cryptosystems and inherently suffer from a gap cause ...
Surface stress drives long-range elastocapillary interactions at the surface of compliant solids, where it has been observed to mediate interparticle interactions and to alter the transport of liquid drops. We show that such an elastocapillary interaction ...