
# Subsequence

Summary

In mathematics, a subsequence of a given sequence is a sequence that can be derived from the given sequence by deleting some or no elements without changing the order of the remaining elements. For example, the sequence $\langle A,B,D\rangle$ is a subsequence of $\langle A,B,C,D,E,F\rangle$, obtained by removing the elements $C$, $E$ and $F$. The relation of one sequence being a subsequence of another is a preorder.
Subsequences can contain consecutive elements that were not consecutive in the original sequence. A subsequence that consists of a consecutive run of elements from the original sequence, such as $\langle B,C,D\rangle$ from $\langle A,B,C,D,E,F\rangle$, is a substring. The substring is a refinement of the subsequence.
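Both definitions can be checked mechanically. The following is a minimal sketch in Python, assuming sequences are given as strings (or, for `is_subsequence`, any sequence and iterable). The helper names `is_subsequence` and `is_substring` are hypothetical and chosen for illustration. A greedy left-to-right scan is enough for the subsequence test, because matching each element at its earliest possible position never blocks a later match.

```python
from typing import Iterable, Sequence


def is_subsequence(sub: Sequence, seq: Iterable) -> bool:
    """True if `sub` can be obtained from `seq` by deleting zero or more
    elements without reordering the rest.

    Each `x in it` test consumes the shared iterator up to and including
    the first match, so later elements of `sub` can only match later
    elements of `seq` (greedy earliest-match scan)."""
    it = iter(seq)
    return all(x in it for x in sub)


def is_substring(sub: str, seq: str) -> bool:
    """A substring must be a contiguous run; for Python strings `in` tests exactly that."""
    return sub in seq


seq = "ABCDEF"
for candidate in ["ABD", "BCD", "DBA"]:
    print(candidate, is_subsequence(candidate, seq), is_substring(candidate, seq))
# ABD True False
# BCD True True
# DBA False False
```

Every substring passes both tests, while a subsequence such as `ABD` passes only the first, which is why the substring is the more restrictive notion.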
The distinct subsequences of the word "apple" are "a", "ap", "al", "ae", "app", "apl", "ape", "ale", "appl", "appe", "aple", "apple", "p", "pp", "pl", "pe", "ppl", "ppe", "ple", "pple", "l", "le", "e" and "" (the empty string).
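The list can be verified by brute force. This minimal sketch uses hypothetical helper names. It enumerates every set of index positions with `itertools.combinations`, which preserves the original order, and collects the results in a set, so that the two ways of forming "ap" from the two p's count once. As a cross-check it also applies a standard linear-time recurrence for counting distinct subsequences. That recurrence is swapped in here for illustration and is not described in the text above.

```python
from itertools import combinations


def distinct_subsequences(word: str) -> set[str]:
    """All distinct subsequences of `word`, including the empty string."""
    return {
        "".join(word[i] for i in idx)
        for r in range(len(word) + 1)
        for idx in combinations(range(len(word)), r)
    }


def count_distinct_subsequences(word: str) -> int:
    """Count distinct subsequences (including the empty one) in O(n) time.

    Appending a character doubles the count, minus the subsequences that
    were already produced the previous time the same character was appended."""
    count = 1  # only the empty subsequence so far
    last: dict[str, int] = {}  # char -> count just before its previous occurrence
    for ch in word:
        previous = count
        count = 2 * count - last.get(ch, 0)
        last[ch] = previous
    return count


subs = distinct_subsequences("apple")
print(len(subs))                             # 24
print(count_distinct_subsequences("apple"))  # 24
```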
Given two sequences $X$ and $Y$, a sequence $Z$ is said to be a common subsequence of $X$ and $Y$ if $Z$ is a subsequence of both $X$ and $Y$. A longest common subsequence (LCS) of $X$ and $Y$ is a common subsequence of maximal length.
A common subsequence is not necessarily a longest one: if $Z$ has length 3 while another common subsequence of $X$ and $Y$ has length 4, then $Z$ is not a longest common subsequence of $X$ and $Y$.
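The following is a minimal sketch of the common-subsequence test and of the classic dynamic program for the length of a longest common subsequence. The sequences `X` and `Y` are illustrative values chosen for this sketch, and the function names are hypothetical. In the table, entry `(i, j)` holds the LCS length of the first `i` elements of `X` and the first `j` elements of `Y`. Filling it takes time proportional to `len(X) * len(Y)`.

```python
def is_subsequence(sub, seq) -> bool:
    it = iter(seq)  # each `x in it` consumes the iterator up to the match
    return all(x in it for x in sub)


def is_common_subsequence(z, x, y) -> bool:
    """Z is a common subsequence of X and Y if it is a subsequence of both."""
    return is_subsequence(z, x) and is_subsequence(z, y)


def lcs_length(x, y) -> int:
    """Length of a longest common subsequence of x and y."""
    table = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1  # extend a shorter LCS
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(x)][len(y)]


X = "ABCBDAB"  # illustrative sequences chosen for this sketch
Y = "BDCABA"
print(is_common_subsequence("BCA", X, Y))   # True, but only length 3
print(is_common_subsequence("BCBA", X, Y))  # True, length 4
print(lcs_length(X, Y))                     # 4
```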
Subsequences have applications in computer science, especially in bioinformatics, where computers are used to compare, analyze and store DNA, RNA and protein sequences.
Take two DNA sequences, each containing 37 elements, for example:
SEQ1 = ACGGTGTCGTGCTATGCTGATGCTGACTTATATGCTA
SEQ2 = CGTTCGGCTATCGTACGTTCTATTCTATGATTTCTAA
The longest common subsequence of sequences 1 and 2 is:
LCS(SEQ1,SEQ2) = CGTTCGGCTATGCTTCTACTTATTCTA
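The LCS above can be recomputed with the same kind of dynamic program, extended with a backtracking pass that recovers the elements themselves. The following is a minimal sketch, with `lcs` and `is_subsequence` as hypothetical helper names. When several longest common subsequences exist, the tie-breaking rule in the backtracking step decides which one is returned. Its output therefore need not match the string shown above letter for letter.

```python
def lcs(x: str, y: str) -> str:
    """Return one longest common subsequence of x and y."""
    n, m = len(x), len(y)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(m):
            if x[i] == y[j]:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])
    # Walk back from the bottom-right corner to recover the elements.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])
            i, j = i - 1, j - 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1  # tie-break: drop the last element of x first
        else:
            j -= 1
    return "".join(reversed(out))


def is_subsequence(sub: str, seq: str) -> bool:
    it = iter(seq)
    return all(ch in it for ch in sub)


SEQ1 = "ACGGTGTCGTGCTATGCTGATGCTGACTTATATGCTA"
SEQ2 = "CGTTCGGCTATCGTACGTTCTATTCTATGATTTCTAA"
GIVEN = "CGTTCGGCTATGCTTCTACTTATTCTA"

result = lcs(SEQ1, SEQ2)
print(is_subsequence(GIVEN, SEQ1) and is_subsequence(GIVEN, SEQ2))  # True
print(len(GIVEN), len(result), result)
```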
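This can be illustrated by highlighting the 27 elements of the longest common subsequence within the initial sequences. Here the highlighted elements are shown in uppercase, and the 10 remaining elements of each sequence are shown in lowercase:
SEQ1 = aCgGTgTCGtGCTATGCTgaTgCTgACTTATaTgCTA
SEQ2 = CGTTCGGCTATcGtaCgTTCTAttCTaTgATTtCTAa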
Another way to show this is to align the two sequences, that is, to place each element of the longest common subsequence in the same column in both sequences (indicated by a vertical bar) and to insert a special character (here, a dash) to pad the gaps that arise:
SEQ1 = ACGGTGTCGTGCTAT-G--C-TGATGCTGA--CT-T-ATATG-CTA-
        | || ||| ||||| |  | |  | || |  || | || |  |||
SEQ2 = -C-GT-TCG-GCTATCGTACGT--T-CT-ATTCTATGAT-T-TCTAA
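The alignment can also be read mechanically, since the columns in which both rows carry the same letter are exactly the matched elements. The following is a minimal sketch. The function `read_alignment` is hypothetical, and `"-"` is assumed to be the gap character, as in the alignment above. It recovers the common subsequence, the bar line, and a case-based highlighting (matched elements in uppercase, all others in lowercase) from the two gapped rows.

```python
GAP = "-"
ALIGNED1 = "ACGGTGTCGTGCTAT-G--C-TGATGCTGA--CT-T-ATATG-CTA-"
ALIGNED2 = "-C-GT-TCG-GCTATCGTACGT--T-CT-ATTCTATGAT-T-TCTAA"


def read_alignment(a: str, b: str) -> tuple[str, str, str, str]:
    """Return (common, bars, marked_a, marked_b) for two gapped rows.

    A column counts as matched when both rows hold the same non-gap letter.
    Gaps are dropped from the marked sequences; unmatched letters are
    lowercased so that the matched ones stand out in uppercase."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    common, bars, marked_a, marked_b = [], [], [], []
    for p, q in zip(a, b):
        matched = p == q and p != GAP
        bars.append("|" if matched else " ")
        if matched:
            common.append(p)
        if p != GAP:
            marked_a.append(p if matched else p.lower())
        if q != GAP:
            marked_b.append(q if matched else q.lower())
    return "".join(common), "".join(bars).rstrip(), "".join(marked_a), "".join(marked_b)


common, bars, marked1, marked2 = read_alignment(ALIGNED1, ALIGNED2)
print(common)  # CGTTCGGCTATGCTTCTACTTATTCTA
print(ALIGNED1, bars, ALIGNED2, sep="\n")
print(marked1)
print(marked2)
```

Reading off the uppercase letters of either highlighted row gives the same 27-element subsequence, and removing the dashes from each aligned row gives back the original 37-element sequence.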
Subsequences are used to determine how similar two strands of DNA are, by comparing their sequences of DNA bases: adenine, guanine, cytosine and thymine.


Related courses (9)

MATH-100(a): Advanced analysis I

We study the fundamental concepts of analysis and the differential and integral calculus of real functions of one variable.

MATH-100(b): Advanced analysis I

In this course, we will study the fundamental notions of real analysis, as well as differential and integral calculus for real functions of one real variable.

MATH-101(a): Analysis I

To study the fundamental concepts of analysis and the differential and integral calculus of real functions of one variable.

Related lectures (55)

Analysis I: Convergence and Subsequences

Explores convergence, subsequences, and the Bolzano-Weierstrass theorem for sequences.

Limit Superior and Limit Inferior

Explores limsup, liminf, the Bolzano-Weierstrass theorem, accumulation points, and bounded sequences.

The Bolzano-Weierstrass Theorem

Explains the Bolzano-Weierstrass theorem, which states that every bounded sequence of real numbers has a convergent subsequence.

Related concepts (6)

Real number

In mathematics, a real number is a number that can be used to measure a continuous one-dimensional quantity such as a distance, duration or temperature. Here, continuous means that pairs of values can have arbitrarily small differences. Every real number can be almost uniquely represented by an infinite decimal expansion. The real numbers are fundamental in calculus (and more generally in all mathematics), in particular by their role in the classical definitions of limits, continuity and derivatives.

String (computer science)

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.

Related publications (1)

In this paper, we propose a novel unsupervised approach for sequence matching by explicitly accounting for the locality properties in the sequences. In contrast to conventional approaches that rely on