In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. Aligned sequences of nucleotide or amino acid residues are typically represented as rows within a matrix. Gaps are inserted between the residues so that identical or similar characters are aligned in successive columns. Sequence alignments are also used for non-biological sequences, such as calculating the distance cost between strings in a natural language or in financial data. If two sequences in an alignment share a common ancestor, mismatches can be interpreted as point mutations and gaps as indels (that is, insertion or deletion mutations) introduced in one or both lineages in the time since they diverged from one another. In sequence alignments of proteins, the degree of similarity between amino acids occupying a particular position in the sequence can be interpreted as a rough measure of how conserved a particular region or sequence motif is among lineages. The absence of substitutions, or the presence of only very conservative substitutions (that is, the substitution of amino acids whose side chains have similar biochemical properties) in a particular region of the sequence, suggest that this region has structural or functional importance. Although DNA and RNA nucleotide bases are more similar to each other than are amino acids, the conservation of base pairs can indicate a similar functional or structural role. Very short or very similar sequences can be aligned by hand. However, most interesting problems require the alignment of lengthy, highly variable or extremely numerous sequences that cannot be aligned solely by human effort. Instead, human knowledge is applied in constructing algorithms to produce high-quality sequence alignments, and occasionally in adjusting the final results to reflect patterns that are difficult to represent algorithmically (especially in the case of nucleotide sequences).

About this result
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
Related courses (32)
BIO-378: Physiology lab I
Le TP de physiologie introduit les approches expérimentales du domaine biomédical, avec les montages de mesure, les capteurs, le conditionnement des signaux, l'acquisition et traitement de données. Le
BIO-379: Physiology lab II
Le TP de physiologie introduit les approches expérimentales du domaine biomédical, avec les montages de mesure, les capteurs, le conditionnement des signaux, l'acquisition et traitement de données. Le
MATH-432: Probability theory
The course is based on Durrett's text book Probability: Theory and Examples.
It takes the measure theory approach to probability theory, wherein expectations are simply abstract integrals.
Show more
Related lectures (78)
Business-IT Alignment: Transcripts and Contextual Inquiry
Discusses Karen Holtzblatt's talk, real contextual interviews, and business/IT alignment solutions.
Enterprise and Service-Oriented Architecture
Explores solutions for business and IT alignment, emphasizing key learning points and the importance of contextual inquiry.
Machining Assembly
Demonstrates the process of assembling machined parts for a secure fit.
Show more
Related publications (413)

Towards improving full-length ribosome density prediction by bridging sequence and graph-based representations

Pierre Vandergheynst, Felix Naef, Cédric Gobet, Francesco Craighero, Mohan Vamsi Nallapareddy

Translation elongation plays an important role in regulating protein concentrations in the cell, and dysregulation of this process has been linked to several human diseases. In this study, we use data from ribo-seq experiments to model ribosome dwell times ...
2024

Benchmarking informatics approaches for virus discovery: caution is needed when combining in silico identification methods

Jaspreet Singh Saini

Understanding the ecological impacts of viruses on natural and engineered ecosystems relies on the accurate identification of viral sequences from community sequencing data. To maximize viral recovery from metagenomes, researchers frequently combine viral ...
Washington2024

Alignment of ND3 molecules in dc-electric fields

Andreas Osterwalder, Viet Le Duc, Junwen Zou

The control of movement and orientation of gas-phase molecules has become the focus of many research areas in molecular physics. Here, ND3 molecules are polarized in a segmented, curved electrostatic guide and adiabatically aligned inside a rotatable mass ...
Aip Publishing2024
Show more
Related concepts (35)
Conserved sequence
In evolutionary biology, conserved sequences are identical or similar sequences in nucleic acids (DNA and RNA) or proteins across species (orthologous sequences), or within a genome (paralogous sequences), or between donor and receptor taxa (xenologous sequences). Conservation indicates that a sequence has been maintained by natural selection. A highly conserved sequence is one that has remained relatively unchanged far back up the phylogenetic tree, and hence far back in geological time.
Protein structure prediction
Protein structure prediction is the inference of the three-dimensional structure of a protein from its amino acid sequence—that is, the prediction of its secondary and tertiary structure from primary structure. Structure prediction is different from the inverse problem of protein design. Protein structure prediction is one of the most important goals pursued by computational biology; and it is important in medicine (for example, in drug design) and biotechnology (for example, in the design of novel enzymes).
Multiple sequence alignment
Multiple sequence alignment (MSA) may refer to the process or the result of sequence alignment of three or more biological sequences, generally protein, DNA, or RNA. In many cases, the input set of query sequences are assumed to have an evolutionary relationship by which they share a linkage and are descended from a common ancestor. From the resulting MSA, sequence homology can be inferred and phylogenetic analysis can be conducted to assess the sequences' shared evolutionary origins.
Show more

Graph Chatbot

Chat with Graph Search

Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.

DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.