In bioinformatics, a sequence logo is a graphical representation of the sequence conservation of nucleotides (in a strand of DNA/RNA) or amino acids (in protein sequences).
A sequence logo is created from a collection of aligned sequences and depicts the consensus sequence and diversity of the sequences.
Sequence logos are frequently used to depict sequence characteristics such as protein-binding sites in DNA or functional units in proteins.
A sequence logo consists of a stack of letters at each position.
The relative sizes of the letters indicate their frequency in the sequences.
The total height of the letters depicts the information content of the position, in bits.
To create a sequence logo, related DNA, RNA or protein sequences, or DNA sequences that share conserved binding sites, are first aligned so that the most conserved parts line up well. The logo is then built from this multiple sequence alignment. It shows how well residues are conserved at each position: the more sequences share the same residue at a position, the taller the letters there, because that position is better conserved. Different residues at the same position are scaled according to their frequency, and the height of the entire stack of residues is the information content measured in bits. Sequence logos can be used, for example, to represent conserved DNA binding sites where transcription factors bind.
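As a minimal sketch of the first step, the per-position residue frequencies that a logo is drawn from can be tabulated from an alignment as follows. The function name `column_frequencies` and the toy alignment are illustrative assumptions, not part of any particular logo tool, and gap characters (if present) would simply be counted as ordinary symbols here.

```python
from collections import Counter


def column_frequencies(alignment):
    """Return one dict per alignment column mapping residue -> relative frequency."""
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    freqs = []
    for i in range(length):
        # Count how often each residue occurs in column i.
        counts = Counter(seq[i] for seq in alignment)
        total = sum(counts.values())
        freqs.append({res: c / total for res, c in counts.items()})
    return freqs


# Hypothetical toy alignment of four DNA sequences.
toy_alignment = ["ACGT", "ACGA", "ACTT", "AGGT"]
freqs = column_frequencies(toy_alignment)
```

In this toy alignment the first column is fully conserved (only `A`), while the second column is split between `C` and `G`, so the letters in the second stack would be scaled 3:1.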
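The information content (y-axis) of position $i$ is given by:
$$R_i = \log_2(20) - (H_i + e_n)$$
for amino acids,
$$R_i = \log_2(4) - (H_i + e_n) = 2 - (H_i + e_n)$$
for nucleic acids,
where $H_i$ is the uncertainty (sometimes called the Shannon entropy) of position $i$:
$$H_i = -\sum_{b} f_{b,i} \log_2 f_{b,i}$$
Here, $f_{b,i}$ is the relative frequency of base or amino acid $b$ at position $i$, and $e_n$ is the small-sample correction for an alignment of $n$ letters. The height of letter $b$ in column $i$ is given by
$$\text{height}_{b,i} = f_{b,i} \times R_i$$
The approximation for the small-sample correction, $e_n$, is given by:
$$e_n = \frac{1}{\ln 2} \times \frac{s - 1}{2n}$$
where $s$ is 4 for nucleotides, 20 for amino acids, and $n$ is the number of sequences in the alignment.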
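The calculation can be sketched in a few lines of Python. This is an illustrative sketch rather than the implementation of any specific logo program: the function name `logo_heights` and the toy alignment are assumptions. It takes the information content of a column as log2(s) minus the sum of the Shannon entropy and the small-sample correction (s - 1) / (2 n ln 2), and scales each letter by its relative frequency.

```python
import math
from collections import Counter


def logo_heights(alignment, s=4):
    """Return, per column, a dict mapping residue -> letter height in bits.

    s is the alphabet size (4 for nucleotides, 20 for amino acids).
    """
    n = len(alignment)  # number of sequences in the alignment
    # Approximate small-sample correction e_n = (s - 1) / (2 * n * ln 2).
    e_n = (s - 1) / (2 * n * math.log(2))

    columns = []
    for i in range(len(alignment[0])):
        counts = Counter(seq[i] for seq in alignment)
        # Relative frequency of each residue in column i.
        f = {b: c / n for b, c in counts.items()}
        # Shannon entropy (uncertainty) of column i, in bits.
        h_i = -sum(p * math.log2(p) for p in f.values())
        # Information content of column i; not clipped, so it can be
        # negative for very small alignments.
        r_i = math.log2(s) - (h_i + e_n)
        # Height of each letter is its frequency times the column information.
        columns.append({b: p * r_i for b, p in f.items()})
    return columns


# Hypothetical toy alignment of four DNA sequences.
heights = logo_heights(["ACGT", "ACGA", "ACTT", "AGGT"], s=4)
```

With only four sequences the correction term is large (about 0.54 bits), so even the fully conserved first column stays well below the 2-bit maximum for nucleotides.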
A consensus logo is a simplified variation of a sequence logo that can be embedded in text format.
The aim of the course is to provide a general overview of the biology of cells and organisms. We will discuss this in the context of the life of cells and organisms, with an emphasis on ...
In evolutionary biology, conserved sequences are nucleic acid sequences (DNA and RNA) or amino acid sequences that are identical or similar within a genome (paralogous sequences), across species (orthologous sequences), or between a donor taxon and a recipient taxon (xenologous sequences). Conservation indicates that a sequence has been maintained by natural selection.
In biology, a sequence motif is a nucleotide or amino-acid sequence pattern that is widespread and usually assumed to be related to biological function of the macromolecule. For example, an N-glycosylation site motif can be defined as Asn, followed by anything but Pro, followed by either Ser or Thr, followed by anything but Pro residue. When a sequence motif appears in the exon of a gene, it may encode the "structural motif" of a protein; that is a stereotypical element of the overall structure of the protein.
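The N-glycosylation pattern described above can be written as a regular expression. The following is a minimal sketch, assuming protein sequences given as uppercase one-letter amino acid codes; the function name `find_n_glyc_sites` and the example sequence are made up for illustration.

```python
import re

# Asn, then anything but Pro, then Ser or Thr, then anything but Pro.
# The lookahead lets overlapping occurrences be reported as well.
N_GLYC_MOTIF = re.compile(r"(?=(N[^P][ST][^P]))")


def find_n_glyc_sites(protein):
    """Return (0-based start, matched residues) for each motif occurrence."""
    return [(m.start(), m.group(1)) for m in N_GLYC_MOTIF.finditer(protein)]


# Hypothetical sequence: "NGSA" matches, "NPSA" fails (Pro in second place),
# and "NATP" fails (Pro in fourth place).
sites = find_n_glyc_sites("AANGSAANPSANATP")
```

Note that `[^P]` accepts any character other than `P`, so the input is assumed to contain only valid residue letters.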
In bioinformatics, sequence alignment is a way of arranging two or more sequences of biological macromolecules (DNA, RNA or proteins) one beneath the other so as to bring out homologous or similar regions. The aim of alignment is to arrange the components (nucleotides or amino acids) so that matching regions can be identified. These alignments are produced by computer programs whose objective is to maximize the number of matches between nucleotides or amino acids across the different sequences.
In this paper, the low-energy impact behavior of a fully biobased composite made of bio-sourced polyamide 11 resin reinforced with flax fibers was investigated. Different composite laminates were studied in order to determine the stacking sequence effects ...
RNase H is a prototypical example for two-metalion catalysis in enzymes. An RNase H activity cleaving the ribonucleic acid (RNA) backbone of a DNA/RNA hybrid is present not only in important drug targets, such as the HIV-1 reverse transcriptase, but also i ...
Gene regulatory networks (GRNs) determine cellular behaviour, and ultimately the functioning of single- and multicellular organisms. Transcription factors regulate gene expression by binding to DNA or via remodelling chromatin. Recent advances in biotechno ...