Sequence database

In the field of bioinformatics, a sequence database is a type of biological database composed of a large collection of computerized ("digital") nucleic acid sequences, protein sequences, or other polymer sequences. The UniProt database is an example of a protein sequence database: as of 2013 it contained over 40 million sequences and was growing at an exponential rate. Historically, sequences were published in paper form, but as the number of sequences grew, this storage method became unsustainable.

Searching a sequence database involves comparing a query sequence (genomic or protein) with the sequences in the database and finding the one that "best" matches it, according to criteria that vary with the search method. The number of matches (hits) is used to compute a score that quantifies the similarity between the query and each database sequence. The main goal is to strike a good balance between these criteria.

The need for sequence databases originated in 1950, when Frederick Sanger reported the primary structure of insulin. He later won his second Nobel Prize for developing methods for sequencing nucleic acids, and his comparative approach prompted other protein biochemists to begin collecting amino acid sequences, marking the beginning of molecular databases.

In 1965, Margaret Dayhoff and her team at the National Biomedical Research Foundation (NBRF) published "The Atlas of Protein Sequence and Structure". They included all known protein sequences in the Atlas, even unpublished material, which can be seen as the first attempt to create a molecular database. They made use of the Medical Literature Analysis and Retrieval System (MEDLARS) at the National Institutes of Health (NIH), which had been computerized in 1964. The team used computers to store the data but had to type and proofread each sequence manually, which was costly in both time and money.
In 1966 the team released the second edition of the Atlas, double the size of the first.
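The search-and-score idea described earlier (score each database sequence by its matches with the query, then report the best hit) can be illustrated with a minimal sketch. It assumes a deliberately simplified scoring rule: the score is the number of identical residues at the best ungapped offset between query and database sequence. Real search methods use more elaborate criteria (for example substitution matrices and gapped alignments). The function names `ungapped_match_score` and `best_hit`, and the toy sequences and identifiers, are hypothetical and do not come from any real database or tool.

```python
from typing import Dict, Tuple


def ungapped_match_score(query: str, target: str) -> int:
    """Hypothetical simplified score: the highest number of identical
    residues over all ungapped offsets of the query against the target."""
    best = 0
    # Offsets from -(len(query) - 1) to len(target) - 1 cover every overlap.
    for offset in range(-(len(query) - 1), len(target)):
        matches = 0
        for i, residue in enumerate(query):
            j = i + offset
            if 0 <= j < len(target) and residue == target[j]:
                matches += 1
        best = max(best, matches)
    return best


def best_hit(query: str, database: Dict[str, str]) -> Tuple[str, int]:
    """Return (identifier, score) of the database entry that best matches
    the query; on ties, the first entry encountered wins."""
    scores = {seq_id: ungapped_match_score(query, seq)
              for seq_id, seq in database.items()}
    best_id = max(scores, key=scores.get)
    return best_id, scores[best_id]


# Toy entries with made-up identifiers; not real database records.
toy_db = {
    "seqA": "MKTAYIAKQR",
    "seqB": "GAVLIPFMWG",
    "seqC": "AYIAKQ",
}
print(best_hit("TAYIAK", toy_db))  # ('seqA', 6)
```

Here "seqA" contains the whole query, so all six residues match at one offset, while "seqC" lacks the leading `T` and scores at most 5. The brute-force double loop keeps the scoring rule easy to read; it scales poorly to a database with millions of entries, which is why practical search methods trade some accuracy for speed.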

Official source

https://en.wikipedia.org/wiki/Sequence_database
