Summary
In genetics, shotgun sequencing is a method used for sequencing random DNA strands. It is named by analogy with the rapidly expanding, quasi-random shot grouping of a shotgun. The chain-termination method of DNA sequencing ("Sanger sequencing") can only be used for short DNA strands of 100 to 1000 base pairs. Due to this size limit, longer sequences are subdivided into smaller fragments that can be sequenced separately, and these sequences are assembled to give the overall sequence. In shotgun sequencing, DNA is broken up randomly into numerous small segments, which are sequenced using the chain termination method to obtain reads. Multiple overlapping reads for the target DNA are obtained by performing several rounds of this fragmentation and sequencing. Computer programs then use the overlapping ends of different reads to assemble them into a continuous sequence. Shotgun sequencing was one of the precursor technologies that was responsible for enabling whole genome sequencing. For example, consider the following two rounds of shotgun reads: In this extremely simplified example, none of the reads cover the full length of the original sequence, but the four reads can be assembled into the original sequence using the overlap of their ends to align and order them. In reality, this process uses enormous amounts of information that are rife with ambiguities and sequencing errors. Assembly of complex genomes is additionally complicated by the great abundance of repetitive sequences, meaning similar short reads could come from completely different parts of the sequence. Many overlapping reads for each segment of the original DNA are necessary to overcome these difficulties and accurately assemble the sequence. For example, to complete the Human Genome Project, most of the human genome was sequenced at 12X or greater coverage; that is, each base in the final sequence was present on average in 12 different reads. Even so, current methods have failed to isolate or assemble reliable sequence for approximately 1% of the (euchromatic) human genome, as of 2004.
About this result
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
Related courses (15)
BIO-463: Genomics and bioinformatics
This course covers various data analysis approaches associated with applications of DNA sequencing technologies, from genome sequencing to quantifying gene expression, transcription factor binding and
BIO-109: Introduction to life sciences (for IC)
Ce cours présente les principes fondamentaux à l'œuvre dans les organismes vivants. Autant que possible, l'accent est mis sur les contributions de l'Informatique aux progrès des Sciences de la Vie.
BIOENG-430: Selected topics in life sciences
The course presents an overview on how recent advances at the interfaces of biology, biotechnology, engineering, physical sciences, and medicine are 1) shaping the landscape of biomedical research; 2)
Show more