Summary
A reference genome (also known as a reference assembly) is a digital nucleic acid sequence database, assembled by scientists as a representative example of the set of genes in one idealized individual organism of a species. Because reference genomes are assembled from the sequenced DNA of a number of individual donors, they do not accurately represent the set of genes of any single individual organism. Instead, a reference provides a haploid mosaic of different DNA sequences from each donor. For example, the human reference genome assembly GRCh38/hg38 is derived from more than 60 genomic clone libraries.

There are reference genomes for multiple species of viruses, bacteria, fungi, plants, and animals. Reference genomes are typically used as a guide on which new genomes are built, enabling them to be assembled much more quickly and cheaply than the initial Human Genome Project. Reference genomes can be accessed online at several locations, using dedicated browsers such as Ensembl or the UCSC Genome Browser.

The length of a genome can be measured in several ways. A simple way is to count the number of base pairs in the assembly. The golden path is an alternative measure of length that omits redundant regions such as haplotypes and pseudoautosomal regions. It is usually constructed by layering sequencing information over a physical map to combine scaffold information. It is a 'best estimate' of what the genome will look like and typically includes gaps, which makes it longer than the typical base pair assembly.

Reference genome assembly requires overlapping reads, which are merged into contigs: contiguous DNA regions of consensus sequence. If there are gaps between contigs, these can be filled by scaffolding, either by amplifying the contigs with PCR and sequencing them or by Bacterial Artificial Chromosome (BAC) cloning. Filling these gaps is not always possible; in that case, the reference assembly consists of multiple scaffolds.
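As a rough illustration of the ideas above (overlapping reads merged into contigs, contigs joined into a scaffold with gaps, and a length that depends on whether gaps are counted), here is a minimal Python sketch. It is a toy under stated assumptions, not how production assemblers work: the function names (`overlap`, `greedy_contigs`, `scaffold`, `length_stats`), the greedy merging strategy, the exact-match overlap rule, the use of `N` as a gap placeholder, and the gap length of 5 are all hypothetical choices made for this example.

```python
from typing import Dict, List


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is a prefix of `b`.

    Returns 0 when the best exact overlap is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_contigs(reads: List[str], min_len: int = 3) -> List[str]:
    """Merge the pair with the largest overlap until no overlaps remain.

    Assumption: reads are error-free and all on the same strand.
    """
    seqs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return seqs  # what is left are the contigs
        merged = seqs[best_i] + seqs[best_j][best_k:]
        seqs = [s for idx, s in enumerate(seqs) if idx not in (best_i, best_j)]
        seqs.append(merged)


def scaffold(contigs: List[str], gap_len: int = 5) -> str:
    """Join contigs in the given order, marking unfilled gaps with 'N'."""
    return ("N" * gap_len).join(contigs)


def length_stats(seq: str) -> Dict[str, int]:
    """Length with gap placeholders counted, and with them excluded."""
    total = len(seq)
    gaps = seq.upper().count("N")
    return {"total": total, "ungapped": total - gaps}


reads = ["ATGCGT", "CGTACC", "GGGTTT", "TTTAAC"]
contigs = greedy_contigs(reads)
scaffolded = scaffold(contigs)
print(contigs)
print(length_stats(scaffolded))
```

With these four toy reads, the first two and the last two each share a three-base overlap, so two contigs remain that cannot be joined. Placing them in one scaffold with a placeholder gap gives a sequence whose total length exceeds its count of actual bases, which mirrors why a length measure that includes gaps is longer than a plain base count.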
Related courses (11)
BIO-109: Introduction to life sciences (for IC)
This course presents the fundamental principles at work in living organisms. As far as possible, the emphasis is on the contributions of computer science to progress in the life sciences.
BIO-463: Genomics and bioinformatics
This course covers various data analysis approaches associated with applications of DNA sequencing technologies, from genome sequencing to quantifying gene expression, transcription factor binding and
BIOENG-519: Methods: omics in biomedical research
High-throughput methodologies, broadly called omics, make it possible to characterize the complexity and dynamics of any biological system. This course will provide a general description of different methods relat