Sequence assemblyIn bioinformatics, sequence assembly refers to aligning and merging fragments from a longer DNA sequence in order to reconstruct the original sequence. This is needed as DNA sequencing technology might not be able to 'read' whole genomes in one go, but rather reads small pieces of between 20 and 30,000 bases, depending on the technology used. Typically, the short fragments (reads) result from shotgun sequencing genomic DNA, or gene transcript (ESTs).
Nucleic acid notationThe nucleic acid notation currently in use was first formalized by the International Union of Pure and Applied Chemistry (IUPAC) in 1970. This universally accepted notation uses the Roman characters G, C, A, and T, to represent the four nucleotides commonly found in deoxyribonucleic acids (DNA). Given the rapidly expanding role for genetic sequencing, synthesis, and analysis in biology, some researchers have developed alternate notations to further support the analysis and manipulation of genetic data.
Whole genome sequencingWhole genome sequencing (WGS), also known as full genome sequencing, complete genome sequencing, or entire genome sequencing, is the process of determining the entirety, or nearly the entirety, of the DNA sequence of an organism's genome at a single time. This entails sequencing all of an organism's chromosomal DNA as well as DNA contained in the mitochondria and, for plants, in the chloroplast. Whole genome sequencing has largely been used as a research tool, but was being introduced to clinics in 2014.
Reference genomeA reference genome (also known as a reference assembly) is a digital nucleic acid sequence database, assembled by scientists as a representative example of the set of genes in one idealized individual organism of a species. As they are assembled from the sequencing of DNA from a number of individual donors, reference genomes do not accurately represent the set of genes of any single individual organism. Instead a reference provides a haploid mosaic of different DNA sequences from each donor.
Nucleic acid sequenceA nucleic acid sequence is a succession of bases within the nucleotides forming alleles within a DNA (using GACT) or RNA (GACU) molecule. This succession is denoted by a series of a set of five different letters that indicate the order of the nucleotides. By convention, sequences are usually presented from the 5' end to the 3' end. For DNA, with its double helix, there are two possible directions for the notated sequence; of these two, the sense strand is used.
454 Life Sciences454 Life Sciences was a biotechnology company based in Branford, Connecticut that specialized in high-throughput DNA sequencing. It was acquired by Roche in 2007 and shut down by Roche in 2013 when its technology became noncompetitive, although production continued until mid-2016. 454 Life Sciences was founded by Jonathan Rothberg and was originally known as 454 Corporation, a subsidiary of CuraGen. For their method for low-cost gene sequencing, 454 Life Sciences was awarded the Wall Street Journal's Gold Medal for Innovation in the Biotech-Medical category in 2005.
StatisticsStatistics (from German: Statistik, () "description of a state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied. Populations can be diverse groups of people or objects such as "all people living in a country" or "every atom composing a crystal".
Frequentist probabilityFrequentist probability or frequentism is an interpretation of probability; it defines an event's probability as the limit of its relative frequency in many trials (the long-run probability). Probabilities can be found (in principle) by a repeatable objective process (and are thus ideally devoid of opinion). The continued use of frequentist methods in scientific inference, however, has been called into question. The development of the frequentist account was motivated by the problems and paradoxes of the previously dominant viewpoint, the classical interpretation.
Statistical inferenceStatistical inference is the process of using data analysis to infer properties of an underlying distribution of probability. Inferential statistical analysis infers properties of a population, for example by testing hypotheses and deriving estimates. It is assumed that the observed data set is sampled from a larger population. Inferential statistics can be contrasted with descriptive statistics. Descriptive statistics is solely concerned with properties of the observed data, and it does not rest on the assumption that the data come from a larger population.
Genome-wide association studyIn genomics, a genome-wide association study (GWA study, or GWAS), is an observational study of a genome-wide set of genetic variants in different individuals to see if any variant is associated with a trait. GWA studies typically focus on associations between single-nucleotide polymorphisms (SNPs) and traits like major human diseases, but can equally be applied to any other genetic variants and any other organisms. When applied to human data, GWA studies compare the DNA of participants having varying phenotypes for a particular trait or disease.