Nucleic acid sequenceA nucleic acid sequence is a succession of bases within the nucleotides forming alleles within a DNA (using GACT) or RNA (GACU) molecule. This succession is denoted by a series of a set of five different letters that indicate the order of the nucleotides. By convention, sequences are usually presented from the 5' end to the 3' end. For DNA, with its double helix, there are two possible directions for the notated sequence; of these two, the sense strand is used.
Amino acidAmino acids are organic compounds that contain both amino and carboxylic acid functional groups. Although over 500 amino acids exist in nature, by far the most important are the α-amino acids, from which proteins are composed. Only 22 α-amino acids appear in the genetic code of all life. Amino acids can be classified according to the locations of the core structural functional groups, as alpha- (α-), beta- (β-), gamma- (γ-) or delta- (δ-) amino acids; other categories relate to polarity, ionization, and side chain group type (aliphatic, acyclic, aromatic, containing hydroxyl or sulfur, etc.
Codon usage biasCodon usage bias refers to differences in the frequency of occurrence of synonymous codons in coding DNA. A codon is a series of three nucleotides (a triplet) that encodes a specific amino acid residue in a polypeptide chain or for the termination of translation (stop codons). There are 64 different codons (61 codons encoding for amino acids and 3 stop codons) but only 20 different translated amino acids. The overabundance in the number of codons allows many amino acids to be encoded by more than one codon.
Escherichia coliEscherichia coli (ˌɛʃəˈrɪkiə_ˈkoʊlaɪ ) is a Gram-negative, facultative anaerobic, rod-shaped, coliform bacterium of the genus Escherichia that is commonly found in the lower intestine of warm-blooded organisms. Most E. coli strains are harmless, but some serotypes such as EPEC, and ETEC are pathogenic and can cause serious food poisoning in their hosts, and are occasionally responsible for food contamination incidents that prompt product recalls.
Sequence analysisIn bioinformatics, sequence analysis is the process of subjecting a DNA, RNA or peptide sequence to any of a wide range of analytical methods to understand its features, function, structure, or evolution. Methodologies used include sequence alignment, searches against biological databases, and others. Since the development of methods of high-throughput production of gene and protein sequences, the rate of addition of new sequences to the databases increased very rapidly.
Proteinogenic amino acidProteinogenic amino acids are amino acids that are incorporated biosynthetically into proteins during translation. The word "proteinogenic" means "protein creating". Throughout known life, there are 22 genetically encoded (proteinogenic) amino acids, 20 in the standard genetic code and an additional 2 (selenocysteine and pyrrolysine) that can be incorporated by special translation mechanisms.
Single-nucleotide polymorphismIn genetics and bioinformatics, a single-nucleotide polymorphism (SNP snɪp; plural SNPs snɪps) is a germline substitution of a single nucleotide at a specific position in the genome that is present in a sufficiently large fraction of considered population (generally regarded as 1% or more). For example, a G nucleotide present at a specific location in a reference genome may be replaced by an A in a minority of individuals. The two possible nucleotide variations of this SNP – G or A – are called alleles.
Essential amino acidAn essential amino acid, or indispensable amino acid, is an amino acid that cannot be synthesized from scratch by the organism fast enough to supply its demand, and must therefore come from the diet. Of the 21 amino acids common to all life forms, the nine amino acids humans cannot synthesize are valine, isoleucine, leucine, methionine, phenylalanine, tryptophan, threonine, histidine, and lysine.
Coding regionThe coding region of a gene, also known as the coding sequence (CDS), is the portion of a gene's DNA or RNA that codes for protein. Studying the length, composition, regulation, splicing, structures, and functions of coding regions compared to non-coding regions over different species and time periods can provide a significant amount of important information regarding gene organization and evolution of prokaryotes and eukaryotes. This can further assist in mapping the human genome and developing gene therapy.
NucleotideNucleotides are organic molecules composed of a nitrogenous base, a pentose sugar and a phosphate. They serve as monomeric units of the nucleic acid polymers – deoxyribonucleic acid (DNA) and ribonucleic acid (RNA), both of which are essential biomolecules within all life-forms on Earth. Nucleotides are obtained in the diet and are also synthesized from common nutrients by the liver. Nucleotides are composed of three subunit molecules: a nucleobase, a five-carbon sugar (ribose or deoxyribose), and a phosphate group consisting of one to three phosphates.