In bioinformatics, k-mers are substrings of length k contained within a biological sequence. They are used primarily in computational genomics and sequence analysis, where k-mers are composed of nucleotides (i.e. A, T, G, and C), to assemble DNA sequences, improve heterologous gene expression, identify species in metagenomic samples, and create attenuated vaccines. Usually, the term k-mer refers to all of a sequence's subsequences of length k, such that the sequence AGAT would have four monomers (A, G, A, and T), three 2-mers (AG, GA, AT), two 3-mers (AGA and GAT) and one 4-mer (AGAT). More generally, a sequence of length L will have L − k + 1 k-mers, and there are n^k total possible k-mers, where n is the number of possible monomers (e.g. four in the case of DNA).
k-mers are simply subsequences of length k; the AGAT example above lists all the possible k-mers of a short DNA sequence.
One method of visualizing k-mers, the k-mer spectrum, shows the multiplicity of each k-mer in a sequence versus the number of k-mers with that multiplicity. The number of modes in a k-mer spectrum for a species' genome varies, with most species having a unimodal distribution. However, all mammals have a multimodal distribution. The number of modes within a k-mer spectrum can also vary between regions of a genome: humans have unimodal k-mer spectra in 5' UTRs and exons but multimodal spectra in 3' UTRs and introns.
The frequency of k-mer usage is affected by numerous forces, working at multiple levels, which are often in conflict. k-mers for higher values of k are also affected by the forces that act on lower values of k. For example, if the 1-mer A does not occur in a sequence, none of the 2-mers containing A (such as AA, AT, AG, and AC) will occur either, thereby linking the effects of the different forces.
When k = 1, there are four DNA k-mers, i.e., A, T, G, and C.
At the molecular level, there are three hydrogen bonds between G and C, whereas there are only two between A and T.
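The counting rules and the k-mer spectrum described earlier can be illustrated with a minimal Python sketch. The function names `kmers` and `kmer_spectrum` are illustrative choices, not taken from any particular library, and the sketch assumes a plain string over the DNA alphabet with no ambiguity codes or reverse complements.

```python
from collections import Counter


def kmers(seq, k):
    """Return every length-k substring of seq, ordered by start position."""
    if k < 1 or k > len(seq):
        return []
    # A sequence of length L yields L - k + 1 overlapping k-mers.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_spectrum(seq, k):
    """Map each multiplicity to the number of distinct k-mers that have it."""
    multiplicity = Counter(kmers(seq, k))        # k-mer -> times it occurs
    return dict(Counter(multiplicity.values()))  # multiplicity -> distinct k-mers


seq = "AGAT"
alphabet_size = 4  # n = 4 monomers for DNA (A, T, G, C)
for k in range(1, len(seq) + 1):
    found = kmers(seq, k)
    # Observed k-mers, their count (L - k + 1), and the n**k possible k-mers.
    print(k, found, len(found), alphabet_size ** k)
```

For AGAT this reproduces the enumeration in the text: four monomers, three 2-mers, two 3-mers and one 4-mer. Calling `kmer_spectrum("AGAT", 1)` reports one 1-mer with multiplicity 2 (A) and two with multiplicity 1 (G and T); on a real genome the same tally, plotted as multiplicity versus number of k-mers, gives the spectrum whose modes are discussed above.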