Computational biology
Computational biology refers to the use of data analysis, mathematical modeling and computational simulations to understand biological systems and relationships. An intersection of computer science, biology, and big data, the field also has foundations in applied mathematics, chemistry, and genetics. It differs from biological computing, a subfield of computer engineering which uses bioengineering to build computers. Bioinformatics, the analysis of informatics processes in biological systems, began in the early 1970s.
Phylogenetic tree
A phylogenetic tree (also phylogeny or evolutionary tree) is a branching diagram or a tree showing the evolutionary relationships among various biological species or other entities based upon similarities and differences in their physical or genetic characteristics. All life on Earth is part of a single phylogenetic tree, indicating common ancestry. In a rooted phylogenetic tree, each node with descendants represents the inferred most recent common ancestor of those descendants, and the edge lengths in some trees may be interpreted as time estimates.
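The idea that each internal node of a rooted tree stands for the most recent common ancestor of its descendants can be illustrated with a small sketch. The tree below is a toy example: the labels `A` to `D`, `n1`, `n2`, and `root`, and the helper names `ancestors` and `mrca`, are hypothetical and not taken from any real phylogeny or library.

```python
# Toy rooted tree stored as child -> parent; every label here is hypothetical.
PARENT = {
    "A": "n1", "B": "n1",
    "n1": "n2", "C": "n2",
    "n2": "root", "D": "root",
}


def ancestors(node):
    """Return the path from node up to the root, including node itself."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def mrca(u, v):
    """Return the most recent common ancestor of u and v in the toy tree."""
    seen = set(ancestors(u))
    # Walk upward from v; the first node also above u is the deepest shared one.
    for node in ancestors(v):
        if node in seen:
            return node
    raise ValueError("nodes do not share a root")
```

With this tree, `mrca("A", "B")` gives `n1`, `mrca("A", "C")` gives `n2`, and `mrca("B", "D")` gives `root`.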
Euclidean minimum spanning tree
A Euclidean minimum spanning tree of a finite set of points in the Euclidean plane or higher-dimensional Euclidean space connects the points by a system of line segments with the points as endpoints, minimizing the total length of the segments. In it, any two points can reach each other along a path through the line segments. It can be found as the minimum spanning tree of a complete graph with the points as vertices and the Euclidean distances between points as edge weights.
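The complete-graph construction described above can be sketched directly. The following is a minimal illustration that runs Prim's algorithm on the implicit complete graph in quadratic time; the function name `euclidean_mst` is an assumption made for this example, and faster specialised methods exist that are not shown here.

```python
import math


def euclidean_mst(points):
    """Prim's algorithm on the implicit complete graph over the points.

    Returns (edges, total_length), where edges are pairs of point indices.
    """
    n = len(points)
    if n == 0:
        return [], 0.0
    in_tree = [False] * n
    best = [math.inf] * n  # shortest known distance from the tree to each point
    link = [-1] * n        # tree point that realises that distance
    best[0] = 0.0
    edges, total = [], 0.0
    for _ in range(n):
        # Pick the point outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if link[u] != -1:
            edges.append((link[u], u))
            total += best[u]
        # Relax distances using the Euclidean length of each candidate segment.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], link[v] = d, u
    return edges, total
```

For the points `(0, 0)`, `(1, 0)`, `(1, 1)`, `(5, 1)` the sketch returns the three segments `(0, 1)`, `(1, 2)`, `(2, 3)` with total length 6.0.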
String metric
In mathematics and computer science, a string metric (also known as a string similarity metric or string distance function) is a metric that measures distance ("inverse similarity") between two text strings for approximate string matching or comparison and in fuzzy string searching. Unlike string matching, for example, a string metric is required to satisfy the triangle inequality. For example, the strings "Sam" and "Samuel" can be considered to be close.
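As one concrete instance, the Levenshtein (edit) distance is a widely used string distance that satisfies the triangle inequality. The sketch below is a minimal dynamic-programming version; the function name `levenshtein` is chosen for this example, and the entry above does not single out this particular metric.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]
```

Under this distance, `levenshtein("Sam", "Samuel")` is 3, since three insertions turn one string into the other.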
Graph automorphism
In the mathematical field of graph theory, an automorphism of a graph is a form of symmetry in which the graph is mapped onto itself while preserving the edge–vertex connectivity. Formally, an automorphism of a graph G = (V, E) is a permutation σ of the vertex set V, such that the pair of vertices (u, v) form an edge if and only if the pair (σ(u), σ(v)) also form an edge. That is, it is a graph isomorphism from G to itself. Automorphisms may be defined in this way both for directed graphs and for undirected graphs.
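The formal definition can be checked mechanically for a small undirected graph. In the sketch below, the function name `is_automorphism` and the representation of σ as a dictionary are assumptions made for illustration.

```python
def is_automorphism(vertices, edges, sigma):
    """Check whether sigma (a dict vertex -> vertex) is an automorphism
    of the undirected graph given by its vertices and edges."""
    vertices = set(vertices)
    # sigma must be a permutation of the vertex set.
    if set(sigma) != vertices or set(sigma.values()) != vertices:
        return False
    edge_set = {frozenset(e) for e in edges}
    mapped = {frozenset((sigma[u], sigma[v])) for u, v in edges}
    # Since sigma is a bijection, edges mapping exactly onto edges also
    # means that non-edges map onto non-edges.
    return mapped == edge_set
```

For the path 1-2-3, the map swapping vertices 1 and 3 passes the check, while the map swapping 1 and 2 does not, because it sends the edge {2, 3} to the non-edge {1, 3}.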
Biologist
A biologist is a scientist who conducts research in biology. Biologists are interested in studying life on Earth, whether it is an individual cell, a multicellular organism, or a community of interacting populations. They usually specialize in a particular branch (e.g., molecular biology, zoology, and evolutionary biology) of biology and have a specific research focus (e.g., studying malaria or cancer). Biologists who are involved in basic research have the aim of advancing knowledge about the natural world.
Knapsack problem
The knapsack problem is the following problem in combinatorial optimization: Given a set of items, each with a weight and a value, determine which items to include in the collection so that the total weight is less than or equal to a given limit and the total value is as large as possible. It derives its name from the problem faced by someone who is constrained by a fixed-size knapsack and must fill it with the most valuable items.
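A minimal sketch of one standard approach, dynamic programming over capacities, is given below. It assumes the 0/1 variant (each item is taken at most once) with non-negative integer weights; the function name `knapsack` and the `(weight, value)` tuple layout are choices made for this example.

```python
def knapsack(items, capacity):
    """0/1 knapsack by dynamic programming.

    items: list of (weight, value) pairs with non-negative integer weights.
    Returns (best_value, chosen_indices).
    """
    n = len(items)
    # best[i][c] = best value using only the first i items within capacity c.
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i, (w, v) in enumerate(items, start=1):
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # skip item i-1
            if w <= c:
                best[i][c] = max(best[i][c], best[i - 1][c - w] + v)
    # Trace back through the table to recover which items were taken.
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= items[i - 1][0]
    return best[n][capacity], sorted(chosen)
```

For items `[(1, 1), (3, 4), (4, 5), (5, 7)]` and a limit of 7, the sketch returns a value of 9 using the items at indices 1 and 2, whose weights sum to exactly 7.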
Worst-case complexity
In computer science (specifically computational complexity theory), the worst-case complexity measures the resources (e.g. running time, memory) that an algorithm requires given an input of arbitrary size (commonly denoted as n in asymptotic notation). It gives an upper bound on the resources required by the algorithm. In the case of running time, the worst-case time complexity indicates the longest running time an algorithm can take on any input of size n, and thus guarantees that the algorithm will finish within the indicated time.
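A small illustration of the upper-bound idea uses linear search, chosen here only as a familiar example. The sketch counts comparisons so that the worst case, n comparisons on an input of size n, can be observed directly; the function name `linear_search` and the returned comparison count are assumptions of this example.

```python
def linear_search(xs, target):
    """Return (index, comparisons); index is -1 if target is absent."""
    comparisons = 0
    for i, x in enumerate(xs):
        comparisons += 1
        if x == target:
            return i, comparisons
    return -1, comparisons
```

On a list of 10 elements, finding the first element costs 1 comparison, while finding the last element or searching for a missing one costs 10. No input of that size can cost more, which is the sense in which the worst case is an upper bound.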
Conserved sequence
In evolutionary biology, conserved sequences are identical or similar sequences in nucleic acids (DNA and RNA) or proteins across species (orthologous sequences), or within a genome (paralogous sequences), or between donor and receptor taxa (xenologous sequences). Conservation indicates that a sequence has been maintained by natural selection. A highly conserved sequence is one that has remained relatively unchanged far back up the phylogenetic tree, and hence far back in geological time.
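One simple way to quantify "identical or similar" is a per-position identity score over already aligned sequences. The sketch below is an assumption made for illustration, not a method named in the entry: the function name `column_conservation` is hypothetical, the input sequences are expected to be pre-aligned and of equal length, and the score ignores substitution similarity and gaps.

```python
from collections import Counter


def column_conservation(alignment):
    """For each column of an alignment, return the fraction of sequences
    that share the most common symbol at that position."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("alignment must be non-empty with equal-length sequences")
    scores = []
    for column in zip(*alignment):
        most_common_count = Counter(column).most_common(1)[0][1]
        scores.append(most_common_count / len(alignment))
    return scores
```

For the toy alignment `["ACGT", "ACGA", "ACCT"]`, the first two columns score 1.0 (fully conserved) and the last two score 2/3.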
Minimum spanning tree
A minimum spanning tree (MST) or minimum weight spanning tree is a subset of the edges of a connected, edge-weighted undirected graph that connects all the vertices together, without any cycles and with the minimum possible total edge weight. That is, it is a spanning tree whose sum of edge weights is as small as possible. More generally, any edge-weighted undirected graph (not necessarily connected) has a minimum spanning forest, which is a union of the minimum spanning trees for its connected components.
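A minimal sketch of Kruskal's algorithm, one standard way to compute such a tree, is shown below. The function name `kruskal` and the `(weight, u, v)` edge layout are assumptions made for this example. On a disconnected graph the same procedure yields a minimum spanning forest.

```python
def kruskal(num_vertices, edges):
    """Kruskal's algorithm with a union-find structure.

    edges: list of (weight, u, v) with vertices numbered 0..num_vertices-1.
    Returns (forest_edges, total_weight), forest edges as (u, v, weight).
    """
    parent = list(range(num_vertices))

    def find(x):
        # Find the component representative, halving paths along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest, total = [], 0
    for w, u, v in sorted(edges):  # consider the lightest edges first
        ru, rv = find(u), find(v)
        if ru != rv:               # the edge joins two components: no cycle
            parent[ru] = rv
            forest.append((u, v, w))
            total += w
    return forest, total
```

With four vertices and edges `(1, 0, 1)`, `(2, 1, 2)`, `(3, 0, 2)`, `(4, 2, 3)`, the sketch keeps the edges of weight 1, 2, and 4 for a total of 7, and skips the weight-3 edge because it would close a cycle.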