Feature selection
Feature selection is the process of selecting a subset of relevant features (variables, predictors) for use in model construction. Stylometry and DNA microarray analysis are two cases where feature selection is used. It should be distinguished from feature extraction. Feature selection techniques are used for several reasons: simplifying models to make them easier for researchers and users to interpret, shortening training times, avoiding the curse of dimensionality, improving the data's compatibility with a class of learning models, and encoding inherent symmetries present in the input space.
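As an illustrative sketch added here (not part of the original text), the following Python snippet implements a minimal filter-style selector that ranks features by their absolute correlation with the target and keeps the top k; the function name select_top_k_by_correlation and the synthetic data are assumptions made for the example.

```python
import numpy as np

def select_top_k_by_correlation(X, y, k):
    """Simple filter-style feature selection: score each column of X by the
    absolute Pearson correlation with the target y and keep the indices of
    the k highest-scoring features."""
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    # Indices of the k best features, highest score first.
    return np.argsort(scores)[::-1][:k]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    # The target depends only on features 0 and 3; the rest are noise.
    y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.1, size=200)
    print(select_top_k_by_correlation(X, y, k=2))  # expected: indices 0 and 3
```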
Dimensionality reduction
Dimensionality reduction, or dimension reduction, is the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data, ideally close to its intrinsic dimension. Working in high-dimensional spaces can be undesirable for many reasons: raw data are often sparse as a consequence of the curse of dimensionality, and analyzing the data is usually computationally intractable.
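A minimal sketch of one common dimensionality-reduction technique, principal component analysis via the singular value decomposition, is shown below; the function pca_reduce and the synthetic 50-dimensional data are assumptions for the example, not part of the original text.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto the top n_components principal directions,
    reducing the data from X.shape[1] dimensions to n_components dimensions."""
    X_centered = X - X.mean(axis=0)             # centre each feature
    # Singular value decomposition of the centred data matrix.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T     # coordinates in the reduced space

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # 500 points that live near a 2-dimensional plane embedded in 50 dimensions.
    latent = rng.normal(size=(500, 2))
    X = latent @ rng.normal(size=(2, 50)) + 0.01 * rng.normal(size=(500, 50))
    print(pca_reduce(X, n_components=2).shape)  # (500, 2)
```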
Concentration inequality
In probability theory, concentration inequalities provide bounds on how a random variable deviates from some value (typically, its expected value). The law of large numbers of classical probability theory states that sums of independent random variables are, under very mild conditions, close to their expectation with a large probability. Such sums are the most basic examples of random variables concentrated around their mean. Recent results show that such behavior is shared by other functions of independent random variables.
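The following small simulation, an illustration added here rather than taken from the text, shows this concentration empirically for sums of Bernoulli variables and compares the observed deviation probability with the bound given by Hoeffding's inequality, P(|X̄ − p| ≥ t) ≤ 2·exp(−2nt²).

```python
import numpy as np

# Monte Carlo illustration of concentration: the sample mean of n independent
# Bernoulli(0.5) variables deviates from its expectation 0.5 by more than 0.05
# with a probability that shrinks rapidly as n grows.
rng = np.random.default_rng(2)
for n in (100, 1000, 10000):
    means = rng.binomial(n, 0.5, size=20000) / n
    empirical = np.mean(np.abs(means - 0.5) > 0.05)
    hoeffding = 2 * np.exp(-2 * n * 0.05 ** 2)   # Hoeffding's inequality bound
    print(f"n={n:6d}  empirical={empirical:.4f}  Hoeffding bound={min(hoeffding, 1.0):.4f}")
```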
Boolean function
In mathematics, a Boolean function is a function whose arguments and result assume values from a two-element set (usually {true, false}, {0,1} or {-1,1}). Alternative names are switching function, used especially in older computer science literature, and truth function (or logical function), used in logic. Boolean functions are the subject of Boolean algebra and switching theory. A Boolean function takes the form f : {0,1}^k → {0,1}, where {0,1} is known as the Boolean domain and k is a non-negative integer called the arity of the function.
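For a concrete illustration (an addition, not from the text), the sketch below defines the three-argument majority function, a Boolean function of arity 3, and enumerates its truth table over {0,1}³.

```python
from itertools import product

def majority3(x1, x2, x3):
    """A Boolean function of arity 3 on the domain {0, 1}: returns 1 exactly
    when at least two of the three arguments are 1."""
    return int(x1 + x2 + x3 >= 2)

# Enumerate the full truth table over {0,1}^3.
for bits in product((0, 1), repeat=3):
    print(bits, "->", majority3(*bits))
```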
Numeracy
Numeracy is the ability to understand, reason with, and apply simple numerical concepts. The charity National Numeracy states: "Numeracy means understanding how mathematics is used in the real world and being able to apply it to make the best possible decisions...It’s as much about thinking and reasoning as about 'doing sums'". Basic numeracy skills consist of comprehending fundamental arithmetical operations like addition, subtraction, multiplication, and division.
Beta function
In mathematics, the beta function, also called the Euler integral of the first kind, is a special function that is closely related to the gamma function and to binomial coefficients. It is defined by the integral B(z₁, z₂) = ∫₀¹ t^(z₁−1) (1−t)^(z₂−1) dt for complex number inputs z₁, z₂ such that Re(z₁) > 0 and Re(z₂) > 0. The beta function was studied by Leonhard Euler and Adrien-Marie Legendre and was given its name by Jacques Binet; its symbol Β is a Greek capital beta. The beta function is symmetric, meaning that B(z₁, z₂) = B(z₂, z₁) for all inputs z₁ and z₂.
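As a numerical check added here for illustration, the snippet below evaluates B(2, 3) both through the identity B(a, b) = Γ(a)Γ(b)/Γ(a + b) and through a crude approximation of the defining integral (restricted to real a, b > 0), and confirms the symmetry B(2, 3) = B(3, 2).

```python
import math

def beta_via_gamma(a, b):
    """Beta function through its relation to the gamma function:
    B(a, b) = Γ(a) Γ(b) / Γ(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def beta_via_integral(a, b, steps=100000):
    """Crude midpoint-rule approximation of the defining integral
    ∫₀¹ t^(a−1) (1−t)^(b−1) dt, valid here for real a, b > 0."""
    h = 1.0 / steps
    return sum(((i + 0.5) * h) ** (a - 1) * (1 - (i + 0.5) * h) ** (b - 1)
               for i in range(steps)) * h

print(beta_via_gamma(2, 3), beta_via_integral(2, 3))   # both ≈ 1/12
print(beta_via_gamma(2, 3) == beta_via_gamma(3, 2))    # symmetry: True
```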
Bias–variance tradeoff
In statistics and machine learning, the bias–variance tradeoff is the property of a model that the variance of the parameter estimated across samples can be reduced by increasing the bias in the estimated parameters. The bias–variance dilemma or bias–variance problem is the conflict in trying to simultaneously minimize these two sources of error that prevent supervised learning algorithms from generalizing beyond their training set: the bias error is an error from erroneous assumptions in the learning algorithm, while the variance is an error from sensitivity to small fluctuations in the training set.
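The following Monte Carlo sketch, an illustration rather than a canonical procedure, estimates the squared bias and the variance of low-degree and high-degree polynomial fits at a single test point; the choice of sin as the true function, the noise level, and the degrees compared are assumptions made for the example.

```python
import numpy as np

# Bias-variance sketch at a single test point x0: a low-degree polynomial fit
# tends to have higher bias but lower variance than a high-degree fit trained
# on the same small noisy samples.
rng = np.random.default_rng(3)
f = np.sin                        # assumed "true" regression function for the demo
x0, trials, n = 1.0, 2000, 40

for degree in (1, 8):
    preds = np.empty(trials)
    for t in range(trials):
        x = rng.uniform(0, np.pi, n)
        y = f(x) + rng.normal(scale=0.3, size=n)   # noisy training sample
        coeffs = np.polyfit(x, y, degree)          # fit the polynomial model
        preds[t] = np.polyval(coeffs, x0)          # prediction at the test point
    bias_sq = (preds.mean() - f(x0)) ** 2
    variance = preds.var()
    print(f"degree={degree}  bias^2={bias_sq:.4f}  variance={variance:.4f}")
```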
Chernoff bound
In probability theory, a Chernoff bound is an exponentially decreasing upper bound on the tail of a random variable based on its moment generating function. The minimum of all such exponential bounds forms the Chernoff or Chernoff-Cramér bound, which may decay faster than exponential (e.g. sub-Gaussian). It is especially useful for sums of independent random variables, such as sums of Bernoulli random variables. The bound is commonly named after Herman Chernoff who described the method in a 1952 paper, though Chernoff himself attributed it to Herman Rubin.
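As an added illustration, the sketch below computes the generic Chernoff bound P(S ≥ a) ≤ min over t > 0 of e^(−ta)·E[e^(tS)] for a sum of independent Bernoulli variables by minimizing over a grid of t values, and compares it with an empirical estimate of the tail probability; the grid and the parameters n, p, a are assumptions for the example.

```python
import numpy as np

def chernoff_tail_bound(n, p, a):
    """Generic Chernoff bound for S = X1 + ... + Xn with Xi ~ Bernoulli(p):
    P(S >= a) <= min_{t>0} exp(-t*a) * E[exp(t*S)]
               = min_{t>0} exp(-t*a) * (1 - p + p*exp(t))**n,
    minimised here over a simple grid of t values."""
    t = np.linspace(1e-3, 5, 2000)
    return np.min(np.exp(-t * a) * (1 - p + p * np.exp(t)) ** n)

n, p, a = 100, 0.5, 65
rng = np.random.default_rng(4)
empirical = np.mean(rng.binomial(n, p, size=200000) >= a)
print(f"Chernoff bound on P(S >= {a}): {chernoff_tail_bound(n, p, a):.4f}")
print(f"empirical estimate:            {empirical:.4f}")
```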
Upper and lower bounds
In mathematics, particularly in order theory, an upper bound or majorant of a subset S of some preordered set (K, ≤) is an element of K that is greater than or equal to every element of S. Dually, a lower bound or minorant of S is defined to be an element of K that is less than or equal to every element of S. A set with an upper (respectively, lower) bound is said to be bounded from above or majorized (respectively bounded from below or minorized) by that bound.
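A small illustrative check, added here and not part of the original text, of what it means for a number to be an upper bound of a finite set of reals under the usual order ≤.

```python
def is_upper_bound(b, S):
    """b is an upper bound of S (within the usual order on numbers)
    iff b >= s for every s in S."""
    return all(b >= s for s in S)

S = {0.1, 0.5, 0.9}            # a finite subset of the real numbers
print(is_upper_bound(1.0, S))  # True: 1.0 is an upper bound (so is any larger number)
print(is_upper_bound(0.9, S))  # True: an upper bound may itself belong to S
print(is_upper_bound(0.5, S))  # False: 0.9 in S exceeds 0.5
```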
Integer factorization
In number theory, integer factorization is the decomposition, when possible, of a positive integer into a product of smaller integers. If the factors are further restricted to be prime numbers, the process is called prime factorization, and includes the test whether the given integer is prime (in this case, one has a "product" of a single factor). When the numbers are sufficiently large, no efficient non-quantum integer factorization algorithm is known. However, it has not been proven that such an algorithm does not exist.
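As an added sketch, the following trial-division routine computes a prime factorization; it is a correct but deliberately naive method, nothing like the sophisticated algorithms needed for the large integers discussed above.

```python
def prime_factorization(n):
    """Prime factorization by trial division: repeatedly divide out the
    smallest remaining factor. Simple and correct, but far too slow for
    large integers."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:                 # whatever remains is itself prime
        factors.append(n)
    return factors

print(prime_factorization(840))   # [2, 2, 2, 3, 5, 7]
print(prime_factorization(97))    # [97] -- 97 is prime
```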