Standard deviationIn statistics, the standard deviation is a measure of the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean (also called the expected value) of the set, while a high standard deviation indicates that the values are spread out over a wider range. Standard deviation may be abbreviated SD, and is most commonly represented in mathematical texts and equations by the lower case Greek letter σ (sigma), for the population standard deviation, or the Latin letter s, for the sample standard deviation.
Meta-optimizationIn numerical optimization, meta-optimization is the use of one optimization method to tune another optimization method. Meta-optimization is reported to have been used as early as in the late 1970s by Mercer and Sampson for finding optimal parameter settings of a genetic algorithm. Meta-optimization and related concepts are also known in the literature as meta-evolution, super-optimization, automated parameter calibration, hyper-heuristics, etc.
Empirical probabilityIn probability theory and statistics, the empirical probability, relative frequency, or experimental probability of an event is the ratio of the number of outcomes in which a specified event occurs to the total number of trials, i.e., by means not of a theoretical sample space but of an actual experiment. More generally, empirical probability estimates probabilities from experience and observation. Given an event A in a sample space, the relative frequency of A is the ratio \tfrac m n, m being the number of outcomes in which the event A occurs, and n being the total number of outcomes of the experiment.
Kullback–Leibler divergenceIn mathematical statistics, the Kullback–Leibler divergence (also called relative entropy and I-divergence), denoted , is a type of statistical distance: a measure of how one probability distribution P is different from a second, reference probability distribution Q. A simple interpretation of the KL divergence of P from Q is the expected excess surprise from using Q as a model when the actual distribution is P.
Frequentist inferenceFrequentist inference is a type of statistical inference based in frequentist probability, which treats “probability” in equivalent terms to “frequency” and draws conclusions from sample-data by means of emphasizing the frequency or proportion of findings in the data. Frequentist-inference underlies frequentist statistics, in which the well-established methodologies of statistical hypothesis testing and confidence intervals are founded. The primary formulation of frequentism stems from the presumption that statistics could be perceived to have been a probabilistic frequency.
Entropy (information theory)In information theory, the entropy of a random variable is the average level of "information", "surprise", or "uncertainty" inherent to the variable's possible outcomes. Given a discrete random variable , which takes values in the alphabet and is distributed according to : where denotes the sum over the variable's possible values. The choice of base for , the logarithm, varies for different applications. Base 2 gives the unit of bits (or "shannons"), while base e gives "natural units" nat, and base 10 gives units of "dits", "bans", or "hartleys".
Rate functionIn mathematics — specifically, in large deviations theory — a rate function is a function used to quantify the probabilities of rare events. Such functions are used to formulate large deviation principle. A large deviation principle quantifies the asymptotic probability of rare events for a sequence of probabilities. A rate function is also called a Cramér function, after the Swedish probabilist Harald Cramér. Rate function An extended real-valued function I : X → [0, +∞] defined on a Hausdorff topological space X is said to be a rate function if it is not identically +∞ and is lower semi-continuous, i.
Logistic distributionIn probability theory and statistics, the logistic distribution is a continuous probability distribution. Its cumulative distribution function is the logistic function, which appears in logistic regression and feedforward neural networks. It resembles the normal distribution in shape but has heavier tails (higher kurtosis). The logistic distribution is a special case of the Tukey lambda distribution.
Sample size determinationSample size determination is the act of choosing the number of observations or replicates to include in a statistical sample. The sample size is an important feature of any empirical study in which the goal is to make inferences about a population from a sample. In practice, the sample size used in a study is usually determined based on the cost, time, or convenience of collecting the data, and the need for it to offer sufficient statistical power.
Information gain (decision tree)In information theory and machine learning, information gain is a synonym for Kullback–Leibler divergence; the amount of information gained about a random variable or signal from observing another random variable. However, in the context of decision trees, the term is sometimes used synonymously with mutual information, which is the conditional expected value of the Kullback–Leibler divergence of the univariate probability distribution of one variable from the conditional distribution of this variable given the other one.