For supervised learning applications in machine learning and statistical learning theory, generalization error (also known as the out-of-sample error or the risk) is a measure of how accurately an algorithm is able to predict outcome values for previously unseen data. Because learning algorithms are evaluated on finite samples, the evaluation may be sensitive to sampling error. As a result, prediction error measured on the current data may say little about predictive ability on new data. Generalization error can be reduced by avoiding overfitting in the learning algorithm. The performance of a machine learning algorithm is commonly visualized with learning curves, which plot estimates of the generalization error over the course of the learning process.
Statistical learning theory
In a learning problem, the goal is to develop a function $f_n(\vec{x})$ that predicts output values $y$ for each input datum $\vec{x}$. The subscript $n$ indicates that the function $f_n$ is developed based on a data set of $n$ data points. The generalization error or expected loss or risk $I[f]$ of a particular function $f$ over all possible values of $\vec{x}$ and $y$ is the expected value of the loss function $V(f(\vec{x}), y)$:
$$I[f] = \int_{X \times Y} V(f(\vec{x}), y)\, \rho(\vec{x}, y)\, d\vec{x}\, dy,$$
where $\rho(\vec{x}, y)$ is the unknown joint probability distribution for $\vec{x}$ and $y$.
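Without knowing the joint probability distribution $\rho$, it is impossible to compute the generalization error $I[f]$. Instead, we can compute the error on sample data, which is called empirical error (or empirical risk). Given $n$ data points $(\vec{x}_i, y_i)$, the empirical error of a candidate function $f$ under a loss function $V$ is:
$$I_n[f] = \frac{1}{n} \sum_{i=1}^{n} V(f(\vec{x}_i), y_i)$$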
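The empirical error is simply the average loss over the sample, which can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names `squared_loss`, `empirical_risk` and `candidate`, the choice of squared loss, and the toy data are all assumptions made for the example.

```python
from typing import Callable, Sequence


def squared_loss(prediction: float, target: float) -> float:
    """Squared loss (f(x) - y)^2, one common choice of loss function."""
    return (prediction - target) ** 2


def empirical_risk(
    f: Callable[[float], float],
    xs: Sequence[float],
    ys: Sequence[float],
    loss: Callable[[float, float], float] = squared_loss,
) -> float:
    """Average loss of f over the n sample points (x_i, y_i)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    if len(xs) == 0:
        raise ValueError("at least one data point is required")
    return sum(loss(f(x), y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical candidate function and toy sample, for illustration only.
def candidate(x: float) -> float:
    return 2.0 * x


xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 2.0, 5.0, 6.0]

# Only the third point is mispredicted (4 vs 5), so the risk is 1 / 4.
risk = empirical_risk(candidate, xs, ys)
print(risk)
```

Unlike the generalization error, this quantity depends only on the observed sample, so it can always be computed.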
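An algorithm is said to generalize if the gap between the generalization error $I[f]$ and the empirical error $I_n[f]$ vanishes as the sample size $n$ grows:
$$\lim_{n \to \infty} \left( I[f] - I_n[f] \right) = 0$$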
Of particular importance is the generalization error $I[f_n]$ of the data-dependent function $f_n$ that is found by a learning algorithm based on the sample. Again, for an unknown probability distribution, $I[f_n]$ cannot be computed. Instead, the aim of many problems in statistical learning theory is to bound or characterize the difference of the generalization error and the empirical error $I_n[f_n]$ in probability:
$$P_G = P\left( I[f_n] - I_n[f_n] \leq \epsilon \right) \geq 1 - \delta_n$$
That is, the goal is to characterize the probability $1 - \delta_n$ that the generalization error is less than the empirical error plus some error bound $\epsilon$ (generally dependent on $\delta$ and $n$).
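As one concrete instance of such a bound, Hoeffding's inequality gives an explicit error term in a simplified setting. The sketch below assumes a single fixed function chosen independently of the sample and a loss bounded in [0, 1]; in that case the generalization error exceeds the empirical error by more than sqrt(ln(1/δ) / (2n)) with probability at most δ. It does not cover the data-dependent function found by a learning algorithm, which typically requires a bound that holds uniformly over the class of candidate functions. The function names are hypothetical.

```python
import math


def hoeffding_epsilon(n: int, delta: float) -> float:
    """One-sided Hoeffding error bound for a fixed function.

    Assumes the loss takes values in [0, 1] and the n samples are i.i.d.
    With probability at least 1 - delta, the generalization error is at
    most the empirical error plus the returned value.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def samples_needed(epsilon: float, delta: float) -> int:
    """Smallest n for which the Hoeffding bound is at most epsilon."""
    if epsilon <= 0.0:
        raise ValueError("epsilon must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 / delta) / (2.0 * epsilon ** 2))


# The bound shrinks like 1 / sqrt(n): quadrupling n halves epsilon.
for n in (100, 400, 1600):
    print(n, round(hoeffding_epsilon(n, delta=0.05), 4))
```

The dependence on both the confidence level and the sample size is visible directly: a smaller δ or a smaller n gives a looser bound.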
In statistics and machine learning, the bias–variance tradeoff is the property of a model that the variance of the parameter estimated across samples can be reduced by increasing the bias in the estimated parameters. The bias–variance dilemma or bias–variance problem is the conflict in trying to simultaneously minimize these two sources of error that prevent supervised learning algorithms from generalizing beyond their training set: The bias error is an error from erroneous assumptions in the learning algorithm.
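For squared loss the two sources of error can be written out explicitly. The decomposition below is the standard textbook form, stated under assumptions not given in the text above: the data follow $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$, $\hat{f}$ denotes the function fitted on a random training sample, and the expectation is taken over training samples and noise at a fixed input $x$.

```latex
\mathbb{E}\left[ \left( y - \hat{f}(x) \right)^2 \right]
  = \underbrace{\left( \mathbb{E}[\hat{f}(x)] - f(x) \right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\left[ \left( \hat{f}(x) - \mathbb{E}[\hat{f}(x)] \right)^2 \right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The first two terms depend on the learning algorithm, while the noise term does not.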
In mathematics, statistics, finance, and computer science, particularly in machine learning and inverse problems, regularization is a process that makes the resulting answer "simpler". It is often used to obtain results for ill-posed problems or to prevent overfitting. Although regularization procedures can be divided in many ways, the following delineation is particularly helpful: Explicit regularization is regularization in which one explicitly adds a term to the optimization problem.
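A small example of explicit regularization is ridge regression, where a squared penalty on the weight is added to the least-squares objective. The sketch below assumes a one-dimensional linear model without intercept, for which the minimizer of sum((y - w*x)^2) + lam * w^2 has the closed form w = sum(x*y) / (sum(x^2) + lam). The function name `ridge_slope` and the toy data are hypothetical.

```python
from typing import Sequence


def ridge_slope(xs: Sequence[float], ys: Sequence[float], lam: float) -> float:
    """Slope w minimizing sum((y - w*x)^2) + lam * w^2 (no intercept)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    if lam < 0.0:
        raise ValueError("lam must be non-negative")
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    if sxx + lam == 0.0:
        raise ValueError("slope is undefined when all x are zero and lam is 0")
    return sxy / (sxx + lam)


# Toy data lying exactly on y = 2x, for illustration only.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

# lam = 0 recovers ordinary least squares; larger lam shrinks the slope
# toward zero, giving a "simpler" answer in the sense of a smaller weight.
for lam in (0.0, 14.0, 100.0):
    print(lam, ridge_slope(xs, ys, lam))
```

With the penalty switched off the slope is exactly 2, and it decreases monotonically as the penalty weight grows.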
In mathematical modeling, overfitting is "the production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit to additional data or predict future observations reliably". An overfitted model is a mathematical model that contains more parameters than can be justified by the data. In a mathematical sense, these parameters represent the degree of a polynomial. The essence of overfitting is to have unknowingly extracted some of the residual variation (i.