Summary
For supervised learning applications in machine learning and statistical learning theory, generalization error (also known as the out-of-sample error or the risk) is a measure of how accurately an algorithm is able to predict outcome values for previously unseen data. Because learning algorithms are evaluated on finite samples, the evaluation of a learning algorithm may be sensitive to sampling error. As a result, measurements of prediction error on the current data may not provide much information about predictive ability on new data. Generalization error can be minimized by avoiding overfitting in the learning algorithm. The performance of a machine learning algorithm is visualized by plots that show estimates of the generalization error through the learning process; these plots are called learning curves.

Statistical learning theory

In a learning problem, the goal is to develop a function f_n(x) that predicts output values y for each input datum x. The subscript n indicates that the function f_n is developed based on a data set of n data points. The generalization error or expected loss or risk I[f] of a particular function f over all possible values of x and y is the expected value of the loss function V(f(x), y):

    I[f] = \int_{X \times Y} V(f(x), y) \, \rho(x, y) \, dx \, dy,

where \rho(x, y) is the unknown joint probability distribution for x and y.

Without knowing the joint probability distribution \rho, it is impossible to compute I[f]. Instead, we can compute the error on sample data, which is called the empirical error (or empirical risk). Given n data points, the empirical error of a candidate function f is:

    I_n[f] = \frac{1}{n} \sum_{i=1}^{n} V(f(x_i), y_i).

An algorithm is said to generalize if:

    \lim_{n \to \infty} I[f] - I_n[f] = 0.

Of particular importance is the generalization error I[f_n] of the data-dependent function f_n that is found by a learning algorithm based on the sample. Again, because the probability distribution is unknown, I[f_n] cannot be computed.
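The gap between the empirical error and the true risk can be illustrated numerically. The following is a minimal sketch, assuming a hypothetical regression setup (y = sin(2πx) plus Gaussian noise, squared loss, a polynomial model): the empirical error is computed on the training sample, while the risk is approximated by Monte Carlo on a large fresh sample from the same distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data distribution: y = sin(2*pi*x) + Gaussian noise.
def sample(n):
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, n)
    return x, y

def empirical_error(f, x, y):
    # I_n[f] = (1/n) * sum of squared losses V(f(x_i), y_i) on the sample.
    return float(np.mean((f(x) - y) ** 2))

# Learn f_n from a small sample (least-squares polynomial fit).
x_train, y_train = sample(30)
f_n = np.polynomial.Polynomial.fit(x_train, y_train, deg=5)

train_err = empirical_error(f_n, x_train, y_train)   # empirical error I_n[f_n]
x_test, y_test = sample(100_000)
risk_est = empirical_error(f_n, x_test, y_test)      # Monte Carlo estimate of I[f_n]

print(f"empirical error I_n[f_n] ~= {train_err:.4f}")
print(f"estimated risk  I[f_n]   ~= {risk_est:.4f}")
print(f"generalization gap       ~= {risk_est - train_err:.4f}")
```

The large test sample only approximates the integral defining I[f_n]; in practice the true distribution is unknown, which is exactly why the risk cannot be computed directly.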
Instead, the aim of many problems in statistical learning theory is to bound or characterize the difference of the generalization error and the empirical error in probability:

    P(I[f_n] - I_n[f_n] \le \epsilon) \ge 1 - \delta_n.

That is, the goal is to characterize the probability 1 - \delta_n that the generalization error is less than the empirical error plus some error bound \epsilon (generally dependent on \delta and n).
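A bound of this form can be checked empirically in the simplest case. The sketch below assumes a *fixed* classifier with 0-1 loss (losses bounded in [0, 1]), so Hoeffding's inequality gives P(|I[f] - I_n[f]| > ε) ≤ 2·exp(−2nε²); the true risk and sample size here are illustrative assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed setup: a fixed classifier f with true error rate I[f] = 0.3
# under 0-1 loss. Each loss value is Bernoulli, so Hoeffding applies:
#   P(|I[f] - I_n[f]| > eps) <= 2 * exp(-2 * n * eps**2)
true_risk = 0.3      # I[f], assumed known for the simulation
n = 200              # sample size
eps = 0.08           # error bound
trials = 20_000      # number of independent samples of size n

# Empirical errors I_n[f] over many independent samples.
emp_errors = rng.binomial(n, true_risk, trials) / n
violation_freq = float(np.mean(np.abs(emp_errors - true_risk) > eps))

hoeffding_bound = 2 * np.exp(-2 * n * eps**2)
print(f"observed P(|I[f] - I_n[f]| > eps) ~= {violation_freq:.4f}")
print(f"Hoeffding bound                    = {hoeffding_bound:.4f}")
```

The observed violation frequency stays well below the Hoeffding bound. Note that this simple bound holds only for a fixed f; bounding the data-dependent f_n requires uniform bounds over the whole hypothesis class (e.g. via VC dimension or Rademacher complexity).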