Early stopping

In machine learning, early stopping is a form of regularization used to avoid overfitting when training a learner with an iterative method, such as gradient descent. Such methods update the learner so as to make it better fit the training data with each iteration. Up to a point, this improves the learner's performance on data outside of the training set. Past that point, however, improving the learner's fit to the training data comes at the expense of increased generalization error. Early stopping rules provide guidance as to how many iterations can be run before the learner begins to over-fit. Early stopping rules have been employed in many different machine learning methods, with varying amounts of theoretical foundation. This section presents some of the basic machine-learning concepts required for a description of early stopping methods. Overfitting Machine learning algorithms train a model based on a finite set of training data. During this training, the model is evaluated based on how well it predicts the observations contained in the training set. In general, however, the goal of a machine learning scheme is to produce a model that generalizes, that is, that predicts previously unseen observations. Overfitting occurs when a model fits the data in the training set well, while incurring larger generalization error. Regularization (mathematics) Regularization, in the context of machine learning, refers to the process of modifying a learning algorithm so as to prevent overfitting. This generally involves imposing some sort of smoothness constraint on the learned model. This smoothness may be enforced explicitly, by fixing the number of parameters in the model, or by augmenting the cost function as in Tikhonov regularization. Tikhonov regularization, along with principal component regression and many other regularization schemes, fall under the umbrella of spectral regularization, regularization characterized by the application of a filter. Early stopping also belongs to this class of methods.

Graph Chatbot

High-Dimensional Kernel Methods under Covariate Shift: Data-Dependent Implicit Regularization

Gradient boosting with extreme-value theory for wildfire prediction

Data-Driven Control and Optimization under Noisy and Uncertain Conditions

High-Dimensional Kernel Methods under Covariate Shift: Data-Dependent Implicit Regularization

Data-Driven Control and Optimization under Noisy and Uncertain Conditions

Gradient boosting with extreme-value theory for wildfire prediction