Regularized least squares (RLS) is a family of methods for solving the least-squares problem while using regularization to further constrain the resulting solution.

RLS is used for two main reasons. The first comes up when the number of variables in the linear system exceeds the number of observations. In such settings, the ordinary least-squares problem is ill-posed: the associated optimization problem has infinitely many solutions, so no unique fit exists. RLS allows the introduction of further constraints that uniquely determine the solution. The second reason arises when the learned model suffers from poor generalization. RLS can be used in such cases to improve the generalizability of the model by constraining it at training time. This constraint can either force the solution to be "sparse" in some way or reflect other prior knowledge about the problem, such as information about correlations between features. A Bayesian understanding of this can be reached by showing that RLS methods are often equivalent to placing priors on the solution of the least-squares problem.

Consider a learning setting given by a probabilistic space $(X \times Y, \rho(X, Y))$, $Y \subseteq \mathbb{R}$. Let $S = \{(x_i, y_i)\}_{i=1}^{n}$ denote a training set of $n$ pairs i.i.d. with respect to $\rho$. Let $V : Y \times \mathbb{R} \to [0; \infty)$ be a loss function. Define $\mathcal{F}$ as the space of functions $f : X \to Y$ such that the expected risk

$$\varepsilon(f) = \int_{X \times Y} V(y, f(x)) \, d\rho(x, y)$$

is well defined. The main goal is to minimize the expected risk:

$$\inf_{f \in \mathcal{F}} \varepsilon(f).$$

Since the problem cannot be solved exactly, there is a need to specify how to measure the quality of a solution. A good learning algorithm should provide an estimator with a small risk. As the joint distribution $\rho$ is typically unknown, the empirical risk is taken instead. For regularized least squares the square loss function is introduced:

$$\hat{\varepsilon}(f) = \frac{1}{n} \sum_{i=1}^{n} V(y_i, f(x_i)) = \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i))^2.$$

However, if the functions come from a relatively unconstrained space, such as the set of square-integrable functions on $X$, this approach may overfit the training data and lead to poor generalization. The learning algorithm should therefore somehow constrain or penalize the complexity of the function $f$.
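To make the formulation concrete, here is a minimal NumPy sketch, assuming a linear model $f(x) = x^\top w$ and the ridge ($\ell_2$) variant of RLS; the names (`n`, `d`, `lam`, `empirical_risk`) and the data-generating setup are illustrative assumptions, not from the text. It shows the ill-posed case where the number of variables exceeds the number of observations, and how adding a regularizer makes the minimizer unique.

```python
import numpy as np

# Illustrative sketch: ridge regression as an instance of RLS.
# All sizes and names below are assumptions chosen for the example.

rng = np.random.default_rng(0)

n, d = 20, 50                      # fewer observations than variables: OLS is ill-posed
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:5] = rng.normal(size=5)    # a sparse "true" model, for illustration
y = X @ w_true + 0.1 * rng.normal(size=n)

def empirical_risk(w, X, y):
    """Empirical risk with the square loss: (1/n) * sum_i (y_i - f(x_i))^2."""
    return np.mean((y - X @ w) ** 2)

# With d > n, X^T X is rank-deficient, so the normal equations have
# infinitely many solutions. Adding lam * I makes the system strictly
# positive definite, and the minimizer of
#   (1/n) ||y - Xw||^2 + lam ||w||^2
# is the unique solution of (X^T X + n * lam * I) w = X^T y.
lam = 0.1
w_ridge = np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ y)

print("rank of X^T X:", np.linalg.matrix_rank(X.T @ X))   # < d: singular
print("empirical risk of ridge solution:", empirical_risk(w_ridge, X, y))
```

Swapping the $\ell_2$ penalty for an $\ell_1$ penalty (the lasso) would instead push the solution toward sparsity, which is the other kind of constraint mentioned above.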