
Lecture: Machine Learning Fundamentals: Regularization and Cross-validation

Description

This lecture covers the concepts of overfitting, regularization, and cross-validation in machine learning. It explains how to handle nonlinear data using polynomial curve fitting and feature expansion. The instructor discusses the importance of higher dimensions and the benefits of polynomial feature expansion. The lecture also delves into kernel functions, the representer theorem, and kernel regression. It concludes with a demonstration of kernel ridge regression and the impact of regularization on linear regression and logistic regression.
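The pipeline the description outlines — polynomial feature expansion, ridge regularization, and cross-validation to choose the regularization strength — can be sketched in a few lines of NumPy. The data, polynomial degree, and candidate regularization values below are illustrative assumptions, not taken from the lecture itself:

```python
import numpy as np

def poly_expand(x, degree):
    # Feature expansion: map a 1-D input to [1, x, x^2, ..., x^degree]
    return np.vander(x, degree + 1, increasing=True)

def fit_ridge(X, y, lam):
    # Closed-form ridge solution: w = (X^T X + lam*I)^{-1} X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_error(x, y, degree, lam, k=5):
    # Plain k-fold cross-validation: average squared error on held-out folds
    folds = np.array_split(np.arange(len(x)), k)
    errs = []
    for fold in folds:
        mask = np.ones(len(x), dtype=bool)
        mask[fold] = False  # train on everything except this fold
        w = fit_ridge(poly_expand(x[mask], degree), y[mask], lam)
        pred = poly_expand(x[fold], degree) @ w
        errs.append(np.mean((pred - y[fold]) ** 2))
    return np.mean(errs)

# Noisy samples from a nonlinear function (synthetic, for illustration only)
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 40)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(40)

# Pick the regularization strength with the lowest cross-validation error
lams = [1e-6, 1e-3, 1e-1, 10.0]
best = min(lams, key=lambda lam: cv_error(x, y, degree=9, lam=lam))
print(best)
```

A high-degree polynomial with almost no regularization tends to chase the noise; cross-validation exposes this because the held-out error, unlike the training error, grows as the model overfits.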

Official source

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

In course

DH-406: Machine learning for DH

This course aims to introduce the basic principles of machine learning in the context of the digital humanities. We will cover both supervised and unsupervised learning techniques, and study and implement …

Related concepts (277)

Related lectures (1,000)

Linear regression

In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables). The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression. This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable.
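The simple (one-variable) case described above can be sketched directly with a least-squares solve; the data here are synthetic and purely illustrative:

```python
import numpy as np

# Simple linear regression: fit y ≈ a + b*x by least squares.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 + 3.0 * x  # exact line with intercept 2 and slope 3, no noise

X = np.column_stack([np.ones_like(x), x])  # design matrix [1, x]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b = coef
print(round(a, 6), round(b, 6))  # → 2.0 3.0
```

With more than one column of explanatory variables in `X`, the same call performs multiple linear regression without any change to the code.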

Logistic regression

In statistics, the logistic model (or logit model) is a statistical model that models the probability of an event taking place by having the log-odds for the event be a linear combination of one or more independent variables. In regression analysis, logistic regression (or logit regression) is estimating the parameters of a logistic model (the coefficients in the linear combination).
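The defining property — log-odds linear in the inputs — and the estimation of the coefficients can be sketched with plain gradient ascent on the log-likelihood. The data and learning rate are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    # Inverse of the log-odds (logit) link
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic 1-D data: the event becomes likelier as x grows
rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, 200)
p_true = sigmoid(0.5 + 2.0 * x)          # true log-odds: 0.5 + 2*x
y = (rng.uniform(size=200) < p_true).astype(float)

X = np.column_stack([np.ones_like(x), x])
w = np.zeros(2)
for _ in range(2000):
    # Gradient of the log-likelihood of the logistic model
    grad = X.T @ (y - sigmoid(X @ w)) / len(y)
    w += 0.5 * grad

print(w[1] > 0)  # the estimated slope shares the sign of the true slope
```

Unlike least squares, this maximum-likelihood problem has no closed form, which is why iterative optimization is used.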

Regression analysis

In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome' or 'response' variable, or a 'label' in machine learning parlance) and one or more independent variables (often called 'predictors', 'covariates', 'explanatory variables' or 'features'). The most common form of regression analysis is linear regression, in which one finds the line (or a more complex linear combination) that most closely fits the data according to a specific mathematical criterion.

Regularized least squares

Regularized least squares (RLS) is a family of methods for solving the least-squares problem while using regularization to further constrain the resulting solution. RLS is used for two main reasons. The first arises when the number of variables in the linear system exceeds the number of observations. In such settings the ordinary least-squares problem is ill-posed: the associated optimization problem has infinitely many solutions, so no unique fit exists.

Ridge regression

Ridge regression is a method of estimating the coefficients of multiple-regression models in scenarios where the independent variables are highly correlated. It has been used in many fields including econometrics, chemistry, and engineering. Also known as Tikhonov regularization, named for Andrey Tikhonov, it is a method of regularization of ill-posed problems. It is particularly useful to mitigate the problem of multicollinearity in linear regression, which commonly occurs in models with large numbers of parameters.
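The multicollinearity point can be seen numerically: with two almost-identical predictors, the normal-equations matrix is nearly singular, and adding the Tikhonov term makes the system well-conditioned and splits the weight between the correlated columns. The data below are a synthetic illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
x1 = rng.standard_normal(50)
x2 = x1 + 1e-6 * rng.standard_normal(50)  # nearly a copy of x1
X = np.column_stack([x1, x2])
y = x1 + 0.1 * rng.standard_normal(50)

# X^T X is almost singular; adding lam*I restores a well-posed problem
lam = 1.0
w = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)

print(np.linalg.cond(X.T @ X) > np.linalg.cond(X.T @ X + lam * np.eye(2)))
print(np.allclose(w[0], w[1], atol=0.1))  # weight shared across the near-duplicates
```

Ordinary least squares would instead produce huge, unstable coefficients of opposite signs on the two near-duplicate columns, which is exactly the pathology ridge regression mitigates.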

Document Analysis: Topic Modeling (DH-406: Machine learning for DH)

Explores document analysis, topic modeling, and generative models for data generation in machine learning.

Neural Networks: Training and Activation (CIVIL-226: Introduction to machine learning for engineers)

Explores neural networks, activation functions, backpropagation, and PyTorch implementation.

Introduction to Machine Learning: Supervised Learning (CS-233(a): Introduction to machine learning (BA3))

Introduces supervised learning, covering classification, regression, model optimization, overfitting, and kernel methods.

Nonlinear ML Algorithms (CS-233(a): Introduction to machine learning (BA3))

Introduces nonlinear ML algorithms, covering nearest neighbor, k-NN, polynomial curve fitting, model complexity, overfitting, and regularization.

Cross-validation & Regularization (DH-406: Machine learning for DH)

Explores polynomial curve fitting, kernel functions, and regularization techniques, emphasizing the importance of model complexity and overfitting.