Concept: Cross-validation (statistics)

Summary

Cross-validation, sometimes called rotation estimation or out-of-sample testing, is any of several related model validation techniques for assessing how the results of a statistical analysis will generalize to an independent data set.
Cross-validation is a resampling method that uses different portions of the data to train and test a model on different iterations. It is mainly used when the goal is prediction and one wants to estimate how accurately a predictive model will perform in practice. In a prediction problem, a model is usually given a dataset of known data on which training is run (the training dataset) and a dataset of unknown, or first-seen, data against which the model is tested (the validation dataset or testing set). The goal of cross-validation is to test the model's ability to predict new data that was not used in estimating it, in order to flag problems such as overfitting or selection bias and to give insight into how the model will generalize to an independent data set.
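
As an illustration of the resampling idea, here is a minimal sketch of k-fold cross-validation, one common variant. It is not taken from the source: the function names (`k_fold_indices`, `fit_line`, `cross_validate`), the choice of a one-variable least-squares line as the model, and the synthetic data are all assumptions made for the example. Each fold is held out once as the test set while the model is fitted on the remaining folds.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (single feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cross_validate(xs, ys, k=5, seed=0):
    """Return the mean squared error measured on each held-out fold."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        # Train only on the samples that are NOT in the held-out fold.
        train = [i for i in range(len(xs)) if i not in held]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        # Evaluate on data the model has not seen during fitting.
        errors = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in held_out]
        scores.append(sum(errors) / len(errors))
    return scores


if __name__ == "__main__":
    # Synthetic data (an assumption for the demo): y = 2x + 1 plus small noise.
    rng = random.Random(42)
    xs = [i / 10 for i in range(50)]
    ys = [2.0 * x + 1.0 + rng.gauss(0, 0.1) for x in xs]
    scores = cross_validate(xs, ys, k=5)
    print("per-fold MSE:", scores)
    print("cross-validated MSE:", sum(scores) / len(scores))
```

Averaging the per-fold errors gives a single estimate of out-of-sample error. A held-out error that is much larger than the training error would be one sign of overfitting.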

Related publications (100)

Related people (37)

Related concepts (38)

Machine learning (ML) is an umbrella term for solving problems for which development of algorithms by human programmers would be cost-prohibitive, and instead the problems are solved by helping machin…

In mathematical modeling, overfitting is "the production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit to additional data or pre…

In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variable…

Related courses (66)

FIN-472: Computational finance

Participants of this course will master computational techniques frequently used in mathematical finance applications. Emphasis will be put on the implementation and practical aspects.

BIO-322: Introduction to machine learning for bioengineers

Students understand basic concepts and methods of machine learning. They can describe them in mathematical terms and can apply them to data using a high-level programming language (julia/python/R).

DH-406: Machine learning for DH

This course aims to introduce the basic principles of machine learning in the context of the digital humanities. We will cover both supervised and unsupervised learning techniques, and study and implement methods to analyze diverse data types, such as images, music and social network data.

Related units (26)

Related lectures (162)

The aim of this work is to set up mathematical models and numerical methods to investigate the mass transfer process occurring during peritoneal dialysis (PD) therapy. More precisely, the final goal is to set up tools that find, for each patient undergoing PD, the best therapy profile to purify the blood and remove water. First, we build specific mathematical models to represent the physical phenomena of interest. As discussed in chapter 1, starting from the Kedem-Katchalsky equations, we end up with a system of nonlinear ordinary differential equations describing the various aspects of the physical problem. Then, we propose a method based on nonlinear programming techniques to solve the inverse problem arising from the need to assess the peritoneal membrane characteristics, which are not directly measurable on the patient. Thanks to its flexibility, we are able to support the main standard tests currently in use to assess the kinetic properties of the peritoneum. We devise a suitable parametrization of the control function (DPD, Dynamic Peritoneal Dialysis) that makes it possible to improve on the standard PD profiles with a larger set of treatments. Then we propose an optimization algorithm to improve PD efficiency. Moreover, in the framework of control theory, we devise an algorithm based on Pontryagin's maximum principle for switched systems to investigate the PD optimal control problem in depth. Afterwards, we carry out numerical simulations investigating the main inputs that influence peritoneal dialysis efficiency. Specifically, we present an extensive comparison between APD (Automated Peritoneal Dialysis) and DPD (Dynamic Peritoneal Dialysis) in order to assess the conditions under which DPD improves PD performance. Then a numerical investigation based on the algorithm devised for switched systems is carried out to assess the adequacy of DPD.
Moreover, we set up a procedure to reach an efficiency target while minimizing the patient's exposure to glucose, in order to improve PD biocompatibility. Finally, we present the validation results of the mathematical model in order to verify its accuracy. A comparison between APD and DPD is presented. We also carry out a statistical analysis of the error distribution for the most relevant quantities, in order to evaluate the strengths and weaknesses of the model and to identify the need for further improvement.