**Are you an EPFL student looking for a semester project?**

Work with us on data science and visualisation projects, and deploy your project as an app on top of Graph Search.

Concept# Loss functions for classification

Summary

In machine learning and mathematical optimization, loss functions for classification are computationally feasible loss functions representing the price paid for inaccuracy of predictions in classification problems (problems of identifying which category a particular observation belongs to). Given as the space of all possible inputs (usually ), and as the set of labels (possible outputs), a typical goal of classification algorithms is to find a function which best predicts a label for a given input . However, because of incomplete information, noise in the measurement, or probabilistic components in the underlying process, it is possible for the same to generate different . As a result, the goal of the learning problem is to minimize expected loss (also known as the risk), defined as
where is a given loss function, and is the probability density function of the process that generated the data, which can equivalently be written as
Within classification, several commonly used loss functions are written solely in terms of the product of the true label and the predicted label . Therefore, they can be defined as functions of only one variable , so that with a suitably chosen function . These are called margin-based loss functions. Choosing a margin-based loss function amounts to choosing . Selection of a loss function within this framework impacts the optimal which minimizes the expected risk.
In the case of binary classification, it is possible to simplify the calculation of expected risk from the integral specified above. Specifically,
The second equality follows from the properties described above. The third equality follows from the fact that 1 and −1 are the only possible values for , and the fourth because . The term within brackets is known as the conditional risk.
One can solve for the minimizer of by taking the functional derivative of the last equality with respect to and setting the derivative equal to 0. This will result in the following equation
which is also equivalent to setting the derivative of the conditional risk equal to zero.

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

Related concepts (6)

Related publications (81)

Related courses (15)

Related people (25)

Related units (4)

Related lectures (52)

Stochastic gradient descent

Stochastic gradient descent (often abbreviated SGD) is an iterative method for optimizing an objective function with suitable smoothness properties (e.g. differentiable or subdifferentiable). It can be regarded as a stochastic approximation of gradient descent optimization, since it replaces the actual gradient (calculated from the entire data set) by an estimate thereof (calculated from a randomly selected subset of the data).

Boosting (machine learning)

In machine learning, boosting is an ensemble meta-algorithm for primarily reducing bias, and also variance in supervised learning, and a family of machine learning algorithms that convert weak learners to strong ones. Boosting is based on the question posed by Kearns and Valiant (1988, 1989): "Can a set of weak learners create a single strong learner?" A weak learner is defined to be a classifier that is only slightly correlated with the true classification (it can label examples better than random guessing).

Cross-validation (statistics)

Cross-validation, sometimes called rotation estimation or out-of-sample testing, is any of various similar model validation techniques for assessing how the results of a statistical analysis will generalize to an independent data set. Cross-validation is a resampling method that uses different portions of the data to test and train a model on different iterations. It is mainly used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice.

DH-406: Machine learning for DH

This course aims to introduce the basic principles of machine learning in the context of the digital humanities. We will cover both supervised and unsupervised learning techniques, and study and imple

ME-390: Foundations of artificial intelligence

This course provides the students with 1) a set of theoretical concepts to understand the machine learning approach; and 2) a subset of the tools to use this approach for problems arising in mechanica

MGT-448: Statistical inference and machine learning

This course aims to provide graduate students a thorough grounding in the methods, theory, mathematics and algorithms needed to do research and applications in machine learning. The course covers topi

Neural Networks: Training and Activation

Explores neural networks, activation functions, backpropagation, and PyTorch implementation.

Classification Problems: Overview and Loss Functions

Covers classification problems and various loss functions used in machine learning.

Logistic Regression

Covers logistic regression for linear classification and unsupervised dimensionality reduction techniques.

This paper details the approach of the team Kohrrelation in the 2021 Extreme Value Analysis data challenge, dealing with the prediction of wildfire counts and sizes over the contiguous US. Our approach uses ideas from extreme-value theory in a machine lear ...

2023Volkan Cevher, Kimon Antonakopoulos

While momentum-based accelerated variants of stochastic gradient descent (SGD) are widely used when training machine learning models, there is little theoretical understanding on the generalization error of such methods. In this work, we first show that th ...

Volkan Cevher, Grigorios Chrysos, Fanghui Liu, Elias Abad Rocamora

Catastrophic overfitting (CO) in single-step adversarial training (AT) results in abrupt drops in the adversarial test accuracy (even down to 0%). For models trained with multi-step AT, it has been observed that the loss function behaves locally linearly w ...

2024