In machine learning, Platt scaling or Platt calibration is a way of transforming the outputs of a classification model into a probability distribution over classes. The method was invented by John Platt in the context of support vector machines, replacing an earlier method by Vapnik, but can be applied to other classification models.
Platt scaling works by fitting a logistic regression model to a classifier's scores.
Consider the problem of binary classification: for inputs x, we want to determine whether they belong to one of two classes, arbitrarily labeled +1 and −1. We assume that the classification problem will be solved by a real-valued function f, by predicting a class label y = sign(f(x)). For many problems, it is convenient to get a probability P(y = 1 | x), i.e. a classification that not only gives an answer, but also a degree of certainty about the answer. Some classification models do not provide such a probability, or give poor probability estimates.
Platt scaling is an algorithm to solve the aforementioned problem. It produces probability estimates

P(y = 1 | x) = 1 / (1 + exp(A f(x) + B)),

i.e., a logistic transformation of the classifier scores f(x), where A and B are two scalar parameters that are learned by the algorithm. Note that predictions can now be made according to y = 1 iff P(y = 1 | x) > 1/2; if B ≠ 0, the probability estimates contain a correction compared to the old decision function y = sign(f(x)).
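As a minimal sketch of this transformation (the parameter values here are illustrative, not taken from any fitted model), assuming NumPy:

```python
import numpy as np

def platt_probability(f_x, A, B):
    """Calibrated probability P(y=1|x) = 1 / (1 + exp(A*f(x) + B)).

    f_x: raw classifier score(s), e.g. an SVM decision value
    A, B: scalar parameters learned by Platt scaling
    """
    return 1.0 / (1.0 + np.exp(A * np.asarray(f_x) + B))

# With A = -2.0 and B = 0.0, a score of 0 maps to probability 0.5,
# and larger scores map to probabilities closer to 1.
print(platt_probability([-1.0, 0.0, 2.0], A=-2.0, B=0.0))
```

Note that A is typically negative, so that larger scores f(x) yield larger probabilities; a nonzero B shifts the decision threshold away from f(x) = 0.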
The parameters A and B are estimated using a maximum likelihood method that optimizes on the same training set as that for the original classifier f. To avoid overfitting to this set, a held-out calibration set or cross-validation can be used, but Platt additionally suggests transforming the labels y to target probabilities
t+ = (N+ + 1) / (N+ + 2) for positive samples (y = 1), and

t− = 1 / (N− + 2) for negative samples (y = −1).
Here, N+ and N− are the number of positive and negative samples, respectively. This transformation follows by applying Bayes' rule to a model of out-of-sample data that has a uniform prior over the labels. The constants 1 and 2, on the numerator and denominator respectively, are derived from the application of Laplace smoothing.
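As a sketch of this maximum-likelihood fit under Platt's smoothed targets (assuming SciPy; the function and variable names are illustrative, and a simple BFGS optimizer stands in for Platt's original procedure):

```python
import numpy as np
from scipy.optimize import minimize

def fit_platt(scores, labels):
    """Fit Platt scaling parameters A, B by maximum likelihood.

    scores: real-valued classifier outputs f(x), ideally on a held-out
            calibration set to avoid overfitting
    labels: class labels in {-1, +1}
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)

    n_pos = np.sum(labels == 1)
    n_neg = np.sum(labels == -1)

    # Platt's Laplace-smoothed target probabilities instead of hard 0/1 labels
    t = np.where(labels == 1,
                 (n_pos + 1.0) / (n_pos + 2.0),
                 1.0 / (n_neg + 2.0))

    def neg_log_likelihood(params):
        A, B = params
        z = A * scores + B
        # P(y=1|x) = 1/(1+exp(z)); log P and log(1-P) via logaddexp for stability
        log_p = -np.logaddexp(0.0, z)
        log_1mp = -np.logaddexp(0.0, -z)
        return -np.sum(t * log_p + (1.0 - t) * log_1mp)

    res = minimize(neg_log_likelihood, x0=np.array([-1.0, 0.0]), method="BFGS")
    return res.x  # (A, B)

# Example usage on toy calibration data:
A, B = fit_platt([-2.0, -1.0, -0.3, 0.4, 1.1, 2.5], [-1, -1, -1, 1, 1, 1])
print(A, B)
```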