In statistics, deviance is a goodness-of-fit statistic for a statistical model; it is often used for statistical hypothesis testing. It is a generalization of the idea of using the sum of squares of residuals (SSR) in ordinary least squares to cases where model-fitting is achieved by maximum likelihood. It plays an important role in exponential dispersion models and generalized linear models.
Deviance can be related to Kullback-Leibler divergence.
The unit deviance $d(y, \mu)$ is a bivariate function that satisfies two conditions: $d(y, y) = 0$, and $d(y, \mu) > 0$ for all $y \neq \mu$.
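The total deviance $D(\mathbf{y}, \hat{\boldsymbol{\mu}})$ of a model with predictions $\hat{\boldsymbol{\mu}}$ of the observation $\mathbf{y}$ is the sum of its unit deviances: $D(\mathbf{y}, \hat{\boldsymbol{\mu}}) = \sum_i d(y_i, \hat{\mu}_i)$.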
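As a rough illustration of the unit and total deviance, the sketch below uses two standard unit deviances: the squared error of the normal model and the Poisson unit deviance $2\,(y \log(y/\mu) - (y - \mu))$. Both are zero when the prediction equals the observation and positive otherwise. The function names and the data are hypothetical, chosen only for this example.

```python
import math


def unit_deviance_normal(y, mu):
    """Normal unit deviance: the squared residual."""
    return (y - mu) ** 2


def unit_deviance_poisson(y, mu):
    """Poisson unit deviance, using the convention y*log(y) = 0 at y = 0."""
    term = y * math.log(y / mu) if y > 0 else 0.0
    return 2.0 * (term - (y - mu))


def total_deviance(ys, mus, unit_deviance):
    """Total deviance: the sum of unit deviances over all observations."""
    return sum(unit_deviance(y, mu) for y, mu in zip(ys, mus))


# Hypothetical observations and model predictions.
y = [2.0, 0.0, 3.0, 5.0]
mu_hat = [2.5, 0.5, 2.5, 4.5]

print("normal total deviance: ", total_deviance(y, mu_hat, unit_deviance_normal))
print("poisson total deviance:", total_deviance(y, mu_hat, unit_deviance_poisson))
```

With the normal unit deviance, the total deviance is exactly the sum of squared residuals, which is the sense in which deviance generalizes that quantity.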
The (total) deviance for a model M0 with estimates $\hat{\mu} = E[Y \mid \hat{\theta}_0]$, based on a dataset $y$, may be constructed from its likelihood as:
$$D(y, \hat{\mu}) = 2\left(\log p(y \mid \hat{\theta}_s) - \log p(y \mid \hat{\theta}_0)\right).$$
Here $\hat{\theta}_0$ denotes the fitted values of the parameters in the model M0, while $\hat{\theta}_s$ denotes the fitted parameters for the saturated model; both sets of fitted values are implicitly functions of the observations $y$. The saturated model is a model with a parameter for every observation, so that the data are fitted exactly. This expression is simply 2 times the log-likelihood ratio of the full model compared to the reduced model. The deviance is used to compare two models, in particular in the case of generalized linear models (GLM), where it plays a role similar to that of the residual sum of squares (RSS) from ANOVA in linear models.
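The likelihood construction can be sketched for a Poisson model. The example below assumes an intercept-only reduced model, whose maximum-likelihood estimate of the mean is the sample mean, and a saturated model that sets each mean equal to its observation. The data and the helper names are hypothetical.

```python
import math


def poisson_logpmf(y, mu):
    """Log of the Poisson probability mass function at count y with mean mu."""
    if y == 0:
        # y*log(mu) and log(y!) both vanish at y = 0; this also covers mu = 0.
        return -mu
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def log_likelihood(ys, mus):
    """Log-likelihood of independent Poisson observations."""
    return sum(poisson_logpmf(y, mu) for y, mu in zip(ys, mus))


# Hypothetical count data.
y = [2, 0, 3, 5]

# Reduced model (assumed intercept-only): one common mean, fitted by the sample mean.
mu_reduced = [sum(y) / len(y)] * len(y)

# Saturated model: one parameter per observation, so the fit is exact.
mu_saturated = [float(v) for v in y]

# Deviance: twice the log-likelihood gap between saturated and reduced models.
deviance = 2.0 * (log_likelihood(y, mu_saturated) - log_likelihood(y, mu_reduced))
print("deviance:", deviance)
```

The $\log(y!)$ terms cancel in the difference, so the result depends only on the observations and the fitted means.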
Suppose that, in the framework of the GLM, we have two nested models, M1 and M2. In particular, suppose that M1 contains the parameters in M2 and k additional parameters. Then, under the null hypothesis that M2 is the true model, the difference between the deviances of the two models follows, by Wilks' theorem, an approximate chi-squared distribution with k degrees of freedom. This gives a hypothesis test for whether the k additional parameters improve the fit.
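A minimal sketch of such a test, assuming Poisson counts in two hypothetical groups: M2 fits one common mean, M1 fits a separate mean per group, so k = 1. The chi-squared tail probability is computed with the standard library through the recurrence for the regularized upper incomplete gamma function rather than a statistics package; `chi2_sf` is a helper written for this example, not a library function.

```python
import math


def poisson_deviance(ys, mus):
    """Total Poisson deviance, with y*log(y) taken as 0 at y = 0."""
    total = 0.0
    for y, mu in zip(ys, mus):
        term = y * math.log(y / mu) if y > 0 else 0.0
        total += 2.0 * (term - (y - mu))
    return total


def chi2_sf(x, k):
    """Upper tail P(X > x) for a chi-squared variable with integer k >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if k % 2 == 0:
        a, q = 1.0, math.exp(-z)              # exact tail for k = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))   # exact tail for k = 1
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < k / 2.0:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return q


# Hypothetical counts in two groups.
group_a = [2, 3, 1, 4]
group_b = [6, 5, 8, 7]
y = group_a + group_b

# M2 (smaller model): a single common mean.
mu_m2 = [sum(y) / len(y)] * len(y)

# M1 (larger model): one mean per group, i.e. k = 1 additional parameter.
mean_a = sum(group_a) / len(group_a)
mean_b = sum(group_b) / len(group_b)
mu_m1 = [mean_a] * len(group_a) + [mean_b] * len(group_b)

k = 1
lr_stat = poisson_deviance(y, mu_m2) - poisson_deviance(y, mu_m1)
p_value = chi2_sf(lr_stat, k)
print("deviance difference:", lr_stat)
print("approximate p-value:", p_value)
```

The chi-squared reference distribution is an asymptotic approximation, so with samples as small as this one the p-value should be read as indicative only.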
Some usage of the term "deviance" can be confusing. According to Collett:
"the quantity $-2 \log \hat{L}$ is sometimes referred to as a deviance. This is [...] inappropriate, since unlike the deviance used in the context of generalized linear modelling, $-2 \log \hat{L}$ does not measure deviation from a model that is a perfect fit to the data."