Homogeneity and heterogeneity (statistics)
In statistics, homogeneity and its opposite, heterogeneity, arise in describing the properties of a dataset, or of several datasets. They relate to the validity of the often convenient assumption that the statistical properties of any one part of an overall dataset are the same as those of any other part. In meta-analysis, which combines the data from several studies, homogeneity measures the differences or similarities between the several studies (see also study heterogeneity). Homogeneity can be studied to several degrees of complexity.
Quantile normalization
In statistics, quantile normalization is a technique for making two distributions identical in their statistical properties. To quantile-normalize a test distribution to a reference distribution of the same length, sort the test distribution and sort the reference distribution. The highest entry in the test distribution then takes the value of the highest entry in the reference distribution, the next highest entry in the test distribution takes the value of the next highest entry in the reference distribution, and so on, until the test distribution is a permutation of the reference distribution.
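A minimal sketch of this procedure in Python, assuming NumPy and two equal-length one-dimensional arrays (the function name quantile_normalize is ours):

import numpy as np

def quantile_normalize(test, reference):
    # Rank the test values, then assign each position the reference value
    # of the same rank, so the result is a permutation of the reference.
    order = np.argsort(test)          # positions of test values, smallest first
    ref_sorted = np.sort(reference)   # reference values, smallest first
    out = np.empty(len(test), dtype=float)
    out[order] = ref_sorted           # k-th smallest test value -> k-th smallest reference value
    return out

print(quantile_normalize(np.array([5, 2, 9]), np.array([1, 4, 7])))  # [4. 1. 7.]

In the example, 2 is the smallest test value and so takes the smallest reference value 1, while 9 is the largest and takes 7.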
Correlation coefficient
A correlation coefficient is a numerical measure of some type of correlation, meaning a statistical relationship between two variables. The variables may be two columns of a given data set of observations, often called a sample, or two components of a multivariate random variable with a known distribution. Several types of correlation coefficient exist, each with its own definition and its own range of usability and characteristics. They all assume values in the range from −1 to +1, where ±1 indicates the strongest possible correlation (perfect positive agreement at +1, perfect negative agreement at −1) and 0 indicates no correlation.
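As an illustrative sketch, assuming Python with NumPy, the Pearson product-moment coefficient (the most common type) can be computed for two sample columns; the data here are ours:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])  # roughly 2*x, so r should be near +1

r = np.corrcoef(x, y)[0, 1]  # off-diagonal entry of the 2x2 correlation matrix
print(round(r, 4))           # close to 1.0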
Regression diagnostic
In statistics, a regression diagnostic is one of a set of procedures available for regression analysis that seek to assess the validity of a model in any of a number of different ways. This assessment may be an exploration of the model's underlying statistical assumptions, an examination of the structure of the model by considering formulations that have fewer, more, or different explanatory variables, or a study of subgroups of observations, looking for those that are either poorly represented by the model (outliers) or that have a relatively large effect on the regression model's predictions.
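As a hedged illustration of the last kind of assessment, assuming Python with statsmodels (the synthetic data and variable names are ours), Cook's distance flags observations with a relatively large effect on the fitted model:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 * x + rng.normal(size=50)
X = sm.add_constant(x)                 # design matrix with an intercept column

results = sm.OLS(y, X).fit()
influence = results.get_influence()
cooks_d = influence.cooks_distance[0]  # one Cook's distance per observation
print(np.argsort(cooks_d)[-3:])        # indices of the three most influential points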
Mean squared prediction error
In statistics, the mean squared prediction error (MSPE), also known as the mean squared error of the predictions, of a smoothing, curve fitting, or regression procedure is the expected value of the squared prediction errors (PE), that is, of the squared differences between the fitted values implied by the predictive function ĝ and the values of the (unobservable) true function g. It is an inverse measure of the explanatory power of ĝ and can be used in the process of cross-validation of an estimated model.
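In symbols, a hedged statement of the definition, with x_i denoting the covariates of observation i (this notation is an assumption consistent with the sentence above):

\operatorname{MSPE}(\widehat{g}) = \operatorname{E}\left[ \left( g(x_i) - \widehat{g}(x_i) \right)^{2} \right]

Since g itself is unobservable, cross-validation approximates this quantity by averaging the squared errors between held-out observations and the corresponding fitted values.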
Cross-sectional data
In statistics and econometrics, cross-sectional data is a type of data collected by observing many subjects (such as individuals, firms, countries, or regions) at a single point or period of time. Analysis of cross-sectional data usually consists of comparing the differences among selected subjects, typically with no regard to differences in time. For example, to measure current obesity levels in a population, we could draw a sample of 1,000 people at random from that population (also known as a cross section of that population), measure their weight and height, and calculate what percentage of that sample is categorized as obese.
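A minimal sketch of that worked example, assuming Python with NumPy, synthetic measurements, and the common BMI-of-30 cutoff for obesity (the data and cutoff choice are assumptions here):

import numpy as np

# One cross section: weight (kg) and height (m) for 1,000 subjects at one point in time.
rng = np.random.default_rng(1)
n = 1000
height = rng.normal(1.70, 0.10, n)          # metres
weight = rng.normal(75.0, 12.0, n)          # kilograms

bmi = weight / height**2
percent_obese = 100.0 * np.mean(bmi >= 30)  # BMI of 30 or above counted as obese
print(f"{percent_obese:.1f}% of the sample is categorized as obese")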
Delphi method
The Delphi method (in French, méthode de Delphes) is a forecasting method, used in particular in project management and in economic forecasting. The principle of this method is that forecasts made by a structured group of experts are generally more reliable than those made by unstructured groups or by individuals. The name comes from the Greek city of Delphi, where the Pythia, the Oracle of Delphi, delivered her prophecies. The Delphi method is a way of organizing the consultation of experts on a specific subject.
Difference in differences
The difference-in-differences method (in French, méthode des doubles différences) is a statistical method used to estimate the effect of a treatment; it consists of comparing the difference between the control group and the treated group before and after the introduction of the treatment. This method is notably used in the evaluation of public policy to estimate the effect of a treatment within the theoretical framework of the Neyman-Rubin causal model.
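A minimal sketch of the double-difference computation, assuming four group means are available (the numbers and variable names are ours):

# Mean outcomes for the two groups, before and after the treatment is introduced.
treated_pre, treated_post = 10.0, 14.0
control_pre, control_post = 9.0, 11.0

# The double difference nets out both the fixed group gap and the common time trend.
did_estimate = (treated_post - treated_pre) - (control_post - control_pre)
print(did_estimate)  # 4.0 - 2.0 = 2.0, the estimated treatment effect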
Mean absolute percentage error
The mean absolute percentage error (MAPE), also known as mean absolute percentage deviation (MAPD), is a measure of the prediction accuracy of a forecasting method in statistics. It usually expresses the accuracy as a ratio defined by the formula

\operatorname{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right|

where A_t is the actual value and F_t is the forecast value. Their difference is divided by the actual value A_t; the absolute value of this ratio is summed over every forecasted point in time and divided by the number of fitted points n.
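A short sketch of this formula in Python, assuming NumPy (the function name mape is ours; note the measure is undefined when any actual value is zero):

import numpy as np

def mape(actual, forecast):
    # Mean of |(A_t - F_t) / A_t| over all forecasted points.
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return np.mean(np.abs((actual - forecast) / actual))

print(mape([100, 200, 300], [110, 190, 330]))  # (0.10 + 0.05 + 0.10) / 3 = 0.0833...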
Generalized additive model
In statistics, the generalized additive model (GAM) is a statistical model developed by Trevor Hastie and Rob Tibshirani to blend the properties of the generalized linear model with those of the additive model. The model specifies a distribution (such as the normal distribution or the binomial distribution) and a link function g relating the expected value of the distribution to the predictors, and attempts to fit functions f_i to satisfy

g(\operatorname{E}(Y)) = \beta_0 + f_1(x_1) + f_2(x_2) + \cdots + f_m(x_m)

The functions f_i may be fit using parametric or non-parametric means, thus potentially providing better fits to the data than other methods.
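As a hedged illustration, assuming Python with the third-party pygam package (its LinearGAM model and s spline smoother are used here; treat the exact API and the synthetic data as assumptions):

import numpy as np
from pygam import LinearGAM, s  # requires the pygam package

rng = np.random.default_rng(2)
X = rng.uniform(0, 1, (200, 2))
y = np.sin(2 * np.pi * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 0.1, 200)

# Fits g(E(Y)) = beta_0 + f1(x1) + f2(x2) with an identity link
# and a spline smoother for each predictor column.
gam = LinearGAM(s(0) + s(1)).fit(X, y)
gam.summary()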