Concept

Count data

In statistics, count data is a statistical data type describing countable quantities: data that can take only non-negative integer values {0, 1, 2, 3, ...}, where these integers arise from counting rather than ranking. The statistical treatment of count data is distinct from that of binary data, in which observations can take only two values, usually represented by 0 and 1, and from ordinal data, which may also consist of integers but where the individual values fall on an arbitrary scale and only the relative ranking is important. An individual piece of count data is often termed a count variable. When such a variable is treated as a random variable, the Poisson, binomial and negative binomial distributions are commonly used to represent its distribution.

Graphical examination of count data may be aided by data transformations chosen to stabilise the sample variance. In particular, the square root transformation might be used when the data can be approximated by a Poisson distribution (although other transformations have modestly improved properties), while an inverse sine transformation is available when a binomial distribution is preferred.

When count data are related to other variables, the count variable is treated as the dependent variable. Statistical methods such as least squares and analysis of variance are designed to deal with continuous dependent variables. These can be adapted to count data by using data transformations such as the square root transformation, but such methods have several drawbacks: they are approximate at best and estimate parameters that are often hard to interpret. The Poisson distribution can form the basis for some analyses of count data, and in this case Poisson regression may be used. This is a special case of the class of generalized linear models, which also contains specific forms of model that use the binomial distribution (binomial regression, logistic regression) or the negative binomial distribution where the assumptions of the Poisson model are violated, in particular when the range of count values is limited or when overdispersion is present.
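As a rough illustration of the Poisson regression mentioned above, the sketch below fits a log-linear model log E[y] = a + b*x with a single covariate by Newton-Raphson (which coincides with Fisher scoring for the log link). The function name `fit_poisson_regression`, its parameters, and the toy data are assumptions made for this example only; a real analysis would normally rely on an established statistics library.

```python
import math


def fit_poisson_regression(x, y, iterations=50, tol=1e-10):
    """Fit log(E[y]) = a + b*x to counts y by Newton-Raphson.

    Hypothetical helper for illustration; assumes mean(y) > 0 and
    that x takes at least two distinct values.
    """
    # Start from the intercept-only fit: a = log(mean(y)), b = 0.
    a, b = math.log(sum(y) / len(y)), 0.0
    for _ in range(iterations):
        mu = [math.exp(a + b * xi) for xi in x]
        # Score vector: gradient of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Entries of the (symmetric) Fisher information matrix.
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        # Solve the 2x2 linear system for the Newton step.
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


# Toy data (made up): the counts double with each unit increase in x.
intercept, slope = fit_poisson_regression([0, 1, 2, 3], [1, 2, 4, 8])
print(intercept, slope)
```

Because the toy counts double exactly with each unit of x, the fitted slope converges to log 2 and the intercept to 0. The slope is read on the log scale: exp(slope) is the multiplicative change in the expected count per unit of x.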

Official source

https://fr.wikipedia.org/wiki/Données_de_comptage

About this result

This page is automatically generated and may contain information that is not correct, complete, up to date, or relevant to your search. The same applies to every other page on this site. Please verify the information against official EPFL sources.

Related courses (2)

MATH-408: Regression methods

General graduate course on regression methods

MATH-449: Biostatistics

This course covers statistical methods that are widely used in medicine and biology. A key topic is the analysis of longitudinal data: that is, methods to evaluate exposures, effects and outcomes that

Related concepts (8)

Quasi-likelihood

In statistics, quasi-likelihood methods are used to estimate parameters in a statistical model when exact likelihood methods, for example maximum likelihood estimation, are computationally infeasible. Due to the wrong likelihood being used, quasi-likelihood estimators lose asymptotic efficiency compared to, e.g., maximum likelihood estimators. Under broadly applicable conditions, quasi-likelihood estimators are consistent and asymptotically normal. The asymptotic covariance matrix can be obtained using the so-called sandwich estimator.
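For count data, a common quasi-likelihood choice is the quasi-Poisson model, which keeps the Poisson mean structure but lets the variance be a multiple phi of the mean. The sketch below estimates phi for an intercept-only model as the Pearson statistic divided by the residual degrees of freedom. The function name `pearson_dispersion` and the sample counts are assumptions for illustration.

```python
def pearson_dispersion(counts):
    """Estimate the quasi-Poisson dispersion phi for an intercept-only model.

    Hypothetical helper: the fitted mean is the sample mean, so
    phi = sum((y - mean)^2 / mean) / (n - 1). Assumes n >= 2 and mean > 0.
    """
    n = len(counts)
    mean = sum(counts) / n
    # Pearson chi-square statistic under the Poisson variance function V(mu) = mu.
    pearson = sum((c - mean) ** 2 / mean for c in counts)
    # Divide by residual degrees of freedom (n observations, 1 fitted parameter).
    return pearson / (n - 1)


# Made-up counts: values well above 1 would suggest overdispersion.
print(pearson_dispersion([0, 0, 10, 10]))
```

A value near 1 is consistent with the Poisson variance assumption, while a value clearly above 1 points to overdispersion.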

Dispersion indicator

In statistics, a dispersion indicator measures the variability of the values in a statistical series. It is always non-negative, and the more spread out the values are, the larger it is. The most common ones are the variance, the standard deviation and the interquartile range. These indicators complement the information provided by indicators of position or central tendency, such as the mean or the median. In practice, that is, in industry, laboratories or metrology, where measurements are made, this dispersion is estimated by the standard deviation.
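The three indicators named above can be computed with Python's standard `statistics` module, as in the following minimal sketch. The function name `dispersion_summary` and the sample values are assumptions for illustration; the sample (n - 1) variance is used here, and the quartiles follow the module's default method.

```python
import statistics


def dispersion_summary(values):
    """Return the sample variance, standard deviation and interquartile range.

    Hypothetical helper; requires at least two values.
    """
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return {
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "std_dev": statistics.stdev(values),      # square root of the variance
        "iqr": q3 - q1,                           # interquartile range
    }


# Made-up series of counts.
print(dispersion_summary([2, 4, 4, 4, 5, 5, 7, 9]))
```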

Statistical data type

In statistics, groups of individual data points may be classified as belonging to any of various statistical data types, e.g. categorical ("red", "blue", "green"), real number (1.68, -5, 1.7e+6), odd number (1,3,5) etc. The data type is a fundamental component of the semantic content of the variable, and controls which sorts of probability distributions can logically be used to describe the variable, the permissible operations on the variable, the type of regression analysis used to predict the variable, etc.

Related lectures (16)

Inference: model checking

Covers iteratively reweighted least squares, generalized linear models and model checking.

Modern regression: spring barley data

Covers inference, weighted least squares, analysis of the spring barley data and smoothing techniques.

Count data models and univariate time series analysis

Covers count data models and Poisson regression, then moves on to univariate time series analysis for forecasting economic variables.

Related publications (7)

Counting using deep learning regression gives value to ecological surveys

Devis Tuia, Benjamin Alexander Kellenberger

Many ecological studies rely on count data and involve manual counting of objects of interest, which is time-consuming and especially disadvantageous when time in the field or lab is limited. However, an increasing number of works uses digital imagery, whi ...

Nature Research, 2021

Fragility Curves for Wide-Flange Steel Columns and Implications on Building-Specific Earthquake-Induced Loss Assessment

Dimitrios Lignos, Ahmed Mohamed Ahmed Elkady, Subash Ghimire

Building-specific loss assessment methodologies utilize component fragility curves to compute the expected losses in the aftermath of earthquakes. Such curves are not available for steel columns assuming they remain elastic due to capacity design considera ...

2018

Applications of Approximate Learning and Inference for Probabilistic Models

Young Jun Ko

We develop approximate inference and learning methods for facilitating the use of probabilistic modeling techniques motivated by applications in two different areas. First, we consider the ill-posed inverse problem of recovering an image from an underdeter ...

EPFL, 2017
