In statistics, data transformation is the application of a deterministic mathematical function to each point in a data set: each data point $z_i$ is replaced with the transformed value $y_i = f(z_i)$, where $f$ is a function. Transformations are usually applied so that the data appear to more closely meet the assumptions of a statistical inference procedure that is to be applied, or to improve the interpretability or appearance of graphs.
Nearly always, the function used to transform the data is invertible and generally continuous. The transformation is usually applied to a collection of comparable measurements. For example, when working with data on people's incomes in some currency unit, it is common to transform each person's income value with the logarithm function.
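A minimal Python sketch of the logarithmic transformation described above, assuming a small list of hypothetical income values (the numbers are invented for illustration). It applies $f = \log$ to every data point and then uses the inverse function, exp, to recover the original values, which is possible because log is invertible on positive numbers.

```python
import math

# Hypothetical incomes in some currency unit (illustrative values only).
incomes = [12_000.0, 25_000.0, 40_000.0, 85_000.0, 1_200_000.0]

# Apply the same deterministic function f = log to every point: y_i = f(z_i).
log_incomes = [math.log(z) for z in incomes]

# log is invertible on positive values, so exp maps each y_i back to z_i.
recovered = [math.exp(y) for y in log_incomes]

for z, y in zip(incomes, log_incomes):
    print(f"{z:>12,.0f} -> {y:.3f}")
```

Incomes spanning a factor of 100 (12,000 to 1,200,000) occupy a range of only about 4.6 on the natural-log scale, which is one reason the transformed values are easier to compare and plot.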
Guidance on how data should be transformed, or whether a transformation should be applied at all, should come from the particular statistical analysis to be performed. For example, a simple way to construct an approximate 95% confidence interval for the population mean is to take the sample mean plus or minus two standard errors. However, the constant factor 2 used here comes from the normal distribution and is only appropriate if the sample mean is approximately normally distributed. The central limit theorem states that in many situations the sample mean is approximately normally distributed if the sample size is reasonably large. However, if the population is substantially skewed and the sample size is at most moderate, the approximation provided by the central limit theorem can be poor, and the resulting confidence interval will likely have the wrong coverage probability. Thus, when there is evidence of substantial skew in the data, it is common to transform the data to an approximately symmetric distribution before constructing a confidence interval. If desired, the confidence interval can then be transformed back to the original scale using the inverse of the transformation that was applied to the data.
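The procedure above can be sketched in Python as follows. The data are simulated from a log-normal distribution (an assumption, chosen only because it is right-skewed), the interval uses the "mean plus or minus two standard errors" rule from the text, and `mean_ci` is a hypothetical helper name. Note that exponentiating the endpoints of an interval built on the log scale gives an interval for the exponential of the mean of the logs, i.e. the geometric mean on the original scale (the median, for log-normal data), not for the arithmetic mean.

```python
import math
import random
import statistics

def mean_ci(values, factor=2.0):
    """Approximate 95% CI: sample mean +/- factor standard errors."""
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return m - factor * se, m + factor * se

random.seed(0)
# Simulated right-skewed sample; mu, sigma and n are assumptions for illustration.
data = [random.lognormvariate(3.0, 1.0) for _ in range(40)]

# Interval computed directly on the skewed original scale.
raw_lo, raw_hi = mean_ci(data)

# Transform to a roughly symmetric scale and build the interval there...
log_lo, log_hi = mean_ci([math.log(z) for z in data])
# ...then map the endpoints back with the inverse transformation (exp).
back_lo, back_hi = math.exp(log_lo), math.exp(log_hi)

print(f"raw scale:         ({raw_lo:.2f}, {raw_hi:.2f})")
print(f"log, transf. back: ({back_lo:.2f}, {back_hi:.2f})")
```

Because exp is monotone, the back-transformed interval always contains the sample geometric mean, and it is typically asymmetric around it, reflecting the skew of the original data.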
Data can also be transformed to make them easier to visualize.
Computer environments such as educational games, interactive simulations, and web services provide large amounts of data, which can be analyzed and serve as a basis for adaptation. This course will co…
Statistics lies at the foundation of data science, providing a unifying theoretical and methodological backbone for the diverse tasks encountered in this emerging field. This course rigorously develops…
In probability theory and statistics, the Poisson distribution is a discrete probability distribution that describes the number of events occurring in a fixed time interval, when these events occur at a known average rate (expected value) and independently of the time elapsed since the previous event. Figure: chewing gum on a sidewalk. The number of pieces of chewing gum on a single paving stone is approximately Poisson-distributed.
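The Poisson probability mass function is $P(X = k) = \lambda^k e^{-\lambda} / k!$ for $k = 0, 1, 2, \dots$, where $\lambda$ is the mean number of events per interval. A minimal Python sketch with an assumed rate $\lambda = 3$ checks numerically that the probabilities sum to 1 and that the mean and the variance both equal $\lambda$; `poisson_pmf` is a hypothetical helper name.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam): lam**k * exp(-lam) / k!."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 3.0  # assumed average number of events per interval

# Truncate at k = 49; for lam = 3 the remaining tail mass is negligible.
probs = [poisson_pmf(k, lam) for k in range(50)]

mean = sum(k * p for k, p in enumerate(probs))
var = sum(k * k * p for k, p in enumerate(probs)) - mean ** 2

print(round(probs[0], 4))  # 0.0498, i.e. e^-3
print(round(mean, 6), round(var, 6))
```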
In statistics, heteroscedasticity occurs when the variances of the residuals of the variables under examination differ. The word comes from Greek, combining the prefix hetero- ("other") and skedasê ("dissipation"). A collection of random variables is heteroscedastic if there are sub-populations whose variability differs from that of the others. Heteroscedasticity is the opposite of homoscedasticity; in the latter case, the error variance is constant, i.e. $\operatorname{Var}(\varepsilon_i) = \sigma^2$ for all $i$.
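Heteroscedasticity connects back to data transformation: when noise is multiplicative, sub-populations with larger levels also have larger variances, and a logarithm often roughly equalizes them. The following Python sketch simulates this under assumed conditions (two hypothetical groups with levels 10 and 100 and log-normal noise with σ = 0.2); `sample` is a hypothetical helper, and the code is an illustration rather than a formal test for heteroscedasticity.

```python
import math
import random
import statistics

random.seed(1)

def sample(level, n=2000, sigma=0.2):
    """Draw n values y = level * exp(eps), eps ~ N(0, sigma^2) (assumed noise model)."""
    return [level * math.exp(random.gauss(0.0, sigma)) for _ in range(n)]

groups = {"low": sample(10.0), "high": sample(100.0)}

for name, ys in groups.items():
    raw_var = statistics.variance(ys)
    log_var = statistics.variance([math.log(y) for y in ys])
    print(f"{name:>4}: raw variance = {raw_var:9.2f}, log variance = {log_var:.4f}")
```

On the raw scale the "high" group's variance is roughly 100 times that of the "low" group, while on the log scale both variances are close to σ² = 0.04.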
Model selection is the task of choosing the best model from among various candidates on the basis of a performance criterion. In the context of learning, this may be the selection of a statistical model from a set of candidate models, given data. In the simplest cases, a pre-existing set of data is considered. However, the task can also involve the design of experiments so that the data collected are well suited to the problem of model selection.
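As one concrete sketch, the Akaike information criterion (AIC) can serve as the performance criterion. It is swapped in here only as an example; cross-validation and BIC are common alternatives. The data, the two candidate models, and the helper names below are hypothetical. For least-squares fits with Gaussian errors, AIC can be computed up to an additive constant as $n \ln(\mathrm{RSS}/n) + 2k$, where $k$ is the number of fitted parameters, and the candidate with the smallest AIC is selected.

```python
import math

# Hypothetical data: a linear trend plus small alternating deviations.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]
n = len(xs)

def rss_constant(xs, ys):
    """Residual sum of squares of the intercept-only model y = c."""
    c = sum(ys) / len(ys)
    return sum((y - c) ** 2 for y in ys)

def rss_linear(xs, ys):
    """Residual sum of squares of the ordinary least-squares line y = a + b*x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def aic(rss, n, k):
    """Gaussian AIC up to an additive constant: n*ln(RSS/n) + 2k.
    k counts mean parameters only; also counting the noise variance
    would add 2 to every candidate and leave the ranking unchanged."""
    return n * math.log(rss / n) + 2 * k

candidates = {
    "constant": aic(rss_constant(xs, ys), n, k=1),
    "linear": aic(rss_linear(xs, ys), n, k=2),
}
best = min(candidates, key=candidates.get)
print(best)  # linear
```

The extra slope parameter costs 2 AIC units, but it reduces the residual sum of squares by several orders of magnitude here, so the linear model is preferred.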
Covers supervised learning, classification, regression, decision boundaries, overfitting, the Perceptron, SVMs, and logistic regression.
Covers linear maps, matrices, transformations, and the superposition principle.
This research aimed to evaluate the clinical features and computed tomography (CT) scans associated with poor outcomes in COVID-19 patients with acute kidney injury (AKI). A total of 351 COVID-19 patients (100 AKI, 251 non-AKI) hospitalized at Imam Hossein ...
IRANIAN SOC NEPHROLOGY, 2023
Ion sensors play a major role in physiology and healthcare monitoring because they can continuously collect biological data from body fluids. Nevertheless, ion interference from background electrolytes present in the sample is a paramount chall ...
2021
Motivated by the recent successes of neural networks that have the ability to fit the data perfectly and generalize well, we study the noiseless model in the fundamental least-squares setup. We assume that an optimum predictor fits perfectly inputs ...