Data dredgingvignette|Exemple de Data dredging. Le data dredging (littéralement le dragage de données mais mieux traduit comme étant du triturage de données) est une technique statistique qui . Une des formes du data dredging est de partir de données ayant un grand nombre de variables et un grand nombre de résultats, et de choisir les associations qui sont « statistiquement significatives », au sens de la valeur p (on parle aussi de p-hacking).
Test des étendues de TukeyEn statistique, le test des étendues de Tukey aussi appelé test de Tukey, méthode de Tukey, méthode de Tukey-Kramer ou test DSH (différence significative honnête) de Tukey, nommé d'après John Tukey, est un test statistique permettant d'effectuer une en une seule étape. Il peut être utilisé dans le cadre d'une ANOVA ou bien sur des données brutes pour évaluer par exemple si des moyennes sont significativement différentes l'une de l'autre.
Scheffé's methodIn statistics, Scheffé's method, named after the American statistician Henry Scheffé, is a method for adjusting significance levels in a linear regression analysis to account for multiple comparisons. It is particularly useful in analysis of variance (a special case of regression analysis), and in constructing simultaneous confidence bands for regressions involving basis functions. Scheffé's method is a single-step multiple comparison procedure which applies to the set of estimates of all possible contrasts among the factor level means, not just the pairwise differences considered by the Tukey–Kramer method.
False positive rateIn statistics, when performing multiple comparisons, a false positive ratio (also known as fall-out or false alarm ratio) is the probability of falsely rejecting the null hypothesis for a particular test. The false positive rate is calculated as the ratio between the number of negative events wrongly categorized as positive (false positives) and the total number of actual negative events (regardless of classification). The false positive rate (or "false alarm rate") usually refers to the expectancy of the false positive ratio.
False coverage rateIn statistics, a false coverage rate (FCR) is the average rate of false coverage, i.e. not covering the true parameters, among the selected intervals. The FCR gives a simultaneous coverage at a (1 − α)×100% level for all of the parameters considered in the problem. The FCR has a strong connection to the false discovery rate (FDR). Both methods address the problem of multiple comparisons, FCR from confidence intervals (CIs) and FDR from P-value's point of view. FCR was needed because of dangers caused by selective inference.
Testing hypotheses suggested by the dataIn statistics, hypotheses suggested by a given dataset, when tested with the same dataset that suggested them, are likely to be accepted even when they are not true. This is because circular reasoning (double dipping) would be involved: something seems true in the limited data set; therefore we hypothesize that it is true in general; therefore we wrongly test it on the same, limited data set, which seems to confirm that it is true.
False positives and false negativesA false positive is an error in binary classification in which a test result incorrectly indicates the presence of a condition (such as a disease when the disease is not present), while a false negative is the opposite error, where the test result incorrectly indicates the absence of a condition when it is actually present. These are the two kinds of errors in a binary test, in contrast to the two kinds of correct result (a and a ).
Post hoc analysisIn a scientific study, post hoc analysis (from Latin post hoc, "after this") consists of statistical analyses that were specified after the data were seen. They are usually used to uncover specific differences between three or more group means when an analysis of variance (ANOVA) test is significant. This typically creates a multiple testing problem because each potential analysis is effectively a statistical test. Multiple testing procedures are sometimes used to compensate, but that is often difficult or impossible to do precisely.
False discovery rateIn statistics, the false discovery rate (FDR) is a method of conceptualizing the rate of type I errors in null hypothesis testing when conducting multiple comparisons. FDR-controlling procedures are designed to control the FDR, which is the expected proportion of "discoveries" (rejected null hypotheses) that are false (incorrect rejections of the null). Equivalently, the FDR is the expected ratio of the number of false positive classifications (false discoveries) to the total number of positive classifications (rejections of the null).
Family-wise error rateIn statistics, family-wise error rate (FWER) is the probability of making one or more false discoveries, or type I errors when performing multiple hypotheses tests. John Tukey developed in 1953 the concept of a familywise error rate as the probability of making a Type I error among a specified group, or "family," of tests. Ryan (1959) proposed the related concept of an experimentwise error rate, which is the probability of making a Type I error in a given experiment.