In statistics, the multiple comparisons, multiplicity or multiple testing problem occurs when one considers a set of statistical inferences simultaneously or infers a subset of parameters selected based on the observed values.
The more inferences are made, the more likely erroneous inferences become. Several statistical techniques have been developed to address that problem, typically by requiring a stricter significance threshold for individual comparisons, so as to compensate for the number of inferences being made.
The problem of multiple comparisons received increased attention in the 1950s with the work of statisticians such as Tukey and Scheffé. Over the ensuing decades, many procedures were developed to address the problem. In 1996, the first international conference on multiple comparison procedures took place in Tel Aviv.
Multiple comparisons arise when a statistical analysis involves multiple simultaneous statistical tests, each of which has a potential to produce a "discovery". A stated confidence level generally applies only to each test considered individually, but often it is desirable to have a confidence level for the whole family of simultaneous tests. Failure to compensate for multiple comparisons can have important real-world consequences, as illustrated by the following examples:
Suppose the treatment is a new way of teaching writing to students, and the control is the standard way of teaching writing. Students in the two groups can be compared in terms of grammar, spelling, organization, content, and so on. As more attributes are compared, it becomes increasingly likely that the treatment and control groups will appear to differ on at least one attribute due to random sampling error alone.
Suppose we consider the efficacy of a drug in terms of the reduction of any one of a number of disease symptoms. As more symptoms are considered, it becomes increasingly likely that the drug will appear to be an improvement over existing drugs in terms of at least one symptom.
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
Data dredging (also known as data snooping or p-hacking) is the misuse of data analysis to find patterns in data that can be presented as statistically significant, thus dramatically increasing and understating the risk of false positives. This is done by performing many statistical tests on the data and only reporting those that come back with significant results.
Tukey's range test, also known as Tukey's test, Tukey method, Tukey's honest significance test, or Tukey's HSD (honestly significant difference) test, is a single-step multiple comparison procedure and statistical test. It can be used to find means that are significantly different from each other. Named after John Tukey, it compares all possible pairs of means, and is based on a studentized range distribution (q) (this distribution is similar to the distribution of t from the t-test. See below).
In statistics, Scheffé's method, named after the American statistician Henry Scheffé, is a method for adjusting significance levels in a linear regression analysis to account for multiple comparisons. It is particularly useful in analysis of variance (a special case of regression analysis), and in constructing simultaneous confidence bands for regressions involving basis functions. Scheffé's method is a single-step multiple comparison procedure which applies to the set of estimates of all possible contrasts among the factor level means, not just the pairwise differences considered by the Tukey–Kramer method.
This course provides an introduction to experimental statistics, including use of population statistics to characterize experimental results, use of comparison statistics and hypothesis testing to eva
Inference from the particular to the general based on probability models is central to the statistical method. This course gives a graduate-level account of the main ideas of statistical inference.
This course is neither an introduction to the mathematics of statistics nor an introduction to a statistics program such as R. The aim of the course is to understand statistics from its experimental d
Delves into hypothesis testing, covering test statistics, critical regions, power functions, p-values, multiple testing, and non-parametric statistics.
Information theory has allowed us to determine the fundamental limit of various communication and algorithmic problems, e.g., the channel coding problem, the compression problem, and the hypothesis testing problem. In this work, we revisit the assumptions ...
EPFL2024
Conversational tutoring systems (CTSs) offer a promising avenue for individualized learning support, especially in domains like persuasive writing. Although these systems have the potential to enhance the learning process, the specific role of learner cont ...
Psychotic symptoms are among the most debilitating and challenging presentations of severe psychiatric diseases, such as schizophrenia, schizoaffective, and bipolar disorder. A pathophysiological understanding of intrinsic brain activity underlying psychos ...