In statistics, the false discovery rate (FDR) is a method of conceptualizing the rate of type I errors in null hypothesis testing when conducting multiple comparisons. FDR-controlling procedures are designed to control the FDR, which is the expected proportion of "discoveries" (rejected null hypotheses) that are false (incorrect rejections of the null). Equivalently, the FDR is the expected ratio of the number of false positive classifications (false discoveries) to the total number of positive classifications (rejections of the null). The total number of rejections of the null includes both false positives (FP) and true positives (TP); simply put, FDR = FP / (FP + TP). FDR-controlling procedures provide less stringent control of type I errors than family-wise error rate (FWER) controlling procedures (such as the Bonferroni correction), which control the probability of at least one type I error. FDR-controlling procedures therefore have greater power, at the cost of an increased number of type I errors.

The modern widespread use of the FDR is believed to stem from, and be motivated by, the development of technologies that allowed the collection and analysis of a large number of distinct variables in several individuals (e.g., the expression level of each of 10,000 different genes in 100 different persons). By the late 1980s and 1990s, the development of "high-throughput" sciences, such as genomics, allowed for rapid data acquisition. This, coupled with the growth in computing power, made it possible to perform a very large number of statistical tests on a given data set. Microarray technology was a prototypical example, as it enabled thousands of genes to be tested simultaneously for differential expression between two biological conditions. As high-throughput technologies became common, technological and/or financial constraints led researchers to collect datasets with relatively small sample sizes (e.g., few individuals being tested) and large numbers of variables being measured per sample (e.g., thousands of gene expression levels).
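The text above does not name a specific FDR-controlling procedure; the classical one is the Benjamini-Hochberg step-up procedure. The following Python/NumPy sketch is only an illustration of that procedure on made-up p-values (the function name and the example numbers are hypothetical, not taken from the courses or publications referenced on this page).

import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    # Benjamini-Hochberg step-up procedure: returns a boolean mask of rejected
    # null hypotheses so that the expected FDR is at most q (assuming
    # independent or positively dependent test statistics).
    pvals = np.asarray(pvals, dtype=float)
    m = pvals.size
    order = np.argsort(pvals)                   # ranks of the p-values
    sorted_p = pvals[order]
    thresholds = np.arange(1, m + 1) / m * q    # BH critical values (i/m) * q
    below = sorted_p <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()          # largest rank i with p_(i) <= (i/m) q
        reject[order[:k + 1]] = True            # reject everything up to that rank
    return reject

# Hypothetical p-values from 10 tests:
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals, q=0.05))
# A Bonferroni (FWER) correction would instead reject only p <= 0.05 / 10 = 0.005.

With these made-up values, the procedure rejects the two smallest p-values, whereas the Bonferroni cutoff of 0.005 rejects only one, illustrating why FDR control is less stringent and has greater power.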

Related courses (6)
MATH-474: Statistics for genomic data analysis
After a short introduction to basic molecular biology and genomic technologies, this course covers the most useful statistical concepts and methods for the analysis of genomic data.
MATH-413: Statistics for data science
Statistics lies at the foundation of data science, providing a unifying theoretical and methodological backbone for the diverse tasks encountered in this emerging field. This course rigorously develops ...
CS-401: Applied data analysis
This course teaches the basic techniques, methodologies, and practical skills required to draw meaningful insights from a variety of data, with the help of the most acclaimed software tools in the data ...
Related lectures (33)
Multiple testing problem
Explores the challenges posed by multiple testing in genomic data analysis, including error rate control, adjusted p-values, permutation tests, and pitfalls in hypothesis testing.
Understanding ROC curves
Explores the ROC curve, the true positive rate, the false positive rate, and prediction probabilities in classification models.
Probability: Quiz
Covers probability scenarios and includes a quiz using Kahoot.
Related publications (90)

Accurate Diagnosis of Cortical and Infratentorial Lesions in Multiple Sclerosis Using Accelerated Fluid and White Matter Suppression Imaging

Tobias Kober

Objectives: The precise location of multiple sclerosis (MS) cortical lesions can be very challenging at 3 T, yet distinguishing them from subcortical lesions is essential for the diagnosis and prognosis of the disease. Compressed sensing-accelerated fluid ...
Lippincott Williams & Wilkins, 2023

Evaluation and optimization of novel extraction algorithms for the automatic detection of atrial activations recorded within the pulmonary veins during atrial fibrillation

Jean-Marc Vesin, Adrian Luca, Yann Prudat, Sasan Yazdani, Etienne Pruvot

Background and objective The automated detection of atrial activations (AAs) recorded from intracardiac electrograms (IEGMs) during atrial fibrillation (AF) is challenging considering their various amplitudes, morphologies and cycle length. Activation time ...
BMC, 2022

Social Learning with Disparate Hypotheses

Ali H. Sayed, Stefan Vlaski, Virginia Bordignon, Konstantinos Ntemos

In this paper we study the problem of social learning under multiple true hypotheses and self-interested agents. In this setup, each agent receives data that might be generated from a different hypothesis (or state) than the data other agents receive. In c ...
IEEE, 2022
Related concepts (6)
Multiple comparisons problem
In statistics, the multiple comparisons, multiplicity or multiple testing problem occurs when one considers a set of statistical inferences simultaneously or infers a subset of parameters selected based on the observed values. The more inferences are made, the more likely erroneous inferences become. Several statistical techniques have been developed to address that problem, typically by requiring a stricter significance threshold for individual comparisons, so as to compensate for the number of inferences being made. A small numerical sketch of this error-rate inflation appears after the list of concepts below.
p-value
In a statistical test, the p-value (short for "probability value") is the probability, under a given statistical model and the null hypothesis, of obtaining a value at least as extreme as the one observed. The use of the p-value is common in many research fields, such as physics, psychology, economics, and the life sciences.
Data dredging
Data dredging is the misuse of data analysis techniques to find patterns in data that can be presented as statistically significant. One form of data dredging is to start from data with a large number of variables and a large number of outcomes, and to select the associations that are "statistically significant" in the sense of the p-value (this is also referred to as p-hacking).
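As a rough numerical illustration of the multiple comparisons problem described above (the numbers are hypothetical, not drawn from the cited material): with m independent tests each run at level alpha, the chance of at least one false positive is 1 - (1 - alpha)^m, which the Bonferroni correction keeps below alpha by testing each hypothesis at alpha / m.

# Family-wise error rate for m independent tests at per-test level alpha.
alpha, m = 0.05, 100
print(1 - (1 - alpha) ** m)        # ~0.994: a false positive is almost guaranteed
print(1 - (1 - alpha / m) ** m)    # ~0.049: Bonferroni keeps the FWER below alpha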
