Sample size determination is the act of choosing the number of observations or replicates to include in a statistical sample. The sample size is an important feature of any empirical study in which the goal is to make inferences about a population from a sample. In practice, the sample size used in a study is usually determined based on the cost, time, or convenience of collecting the data, and the need for it to offer sufficient statistical power. In complicated studies there may be several different sample sizes: for example, in a stratified survey there would be different sizes for each stratum. In a census, data is sought for an entire population, hence the intended sample size is equal to the population size. In experimental design, where a study may be divided into different treatment groups, there may be different sample sizes for each group.

Sample sizes may be chosen in several ways:
- using experience: small samples, though sometimes unavoidable, can result in wide confidence intervals and a risk of errors in statistical hypothesis testing;
- using a target variance for an estimate to be derived from the sample eventually obtained: if high precision is required (a narrow confidence interval), this translates to a low target variance of the estimator;
- using a target for the power of a statistical test to be applied once the sample is collected;
- using a confidence level: the larger the required confidence level, the larger the sample size (given a constant precision requirement).

Larger sample sizes generally lead to increased precision when estimating unknown parameters. For example, if we wish to know the proportion of a certain species of fish that is infected with a pathogen, we would generally have a more precise estimate of this proportion if we sampled and examined 200 rather than 100 fish. Several fundamental facts of mathematical statistics describe this phenomenon, including the law of large numbers and the central limit theorem.
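The target-variance, confidence-level and power approaches described above can be sketched with the standard normal-approximation formulas. This is a minimal illustration, not a general-purpose tool: it assumes simple random sampling, a normal approximation, a known common standard deviation and equal group sizes for the two-sample case. The function names are hypothetical, and the 20% infection rate used for the fish example is an invented planning value. For a proportion estimated to within a margin of error E at confidence level 1 − α, the sketch uses n = z²p(1 − p)/E²; for a two-sided two-sample test of means with power 1 − β, it uses n = 2(z₁₋α/₂ + z₁₋β)²σ²/δ² per group.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def n_for_proportion(margin: float, confidence: float = 0.95, p: float = 0.5) -> int:
    """Smallest n whose normal-approximation margin of error for a proportion,
    z * sqrt(p * (1 - p) / n), is at most `margin`.
    p = 0.5 is the conservative (worst-case) planning value."""
    z = _STD_NORMAL.inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    return math.ceil(z**2 * p * (1 - p) / margin**2)


def n_per_group_two_means(delta: float, sigma: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group n for a two-sided, two-sample z-test (equal group sizes,
    known common sd `sigma`) to detect a true mean difference `delta`."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)  # controls the type I error rate
    z_beta = _STD_NORMAL.inv_cdf(power)           # controls the type II error rate
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma**2 / delta**2)


def se_proportion(p: float, n: int) -> float:
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)


# Same precision (+/- 5 percentage points), higher confidence -> larger sample.
print(n_for_proportion(0.05, confidence=0.95))  # 385
print(n_for_proportion(0.05, confidence=0.99))  # 664

# Detecting a difference of half a standard deviation with 80% power at alpha = 0.05.
print(n_per_group_two_means(delta=0.5, sigma=1.0))  # 63

# Fish example with a hypothetical 20% infection rate: doubling n from 100
# to 200 shrinks the standard error by a factor of sqrt(2), not by half.
print(se_proportion(0.2, 100), se_proportion(0.2, 200))
```

The last line shows why precision improves only slowly: the standard error falls like 1/√n, so halving it requires roughly four times as many observations.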
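In some situations, the increase in precision for larger sample sizes is minimal, or even non-existent. This can result from systematic errors, strong dependence in the data, or a heavy-tailed data distribution.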
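One well-known case in which a larger sample does not help is a heavy-tailed distribution such as the standard Cauchy: the mean of n independent standard Cauchy variables is itself standard Cauchy, so its spread does not shrink as n grows. The simulation below is a minimal sketch contrasting this with normally distributed data, whose sample mean has a spread that shrinks like 1/√n. The seed, the number of replications and the choice of the interquartile range (IQR) as the spread measure are arbitrary illustrative assumptions; the IQR is used because the Cauchy distribution has no finite variance.

```python
import math
import random
import statistics


def cauchy_draw(rng: random.Random) -> float:
    # Inverse-CDF sampling: tan(pi * (U - 1/2)) is standard Cauchy for U ~ Uniform(0, 1).
    return math.tan(math.pi * (rng.random() - 0.5))


def normal_draw(rng: random.Random) -> float:
    return rng.gauss(0.0, 1.0)


def iqr_of_sample_means(draw, n: int, reps: int, rng: random.Random) -> float:
    """Interquartile range of `reps` sample means, each computed from n draws."""
    means = [statistics.fmean([draw(rng) for _ in range(n)]) for _ in range(reps)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1


rng = random.Random(12345)  # fixed seed so the run is reproducible
results = {}
for n in (10, 1000):
    iqr_norm = iqr_of_sample_means(normal_draw, n, reps=1000, rng=rng)
    iqr_cauchy = iqr_of_sample_means(cauchy_draw, n, reps=1000, rng=rng)
    results[n] = (iqr_norm, iqr_cauchy)
    print(f"n={n:5d}  normal IQR={iqr_norm:.3f}  Cauchy IQR={iqr_cauchy:.3f}")
```

For normal data the IQR of the mean should drop by a factor of about 10 when n grows a hundredfold, while for Cauchy data it should stay near 2 regardless of n.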

About this result
This page is generated automatically and may contain information that is not correct, complete, up to date, or relevant to your search. The same applies to all other pages on this site. Be sure to verify the information with EPFL's official sources.
Related courses (31)
DH-406: Machine learning for DH
This course aims to introduce the basic principles of machine learning in the context of the digital humanities. We will cover both supervised and unsupervised learning techniques, and study and imple ...
MICRO-110: Design of experiments
This course provides an introduction to experimental statistics, including use of population statistics to characterize experimental results, use of comparison statistics and hypothesis testing to eva ...
MSE-352: Introduction to microscopy + Laboratory work
This introductory microscopy course aims to give an overview of the various techniques for analyzing the microstructure and composition of materials, in particular those related to m ...
Related lectures (190)
Statistical mechanical strength: Ceramics
Covers the statistical mechanical strength of ceramics, including Weibull statistics and stable crack behavior under compression.
Central limit theorem: Illustration and applications
Explores the central limit theorem and its statistical implications for random variables.
Understanding statistics and experimental design
Provides an overview of basic probability theory, ANOVA, t-tests, the central limit theorem, metrics, confidence intervals, and non-parametric tests.
Related publications (693)

On the Generalization of Stochastic Gradient Descent with Momentum

Volkan Cevher, Kimon Antonakopoulos

While momentum-based accelerated variants of stochastic gradient descent (SGD) are widely used when training machine learning models, there is little theoretical understanding on the generalization error of such methods. In this work, we first show that th ...
Brookline, 2024

DATED: GUIDELINES FOR CREATING SYNTHETIC DATASETS FOR ENGINEERING DESIGN APPLICATIONS

Jürg Alexander Schiffmann, Cyril Picard, Faez Ahmed

Exploiting the recent advancements in artificial intelligence, showcased by ChatGPT and DALL-E, in real-world applications necessitates vast, domain-specific, and publicly accessible datasets. Unfortunately, the scarcity of such datasets poses a significan ...
Amer Soc Mechanical Engineers, 2023
Related concepts (24)
Replication (statistics)
In engineering, science, and statistics, replication is the repetition of an experimental condition so that the variability associated with the phenomenon can be estimated. ASTM, in standard E1847, defines replication as "... the repetition of the set of all the treatment combinations to be compared in an experiment. Each of the repetitions is called a replicate." Replication is not the same as repeated measurements of the same item: they are dealt with differently in statistical experimental design and data analysis.
Sampling (statistics)
In statistics, sampling refers to the methods for selecting a subset of individuals (a sample) from within a population in order to estimate the characteristics of the whole population. This approach has several advantages: the study is restricted to part of the population, it costs less, data collection is faster than if the study had been carried out on the entire population, and destructive testing becomes possible. The results obtained constitute a sample.
Sampling error
In statistics, sampling errors are incurred when the statistical characteristics of a population are estimated from a subset, or sample, of that population. This can produce biased results. Since the sample does not include all members of the population, statistics of the sample (often known as estimators), such as means and quartiles, generally differ from the statistics of the entire population (known as parameters). The difference between the sample statistic and population parameter is considered the sampling error.
Related MOOCs (4)
Synchrotrons and X-Ray Free Electron Lasers (part 1)
Synchrotrons and X-Ray Free Electron Lasers (part 2)
The first MOOC to provide an extensive introduction to synchrotron and XFEL facilities and associated techniques and applications.
Cement Chemistry and Sustainable Cementitious Materials
Learn the basics of cement chemistry and laboratory best practices for assessment of its key properties.
