Are you an EPFL student looking for a semester project?
Work with us on data science and visualisation projects, and deploy your project as an app on top of Graph Search.
Sample size determination is the act of choosing the number of observations or replicates to include in a statistical sample. The sample size is an important feature of any empirical study in which the goal is to make inferences about a population from a sample. In practice, the sample size used in a study is usually determined based on the cost, time, or convenience of collecting the data, and the need for it to offer sufficient statistical power. In complicated studies there may be several different sample sizes: for example, in a stratified survey there would be different sizes for each stratum. In a census, data is sought for an entire population, hence the intended sample size is equal to the population. In experimental design, where a study may be divided into different treatment groups, there may be different sample sizes for each group. Sample sizes may be chosen in several ways: using experience – small samples, though sometimes unavoidable, can result in wide confidence intervals and risk of errors in statistical hypothesis testing. using a target variance for an estimate to be derived from the sample eventually obtained, i.e., if a high precision is required (narrow confidence interval) this translates to a low target variance of the estimator. using a target for the power of a statistical test to be applied once the sample is collected. using a confidence level, i.e. the larger the required confidence level, the larger the sample size (given a constant precision requirement). Larger sample sizes generally lead to increased precision when estimating unknown parameters. For example, if we wish to know the proportion of a certain species of fish that is infected with a pathogen, we would generally have a more precise estimate of this proportion if we sampled and examined 200 rather than 100 fish. Several fundamental facts of mathematical statistics describe this phenomenon, including the law of large numbers and the central limit theorem. In some situations, the increase in precision for larger sample sizes is minimal, or even non-existent.
Volkan Cevher, Kimon Antonakopoulos
Jürg Alexander Schiffmann, Cyril Picard, Faez Ahmed
Jian Wang, Matthias Finger, Qian Wang, Yiming Li, Matthias Wolf, Varun Sharma, Yi Zhang, Konstantin Androsov, Jan Steggemann, Leonardo Cristella, Xin Chen, Davide Di Croce, Rakesh Chawla, Matteo Galli, Anna Mascellani, João Miguel das Neves Duarte, Tagir Aushev, Tian Cheng, Yixing Chen, Werner Lustermann, Andromachi Tsirou, Alexis Kalogeropoulos, Andrea Rizzi, Ioannis Papadopoulos, Paolo Ronchese, Hua Zhang, Siyuan Wang, Tao Huang, David Vannerom, Michele Bianco, Sebastiana Gianì, Sun Hee Kim, Kun Shi, Abhisek Datta, Jian Zhao, Federica Legger, Gabriele Grosso, Ji Hyun Kim, Donghyun Kim, Zheng Wang, Sanjeev Kumar, Wei Li, Yong Yang, Geng Chen, Ajay Kumar, Ashish Sharma, Georgios Anagnostou, Joao Varela, Csaba Hajdu, Muhammad Ahmad, Ekaterina Kuznetsova, Ioannis Evangelou, Milos Dordevic, Meng Xiao, Sourav Sen, Xiao Wang, Kai Yi, Jing Li, Rajat Gupta, Hui Wang, Seungkyu Ha, Long Wang, Pratyush Das, Anton Petrov, Xin Sun, Xin Gao, Valérie Scheurer, Giovanni Mocellin, Muhammad Ansar Iqbal, Lukas Layer