**Are you an EPFL student looking for a semester project?**

Work with us on data science and visualisation projects, and deploy your project as an app on top of GraphSearch.

Concept# Chi-squared test

Summary

A chi-squared test (also chi-square or χ2 test) is a statistical hypothesis test used in the analysis of contingency tables when the sample sizes are large. In simpler terms, this test is primarily used to examine whether two categorical variables (two dimensions of the contingency table) are independent in influencing the test statistic (values within the table). The test is valid when the test statistic is chi-squared distributed under the null hypothesis, specifically Pearson's chi-squared test and variants thereof. Pearson's chi-squared test is used to determine whether there is a statistically significant difference between the expected frequencies and the observed frequencies in one or more categories of a contingency table. For contingency tables with smaller sample sizes, a Fisher's exact test is used instead.
In the standard applications of this test, the observations are classified into mutually exclusive classes. If the null hypothesis that there are no differences between the classes in the population is true, the test statistic computed from the observations follows a χ2 frequency distribution. The purpose of the test is to evaluate how likely the observed frequencies would be assuming the null hypothesis is true.
Test statistics that follow a χ2 distribution occur when the observations are independent. There are also χ2 tests for testing the null hypothesis of independence of a pair of random variables based on observations of the pairs.
Chi-squared tests often refers to tests for which the distribution of the test statistic approaches the χ2 distribution asymptotically, meaning that the sampling distribution (if the null hypothesis is true) of the test statistic approximates a chi-squared distribution more and more closely as sample sizes increase.
In the 19th century, statistical analytical methods were mainly applied in biological data analysis and it was customary for researchers to assume that observations followed a normal distribution, such as Sir George Airy and Mansfield Merriman, whose works were criticized by Karl Pearson in his 1900 paper.

Official source

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

Related publications (2)

Related lectures (290)

Related concepts (29)

Related courses (49)

Probability Distributions in Environmental Studies

Explores probability distributions for random variables in air pollution and climate change studies, covering descriptive and inferential statistics.

Statistical Hypothesis Testing

Covers statistical hypothesis testing, confidence intervals, p-values, and significance levels in hypothesis testing.

Hypothesis Testing: State of Nature

Explores hypothesis testing, emphasizing the state of nature and the importance of choosing the most powerful test.

In null-hypothesis significance testing, the p-value is the probability of obtaining test results at least as extreme as the result actually observed, under the assumption that the null hypothesis is correct. A very small p-value means that such an extreme observed outcome would be very unlikely under the null hypothesis. Even though reporting p-values of statistical tests is common practice in academic publications of many quantitative fields, misinterpretation and misuse of p-values is widespread and has been a major topic in mathematics and metascience.

A chi-squared test (also chi-square or χ2 test) is a statistical hypothesis test used in the analysis of contingency tables when the sample sizes are large. In simpler terms, this test is primarily used to examine whether two categorical variables (two dimensions of the contingency table) are independent in influencing the test statistic (values within the table). The test is valid when the test statistic is chi-squared distributed under the null hypothesis, specifically Pearson's chi-squared test and variants thereof.

The goodness of fit of a statistical model describes how well it fits a set of observations. Measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model in question. Such measures can be used in statistical hypothesis testing, e.g. to test for normality of residuals, to test whether two samples are drawn from identical distributions (see Kolmogorov–Smirnov test), or whether outcome frequencies follow a specified distribution (see Pearson's chi-square test).

MATH-233: Probability and statistics

The course gives an introduction to probability and statistics for physicists.

CS-411: Digital education

This course addresses the relationship between specific technological features and the learners' cognitive processes. It also covers the methods and results of empirical studies on this topic: do stud

MATH-236: Probability and statistics II

Linear statistical methods, analysis of experiments, logistic regression.

In this thesis, we treat robust estimation for the parameters of the Ornstein–Uhlenbeck process, which are the mean, the variance, and the friction. We start by considering classical maximum likelihoo

Tobias Kober, Tom Hilbert, Meng Yang, Yong Zhang, Jie Liu

Background The application value of T(2)mapping in evaluating cervical cancer (CC) features remains unclear. Purpose To investigate the role of T(2)values in evaluating CC classification, grade, and l