# Chi-squared test

Summary

A chi-squared test (also chi-square or χ² test) is a statistical hypothesis test used in the analysis of contingency tables when the sample sizes are large. In simpler terms, this test is primarily used to examine whether two categorical variables (the two dimensions of the contingency table) are independent of each other. The test is valid when the test statistic is chi-squared distributed under the null hypothesis, which is the case for Pearson's chi-squared test and its variants. Pearson's chi-squared test is used to determine whether there is a statistically significant difference between the expected frequencies and the observed frequencies in one or more categories of a contingency table. For contingency tables with smaller sample sizes, Fisher's exact test is used instead.
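
As an illustration of the independence test described above, the following is a minimal sketch in plain Python. The function name `chi2_independence` and the 2×2 counts are hypothetical and chosen only for this example; the expected count in each cell is computed as (row total × column total) / grand total, which is the standard construction under the null hypothesis of independence.

```python
def chi2_independence(table):
    """Return (statistic, degrees_of_freedom, expected) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof, expected


# Hypothetical counts: rows are two groups, columns are two outcomes
observed = [[30, 10], [20, 40]]
stat, dof, expected = chi2_independence(observed)
print(stat, dof, expected)
```

For these made-up counts the statistic is 50/3 (about 16.67) with 1 degree of freedom, well above the tabulated 5% critical value of 3.841, so independence would be rejected at that level.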
In the standard applications of this test, the observations are classified into mutually exclusive classes. If the null hypothesis that there are no differences between the classes in the population is true, the test statistic computed from the observations follows a χ² distribution. The purpose of the test is to evaluate how likely the observed frequencies would be assuming the null hypothesis is true.
Test statistics that follow a χ² distribution occur when the observations are independent. There are also χ² tests for testing the null hypothesis of independence of a pair of random variables based on observations of the pairs.
The term chi-squared test often refers to tests for which the distribution of the test statistic approaches the χ² distribution asymptotically, meaning that the sampling distribution of the test statistic (if the null hypothesis is true) approximates a chi-squared distribution more and more closely as sample sizes increase.
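
The asymptotic behaviour can be checked informally by simulation. The sketch below is an assumption-laden illustration, not part of the source: it rolls a fair six-sided die repeatedly (so the null hypothesis is true by construction), computes Pearson's statistic for each simulated sample, and reports how often it exceeds 11.070, the tabulated 95th percentile of the χ² distribution with 5 degrees of freedom. If the approximation is good, that fraction should be close to 0.05. The function names, sample sizes, and seed are hypothetical.

```python
import random

CRITICAL_5PCT_DF5 = 11.070  # tabulated 95th percentile of chi-squared with 5 df


def die_statistic(rng, n_rolls):
    """Pearson statistic for n_rolls simulated rolls of a fair six-sided die."""
    counts = [0] * 6
    for _ in range(n_rolls):
        counts[rng.randrange(6)] += 1
    expected = n_rolls / 6
    return sum((c - expected) ** 2 / expected for c in counts)


def rejection_rate(n_rolls, n_trials, seed=0):
    """Fraction of simulated samples whose statistic exceeds the 5% critical value."""
    rng = random.Random(seed)
    rejections = sum(
        die_statistic(rng, n_rolls) > CRITICAL_5PCT_DF5 for _ in range(n_trials)
    )
    return rejections / n_trials


rate_large = rejection_rate(n_rolls=600, n_trials=1000)
print(rate_large)  # expected to be near 0.05, up to simulation noise
```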
In the 19th century, statistical analytical methods were mainly applied in biological data analysis, and it was customary for researchers such as Sir George Airy and Mansfield Merriman to assume that observations followed a normal distribution. Karl Pearson criticized their works in his 1900 paper.

Related courses

MATH-233: Probability and statistics

The course provides an introduction to probability theory and statistical methods for physicists.

CS-411: Digital education

This course addresses the relationship between specific technological features and the learners' cognitive processes. It also covers the methods and results of empirical studies on this topic: do stud

MATH-236: Probability and statistics II

Linear statistical methods, analysis of experiments, logistic regression.

Related concepts

Pearson's chi-squared test

Pearson's chi-squared test (χ²) is a statistical test applied to sets of categorical data to evaluate how likely it is that any observed difference between the sets arose by chance. It is the most widely used of many chi-squared tests (e.g., Yates, likelihood ratio, portmanteau test in time series, etc.), which are statistical procedures whose results are evaluated by reference to the chi-squared distribution. Its properties were first investigated by Karl Pearson in 1900.
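
For reference, the statistic used by Pearson's test is usually written as follows. The symbols are notation assumed here rather than taken from the text: $O_i$ is the observed count in category $i$, $E_i$ is the count expected under the null hypothesis, and $k$ is the number of categories (or table cells).

$$
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
$$

Large values of $\chi^2$ indicate that the observed counts are far from what the null hypothesis predicts.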

P-value

In null-hypothesis significance testing, the p-value is the probability of obtaining test results at least as extreme as the result actually observed, under the assumption that the null hypothesis is correct. A very small p-value means that such an extreme observed outcome would be very unlikely under the null hypothesis. Even though reporting p-values of statistical tests is common practice in academic publications of many quantitative fields, misinterpretation and misuse of p-values is widespread and has been a major topic in mathematics and metascience.
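
For a chi-squared test, the p-value is the upper-tail probability of the χ² distribution at the observed statistic. The sketch below computes it with the standard library only, assuming integer degrees of freedom; `chi2_sf` is a hypothetical helper name. It relies on the recurrence Q(a+1, h) = Q(a, h) + hᵃe⁻ʰ/Γ(a+1) for the regularized upper incomplete gamma function, started from Q(1, h) = e⁻ʰ or Q(1/2, h) = erfc(√h).

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a = 1.0
        q = math.exp(-h)  # Q(1, h)
    else:
        a = 0.5
        q = math.erfc(math.sqrt(h))  # Q(1/2, h)
    # Step a up to df/2 using Q(a+1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a+1)
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q


# The tabulated 5% critical value for 1 degree of freedom gives a p-value near 0.05
p_value = chi2_sf(3.841459, 1)
print(p_value)
```

In practice a statistics library would normally be used for this step; the hand-rolled version is shown only to make the definition concrete.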

Goodness of fit

The goodness of fit of a statistical model describes how well it fits a set of observations. Measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model in question. Such measures can be used in statistical hypothesis testing, e.g. to test for normality of residuals, to test whether two samples are drawn from identical distributions (see Kolmogorov–Smirnov test), or whether outcome frequencies follow a specified distribution (see Pearson's chi-square test).
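
As a small worked example of testing whether outcome frequencies follow a specified distribution, the sketch below applies Pearson's statistic to 120 hypothetical rolls of a die under the null hypothesis that the die is fair. The counts and the function name `chi2_goodness_of_fit` are invented for illustration, and the decision uses the tabulated 5% critical value for 5 degrees of freedom (11.070).

```python
def chi2_goodness_of_fit(observed, probabilities):
    """Return (statistic, degrees_of_freedom) for counts against hypothesized probabilities."""
    n = sum(observed)
    expected = [n * p for p in probabilities]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1


# Hypothetical counts for faces 1..6 over 120 rolls
observed_rolls = [22, 17, 20, 26, 22, 13]
gof_stat, gof_dof = chi2_goodness_of_fit(observed_rolls, [1 / 6] * 6)

CRITICAL_5PCT_DF5 = 11.070  # tabulated 95th percentile of chi-squared with 5 df
reject_fair_die = gof_stat > CRITICAL_5PCT_DF5
print(gof_stat, gof_dof, reject_fair_die)
```

Here the statistic is 102/20 = 5.1, below the critical value, so these counts give no evidence against a fair die at the 5% level.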

Related lectures

Probability Distributions in Environmental Studies
ENV-400: Air pollution and climate change

Explores probability distributions for random variables in air pollution and climate change studies, covering descriptive and inferential statistics.

Statistical Hypothesis Testing
MATH-232: Probability and statistics

Covers statistical hypothesis testing, confidence intervals, p-values, and significance levels in hypothesis testing.

Hypothesis Testing: State of Nature
MATH-232: Probability and statistics

Explores hypothesis testing, emphasizing the state of nature and the importance of choosing the most powerful test.

Related publications

Background: Quantification of the T2 signal by means of T2 mapping in acute pancreatitis (AP) has the potential to quantify the parenchymal edema. Quantitative T2 mapping may overcome the limitations of previously reported scoring systems for reliable asse ...

Dimitri Stelio Wyss, Francesca Carocci, Giulio Orecchia

We define p-adic BPS or pBPS invariants for moduli spaces M_{β,χ} of one-dimensional sheaves on del Pezzo and K3 surfaces by means of integration over a non-archimedean local field F. Our definition relies on a canonical measure μ_can on the F-analyt ...

Daniel Patrick Collins, Subhadeep Banik, Willi Meier

A near collision attack against the Grain v1 stream cipher was proposed by Zhang et al. in Eurocrypt 18. The attack uses the fact that two internal states of the stream cipher with very low Hamming distance between them produce similar keystream sequences ...

2023