Are you an EPFL student looking for a semester project?
Work with us on data science and visualisation projects, and deploy your project as an app on top of Graph Search.
In causal inference, a confounder (also confounding variable, confounding factor, extraneous determinant or lurking variable) is a variable that influences both the dependent variable and independent variable, causing a spurious association. Confounding is a causal concept, and as such, cannot be described in terms of correlations or associations. The existence of confounders is an important quantitative explanation why correlation does not imply causation. Some notations are explicitly designed to identify the existence, possible existence, or non-existence of confounders in causal relationships between elements of a system. Confounds are threats to internal validity. Confounding is defined in terms of the data generating model. Let X be some independent variable, and Y some dependent variable. To estimate the effect of X on Y, the statistician must suppress the effects of extraneous variables that influence both X and Y. We say that X and Y are confounded by some other variable Z whenever Z causally influences both X and Y. Let be the probability of event Y = y under the hypothetical intervention X = x. X and Y are not confounded if and only if the following holds: for all values X = x and Y = y, where is the conditional probability upon seeing X = x. Intuitively, this equality states that X and Y are not confounded whenever the observationally witnessed association between them is the same as the association that would be measured in a controlled experiment, with x randomized. In principle, the defining equality can be verified from the data generating model, assuming we have all the equations and probabilities associated with the model. This is done by simulating an intervention (see Bayesian network) and checking whether the resulting probability of Y equals the conditional probability . It turns out, however, that graph structure alone is sufficient for verifying the equality . Controlling for a variable Consider a researcher attempting to assess the effectiveness of drug X, from population data in which drug usage was a patient's choice.
, , , ,
,