Beliefs depend on the available information. This idea is formalized in probability theory by conditioning. Conditional probabilities, conditional expectations, and conditional probability distributions are treated on three levels: discrete probabilities, probability density functions, and measure theory. Conditioning leads to a non-random result if the condition is completely specified; otherwise, if the condition is left random, the result of conditioning is also random.
Example: A fair coin is tossed 10 times; the random variable X is the number of heads in these 10 tosses, and Y is the number of heads in the first 3 tosses. Although Y is determined before X, it may happen that someone knows X but not Y.
Conditional probability
Given that X = 1, the conditional probability of the event Y = 0 is
$$P(Y=0 \mid X=1) = \frac{P(Y=0,\, X=1)}{P(X=1)} = 0.7.$$
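More generally,
$$P(Y=0 \mid X=x) = \begin{cases} \dfrac{\binom{7}{x}}{\binom{10}{x}} = \dfrac{7!\,(10-x)!}{(7-x)!\,10!} & \text{for } x = 0, 1, \dots, 7, \\ 0 & \text{for } x = 8, 9, 10. \end{cases}$$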
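One may also treat the conditional probability as a random variable, a function of the random variable X, namely,
$$P(Y=0 \mid X) = \begin{cases} \binom{7}{X} \big/ \binom{10}{X} & \text{for } X \le 7, \\ 0 & \text{for } X > 7. \end{cases}$$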
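The expectation of this random variable is equal to the (unconditional) probability,
$$E\bigl(P(Y=0 \mid X)\bigr) = \sum_x P(Y=0 \mid X=x)\, P(X=x) = P(Y=0),$$
namely,
$$\sum_{x=0}^{7} \frac{\binom{7}{x}}{\binom{10}{x}} \cdot \frac{1}{2^{10}} \binom{10}{x} = \frac{1}{8},$$
which is an instance of the law of total probability $E\bigl(P(A \mid X)\bigr) = P(A)$.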
Thus, $P(Y=0 \mid X=1)$ may be treated as the value of the random variable $P(Y=0 \mid X)$ corresponding to X = 1. On the other hand, $P(Y=0 \mid X=1)$ is well-defined irrespective of other possible values of X.
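The coin-tossing example is small enough to check by brute force. The following is a minimal sketch, not part of the original text: it enumerates all 2^10 equally likely outcomes, counts to obtain P(Y = 0 | X = x), compares the result with the counting formula C(7, x) / C(10, x), and verifies the law of total probability. The helper names `cond_prob_y0` and `p_x` are chosen here for illustration only.

```python
from fractions import Fraction
from itertools import product
from math import comb

# All 2**10 equally likely outcomes of 10 fair coin tosses (1 = heads).
outcomes = list(product((0, 1), repeat=10))

def cond_prob_y0(x):
    """P(Y = 0 | X = x), obtained by counting outcomes."""
    with_x = [w for w in outcomes if sum(w) == x]
    with_x_and_y0 = [w for w in with_x if sum(w[:3]) == 0]
    return Fraction(len(with_x_and_y0), len(with_x))

def p_x(x):
    """P(X = x) for X ~ Binomial(10, 1/2)."""
    return Fraction(comb(10, x), 2**10)

# The counted values agree with C(7, x) / C(10, x); comb(7, x) is 0 for x > 7.
for x in range(11):
    assert cond_prob_y0(x) == Fraction(comb(7, x), comb(10, x))

# Law of total probability: E[P(Y = 0 | X)] = P(Y = 0).
total = sum(cond_prob_y0(x) * p_x(x) for x in range(11))

print(cond_prob_y0(1))  # 7/10
print(total)            # 1/8
```

Exact rational arithmetic (`fractions.Fraction`) is used instead of floating point so that the equalities can be checked exactly.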
Conditional expectation
Given that X = 1, the conditional expectation of the random variable Y is $E(Y \mid X=1) = \frac{3}{10}$. More generally,
$$E(Y \mid X=x) = \frac{3}{10}\,x, \qquad x = 0, \dots, 10.$$
(In this example it is a linear function of x, but in general it is nonlinear.) One may also treat the conditional expectation as a random variable, a function of the random variable X, namely,
$$E(Y \mid X) = \frac{3}{10}\,X.$$
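The expectation of this random variable is equal to the (unconditional) expectation of Y,
$$E\bigl(E(Y \mid X)\bigr) = \sum_x E(Y \mid X=x)\, P(X=x) = E(Y),$$
namely,
$$\sum_{x=0}^{10} \frac{3}{10}\,x \cdot \frac{1}{2^{10}} \binom{10}{x} = \frac{3}{2},$$
or simply
$$E\Bigl(\frac{3}{10}\,X\Bigr) = \frac{3}{10}\,E(X) = \frac{3}{10} \cdot 5 = \frac{3}{2},$$
which is an instance of the law of total expectation $E\bigl(E(Y \mid X)\bigr) = E(Y)$.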
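The same brute-force approach illustrates the law of total expectation. This sketch, again an illustration rather than part of the original text, computes E(Y | X = x) as the average of Y over the outcomes with exactly x heads, checks that it equals 3x/10, and compares the weighted average of these values with E(Y). The names `cond_exp_y`, `e_of_cond_exp` and `e_y` are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb

# All 2**10 equally likely outcomes of 10 fair coin tosses (1 = heads).
outcomes = list(product((0, 1), repeat=10))

def cond_exp_y(x):
    """E(Y | X = x): average of Y over outcomes with exactly x heads."""
    ys = [sum(w[:3]) for w in outcomes if sum(w) == x]
    return Fraction(sum(ys), len(ys))

# In this example the conditional expectation is the linear function 3x/10.
for x in range(11):
    assert cond_exp_y(x) == Fraction(3 * x, 10)

# E[E(Y | X)]: weight each E(Y | X = x) by P(X = x) = C(10, x) / 2**10.
e_of_cond_exp = sum(
    cond_exp_y(x) * Fraction(comb(10, x), 2**10) for x in range(11)
)

# E(Y), computed directly over all outcomes.
e_y = Fraction(sum(sum(w[:3]) for w in outcomes), len(outcomes))

print(e_of_cond_exp, e_y)  # 3/2 3/2
```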
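The random variable $E(Y \mid X)$ is the best predictor of Y given X. That is, it minimizes the mean square error $E\bigl(Y - f(X)\bigr)^2$ on the class of all random variables of the form f(X). This class of random variables remains intact if X is replaced, say, with 2X. Thus, $E(Y \mid 2X) = E(Y \mid X)$. It does not mean that $E(Y \mid 2X) = \frac{3}{10} \cdot 2X$; rather, $E(Y \mid 2X) = \frac{3}{20} \cdot 2X = \frac{3}{10}\,X$. In particular, $E(Y \mid 2X=2) = \frac{3}{10}$. More generally, $E(Y \mid g(X)) = E(Y \mid X)$ for every function g that is one-to-one on the set of all possible values of X.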
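The invariance under one-to-one transformations and the best-predictor property can also be checked numerically. The sketch below is an illustration under stated assumptions: `cond_exp_given` and `mse` are hypothetical helpers, and the map g(x) = (x - 5)^2 is chosen here only as an example of a function that is not one-to-one on {0, ..., 10}. Conditioning on a random variable V is implemented by averaging Y over each group of outcomes that share the same value of V.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# All 2**10 equally likely outcomes, with X and Y evaluated on each one.
outcomes = list(product((0, 1), repeat=10))
X = [sum(w) for w in outcomes]
Y = [sum(w[:3]) for w in outcomes]

def cond_exp_given(values):
    """E(Y | V) evaluated at every outcome, where V takes `values`."""
    groups = defaultdict(list)
    for v, y in zip(values, Y):
        groups[v].append(y)
    means = {v: Fraction(sum(ys), len(ys)) for v, ys in groups.items()}
    return [means[v] for v in values]

def mse(prediction):
    """Mean square error E(Y - prediction)^2 over all outcomes."""
    return Fraction(sum((y - p) ** 2 for y, p in zip(Y, prediction))) / len(Y)

best = cond_exp_given(X)                            # E(Y | X)
same = cond_exp_given([2 * x for x in X])           # E(Y | 2X)
coarse = cond_exp_given([(x - 5) ** 2 for x in X])  # g is not one-to-one

assert best == [Fraction(3 * x, 10) for x in X]
assert same == best    # a one-to-one g leaves E(Y | g(X)) unchanged
assert coarse != best  # a non-injective g loses information

# The mistaken formula (3/10) * 2X is a worse predictor than E(Y | X).
mistaken = [Fraction(3, 10) * 2 * x for x in X]
assert mse(best) < mse(mistaken)
assert mse(best) < mse(coarse)

print(mse(best))  # 21/40
```

Grouping by the value of V, rather than by a formula in x, reflects the point of the paragraph above: the conditional expectation depends only on which outcomes the condition distinguishes, not on how those values are labeled.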