From conditional probability to conditional distribution to conditional expectation, and back
I can’t count how many times I have looked up the formal (measure-theoretic) definitions of conditional probability distribution or conditional expectation (even though it’s not that hard). Another such occasion was yesterday. This time I took some notes.
From conditional probability → to conditional distribution → to conditional expectation
Let $X$ and $Y$ be two real-valued random variables.
Conditional probability
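For a fixed set $B$, (Feller, 1966, p. 157) defines the conditional probability of an event $\{Y \in B\}$ for given $X$ as follows.
By $P\{Y \in B \mid X\}$ (in words, “a conditional probability of the event $\{Y \in B\}$ for given $X$”) is meant a function $q(X, B)$ such that for every set $A$
$$P\{X \in A,\, Y \in B\} = \int_A q(x, B)\, \mu\{dx\},$$
where $\mu$ is the marginal distribution of $X$.
(Here $A$ and $B$ are both Borel sets on $\mathbb{R}$.)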
That is, the conditional probability $q(x, B)$ can be defined as something that, when integrated over $A$ with respect to the marginal distribution of $X$, results in the joint probability of $\{X \in A\}$ and $\{Y \in B\}$.
Moreover, note that if $A = \mathbb{R}$ then the above formula yields $P\{Y \in B\}$, the marginal probability of the event $\{Y \in B\}$.
Example
For example, if the joint distribution of two random variables $X$ and $Y$ is the following bivariate normal distribution
then by sitting down with a pen and paper for some amount of time, it is not hard to verify that the function
in this case satisfies the above definition of $P\{Y \in B \mid X\}$.
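As a numerical sanity check of the definition, the following sketch assumes a standard bivariate normal pair $(X, Y)$ with correlation `RHO = 0.6`. This is an illustrative choice and not necessarily the distribution of the example above. For that assumed distribution, $Y$ given $X = x$ is normal with mean $\rho x$ and variance $1 - \rho^2$, which gives a candidate $q(x, B)$ for sets of the form $B = (-\infty, b]$. The code compares the integral of $q(x, B)$ over $A = (-\infty, a]$ against the joint probability obtained by integrating the joint density directly. The names `RHO`, `LOWER`, `q`, `joint_probability` and `integrated_q` are introduced here for illustration only.

```python
import math

RHO = 0.6     # assumed correlation, chosen only for illustration
LOWER = -8.0  # stands in for -infinity; the probability mass below is negligible


def simpson(f, lo, hi, n=400):
    """Composite Simpson rule on [lo, hi] with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def std_normal_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def joint_pdf(x, y, rho=RHO):
    """Density of the standard bivariate normal with correlation rho."""
    z = (x * x - 2 * rho * x * y + y * y) / (1 - rho * rho)
    return math.exp(-0.5 * z) / (2 * math.pi * math.sqrt(1 - rho * rho))


def q(x, b, rho=RHO):
    """Candidate q(x, B) for B = (-inf, b], using Y | X = x ~ N(rho * x, 1 - rho^2)."""
    return std_normal_cdf((b - rho * x) / math.sqrt(1 - rho * rho))


def joint_probability(a, b):
    """P{X <= a, Y <= b} by direct double integration of the joint density."""
    def inner(x):
        return simpson(lambda y: joint_pdf(x, y), LOWER, b)
    return simpson(inner, LOWER, a)


def integrated_q(a, b):
    """Integral of q(x, B) over A = (-inf, a] w.r.t. the N(0, 1) marginal of X."""
    return simpson(lambda x: q(x, b) * std_normal_pdf(x), LOWER, a)


if __name__ == "__main__":
    a, b = 0.5, 1.0
    print("joint probability :", joint_probability(a, b))
    print("integral of q     :", integrated_q(a, b))
```

The two printed numbers should agree up to the quadrature error. Taking `a` large (so that $A$ is effectively all of $\mathbb{R}$) makes `integrated_q` approach the marginal probability $P\{Y \le b\}$.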
Conditional distribution
Later on (Feller, 1966, p. 159) follows up with the notion of conditional probability distribution:
By a conditional probability distribution of $Y$ for given $X$ is meant a function $q$ of two variables, a point $x$ and a set $B$, such that
(i) for a fixed set $B$,
$$q(X, B) = P\{Y \in B \mid X\}$$
is a conditional probability of the event $\{Y \in B\}$ for given $X$;
(ii) $q$ is for each $x$ a probability distribution.
It is also pointed out that
In effect a conditional probability distribution is a family of ordinary probability distributions and so the whole theory carries over without change.
(Feller, 1966)
When I first came across this viewpoint, I found it incredibly enlightening to regard the conditional probability distribution as a family of ordinary probability distributions.
Example
For example, assume that $X$ is an integer-valued and nonnegative random variable, and that the conditional probability distribution of $Y$ for given $X = x$ is an F-distribution (denoted ) with and degrees of freedom. Then the conditional probability distribution of $Y$ can be regarded as a family of probability distributions for , whose probability density functions look like this:
In addition, as pointed out above, if we know the marginal distribution of $X$, then the conditional probability distribution of $Y$ can be used to obtain the marginal probability distribution of $Y$, or to randomly sample from the marginal distribution. Practically it means that if we randomly generate a value of $X$ according to its probability distribution, and use this value to randomly generate a value of $Y$ according to the conditional distribution of $Y$ for the given $X$, then the observations resulting from this procedure follow the marginal distribution of $Y$. Continuing the previous example, assume that $X$ follows a binomial distribution with parameters and . Then the described simulation procedure estimates the following shape for the probability density function of the marginal distribution of $Y$:
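The two-stage sampling procedure can be sketched in a few lines. Everything numeric here is an assumption made for illustration: $X$ is taken to be binomial with `n=10` and `p=0.5`, and the conditional distribution of $Y$ for given $X = x$ is taken to be an F-distribution with `x + 1` and `x + 5` degrees of freedom. These values are hypothetical and are not claimed to be the ones used in the example. The F draw uses the standard representation as a ratio of two independent chi-square variables, each divided by its degrees of freedom.

```python
import random


def sample_binomial(rng, n, p):
    """One Binomial(n, p) draw as a sum of n Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n))


def sample_f(rng, d1, d2):
    """One F(d1, d2) draw; a chi-square with k d.o.f. is Gamma(shape=k/2, scale=2)."""
    numerator = rng.gammavariate(d1 / 2, 2) / d1
    denominator = rng.gammavariate(d2 / 2, 2) / d2
    return numerator / denominator


def sample_marginal_y(n_samples, n=10, p=0.5, seed=0):
    """Draw X first, then Y from the conditional distribution given that X."""
    rng = random.Random(seed)
    ys = []
    for _ in range(n_samples):
        x = sample_binomial(rng, n, p)           # step 1: X ~ Binomial(n, p) (assumed)
        ys.append(sample_f(rng, x + 1, x + 5))   # step 2: Y | X = x ~ F(x+1, x+5) (assumed)
    return ys


if __name__ == "__main__":
    ys = sample_marginal_y(50_000)
    # Coarse text histogram of the simulated marginal distribution of Y.
    edges = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 5.0]
    for lo, hi in zip(edges, edges[1:]):
        share = sum(lo <= y < hi for y in ys) / len(ys)
        print(f"[{lo:3.1f}, {hi:3.1f})  {share:6.3f}")
```

The resulting sample is a draw from the mixture of the conditional distributions, weighted by the marginal distribution of $X$, which is exactly the marginal distribution of $Y$.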
Conditional expectation
Finally, (Feller, 1966, p. 159) introduces the notion of conditional expectation. By the above, for a given value $x$ we have that
$$q(x, B) = P\{Y \in B \mid X = x\} \quad \text{for all } B \in \mathcal{B}$$
(here $\mathcal{B}$ denotes the Borel $\sigma$-algebra on $\mathbb{R}$), and therefore, a conditional probability distribution can be viewed as a family of ordinary probability distributions (represented by $q(x, \cdot)$ for different $x$s). Thus, as (Feller, 1966, p. 159) points out, if $q$ is given then the conditional expectation “introduces a new notation rather than a new concept.”
A conditional expectation $E(Y \mid X)$ is a function of $X$ assuming at $x$ the value
$$E(Y \mid x) = \int_{-\infty}^{\infty} y\, q(x, dy),$$
provided the integral converges.
Note that, because $E(Y \mid X)$ is a function of $X$, it is a random variable, whose value at an individual point $x$ is given by the above definition. Moreover, from the above definitions of conditional probability and conditional expectation it follows that
$$E(Y) = \int_{-\infty}^{\infty} E(Y \mid x)\, \mu\{dx\} = E(E(Y \mid X)).$$
Example [cont.]
We continue with the last example. From the properties of the F-distribution we know that under this example’s assumptions on the conditional distribution, it holds that
A rather boring strictly decreasing function of $x$ converging to $1$ as $x \to \infty$.
Thus, under the example’s assumption on the distribution of $X$, the conditional expectation $E(Y \mid X)$ is a discrete random variable, which has nonzero probability mass at the values and .
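To make the "discrete random variable" point concrete, here is a small sketch that tabulates the distribution of $E(Y \mid X)$ exactly. It relies on the known fact that an F-distribution with $d_1$ and $d_2$ degrees of freedom has mean $d_2 / (d_2 - 2)$ for $d_2 > 2$. The specific choices are assumptions for illustration only: $X$ binomial with `N = 10` and `P = 1/2`, and $d_2 = x + 5$ as the second degrees-of-freedom parameter. They are hypothetical and not claimed to match the example's actual parameters.

```python
from fractions import Fraction
from math import comb

N, P = 10, Fraction(1, 2)  # hypothetical binomial parameters for X


def binomial_pmf(x, n=N, p=P):
    """P{X = x} for X ~ Binomial(n, p), as an exact fraction."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def cond_mean(x):
    """E(Y | X = x) when Y | X = x is F with second d.o.f. d2 = x + 5 (assumed)."""
    d2 = x + 5
    return Fraction(d2, d2 - 2)  # mean of an F-distribution, valid since d2 > 2


# Distribution of the random variable E(Y | X): value -> probability.
pmf_of_cond_mean = {}
for x in range(N + 1):
    value = cond_mean(x)
    pmf_of_cond_mean[value] = pmf_of_cond_mean.get(value, 0) + binomial_pmf(x)

# E(E(Y | X)); by the law of total expectation this equals E(Y).
expected_y = sum(value * prob for value, prob in pmf_of_cond_mean.items())

if __name__ == "__main__":
    for value, prob in sorted(pmf_of_cond_mean.items(), reverse=True):
        print(f"E(Y | X) = {str(value):>5}  with probability {float(prob):.4f}")
    print("E(E(Y | X)) =", float(expected_y))
```

Under these assumptions the conditional mean is strictly decreasing in $x$, so each value of $X$ maps to a distinct value of $E(Y \mid X)$, and the probability masses are simply the binomial probabilities.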
From conditional expectation → to conditional probability
An alternative approach is to define the conditional expectation first, and then to define conditional probability as the conditional expectation of the indicator function. This approach seems less intuitive to me. However, it is more flexible and more general, as we see below.
Conditional expectation
A definition in 2D
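Let $X$ and $Y$ be two real-valued random variables, and let $\mathcal{B}$ denote the Borel $\sigma$-algebra on $\mathbb{R}$. Recall that $X$ and $Y$ can be represented as mappings $X: \Omega \to \mathbb{R}$ and $Y: \Omega \to \mathbb{R}$ over some measure space $(\Omega, \mathcal{A}, P)$. We can define $E(Y \mid X)$, the conditional expectation of $Y$ given $X$, as follows.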
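A measurable function $u(X)$ is the conditional expectation of $Y$ for given $X$, i.e.,
$$u(X) = E(Y \mid X),$$
if for all sets $A \in \mathcal{B}$ it holds that
$$\int_{X^{-1}(A)} Y(\omega)\, P(d\omega) = \int_A u(x)\, \mu\{dx\},$$
where $\mu$ is the marginal probability distribution of $X$.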
Interpretation in 2D
If $X$ and $Y$ are real-valued and one-dimensional, then the pair $(X, Y)$ can be viewed as a random vector in the plane. Each set $\{X \in A\}$ consists of parallels to the $y$-axis, and we can define a $\sigma$-algebra induced by $X$ as the collection of all sets $\{X \in A\}$ on the plane, where $A$ is a Borel set on the line. The collection of all such sets forms a $\sigma$-algebra $\sigma(X)$ on the plane, which is contained in the $\sigma$-algebra of all Borel sets in $\mathbb{R}^2$. $\sigma(X)$ is called the $\sigma$-algebra generated by the random variable $X$.
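Then $E(Y \mid X)$ can be equivalently defined as a random variable, measurable with respect to the $\sigma$-algebra generated by $X$, such that
$$E(Y \cdot \mathbf{1}_B) = E(E(Y \mid X) \cdot \mathbf{1}_B)$$
for every set $B$ in the $\sigma$-algebra generated by $X$, where $\mathbf{1}_B$ denotes the indicator function of the set $B$.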
A more general definition of conditional expectation
The last paragraph illustrates that one could generalize the definition of the conditional expectation of $Y$ given $X$ to the conditional expectation of $Y$ given an arbitrary $\sigma$-algebra $\mathcal{G}$ (not necessarily the $\sigma$-algebra generated by $X$). This leads to the following general definition, which is stated in (Feller, 1966, pp. 160-161) in a slightly different notation.
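Let $Y$ be a random variable, and let $\mathcal{G}$ be a $\sigma$-algebra of sets.

A random variable $U$ is called a conditional expectation of $Y$ relative to $\mathcal{G}$, or $E(Y \mid \mathcal{G})$, if it is $\mathcal{G}$-measurable and
$$E(Y \cdot \mathbf{1}_B) = E(U \cdot \mathbf{1}_B) \quad \text{for all } B \in \mathcal{G}.$$

If $\mathcal{G}$ is the $\sigma$-algebra generated by a random variable $X$, then $E(Y \mid \mathcal{G}) = E(Y \mid X)$.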
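On a finite sample space the general definition becomes very tangible, because a $\sigma$-algebra is then generated by a partition and the conditional expectation is just a block-wise weighted average. The sketch below uses a made-up six-point probability space, an arbitrary random variable `Y`, and an arbitrary partition. All of these are hypothetical and chosen only for illustration. It builds a candidate `U` and checks the defining identity on every set of the generated $\sigma$-algebra.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite probability space: six equally likely outcomes.
OMEGA = range(6)
PROB = {w: Fraction(1, 6) for w in OMEGA}
Y = {0: 1, 1: 4, 2: 2, 3: 8, 4: 5, 5: 7}  # an arbitrary random variable

# The sigma-algebra G is generated by this partition of OMEGA:
# its members are exactly the unions of blocks.
PARTITION = [{0, 1}, {2, 3, 4}, {5}]


def expectation(f, event=None):
    """E(f * 1_event); the event defaults to the whole sample space."""
    outcomes = OMEGA if event is None else event
    return sum(f[w] * PROB[w] for w in outcomes)


def conditional_expectation(f, partition):
    """U = E(f | G): on each block, the probability-weighted average of f."""
    u = {}
    for block in partition:
        mass = sum(PROB[w] for w in block)
        average = expectation(f, block) / mass
        for w in block:
            u[w] = average  # constant on blocks, hence G-measurable
    return u


def sigma_algebra(partition):
    """All unions of blocks, including the empty set and the whole space."""
    sets = []
    for k in range(len(partition) + 1):
        for blocks in combinations(partition, k):
            sets.append(set().union(*blocks))
    return sets


U = conditional_expectation(Y, PARTITION)

if __name__ == "__main__":
    for B in sigma_algebra(PARTITION):
        print(sorted(B), expectation(Y, B), expectation(U, B))
```

Each printed row shows a set of the $\sigma$-algebra followed by the two sides of the defining identity, which coincide. Note that `U` differs from `Y` pointwise but cannot be distinguished from it by averaging over sets in the coarser $\sigma$-algebra.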
Back to conditional probability and conditional distributions
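Let $\mathbf{1}_{\{Y \in B\}}$ be a random variable that is equal to one if and only if $Y \in B$, and zero otherwise. The conditional probability of $\{Y \in B\}$ given $X$ can be defined in terms of a conditional expectation as
$$P\{Y \in B \mid X\} = E(\mathbf{1}_{\{Y \in B\}} \mid X).$$
Under certain regularity conditions the above defines the conditional probability distribution of $Y$ for given $X$.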
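In the discrete case the route from conditional expectation back to conditional probability can be checked by brute force. The sketch below uses a small made-up joint distribution of $(X, Y)$ (the table `JOINT` is hypothetical), computes $q(x, B)$ as the conditional average of the indicator of $\{Y \in B\}$, and verifies the defining identity of conditional probability from the first section for every pair of sets $A$ and $B$. With a discrete marginal, the integral over $A$ reduces to a sum.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical joint pmf of (X, Y) on {0, 1, 2} x {0, 1}.
JOINT = {
    (0, 0): Fraction(1, 10), (0, 1): Fraction(2, 10),
    (1, 0): Fraction(3, 10), (1, 1): Fraction(1, 10),
    (2, 0): Fraction(1, 10), (2, 1): Fraction(2, 10),
}
X_VALUES = sorted({x for x, _ in JOINT})
Y_VALUES = sorted({y for _, y in JOINT})


def mu(x):
    """Marginal pmf of X."""
    return sum(p for (xx, _), p in JOINT.items() if xx == x)


def q(x, B):
    """q(x, B) = E(1{Y in B} | X = x), the average of the indicator given X = x."""
    return sum(p for (xx, y), p in JOINT.items() if xx == x and y in B) / mu(x)


def subsets(values):
    """All subsets of a finite collection of values."""
    return [set(c) for c in chain.from_iterable(
        combinations(values, k) for k in range(len(values) + 1))]


def defining_identity_holds(B):
    """Check sum_{x in A} q(x, B) mu(x) == P{X in A, Y in B} for every A."""
    for A in subsets(X_VALUES):
        lhs = sum(q(x, B) * mu(x) for x in A)
        rhs = sum(p for (x, y), p in JOINT.items() if x in A and y in B)
        if lhs != rhs:
            return False
    return True


if __name__ == "__main__":
    for x in X_VALUES:
        print(x, [q(x, {y}) for y in Y_VALUES])
    print(all(defining_identity_holds(B) for B in subsets(Y_VALUES)))
```

For each fixed `x`, the values `q(x, {y})` sum to one over `y`, so `q(x, ·)` is an ordinary probability distribution, in line with the "family of distributions" viewpoint. In the discrete setting no regularity conditions are needed; they matter only in the general case.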
References
 Feller, W. (1966). An introduction to probability theory and its applications (Vol. 2). John Wiley & Sons.