Beliefs depend on the available information. This idea is formalized in probability theory by conditioning. Conditional probabilities, conditional expectations and conditional distributions are treated on three levels: discrete probabilities, probability density functions, and measure theory. Conditioning leads to a non-random result if the condition is completely specified; otherwise, if the condition is left random, the result of conditioning is also random.
This article concentrates on interrelations between various kinds of conditioning, shown mostly by examples.
Conditioning on the discrete level
Example. A fair coin is tossed 10 times; the random variable $X$ is the number of heads in these 10 tosses, and $Y$ is the number of heads in the first 3 tosses. Although $Y$ emerges before $X$, it may happen that someone knows $X$ but not $Y$.
Conditional probability
Given that $X=1$, the conditional probability of the event $Y=0$ is $\mathbb{P}(Y=0\mid X=1)=0.7$. More generally,
$$\mathbb{P}(Y=0\mid X=x)=\frac{\binom{7}{x}}{\binom{10}{x}}=\frac{7!\,(10-x)!}{(7-x)!\,10!}$$
for $x\le 7$; otherwise (for $7<x\le 10$), $\mathbb{P}(Y=0\mid X=x)=0$.
One may also treat the conditional probability as a random variable, a function of the random variable $X$, namely,
$$\mathbb{P}(Y=0\mid X)=\begin{cases}\binom{7}{X}/\binom{10}{X}&\text{for }X\le 7,\\0&\text{for }X>7.\end{cases}$$
The expectation of this random variable is equal to the (unconditional) probability,
$$\mathbb{E}\bigl(\mathbb{P}(Y=0\mid X)\bigr)=\sum_x \mathbb{P}(Y=0\mid X=x)\,\mathbb{P}(X=x)=\mathbb{P}(Y=0),$$
namely,
$$\sum_{x=0}^{7}\frac{\binom{7}{x}}{\binom{10}{x}}\cdot\frac{1}{2^{10}}\binom{10}{x}=\frac{1}{8},$$
which is an instance of the law of total probability.
Thus, $\mathbb{P}(Y=0\mid X=1)=0.7$ may be treated as the value of the random variable $\mathbb{P}(Y=0\mid X)$ corresponding to $X=1$. On the other hand, $\mathbb{P}(Y=0\mid X=1)=0.7$ is well-defined irrespective of other possible values of $X$.
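These identities are easy to verify numerically; the following is a minimal sketch in Python using only the standard library (the helper name `p_y0_given_x` is chosen here purely for illustration).

```python
from math import comb

# P(Y = 0 | X = x) for the 10-toss example: if the first 3 tosses show no
# heads, all x heads must fall among the last 7 tosses.
def p_y0_given_x(x):
    return comb(7, x) / comb(10, x) if x <= 7 else 0.0

# Law of total probability: sum_x P(Y=0 | X=x) P(X=x) should equal P(Y=0) = 1/8.
total = sum(p_y0_given_x(x) * comb(10, x) / 2**10 for x in range(11))
print(total, 1 / 8)  # both print 0.125
```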
Conditional expectation
Given that $X=1$, the conditional expectation of the random variable $Y$ is $\mathbb{E}(Y\mid X=1)=0.3$. More generally,
$$\mathbb{E}(Y\mid X=x)=\frac{3}{10}x$$
for $x=0,\dots,10$. (In this example it happens to be a linear function of $x$, but in general it is nonlinear.) One may also treat the conditional expectation as a random variable, a function of the random variable $X$, namely,
$$\mathbb{E}(Y\mid X)=\frac{3}{10}X.$$
The expectation of this random variable is equal to the (unconditional) expectation of $Y$,
$$\mathbb{E}\bigl(\mathbb{E}(Y\mid X)\bigr)=\sum_x \mathbb{E}(Y\mid X=x)\,\mathbb{P}(X=x)=\mathbb{E}(Y),$$
namely,
$$\sum_{x=0}^{10}\frac{3x}{10}\cdot\frac{1}{2^{10}}\binom{10}{x}=\frac{3}{2},$$
or simply
$$\mathbb{E}\Bigl(\frac{3}{10}X\Bigr)=\frac{3}{10}\mathbb{E}(X)=\frac{3}{10}\cdot 5=\frac{3}{2}\,,$$
which is an instance of the law of total expectation $\mathbb{E}(Y)=1.5$.
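The formula $\mathbb{E}(Y\mid X=x)=\tfrac{3}{10}x$ and the law of total expectation can be checked by brute-force enumeration of the $2^{10}$ equally likely outcomes; a small Python sketch:

```python
from itertools import product

# Enumerate all 2^10 equally likely outcomes; Y = heads among the first 3
# tosses, X = heads among all 10 tosses.
by_x = {}
for omega in product((0, 1), repeat=10):
    x, y = sum(omega), sum(omega[:3])
    by_x.setdefault(x, []).append(y)

for x, ys in sorted(by_x.items()):
    cond_exp = sum(ys) / len(ys)    # E(Y | X = x) by direct enumeration
    print(x, cond_exp, 0.3 * x)     # matches 3x/10

# Law of total expectation: E(E(Y|X)) = E(Y) = 1.5
print(sum(sum(ys) for ys in by_x.values()) / 2**10)
```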
The random variable $\mathbb{E}(Y\mid X)$ is the best predictor of $Y$ given $X$. That is, it minimizes the mean square error $\mathbb{E}\,(Y-f(X))^2$ on the class of all random variables of the form $f(X)$. This class of random variables remains intact if $X$ is replaced, say, with $2X$. Thus, $\mathbb{E}(Y\mid 2X)=\mathbb{E}(Y\mid X)$. It does not mean that $\mathbb{E}(Y\mid 2X)=\frac{3}{10}\cdot 2X$; rather, $\mathbb{E}(Y\mid 2X)=\frac{3}{20}\cdot 2X=\frac{3}{10}X$. In particular, $\mathbb{E}(Y\mid 2X=2)=0.3$. More generally, $\mathbb{E}(Y\mid g(X))=\mathbb{E}(Y\mid X)$ for every function $g$ that is one-to-one on the set of all possible values of $X$. The values of $X$ are irrelevant; what matters is the partition (denote it $\alpha_X$)
$$\Omega=\{X=x_1\}\uplus\{X=x_2\}\uplus\dots$$
of the sample space $\Omega$ into disjoint sets $\{X=x_n\}$. (Here $x_1,x_2,\dots$ are all possible values of $X$.) Given an arbitrary partition $\alpha$ of $\Omega$, one may define the random variable $\mathbb{E}(Y\mid\alpha)$. Still, $\mathbb{E}\bigl(\mathbb{E}(Y\mid\alpha)\bigr)=\mathbb{E}(Y)$.
Conditional probability may be treated as a special case of conditional expectation. Namely, $\mathbb{P}(A\mid X)=\mathbb{E}(I_A\mid X)$ if $I_A$ is the indicator of $A$. Therefore the conditional probability also depends on the partition $\alpha_X$ generated by $X$ rather than on $X$ itself; $\mathbb{P}(A\mid g(X))=\mathbb{P}(A\mid X)=\mathbb{P}(A\mid\alpha_X)$. On the other hand, conditioning on an event $B$ is well-defined, provided that $\mathbb{P}(B)\neq 0$, irrespective of any partition that may contain $B$ as one of several parts.
Conditional distribution
Given that $X=x$, the conditional distribution of $Y$ is
$$\mathbb{P}(Y=y\mid X=x)=\frac{\binom{3}{y}\binom{7}{x-y}}{\binom{10}{x}}=\frac{\binom{x}{y}\binom{10-x}{3-y}}{\binom{10}{3}}$$
for $0\le y\le\min(3,x)$. It is the hypergeometric distribution $H(x;3,7)$, or equivalently, $H(3;x,10-x)$. The corresponding expectation $0.3x$, obtained from the general formula $\tfrac{nR}{R+W}$ for $H(n;R,W)$, is nothing but the conditional expectation $\mathbb{E}(Y\mid X=x)=0.3x$.
Treating $\mathbb{P}(Y=\cdot\mid X=x)$ as a random distribution (a random vector in the four-dimensional space of all measures on $\{0,1,2,3\}$), one may take its expectation, getting the unconditional distribution of $Y$, the binomial distribution $\mathrm{Bin}(3,0.5)$. This fact amounts to the equality
$$\sum_{x=0}^{10}\mathbb{P}(Y=y\mid X=x)\,\mathbb{P}(X=x)=\mathbb{P}(Y=y)=\frac{1}{2^{3}}\binom{3}{y}$$
for $y=0,1,2,3$; this is just the law of total probability.
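A quick numerical check of this mixture identity (Python, standard library only; `p_y_given_x` is an illustrative helper name):

```python
from math import comb

def p_y_given_x(y, x):
    # Hypergeometric: which of the x heads fall among the first 3 tosses.
    if 0 <= y <= 3 and 0 <= x - y <= 7:
        return comb(3, y) * comb(7, x - y) / comb(10, x)
    return 0.0

for y in range(4):
    mixture = sum(p_y_given_x(y, x) * comb(10, x) / 2**10 for x in range(11))
    print(y, mixture, comb(3, y) / 2**3)   # law of total probability: Bin(3, 1/2)
```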
Conditioning on the level of densities
Example. A point of the sphere $x^2+y^2+z^2=1$ is chosen at random according to the uniform distribution on the sphere. The random variables $X$, $Y$, $Z$ are the coordinates of the random point. The joint density of $X$, $Y$, $Z$ does not exist (since the sphere is of zero volume), but the joint density $f_{X,Y}$ of $X$, $Y$ exists,
$$f_{X,Y}(x,y)=\begin{cases}\dfrac{1}{2\pi\sqrt{1-x^2-y^2}}&\text{if }x^2+y^2<1,\\0&\text{otherwise}.\end{cases}$$
(The density is non-constant because of the non-constant angle between the sphere and the plane.) The density of $X$ may be calculated by integration,
$$f_X(x)=\int_{-\infty}^{+\infty}f_{X,Y}(x,y)\,\mathrm{d}y=\int_{-\sqrt{1-x^2}}^{+\sqrt{1-x^2}}\frac{\mathrm{d}y}{2\pi\sqrt{1-x^2-y^2}}\,;$$
surprisingly, the result does not depend on $x$ in $(-1,1)$:
$$f_X(x)=\begin{cases}0.5&\text{for }-1<x<1,\\0&\text{otherwise},\end{cases}$$
which means that $X$ is distributed uniformly on $(-1,1)$. The same holds for $Y$ and $Z$ (and in fact, for $aX+bY+cZ$ whenever $a^2+b^2+c^2=1$).
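This uniformity is easy to confirm by simulation; a sketch assuming NumPy, using the standard trick of normalizing Gaussian vectors to obtain uniform points on the sphere:

```python
import numpy as np

rng = np.random.default_rng(0)
# uniform points on the unit sphere: normalize standard Gaussian vectors
v = rng.normal(size=(10**6, 3))
v /= np.linalg.norm(v, axis=1, keepdims=True)
x = v[:, 0]

# empirical density of X: should be close to 0.5 everywhere in (-1, 1)
hist, edges = np.histogram(x, bins=10, range=(-1, 1), density=True)
print(np.round(hist, 3))   # all entries near 0.5
```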
Conditional probability
Calculation
Given that $X=0.5$, the conditional probability of the event $Y\le 0.75$ is the integral of the conditional density,
$$\begin{aligned}
f_{Y\mid X=0.5}(y)&=\frac{f_{X,Y}(0.5,y)}{f_X(0.5)}=\begin{cases}\dfrac{1}{\pi\sqrt{0.75-y^2}}&\text{for }-\sqrt{0.75}<y<\sqrt{0.75},\\0&\text{otherwise};\end{cases}\\
\mathbb{P}(Y\le 0.75\mid X=0.5)&=\int_{-\infty}^{0.75}f_{Y\mid X=0.5}(y)\,\mathrm{d}y=\int_{-\sqrt{0.75}}^{0.75}\frac{\mathrm{d}y}{\pi\sqrt{0.75-y^2}}=\frac{1}{2}+\frac{1}{\pi}\arcsin\sqrt{0.75}=\frac{5}{6}\,.
\end{aligned}$$
More generally,
$$\mathbb{P}(Y\le y\mid X=x)=\frac{1}{2}+\frac{1}{\pi}\arcsin\frac{y}{\sqrt{1-x^2}}$$
for all $x$ and $y$ such that $-1<x<1$ (otherwise the denominator $\sqrt{1-x^2}$ vanishes) and $-\sqrt{1-x^2}<y<\sqrt{1-x^2}$ (otherwise the conditional probability degenerates to 0 or 1). One may also treat the conditional probability as a random variable, a function of the random variable $X$, namely,
$$\mathbb{P}(Y\le y\mid X)=\begin{cases}0&\text{for }X^2\ge 1-y^2\text{ and }y<0,\\\frac{1}{2}+\frac{1}{\pi}\arcsin\frac{y}{\sqrt{1-X^2}}&\text{for }X^2<1-y^2,\\1&\text{for }X^2\ge 1-y^2\text{ and }y>0.\end{cases}$$
The expectation of this random variable is equal to the (unconditional) probability,
$$\mathbb{E}\bigl(\mathbb{P}(Y\le y\mid X)\bigr)=\int_{-\infty}^{+\infty}\mathbb{P}(Y\le y\mid X=x)\,f_X(x)\,\mathrm{d}x=\mathbb{P}(Y\le y)\,,$$
which is an instance of the law of total probability.
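The arcsine formula and the law of total probability can be checked numerically; a sketch assuming NumPy (the grid size and the clipping trick are implementation choices, not part of the article):

```python
import numpy as np

def cond_cdf(y, x):
    # P(Y <= y | X = x); clipping the argument of arcsin handles the
    # degenerate cases (conditional probability 0 or 1) automatically.
    r = np.sqrt(1.0 - x * x)
    return 0.5 + np.arcsin(np.clip(y / r, -1.0, 1.0)) / np.pi

# Law of total probability: integrate against f_X(x) = 0.5 on (-1, 1).
# Since the interval has length 2 and f_X = 0.5, the integral is just the mean.
xs = np.linspace(-1 + 1e-9, 1 - 1e-9, 200001)
for y in (-0.5, 0.0, 0.75):
    lhs = np.mean(cond_cdf(y, xs))
    print(y, lhs, (y + 1) / 2)          # P(Y <= y), since Y is uniform on (-1, 1)

print(cond_cdf(0.75, 0.5), 5 / 6)       # the worked example above
```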
Interpretation
The conditional probability $\mathbb{P}(Y\le 0.75\mid X=0.5)$ cannot be interpreted as $\mathbb{P}(Y\le 0.75,\ X=0.5)/\mathbb{P}(X=0.5)$, since the latter gives 0/0. Accordingly, $\mathbb{P}(Y\le 0.75\mid X=0.5)$ cannot be interpreted via empirical frequencies, since the exact value $X=0.5$ has no chance to appear at random, not even once during an infinite sequence of independent trials.
The conditional probability can be interpreted as a limit,
$$\mathbb{P}(Y\le 0.75\mid X=0.5)=\lim_{\varepsilon\to 0+}\mathbb{P}(Y\le 0.75\mid 0.5-\varepsilon<X<0.5+\varepsilon)\,.$$
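The limit can be illustrated by Monte Carlo simulation, conditioning on a thin slab around $X=0.5$; a sketch assuming NumPy (sample size and slab widths are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
v = rng.normal(size=(2 * 10**6, 3))
v /= np.linalg.norm(v, axis=1, keepdims=True)   # uniform points on the sphere
x, y = v[:, 0], v[:, 1]

for eps in (0.1, 0.03, 0.01):
    near = np.abs(x - 0.5) < eps        # condition on a thin slab around X = 0.5
    est = np.mean(y[near] <= 0.75)
    print(eps, round(est, 4))           # approaches 5/6 = 0.8333... as eps shrinks
```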
Conditional expectation
The conditional expectation $\mathbb{E}(Y\mid X=0.5)$ is of little interest; it vanishes just by symmetry. It is more interesting to calculate $\mathbb{E}(|Z|\mid X=0.5)$, treating $|Z|$ as a function of $X$, $Y$:
$$\begin{aligned}
&|Z|=h(X,Y)=\sqrt{1-X^2-Y^2}\,;\\
&\mathbb{E}(|Z|\mid X=0.5)=\int_{-\infty}^{+\infty}h(0.5,y)\,f_{Y\mid X=0.5}(y)\,\mathrm{d}y=\int_{-\sqrt{0.75}}^{+\sqrt{0.75}}\sqrt{0.75-y^2}\cdot\frac{\mathrm{d}y}{\pi\sqrt{0.75-y^2}}=\frac{2}{\pi}\sqrt{0.75}\,.
\end{aligned}$$
More generally,
$$\mathbb{E}(|Z|\mid X=x)=\frac{2}{\pi}\sqrt{1-x^2}$$
for $-1<x<1$. One may also treat the conditional expectation as a random variable, a function of the random variable $X$, namely,
$$\mathbb{E}(|Z|\mid X)=\frac{2}{\pi}\sqrt{1-X^2}\,.$$
The expectation of this random variable is equal to the (unconditional) expectation of $|Z|$,
$$\mathbb{E}\bigl(\mathbb{E}(|Z|\mid X)\bigr)=\int_{-\infty}^{+\infty}\mathbb{E}(|Z|\mid X=x)\,f_X(x)\,\mathrm{d}x=\mathbb{E}(|Z|)\,,$$
namely,
$$\int_{-1}^{+1}\frac{2}{\pi}\sqrt{1-x^2}\cdot\frac{\mathrm{d}x}{2}=\frac{1}{2}\,,$$
which is an instance of the law of total expectation $\mathbb{E}(|Z|)=\tfrac{1}{2}$.
The random variable $\mathbb{E}(|Z|\mid X)$ is the best predictor of $|Z|$ given $X$. That is, it minimizes the mean square error $\mathbb{E}\,(|Z|-f(X))^2$ on the class of all random variables of the form $f(X)$. Similarly to the discrete case, $\mathbb{E}(|Z|\mid g(X))=\mathbb{E}(|Z|\mid X)$ for every measurable function $g$ that is one-to-one on $(-1,1)$.
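A simulation check of this conditional expectation, again assuming NumPy (the slab half-width 0.01 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(2)
v = rng.normal(size=(2 * 10**6, 3))
v /= np.linalg.norm(v, axis=1, keepdims=True)   # uniform points on the sphere
x, abs_z = v[:, 0], np.abs(v[:, 2])

for x0 in (-0.8, 0.0, 0.5):
    near = np.abs(x - x0) < 0.01
    print(x0, round(abs_z[near].mean(), 3),
          round(2 / np.pi * np.sqrt(1 - x0**2), 3))   # (2/pi) sqrt(1 - x^2)

# law of total expectation: E(|Z|) = 1/2
print(round(abs_z.mean(), 3))
```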
Conditional distribution
Given that $X=x$, the conditional distribution of $Y$, given by the density $f_{Y\mid X=x}(y)$, is the (rescaled) arcsine distribution; its cumulative distribution function is
$$F_{Y\mid X=x}(y)=\mathbb{P}(Y\le y\mid X=x)=\frac{1}{2}+\frac{1}{\pi}\arcsin\frac{y}{\sqrt{1-x^2}}$$
for all $x$ and $y$ such that $x^2+y^2<1$. The corresponding expectation of $h(x,Y)$ is nothing but the conditional expectation $\mathbb{E}(h(X,Y)\mid X=x)$. The mixture of these conditional distributions, taken for all $x$ (according to the distribution of $X$), is the unconditional distribution of $Y$. This fact amounts to the equalities
$$\begin{aligned}\int_{-\infty}^{+\infty}f_{Y\mid X=x}(y)\,f_X(x)\,\mathrm{d}x&=f_Y(y)\,,\\\int_{-\infty}^{+\infty}F_{Y\mid X=x}(y)\,f_X(x)\,\mathrm{d}x&=F_Y(y)\,,\end{aligned}$$
the latter being the instance of the law of total probability mentioned above.
What conditioning is not
On the discrete level, conditioning is possible only if the condition is of nonzero probability (one cannot divide by zero). On the level of densities, conditioning on $X=x$ is possible even though $\mathbb{P}(X=x)=0$. This success may create the illusion that conditioning is always possible. Regrettably, it is not, for several reasons presented below.
Geometric intuition: caution
The result $\mathbb{P}(Y\le 0.75\mid X=0.5)=5/6$, mentioned above, is geometrically evident in the following sense. The points $(x,y,z)$ of the sphere $x^2+y^2+z^2=1$ satisfying the condition $x=0.5$ form a circle $y^2+z^2=0.75$ of radius $\sqrt{0.75}$ on the plane $x=0.5$. The inequality $y\le 0.75$ holds on an arc. The length of the arc is 5/6 of the length of the circle, which is why the conditional probability is equal to 5/6.
This successful geometric explanation may create the illusion that the following question is trivial.
- A point of a given sphere is chosen at random (uniformly). Given that the point lies on a given plane, what is its conditional distribution?
It may seem evident that the conditional distribution must be uniform on the given circle (the intersection of the given sphere and the given plane). Sometimes it really is, but in general it is not. In particular, $Z$ is distributed uniformly on $(-1,+1)$ and is independent of the ratio $Y/X$; thus, $\mathbb{P}(Z\le 0.5\mid Y/X)=0.75$. On the other hand, the inequality $z\le 0.5$ holds on an arc of the circle $x^2+y^2+z^2=1$, $y=cx$ (for any given $c$). The length of the arc is 2/3 of the length of the circle. However, the conditional probability is 3/4, not 2/3. This is a manifestation of the classical Borel paradox.[1][2]
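The discrepancy between 3/4 and 2/3 can be seen in a simulation that conditions on the value of the ratio $Y/X$ lying in a narrow window; a sketch assuming NumPy (the value $c=1$ and the window width are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
v = rng.normal(size=(2 * 10**6, 3))
v /= np.linalg.norm(v, axis=1, keepdims=True)   # uniform points on the sphere
x, y, z = v.T

# Condition on the ratio Y/X being close to a fixed value c (here c = 1,
# i.e. the plane y = x); Z is independent of Y/X, so the answer is 3/4.
near = np.abs(y / x - 1.0) < 0.02
print(round(np.mean(z[near] <= 0.5), 3))   # about 0.75, not the arc fraction 2/3
```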
"Appeals to symmetry can be misleading if not formalized as invariance arguments." Pollard[3]
Another example. A random rotation of the three-dimensional space is a rotation by a random angle around a random axis. Geometric intuition suggests that the angle is independent of the axis and distributed uniformly. However, the latter is wrong; small values of the angle are less probable.
The limiting procedure
Given an event $B$ of zero probability, the formula $\mathbb{P}(A\mid B)=\mathbb{P}(A\cap B)/\mathbb{P}(B)$ is useless; however, one can try $\mathbb{P}(A\mid B)=\lim_{n\to\infty}\mathbb{P}(A\cap B_n)/\mathbb{P}(B_n)$ for an appropriate sequence of events $B_n$ of nonzero probability such that $B_n\downarrow B$ (that is, $B_1\supset B_2\supset\dots$ and $B_1\cap B_2\cap\dots=B$). One example is given above. Two more examples are the Brownian bridge and the Brownian excursion.
In the latter two examples the law of total probability is irrelevant, since only a single event (the condition) is given. In contrast, in the example above the law of total probability applies, since the event $X=0.5$ is included in a family of events $X=x$, where $x$ runs over $(-1,1)$, and these events form a partition of the probability space.
In order to avoid paradoxes (such as the Borel paradox), the following important distinction should be taken into account. If a given event is of nonzero probability then conditioning on it is well-defined (irrespective of any other events), as was noted above. In contrast, if the given event is of zero probability then conditioning on it is ill-defined unless some additional input is provided. A wrong choice of this additional input leads to wrong conditional probabilities (expectations, distributions). In this sense, "the concept of a conditional probability with regard to an isolated hypothesis whose probability equals 0 is inadmissible." (Kolmogorov; quoted in [3]).
The additional input may be (a) a symmetry (invariance group); (b) a sequence of events $B_n$ such that $B_n\downarrow B$; (c) a partition containing the given event. Measure-theoretic conditioning (below) investigates case (c), and discloses its relation to (b) in general and to (a) when applicable.
Some events of zero probability are beyond the reach of conditioning. An example: let $X_n$ be independent random variables distributed uniformly on $(0,1)$, and $B$ the event "$X_n\to 0$ as $n\to\infty$"; what about $\mathbb{P}(X_n<0.5\mid B)$? Does it tend to 1, or not? Another example: let $X$ be a random variable distributed uniformly on $(0,1)$, and $B$ the event "$X$ is a rational number"; what about $\mathbb{P}(X=1/n\mid B)$? The only answer is that, once again, "the concept of a conditional probability with regard to an isolated hypothesis whose probability equals 0 is inadmissible." (Kolmogorov, quoted in [3]).
Conditioning on the level of measure theory
Example. Let $Y$ be a random variable distributed uniformly on $(0,1)$, and $X=f(Y)$, where $f$ is a given function. Two cases are treated below: $X=f_1(Y)$ and $X=f_2(Y)$, where $f_1$ is the continuous piecewise-linear function
$$f_1(y)=\begin{cases}3y&\text{for }0\le y\le 1/3,\\1.5(1-y)&\text{for }1/3\le y\le 2/3,\\0.5&\text{for }2/3\le y\le 1,\end{cases}$$
and $f_2$ is the everywhere continuous but nowhere differentiable Weierstrass function.
Geometric intuition: caution
In the case $X=f_1(Y)$, given that $X=0.75$, two values of $Y$ are possible, 0.25 and 0.5. It may seem evident that both values have conditional probability 0.5, just because one point is congruent to the other point. However, this is an illusion; see below.
Conditional probability
The conditional probability $\mathbb{P}(Y\le 1/3\mid X)$ may be defined as the best predictor of the indicator
$$I=\begin{cases}1&\text{if }Y\le 1/3,\\0&\text{otherwise},\end{cases}$$
given $X$. That is, it minimizes the mean square error $\mathbb{E}\,(I-g(X))^2$ on the class of all random variables of the form $g(X)$.
In the case $X=f_1(Y)$ the corresponding function $g=g_1$ may be calculated explicitly,[4]
$$g_1(x)=\begin{cases}1&\text{for }0<x<0.5,\\0&\text{for }x=0.5,\\1/3&\text{for }0.5<x<1.\end{cases}$$
Alternatively, the limiting procedure may be used,
$$g_1(x)=\lim_{\varepsilon\to 0+}\mathbb{P}(Y\le 1/3\mid x-\varepsilon\le X\le x+\varepsilon)\,,$$
giving the same result.
Thus, $\mathbb{P}(Y\le 1/3\mid X)=g_1(X)$. The expectation of this random variable is equal to the (unconditional) probability, $\mathbb{E}\bigl(\mathbb{P}(Y\le 1/3\mid X)\bigr)=\mathbb{P}(Y\le 1/3)$, namely,
$$1\cdot\mathbb{P}(X<0.5)+0\cdot\mathbb{P}(X=0.5)+\frac{1}{3}\cdot\mathbb{P}(X>0.5)=1\cdot\frac{1}{6}+0\cdot\frac{1}{3}+\frac{1}{3}\cdot\Bigl(\frac{1}{6}+\frac{1}{3}\Bigr)=\frac{1}{3}\,,$$
which is an instance of the law of total probability $\mathbb{P}(Y\le 1/3)=1/3$.
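The limiting procedure is easy to imitate by simulation; a Python/NumPy sketch (sample size and $\varepsilon$ are arbitrary choices):

```python
import numpy as np

def f1(y):
    # the piecewise-linear function from the example
    return np.where(y <= 1/3, 3*y, np.where(y <= 2/3, 1.5*(1 - y), 0.5))

rng = np.random.default_rng(4)
ys = rng.uniform(size=4 * 10**6)
xs = f1(ys)
eps = 1e-3

for x0 in (0.2, 0.5, 0.8):
    near = np.abs(xs - x0) <= eps
    print(x0, round(np.mean(ys[near] <= 1/3), 3))   # close to 1, 0 and 1/3 respectively
```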
In the case $X=f_2(Y)$ the corresponding function $g=g_2$ probably cannot be calculated explicitly. Nevertheless it exists, and can be computed numerically. Indeed, the space $L_2(\Omega)$ of all square integrable random variables is a Hilbert space; the indicator $I$ is a vector of this space; and random variables of the form $g(X)$ form a (closed, linear) subspace. The orthogonal projection of this vector onto this subspace is well-defined. It can be computed numerically, using finite-dimensional approximations to the infinite-dimensional Hilbert space.
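One possible finite-dimensional approximation is to project onto functions of $X$ that are constant on small bins; within each bin the projection of the indicator is just its average. The sketch below assumes NumPy and uses a truncated Weierstrass-type series with illustrative parameters (the article does not specify them), rescaled affinely into $(0,1)$; it is a numerical illustration under these assumptions, not the article's construction.

```python
import numpy as np

def weierstrass(y, a=0.5, b=7, n_terms=12):
    # Truncated Weierstrass-type series sum_n a^n cos(b^n * pi * y), rescaled
    # into (0, 1). Parameters a, b, n_terms are illustrative assumptions.
    k = np.arange(n_terms, dtype=float)
    w = np.cos(np.outer(y, b ** k) * np.pi) @ (a ** k)
    return (w + 2.0) / 4.0

rng = np.random.default_rng(5)
ys = rng.uniform(size=4 * 10**5)
xs = weierstrass(ys)
indicator = (ys <= 1/3).astype(float)

# Finite-dimensional approximation of the orthogonal projection: functions of X
# that are constant on each of m small bins; the L2 projection of the indicator
# onto this subspace is its average within each bin.
m = 2000
bins = np.minimum((xs * m).astype(int), m - 1)
num = np.bincount(bins, weights=indicator, minlength=m)
den = np.bincount(bins, minlength=m)
g2_hat = np.where(den > 0, num / np.maximum(den, 1), np.nan)  # approximates g_2 bin by bin

# Law of total probability on fresh samples: E(g_2(X)) should be close to 1/3.
ys_new = rng.uniform(size=10**5)
bins_new = np.minimum((weierstrass(ys_new) * m).astype(int), m - 1)
print(np.nanmean(g2_hat[bins_new]))
```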
Once again, the expectation of the random variable $\mathbb{P}(Y\le 1/3\mid X)=g_2(X)$ is equal to the (unconditional) probability, $\mathbb{E}\bigl(\mathbb{P}(Y\le 1/3\mid X)\bigr)=\mathbb{P}(Y\le 1/3)$, namely,
$$\int_0^1 g_2(f_2(y))\,\mathrm{d}y=\frac{1}{3}\,.$$
However, the Hilbert space approach treats $g_2$ as an equivalence class of functions rather than an individual function. Measurability of $g_2$ is ensured, but continuity (or even Riemann integrability) is not. The value $g_2(0.5)$ is determined uniquely, since the point 0.5 is an atom of the distribution of $X$. Other values $x$ are not atoms; thus, the corresponding values $g_2(x)$ are not determined uniquely. Once again, "the concept of a conditional probability with regard to an isolated hypothesis whose probability equals 0 is inadmissible." (Kolmogorov; quoted in [3]).
Alternatively, the same function $g$ (be it $g_1$ or $g_2$) may be defined as the Radon-Nikodym derivative
$$g=\frac{\mathrm{d}\nu}{\mathrm{d}\mu}\,,$$
where the measures $\mu$, $\nu$ are defined by
$$\begin{aligned}\mu(B)&=\mathbb{P}(X\in B)\,,\\\nu(B)&=\mathbb{P}(X\in B,\ Y\le 1/3)\end{aligned}$$
for all Borel sets $B\subset\mathbb{R}$. That is, $\mu$ is the (unconditional) distribution of $X$, while $\nu$ is one third of its conditional distribution,
$$\nu(B)=\mathbb{P}(X\in B\mid Y\le 1/3)\,\mathbb{P}(Y\le 1/3)=\frac{1}{3}\,\mathbb{P}(X\in B\mid Y\le 1/3)\,.$$
Both approaches (via the Hilbert space and via the Radon-Nikodym derivative) treat $g$ as an equivalence class of functions; two functions $g$ and $g'$ are treated as equivalent if $g(X)=g'(X)$ almost surely. Accordingly, the conditional probability $\mathbb{P}(Y\le 1/3\mid X)$ is treated as an equivalence class of random variables; as usual, two random variables are treated as equivalent if they are equal almost surely.
Conditional expectation
The conditional expectation $\mathbb{E}(Y\mid X)$ may be defined as the best predictor of $Y$ given $X$. That is, it minimizes the mean square error $\mathbb{E}\,(Y-h(X))^2$ on the class of all random variables of the form $h(X)$.
In the case $X=f_1(Y)$ the corresponding function $h=h_1$ may be calculated explicitly,[5]
$$h_1(x)=\begin{cases}x/3&\text{for }0<x<0.5,\\5/6&\text{for }x=0.5,\\(2-x)/3&\text{for }0.5<x<1.\end{cases}$$
Alternatively, the limiting procedure may be used,
$$h_1(x)=\lim_{\varepsilon\to 0+}\mathbb{E}(Y\mid x-\varepsilon\le X\le x+\varepsilon)\,,$$
giving the same result.
Thus, $\mathbb{E}(Y\mid X)=h_1(X)$. The expectation of this random variable is equal to the (unconditional) expectation, $\mathbb{E}\bigl(\mathbb{E}(Y\mid X)\bigr)=\mathbb{E}(Y)$, namely,
$$\begin{aligned}\int_0^1 h_1(f_1(y))\,\mathrm{d}y&=\int_0^{1/6}\frac{3y}{3}\,\mathrm{d}y+\int_{1/6}^{1/3}\frac{2-3y}{3}\,\mathrm{d}y+\int_{1/3}^{2/3}\frac{2-1.5(1-y)}{3}\,\mathrm{d}y+\int_{2/3}^{1}\frac{5}{6}\,\mathrm{d}y=\frac{1}{2}\,,\end{aligned}$$
which is an instance of the law of total expectation $\mathbb{E}(Y)=1/2$.
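A simulation check of $h_1$ via the limiting procedure, assuming NumPy (sample size and $\varepsilon$ are arbitrary):

```python
import numpy as np

def f1(y):
    return np.where(y <= 1/3, 3*y, np.where(y <= 2/3, 1.5*(1 - y), 0.5))

def h1(x):
    # the explicit conditional expectation from the text
    return np.where(x < 0.5, x/3, np.where(x == 0.5, 5/6, (2 - x)/3))

rng = np.random.default_rng(6)
ys = rng.uniform(size=4 * 10**6)
xs = f1(ys)
eps = 1e-3

for x0 in (0.2, 0.5, 0.8):
    near = np.abs(xs - x0) <= eps
    print(x0, round(ys[near].mean(), 3), round(float(h1(x0)), 3))

# law of total expectation: E(h1(X)) = E(Y) = 1/2
print(round(float(h1(xs).mean()), 3))
```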
In the case $X=f_2(Y)$ the corresponding function $h=h_2$ probably cannot be calculated explicitly. Nevertheless it exists, and can be computed numerically in the same way as $g_2$ above, as the orthogonal projection in the Hilbert space. The law of total expectation holds, since the projection cannot change the scalar product with the constant function 1, which belongs to the subspace.
Alternatively, the same function $h$ (be it $h_1$ or $h_2$) may be defined as the Radon-Nikodym derivative
$$h=\frac{\mathrm{d}\nu}{\mathrm{d}\mu}\,,$$
where the measures $\mu$, $\nu$ are defined by
$$\begin{aligned}\mu(B)&=\mathbb{P}(X\in B)\,,\\\nu(B)&=\mathbb{E}(Y,\,X\in B)\end{aligned}$$
for all Borel sets $B\subset\mathbb{R}$. Here $\mathbb{E}(Y,\,X\in B)=\mathbb{E}(Y\cdot I_{\{X\in B\}})$ is the restricted expectation, not to be confused with the conditional expectation $\mathbb{E}(Y\mid X\in B)=\mathbb{E}(Y,\,X\in B)/\mathbb{P}(X\in B)$.
Conditional distribution
In the case $X=f_1(Y)$ the conditional cumulative distribution function may be calculated explicitly, similarly to $g_1$. The limiting procedure gives
$$\begin{aligned}F_{Y\mid X=0.75}(y)&=\mathbb{P}(Y\le y\mid X=0.75)=\lim_{\varepsilon\to 0+}\mathbb{P}(Y\le y\mid 0.75-\varepsilon\le X\le 0.75+\varepsilon)\\&=\begin{cases}0&\text{for }-\infty<y<1/4,\\1/6&\text{for }y=1/4,\\1/3&\text{for }1/4<y<1/2,\\2/3&\text{for }y=1/2,\\1&\text{for }1/2<y<\infty,\end{cases}\end{aligned}$$
which cannot be correct, since a cumulative distribution function must be right-continuous!
This paradoxical result is explained by measure theory as follows. For a given $y$ the corresponding $F_{Y\mid X=x}(y)=\mathbb{P}(Y\le y\mid X=x)$ is well-defined (via the Hilbert space or the Radon-Nikodym derivative) as an equivalence class of functions (of $x$). Treated as a function of $y$ for a given $x$, it is ill-defined unless some additional input is provided. Namely, a function (of $x$) must be chosen within every (or at least almost every) equivalence class. A wrong choice leads to wrong conditional cumulative distribution functions.
A right choice can be made as follows. First, $F_{Y\mid X=x}(y)=\mathbb{P}(Y\le y\mid X=x)$ is considered for rational numbers $y$ only. (Any other dense countable set may be used equally well.) Thus, only a countable set of equivalence classes is used; all choices of functions within these classes are mutually equivalent, and the corresponding function of rational $y$ is well-defined (for almost every $x$). Second, the function is extended from the rational numbers to all real numbers by right continuity.
In general the conditional distribution is defined for almost all $x$ (according to the distribution of $X$), but sometimes the result is continuous in $x$, in which case individual values are acceptable. In the considered example this is the case; the correct result for $x=0.75$,
$$F_{Y\mid X=0.75}(y)=\mathbb{P}(Y\le y\mid X=0.75)=\begin{cases}0&\text{for }-\infty<y<1/4,\\1/3&\text{for }1/4\le y<1/2,\\1&\text{for }1/2\le y<\infty,\end{cases}$$
shows that the conditional distribution of $Y$ given $X=0.75$ consists of two atoms, at 0.25 and 0.5, of probabilities 1/3 and 2/3 respectively.
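The two atoms and their probabilities can be recovered by the same kind of simulation, assuming NumPy (window widths are arbitrary):

```python
import numpy as np

def f1(y):
    return np.where(y <= 1/3, 3*y, np.where(y <= 2/3, 1.5*(1 - y), 0.5))

rng = np.random.default_rng(7)
ys = rng.uniform(size=4 * 10**6)
xs = f1(ys)

near = np.abs(xs - 0.75) <= 1e-3              # limiting procedure around X = 0.75
y_near = ys[near]
print(round(np.mean(np.abs(y_near - 0.25) < 0.01), 3))   # about 1/3 (atom at 0.25)
print(round(np.mean(np.abs(y_near - 0.50) < 0.01), 3))   # about 2/3 (atom at 0.50)
```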
Similarly, the conditional distribution may be calculated for all $x$ in $(0,0.5)$ or $(0.5,1)$. The value $x=0.5$ is an atom of the distribution of $X$; thus, the corresponding conditional distribution is well-defined and may be calculated by elementary means (the denominator does not vanish): the conditional distribution of $Y$ given $X=0.5$ is uniform on $(2/3,1)$. Measure theory leads to the same result.
The mixture of all conditional distributions is the (unconditional) distribution of $Y$.
The conditional expectation $\mathbb{E}(Y\mid X=x)$ is nothing but the expectation with respect to the conditional distribution.
In the case $X=f_2(Y)$ the corresponding $F_{Y\mid X=x}(y)=\mathbb{P}(Y\le y\mid X=x)$ probably cannot be calculated explicitly. For a given $y$ it is well-defined (via the Hilbert space or the Radon-Nikodym derivative) as an equivalence class of functions (of $x$). The right choice of functions within these equivalence classes may be made as above; it leads to correct conditional cumulative distribution functions and, thus, conditional distributions. In general, conditional distributions need not be atomic or absolutely continuous (nor mixtures of both types). Probably, in the considered example they are singular (like the Cantor distribution).
Once again, the mixture of all conditional distributions is the (unconditional) distribution, and the conditional expectation is the expectation with respect to the conditional distribution.
Notes
- ↑ Pollard 2002, Sect. 5.5, Example 17 on page 122
- ↑ Durrett 1996, Sect. 4.1(a), Example 1.6 on page 224
- ↑ Pollard 2002, Sect. 5.5, page 122
- ↑ Proof:
$$\begin{aligned}\mathbb{E}\,(I-g(X))^2&=\int_0^{1/3}(1-g(3y))^2\,\mathrm{d}y+\int_{1/3}^{2/3}g^2(1.5(1-y))\,\mathrm{d}y+\int_{2/3}^{1}g^2(0.5)\,\mathrm{d}y\\&=\int_0^1(1-g(x))^2\,\frac{\mathrm{d}x}{3}+\int_{0.5}^1 g^2(x)\,\frac{\mathrm{d}x}{1.5}+\frac{1}{3}g^2(0.5)\\&=\frac{1}{3}\int_0^{0.5}(1-g(x))^2\,\mathrm{d}x+\frac{1}{3}g^2(0.5)+\frac{1}{3}\int_{0.5}^{1}\bigl((1-g(x))^2+2g^2(x)\bigr)\,\mathrm{d}x\,;\end{aligned}$$
it remains to note that $(1-a)^2+2a^2$ is minimal at $a=1/3$.
- ↑ Proof: similarly to the previous note,
$$\mathbb{E}\,(Y-h(X))^2=\frac{1}{3}\int_0^{0.5}\Bigl(\frac{x}{3}-h(x)\Bigr)^2\mathrm{d}x+\frac{1}{3}\int_{0.5}^{1}\Bigl(\Bigl(\frac{x}{3}-h(x)\Bigr)^2+2\Bigl(1-\frac{2x}{3}-h(x)\Bigr)^2\Bigr)\mathrm{d}x+\int_{2/3}^{1}\bigl(y-h(0.5)\bigr)^2\mathrm{d}y\,;$$
it remains to note that $\bigl(\frac{x}{3}-a\bigr)^2+2\bigl(1-\frac{2x}{3}-a\bigr)^2$ is minimal at $a=\frac{2-x}{3}$, and $\int_{2/3}^{1}(y-a)^2\,\mathrm{d}y$ is minimal at $a=\frac{5}{6}$.
References
- Durrett, Richard (1996), Probability: theory and examples (Second ed.)
- Pollard, David (2002), A user's guide to measure theoretic probability, Cambridge University Press