Measure Theory and Sigma Algebras

Table of contents

  1. Prelude on Sets
  2. Motivation for Sigma-Algebras and Measurability
  3. Measures
    1. Lebesgue Measure
  4. Sigma Algebras
    1. Generators and the Generation of Sigma-Algebras
  5. Algebras, Monotone Classes, and Pi, and Lambda Systems
    1. Monotone Class and Monotone Class Theorem
  6. Extension of Measures
    1. Caratheodory’s Extension Theorem
    2. Technical Remark of Caratheodory’s Extension Results
  7. Measurable Functions
    1. Induced Measures

$\newcommand{\reals}{\mathbb{R}}$ $\newcommand{\pr}{\mathbb{P}}$ $\newcommand{\cv}[1]{\mathcal{#1}}$

Prelude on Sets

A set is nothing but a collection of objects. In probability theory, these objects are often events. Furthermore, sets in probability theory, much like in other branches of math, are usually constructed so that the objects in the set obey some sort of logic. Taking an example from analysis, $X = \{x \in \mathbb{R} : x^2 < 2\}$, the set $X$ is precisely all real numbers whose square is less than 2. The logical structure endowed on each object in the set is described as follows: each object inside $X$ must be between $-\sqrt{2}$ and $\sqrt{2}$ and cannot be exactly $\pm \sqrt{2}$. In this sense, a set is constructed to define a logical structure endowed upon all points within the set.

In a more advanced manner, we can consider the set of all open intervals in $\mathbb{R}$. In written form, $B := \{(a,b) : -\infty \leq a < b \leq \infty\}$. Every point in the set $B$ is an interval (not a number). (The $\sigma$-algebra generated by this set, $\mathcal{B} = \sigma(B)$, is called the Borel algebra on the real numbers; more on this later.) For now, the key takeaway is that wherever a set is defined, a logic is defined.

Motivation for Sigma-Algebras and Measurability

In probability, one fundamental object of interest is the outcome space denoted by $\Omega$. This is the set of all possible observable outcomes. Any subset of $\Omega$ is called an event. For example, in flipping a coin twice, the outcome space is

\[\Omega := \{HH,HT,TH,TT\}\]

This is a finite (hence countable) set of outcomes which, in their entirety, form the outcome space $\Omega$. Here $\Omega$ has a cardinality of 4. Things are easy to calculate here. For example $\mathbb{P}(\text{at least one head}) = \frac{\lvert\{HH,HT,TH\}\rvert}{\lvert\Omega\rvert} = 3/4$. That is, the “size” of the event “at least one head” is 75 percent of the size of the space of all possible outcomes. Essentially we normalize the measurement of the event of interest by the measurement of the whole outcome space.
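The counting argument above can be sketched in a few lines of Python. This is only an illustration under the assumption of two fair coins; the names `omega`, `prob`, and `at_least_one_head` are made up for this example.

```python
from fractions import Fraction
from itertools import product

# Outcome space for two coin flips: HH, HT, TH, TT
omega = {"".join(flips) for flips in product("HT", repeat=2)}

def prob(event, space):
    """Uniform probability (assumed fair coins): |event| / |space|."""
    return Fraction(len(event & space), len(space))

# The event "at least one head" is a subset of the outcome space
at_least_one_head = {w for w in omega if "H" in w}

print(prob(at_least_one_head, omega))  # 3/4
```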

In the example above, the outcome space is finite, so all possible subsets of the outcome space can be enumerated and we can assign probabilities to each event in $2^\Omega$. However, in an infinite outcome space, it is no longer possible to enumerate all the events and assign each a probability: already $2^\mathbb{N}$ is uncountable (by Cantor’s theorem, no $f: \mathbb{N} \to 2^\mathbb{N}$ is a surjection). Instead, we focus our attention on subsets of “interest.” In this context, a set is interesting if it can have a probability assigned to it and we wish to evaluate that probability. In any outcome space we would be interested in events as enumerated below:

  1. The empty set, which has probability of 0
  2. If a given event is of interest, its complement should also be of interest
  3. If a given (countable) collection of events is of interest, their union should also be of interest

These three qualities completely define all sets that are of interest to us. Of course, you might wonder about set intersections. Using the coin flipping example, we might be interested in the quantity $\{HH,HT,TH\} \cap \{HT, TH, TT\}$. Namely, this is the intersection of the event where we get at least one head and the event where we get at least one tail. The intersection of these two events is $\{HT, TH\}$, which is clearly a set of interest, and it has probability $1/2$. Surprisingly, this set intersection example is already covered by the 3 qualities of “interesting” events.
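One way to see why intersections come for free is De Morgan's law, $A \cap B = (A^C \cup B^C)^C$, which uses only complements and a union. A minimal sketch on the coin example (the helper name `complement` is an assumption for illustration):

```python
omega = frozenset({"HH", "HT", "TH", "TT"})

def complement(event):
    """Complement of an event relative to the outcome space."""
    return omega - event

at_least_one_head = frozenset({"HH", "HT", "TH"})
at_least_one_tail = frozenset({"HT", "TH", "TT"})

# De Morgan: A ∩ B = (A^c ∪ B^c)^c, built from complements and one union
via_de_morgan = complement(
    complement(at_least_one_head) | complement(at_least_one_tail)
)

print(sorted(via_de_morgan))  # ['HT', 'TH']
```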

1, 2, and 3 motivate the structure of a $\sigma$-algebra. Recall that the goal of probability is to assign a “size” to these sets of “interest”: the larger the size, the more “probable” the event. In order to understand size and measurability, we should study measures and why a $\sigma$-algebra is critical in the development of a measure.

Measures

A measure function (a measurable function is a different concept) assigns a number to a measurable set. Intuitively, a good measure function on the real numbers should exhibit the following properties:

  1. $\mu : 2^{\mathbb{R}} \to [0,\infty]$, any subset of the real numbers should have a measure
  2. $\mu([a,b]) = b-a$, the measure of an interval is simply its length
  3. $\mu(A) = \mu(A + c)$ for any $c\in\mathbb{R}$, shifting a set by a constant $c$ does not affect its measure
  4. $\mu(\cup_{n\geq 1} A_n) = \sum_{n\geq 1} \mu(A_n)$, for a disjoint collection of sets, the measure of the union should be the sum of the individual measures

One proposed measure is the outer measure, which looks at all coverings of a set by countably many compact intervals (we are only considering $\mathbb{R}^1$). Recall that, by Heine-Borel, for any open covering of $[a,b]$, there is a finite subcover that still covers $[a,b]$. From a review of analysis, it should be clear that a finite union of compact sets is still compact, thus from any open cover we can find a finite subcover that covers a finite union of compact intervals. Then the outer measure can be given by:

\[m^*(A) := \inf \left\lbrace \sum_{i=1}^\infty \ell(I_i) : \{I_i\}_{i\geq 1} \text{ s.t. } A \subseteq \bigcup_{i=1}^\infty I_i \right\rbrace\]

where each $I_i$ is a compact interval of the form $[a,b]$ and we define $\ell(I) = b-a$. Intuitively, by taking the outer measure, we are trying to find the “smallest” covering of $A$ that still fully covers $A$, and we sum the lengths of the intervals $I_i$ to get the outer measure. The outer measure satisfies several of the 4 properties, but no function can satisfy all 4 at once: assuming any 3 of them implies that the remaining property cannot hold. A detailed treatment of this is given in Chapter 2 of Sheldon Axler’s Measure, Integration and Real Analysis. However, we do see some interesting properties of the measure function that hold for the outer measure. A key property is monotonicity.

(Note that we can also define an inner measure by finding the size of the “largest” subset within $A$)

Monotonicity of outer measure If $A \subseteq B$ then $m^{*}(A) \leq m^{*}(B)$

Proof : If $A \subseteq B$ then every covering of $B$ also covers $A$. So, writing $C_A$ and $C_B$ for the sets of total lengths of coverings of $A$ and $B$ respectively:

\[C_B := \left\lbrace \sum_{i=1}^\infty \ell(I_i) : \{I_i\}_{i\geq 1} \text{ s.t. } B \subseteq \bigcup_{i=1}^\infty I_i \right\rbrace \subseteq \left\lbrace \sum_{i=1}^\infty \ell(I_i) : \{I_i\}_{i\geq 1} \text{ s.t. } A \subseteq \bigcup_{i=1}^\infty I_i \right\rbrace =: C_A\]

Taking the infimum over the larger set $C_A$ can only give a smaller value, so $\inf C_A \leq \inf C_B$, that is, $m^{*}(A) \leq m^{*}(B)$. $\tag*{∎}$
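As a worked illustration of the definition of $m^*$ (not taken from the text above, but a standard consequence of it), consider a countable set $A = \{q_1, q_2, …\}$, for example the rationals in $[0,1]$. Fix any $\varepsilon > 0$ and cover the $i$-th point by the compact interval $I_i = [q_i - \varepsilon/2^{i+1},\, q_i + \varepsilon/2^{i+1}]$, which has length $\ell(I_i) = \varepsilon/2^i$. Then:

\[m^*(A) \leq \sum_{i=1}^\infty \ell(I_i) = \sum_{i=1}^\infty \frac{\varepsilon}{2^i} = \varepsilon\]

Since $\varepsilon > 0$ was arbitrary, $m^*(A) = 0$: the infimum rewards coverings that waste as little length as possible.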

Lebesgue Measure

To remedy the issues mentioned above with $m^{*}(\cdot)$, we must relax one of the 4 properties above. Properties 2, 3, and 4 are desirable properties of a measure and we cannot give them up. That is, we wish for the length of an interval $[a,b]$ to be $b-a$, and for the measure to be translation invariant and additive. Then the only property to relax is 1, in the sense that we cannot assign a measure to all subsets of $\mathbb{R}$. The natural question to ask is “By how much must we relax property 1?” To begin answering this question, note that there are sets in $2^\mathbb{R}$ that cause $m^{*}(\cdot)$ to disobey the 4 properties of measure. These sets are called non-measurable and trying to assign a measure to these sets is futile. (It is complicated to show the existence of a non-measurable set. One way to do it is to accept the Axiom of Choice and define an equivalence relation under which two real numbers are equivalent when they differ by a rational number. This produces the Vitali set.) Instead, we can only define a measure on measurable sets. At the most fundamental level, we can define a measure on $[a,b]$ which is $b-a$ (property 2). This is the Lebesgue measure defined for an interval, thereby defining a measure on a subcollection of the subsets of $\mathbb{R}$.

So instead of defining a measure on all subsets of $\mathbb{R}$, a measure should be a function $\mu: \mathcal{F} \to [0,\infty]$ where $\mathcal{F}$ is some subset of $2^\mathbb{R}$. It is a function that assigns a non-negative number (possibly $\infty$) to each “nice” set, that is, to each element of $\mathcal{F} \subseteq 2^{\mathbb{R}}$. This measure function must satisfy the following properties.

Definition: Measure functions

  1. $\mu(A) \geq 0$ for any $A \in \mathcal{F}$
  2. $\mu(\varnothing) = 0$
  3. If $A_1,…,A_n,… \in \mathcal{F}$ are pairwise disjoint, then $\mu(\cup_{n=1}^\infty A_n) = \sum_{n=1}^\infty \mu(A_n)$

Note the first point goes without saying when considering a function with a range of $[0,\infty]$. As mentioned above, a set is measurable if and only if it is a “nice” set, that is, an element of $\mathcal{F}$ over an outcome space $\Omega$. Thus we call $(\Omega, \mathcal{F})$ a measurable space. If a particular measure function $\mu$ is endowed upon this measurable space, then the triple $(\Omega, \mathcal{F}, \mu)$ denotes a measure space.

These “nice” sets are called measurable sets, and the collections they live in are called $\sigma$-algebras.
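On a finite outcome space the three defining properties can be checked by brute force. The sketch below assumes two fair coins and takes $\mathcal{F}$ to be the full power set; on a finite space, countable additivity reduces to additivity over pairs of disjoint events. The names `weight`, `mu`, and `events` are illustrative only.

```python
from fractions import Fraction
from itertools import combinations

omega = ["HH", "HT", "TH", "TT"]
weight = {w: Fraction(1, 4) for w in omega}  # assumption: fair coins

def mu(event):
    """Measure of an event: the sum of the weights of its outcomes."""
    return sum((weight[w] for w in event), Fraction(0))

# Every event in the power set of omega
events = [frozenset(c) for r in range(len(omega) + 1)
          for c in combinations(omega, r)]

non_negative = all(mu(a) >= 0 for a in events)
empty_is_zero = mu(frozenset()) == 0
# Additivity checked on every pair of disjoint events
additive = all(mu(a | b) == mu(a) + mu(b)
               for a in events for b in events if not (a & b))

print(non_negative, empty_is_zero, additive)  # True True True
```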

Exercise: Given the measure space $(\Omega, \mathcal{F}, \mu)$, show that for any fixed $B\in\mathcal{F}$, the function $\mu_B(A) := \mu(B\cap A)$ is a measure function, hence $(\Omega, \mathcal{F}, \mu_B)$ forms a measure space.

Sigma Algebras

Points 1, 2, and 3 in Motivation for Sigma-Algebras and Measurability define the “nice” sets of “interest.” We can translate the characteristics of an “interesting” set into mathematics. That is, a $\sigma$-algebra (also called a $\sigma$-field) defined on $\Omega$ is a collection of events $\mathcal{F} \subseteq 2^\Omega$ such that

Definition: $\sigma$-algebra on $\Omega$:

  1. $\varnothing \in \mathcal{F}$
  2. If $A \in \mathcal{F}$ then $A^C \in \mathcal{F}$, (so by 1. we have $\Omega \in \mathcal{F}$)
  3. For any countable (not necessarily finite) sequence of sets $A_1,…,A_n,…\in \mathcal{F}$, we have $\cup_{i=1}^\infty A_i \in \mathcal{F}$

At this point, we should reflect back on the prelude on sets. A $\sigma$-algebra really is a logical structure, defining what it means to be a set of “interest” or a “nice” set. Specifically, given any collection of elements in $\mathcal{F}$ we know their complements are also in $\mathcal{F}$, and their unions are also in $\mathcal{F}$. Again, we call any set in the $\sigma$-algebra measurable.

Exercise: Show that $\mathcal{F}$ is closed under countable intersections, i.e. if $A_1,…,A_n,… \in \mathcal{F}$ then $\cap_{i=1}^\infty A_i \in \mathcal{F}$ (Hint: use De Morgan’s Laws.)

Exercise: Write the $\sigma$-algebra of the outcome space of flipping two coins. (Hint: There should be 16 elements in the resulting sigma algebra.)

As an example consider the outcome space of rolling a six-sided die. The $\sigma$-algebra of events is the power set $2^{\{1,…,6\}}$ which has $2^6 = 64$ elements. Now let’s make the game more complicated, and suppose the computer rolls the die for us, and it will not tell us the result of the roll. Instead, it will check if the number is 1 or 2, in which case it will tell us ‘A’. If the roll was a 3 or 4, the computer will tell us ‘B’. And finally if the roll was a 5 or 6, the computer will tell us ‘C’. So we never observe any numbers, and we only observe the resulting ‘A’, ‘B’, or ‘C’. Our $\sigma$-algebra will look like:

\[\mathcal{F} := \{\varnothing, \{1,2\}, \{3,4\}, \{5,6\}, \{1,2,3,4\}, \{3,4,5,6\}, \{1,2,5,6\}, \{1,2,3,4,5,6\} \}\]

You might have noticed that these $\sigma$-algebras have different cardinalities, and that the $\sigma$-algebra depends on how we observe outcomes. For example, if we record $\{1,2\}$ as a single event (in the previous example, this would be the event of observing ‘A’) rather than recording $\{1\}, \{2\}$ as two separate events, the $\sigma$-algebras would be different. Specifically, in the system where we do not differentiate between pairs of numbers, the $\sigma$-algebra has a cardinality of 8. However, if each outcome can be observed, the $\sigma$-algebra would have a cardinality of 64. $\sigma$-algebras convey the idea of “level of information.” Smaller $\sigma$-algebras convey less information.
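The claim that the 8-element collection really is a $\sigma$-algebra can be checked mechanically. The sketch below is an illustration only: `is_sigma_algebra` is a hypothetical helper, and it relies on the fact that on a finite space closure under pairwise unions is enough.

```python
omega = frozenset(range(1, 7))

# The coarse collection observed through the labels 'A', 'B', 'C'
coarse = {
    frozenset(), frozenset({1, 2}), frozenset({3, 4}), frozenset({5, 6}),
    frozenset({1, 2, 3, 4}), frozenset({3, 4, 5, 6}),
    frozenset({1, 2, 5, 6}), omega,
}

def is_sigma_algebra(family, space):
    """Check the three axioms; on a finite space pairwise unions suffice."""
    has_empty = frozenset() in family
    closed_complement = all(space - a in family for a in family)
    closed_union = all(a | b in family for a in family for b in family)
    return has_empty and closed_complement and closed_union

print(is_sigma_algebra(coarse, omega))  # True
# Dropping {1,2} breaks closure under complement ({3,4,5,6} loses its partner)
print(is_sigma_algebra(coarse - {frozenset({1, 2})}, omega))  # False
```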

Denote $\mathcal{A}_1 = \sigma(\{\{1\},\{2\},\{3\},\{4\},\{5\},\{6\}\})$ to be the $\sigma$-algebra “generated” by the individual outcomes $\{1\},…,\{6\}$. Denote also $\mathcal{A}_2 = \sigma(\{\{1,2\},\{3,4\},\{5,6\}\})$ to be the $\sigma$-algebra “generated” by $\{\{1,2\},\{3,4\},\{5,6\}\}$.

Exercise: Show that $\mathcal{A}_1 \supseteq \mathcal{A}_2$ as defined above. (Hint: Both are $\sigma$-algebras, and it suffices to show all elements in $\mathcal{A}_2$ are in $\mathcal{A}_1$. To be thorough, find an element that is in $\mathcal{A}_1$ but not in $\mathcal{A}_2$.)

If you were successful in the exercise above, then you have shown that one $\sigma$-algebra is “smaller” than the other. Essentially, the scenario that produced the collection of sets of interest, $\mathcal{A}_2$, yields less information about your data generating process when compared to the information from the scenario producing $\mathcal{A}_1$. That is, an observer in the scenario producing $\mathcal{A}_1$ can in theory measure all events in $\mathcal{A}_2$, but an observer in the scenario producing $\mathcal{A}_2$ will not have the granularity to observe some events in $\mathcal{A}_1$. If we cannot be granular enough to observe an event, we cannot measure the event. Therefore, a $\sigma$-algebra tells us everything we need to know about a game, system, or scenario, at least with respect to measurability.

Generators and the Generation of Sigma-Algebras

Clearly, we have seen that granularity is important when creating a $\sigma$-algebra. Thus far, we have described a situation first, and then described the resulting $\sigma$-algebra as sets to be measured. But as previously mentioned, we only need to consider the $\sigma$-algebra when dealing with measurability. Suppose we know a situation or scenario described by a collection of events $\mathcal{X}$ drawn from a larger outcome space $\Omega$. It would be useful to consider a $\sigma$-algebra that only includes events that can be deduced from sets in $\mathcal{X}$, while ignoring events that can only be deduced from some other collection of subsets of $\Omega$. To do so, we generate a $\sigma$-algebra using $\mathcal{X}$. Specifically, this means:

Definition: Sigma-Algebra generation Let $\{\mathcal{F}_n\}$ denote the family of all $\sigma$-algebras on $\Omega$ that contain $\mathcal{X}$, where $n$ ranges over some index set (which need not be countable). So for all $n$, the $\sigma$-algebra $\mathcal{F}_n$ contains $\mathcal{X}$. Then the $\sigma$-algebra generated by $\mathcal{X}$ is defined as:

\[\sigma(\mathcal{X}) = \bigcap_n \mathcal{F}_n\]

That is, we intersect all $\sigma$-algebras containing $\mathcal{X}$. The result is the generated $\sigma$-algebra. It is also called the smallest $\sigma$-algebra containing $\mathcal{X}$, as $\sigma(\mathcal{X}) \subseteq \mathcal{F}_n$ for any $\sigma$-algebra $\mathcal{F}_n$ containing $\mathcal{X}$.
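On a finite outcome space the generated $\sigma$-algebra can be computed directly. The sketch below does not intersect all containing $\sigma$-algebras as in the definition; instead it builds the family from below by repeatedly adding complements and unions until nothing new appears, which on a finite space yields the same smallest $\sigma$-algebra. The function name `generate_sigma_algebra` is made up for this illustration.

```python
def generate_sigma_algebra(generators, space):
    """Smallest family containing the generators that is closed under
    complement and pairwise union (sufficient on a finite space)."""
    space = frozenset(space)
    family = {frozenset(), space} | {frozenset(g) for g in generators}
    while True:
        new = {space - a for a in family}
        new |= {a | b for a in family for b in family}
        if new <= family:  # nothing new was produced: closure reached
            return family
        family |= new

die = range(1, 7)
coarse = generate_sigma_algebra([{1, 2}, {3, 4}, {5, 6}], die)
fine = generate_sigma_algebra([{i} for i in die], die)

print(len(coarse), len(fine))  # 8 64
```

The two sizes match the cardinalities of the die-rolling $\sigma$-algebras discussed earlier.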

Recall from above, we defined a Borel $\sigma$-algebra $\mathcal{B}$ to be the $\sigma$-algebra generated by the open intervals in $\mathbb{R}$ also known as $B := \{(a,b) : -\infty \leq a < b \leq \infty\}$. Some examples of sets that are in $\mathcal{B}$:

  1. $(1,2) \cap (4,5) \in \mathcal{B}$ since $(1,2)$ and $(4,5)$ are each in $B$ and $\mathcal{B}$ is closed under intersection (here the intersection happens to be empty, and $\varnothing \in \mathcal{B}$)
  2. $(1,2] \in \mathcal{B}$ since we can take $(1,2+1/n)$, which is in $B$ no matter how large or small $n$ is. Since $2$ is always in $(1,2+1/n)$ for all $n$, taking intersections over all $n\in\mathbb{N}$ gives the set $(1,2]$. As $\mathcal{B}$ is closed under countable intersection, $(1,2]$ must be in $\mathcal{B}$
  3. $\{2\} \in \mathcal{B}$ since we can take $(2-1/n, 2+1/n)$ and by the same logic as above and by closure under countable intersection, $\{2\}\in\mathcal{B}$
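Written out, the two countable intersections used in examples 2 and 3 are:

\[\bigcap_{n=1}^\infty \left(1,\, 2+\tfrac{1}{n}\right) = (1,2], \qquad \bigcap_{n=1}^\infty \left(2-\tfrac{1}{n},\, 2+\tfrac{1}{n}\right) = \{2\}\]

To see the first equality, note that every $x \in (1,2]$ satisfies $x < 2 + 1/n$ for all $n$, while any $x > 2$ fails $x < 2+1/n$ as soon as $n > 1/(x-2)$. The second equality follows the same way, with points below $2$ excluded by the left endpoints $2 - 1/n$.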

Exercise: Show that $\mathcal{B}$ as generated by $B$, can also be generated by the following set: $B_\mathbb{Q} := \{(a,b): a < b,\ a,b\in\mathbb{Q}\}$. (Hint: The real numbers are a completion of the rational numbers.)

Exercise: Show that $\mathcal{B}$ as generated by $B$, can also be generated by the following set: $B_C := \{(a,b]: -\infty \leq a < b < \infty\}$.

Exercise: Show that if $X \subseteq Y$ then $\sigma(X) \subseteq \sigma(Y)$. (Hint: Try to show $X$ is in $\sigma(Y)$ first.)

Exercise: Show that $\sigma(X)$ is indeed still a $\sigma$-algebra. (Hint: Check the three conditions of a $\sigma$-algebra.)

Algebras, Monotone Classes, and Pi, and Lambda Systems

The notion of a $\sigma$-algebra is rather strong in that the collection must be closed under countably infinite unions. If we relax this rule, and instead impose closure under finitely many unions, then we have an algebra. So:

Definition: Algebra on $\Omega$:

  1. $\varnothing \in \mathcal{F}$
  2. If $A \in \mathcal{F}$ then $A^C \in \mathcal{F}$, (so by 1. we have $\Omega \in \mathcal{F}$)
  3. For any finite sequence of sets $A_1,…,A_n \in \mathcal{F}$, we have $\cup_{i=1}^n A_i \in \mathcal{F}$

In the case of a finite outcome space $\Omega$, every algebra on $\Omega$ is also a $\sigma$-algebra. However, in an infinite outcome space, an algebra need not be a $\sigma$-algebra. Try to come up with an example in the following exercise.

Exercise: Let $\mathcal{A}_1 \subseteq … \subseteq \mathcal{A}_n \subseteq …$ be an increasing sequence of $\sigma$-algebras on $\Omega$. Show that $\cup_{i=1}^N \mathcal{A}_i$ for a finite $N$ forms a $\sigma$-algebra, and find a counterexample to show that $\cup_{i=1}^\infty \mathcal{A}_i$ need not be a $\sigma$-algebra (although it is always an algebra). (Hint: Consider $\sigma$-algebras generated by specific intervals, where for each index $n$, we partition the generating intervals into further subintervals. Use $\Omega = [0,1]$ for simplicity.)

These logical structures of algebras are less restrictive than $\sigma$-algebras. These “weaker” structures, when coupled with the idea of $\sigma$-algebra generation, yield powerful results. To foreshadow the power of generation on simpler logical structures, consider the following example: we may show that two (finite) measure functions $\mu, \lambda$ that agree on an algebra will also agree on the $\sigma$-algebra generated by said algebra (we may make the same statement for generation from a $\pi$-system too). To make sense of this example, we need to introduce and discuss some properties of these simpler structures. The following introductions may seem unmotivated and unconnected at first, but rest assured that they will be useful very soon. Treat the following terms as a list of definitions for now. First, we introduce a $\pi$-system, which is a logical structure defined as:

Definition: $\pi$-system $P$ on $\Omega$:

  1. $P$ is non-empty
  2. If $A, B \in P$ then $A \cap B \in P$

A $\pi$-system can also be generated: $\pi(\{A,B\})$ is the intersection of all $\pi$-systems that contain the sets $A$ and $B$. For example, $\pi(\{A,B\}) = \{A,B, A\cap B\}$.
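For finite collections of finite sets, the generated $\pi$-system can be computed by closing under pairwise intersections. In this sketch the concrete sets `A` and `B` and the helper `generate_pi_system` are assumptions chosen for illustration.

```python
def generate_pi_system(generators):
    """Close a finite family of sets under pairwise intersection."""
    family = {frozenset(g) for g in generators}
    while True:
        new = {a & b for a in family for b in family}
        if new <= family:  # already closed under intersection
            return family
        family |= new

A, B = {1, 2, 3}, {3, 4}  # hypothetical example sets
pi = generate_pi_system([A, B])

# The result is {A, B, A ∩ B}
print(sorted(sorted(s) for s in pi))  # [[1, 2, 3], [3], [3, 4]]
```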

Now we introduce a $\lambda$-system, which is sometimes called a Dynkin system, named after Eugene Dynkin, a Soviet mathematician.

Definition: $\lambda$-system $L$ on $\Omega$:

  1. $\Omega \in L$
  2. If $A, B \in L$ and $A \subseteq B$ then $B \setminus A \in L$
  3. If $A_1,…,A_n,… \in L$ are pairwise disjoint then $\cup_{i=1}^\infty A_i \in L$

Likewise, a $\lambda$-system can be generated: $\lambda(\{A,B\})$ is the intersection of all $\lambda$-systems containing the sets $A$ and $B$. This is a less illustrative example than the $\pi$-system one, since a $\lambda$-system must also be closed under unions of nested increasing sets, which is cumbersome to write out in this manner. Do not confuse $\lambda(\cdot)$ as a measure function with $\lambda(\cdot)$ as a generator. In this context, it is clear we are talking about a $\lambda$-system generated by some collection of sets; elsewhere, $\lambda(\cdot)$ typically refers to a measure function.

Exercise: Enumerate $\lambda(\{A,B\})$ and justify why each element in this $\lambda$-system should be in the collection. Assume $A$ and $B$ are both subsets of $\Omega$.

Exercise: Show that a $\lambda$-system that is also a $\pi$-system is a $\sigma$-algebra

Monotone Class and Monotone Class Theorem

Definition: Monotone class $\mathcal{M}$:

  1. If $A_1 \subseteq … \subseteq A_n \subseteq …$ with each $A_i \in \mathcal{M}$ then $\cup_{i=1}^\infty A_i \in \mathcal{M}$
  2. If $A_1 \supseteq … \supseteq A_n \supseteq …$ with each $A_i \in \mathcal{M}$ then $\cap_{i=1}^\infty A_i \in \mathcal{M}$

Likewise, monotone classes can be generated, and $\mathcal{M}(X)$ is defined as the intersection of all monotone classes containing $X$.

Exercise: Show that if $A$ is an algebra, then $\mathcal{M}(A)$ is a $\sigma$-algebra.

Exercise (Monotone Class Theorem): Furthermore, show that $\mathcal{M}(A) = \sigma(A)$. (Hint: showing $\sigma(A) \subseteq \mathcal{M}(A)$ follows from the previous exercise, since $\mathcal{M}(A)$ is then a $\sigma$-algebra containing $A$. For the previous exercise itself, collect the sets in $\mathcal{M}(A)$ whose intersections with a given set stay in $\mathcal{M}(A)$, and show that this collection is a monotone class containing $A$; this makes $\mathcal{M}(A)$ closed under intersections. Refer to the “principle of good sets”. To show $\mathcal{M}(A) \subseteq \sigma(A)$, note that every $\sigma$-algebra is a monotone class, so $\sigma(A)$ is a monotone class containing $A$.)

The above exercise tells us that $\mathcal{M}(A) = \sigma(A)$. Thus if we want to show some property is true for all elements of $\sigma(A)$, it suffices to consider the property on $\mathcal{M}(A)$ where we only need to check for closure under nested unions and intersections, provided $A$ is an algebra of sets.

The definitions of the collections and logical structures provide a powerful mechanism to help us understand how to “extend” a measure, as we have not yet properly defined a measure function on a $\sigma$-algebra.

Extension of Measures

So far, we have not properly defined what it means to measure a set in $\mathcal{B}$, the Borel $\sigma$-algebra. $[a,b]$ is surely in $\mathcal{B}$ and we defined the measure of the interval to be $\tilde{\lambda}([a,b]) = \ell([a,b]) = b-a$. (Here, $\tilde{\lambda}(\cdot)$ denotes the Lebesgue measure defined on intervals $[a,b]$.) Note that $[a,b] \cap [c,d]$ will also form a compact interval if not empty. If you do not believe this, try all cases while maintaining $a<b$ and $c<d$. For example, when $c<a<d<b$ the intersection of intervals gives $[a,d]$. This should ring a bell: the collection of intervals on which $\tilde{\lambda}([a,b]) = \ell([a,b]) = b-a$ is defined (together with the empty set) forms a $\pi$-system. The following result is critical in developing the Lebesgue measure.
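The case analysis for $[a,b] \cap [c,d]$ collapses to one formula: the intersection is $[\max(a,c), \min(b,d)]$ whenever that interval is non-empty. A small sketch, with intervals represented as `(left, right)` tuples and helper names chosen only for this illustration:

```python
def intersect(i, j):
    """Intersection of closed intervals [a, b] and [c, d]; None if empty."""
    (a, b), (c, d) = i, j
    lo, hi = max(a, c), min(b, d)
    # lo == hi is the degenerate case of a single shared endpoint
    return (lo, hi) if lo <= hi else None

def length(i):
    """Length b - a of the closed interval [a, b]."""
    a, b = i
    return b - a

# Case c < a < d < b with a=2, b=5, c=1, d=3: the result is [a, d]
print(intersect((2, 5), (1, 3)))          # (2, 3)
print(length(intersect((2, 5), (1, 3))))  # 1
print(intersect((0, 1), (2, 3)))          # None
```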

Caratheodory’s Extension Theorem

More generally speaking, knowing that two measures agree on a $\pi$-system is sufficient to say that the two measures will also agree on the $\sigma$-algebra generated by that $\pi$-system. This is known as Caratheodory’s Extension Theorem, named after Constantin Caratheodory. His extension theorem effectively states that if $\tilde{\lambda}$ is defined on an algebra $A$, then there exists another measure function $\lambda$ defined on sets in $\sigma(A)$ such that $\tilde{\lambda}=\lambda$ on $A$. The statement of this theorem is far more important than its proof. In effect, this says that so long as we can define a measure function on an algebra or a $\pi$-system, then we know there exists another measure function defined on the generated $\sigma$-algebra, and these measures coincide when measuring sets in the algebra or $\pi$-system. Furthermore, under a finiteness (more generally, $\sigma$-finiteness) condition, the extended measure is unique.

Here is a sketch of the proof of the uniqueness part of the extension theorem, which relies on what is sometimes called Dynkin’s $\pi$-$\lambda$ theorem.

Proof : Let $\mathcal{A}$ be a $\pi$-system containing $\Omega$ on which the finite measures $\lambda$ and $\tilde{\lambda}$ agree, and consider the following:

\[\Pi := \{A \in \sigma(\mathcal{A}) : \lambda(A) = \tilde{\lambda}(A)\}\]

Clearly $\mathcal{A} \subseteq \Pi$. We first show that $\Pi$ is a $\lambda$-system. $\Omega \in \Pi$ since $\lambda(\Omega) = \tilde{\lambda}(\Omega)$. If $A, B \in \Pi$ with $B \subseteq A$ then $\lambda(A\setminus B) = \lambda(A) - \lambda(B) = \tilde{\lambda}(A) - \tilde{\lambda}(B) = \tilde{\lambda}(A \setminus B)$, thus $A \setminus B \in \Pi$. Finally, if a disjoint sequence $A_1,…, A_n, … \in \Pi$ then $\lambda(\cup A_n) = \sum \lambda(A_n) = \sum \tilde{\lambda}(A_n) = \tilde{\lambda}(\cup A_n)$ by countable additivity. Thus $\Pi$ is a $\lambda$-system, and it is a $\lambda$-system that contains $\mathcal{A}$. Thus $\lambda(\mathcal{A}) \subseteq \Pi$ (here $\lambda(\mathcal{A})$ denotes the generated $\lambda$-system, not a measure), as the generated system is always contained in any other system of the same type that contains the generators. This means the collection of sets with agreeing measures contains at least the $\lambda$-system generated by $\mathcal{A}$.

Now we show that $\lambda(\mathcal{A})$ is also a $\pi$-system. Let $B$ be a set in the $\pi$-system $\mathcal{A}$ (which includes $\Omega$) as defined above. Define the collection:

\[\mathcal{D} := \{A \in \lambda(\mathcal{A}) : (A\cap B) \in \lambda(\mathcal{A})\}\subseteq \lambda(\mathcal{A})\]

We will show that $\mathcal{D}$ is a $\lambda$-system that contains $\mathcal{A}$, and hence contains $\lambda(\mathcal{A})$, meaning it is exactly $\lambda(\mathcal{A})$ (both $\mathcal{D} \subseteq \lambda(\mathcal{A})$ and $\mathcal{D} \supseteq \lambda(\mathcal{A})$ hold). This is the key step in showing that $\lambda(\mathcal{A})$ is closed under intersection, hence also a $\pi$-system and thus a $\sigma$-algebra containing $\mathcal{A}$. Note first that $\mathcal{A} \subseteq \mathcal{D}$: for $A \in \mathcal{A}$ we have $A \cap B \in \mathcal{A} \subseteq \lambda(\mathcal{A})$ because $\mathcal{A}$ is a $\pi$-system. In particular $\Omega \in \mathcal{D}$, as $\Omega \in \mathcal{A}$ and $\Omega \cap B = B$. Second, if $X,Y \in \mathcal{D}$ with $Y \subseteq X$ then $X \cap B \in \lambda(\mathcal{A})$ and $Y\cap B \in \lambda(\mathcal{A})$, with $Y \cap B \subseteq X \cap B$. Because of that, $X \cap B \cap (Y \cap B)^C \in \lambda(\mathcal{A})$ since it is a $\lambda$-system. Then $X \cap B \cap (Y \cap B)^C = (X\cap B \cap Y^C) \cup (X\cap B \cap B^C) = X \cap Y^C \cap B = (X \setminus Y) \cap B \in \lambda(\mathcal{A})$, therefore making $X\setminus Y \in \mathcal{D}$. Third, if we have a disjoint sequence of $X_1,…,X_n,… \in \mathcal{D}$ then $X_k \cap B \in \lambda(\mathcal{A})$ for all $k$, the sets $X_k \cap B$ are disjoint, and $(\cup_k X_k)\cap B = \cup_k (X_k \cap B) \in \lambda(\mathcal{A})$, so $\cup_k X_k \in \mathcal{D}$.

Thus $\mathcal{D}$ is a $\lambda$-system containing $\mathcal{A}$. So $\mathcal{D} \supseteq \lambda(\mathcal{A})$. But since $\mathcal{D} \subseteq \lambda(\mathcal{A})$ by definition, then $\mathcal{D} = \lambda(\mathcal{A})$. That is, $A \cap B \in \lambda(\mathcal{A})$ for every $A \in \lambda(\mathcal{A})$ and $B \in \mathcal{A}$; repeating the same argument with $B$ now taken from $\lambda(\mathcal{A})$ shows that $\lambda(\mathcal{A})$ is closed under finite intersections, so it is also a $\pi$-system. A $\lambda$-system that is also a $\pi$-system is a $\sigma$-algebra, so $\lambda(\mathcal{A})$ is a $\sigma$-algebra that contains $\mathcal{A}$ and $\lambda(\mathcal{A})\supseteq \sigma(\mathcal{A})$; this is the $\pi$-$\lambda$ theorem. From above, $\Pi \supseteq \lambda(\mathcal{A}) \supseteq \sigma(\mathcal{A})$ so any element in $\sigma(\mathcal{A})$ is also in $\Pi$. This means that the measure functions $\lambda$ and $\tilde{\lambda}$ agree on every element of $\sigma(\mathcal{A})$, as being an element of $\Pi$ means exactly that the two measures agree on it. $\tag*{∎}$

In essence, the proof followed these steps:

  1. Define a logic on a set of events called $\Pi$, namely the logic is that $\Pi$ contains the events in $\sigma(\mathcal{A})$ where the $\lambda(\cdot)$ and $\tilde{\lambda}(\cdot)$ measures agree, where $\mathcal{A}$ is a $\pi$-system on which they agree
  2. Show that $\Pi$ is a $\lambda$-system containing $\mathcal{A}$, therefore it is at least as large as the $\lambda$-system generated by $\mathcal{A}$
  3. Show that the $\lambda$-system generated by a $\pi$-system is also a $\pi$-system (by defining $\mathcal{D}$ and using the principle of good sets)
  4. Step 3, if successful, will imply that $\mathcal{D}$ is exactly the $\lambda$-system generated by $\mathcal{A}$
  5. By Dynkin’s $\pi$-$\lambda$ theorem, $\mathcal{D}$ is also a $\sigma$-algebra containing $\mathcal{A}$, therefore it is a $\sigma$-algebra at least as large as $\sigma(\mathcal{A})$
  6. Thus $\Pi \supseteq \mathcal{D} \supseteq \sigma(\mathcal{A})$, therefore all sets in $\sigma(\mathcal{A})$ must also be in $\Pi$, the set of all events on which the measures $\lambda$ and $\tilde{\lambda}$ agree.

A critical implication of this theorem is that we are now able to generalize our measure function $\lambda$ to a $\sigma$-algebra, by solely defining the measure on a $\pi$-system. This provides a mechanism to extend the Lebesgue measure on intervals, to the $\sigma$-algebra generated by intervals. In the context of probability theory, we can define a probability measure on a $\pi$-system and we will be able to extend this measure to all events in a $\sigma$-algebra generated from said $\pi$-system.
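A small finite example suggests why closure under intersection matters for uniqueness. In the sketch below, the two probability measures `mu` and `nu` on $\{1,2,3,4\}$ are hypothetical choices: they agree on the generating family $\{\{1,2\},\{2,3\},\Omega\}$, which is not a $\pi$-system because $\{1,2\}\cap\{2,3\}=\{2\}$ is missing, yet they disagree on the generated $\sigma$-algebra (here the whole power set).

```python
from fractions import Fraction as F
from itertools import combinations

omega = (1, 2, 3, 4)
# Two hypothetical probability measures given by point weights
mu = {1: F(1, 4), 2: F(1, 4), 3: F(1, 4), 4: F(1, 4)}
nu = {1: F(1, 2), 2: F(0), 3: F(1, 2), 4: F(0)}

def measure(weights, event):
    """Measure of an event as the sum of its point weights."""
    return sum((weights[w] for w in event), F(0))

def agree_on(family):
    """True if mu and nu assign the same value to every set in family."""
    return all(measure(mu, a) == measure(nu, a) for a in family)

power_set = [frozenset(c) for r in range(len(omega) + 1)
             for c in combinations(omega, r)]

# Generating family that is NOT closed under intersection
not_pi = [frozenset({1, 2}), frozenset({2, 3}), frozenset(omega)]

print(agree_on(not_pi), agree_on(power_set))  # True False
```

Adding $\{2\}$ to make the family a $\pi$-system already separates the two measures, since $\mu(\{2\}) = 1/4$ while $\nu(\{2\}) = 0$; so they cannot agree on the $\pi$-system, consistent with the uniqueness result.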

Technical Remark of Caratheodory’s Extension Results

In some situations, we are unable to explicitly define a measure for every possible set in a $\sigma$-algebra, as the $\sigma$-algebra may be very large. In many cases, it is futile to try and enumerate all elements in a $\sigma$-algebra. If we take the Borel $\sigma$-algebra of $[0,1]$, there are uncountably many different sets in the $\sigma$-algebra. However, if we do know how to define a measure $\tilde{\lambda}$ on a $\pi$-system $\mathcal{A}$, a measure exists on sets in $\sigma(\mathcal{A})$ and will agree with $\tilde{\lambda}$ on sets in $\mathcal{A}$. This implies the existence of a measure on a $\sigma$-algebra and its uniqueness.

As taken from Wikipedia: “If this result does not seem very remarkable, consider the fact that it usually is very difficult or even impossible to fully describe every set in the 𝜎-algebra, and so the problem of equating measures would be completely hopeless without such a tool.”

Exercise: Show that any Borel set $B$ in $\mathbb{R}$ can be sandwiched as $S \subset B \subset T$ where $\lambda(T\setminus B) < \varepsilon$ and $\lambda(B\setminus S) < \delta$ for any given $\varepsilon, \delta > 0$, and where $S$ and $T$ are each countable disjoint unions of compact sets in $\mathbb{R}$.

Measurable Functions

Once the idea of measurable spaces is established, we are ready to talk about measurable functions. A general function is simply a map from one space to another space. We call $f$ a measurable function if the pre-image of every measurable set $B \in \mathcal{B}$ is in $\mathcal{A}$, the $\sigma$-algebra defined over the domain of $f$. In math, this means that for a function $f: X \to Y$, where we endow $\sigma$-algebras onto $X$ and $Y$, i.e. $(X, \mathcal{A})$ and $(Y, \mathcal{B})$ are general measurable spaces, $f$ is measurable if

\[\forall B \in \mathcal{B}, \quad f^{-1}(B) := \{x \in X : f(x) \in B\} \in \mathcal{A}\]

That is, the preimage of any measurable set with respect to $\mathcal{B}$ is going to be measurable with respect to $\mathcal{A}$. One example of a measurable function is a function $X$ mapping a measurable outcome space $(\Omega, \mathcal{F})$ to $(\mathbb{R}, \mathcal{B})$, called a random variable. The sole requirement is that $X$ must be measurable. Some texts refer to $X$ as being $\mathcal{F}/\mathcal{B}$-measurable.

To show that a function is $\mathcal{A}/\mathcal{B}$-measurable when $\mathcal{B}$ is generated by $B$, it suffices to show that the preimage of every set in $B$ (the generating set) is an element of $\mathcal{A}$. This is another critical result showing that we only need to consider the generating set to establish measurability. Formally stated, if $T: X \to Y$ where $(X, \mathcal{A})$ and $(Y, \mathcal{B})$ are measurable spaces and $\mathcal{B} := \sigma(\mathcal{F})$ is generated by $\mathcal{F}$, then if $T^{-1}(F) \in \mathcal{A}$ for all $F \in \mathcal{F}$, $T$ is $\mathcal{A}/\mathcal{B}$-measurable.

Proof : Consider the set $\mathcal{G} := \{B \in \mathcal{B} : T^{-1}(B) \in \mathcal{A}\}$. This $\mathcal{G}$ collects all measurable sets in the codomain of $T$ whose pre-image lies in $\mathcal{A}$, the measurable sets in the domain. By hypothesis, $\mathcal{F} \subseteq \mathcal{G}$. If we can show that $\mathcal{G}$ is also a $\sigma$-algebra, then it is a $\sigma$-algebra containing $\mathcal{F}$, so it is at least as large as $\sigma(\mathcal{F}) = \mathcal{B}$, the smallest such $\sigma$-algebra. Every set in $\mathcal{B}$ would then have a pre-image in $\mathcal{A}$, making $T$ measurable.

First, $T^{-1}(\varnothing) = \varnothing \in \mathcal{A}$ so $\varnothing \in \mathcal{G}$. Second, if $A\in\mathcal{G}$, then $T^{-1}(A) \in \mathcal{A}$ and $A^C \in \mathcal{B}$. As $\mathcal{A}$ is a $\sigma$-algebra, $(T^{-1}(A))^C \in \mathcal{A}$, and by the properties of pre-images $T^{-1}(A^C) = (T^{-1}(A))^C \in \mathcal{A}$, implying $A^C \in \mathcal{G}$. Third, if $A_1, \ldots, A_n, \ldots$ are in $\mathcal{G}$, then $T^{-1}(A_n) \in \mathcal{A}$ for all $n$. As $\mathcal{A}$ is a $\sigma$-algebra, $\bigcup_n T^{-1}(A_n) \in \mathcal{A}$, and by the properties of pre-images $T^{-1}(\bigcup_n A_n) = \bigcup_n T^{-1}(A_n) \in \mathcal{A}$. Thus $\bigcup_n A_n \in \mathcal{G}$. Therefore $\mathcal{G}$ is a $\sigma$-algebra containing $\mathcal{F}$, so $\mathcal{G} \supseteq \sigma(\mathcal{F}) = \mathcal{B}$.

Since $\mathcal{G} \subseteq \mathcal{B}$ by construction, we conclude $\mathcal{G} = \mathcal{B}$: every set in the codomain $\sigma$-algebra has a pre-image that is measurable in the domain. This means that checking domain measurability of the preimages of the generating set is equivalent to checking domain measurability of the preimages of the whole codomain $\sigma$-algebra. Thus $T$ is measurable provided that every set in the generating set has a pre-image that is measurable with respect to the domain $\sigma$-algebra. $\tag*{∎}$
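The generator criterion can be sanity-checked on a finite example, where $\sigma(\mathcal{F})$ can be enumerated explicitly. The sketch below is an illustration under assumptions: the function `generate_sigma_algebra`, the spaces `X` and `Y`, the generators, and the map `T` are all hypothetical choices, and on a finite space closing under complements and pairwise unions is enough to obtain a $\sigma$-algebra. It checks the pre-images of the generators only, then builds the collection $\mathcal{G}$ from the proof and compares it with $\sigma(\mathcal{F})$.

```python
from itertools import combinations


def generate_sigma_algebra(space, generators):
    """Close `generators` under complement and union inside a finite `space`."""
    space = frozenset(space)
    sets = {frozenset(), space} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(sets)
        for s in current:
            complement = space - s
            if complement not in sets:
                sets.add(complement)
                changed = True
        for a, b in combinations(current, 2):
            union = a | b
            if union not in sets:
                sets.add(union)
                changed = True
    return sets


def preimage(f, domain, target_set):
    """Return f^{-1}(target_set) as a frozenset."""
    return frozenset(x for x in domain if f(x) in target_set)


# Codomain: Y = {0, 1, 2, 3} with B = sigma(F), F = {{0}, {1}}.
Y = frozenset(range(4))
F = [{0}, {1}]
B = generate_sigma_algebra(Y, F)

# Domain: X = {0, ..., 7} with a sigma-algebra A, and the map T(x) = x mod 4.
X = frozenset(range(8))
A = generate_sigma_algebra(X, [{0, 4}, {1, 5}])


def T(x):
    return x % 4


# Step 1: check pre-images of the generators only.
generators_ok = all(preimage(T, X, g) in A for g in F)

# Step 2: the collection G from the proof, computed over all of B.
G = {S for S in B if preimage(T, X, S) in A}

print(generators_ok)  # True
print(G == B)         # True
print(len(B))         # 8
```

Here two generator checks stand in for eight checks over all of $\sigma(\mathcal{F})$; for the Borel $\sigma$-algebra the full enumeration is not available at all, which is why the criterion matters.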

As an example, suppose we have an outcome space $\Omega$ of tossing two coins and the $\sigma$-algebra $\sigma(\Omega)$ generated on it. Denote $Y: (\Omega, \sigma(\Omega)) \to (\{0,1,2\}, \sigma(\{0,1,2\}))$. If we take any element of the generating set $\{0,1,2\}$, note that its pre-image under $Y$ is some event in $\sigma(\Omega)$. This makes $Y$ a measurable random variable, since checking measurability of the pre-images of the generating set is enough. The checking is left as an exercise to the reader.
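For readers who want to see the check carried out, here is a small sketch for the two-coin example. Two things are assumptions not fixed by the text: we take $Y$ to count the number of heads, and we take $\sigma(\Omega)$ to be the full power set of $\Omega$. The names `omega`, `power_set`, and `preimages` are likewise illustrative.

```python
from itertools import combinations, product

# Outcome space of two coin tosses: HH, HT, TH, TT.
omega = ["".join(p) for p in product("HT", repeat=2)]


def power_set(items):
    """All subsets of `items`, each as a frozenset."""
    items = list(items)
    return {
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    }


# Assumption: sigma(Omega) is the power set of Omega.
sigma_omega = power_set(omega)


def Y(outcome):
    """Assumption: Y counts the number of heads in the outcome."""
    return outcome.count("H")


# Pre-image of each generating singleton {k} for k in {0, 1, 2}.
preimages = {k: frozenset(w for w in omega if Y(w) == k) for k in (0, 1, 2)}

for k, pre in preimages.items():
    print(k, sorted(pre), pre in sigma_omega)
# 0 ['TT'] True
# 1 ['HT', 'TH'] True
# 2 ['HH'] True
```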

Exercise: Show that a continuous function $f: \mathbb{R} \to \mathbb{R}$ is $\mathcal{B}/\mathcal{B}$-measurable.

Exercise: Given two measurable functions $f: (X, \sigma(X)) \to (Y, \sigma(Y))$ and $g: (Y, \sigma(Y)) \to (Z, \sigma(Z))$, show that the composition $g\circ f = g(f(\cdot)) : (X,\sigma(X))\to (Z,\sigma(Z))$ is $\sigma(X)/\sigma(Z)$-measurable.

Exercise: Let $X_n :(\Omega, \mathcal{F}) \to (\mathbb{R}, \mathcal{B})$ be a sequence of measurable functions that converges pointwise to $X$. Show that $X$ is $\mathcal{F}/\mathcal{B}$-measurable.

Induced Measures

Suppose we have a random variable $X : (\Omega, \cv{F}) \to (\reals, \cv{B})$ defined on a probability space $(\Omega, \cv{F}, \pr)$. When we write $\pr(X \in A)$ for some set $A \in \cv{B}$, we are really taking the probability measure of the event $\{\omega : X(\omega) \in A\}$. This event is exactly $X^{-1}(A)$, which lies in $\cv{F}$ because $X$ is measurable, so $\pr(X \in A) = \pr(X^{-1}(A))$. Note that this is simply a composition of functions, $(\pr \circ X^{-1})(A)$. The map $\pr \circ X^{-1}$ (we can write it as $\lambda$) is also a measure, specifically a measure on sets in $(\reals, \cv{B})$, because pre-images preserve complements and disjoint countable unions, so $\lambda$ inherits countable additivity from $\pr$. We call this the measure induced by $X$, and the image of the random variable now carries an induced measure space $(\reals, \cv{B}, \lambda)$.
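The construction $\lambda(A) = \pr(X^{-1}(A))$ can be made concrete on a finite probability space. The sketch below assumes two tosses of a fair coin with $X$ counting heads; the uniform weights and the names `P`, `prob`, and `induced_measure` are illustrative assumptions, not part of the text above.

```python
from fractions import Fraction
from itertools import product

# Outcome space of two coin tosses; assumption: the coin is fair.
omega = ["".join(p) for p in product("HT", repeat=2)]
P = {w: Fraction(1, 4) for w in omega}


def X(w):
    """Assumption: the random variable counts the number of heads."""
    return w.count("H")


def prob(event):
    """Probability of an event, given as a collection of outcomes."""
    return sum(P[w] for w in event)


def induced_measure(A):
    """lambda(A) = P(X^{-1}(A)): pull A back to Omega, then measure it."""
    return prob([w for w in omega if X(w) in A])


print(induced_measure({0}), induced_measure({1}), induced_measure({2}))
# 1/4 1/2 1/4
print(induced_measure({0, 1, 2}))  # 1
# Additivity on disjoint sets is inherited from P.
print(induced_measure({0, 1}) == induced_measure({0}) + induced_measure({1}))
# True
```

The induced measure lives entirely on the image space: once it is computed, questions about $X$ can be answered without referring back to $\Omega$.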

Exercise: Given a random variable $X : (\Omega, \cv{F}, \pr) \to (\reals, \cv{B})$, show that the set function $A \mapsto \pr(X^{-1}(A))$ induced on $(\reals, \cv{B})$ via $X$ is indeed a measure.

At this point, we rest the measure theory. In the next part, we will begin talking about probability as a measure and random variables as measurable functions.