Theory of functions of a real variable. Shlomo Sternberg May 10, 2005  2 Introduction. I have taught the beginning graduate course in real variables and functional analysis three times in the last five years, and this book is the result. The course assumes that the student has seen the basics of real variable theory and point set topology. The elements of the topology of metrics spaces are presented (in the nature of a rapid review) in Chapter I. The course itself consists of two parts: 1) measure theory and integration, and 2) Hilbert space theory, especially the spectral theorem and its applications. In Chapter II I do the basics of Hilbert space theory, i.e. what I can do without measure theory or the Lebesgue integral. The hero here (and perhaps for the first half of the course) is the Riesz representation theorem. Included is the spectral theorem for compact self-adjoint operators and applications of this theorem to elliptic partial differential equations. The pde material follows closely the treatment by Bers and Schecter in Partial Differential Equations by Bers, John and Schecter AMS (1964) Chapter III is a rapid presentation of the basics about the Fourier transform. Chapter IV is concerned with measure theory. The first part follows Caratheodory's classical presentation. The second part dealing with Hausdorff measure and di- mension, Hutchinson's theorem and fractals is taken in large part from the book by Edgar, Measure theory, Topology, and Fractal Geometry Springer (1991). This book contains many more details and beautiful examples and pictures. Chapter V is a standard treatment of the Lebesgue integral. Chapters VI, and VIII deal with abstract measure theory and integration. These chapters basically follow the treatment by Loomis in his Abstract Har- monic Analysis. Chapter VII develops the theory of Wiener measure and Brownian motion following a classical paper by Ed Nelson published in the Journal of Mathemat- ical Physics in 1964. Then we study the idea of a generalized random process as introduced by Gelfand and Vilenkin, but from a point of view taught to us by Dan Stroock. The rest of the book is devoted to the spectral theorem. We present three proofs of this theorem. The first, which is currently the most popular, derives the theorem from the Gelfand representation theorem for Banach algebras. This is presented in Chapter IX (for bounded operators). In this chapter we again follow Loomis rather closely. In Chapter X we extend the proof to unbounded operators, following Loomis and Reed and Simon Methods of Modern Mathematical Physics. Then we give Lorch's proof of the spectral theorem from his book Spectral Theory. This has the flavor of complex analysis. The third proof due to Davies, presented at the end of Chapter XII replaces complex analysis by almost complex analysis. The remaining chapters can be considered as giving more specialized in- formation about the spectral theorem and its applications. Chapter XI is de- voted to one parameter semi-groups, and especially to Stone's theorem about the infinitesimal generator of one parameter groups of unitary transformations. Chapter XII discusses some theorems which are of importance in applications of  3 the spectral theorem to quantum mechanics and quantum chemistry. Chapter XIII is a brief introduction to the Lax-Phillips theory of scattering.  It,  Contents 1 The topology of metric spaces 13 1.1 Metric spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 1.2 Completeness and completion.. . . . . . . . . . . . . . . . . . . . 16 1.3 Normed vector spaces and Banach spaces. . . . . . . . . . . . . . 17 1.4 Compactness. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 1.5 Total Boundedness.. . . . . . . . . . . . . . . . . . . . . . . . . . 18 1.6 Separability.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 1.7 Second Countability. . . . . . . . . . . . . . . . . . . . . . . . . . 20 1.8 Conclusion of the proof of Theorem 1.5.1. . . . . . . . . . . . . . 20 1.9 Dini's lemma. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 1.10 The Lebesgue outer measure of an interval is its length. . . . . . 21 1.11 Zorn's lemma and the axiom of choice. . . . . . . . . . . . . . . . 23 1.12 The Baire category theorem. . . . . . . . . . . . . . . . . . . . . 24 1.13 Tychonoff's theorem. . . . . . . . . . . . . . . . . . . . . . . . . . 24 1.14 Urysohn's lemma.. . . . . . . . . . . . . . . . . . . . . . . . . . . 25 1.15 The Stone-Weierstrass theorem.. . . . . . . . . . . . . . . . . . . 27 1.16 Machado's theorem. . . . . . . . . . . . . . . . . . . . . . . . . . 30 1.17 The Hahn-Banach theorem. . . . . . . . . . . . . . . . . . . . . . 32 1.18 The Uniform Boundedness Principle. . . . . . . . . . . . . . . . . 35 2 Hilbert Spaces and Compact operators. 37 2.1 Hilbert space. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 2.1.1 Scalar products. . . . . . . . . . . . . . . . . . . . . . . . 37 2.1.2 The Cauchy-Schwartz inequality. . . . . . . . . . . . . . . 38 2.1.3 The triangle inequality . . . . . . . . . . . . . . . . . . . . 39 2.1.4 Hilbert and pre-Hilbert spaces. . . . . . . . . . . . . . . . 40 2.1.5 The Pythagorean theorem. . . . . . . . . . . . . . . . . . 41 2.1.6 The theorem of Apollonius. . . . . . . . . . . . . . . . . . 42 2.1.7 The theorem of Jordan and von Neumann. . . . . . . . . 42 2.1.8 Orthogonal projection. . . . . . . . . . . . . . . . . . . . . 45 2.1.9 The Riesz representation theorem. . . . . . . . . . . . . . 47 2.1.10 What is L2(T)?. . . . . . . . . . . . . . . . . . . . . . . . 48 2.1.11 Projection onto a direct sum. . . . . . . . . . . . . . . . . 49 2.1.12 Projection onto a finite dimensional subspace.. . . . . . . 49 5  6 CONTENTS 2.1.13 Bessel's inequality. . . . . . . . . . . . . . . . . . . . . . . 49 2.1.14 Parseval's equation. . . . . . . . . . . . . . . . . . . . . . 50 2.1.15 Orthonormal bases. . . . . . . . . . . . . . . . . . . . . . 50 2.2 Self-adjoint transformations.. . . . . . . . . . . . . . . . . . . . . 51 2.2.1 Non-negative self-adjoint transformations. . . . . . . . . . 52 2.3 Compact self-adjoint transformations. . . . . . . . . . . . . . . . 54 2.4 Fourier's Fourier series. . . . . . . . . . . . . . . . . . . . . . . . 57 2.4.1 Proof by integration by parts.. . . . . . . . . . . . . . . . 57 2.4.2 Relation to the operator .. 60 2.4.3 Girding's inequality, special case.. . . . . . . . . . . . . . 62 2.5 The Heisenberg uncertainty principle. . . . . . . . . . . . . . . . 64 2.6 The Sobolev Spaces. . . . . . . . . . . . . . . . . . . . . . . . . . 67 2.7 G Arding's inequality. . . . . . . . . . . . . . . . . . . . . . . . . . 72 2.8 Consequences of Girding's inequality. . . . . . . . . . . . . . . . 76 2.9 Extension of the basic lemmas to manifolds. . . . . . . . . . . . . 79 2.10 Example: Hodge theory. . . . . . . . . . . . . . . . . . . . . . . . 80 2.11 The resolvent.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 3 The Fourier Transform. 85 3.1 Conventions, especially about 27r. . . . . . . . . . . . . . . . . . . 85 3.2 Convolution goes to multiplication. . . . . . . . . . . . . . . . . . 86 3.3 Scaling. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 3.4 Fourier transform of a Gaussian is a Gaussian. . . . . . . . . . . 86 3.5 The multiplication formula. . . . . . . . . . . . . . . . . . . . . . 88 3.6 The inversion formula. . . . . . . . . . . . . . . . . . . . . . . . . 88 3.7 Plancherel's theorem . . . . . . . . . . . . . . . . . . . . . . . . . 88 3.8 The Poisson summation formula. . . . . . . . . . . . . . . . . . . 89 3.9 The Shannon sampling theorem. . . . . . . . . . . . . . . . . . . 90 3.10 The Heisenberg Uncertainty Principle. . . . . . . . . . . . . . . . 91 3.11 Tempered distributions. . . . . . . . . . . . . . . . . . . . . . . . 92 3.11.1 Examples of Fourier transforms of elements of S'.. . . . . 93 4 Measure theory. 95 4.1 Lebesgue outer measure. . . . . . . . . . . . . . . . . . . . . . . . 95 4.2 Lebesgue inner measure. . . . . . . . . . . . . . . . . . . . . . . . 98 4.3 Lebesgue's definition of measurability. . . . . . . . . . . . . . . . 98 4.4 Caratheodory's definition of measurability.. . . . . . . . . . . . . 102 4.5 Countable additivity. . . . . . . . . . . . . . . . . . . . . . . . . . 104 4.6 o-fields, measures, and outer measures.. . . . . . . . . . . . . . . 108 4.7 Constructing outer measures, Method I. . . . . . . . . . . . . . . 109 4.7.1 A pathological example. . . . . . . . . . . . . . . . . . . . 110 4.7.2 Metric outer measures.. . . . . . . . . . . . . . . . . . . . 111 4.8 Constructing outer measures, Method II.. . . . . . . . . . . . . . 113 4.8.1 An example. . . . . . . . . . . . . . . . . . . . . . . . . . 114 4.9 Hausdorff measure. . . . . . . . . . . . . . . . . . . . . . . . . . . 116 4.10 Hausdorff dimension... ..... . . .. .. .. .. .. .. .. .. .... 117  CONTENTS 7 4.11 Push forward. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119 4.12 The Hausdorff dimension of fractals . . . . . . . . . . . . . . . . 119 4.12.1 Similarity dimension.. . . . . . . . . . . . . . . . . . . . . 119 4.12.2 The string model. . . . . . . . . . . . . . . . . . . . . . . 122 4.13 The Hausdorff metric and Hutchinson's theorem. . . . . . . . . . 124 4.14 Affine examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126 4.14.1 The classical Cantor set.. . . . . . . . . . . . . . . . . . . 126 4.14.2 The Sierpinski Gasket . . . . . . . . . . . . . . . . . . . . 128 4.14.3 Moran's theorem . . . . . . . . . . . . . . . . . . . . . . . 129 5 The Lebesgue integral. 133 5.1 Real valued measurable functions. . . . . . . . . . . . . . . . . . 134 5.2 The integral of a non-negative function. . . . . . . . . . . . . . . 134 5.3 Fatou's lemma. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138 5.4 The monotone convergence theorem. . . . . . . . . . . . . . . . . 140 5.5 The space L1(X, R). . . . . . . . . . . . . . . . . . . . . . . . . . 140 5.6 The dominated convergence theorem.. . . . . . . . . . . . . . . . 143 5.7 Riemann integrability. . . . . . . . . . . . . . . . . . . . . . . . . 144 5.8 The Beppo - Levi theorem. . . . . . . . . . . . . . . . . . . . . . 145 5.9 C1 is complete. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146 5.10 Dense subsets of 11(R, R). . . . . . . . . . . . . . . . . . . . . . 147 5.11 The Riemann-Lebesgue Lemma. . . . . . . . . . . . . . . . . . . 148 5.11.1 The Cantor-Lebesgue theorem. . . . . . . . . . . . . . . . 150 5.12 Fubini's theorem. . . . . . . . . . . . . . . . . . . . . . . . . . . . 151 5.12.1 Product o--fields. . . . . . . . . . . . . . . . . . . . . . . . 151 5.12.2 7r-systems and A-systems. . . . . . . . . . . . . . . . . . . 152 5.12.3 The monotone class theorem. . . . . . . . . . . . . . . . . 153 5.12.4 Fubini for finite measures and bounded functions. . . . . 154 5.12.5 Extensions to unbounded functions and to o-finite measures. 156 6 The Daniell integral. 157 6.1 The Daniell Integral . . . . . . . . . . . . . . . . . . . . . . . . . 157 6.2 Monotone class theorems. . . . . . . . . . . . . . . . . . . . . . . 160 6.3 Measure.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161 6.4 H6lder, Minkowski , L' and L . . . . . . . . . . . . . . . . . . . . 163 6.5 | - is the essential sup norm.. . . . . . . . . . . . . . . . . . . 166 6.6 The Radon-Nikodym Theorem. . . . . . . . . . . . . . . . . . . . 167 6.7 The dual space of L'. . . . . . . . . . . . . . . . . . . . . . . . . 170 6.7.1 The variations of a bounded functional. . . . . . . . . . . 171 6.7.2 Duality of L' and Lq when p(S) < oc. . . . . . . . . . . . 172 6.7.3 The case where p (S) = oc. . . . . . . . . . . . . . . . . . 173 6.8 Integration on locally compact Hausdorff spaces. . . . . . . . . . 175 6.8.1 Riesz representation theorems. . . . . . . . . . . . . . . . 175 6.8.2 Fubini's theorem. . . . . . . . . . . . . . . . . . . . . . . . 176 6.9 The Riesz representation theorem redux.. . . . . . . . . . . . . . 177 6.9.1 Statement of the theorem.... .. .. .. .. .. .. .. .... 177  8 6.9.2 Propositions in topology. . . . . . . . . . . . . 6.9.3 Proof of the uniqueness of the p restricted to B( 6.10 Existence . . . . . . . . . . . . . . . . . . . . . . . . . . 6.10.1 Definition.. . . . . . . . . . . . . . . . . . . . . 6.10.2 Measurability of the Borel sets. . . . . . . . . . 6.10.3 Compact sets have finite measure. . . . . . . . 6.10.4 Interior regularity. . . . . . . . . . . . . . . . . 6.10.5 Conclusion of the proof. . . . . . . . . . . . . . ( CONTENTS . . . . . . 178 X) . . . 180 . . . . . . 180 . . . . . . 180 . 182 . . . . . . 183 . . . . . . 183 . . . . . . 184 7 Wiener measure, Brownian motion and white noise. 7.1 Wiener measure. . . . . . . . . . . . . . . . . . . . . . . . 7.1.1 The Big Path Space . . . . . . . . . . . . . . . . . . 7.1.2 The heat equation . . . . . . . . . . . . . . . . . . . 7.1.3 Paths are continuous with probability one. . . . . 7.1.4 Embedding in S'.. . . . . . . . . . . . . . . . . . . 7.2 Stochastic processes and generalized stochastic processes. 7.3 Gaussian measures.. . . . . . . . . . . . . . . . . . . . . . 7.3.1 Generalities about expectation and variance. . . . 7.3.2 Gaussian measures and their variances. . . . . . . 7.3.3 The variance of a Gaussian with density.. . . . . . 7.3.4 The variance of Brownian motion. . . . . . . . . . 7.4 The derivative of Brownian motion is white noise.. . . . . 187 . . 187 . . 187 . . 189 . . 190 . . 194 . . 195 . . 196 . . 196 . . 198 . . 199 .. ..200 202 8 Haar measure. 8.1 Examples . . . . . . . . . . . . . . . . . . . . 8.1.1 R . . . . . . . . . . . . . . . . . . . 8.1.2 Discrete groups. . . . . . . . . . . . 8.1.3 Lie groups. . . . . . . . . . . . . . . 8.2 Topological facts . . . . . . . . . . . . . . . . 8.3 Construction of the Haar integral. . . . . . 8.4 Uniqueness . . . . . . . . . . . . . . . . . . . 8.5 p(G) o), d:X xX -R>0 such that for all x, y, z E X 1. d(x, y) = d(y, x) 2. d(x, z) r + s by virtue of the triangle inequality. Suppose that this intersection is not empty and that w e Br(x) n Bs(z). If y E X is such that d(y, w) < min[r - d(x, w), s - d(z, w)] then the triangle inequality implies that y E Br(x) n Bs(z). Put another way, if we set t min[r - d(x, w), s - d(z, w)] then Bt(w) C Br(x) n Bs(z). Put still another way, this says that the intersection of two (open) balls is either empty or is a union of open balls. So if we call a set in X open if either it is empty, or is a union of open balls, we conclude that the intersection of any finite number of open sets is open, as is the union of any number of open sets. In technical language, we say that the open balls form a base for a topology on X. A map f : X - Y from one pseudo-metric space to another is called con- tinuous if the inverse image under f of any open set in Y is an open set in X. Since an open set is a union of balls, this amounts to the condition that the inverse image of an open ball in Y is a union of open balls in X, or, to use the familiar c, 6 language, that if f(x) = y then for every c > 0 there exists a 6 = 6(x, c) > 0 such that f(B6(x)) c BE(y). Notice that in this definition 6 is allowed to depend both on x and on E. The map is called uniformly continuous if we can choose the 6 independently of X. An even stronger condition on a map from one pseudo-metric space to an- other is the Lipschitz condition. A map f : X - Y from a pseudo-metric space (X, dx) to a pseudo-metric space (Y, dy) is called a Lipschitz map with Lipschitz constant C if dy(f (x1), f (x2)) < Cdx(x1, x2) Vx1, x2 E X. Clearly a Lipschitz map is uniformly continuous. For example, suppose that A is a fixed subset of a pseudo-metric space X. Define the function d(A, -) from X to R by d(A, x) :=inf{d(x, w), w E A}.  1.1. METRIC SPACES 15 The triangle inequality says that d(x,w) < d(x,y) +d(y,w) for all w, in particular for w e A, and hence taking lower bounds we conclude that d(A, x) < d(x, y) + d(A, y). or d(A, x) - d(A, y) d(x, y). Reversing the roles of x and y then gives d(A, x) - d(A, y)| d(x, y). Using the standard metric on the real numbers where the distance between a and b is a - b this last inequality says that d(A,.-) is a Lipschitz map from X to R with C 1. A closed set is defined to be a set whose complement is open. Since the inverse image of the complement of a set (under a map f) is the complement of the inverse image, we conclude that the inverse image of a closed set under a continuous map is again closed. For example, the set consisting of a single point in R is closed. Since the map d(A, -) is continuous, we conclude that the set {x d(A, x) = 01 consisting of all points at zero distance from A is a closed set. It clearly is a closed set which contains A. Suppose that S is some closed set containing A, and y g S. Then there is some r > 0 such that Br (y) is contained in the complement of S, which implies that d(y, w) > r for all w E S. Thus {x d(A, x) =01 c S. In short {x d(A, x) =01 is a closed set containing A which is contained in all closed sets containing A. This is the definition of the closure of a set, which is denoted by A. We have proved that A = {x d(A, x) = 0}. In particular, the closure of the one point set {x} consists of all points u such that d(u, x) = 0. Now the relation d(x, y) = 0 is an equivalence relation, call it R. (Transitiv- ity being a consequence of the triangle inequality.) This then divides the space X into equivalence classes, where each equivalence class is of the form {x}, the closure of a one point set. If u E {x} and v e {y} then d(u, v) d(u, x) + d(x, y) + d(y, v) = d(x, y). since x E {u} and y e {v} we obtain the reverse inequality, and so d(u, v) = d(x, y).  16 CHAPTER 1. THE TOPOLOGY OF METRIC SPACES In other words, we may define the distance function on the quotient space X/R, i.e. on the space of equivalence classes by d({x}, {y}) := d(u, v), u E {x}, v E {y} and this does not depend on the choice of u and v. Axioms 1)-3) for a metric space continue to hold, but now d({x}, {y})= 0 {x} = {y}. In other words, X/R is a metric space. Clearly the projection map x H {x} is an isometry of X onto X/R. (An isometry is a map which preserves distances.) In particular it is continuous. It is also open. In short, we have provided a canonical way of passing (via an isometry) from a pseudo-metric space to a metric space by identifying points which are at zero distance from one another. A subset A of a pseudo-metric space X is called dense if its closure is the whole space. From the above construction, the image A/R of A in the quotient space X/R is again dense. We will use this fact in the next section in the following form: If f : Y -> X is an isometry of Y such that f(Y) is a dense set of X, then f descends to a map F of Y onto a dense set in the metric space X/R. 1.2 Completeness and completion. The usual notion of convergence and Cauchy sequence go over unchanged to metric spaces or pseudo-metric spaces Y. A sequence {ym} is said to converge to the point y if for every c > 0 there exists an N = N(E) such that d(yn, y) < E V n > N. A sequence {ym} is said to be Cauchy if for any c > 0 there exists an N = N(E) such that d(yn, ym) < E V m, n > N. The triangle inequality implies that every convergent sequence is Cauchy. But not every Cauchy sequence is convergent. For example, we can have a sequence of rational numbers which converge to an irrational number, as in the approxi- mation to the square root of 2. So if we look at the set of rational numbers as a metric space Q in its own right, not every Cauchy sequence of rational numbers converges in Q. We must "complete" the rational numbers to obtain R, the set of real numbers. We want to discuss this phenomenon in general. So we say that a (pseudo-)metric space is complete if every Cauchy sequence converges. The key result of this section is that we can always "complete" a metric or pseudo-metric space. More precisely, we claim that  1.3. NORMED VECTOR SPACES AND BANACH SPACES. 17 Any metric (or pseudo-metric) space can be mapped by a one to one isometry onto a dense subset of a complete metric (or pseudo-metric) space. By the italicized statement of the preceding section, it is enough to prove this for a pseudo-metric spaces X. Let Xseq denote the set of Cauchy sequences in X, and define the distance between the Cauchy sequences {x} and {y} to be d({xn}, {yn}) := lim d(zn, yn). n-oo It is easy to check that d defines a pseudo-metric on Xseq. Let f : X - Xseq be the map sending x to the sequence all of whose elements are x; f (x) = (x, x, x, x, --). It is clear that f is one to one and is an isometry. The image is dense since by definition lim d(f (xn), {xn}) = 0. Now since f(X) is dense in Xseq, it suffices to show that any Cauchy sequence of points of the form f(xn) converges to a limit. But such a sequence converges to the element {xn}. QED 1.3 Normed vector spaces and Banach spaces. Of special interest are vector spaces which have a metric which is compatible with the vector space properties and which is complete: Let V be a vector space over the real or complex numbers. A norm is a real valued function v s ||v on V which satisfies 1. v >0and>0ifv# 0, 2. cv =c|||v for any real (or complex) number c, and 3. v+w < v +||w|| V v,w E V. Then d(v, w) := v-w is a metric on V, which satisfies d(v+u, w+u) = d(v, w) for all v, w, u E V. The ball of radius r about the origin is then the set of all v such that v 0 there are finitely many open balls of radius c which cover X. Theorem 1.5.1 The following assertions are equivalent for a metric space: 1. X is compact. 2. Every sequence in X has a convergent subsequence. 3. X is totally bounded and complete. Proof that 1. - 2. Let {y2} be a sequence of points in X. We first show that there is a point x with the property for every c > 0, the open ball of radius c centered at x contains the points yi for infinitely many i. Suppose not. Then for any z e X there is an c > 0 such that the ball BE(z) contains only finitely many yi. Since z e BE(z), the set of such balls covers X. By compactness, finitely many of these balls cover X, and hence there are only finitely many i, a contradiction. Now choose i1 so that yi1 is in the ball of radius 2 centered at x. Then choose i2 > i1 so that yi2 is in the ball of radius 1 centered at x and keep going. We have constructed a subsequence so that the points yik converge to z. Thus we have proved that 1. implies 2.  1.6. SEPARABILITY. 19 Proof that 2. - 3. If {x3} is a Cauchy sequence in X, it has a convergent subsequence by hypothesis, and the limit of this subsequence is (by the triangle inequality) the limit of the original sequence. Hence X is complete. We must show that it is totally bounded. Given c > 0, pick a point Y1 e X and let BE(y1) be open ball of radius c about Y1. If BE6(y1) = X there is nothing further to prove. If not, pick a point Y2 e X - BE(y1) and let B6(y2) be the ball of radius c about Y2. If BE(y1) U BE(y2) = X there is nothing to prove. If not, pick a point y3 E X - (BE(y1) U BE(y2)) etc. This procedure can not continue indefinitely, for then we will have constructed a sequence of points which are all at a mutual distance > E from one another, and this sequence has no Cauchy subsequence. Proof that 3. - 2. Let {x3} be a sequence of points in X which we relabel as {x1,3}. Let B1,2 , ... , B1 be a finite number of balls of radius 2 which cover X. Our hypothesis 3. asserts that such a finite cover exists. Infinitely many of the j must be such that the x1,j all lie in one of these balls. Relabel this subsequence as {x2,3}. Cover X by finitely many balls of radius j. There must be infinitely many j such that all the X2,j lie in one of the balls. Relabel this subsequence as { X3,3}. Continue. At the ith stage we have a subsequence {x,3} of our original sequence (in fact of the preceding subsequence in the construction) all of whose points lie in a ball of radius 1/i. Now consider the "diagonal" subsequence X1,1, X2,2, X3,3,.-- All the points from x2,2 on lie in a fixed ball of radius 1/i so this is a Cauchy sequence. Since X is assumed to be complete, this subsequence of our original sequence is convergent. We have shown that 2. and 3. are equivalent. The hard part of the proof consists in showing that these two conditions imply 1. For this it is useful to introduce some terminology: 1.6 Separability. A metric space X is called separable if it has a countable subset {x3 } of points which are dense. For example R is separable because the rationals are countable and dense. Similarly, R" is separable because the points all of whose coordinates are rational form a countable dense subset. Proposition 1.6.1 Any subset Y of a separable metric space X is separable (in the induced metric). Proof. Let {x3 } be a countable dense sequence in X. Consider the set of pairs (j, n) such that B1/2n (xy )nY 0. For each such (j, n) let yj,n be any point in this non-empty intersection. We claim that the countable set of points yj,n are dense in Y. Indeed, let y be any point of Y. Let n be any positive integer. We can find a point x such that d(xy, y) < 1/2r since the x" are dense in X. But then d(y,y,) < 1/n7 by the triangle inequality. QED  20 CHAPTER 1. THE TOPOLOGY OF METRIC SPACES Proposition 1.6.2 Any totally bounded metric space X is separable. Proof. For each n let {x1,,... , xi,} be the centers of balls of radius 1/n (finite in number) which cover X. Put all of these together into one sequence which is clearly dense. QED A base for the open sets in a topology on a space X is a collection B of open set such that every open set of X is the union of sets of B Proposition 1.6.3 A family B is a base for the topology on X if and only if for every x E X and every open set U containing x there is a V e B such that x e V and V C U. Proof. If B is a base, then U is a union of members of B one of which must therefore contain x. Conversely, let U be an open subset of X. For each x E U there is a V, c U belonging to B. The union of these over all x E U is contained in U and contains all the points of U, hence equals U. So B is a base. QED 1.7 Second Countability. A topological space X is said to be second countable or to satisfy the second axiom of countability if it has a base B which is (finite or ) countable. Proposition 1.7.1 A metric space X is second countable if and only if it is separable. Proof. Suppose X is separable with a countable dense set {x2}. The open balls of radius 1/n about the x2 form a countable base: Indeed, if U is an open set and x E U then take n sufficiently large so that B2/n(x) c U. Choose j so that d(xy, x) < 1/n. Then V := B1/n(xj) satisfies x E V c U so by Proposition 1.6.3 the set of balls Bl/ (xj) form a base and they constitute a countable set. Conversely, let B be a countable base, and choose a point xg E U3 for each U3 E B. If x is any point of X, the ball of radius c > 0 about x includes some U3 and hence contains xz. So the xz form a countable dense set. QED Proposition 1.7.2 Lindelof's theorem. Suppose that the topological space X is second countable. Then every open cover has a countable subcover. Let U be a cover, not necessarily countable, and let B be a countable base. Let C c B consist of those open sets V belonging to B which are such that V c U where U E U. By Proposition 1.6.3 these form a (countable) cover. For each V E C choose a UV E U such that V C Uv. Then the {Uv}vEC form a countable subset of U which is a cover. QED 1.8 Conclusion of the proof of Theorem 1.5.1. Suppose that condition 2. and 3. of the theorem hold for the metric space X. By Proposition 1.6.2, X is separable, and hence by Proposition 1.7.1, X is  1.9. DINI'S LEMMA. 21 second countable. Hence by Proposition 1.7.2, every cover U has a countable subcover. So we must prove that if U1, U2, U3, ... is a sequence of open sets which cover X, then X = U1 U U2 U - .-. U Urn for some finite integer m. Suppose not. For each m choose Xm E X with Xm U1 U ... U Um. By condition 2. of Theorem 1.5.1, we may choose a subsequence of the {x3} which converge to some point x. Since U1 U ... U Urn is open, its complement is closed, and since X g U1iU - - - U Um for j > m we conclude that x g U1iU - - - U Um for any m. This says that the {U3} do not cover X, a contradiction. QED Putting the pieces together, we see that a closed bounded subset of Rm is compact. This is the famous Heine-Borel theorem. So Theorem 1.5.1 can be considered as a far reaching generalization of the Heine-Borel theorem. 1.9 Dini's lemma. Let X be a metric space and let L denote the space of real valued continuous functions of compact support. So f E L means that f is continuous, and the closure of the set of all x for which f(x) > 0 is compact. Thus L is a real vector space, and f e L fl EL. Thus if f E L and g E L then f+g e L and also max (f,g) = 2(f+g+f-g) E L and min (f,g) = 2(f +g-f -g) E L. For a sequence of elements in L (or more generally in any space of real valued functions) we write fn { 0 to mean that the sequence of functions is monotone decreasing, and at each x we have f,(x) -- 0. Theorem 1.9.1 Dini's lemma. If f, e L and f, { 0 then fm|| -- 0. In other words, monotone decreasing convergence to 0 implies uniform convergence to zero for elements of L. Proof. Given c > 0, let Cn = {x fn(x) > E}. Then the Cn are compact, C, D Cn+1 and Ok Ck _ 0. Hence a finite intersection is already empty, which means that Cn = 0 for some n. This means that |f| < e for some n, and hence, since the sequence is monotone decreasing, for all subsequent n. QED 1.10 The Lebesgue outer measure of an interval is its length. For any subset A c R we define its Lebesgue outer measure by m*(A) := inf1f (In) : In are intervals with A C U'I . (1.1) Here the length £(I) of any interval I = [a, b] is b - a with the same definition for half open intervals (a, b] or [a, b), or open intervals. Of course if a = -oc and b is finite or +oc, or if a is finite and b =+oo the length is infinite. So the infimum in (1.1) is taken over all covers of A by intervals. By the usual E/2n trick, i.e. by replacing each Is [as, b3] by (a3 - e/23±1, b3 + c/23+1) we may  22 CHAPTER 1. THE TOPOLOGY OF METRIC SPACES assume that the infimum is taken over open intervals. (Equally well, we could use half open intervals of the form [a, b), for example.). It is clear that if A c B then m*(A) m*(B) since any cover of B by intervals is a cover of A. Also, if Z is any set of measure zero, then m* (A U Z) m*(A). In particular, m*(Z) = 0 if Z has measure zero. Also, if A = [a, b] is an interval, then we can cover it by itself, so m*([a, b]) < b - a, and hence the same is true for (a, b], [a, b), or (a, b). If the interval is infinite, it clearly can not be covered by a set of intervals whose total length is finite, since if we lined them up with end points touching they could not cover an infinite interval. We still must prove that m*(I) =_£(I) (1.2) if I is a finite interval. We may assume that I = [c, d] is a closed interval by what we have already said, and that the minimization in (1.1) is with respect to a cover by open intervals. So what we must show is that if [c, d] c U(ai, b) then d-c Z(bi -a). i We first apply Heine-Borel to replace the countable cover by a finite cover. (This only decreases the right hand side of preceding inequality.) So let n be the number of elements in the cover. We want to prove that if [cd] Cl (a1, bi) then d - c < (bi - ai). i=1 i=1 We shall do this by induction on n. If n = 1 then a1 < c and b1 > d so clearly b1- a1 > d - c. Suppose that n ;> 2 and we know the result for all covers (of all intervals [c, d] ) with at most n - 1 intervals in the cover. If some interval (ai, bi) is disjoint from [c, d] we may eliminate it from the cover, and then we are in the case of n - 1 intervals. So every (ai, bi) has non-empty intersection with [c, d]. Among the the intervals (ai, bi) there will be one for which a2 takes on the minimum possible value. By relabeling, we may assume that this is (a1, b1). Since c is covered, we must have a1 < c. If bi > d then (a1, b1) covers [c, d] and there is nothing further to do. So assume b1 d. We must have b1 > c since (a1, bi) n [c, d] #0. Since b1 E [c, d], at least one of the intervals (ai, bi), i > 1 contains the point b1. By relabeling, we may assume that it is (a2, b2). But now we have a cover of [c, d] by n - 1 intervals: i=3  1.11. ZORN'S LEMMA AND THE AXIOM OF CHOICE. 23 So by induction d-c< (b2-ai)+Z(bi -a). i=3 But b2 - a1 < (b2 - a2) + (b1 - a1) since a2 < b1. QED 1.11 Zorn's lemma and the axiom of choice. In the first few sections we repeatedly used an argument which involved "choos- ing" this or that element of a set. That we can do so is an axiom known as The axiom of choice. If F is a function with domain D such that F(x) is a non-empty set for every x e D, then there exists a function f with domain D such that f(x) e F(x) for every x e D. It has been proved by G6del that if mathematics is consistent without the axiom of choice (a big "if"!) then mathematics remains consistent with the axiom of choice added. In fact, it will be convenient for us to take a slightly less intuitive axiom as out starting point: Zorn's lemma. Every partially ordered set A has a maximal linearly ordered subset. If every linearly ordered subset of A has an upper bound, then A contains a maximum element. The second assertion is a consequence of the first. For let B be a maximum linearly ordered subset of A, and x an upper bound for B. Then x is a maximum element of A, for if y >- x then we could add y to B to obtain a larger linearly ordered set. Thus there is no element in A which is strictly larger than x which is what we mean when we say that x is a maximum element. Zorn's lemma implies the axiom of choice. Indeed, consider the set A of all functions g defined on subsets of D such that g(x) E F(x). We will let dom(g) denote the domain of definition of g. The set A is not empty, for if we pick a point xo E D and pick yo E F(xo), then the function g whose domain consists of the single point xo and whose value g(xo) = yo gives an element of A. Put a partial order on A by saying that g - h if dom(g) C dom(h) and the restriction of h to dom g coincides with g. A linearly ordered subset means that we have an increasing family of domains X, with functions h defined consistently with respect to restriction. But this means that there is a function g defined on the union of these domains, U X whose restriction to each X coincides with the corresponding h. This is clearly an upper bound. So A has a maximal element f. If the domain of f were not  24 CHAPTER 1. THE TOPOLOGY OF METRIC SPACES all of D we could add a single point x0 not in the domain of f and yo E F(xo) contradicting the maximality of f. QED 1.12 The Baire category theorem. Theorem 1.12.1 In a complete metric space any countable intersection of dense open sets is dense. Proof. Let X be the space, let B be an open ball in X, and let 01, 02 ... be a sequence of dense open sets. We must show that B n (On) f0. Since 01 is dense, Bn 01 0, and is open. Thus Bn 01 contains the closure B1 of some open ball B1. We may choose B1 (smaller if necessary) so that its radius is < 1. Since B1 is open and 02 is dense, B1n02 contains the closure B2 of some open ball B2, of radius < j, and so on. Since X is complete, the intersection of the decreasing sequence of closed balls we have constructed contains some point x which belong both to B and to the intersection of all the O. QED A Baire space is defined as a topological space in which every countable intersection of dense open sets is dense. Thus Baire's theorem asserts that every complete metric space is a Baire space. A set A in a topological space is called nowhere dense if its closure contains no open set. Put another way, a set A is nowhere dense if its complement Ac contains an open dense set. A set S is said to be of first category if it is a countable union of nowhere dense sets. Then Baire's category theorem can be reformulated as saying that the complement of any set of first category in a complete metric space (or in any Baire space) is dense. A property P of points of a Baire space is said to hold quasi - surely or quasi-everywhere if it holds on an intersection of countably many dense open sets. In other words, if the set where P does not hold is of first category. 1.13 Tychonoff's theorem. Let I be a set, serving as an "index set". Suppose that for each a e I we are given a non-empty topological space Sa. The Cartesian product S :=fS aEI is defined as the collection of all functions x whose domain in I and such that x(a) e Sa . This space is not empty by the axiom of choice. We frequently write xz, instead of x(a) and called xz the "a coordinate of x". The map cyI  1.14. URYSOHN'S LEMMA. 25 is called the projection of S onto Sa. We put on S the weakest topology such that all of these projections are continuous. So the open sets of S are generated by the sets of the form fa1(Uc,) where Uc, c Sc, is open. Theorem 1.13.1 [Tychonoff.] If all the Sc, are compact, then so is S = HcI1 Sa- Proof. Let F be a family of closed subsets of S with the property that the intersection of any finite collection of subsets from this family is not empty. We must show that the intersection of all the elements of F is not empty. Using Zorn, extend F to a maximal family Fo of (not necessarily closed) subsets of S with the property that the intersection of any finite collection of elements of F0 is not empty. For each a, the projection fc,(Fo) has the property that there is a point xc, E Sc, which is in the closure of all the sets belonging to fa (Fo). Let x E S be the point whose a-th coordinate is xc,. We will show that x is in the closure of every element of Fo which will complete the proof. Let U be an open set containing x. By the definition of the product topology, there are finitely many a2 and open subsets Uc, C Sc, such that X E fa(U,) C U. i=1 So for each i = 1, ... , n, xc, E Uc,. This means that Uc, intersects every set belonging to fa, (Fo). So fa'(Ua) intersects every set belonging to FO and hence must belong to Fo by maximality. Therefore, fc- (Uj)gE 0, i=1 again by maximality. This says that U intersects every set of Fo. In other words, any neighborhood of x intersects every set belonging to F0, which is just another way of saying x belongs to the closure of every set belonging to Fo. QED 1.14 Urysohn's lemma. A topological space S is called normal if it is Hausdorff, and if for any pair F1, F2 of closed sets with F1 n F2 = 0 there are disjoint open sets U1, U2 with F1 c U1 and F2 c U2. For example, suppose that S is Hausdorff and compact. For each p E F1 and q E F2 there are neighborhoods Oq of p and Wq of q with Oq n Wq= 0. This is the Hausdorff axiom. A finite number of the Wq cover F2 since it is compact. Let the intersection of the corresponding Oq be called Up and the union of the corresponding Wq be called V,. Thus for each p E F1 we have found a neighborhood Up of p and an open set V, containing F2 with  26 CHAPTER 1. THE TOPOLOGY OF METRIC SPACES Up n V = 0. Once again, finitely many of the Up cover F1. So the union U of these and the intersection V of the corresponding V, give disjoint open sets U containing F1 and V containing F2. So any compact Hausdorff space is normal. Theorem 1.14.1 [Urysohn's lemma.] If F0 and F1 are disjoint closed sets in a normal space S then there is a continuous real valued function f : S - R such that 0 < f <1, f =0 on F0 and f =1 on F1. Proof. Let V1 : =Ffl. We can find an open set V2 containing F0 and whose closure is contained in V1, since we can choose Vi disjoint from an open set containing F1. So we have Fo cV, V c V1. Applying our normality assumption to the sets F0 and V1 we can find an open set V1 with F0 c Vi and V1 c V1. Similarly, we can find an open set V3 with 4 4 4 2 4 Vi C V3 and V3 c V1. So we have F0 c V17, V1 c V1, V c V , V3 c V1 =Fl. Continuing in this way, for each 0 < r < 1 where r is a dyadic rational, r = m/2k we produce an open set Vr with F0 C V and Vr C V if r < s, including Vr c V1 = Ff. Define f as follows: Set f(x) = 1 for x E F1. Otherwise, define f (x) =inf{rxeVr}. So f =0 on F0. If 0 < b < 1, then f(x) < b means that x e Vr for some r < b. Thus f1([0, b)) U=U Vr. r a means that there is some r > a such that x g Vr. Thus f-1((a, 1]) = U (Vr)c, r>a also a union of open sets, hence open. So we have shown that f1([0, b)) and f -1((a,1]) are open. Hence f-1((a, b)) is open. Since the intervals [0, b), (a, 1] and (a, b) form a basis for the open sets on the interval [0, 1], we see that the inverse image of any open set under f is open, which says that f is continuous. QED We will have several occasions to use this result.  1.15. THE STONE-WEIERSTRASS THEOREM. 27 1.15 The Stone-Weierstrass theorem. This is an important generalization of Weierstrass's theorem which asserted that the polynomials are dense in the space of continuous functions on any compact interval, when we use the uniform topology. We shall have many uses for this theorem. An algebra A of (real valued) functions on a set S is said to separate points if for any p, q E S, p f q there is an f E A with f(p) # f(q). Theorem 1.15.1 [Stone-Weierstrass.] Let S be a compact space and A an algebra of continuous real valued functions on S which separates points. Then the closure of A in the uniform topology is either the algebra of all continuous functions on S, or is the algebra of all continuous functions on S which all vanish at a single point, call it x. We will give two different proofs of this important theorem. For our first proof, we first state and prove some preliminary lemmas: Lemma 1.15.1 An algebra A of bounded real valued functions on a set S which is closed in the uniform topology is also closed under the lattice operations V and A. Proof. Since f Vg = j(f+g+ f-g) and f A g = 2(f+g- f- g) we must show that feA -feA. Replacing f by f / f we may assume that f <1. The Taylor series expansion about the point 1 for the function t - (t + E2) 2 converges uniformly on [0, 1]. So there exists, for any c > 0 there is a polynomial P such that P(x2) - (x2 + 62) 2 < E on [-1, 1]. Let Q:= P - P(0). We have P(0) - E < eso P(0)| <2E. So Q(0) = 0 and Q(x2) - (x2 + 62) 2 < 3E. But (x2 + E2) I fAgeAand fVgeA. Then the closure of A in the uniform topology contains every continuous function on S which can be approximated at every pair of points by a function belonging to A. Proof. Suppose that f is a continuous function on S which can be approxi- mated at any pair of points by elements of A. So let p, q e S and c > 0, and let fp,q,c e A be such that f (p) - fp,q,E(p)| f (x) - E}. Fix q and E. The sets Up,q,c cover S as p varies. Hence a finite number cover S since we are assuming that S is compact. We may take the minimum fq,E of the corresponding finite collection of fp,q,E. The function fq,E has the property that fq,E(x) < f(x) + E and fq,E(x) > f(x) - E for X ~n)Vp,q,c where the intersection is again over the same finite set of p's. We have now found a collection of functions fq,E such that fq,E f - E on some neighborhood Vq,E of q. We may choose a finite number of q so that the Vq,E cover all of S. Taking the maximum of the corresponding fq,e gives a function f6 e A with f - E < f < f + E, i.e. |f - f |o <6.  1.15. THE STONE-WEIERSTRASS THEOREM. 29 Since we are assuming that A is closed in the uniform topology we conclude that f E A, completing the proof of the lemma. Proof of the Stone-Weierstrass theorem. Suppose first that for every x E S there is a g E A with g(x) # 0. Let x f y and h E A with h(y) # 0. Then we may choose real numbers c and d so that f = cg + dh is such that 0 f f(x)#f (y) #0. Then for any real numbers a and b we may find constants A and B such that Af (x) + Bf2(x) = a and Af(y) + Bf2(y) = b. We can therefore approximate (in fact hit exactly on the nose) any function at any two distinct points. We know that the closure of A is closed under V and A by the first lemma. By the second lemma we conclude that the closure of A is the algebra of all real valued continuous functions. The second alternative is that there is a point, call it p, at which all f E A vanish. We wish to show that the closure of A contains all continuous functions vanishing at p,. Let B be the algebra obtained from A by adding the constants. Then B satisfies the hypotheses of the Stone-Weierstrass theorem and contains functions which do not vanish at p,. so we can apply the preceding result. If g is a continuous function vanishing at p, we may, for any c > 0 find an f E A and a constant c so that g - (f + c)|| < -. Evaluating at p, gives c < E/2. So g - f|| 0 there is a compact set C such that f < E on the complement of C. We can think of these functions as being defined on S, and all vanishing at pe. We now turn to a second proof of this important theorem. 1.16 Machado's theorem. Let M be a compact space and let CR(M) denote the algebra of continuous real valued functions on M. We let |-=|| denote the uniform norm on C1(M). More generally, for any closed set F c M, we let f F sup f(x) xEF If A c CR(M) is a collection of functions, we will say that a subset E C M is a level set (for A) if all the elements of A are constant on the set E. Also, for any f E CR(M) and any closed set F c M, we let df(F):= inf ||f - g|F gEA So df(F) measures how far f is from the elements of A on the set F. (I have suppressed the dependence on A in this notation.) We can look for "small" closed subsets which measure how far f is from A on all of M; that is we look for closed sets with the property that df(E) = df (M). (1.3) Let F denote the collection of all non-empty closed subsets of M with this property. Clearly M E F so this collection is not empty. We order F by the reverse of inclusion: F1 - F2 if F1 D F2. Let C be a totally ordered subset of F. Since M is compact, the intersection of any nested family of non-empty closed sets is again non-empty. We claim that the intersection of all the sets in C belongs to F, i.e. satisfies (1.3). Indeed, since df (F) = df(M) for any F E C this means that for any g E A, the sets {x ECF f (x) - g(x) >df ((M))} are non-empty. They are also closed and nested, and hence have a non-empty intersection. So on the set E~nF FEC we have f -gg E df(M ).  1.16. MACHADO'S THEOREM. 31 So every chain has an upper bound, and hence by Zorn's lemma, there exists a maximum, i.e. there exists a non-empty closed subset E satisfying (1.3) which has the property that no proper subset of E satisfies (1.3). We shall call such a subset f-minimal. Theorem 1.16.1 [Machado.] Suppose that A C CR(M) is a subalgebra which contains the constants and which is closed in the uniform topology. Then for every f E CR(M) there exists an A level set satisfying(1.3). In fact, every f-minimal set is an A level set. Proof. Let E be an f-minimal set. Suppose it is not an A level set. This means that there is some h E A which is not constant on A. Replacing h by ah + c where a and c are constant, we may arrange that min h = 0 and max h = 1. xEE xEE Let 2 1 Eo:= {xeEO< h(x)< -} and E1 : {xEE- < X < 1}. 3 3- These are non-empty closed proper subsets of E, and hence the minimality of E implies that there exist go, gi e A such that f --go lEo df (M) - E and f - 91 E1 < df (M) -- for some c> 0. Define hn := (1 - h")2" and k := hugo + (1 - hn)g1. Both hn and ka belong to A and 0 < hn < 1 on E, with strict inequality on Eo n E1. At each x E Eo n E1 i we have f(x) - kn(x)| = h(x)f(x) - h (x)go(x) + (1 - h (x))f (x) - (1 - h (x))g1(x) < hn(x)|f - golEonE1 + (1-hn-) f-g1 EonE1 < h (x)|f - go Eo + (1 - hn(x) f-g E1 df(M)- E. We will now show that h - 1 on Eo \ E1 and hn - 0 on E1 \ Eo. Indeed, on Eo \ E1 we have h" < so h=(1- hh)2 > 1- 2"h" > 1 -1 since the binomial formula gives an alternating sum with decreasing terms. On the other hand, h (1 + h")2r'= 1 - h2.2<1 or 1 "(1 + h"h)2n  32 CHAPTER 1. THE TOPOLOGY OF METRIC SPACES Now the binomial formula implies that for any integer k and any positive number a we have ka < (1 + a)k or (1 + a)-k 1/(ka). So we have 1 h < . On Eo \ E1 we have h" > (j)f so there we have hn -4 0. Thus k - go uniformly on Eo \ E1 and k - gi uniformly on E1 \ Eo. We conclude that for n large enough f-kn El. We have proved Theorem 1.17.2 The map B - B** given above is a norm preserving injec- tion. 1.18 The Uniform Boundedness Principle. Theorem 1.18.1 Let B be a Banach space and {Fn} be a sequence of elements in B* such that for every fixed x E B the sequence of numbers {|Fn(x)|} is bounded. Then the sequence of norms {|Fn|} is bounded. Proof. The proof will be by a Baire category style argument. We will prove Proposition 1.18.1 There exists some ball B = B(y, r), r > 0 about a point y with y < 1 and a constant K such that Fn (z) < K for all z E B. Proof that the proposition implies the theorem. For any z with z <1 we have 2 r z-y = ((z-y)) and -(z-y)| 1 at some x E B(0, 1) and hence in some ball of radius c < 1 about x. Then we can find an n2 with Fn2 (z) > 2 in some non-empty closed ball of radius < lying inside the first ball. Continuing inductively, we choose a subsequence nm and a family of nested non-empty balls Bm with |Fnm (z) > m throughout Bm and the radii of the balls tending to zero. Since B is complete, there is a point x common to all these balls, and {|Fn(x)|} is unbounded, contrary to hypothesis. QED We will have occasion to use this theorem in a "reversed form". Recall that we have the norm preserving injection B - B** sending x H x** where x**(F) = F(x). Since B* is a Banach space (even if B is incomplete) we have Corollary 1.18.1 If {xn} is a sequence of elements in a normed linear space such that the numerical sequence {|F(xn)|} is bounded for each fixed F E B* then the sequence of norms {|xn|} is bounded.  Chapter 2 Hubert Spaces and Compact operators. 2.1 Hilbert space. 2.1.1 Scalar products. V is a complex vector space. A rule assigning to every pair of vectors f, g E V a complex number (f, g) is called a semi-scalar product if 1. (f, g) is linear in f when g is held fixed. 2. (g, f) = (f, g). This implies that (f, g) is anti-linear in g when f is held fixed. In other words. (f, ag + bh) = (f, g) + b(f, h). It also implies that (f, f) is real. 3. (f,f) 0 for all f e V. If 3. is replaced by the stronger condition 4. (f, f) > 0 for all non-zero f E V then we say that ( , ) is a scalar product. Examples. " V = Cm, so an element z of V is a column vector of complex numbers: z = Zn and (z, w) is given by (z, w) : Zzi2. 37  38 CHAPTER 2. HILBERT SPACES AND COMPACT OPERATORS. " V consists of all continuous (complex valued) functions on the real line which are periodic of period 27r and (f, g) :=2 f (x)g(x)dz. We will denote this space by C(T). Here the letter T stands for the one dimensional torus, i.e. the circle. We are identifying functions which are periodic with period 27r with functions which are defined on the circle R/27rZ. " V consists of all doubly infinite sequences of complex numbers a = ..., a2, a_1, ao, a1, a2, ... which satisfy Za2 2<. Here (a, b) Zab2. All three are examples of scalar products. 2.1.2 The Cauchy-Schwartz inequality. This says that if ( , ) is a semi-scalar product then (f,g) (f,f) (g,g)2. (2.1) Proof. For any real number t condition 3. above says that (f - tg, f - tg) > 0. Expanding out gives 0 (f - tg, f - tg) = (f, f) - t[(f, g) + (g, f)] + t2(g, g). Since (g, f) = (f, g), the coefficient of t in the above expression is twice the real part of (f, g). So the real quadratic form Q(t) := (f, f) - 2Re(f, g)t + t2(g.g) is nowhere negative. So it can not have distinct real roots, and hence by the b2 - 4ac rule we get 4(Re(f, g))2 - 4(f, f)(g, g) < 0 or (Re(f,g))2 (f,f)(g,g). (2.2) This is useful and almost but not quite what we want. But we may apply this inequality to h =e2Og for any 0. Then (h, h) =(g, g). Choose 0 so that (f, g) =rs  2.1. HILBERT SPACE. 39 where r =|(f, g)|. Then (f, h) =_(f, eig) = e--0(f, g) =|(f, g) and the preceding inequality with g replaced by h gives (fg)|2 , _(f,f)(g,g) and taking square roots gives (2.1). 2.1.3 The triangle inequality For any semiscalar product define f| :(f,f)2 so we can write the Cauchy-Schwartz inequality as f,g) f |g The triangle inequality says that f + g f + g|. (2.3) Proof. f+g|2 = (f+g,f+g) (f, f) + 2Re(f, g) + (g, g) (f,f)+2|f|g +(g,g) by (2.2) f|2+ 2|f|g|+|g||2 ( f| + |g||)2. Taking square roots gives the triangle inequality (2.3). Notice that cf =cl f (2.4) since (cf, cf) = cc(f, f) = |c2 2 Suppose we try to define the distance between two elements of V by d(f, g) := |f -- g||. Notice that then d(f, f) = 0, d(f, g) = d(g, f) and for any three elements d(f, h) < d(f, g) + d(g, h) by virtue of the triangle inequality. The only trouble with this definition is that we might have two distinct elements at zero distance, i.e. 0 d(f, g) =f - g But this can not happen if ( , ) is a scalar product, i.e. satisfies condition 4.  40 CHAPTER 2. HILBERT SPACES AND COMPACT OPERATORS. A complex vector space V endowed with a scalar product is called a pre- Hilbert space. Let V be a complex vector space and let|- be a map which assigns to any f E V a non-negative real f number such that f > 0 for all non-zero f. If - satisfies the triangle inequality (2.3) and equation (2.4) it is called a norm. A vector space endowed with a norm is called a normed space. The pre-Hilbert spaces can be characterized among all normed spaces by the parallelogram law as we will discuss below. Later on, we will have to weaken condition (2.4) in our general study. But it is too complicated to give the general definition right now. 2.1.4 Hilbert and pre-Hilbert spaces. The reason for the prefix "pre" is the following: The distance d defined above has all the desired properties we might expect of a distance. In particular, we can define the notions of "limit" and of a "Cauchy sequence" as is done for the real numbers: If fn is a sequence of elements of V, and f E V we say that f is the limit of the f, and write lim fn = f, or fn-f n-oo if, for any positive number c there is an N = N(E) such that d(fn,f)< eforalln;>N. If a sequence converges to some limit f, then this limit is unique, since any limits must be at zero distance and hence equal. We say that a sequence of elements is Cauchy if for any 6 > 0 there is an K = K(6) such that d(fm, fn) < 6 V, m, n ;>K. If the sequence fn has a limit, then it is Cauchy - just choose K(6) = N(jo) and use the triangle inequality. But it is quite possible that a Cauchy sequence has no limit. As an example of this type of phenomenon, think of the rational numbers with r - s as the distance. The whole point of introducing the real numbers is to guarantee that every Cauchy sequence has a limit. So we say that a pre-Hilbert space is a Hilbert space if it is "complete" in the above sense - if every Cauchy sequence has a limit. Since the complex numbers are complete (because the real numbers are), it follows that Ch is complete, i.e. is a Hilbert space. Indeed, we can say that any finite dimensional pre-Hilbert space is a Hilbert space because it is isomorphic (as a pre-Hilbert space) to Ch for some n. (See below when we discuss orthonormal bases.) The trouble is in the infinite dimensional case, such as the space of continuous periodic functions. This space is not complete. For example, let fa be the function which is equal to one on (-wr + i, -h), equal to zero on (, ,7 -)  2.1. HILBERT SPACE. 41 and extended linearly-k to and from 7F - to 7r + n so as to be continuous and then extended so as to be periodic.(Thus on the interval (7 - n,7 + ) the function is given by fn(x) - 2(x - (7 -k).) If m < n, the functions fm and fn agree outside two intervals of length m and on these intervals |fm(x) - fn(x) 1. So fm - f_||2,< 2 -2/m showing that the sequence {fh} is Cauchy. But the limit would have to equal one on (-7, 0) and equal zero on (0, 7) and so be discontinuous at the origin and at 7. Thus the space of continuous periodic functions is not a Hilbert space, only a pre-Hilbert space. But just as we complete the rational numbers (by throwing in "ideal" el- ements) to get the real numbers, we may similarly complete any pre-Hilbert space to get a unique Hilbert space. See the section Completion in the chapter on metric spaces for a general discussion of how to complete any metric space. In particular, the completion of any normed vector space is a complete normed vector space. A complete normed space is called a Banach space. The general construction implies that any normed vector space can be completed to a Ba- nach space. From the parallelogram law discussed below, it will follow that the completion of a pre-Hilbert space is a Hilbert space. The completion of the space of continuous periodic functions will be denoted by L2(T). 2.1.5 The Pythagorean theorem. Let V be a pre-Hilbert space. We have f+g2 f2+2Re(f,g) + g||2 So f + g|2 2+ g|2 Re (f, g)=0. (2.5) We make the definition fig (f,g)=0 and say that f is perpendicular to g or that f is orthogonal to g. Notice that this is a stronger condition than the condition for the Pythagorean theorem, the right hand condition in (2.5). For example f + if|2 = 2|f|2 but (f, if)= -i|f||2#O0if |f| # 0. If ui is some finite collection of mutually orthogonal vectors, then so are zin2 where the z2 are any complex numbers. So if then by the Pythagorean theorem ii 2 V 1 121 2  42 CHAPTER 2. HILBERT SPACES AND COMPACT OPERATORS. In particular, if the ui / 0, then u= 0 zi = 0 for all i. This shows that any set of mutually orthogonal (non-zero) vectors is linearly independent. Notice that the set of functions ein0 is an orthonormal set in the space of continuous periodic functions in that not only are they mutually orthogonal, but each has norm one. 2.1.6 The theorem of Apollonius. Adding the equations f+ g|2 f 2 + 2Re (f, g)+ g2 (2.6) f- g|2= f 2 - 2Re (f,g)+ g|2 (2.7) gives f +g|2+|f -g|2=2(|f2+g2). (2.8) This is known as the parallelogram law. It is the algebraic expression of the theorem of Apollonius which asserts that the sum of the areas of the squares on the sides of a parallelogram equals the sum of the areas of the squares on the diagonals. If we subtract (2.7) from (2.6) we get Re(f,=gY=2(||f2(f,+gg|2 f g2) . (2.9) Now (if,g) = i(f,g) and Re{i(f,g)} = -Im(f,g) so Im(f, g) = -Re(if, g) = Re(f, ig) so (f,g)= (f + g|2 f-g 2 + i f + ig2-if-ig 2) . (2.10) If we now complete a pre-Hilbert space, the right hand side of this equation is defined on the completion, and is a continuous function there. It therefore follows that the scalar product extends to the completion, and, by continuity, satisfies all the axioms for a scalar product, plus the completeness condition for the associated norm. In other words, the completion of a pre-Hilbert space is a Hilbert space. 2.1.7 The theorem of Jordan and von Neumann. This is essentially a converse to the theorem of Apollonius. It says that if is a norm on a (complex) vector space V which satisfies (2.8), then V is in fact a pre-Hilbert space with |f||2 =(f, f). If the theorem is true, then the scalar product must be given by (2.10). So we must prove that if we take (2.10) as the  2.1. HILBERT SPACE. 43 definition, then all the axioms on a scalar product hold. The easiest axiom to verify is (g,f) (f,g)- Indeed, the real part of the right hand side of (2.10) is unchanged under the interchange of f and g (since g - f = -(f -g) and| -h| =h for any h is one of the properties of a norm). Also g + if = i(f - ig) and ih| =h so the last two terms on the right of (2.10) get interchanged, proving that (g, f) (f, g). It is just as easy to prove that (if,g) i=(f,g)- Indeed replacing f by if sends f + ig|2 into if + ig|2 f + g|2 and sends f + g|2 into if + g|2 i(fg) 2= f-ig 2i(-i f-ig 2) sohasthe effect of multiplying the sum of the first and fourth terms by i, and similarly for the sum of the second and third terms on the right hand side of (2.10). Now (2.10) implies (2.9). Suppose we replace f, g in (2.8) by fi + g, f2 and by fi - g, f2 and subtract the second equation from the first. We get ||h+hf+g||2-|| +hf2-g||2+||l- f2+g||2-||l- f2-g||2 =2( fl+g 2 -fl-g 2). In view of (2.9) we can write this as Re (f1+f2,g)+Re (f1-f2,g) =2Re (fi,g). (2.11) Now the right hand side of (2.9) vanishes when f = 0 since g - g|. So if we take f1 = f2 = f in (2.11) we get Re (2f, g) = 2Re (f, g). We can thus write (2.11) as Re (f1 + f2,g) +Re (f1 - f2,g) = Re (2fi,g). In this equation make the substitutions 1 1 This yields Re (fi + f2,g) = Re (fi,g) + Re (f2,g). Since it follows from (2.10) and (2.9) that (f,g) = Re (f,g) - iRe (if, g) we conclude that  44 CHAPTER 2. HILBERT SPACES AND COMPACT OPERATORS. Taking fi= -f2 shows that (-f, g) =--(f, g). Consider the collection C of complex numbers a which satisfy (af,g) =a(f,g) (for all f, g). We know from (fi + f2, g) = (fi,g) + (f2, g) that a,#3 E C -> a+3 e C. So C contains all integers. If 0 # 3 e C then (f, g)= (13 (1//)f, g) = ((1//)f, g) so #--1 e C. Thus C contains all (complex) rational numbers. The theorem will be proved if we can prove that (af, g) is continuous in a. But the triangle inequality f +g| f + g|| applied to f= f2, g= fi - f2 implies that f1 fl-f2 +|f2 or fi ll- ||f2 f1-f2 Interchanging the role of fi and f2 gives fil - f2 f1-f2 Therefore af+g -|f+g| |(a--#)f Since (a - 3)f| - 0 as a - 3 this shows that the right hand side of (2.10) when applied to af and g is a continuous function of a. Thus C = C. We have proved Theorem 2.1.1 [P. Jordan and J. von Neumann] If V is a normed space whose norm satisfies (2.8) then V is a pre-Hilbert space. Notice that the condition (2.8) involves only two vectors at a time. So we conclude as an immediate consequence of this theorem that Corollary 2.1.1 A normed vector space is pre-Hilbert space if and only if every two dimensional subspace is a Hilbert space in the induced norm. Actually, a weaker version of this corollary, with two replaced by three had been proved by Frechet, Annals of Mathematics, July 1935, who raised the problem of giving an abstract characterization of those norms on vector spaces which come from scalar products. In the immediately following paper Jordan and von Neumann proved the theorem above leading to the stronger corollary that two dimensions suffice.  2.1. HILBERT SPACE. 45 2.1.8 Orthogonal projection. We continue with the assumption that V is pre-Hilbert space. If A and B are two subsets of V, we write A I B if u E A and v E B -> u I v, in other words if every element of A is perpendicular to every element of B. Similarly, we will write v I A if the element v is perpendicular to all elements of A. Finally, we will write Al for the set of all v which satisfy v I A. Notice that Al is always a linear subspace of V, for any A. Now let M be a (linear) subspace of V. Let v be some element of V, not necessarily belonging to M. We want to investigate the problem of finding a w E M such that (v - w) I M. Of course, if v E M then the only choice is to take w = v. So the interesting problem is when v g M. Suppose that such a w exists, and let x be any (other) point of M. Then by the Pythagorean theorem, v -z2 =(vw)+(w 2 = 2 +w - 2 since (v - w) I M and (w - x) E M. So ||v- w||< ||v- z| and this inequality is strict if x + w. In words: if we can find a w E M such that (v - w) I M then w is the unique solution of the problem of finding the point in M which is closest to v. Conversely, suppose we found a w E M which has this minimization property, and let x be any element of M. Then for any real number t we have |v-w||2 |(v-w)+tx|2 =|| w|2+2tRe (v-w,x)+t2 x 2 Since the minimum of this quadratic polynomial in t occurring on the right is achieved at t = 0, we conclude (by differentiating with respect to t and setting t = 0, for example) that Re (v-w,x) =0. By our usual trick of replacing x by e2% we conclude that (v-w,x) =0. Since this holds for all x E M, we conclude that (v - w) I M. So to find w we search for the minimum of v - xl, x E M. Now v - x > 0 and is some finite number for any x E M. So there will be some real number m such that m < v - x for x E M, and such that no strictly larger real number will have this property. (m is known as the "greatest lower bound" of the values v - xl, x E M.) So we can find a sequence of vectors xn E M such that v - xn ->m. We claim that the x form a Cauchy sequence. Indeed, Xi - Xj 2 = (V _ Xj) _ (V _  46 CHAPTER 2. HILBERT SPACES AND COMPACT OPERATORS. and by the parallelogram law this equals 2(|v - 2+ +v - z|2) -|2v- (X, + x) 2 Now the expression in parenthesis converges to 2m2. The last term on the right is 1 - 2(v--(x2+ x) 2 2 Since 2(x2 + xj)e M, we conclude that 2v - (x2 + x) 2> 4m2 so xI-x |2 <4(m+E)2-4m2 for i and j large enough that v -x < m + E and v -x | < m + E. This proves that the sequence x is Cauchy. Here is the crux of the matter: If M is complete, then we can conclude that the x converge to a limit w which is then the unique element in M such that (v - w) I M. It is at this point that completeness plays such an important role. Put another way, we can say that if M is a subspace of V which is complete (under the scalar product ( , ) restricted to M) then we have the orthogonal direct sum decomposition V =M e M1, which says that every element of V can be uniquely decomposed into the sum of an element of M and a vector perpendicular to M. For example, if M is the one dimensional subspace consisting of all (complex) multiples of a non-zero vector y, then M is complete, since C is complete. So w exists. Since all elements of M are of the form ay, we can write w = ay for some complex number a. Then (v - ay, y) = 0 or (v,y) = a|y l2 so (v,y) a = . lyl2 We call a the Fourier coefficient of v with respect to y. Particularly useful is the case where y = 1 and we can write a = (v, y). (2.12) Getting back to the general case, if V= M e M' holds, so that to every v there corresponds a unique w E M satisfying (v - w) E M' the map v H- w is called orthogonal projection of V onto M and will be denoted by 7rM.  2.1. HILBERT SPACE. 47 2.1.9 The Riesz representation theorem. Let V and W be two complex vector spaces. A map T:V -W is called linear if T(Ax+py)=AT(x)+ipT(Y) Vx,yeV, A,ApeC and is called anti-linear if T(Ax +py) = AT(x) + pT(Y) V x, y E V A,Ap E C. If £ : V - C is a linear map, (also known as a linear function) then ker £ := {x e V £(x) =0} has codimension one (unless £ 0). Indeed, if f (y) 0 then 1 £(x)=1 wherex y £(y) and for any z eV, z - £(z)x E ker L. If V is a normed space and £ is continuous, then ker (f) is a closed subspace. The space of continuous linear functions is denoted by V*. It has its own norm defined by l sup I(x)/ l. XEV,HHX|#O Suppose that H is a pre-hilbert space. Then we have an antilinear map S: H H*, (#(g))(f) := (f, g). The Cauchy-Schwartz inequality implies that ||#(g) g and in fact (gg) = g 2 shows that ||#(g)| = g In particular the map # is injective. The Riesz representation theorem says that if H is a Hilbert space, then this map is surjective:  48 CHAPTER 2. HILBERT SPACES AND COMPACT OPERATORS. Theorem 2.1.2 Every continuous linear function on H is given by scalar prod- uct by some element of H. The proof is a consequence of the theorem about projections applied to N :=ker £: If £ = 0 there is nothing to prove. If £ # 0 then N is a closed subspace of codimension one. Choose v g N. Then there is an x E N with (v - x) I N. Let 1 y := (v |v -cz| I). Then yIN and yll = For any f E H, so or [f -(f, y)y] 1 y f - (f, y)y E N f(f) = (f, y)£(y), so if we set then g := fY (f, g) = (f) for all f E H. QED 2.1.10 What is L2(T)? We have defined the space L2(T) to be the completion of the space C(T) under the L2 norm f|2 =(f, f) 2. In particular, every linear function on C(T) which is continuous with respect to the this L2 norm extends to a unique continuous linear function on L2 (T). By the Riesz representation theorem we know that every such continuous linear function is given by scalar product by an element of L2 (T). Thus we may think of the elements of L2(T) as being the linear functions on C(T) which are continuous with respect to the L2 norm. An element of L2(T) should not be thought of as a function, but rather as a linear function on the space of continuous functions relative to a special norm - the L2 norm.  2.1. HILBERT SPACE. 49 2.1.11 Projection onto a direct sum. Suppose that the closed subspace M of a pre-Hilbert space is the orthogonal direct sum of a finite number of subspaces M = 9 M meaning that the MI are mutually perpendicular and every element x of M can be written as x~>xi, x2 EJV2. (The orthogonality guarantees that such a decomposition is unique.) Suppose further that each M is such that the projection 7FM exists. Then 7FM exists and 7M(v) Z7M(v) -(2.13) Proof. Clearly the right hand side belongs to M. We must show v - > 7MZ (v) is orthogonal to every element of M. For this it is enough to show that it is orthogonal to each M since every element of M is a sum of elements of the M. So suppose xg E M. But (WMyV, z3) = 0 if i j. So (v - rM(v), z) -- (v -- M(v), zy) = 0 by the defining property of 7FMJ. 2.1.12 Projection onto a finite dimensional subspace. We now will put the equations (2.12) and (2.13) together: Suppose that M is a finite dimensional subspace with an orthonormal basis #2. This implies that M is an orthogonal direct sum of the one dimensional spaces spanned by the q5 and hence 7FM exists and is given by 7M(v)_Zai2bi where a2 = (v, #). (2.14) 2.1.13 Bessel's inequality. We now look at the infinite dimensional situation and suppose that we are given an orthonormal sequence {#c2 }1. Any v E V has its Fourier coefficients a2 =(v, #O2) relative to the members of this sequence. Bessel's inequality asserts that oo a2 v< 5|2, (2.15) 1 in particular the sum on the left converges.  50 CHAPTER 2. HILBERT SPACES AND COMPACT OPERATORS. Proof. Let n :Z= aiofi, i=1 so that vn is the projection of v onto the subspace spanned by the first n of the #i. In any event, (v - vn) I vn so by the Pythagorean Theorem |2 =|-vn|2 +||n||2 =v-vn2 + ai2. i=1 This implies that i=1 and letting n - oc shows that the series on the left of Bessel's inequality converges and that Bessel's inequality holds. 2.1.14 Parseval's equation. Continuing the above argument, observe that ||v - v|2 ->0 5|ai2 2 2 But v - on|2 - 0 if and only if v -on - 0 which is the same as saying that on - v. But vn is the n-th partial sum of the series E aiz, and in the language of series, we say that a series converges to a limit v and write E ai o = v if and only if the partial sums approach v. So aivi = v ai 2 = ||2. (2.16) In general, we will call the series Eia # the Fourier series of v (relative to the given orthonormal sequence) whether or not it converges to v. Thus Parseval's equality says that the Fourier series of v converges to v if and only if E ai 2 =v2 2.1.15 Orthonormal bases. We still suppose that V is merely a pre-Hilbert space. We say that an orthonor- mal sequence {#x} is a basis of V if every element of V is the sum of its Fourier series. For example, one of our tasks will be to show that the exponentials {e2nh} ___ form a basis of C(T). If the orthonormal sequence #5 is a basis, then any v can be approximated as closely as we like by finite linear combinations of the #i i fact by the partial sums of its Fourier series. We say that the finite linear combinations of the #j are dense in V. Conversely, suppose that the finite linear combinations of the  2.2. SELF-ADJOINT TRANSFORMATIONS. 51 #2 are dense in V. This means that for any v and any c > 0 we can find an n and a set of n complex numbers b2 such that ||v- 1: bioi|| 5 e But we know that vn is the closest vector to v among all the linear combinations of the first n of the #2. so we must have v -vn|| 0 VvEV.  2.2. SELF-ADJOINT TRANSFORMATIONS. 53 So if T is a non-negative self-adjoint transformation, then the rule which assigns to every pair of elements v and w the number (Tv, w) is a semi-scalar product to which we may apply the Cauchy-Schwartz inequality and conclude that (Tv,w) (Tv,v)2(Tw,w)K. Now let us assume in addition that T is bounded with norm Tl. Let us take w = Tv in the preceding inequality. We get Tv|2 =|(Tv,Tv)| (Tv,v)2(TTv,Tv). Now apply the Cauchy-Schwartz inequality for the original scalar product to the last factor on the right: (TTv,Tv) ||TTv|||Tv||b K ||T|||Tv|||Tv|| = |T Tv|, where we have used the defining property of T in the form TTvl T Tv Substituting this into the previous inequality we get Tv|2 < (Tv,v) 2|T|||Tv|. If Tv -f 0 we may divide this inequality by Tv to obtain T < K T l(Tv, v)2. (2.17) This inequality is clearly true if Tvl= 0 and so holds in all cases. We will make much use of this inequality. For example, it follows from (2.17) that (Tv,v)= 0 Tv = 0. (2.18) It also follows from (2.17) that if we have a sequence {vn} of vectors with (Tv,vn) - 0 then Tvv| - 0 and so (Tvo,vn)->0 - Tvn - 0. (2.19) Notice that if T is a bounded self adjoint transformation, not necessarily non- negative, then rI - T is a non-negative self-adjoint transformation if r > Tl: Indeed, ((rI - T)v, v) = r (v, v) - (Tv, v) ;> (,r - |Tl) (v, v) ;> 0 since, by Cauchy-Schwartz, (Tv,v) (Tv,v) K Tv|v| T v 2 =|T|(v,v). So we may apply the preceding results to rI - T.  54 CHAPTER 2. HILBERT SPACES AND COMPACT OPERATORS. 2.3 Compact self-adjoint transformations. We say that the self-adjoint transformation T is compact if it has the following property: Given any sequence of elements un e S, we can choose a subsequence un2 such that the sequence Tvun2 converges to a limit in V. Some remarks about this complicated looking definition: In case V is finite dimensional, every linear transformation is bounded, hence the sequence Tun lies in a bounded region of our finite dimensional space, and hence by the com- pleteness property of the real (and hence complex) numbers, we can always find such a convergent subsequence. So in finite dimensions every T is compact. More generally, the same argument shows that if R(T) is finite dimensional and T is bounded then T is compact. So the definition is of interest essentially in the case when R(T) is infinite dimensional. Also notice that if T is compact, then T is bounded. Otherwise we could find a sequence un of elements of S such that Tu|| > rn and so no subsequence Tune can converge. We now come to the key result which we will use over and over again: Theorem 2.3.1 Let T be a compact self-adjoint operator. Then R(T) has an orthonormal basis {$2} consisting of eigenvectors of T and if R(T) is infinite dimensional then the corresponding sequence {rn} of eigenvalues converges to 0. Proof. We know that T is bounded. If T = 0 there is nothing to prove. So assume that T / 0 and let m1 := T > 0. By the definition of T we can find a sequence of vectors un e S such that Tun| -- Tl. By the definition of compactness we can find a subsequence of this sequence so that Tun2 -- w for some w E V. On the other hand, the transformation T2 is self-adjoint and bounded by T 2. Hence T|2I - T2 is non-negative, and ( (|T|2 I - T 2)un, un ) = |T |2 - |T un||2- 0. So we know from (2.19) that T|2un - T2un -> 0. Passing to the subsequence we have T2un2= T(Tun2) -> Tw and so T|2n -> Tw or 1 un2 -> 2Tw. T1 Applying T to this we get Tun,~ -> l2T 2w T1  2.3. COMPACT SELF-ADJOINT TRANSFORMATIONS. 55 or T2w = mlw. Also ||w| =||Tl= m1 # 0. So w f 0. So w is an eigenvector of T2 with eigenvalue ml. We have 0= (T2 - mi)w = (T + mi)(T - mi)w. If (T - mi)w = 0, then w is an eigenvector of T with eigenvalue m1 and we normalize by setting 1 w. |w | Then |1|= 1 and T#1 =m1#1. If (T - mi)w f 0 then y := (T - mi)w is an eigenvector of T with eigenvalue -mi1 and again we normalize by setting 1 #1i:= y. y So we have found a unit vector #1 E R(T) which is an eigenvector of T with eigenvalue r1 = +m1. Now let V2 := #L. If (w, #1) = 0, then (Tw,1) = (w,T#1) = ri(w, #1) = 0. In other words, T (V2) C V2 and we can consider the linear transformation T restricted to V2 which is again compact. If we let m2 denote the norm of the linear transformation T when restricted to V2 then m2 <; m and we can apply the preceding procedure to find a unit eigenvector #2 with eigenvalue +m2. We proceed inductively, letting V := {#1,-.1. .,# 11 and find an eigenvector #n of T restricted to V with eigenvalue +mn f 0 if the restriction of T to V is not zero. So there are two alternatives: " after some finite stage the restriction of T to V is zero. In this case R(T) is finite dimensional with orthonormal basis #1, ..., #,_1. Or " The process continues indefinitely so that at each stage the restriction of T to V is not zero and we get an infinite sequence of eigenvectors and eigenvalues r2 with Irl ;> Iri+1|  56 CHAPTER 2. HILBERT SPACES AND COMPACT OPERATORS. The first case is one of the alternatives in the theorem, so we need to look at the second alternative. We first prove that rn| - 0. If not, there is some c > 0 such that Irn > c for all rn (since the r| are decreasing). If i f j,then by the Pythagorean theorem we have T#- T #4|2| T T ~2 2 2 +Tr ||#|2 Since ||il=/|#|= 1 this gives T# 2-| r2+r >2c2. Hence no subsequence of the T#o can converge, since all these vectors are at least a distance c 2 apart. This contradicts the compactness of T. To complete the proof of the theorem we must show that the #zi form a basis of R(T). So if w = Tv we must show that the Fourier series of w with respect to the #zi converges to w. We begin with the Fourier coefficients of v relative to the #i which are given by an = (vi#O). Then the Fourier coefficients of w are given by by=(w, #) =(Tv, #) =(v, Ti) =(v, ri) = riai. So w -3 b # = Tv -3 airi# = T(v - a#). i=1 i=1 i=1 Now v - E_ a1ii is orthogonal to #1,... ,#On and hence belongs to Vn+1. So m m T'(v - aioi)< rn+1||(v - aioi)|| i=1 i=1 By the Pythagorean theorem, m (v -Z aii) | v|| i=1 Putting the two previous inequalities together we get m m w-Zbii|| =T(v-Z ai$i)| |n+1|v| 0. i=1 i=1 This proves that the Fourier series of w converges to w concluding the proof of the theorem. The "converse" of the above result is easy. Here is a version: Suppose that H is a Hilbert space with an orthonormal basis {#z5} consisting of eigenvectors  2.4. FOURIER'S FOURIER SERIES. 57 of an operator T, so Tq5= A252, and suppose that A2 -- 0 as i -- oc. Then T is compact. Indeed, for each j we can find an N = N(j) such that Ar< V r > N(j). We can then let H3 denote the closed subspace spanned by all the eigenvectors #r,,r > N(j), so that H = Hr e H3 is an orthogonal decomposition and H' is finite dimensional, in fact is spanned the first N(j) eigenvectors of T. Now let {u} be a sequence of vectors with u j< 1 say. We decompose each element as ni = e u'u, ' E Hi, u2' E Hj. We can choose a subsequence so that u'k converges, because they all belong to a finite dimensional space, and hence so does Tuik since T is bounded. We can decompose every element of this subsequence into its H2 and H2 components, and choose a subsequence so that the first component converges. Proceeding in this way, and then using the Cantor diagonal trick of choosing the k-th term of the k-th selected subsequence, we have found a subsequence such that for any fixed j, the (now relabeled) subsequence, the H' component of Tub converges. But the H3 component of T u has norm less than 1/j, and so the sequence converges by the triangle inequality. 2.4 Fourier's Fourier series. We want to apply the theorem about compact self-adjoint operators that we proved in the preceding section to conclude that the functions e2Tx form an orthonormal basis of the space C(T). In fact, a direct proof of this fact is elementary, using integration by parts. So we will pause to given this direct proof. Then we will go back and give a (more complicated) proof of the same fact using our theorem on compact operators. The reason for giving the more complicated proof is that it extends to far more general situations. 2.4.1 Proof by integration by parts. We have let C(T) denote the space of continuous functions on the real line which are periodic with period 27. We will let C1 (T) denote the space of periodic functions which have a continuous first derivative (necessarily periodic) and by C2 (T) the space of periodic functions with two continuous derivatives. If f and g both belong to C1 (T) then integration by parts gives 1 f71 d 1 J 'd  58 CHAPTER 2. HILBERT SPACES AND COMPACT OPERATORS. since the boundary terms, which normally arise in the integration by parts formula, cancel, due to the periodicity of f and g. If we take g = einx/(in), n 0 the integral on the right hand side of this equation is the Fourier coefficient: c 2w= J f(x)e-ixdx. We thus obtain c 4 j f'(x)e-inxdx in 27rj- _T so, for n f 0, A 1_ c <; - where A := If'(x)|dx is a constant independent of n (but depending on f). If f E C2(T) we can take g(x) = -einx/n2 and integrate by parts twice. We conclude that (for n 0) B 1 f* p c l < where B :f"x)2 S2 27 n 27r _ is again independent of n. But this proves that the Fourier series of f, Sel"n converges uniformly and absolutely for and f E C2 (T). The limit of this series is therefore some continuous periodic function. We must prove that this limit equals f. So we must prove that at each point f Se CrE f (y). Replacing f(x) by f(x - y) it is enough to prove this formula for the case y = 0. So we must prove that for any f E C2(T) we have M lim c > f(0). N, M o- -N Write f(x) = (f(x) - f(0)) + f(0). The Fourier coefficients of any constant function c all vanish except for the co term which equals c. So the above limit is trivially true when f is a constant. Hence, in proving the above formula, it is enough to prove it under the additional assumption that f(0) = 0, and we need to prove that in this case lim (C-N --c-N+1+...+CM)-0. N,M-*oo The expression in parenthesis is 2w1 f(x)gN,M~x)dz  2.4. FOURIER'S FOURIER SERIES. 59 where gN,M(x) e iNx +e-i(N-1)x+. . iMx __ -iNx 1 + eix +... + ei(M+N)x)_ iNx 1 - ei(M+N+1)x e-iNx - ei(M+1)x 1 -ex1 - where we have used the formula for a geometric sum. By l'H6pital's rule, this extends continuously to the value M + N + 1 for x = 0. Now f (0) = 0, and since f has two continuous derivatives, the function f(x) h(x) = . 1 - e-ix defined for x f 0 (or any multiple of 27) extends, by l'H6pital's rule, to a func- tion defined at all values, and which is continuously differentiable and periodic. Hence the limit we are computing is h(x)eiNxd_ -ih(x)ei(M+1)xdx and we know that each of these terms tends to zero. We have thus proved that the Fourier series of any twice differentiable peri- odic function converges uniformly and absolutely to that function. If we consider the space C2(T) with our usual scalar product 1 P (f, g) =w] fgdx then the functions einx are dense in this space, since uniform convergence implies convergence in the || || norm associated to ( , ). So, on general principles, Bessel's inequality and Parseval's equation hold. It is not true in general that the Fourier series of a continuous function converges uniformly to that function (or converges at all in the sense of uniform convergence). However it is true that we do have convergence in the L2 norm, i.e. the Hilbert space || || norm on C(T). To prove this, we need only prove that the exponential functions einx are dense, and since they are dense in C2 (T), it is enough to prove that C2(T) is dense in C(T). For this, let # be a function defined on the line with at least two continuous bounded derivatives with #(0) = 1 and of total integral equal to one and which vanishes rapidly at infinity. A favorite is the Gauss normal function #(x) 1e-2/2 27 Equally well, we could take # to be a function which actually vanishes outside of some neighborhood of the origin. Let 1 ,M  60 CHAPTER 2. HILBERT SPACES AND COMPACT OPERATORS. As t - 0 the function #t becomes more and more concentrated about the origin, but still has total integral one. Hence, for any bounded continuous function f, the function #5 * f defined by (* f)(x) j f(x - y)#i (y)dy = f (u)#t(x - u)du. satisfies #t * f - f uniformly on any finite interval. From the rightmost expres- sion for #z5 * f above we see that #z5 * f has two continuous derivatives. From the first expression we see that # * f is periodic if f is. This proves that C2 (T) is dense in C(T). We have thus proved convergence in the L2 norm. 2.4.2 Relation to the operator y Each of the functions e2nxis an eigenvector of the operator d D = dx in that D (eih) = ine2. So they are also eigenvalues of the operator D2 with eigenvalues -n2. Also, on the space of twice differentiable periodic functions the operator D2 satisfies 1 ____ 1 i_ (D2f, g) = 12 f f"(x)g(x)dx = f'(x)g(x) -- I7 f'(x)g'(x)dx by integration by parts. Since f' and g are assumed to be periodic, the end point terms cancel, and integration by parts once more shows that (D2f,g) = (f,D2g) But of course D and certainly D2 is not defined on C(T) since some of the func- tions belonging to this space are not differentiable. Furthermore, the eigenvalues of D2 are tending to infinity rather than to zero. So somehow the operator D2 must be replaced with something like its inverse. In fact, we will work with the inverse of D2 - 1, but first some preliminaries. We will let C2([[-w, 7]) denote the functions defined on [-w.w] and twice dif- ferentiable there, with continuous second derivatives up to the boundary. We denote by C([-7, 7]) the space of functions defined on [-7, 7] which are continu- ous up to the boundary. We can regard C(T) as the subspace of C([-7, 7]) con- sisting of those functions which satisfy the boundary conditions f(7) = f(-7) (and then extended to the whole line by periodicity). We regard C([-7, 7]) as a pre-Hilbert space with the same scalar product that we have been using: (f, g) j f (x)g(x)dz.  2.4. FOURIER'S FOURIER SERIES. 61 If we can show that every element of C([-7, 7r]) is a sum of its Fourier series (in the pre-Hilbert space sense) then the same will be true for C(T). So we will work with C([-7r,7]). We can consider the operator D2 - 1 as a linear map D2 - 1 : C2([-7,w]) -C ,7]). This map is surjective, meaning that given any continuous function g we can find a twice differentiable function f satisfying the differential equation f"-f=g. In fact we can find a whole two dimensional family of solutions because we can add any solution of the homogeneous equation h"- h =0 to f and still obtain a solution. We could write down an explicit solution for the equation f" - f = g, but we will not need to. It is enough for us to know that the solution exists, which follows from the general theory of ordinary differential equations. The general solution of the homogeneous equation is given by h(x) = aeX + be-x. Let M c C2 be the subspace consisting of those functions which satisfy the "periodic bound- ary conditions" f(7) = f(-7), f'(w) = f'(-7). Given any f we can always find a solution of the homogeneous equation such that f - h E M. Indeed, we need to choose the complex numbers a and b such that if h is as given above, then h(w) - h(-7) = f(7) - f(-7), and h'(7) - h'(-7) = f'(7) - f'(-7). Collecting coefficients and denoting the right hand side of these equations by c and d we get the linear equations (e - e--")(a - b) = c, (e" - e--")(a + b) = d which has a unique solution. So there exists a unique operator T :C([-7,7]) - M with the property that (D2 - I) o T =I.  62 CHAPTER 2. HILBERT SPACES AND COMPACT OPERATORS. We will prove that T is self adjoint and compact. (2.20) Once we will have proved this fact, then we know every element of M can be expanded in terms of a series consisting of eigenvectors of T with non-zero eigenvalues. But if Tw= w then D2w= (D2 - I)w+w [(D2 -I) oT]w+w= + w. So w must be an eigenvector of D2; it must satisfy w" =pw. So if p = 0 then w = a constant is a solution. If p = r2 > 0 then w is a linear combination of erx and e-rx and as we showed above, no non-zero such combination can belong to M. If p = -r2 then the solution is a linear combination of eirx and e--rx and the above argument shows that r must be such that e2r = e-- so r =rn is an integer. Thus (2.20) will show that the emnx are a basis of M, and a little more work that we will do at the end will show that they are in fact also a basis of C([-7r,7r]). But first let us work on (2.20). It is easy to see that T is self adjoint. Indeed, let f = Tu and g = Tv so that f and g are in M and (u, Tv) = ([D2 - 1]f, g) =--(f', g') - (f,g) = (f, [D2 - 1]g) = (Tu, v) where we have used integration by parts and the boundary conditions defining M for the two middle equalities. 2.4.3 Girding's inequality, special case. We now turn to the compactness. We have already verified that for any f E M we have ([D2 _1]ff)=-(f',f') (f, f). Taking absolute values we get f'l2 + |f2< <([D2-1]f, f)|. (2.21) (We actually get equality here, the more general version of this that we will develop later will be an inequality.) Let u = [192 - 1]f and use the Cauchy-Schwartz inequality ([D 2 _1f)=(ttAf) vi f  2.4. FOURIER'S FOURIER SERIES. 63 on the right hand side of (2.21) to conclude that f 2 f or f ll. Use (2.21) again to conclude that by the preceding inequality. We have f = Tu, and let us now suppose that u lies on the unit sphere i.e. that |l= 1. Then we have proved that f <1, and f'l 1. ((2.22) We wish to show that from any sequence of functions satisfying these two condi- tions we can extract a subsequence which converges. Here convergence means, of course, with respect to the norm given by f|2jf(x)|2dx. In fact, we will prove something stronger: that given any sequence of functions satisfying (2.22) we can find a subsequence which converges in the uniform norm f| : max f(x). xE[-7,71] Notice that f f(x)|2dx 7 ( lf|(f )2dx f|o so convergence in the uniform norm implies convergence in the norm we have been using. To prove our result, notice that for any 7 a < b wr we have b b f (b) - f (a) = f'(x)dx j f'(x)dx= 27r( f'l, 1[a,b]) where 1[a,b] is the function which is one on [a, b] and zero elsewhere. Apply Cauchy-Schwartz to conclude that (If',1[ab]) f' |1[ab] But 1 1 |2 2= l b-a and f'||i |f' < 1.  64 CHAPTER 2. HILBERT SPACES AND COMPACT OPERATORS. We conclude that f (b) - f (a) <(27) 2lb - al 2. (2.23) In this inequality, let us take b to be a point where f takes on its maximum value, so that f(b) =f|. Let a be a point where f takes on its minimum value. (If necessary interchange the role of a and b to arrange that a < b or observe that the condition a < b was not needed in the above proof.) Then (2.23) implies that f l-minf <(2)lb-al. But 1 > f (1 f l2(x)dx (min f)2dx) =min f and b-a <2 so f < 1+ 27. Thus the values of all the f E T[S] are all uniformly bounded - (they take values in a circle of radius 1 + 27) and they are equicontinuous in that (2.23) holds. This is enough to guarantee that out of every sequence of such f we can choose a uniformly convergent subsequence. (We recall how the proof of this goes: Since all the values of all the f are bounded, at any point we can choose a subsequence so that the values of the f at that point converge, and, by passing to a succession of subsequences (and passing to a diagonal), we can arrange that this holds at any countable set of points. In particular, we may choose say the rational points in [-7, r]. Suppose that fn is this subsequence. We claim that (2.23) then implies that the fn form a Cauchy sequence in the uniform norm and hence converge in the uniform norm to some continuous function. Indeed, for any c choose 6 such that (27r)6o 1<6, 3 choose a finite number of rational points which are within 6 distance of any point of [-7, 7] and choose N sufficiently large that fi - f3| < }E at each of these points, r. when i and j are > N. Then at any x E [-7, 7] fi (x) - f3(x) < fi (x) - fi(r) + f3(x) - fM(r)|+|fi(r) - f3(r) < E since we can choose r such that that the first two and hence all of the three terms is < E. 2.5 The Heisenberg uncertainty principle. In this section we show how the arguments leading to the Cauchy-Schwartz in- equality give one of the most important discoveries of twentieth century physics, the Heisenberg uncertainty principle.  2.5. THE HEISENBERG UNCERTAINTY PRINCIPLE. 65 Let V be a pre-Hilbert space, and S denote the unit sphere in V. If # and V) are two unit vectors (i.e. elements of S) their scalar product (#, /) is a complex number with 0 < (#, ,))|2 < 1. In quantum mechanics, this number is taken as a probability. Although in the "real world" V is usually infinite dimensional, we will warm up by considering the case where V is finite dimensional. Given a e S and an orthonormal basis #1, ... , #0 of V, we have 1 = ||#||2)2 + -.-.-+ |(#, # |2. The says that the various probabilities (#, #z5)|2 add up to one. We recall some language from elementary probability theory: Suppose we have an experiment resulting in a finite number of measured numerical outcomes Aj, each with prob- ability p2 of occurring. Then the mean (A) is defined by (A) := A1p1 + -.-.-+ Anpa and its variance (AA)2 (AA)2 :2 (A1 - (1))2p + ... + (An - (A))2pn and an immediate computation shows that (AA)2 (A2) (A)2. The square root AA of the variance is called the standard deviation. The vari- ance (or the standard deviation) measures the "spread" of the possible values of A. To understand its meaning we have Chebychev's inequality which estimates the probability that Ak can deviate from (A) by as much as r/A for any posi- tive number r. Chebychev's inequality says that this probability is < 1/r2. In symbols 1 Prob (|A - (A)| > rA) r. Indeed, the probability on the left is the sum of all the Pk such that Ak - (A)| rA. Denoting this sum by E. we have (A A 1A))2 PPr2(AA)2(A - )1: r2(AA)2 (A - (A))2pk 2 allk all k Replacing A2 by A2 + c does not change the variance. Now suppose that A is a self-adjoint operator on V, that the A2 are the eigenvalues of A with eigenvectors #i constituting an orthonormal basis, and that the p2 =|(#z, #zl2 as above. 1. Show that (A) =(A#z, #z) and that (A A)2 =(A2#b, #z) - (A#z, #)2  66 CHAPTER 2. HILBERT SPACES AND COMPACT OPERATORS. We will write the expression (A#,#0) as (A)0, In quantum mechanics a unit vector is called a state and a self-adjoint operator is called an observable and the expression (A)o is called the expectation of the observable A in the state 0. Similarly we denote ((A2#, #) - (A#, #)2)1/2 by AoA. It is called the uncertainty of the observable A in the state #. Notice that (Ag A)2 =((A - ())) where I denotes the identity operator. Indeed ((A - (A#, #)I)2 = A2 - 2(A#, #)A + (A#, #)2I so ((A - (A)0)2)0 = (A2#, #) - 2(A) + (A)2 = (A2)0 - (A)). When the state # is fixed in the course of discussion, we will drop the sub- script # and write (A) and /A instead of (A)o and AgA. For example, we would write the previous result as DA = ((A - (A)I)2). If A and B are operators we let [A, B] denote the commutator: A, B] : = AB - BA. Notice that [A, B] -[B, A] and [I, B] = 0 for any B. So if A and B are self adjoint, so is i[A, B] and replacing A by A - (A)I and B by B - (B)I does not change DA, AB or i[A, B]. The uncertainty principle says that for any two observables A and B we have 1 (AA)(AB) > -(i[A, B])|q. 2 Proof. Set A1 := A - (A), B1 := B - (B) so that [A1,B1] =_[A,B]. Let := A1# + ixB1#q. Then (7", @4) =(AA)2 - x(i[A, B]) + (/XB)2. Since (/, /) ;> 0 for all x this implies that (b2 4ac) that (i[A, B])2 4(AA)2(AB)2, and taking square roots gives the result. The purpose of the next few sections is to provide a vast generalization of the results we obtained for the operator D2. We will prove the corresponding results for any "elliptic" differential operator (definitions below).  2.6. THE SOBOLEV SPACES. 67 I plan to study differential operators acting on vector bundles over manifolds. But it requires some effort to set things up, and I want to get to the key analytic ideas which are essentially repeated applications of integration by parts. So I will start with elliptic operators L acting on functions on the torus T = Tm, where there are no boundary terms when we integrate by parts. Then an immediate extension gives the result for elliptic operators on functions on manifolds, and also for boundary value problems such as the Dirichlet problem. The treatment here rather slavishly follows the treatment by Bers and Schechter in Partial Differential Equations by Bers, John and Schechter AMS (1964). 2.6 The Sobolev Spaces. Recall that T now stands for the n-dimensional torus. Let P = P(T) denote the space of trigonometric polynomials. These are functions on the torus of the form u(x) = 2Sae*x where is an n-tuplet of integers and the sum is finite. For each integer t (positive, zero or negative) we introduce the scalar product (u, v) 5:= (1 + £ - £)tad bf. (2.24) For t = 0 this is the scalar product (u, v)o =(2) u(x)v(x)dz. (2,)n IT This differs by a factor of (27r) -- from the scalar product that is used by Bers and Schecter. We will denote the norm corresponding to the scalar product (, )s by | s. If 02 02 - (x1))2 + + ( )2 the operator (1 + A) satisfies (1+A)u = j(1 + f - f)a e'* and so ((1 + A)tu, v)8 = (u, (1 + A)tv)8 = (u, v)s+t and Rl +,Vuh = Mk+2t- (2.25)  68 CHAPTER 2. HILBERT SPACES AND COMPACT OPERATORS. We then get the "generalized Cauchy-Schwartz inequality" |LuV)s v |u|s t|v|s (2.26) for any t, as a consequence of the usual Cauchy-Schwartz inequality. Indeed, Z(+&1)22 (1e ere 1ffia(+-) b ((1+A) u,(1+A) v)o (1+-A) allo l(1+A)v2tllo | s+t v s_t . The generalized Cauchy-Schwartz inequality reduces to the usual Cauchy- Schwartz inequality when t = 0. Clearly we have u|| | | ifs < t. If DP denotes a partial derivative, DP p &(x1)P1 ... (z")pm then Dpu = (it)Pafcie"". In these equations we are using the following notations: " If p = (p1,. . . , pn) is a vector with non-negative integer entries we set p| :=p1+--+pn. " If (1, . .. , fn) is a (row) vector we set It is then clear that DPvl i | +(2.27) and similarly llli (constant depending on t) ||DII o if t ;> 0. (2.28) In particular,  2.6. THE SOBOLEV SPACES. 69 Proposition 2.6.1 The norms t > 0 and are equivalent. We let Ht denote the completion of the space P with respect to the norm t. Each Ht is a Hilbert space, and we have natural embeddings Ht c H. if s < t. Equation (2.25) says that (1 + A)t : Hs+2t - H and is an isometry. From the generalized Schwartz inequality we also have a natural pairing of Ht with Ht given by the extension of ( , )o, so (tuV)o < v _ u . (2.29) In fact, this pairing allows us to identify H_t with the space of continuous linear functions on Ht. Indeed, if # is a continuous linear function on Ht the Riesz representation theorem tells us that there is a w E Ht such that #(u) (u, w)t. Set v := (1 + A)tw. Then v E H_t and (u, v)o = (u, (1 + A)tw)o = (u, W)t = #(u) We record this fact as H_t = (Ht)*. (2.30) As an illustration of (2.30), observe that the series j(1 + f - f) converges for n This means that if define v by taking b - 1  70 CHAPTER 2. HILBERT SPACES AND COMPACT OPERATORS. then v E Hs for s < - . If u is given by u(x) = E aex is any trigonometric polynomial, then (u, v)o = af = u(0). So the natural pairing (2.29) allows us to extend the linear function sending u F u(0) to all of Ht if t > 2. We can now give v its "true name": it is the Dirac "delta function" 6 (on the torus) where (u, 6)o = u(0). So 6 E H_t for t > 2, and the preceding equation is usually written symbolically as 1 u(x)6(x)dx = u(0); (2,r)n T but the true mathematical interpretation is as given above. We set Ho, := 0 Ht, H_,,, := U Ht. The space Ho is just L2(T), and we can think of the space Ht, t > 0 as consisting of those functions having "generalized L2 derivatives up to order t". Certainly a function of class Ct belongs to Ht. With a loss of degree of differentiability the converse is true: Lemma 2.6.1 [Sobolev.] If u E Ht and t [n=] +k+1 -2 then u E Ck(T) and sup Dpu(x)| const.lull for p [n/2] + 1. Then ( arl)2 J:(1 + £ - £)t|af 2 (1 + £ -t *< oo, since the series Z(1 + £ - £)-t converges for t ;> [n/2] + 1. So for this range of t, the Fourier series for u converges absolutely and uniformly. The right hand side of the above inequality gives the desired bound. QED A distribution on Th is a linear function T on C°(T n) with the continuity condition that (T,$k ) ->0 whenever DP~3k > 0  2.6. THE SOBOLEV SPACES. 71 uniformly for each fixed p. If u eH_t we may define (u,#): (#,)o and since C° (T) is dense in Ht we may conclude Lemma 2.6.2 H_t is the space of those distributions T which are continuous in the|t norm, i.e. which satisfy ||#k|t 0 -> (T,#k) + . We then obtain Theorem 2.6.1 [Laurent Schwartz.] H_,c is the space of all distributions. In other words, any distribution belongs to H_t for some t. Proof. Suppose that T is a distribution that does not belong to any Ht. This means that for any k > 0 we can find a C° function #k with 1 |kk k< and (T,#k) >1. But by Lemma 2.6.1 we know that ||#k k < implies that Dpi5k - 0 uniformly for any fixed p contradicting the continuity property of T. QED Suppose that # is a C° function on T. Multiplication by # is clearly a bounded operator on Ho = L2(T), and so it is also a bounded operator on Ht, t > 0 since we can expand DP(5u) by applications of Leibnitz's rule. For t = -s < 0 we know by the generalized Cauchy Schwartz inequality that u||ti = sup |(v,#ujio||v|| = sup |(u, zv)/||v|| < ||||||v||8||v||. So in all cases we have Ollt < (const. depending on # and t)||all. (2.32) Let L = a,(x)DP l|pl N. Let Zt be the subspace of Ht consisting of all u such that a = 0 when £ - £ < N. This is a space of finite codimension, and hence the unit ball of Z- c Ht can be covered by finitely many balls of radius 2. The space Z- consists of all u such that af = 0 when £ - £> N. The image of Z in HS is the orthogonal complement of the image of Zt. On the other hand, for u E B n Z we have ull <(1+ N)8- 2 < (,)2 So the image of B n Z is contained in a ball of radius 2. Every element of the image of B can be written as a(n orthogonal) sum of an element in the image of B n Z' and an element of B n Zt and so the image of B is covered by finitely many balls of radius E. QED 2.7 Girding's inequality. Let x, a, and b be positive numbers. Then za + x-b > 1 because if x > 1 the first summand is > 1 and if x < 1 the second summand is > 1. Setting x = E1/aA gives 1 eAa + +-b/aA-b if c and A are positive. Suppose that t1 > s > t2 and we set a = t1 -s, b =s-t2 and A = + £ - f. Then we get (1 + £ - £)S E(1 + £ - £)tl + E-(S-t2)/(t1-s) (1 + £ -, )t2 and therefore ||l | i j 1+ (S-t2)/(t1- s)||| if ti > s > t2, E > 0 (2.34) for all u E Ht1. This elementary inequality will be the key to several arguments in this section where we will combine (2.34) with integration by parts. A differential operator L =ZQp ap(x)DP with real coefficients and m even is called elliptic if there is a constant c > 0 such that (-1)m/2 S:a,(x)P ;> c(( . ()m/2. (2.35) In this inequality, the vector ( is a "dummy variable". (Its true invariant signif- icance is that it is a covector, i.e. an element of the cotangent space at z.) The  2.7. GARDING'S INEQUALITY. 73 expression on the left of this inequality is called the symbol of the operator L. It is a homogeneous polynomial of degree m in the variable ( whose coefficients are functions of x. The symbol of L is sometimes written as o-(L) or o-(L)(x, ). Another way of expressing condition (2.35) is to say that there is a positive constant c such that o-(L)(x,();>c for all x and ( such that - = 1. We will assume until further notice that the operator L is elliptic and that m is a positive even integer. Theorem 2.7.1 [Girding's inequality.] For every u e C (T) we have (u,Lu)o > c1| r/2 -c2| ll (2.36) where c1 and c2 are constants depending on L. Remark. If u E Hm/2, then both sides of the inequality make sense, and we can approximate u in the m/2 norm by C° functions. So once we prove the theorem, we conclude that it is also true for all elements of Hm/2. We will prove the theorem in stages: 1. When L is constant coefficient and homogeneous. 2. When L is homogeneous and approximately constant. 3. When the L can have lower order terms but the homogeneous part of L is approximately constant. 4. The general case. Stage 1. L = p-m a Dp where the a are constants. Then (u, Lu)o ag eif-, EaO(i )P at eif- |,|1=m o c (f - )m/2|af|2 by (2.35) cZ[1 + (f .-£)m/2]|a|2 - c|2 > cc|CC| /2 - c||lo where 1 + rm/2 C =sup r>o (1 + r)m/2 This takes care of stage 1.  74 CHAPTER 2. HILBERT SPACES AND COMPACT OPERATORS. Stage 2. L = Lo + L1 where Lo is as in stage 1 and L1 =IElpl-m ,3(x)DP and max l,3(x)| < r, where 77 sufficiently small. (How small will be determined very soon in the course of the discussion.) We have (u, Lou)o;> c'lul /2 - c||2o from stage 1. We integrate (u, Liv)o by parts m/2 times. There are no boundary terms since we are on the torus. In integrating by parts some of the derivatives will hit the coefficients. Let us collect all the these terms as '2. The other terms we collect as I1, so I1 S J bp' +ipDP'uDPvudx where p' p"|= m/2 and br= +#3. We can estimate this sum by Iil < T7 -const.llul m/2 and so will require that i - (const.) < c'. The remaining terms give a sum of the form '2 = ( bpiqDP'uDqudx where p' < m/2, q' < m/2 so we have I2 < const.lull || l _1. Now let us take m m s --1, ti-, 2t2 in (2.34) which yields, for any c > 0, || lly - < C||vll_ + e-m/2 Substituting this into the above estimate for '2 gives I2 < Econst.l u r/2 + Em/2const.lul m/2| llo For any positive numbers a, b and ( the inequality ((a -(-'b)2 > 0 implies that 2ab < (2a2 + -2b2. Taking (2 E2+1 we can replace the second term on the right in the preceding estimate for I2 by e-m-1 -*const.lull at the cost of enlarging the constant in front of ||i|2W We have thus established that Ii <; 17- (const.)1||allm/2  2.7. GARDING'S INEQUALITY. 75 where the constant depends only on m, and I2 e(const.)2|u||/2 + e-m-lconst.||2 where the constants depend on L1 but c is at our disposal. So if i(const.)1 < c' and we then choose c so that e(const.)2 < c' - i - (const.)i we obtain Girding's inequality for this case. Stage 3. L = Lo + L1 + L2 where Lo and L1 are as in stage 2, and L2 is a lower order operator. Here we integrate by parts and argue as in stage 2. Stage 4, the general case. Choose an open covering of T such that the variation of each of the highest order coefficients in each open set is less than the 77 of stage 1. (Recall that this choice of 77 depended only on m and the c that entered into the definition of ellipticity.) Thus, if v is a smooth function supported in one of the sets of our cover, the action of L on v is the same as the action of an operator as in case 3) on v, and so we may apply Girding's inequality. Choose a finite subcover and a partition of unity {#2} subordinate to this cover. Write #2 = (where we choose the # so that the + are smooth). So E + 1. Now (@iu, L( /n))0 c"| yu | 2 - const.lliull where c" is a positive constant depending only on c, ?7, and on the lower order terms in L. We have (u, Lu)o =J(Z iu)Ludx = (yU, L4itt)o + R where R is an expression involving derivatives of the i and hence lower order derivatives of u. These can be estimated as in case 2) above, and so we get (u, Lu)o c'" | /|il 2 - const.llul 20(2.37) since |/|iull 0 o. Now ||lm/2 is equivalent, as a norm, to >pm/2|Dpullo as we verified in the preceding section. Also 5:Dp(@n)|o = D ullo + R' where R' involves terms differentiating the V) and so lower order derivatives of u. Hence Si /2 > pos. const.lu rn/2 - const.lullo by the integration by parts argument again. Hence by (2.37) (u, Lu)o c"' ||iu/l 2 - const.lul20 > pos. const.lu m/2 - const.lul0  76 CHAPTER 2. HILBERT SPACES AND COMPACT OPERATORS. which is Girding's inequality. QED For the time being we will continue to study the case of the torus. But a look ahead is in order. In this last step of the argument, where we applied the partition of unity argument, we have really freed ourselves of the restriction of being on the torus. Once we make the appropriate definitions, we will then get Girding's inequality for elliptic operators on manifolds. Furthermore, the consequences we are about to draw from GArding's inequality will be equally valid in the more general setting. 2.8 Consequences of Girding's inequality. Proposition 2.8.1 For every integer t there is a constant c(t) = c(t, L) and a positive number A = A(t, L) such that ll < c(t)|Lu + Aull[-m (2.38) when for all smooth u, and hence for all u e Ht. Proof. Let s be some non-negative integer. We will first prove (2.38) for t = s + 2. We have | ||Lu + Au||t-, = | ||Lu + Au||-2 ull||(1+ A)8Lv+A(1+ A)ull-s- > (u, (1 + A)SLv + A(1 + A)Sv)o by the generalized Cauchy - Schwartz inequality (2.26). The operator (1 + A)8L is elliptic of order m + 2s so (2.25) and GArding's inequality gives (u, (1+ A)SLv+A(1 + A)Su)o >c1ll+ - c2||lo+uA i. Since |l ;> u o we can combine the two previous inequalities to get llLu + Aullt-m ;>cl l +(A- c2)||||. If A > c2 we can drop the second term and divide by |ll to obtain (2.38). We now prove the proposition for the case t = 2 - s by the same sort of argument: We have vlli|Lvi+ Avll- = |(1 + A)-8vls |Lvi+Av Aul-- > ((1 + A)8u, L(1+ A)8(1+ A)-Su+ Av)0.  2.8. CONSEQUENCES OF GARDING'S INEQUALITY. 77 Now use the fact that L(1 + A)s is elliptic and GArding's inequality to continue the above inequalities as >C 1(1+A)-su+ 2 -c2|(1+ A)-qsl o+A ll| =c1| ll-c2|l2t+Al|2 > c1|il if A > c2. Again we may then divide by |ll to get the result. QED The operator L + Al is a bounded operator from Ht to Ht-m (for any t). Suppose we fix t and choose A so large that (2.38) holds. Then (2.38) says that (L+AI) is invertible on its image, and bounded there with a bound independent of A > A, and this image is a closed subspace of Ht-m. Let us show that this image is all of Ht-m for A large enough. Suppose not, which means that there is some w E Ht-m with (w, Lu + Au)t-m = 0 for all u E Ht. We can write this last equation as ((1 + A)t-mw, Lu + Au)o = 0. Integration by parts gives the adjoint differential operator L* characterized by (#,L )0 = (L*#, @))o for all smooth functions # and @, and by passing to the limit this holds for all elements of Hr for r > m. The operator L* has the same leading term as L and hence is elliptic. So let us choose A sufficiently large that (2.38) holds for L* as well as for L. Now 0 = ((1 + A)t-mw, Lu + Au) O= (L*(1 + A)t-mw + A(1 + A)t-mwvi) for all u E Ht which is dense in Ho so L*(1 + A)t-mw + A(1 + A)t-mw= 0 and hence (by (2.38)) (1 + A)t-mw= 0 so w = 0. We have proved Proposition 2.8.2 For every t and for A large enough (depending on t) the operator L + Al maps Ht bijectively onto Htm and (L + AI)-1 is bounded independently of A. As an immediate application we get the important Theorem 2.8.1 If u is a distribution and Lu E H. then u E Hs+m. Proof. Write f = Lu. By Schwartz's theorem, we know that u e Hk for some k. So f + Ai E Hmin(k,s) for any A. Choosing A large enough, we conclude that vi= (L + AI)-1(f + Avi) E Hmin(k~m,s~m). If k + m < s + m we can repeat  78 CHAPTER 2. HILBERT SPACES AND COMPACT OPERATORS. the argument to conclude that u E Hmin(k+2m,s+m). we can keep going until we conclude that u E Hs+m. QED Notice as an immediate corollary that any solution of the homogeneous equa- tion Lu = 0 is C°. We now obtain a second important consequence of Proposition 2.8.2. Choose A so large that the operators (L+AI)- and (L*+AI)-- exist as operators from Ho - Hm. Follow these operators with the injection Lm : Hm -> Ho and set M:=tmo(L+AI)-1, M* :=Lmo(L*+I)-1. Since tm is compact (Rellich's lemma) and the composite of a compact operator with a bounded operator is compact, we conclude Theorem 2.8.2 The operators M and M* are compact. Suppose that L = L*. (This is usually expressed by saying that L is "for- mally self-adjoint". More on this terminology will come later.) This implies that M = M*. In other words, M is a compact self adjoint operator, and we can apply Theorem 2.3.1 to conclude that eigenvectors of M form a basis of R(M) and that the corresponding eigenvalues tend to zero. Prop 2.8.2 says that R(M) is the same as tm(Hm) which is dense in Ho = L2(T). We con- clude that the eigenvectors of M form a basis of L2(T). If Mu = ru then u = (L + AI)Mu = rLu + Aru so u is an eigenvector of L with eigenvalue 1 - rA r We conclude that the eigenvectors of L are a basis of Ho. We claim that only finitely many of these eigenvalues of L can be negative. Indeed, since we know that the eigenvalues rn of M approach zero, the numerator in the above expression is positive, for large enough n, and hence if there were infinitely many negative eigenvalues pk, they would have to correspond to negative rk and so these p1k -- -oc. But taking sk --k as the A in (2.38) in Prop. 2.8.1 we conclude that u = 0, if Lu = pku if k is large enough, contradicting the definition of an eigenvector. So all but a finite number of the rn are positive, and these tend to zero. To summarize: Theorem 2.8.3 The eigenvectors of L are C° functions which form a basis of Ho. Only finitely many of the eigenvalues p1k of L are negative and pn -> oc as n - oo. It is easy to extend the results obtained above for the torus in two directions. One is to consider functions defined in a domain =1bounded open set g of R and the other is to consider functions defined on a compact manifold. In both cases a few elementary tricks allow us to reduce to the torus case. We sketch what is involved for the manifold case.  2.9. EXTENSION OF THE BASIC LEMMAS TO MANIFOLDS. 79 2.9 Extension of the basic lemmas to manifolds. Let E - M be a vector bundle over a manifold. We assume that M is equipped with a density which we shall denote by dx and that E is equipped with a positive definite (smoothly varying) scalar product, so that we can define the L2 norm of a smooth section s of E of compact support: s := Is2(x)dx. JM Suppose for the rest of this section that M is compact. Let {U2} be a finite cover of M by coordinate neighborhoods over which E has a given trivialization, and p2 a partition of unity subordinate to this cover. Let q5 be a diffeomorphism or UZ with an open subset of T' where n is the dimension of M. Then if s is a smooth section of E, we can think of (pis) o#1 as an Rm or Cm valued function on Ti, and consider the sum of the k norms applied to each component. We shall continue to denote this sum by p2 f o #b71|k and then define f|k :-- pifo# 1 k where the norms on the right are in the norms on the torus. These norms depend on the trivializations and on the partitions of unity. But any two norms are equivalent, and the o norm is equivalent to the "intrinsic" L2 norm defined above. We define the Sobolev spaces Wk to be the completion of the space of smooth sections of E relative to the norm ||||k for k > 0, and these spaces are well defined as topological vector spaces independently of the choices. Since Sobolev's lemma holds locally, it goes through unchanged. Similarly Rellich's lemma: if sn is a sequence of elements of Wf which is bounded in the | norm for £ > k, then each of the elements pism o #71 belong to Hf on the torus, and are bounded in the | norm, hence we can select a subsequence of pism o #1 which converges in Hk, then a subsubsequence such that pism o #71 for i= 1, 2 converge etc. arriving at a subsequence of s which converges in Wk. A differential operator L mapping sections of E into sections of E is an operator whose local expression (in terms of a trivialization and a coordinate chart) has the form Ls = ap(x)Dps Here the a, are linear maps (or matrices if our trivializations are in terms of Rm). Under changes of coordinates and trivializations the change in the coefficients are rather complicated, but the symbol of the differential operator o-(L)( ) := a,(x)(p e T*Ma |p|=m is well defined.  80 CHAPTER 2. HILBERT SPACES AND COMPACT OPERATORS. If we put a Riemann metric on the manifold, we can talk about the length I of any cotangent vector. If L is a differential operator from E to itself (i.e. F=E) we shall call L even elliptic if m is even and there exists some constant C such that (v,o(L)(()v) ;>Cl'lVl2 for all x E M, v e Ex, E E T*MX and ( , ) denotes the scalar product on Ex. Girding's inequality holds. Indeed, locally, this is just a restatement of the (vector valued version) of Girding's inequality that we have already proved for the torus. But Stage 4 in the proof extends unchanged (other than the replacement of scalar valued functions by vector valued functions) to the more general case. 2.10 Example: Hodge theory. We assume knowledge of the basic facts about differentiable manifolds, in par- ticular the existence of an operator d :Ok Q Qk+1 with its usual properties, where Qk denotes the space of exterior k-forms. Also, if M is orientable and carries a Riemann metric then the Riemann metric induces a scalar product on the exterior powers of T*M and also picks out a volume form. So there is an induced scalar product ( , ) ( , )k on Qk and a formal adjoint 6 of d 6 : Qk a> Qk-1 and satisfies (d@, q#) = (#, 6#5) where # is a (k + 1)-form and V) is a k-form. Then A := d6+6d is a second order differential operator on Qk and satisfies (A0,#) =|d#|2 + #2 where |#|2 = (#, #) is the intrinsic L2 norm (so o in terms of the notation of the preceding section). Furthermore, if is a local expression for the differential form 0, where dx'-=dxi A---A dxk I --(il,...,ik) then a local expression for A is 6kx 8xc  2.10. EXAMPLE: HODGE THEORY. 81 where g3 =(dx,dx3) and the - - - are lower order derivatives. In particular A is elliptic. Let 5 E Qk and suppose that d#o=0. Let C(Q5), the cohomology class of # be the set of all V E Qk which satisfy # -@V= da, a E Qk-1 and let C(#) denote the closure of C in the L2 norm. It is a closed subspace of the Hilbert space obtained by completing Qk relative to its L2 norm. Let us denote this space by L2, so C(#) is a closed subspace of L2. Proposition 2.10.1 If0 # EQk and d# = 0, there exists a unique T e C(#) such that T | V E C(O). Furthermore, T is smooth, and dT=0 and 6r=0. If choose a minimizing sequence for || in C(#). If we choose a minimizing sequence for || in C(#) we know it is Cauchy, cf. the proof of the existence of orthogonal projections in a Hilbert space. So we know that T exists and is unique. For any a E Qk+1 we have (T, Sa) = lim(@, Sa) = lim(d, a) = 0 as V) ranges over a minimizing sequence. The equation (T, Sa) = 0 for all a E Qk+1 says that T is a weak solution of the equation dT= 0. We claim that (r,d)l=0 V/3EQkl- which says that T is a weak solution of 6r5= 0. Indeed, for any t E R, T|2 T+td||2 -T 2 +t2|d13|2 + 2t(T,d3) so -2t(T,d3) ) = 0 for any ) E Qk. Hence T is a weak solution of AT= 0 and so is smooth. The space '1k of weak, and hence smooth solutions of AT= 0 is finite dimensional by the general theory. It is called the space of Harmonic forms. We have seen that there is a unique harmonic form in the cohomology class of any closed form, s the cohomology groups are finite dimensional. In fact, the general theory tells us that L 2 Ea A (Hilbert space direct sum) where EA is the eigenspace with eigenvalue A of A. Each EA is finite dimensional and consists of smooth forms, and the A - oc. The eigenspace Eo is just H(k, the space of harmonic forms. Also, since (A#0, #) =|d#|2 + #2 we know that all the eigenvalues A are non-negative. Since dA0= d(d6 + 6d) = d6d = Ad, we see that d : E ->E+1 and similarly 6 : E - E -1. For A + 0, if # E Ej and d# =0, then A#=5 A# = do# so # =d(1/A)o# so d restricted to the EA is exact, and similarly so is 6. Furthermore, on Ok EA we have AI= A = (d+66)2 so we have EA = dEk-1 6 SEe+1 and this decomposition is orthogonal since (da, 3) =_(d2a,3) =0. As a first consequence we see that L k = Xe dQk-1 e Qk-1 (the Hodge decomposition). If H denotes projection onto the first component, then A is invertible on the image of I - H with a an inverse there which is compact. So if we let N denote this inverse on im I - H and set N = 0 on Xk we get AN =I-H Nd = dN NH =0  2.11. THE RESOLVENT. 83 which are the fundamental assertions of Hodge theory, together with the asser- tion proved above that H# is the unique minimizing element in its cohomology class. We have seen that d + 6: e Ek - E~k+1 is an isomorphism for A / 0 (2.39) k k which of course implies that S (-1)kdimE = 0 k This shows that the index of the operator d + 6 acting on 0 L2 is the Euler characteristic of the manifold. (The index of any operator is the difference between the dimensions of the kernel and cokernel). Let Pk,A denote the projection of L2 onto E . So S- Ee- P,,x is the solution of the heat equation on L2. As t -> oc this approaches the operator H projecting L2 onto 7k. Letting Ak denote the operator A on L2 we see that tr e [Ak e5 where the sum is over all eigenvalues Ak of Ak counted with multiplicity. It follows from (2.39) that the alternating sum over k of the corresponding sum over non-zero eigenvalues vanishes. Hence (-1)tre-tAk = x(M) is independent of t. The index theorem computes this trace for small values of t in terms of local geometric invariants. The operator d + 6 is an example of a Dirac operator whose general defi- nition we will not give here. The corresponding assertion and local evaluation is the content of the celebrated Atiyah-Singer index theorem, one of the most important theorems discovered in the twentieth century. 2.11 The resolvent. In order to connect what we have done here notation that will come later, it is convenient to let A = -L so that now the operator (zI-A)- is compact as an operator on Ho for z sufficiently negative. (I have dropped the Lm which should come in front of this expression.) The operator A now has only  84 CHAPTER 2. HILBERT SPACES AND COMPACT OPERATORS. finitely many positive eigenvalues, with the corresponding spaces of eigenvectors being finite dimensional. In fact, the eigenvectors An = An(A) (counted with multiplicity) approach -oc as n - oc and the operator (zI - A)-1 exists and is a bounded (in fact compact) operator so long as z An for any n. Indeed, we can write any u e Ho as where #n is an eigenvector of A with eigenvalue An and the # form an orthonor- mal basis of Ho. Then (zI - A)-lu = 1 anon. The operator (zI- A)1 is called the resolvent of A at the point z and denoted by R(z, A) or simply by R(z) if A is fixed. So R(z, A) := (zI-A)1 for those values of z E C for which the right hand side is defined. If z and are complex numbers with Rez > Rea, then the integral e-zt atdt converges, and we can evaluate it as e-zteatdt. z -a o If Rez is greater than the largest of the eigenvalues of A we can write oo R(z, A) = e-ztetAdt where we may interpret this equation as a shorthand for doing the integral for the coefficient of each eigenvector, as above, or as an actual operator valued integral. We will spend a lot of time later on in this course generalizing this formula and deriving many consequences from it.  Chapter 3 The Fourier Transform. 3.1 Conventions, especially about 27r. The space S consists of all functions on mathbbR which are infinitely differ- entiable and vanish at infinity rapidly with all their derivatives in the sense that f|m := sup{|xmf(n)(x)|} 1 ((-ix)"f)^=(i()m f as follows by differentiation under the integral sign and by integration by parts. This shows that the Fourier transform maps S to S. 85  86 CHAPTER 3. THE FOURIER TRANSFORM. 3.2 Convolution goes to multiplication. (f*g)^() 2 f (x - t)g(t)dxe- dx f(u)g(t)ee-u+t dudt 1 f uje-i"du 1 (gt e-itEdt 2JR 2J eR so (f*g)^=f . 3.3 Scaling. For any f E S and a > 0 define Saf by (Sa)f(x) := f(ax). Then setting u = ax so dx = (1/a)du we have (Saf)^2) = f (ax)e- dx 1 (1/a)f (u)e-i(/3du 27 JR so (Sa f)^ = (1/a)S1/af . 3.4 Fourier transform of a Gaussian is a Gaus- sian. The polar coordinate trick evaluates 1 e_-2/2dz = 1. 2w JR The integral 1wJ -x2/2-dx converges for all complex values of ?7, uniformly in any compact region. Hence it defines an analytic function of 77 that can be evaluated by taking 77 to be real and then using analytic continuation. For real j we complete the square and make a change of variables: 1 Je_2/2-dx = Je-(x+)2/2+n2/2dx en2/2 1w e-(X+n)2/2dz = e772/2R  3.4. FOURIER TRANSFORM OF A GAUSSIAN IS A GAUSSIAN. Setting r= i gives n = n if n(x) := e- / If we set a =cE in our scaling equation and define pE := Sen 87 so pE(x) = e-E2x2/2 then (p)(x) =e-2/2e2 Notice that for any g E S we have (1/a)(S/ag) ( )d =g(()df JR J so setting a =cE we conclude that 1 J (p )^(()dd for all c. Let V) "= 4'1 "= (ply and V) E (pc) Then 1V),7\ so (4e*g)(() --g( ) 2 R[ I)- g(01]'V) (Ts) drj (g(R - - c) - g( )]@(C)d C. Since g E S it is uniformly continuous on R, so that for any 6 > 0 we can find co so that the above integral is less than 6 in absolute value for all 0 < E < co. In short, e* 9-- 9|o a0, as E --0.  88 CHAPTER 3. THE FOURIER TRANSFORM. 3.5 The multiplication formula. This says that J (x)g(x)dx JR f(x)y(x)dx JR for any f, g e S. Indeed the left hand side equals 1 J f(y)e-fxdyg(x)dx. We can write this integral as a double integral and then of integration which gives the right hand side. interchange the order 3.6 The inversion formula. This says that for any f E S f(x) = wJ f( )exe d. To prove this, we first observe that for any h E S the Fourier transform of x F- e2nh(x) is just I(( -Tr) as follows directly from the definition. Taking g(x) = ex e-62x2/2 in the multiplication formula gives 1 JRf(t)exe-2t2/2dt Ja7 1 (F J f(t)bE (t - x)dt (ff * pE)(x). We know that the right hand side approaches f(x) as c - 0. Also, e-62t2/2 ' 1 for each fixed t, and in fact uniformly on any bounded t interval. Furthermore, 0 < e-K22/2 < 1 for all t. So choosing the interval of integration large enough, we can take the left hand side as close as we like to 2fR f(x)e2xtdt by then choosing c sufficiently small. QED 3.7 Plancherel's theorem Let f(x) := f (-x). Then the Fourier transform of f is given by 1 27Rf (-x)e- Edx 1 f (u)ei"u du f () so (J):: f. Thus (f*f)^= l 2  3.8. THE POISSON SUMMATION FORMULA. 89 The inversion formula applied to f * f and evaluated at 0 gives (f*f)(0) = wJ If2dx. The left hand side of this equation is 1 1 f (x)f(0 - x)dx 2=If (x)|2dz. Thus we have proved Plancherel's formula I f (x)|2dz = 1I f(x)|2dx. 2rR 27R Define L2 (R) to be the completion of S with respect to the L2 norm given by the left hand side of the above equation. Since S is dense in L2(R) we conclude that the Fourier transform extends to unitary isomorphism of L2(R) onto itself. 3.8 The Poisson summation formula. This says that for any g E S we have Eg(27rk)= 1 k 2w To prove this let h(x) : g(x + 27rk) k so h is a smooth function, periodic of period 27r and h(0) = g(27rk). k We may expand h into a Fourier series h(x) =Z:ameimx m where 1 27r am =h(x)e-i"xdx = g(x)e--"xdx=oy(m). Setting x = 0 in the Fourier expansion h(x) = 1 2 (m)ei" gives h(0) = 1Y gm)  90 CHAPTER 3. THE FOURIER TRANSFORM. 3.9 The Shannon sampling theorem. Let f E S be such that its Fourier transform is supported in the interval [-7, 7]. Then a knowledge of f(n) for all n E Z determines f. More explicitly, f 7(t) r f(n - t (3.1) n=-oo Proof. Let g be the periodic function (of period 27) which extends f, the Fourier transform of f. So g(r)=f|(r), r E [-7, r] and is periodic. Expand g into a Fourier series: g = cne n n?-Z where cnj j g(r)e-i" Td j=e-f(y)eidr, 27r_,27rf_ or 1 ca 1 f(-n). (27)2 But f(t) ( j f(T)eidT(= 1 g(T)eiTdT= (27r)2 oo(27r)2 1f(n)e(n+t)TdT. (27) 1--, (27) Replacing n by -n in the sum, and interchanging summation and integration, which is legitimate since the f (n) decrease very fast, this becomes f(t) _j1: f(n)j ei(-nL)TdT. But I 7r ~e t- ) T e t- ) -F - e idt- ) -F sin (n - t) ei(t-")TdT = = 2 . QED _,i(t -n) _ i(t -n) n- t It is useful to reformulate this via rescaling so that the interval [-w, w] is replaced by an arbitrary interval symmetric about the origin: In the engineering literature the frequency A is defined by 27A.  3.10. THE HEISENBERG UNCERTAINTY PRINCIPLE. 91 Suppose we want to apply (3.1) to g = Saf. We know that the Fourier transform of g is (1/a)S1/af and supp Si/al = asupp f. So if supp c [-27Arc, 27r\c] we want to choose a so that a27A <; r or 1 a < 2Ac.(3.2) For a in this range (3.1) says that 1 sin 7r(x - n) f (ax) =-Z f (na)n, 7r x - n or setting t = ax, sin(! (t - na) f (t) = f (na) a .(3.3) a- (t - na) This holds in L2 under the assumption that f satisfies supp f C [-27Ac, 2Ac]. We say that f has finite bandwidth or is bandlimited with bandlimit )c. The critical value ac = 1/2)c is known as the Nyquist sampling interval and (1/a) = 2)c is known as the Nyquist sampling rate. Thus the Shannon sampling theorem says that a band-limited signal can be recovered completely from a set of samples taken at a rate > the Nyquist sampling rate. 3.10 The Heisenberg Uncertainty Principle. Let f E S(R) with |f (x)|2dx= 1. We can think of x H |f(x)|2 as a probability density on the line. The mean of this probability density is m:= xlf(x)|2dx. If we take the Fourier transform, then Plancherel says that (f()|2d = 1 as well, so it defines a probability density with mean m(| )|2d .  92 CHAPTER 3. THE FOURIER TRANSFORM. Suppose for the moment that these means both vanish. The Heisenberg Un- certainty Principle says that Ixf(x)|2dx ( )2d) Proof. Write -if(() as the Fourier transform of f' and use Plancherel to write the second integral as f f'(x)|2dx. Then the Cauchy - Schwarz inequality says that the left hand side is > the square of Jxf(x)f'(x)|dx Re(xf(x)f'(x))dx= 2Ix(f (x)f'(x) + f (x)f'(x)dx d 1 QED If f has norm one but the mean of the probability density f 2 is not necessar- ily of zero (and similarly for for its Fourier transform) the Heisenberg uncertainty principle says that (x - xm)f (x)|2dx) ( - (m)f( )|2d) . The general case is reduced to the special case by replacing f(x) by f (x + xm)e2x. 3.11 Tempered distributions. The space S was defined to be the collection of all smooth functions on R such that f m := sup{|xmf(n)(x)|} < oo. The collection of these norms define a topology on S which is much finer that the L2 topology: We declare that a sequence of functions {fk} approaches g e S if and only if fk - g|m,n -> 0 for every m and n. A linear function on S which is continuous with respect to this topology is called a tempered distribution. The space of tempered distributions is denoted by s'. For example, every element f E S defines a linear function on S by 1j f'  3.11. TEMPERED DISTRIBUTIONS. 93 But this last expression makes sense for any element f E L2 (RI), or for any piecewise continuous function f which grows at infinity no faster than any poly- nomial. For example, if f = 1, the linear function associated to f assigns to # the value 2j(x)dx. This is clearly continuous with respect to the topology of S but this function of # does not make sense for a general element # of L2 (R). Another example of an element of S' is the Dirac 6-function which assigns to # E S its value at 0. This is an element of S' but makes no sense when evaluated on a general element of L2 (Rk). If f E S, then the Plancherel formula formula implies that its Fourier trans- form F(f) = f satisfies (#,f) =( ), (f)). But we can now use this equation to define the Fourier transform of an arbitrary element of S': If £ E S' we define F(f) to be the linear function FTV) () : = f (-F--1 (0)). 3.11.1 Examples of Fourier transforms of elements of S'. " If £ corresponds to the function f - 1, then FT V) (@)= L(F-')(()d V) = .%c (F--1)(0) = $(0). So the Fourier transform of the function which is identically one is the Dirac 6-function. " If 6 denotes the Dirac 6-function, then (F( )(@)= 6(F--())= ((F--()) (0) = (x)dz. So the Fourier transform of the Dirac 6 function is the function which is identically one. " In fact, this last example follows from the preceding one: If m = F(f) then (F(m)(#) = m(-F-1(#)) =(-(F-1(-F-1(#)). But F-2(#)(z) = #(-). So if m = (£) then F(m) =£ where W-6)) :  94 CHAPTER 3. THE FOURIER TRANSFORM. " The Fourier transform of the function x: This assigns to every V) E S the value 1 1ifxdldxd=$l (eix ) ddx = 2/ /e7r i d{ i 1 J d@()e2 d dx=i.f(F_1 (d( )))) (0)=i (d@()) v/5 dx ( dx () dx Now for an element of S we have -o fdx = - #-dfdx. dx v5 dx So we define the derivative of an £ E s' by d £ d# So the Fourier transform of x is -id.  Chapter 4 Measure theory. 4.1 Lebesgue outer measure. We recall some results from the chapter on metric spaces: For any subset A c R we defined its Lebesgue outer measure by m*(A) := inf1f (In) : In are intervals with A C U'I . (4.1) Here the length £(I) of any interval I = [a, b] is b - a with the same definition for half open intervals (a, b] or [a, b), or open intervals. Of course if a = -oc and b is finite or +oc, or if a is finite and b =+oc the length is infinite. So the infimum in (4.1) is taken over all covers of A by intervals. By the usual E/2n trick, i.e. by replacing each Ij = [as, b3] by (a3 - /2j+1, b3 + /23+1) we may assume that the infimum is taken over open intervals. (Equally well, we could use half open intervals of the form [a, b), for example.). It is clear that if A c B then m*(A) m*(B) since any cover of B by intervals is a cover of A. Also, if Z is any set of measure zero, then m* (A U Z) m*(A). In particular, m*(Z) = 0 if Z has measure zero. Also, if A = [a, b] is an interval, then we can cover it by itself, so m*([a, b]) < b - a, and hence the same is true for (a, b], [a, b), or (a, b). If the interval is infinite, it clearly can not be covered by a set of intervals whose total length is finite, since if we lined them up with end points touching they could not cover an infinite interval. We recall the proof that m*(I) =_£(I) (4.2) if I is a finite interval: We may assume that I = [c, d] is a closed interval by what we have already said, and that the minimization in (4.1) is with respect to a cover by open intervals. So what we must show is that if [c, d] C U(ai, b) 95  96 CHAPTER 4. MEASURE THEORY. then d-c Z(bi -a). i We first applied Heine-Borel to replace the countable cover by a finite cover. (This only decreases the right hand side of preceding inequality.) So let n be the number of elements in the cover. We needed to prove that if [c, d] c (ai, bi) then d - c < 1(bi - ai), i=1 i=1 and we did this this by induction on n. If n = 1 then a1 < c and b1 > d so clearly b1 - a1 > d - c. Suppose that n ;> 2 and we know the result for all covers (of all intervals [c, d] ) with at most n - 1 intervals in the cover. If some interval (ai, bi) is disjoint from [c, d] we may eliminate it from the cover, and then we are in the case of n - 1 intervals. So every (ai, bi) has non-empty intersection with [c, d]. Among the the intervals (ai, bi) there will be one for which a2 takes on the minimum possible value. By relabeling, we may assume that this is (a1, b1). Since c is covered, we must have a1 < c. If b1 > d then (a1, b1) covers [c, d] and there is nothing further to do. So assume b1 d. We must have b1 > c since (a1, bi) n [c, d] #0. Since b1 E [c, d], at least one of the intervals (ai, bi), i > 1 contains the point b1. By relabeling, we may assume that it is (a2, b2). But now we have a cover of [c, d] by n - 1 intervals: [c, d] C (a1, b2) UU(aj, b). i=3 So by induction d-c (b2 -ai)+Z(bi -a). i=3 But b2 - a1 < (b2 - a2) + (b1 - a1) since a2 < b1. QED We repeat that the intervals used in (4.1) could be taken as open, closed or half open without changing the definition. If we take them all to be half open, of the form Ii = [ai, bi), we can write each Ii as a disjoint union of finite or countably many intervals each of length < E. So it makes no difference to the definition if we also require the £(Ii) < E (4.3) in (4.1). We will see that when we pass to other types of measures this will make a difference. We have verified, or can easily verify the following properties: 1. m* (0) =O.  4.1. LEBESGUE OUTER MEASURE. 97 2. A C B ->m* (A) < m*(B). 3. m*(U AZ) Zm*(Ai). i i 4. If dist (A, B) > 0 then m*(A U B) = m*(A) + m*(B). 5. m*(A) = inf{m*(U) : U D A, U open}. 6. For an interval m*(I) = (I). The only items that we have not done already are items 4 and 5. But these are immediate: for 4 we may choose the intervals in (4.1) all to have length < E where 2E < dist (A, B) so that there is no overlap. As for item 5, we know from 2 that m*(A) m* (U) for any set U, in particular for any open set U which contains A. We must prove the reverse inequality: if m* (A) = oo this is trivial. Otherwise, we may take the intervals in (4.1) to be open and then the union on the right is an open set whose Lebesgue outer measure is less than m* (A) + 6 for any 6 > 0 if we choose a close enough approximation to the infimum. I should also add that all the above works for Rh instead of R if we replace the word "interval" by "rectangle", meaning a rectangular parallelepiped, i.e a set which is a product of one dimensional intervals. We also replace length by volume (or area in two dimensions). What is needed is the following Lemma 4.1.1 Let C be a finite non-overlapping collection of closed rectangles all contained in the closed rectangle J. Then volJ > vol I. IC If C is any finite collection of rectangles such that J C U I IC then vol J < 1:vol (I). IC This lemma occurs on page 1 of Strook, A concise introduction to the theory of integration together with its proof. I will take this for granted. In the next few paragraphs I will talk as if we are in R, but everything goes through unchanged if R is replaced by R"h.  98 CHAPTER 4. MEASURE THEORY. 4.2 Lebesgue inner measure. Item 5. in the preceding paragraph says that the Lebesgue outer measure of any set is obtained by approximating it from the outside by open sets. The Lebesgue inner measure is defined as m,(A) = sup{m*(K) : K c A, K compact }. (4.4) Clearly m, (A) < m* (A) since m*(K) m* (A) for any K c A. We also have Proposition 4.2.1 For any interval I we have m, (I) =_ £(I). (4.5) Proof. If £(I) = oc the result is obvious. So we may assume that I is a finite interval which we may assume to be open, I = (a, b). If K c I is compact, then I is a cover of K and hence from the definition of outer measure m*(K) f(I). So m,(I) f(I). On the other hand, for any c> 0, c< 2(b - a) the interval [a + E, b - E] is compact and m*([a - E, a + E]) = b - a - 2E K m,(I). Letting E -> 0 proves the proposition. QED 4.3 Lebesgue's definition of measurability. A set A with m* (A) < oc is said to measurable in the sense of Lebesgue if m,(A) = m*(A). (4.6) If A is measurable in the sense of Lebesgue, we write m(A) = m, (A) = m* (A). (4.7) If K is a compact set, then m,(K) = m*(K) since K is a compact set con- tained in itself. Hence all compact sets are measurable in the sense of Lebesgue. If I is a bounded interval, then I is measurable in the sense of Lebesgue by Proposition 4.2.1. If m*(A) = oc, we say that A is measurable in the sense of Lebesgue if all of the sets A n [-n, n] are measurable. Proposition 4.3.1 If A U=U)A, is a (finite or) countable disjoint union of sets which are measurable in the sense of Lebesgue, then A is measurable in the sense of Lebesgue and  4.3. LEBESGUE'S DEFINITION OF MEASURABILITY. 99 Proof. We may assume that m(A) < oo - otherwise apply the result to A n [-n, n] and Ai n [-n, n] for each n. We have m*(A) Zm*(An) = m(An). n n Let c > 0, and for each n choose compact Kn c An with m*(Kn) > m*(An) 2-E=m(An) - since An is measurable in the sense of Lebesgue. The sets Kn are pairwise disjoint, hence, being compact, at positive distances from one another. Hence m*(K1 U -.. U K ) =m*(K1) + --. + m* (Kn) and K1 U ... U K, is compact and contained in A. Hence m,(A) >m*(K1) + -..-+ m*(Kn), and since this is true for all n we have m,(A) > m(A) - 6. Since this is true for all c> 0 we get m,(A) m(A). But then m (A) > m* (A) and so they are equal, so A is measurable in the sense of Lebesgue, and m(A) = E m(Ai). QED Proposition 4.3.2 Open sets and closed sets are measurable in the sense of Lebesgue. Proof. Any open set 0 can be written as the countable union of open intervals Ii, and n-1 J ' =I \ U Ii i=1 is a disjoint union of intervals (some open, some closed, some half open) and 0 is the disjont union of the Jn. So every open set is a disjoint union of intervals hence measurable in the sense of Lebesgue. If F is closed, and m*(F) = oo, then Fn [-n, n] is compact, and so F is measurable in the sense of Lebesgue. Suppose that m*(F) 0 consider the sets G1,e :=[-1 + 22 2] nF G2,e ([-2 + $,-1] nF) U ([1,2 - ] n F) G3,e ([-3 + 4,-2] nF) U ([2,3 - ] n F) and set GE U C26E. 2 The Ge are all compact, and hence measurable in the sense of Lebesgue, and the union in the definition of GE is disjoint, so is measurable in the sense of Lebesgue. Furthermore, the sum of the lengths of the "gaps" between the intervals that went into the definition of the G2,e is E. So m(Ge) + E = m*(GE) + E> m* (F) > m*(GE) =m(Ge) Zm(G,E). In particular, the sum on the right converges, and hence by considering a finite number of terms, we will have a finite sum whose value is at least m(Ge) - E. The corresponding union of sets will be a compact set KE contained in F with m(Ke) > m*(F) - 2E. Hence all closed sets are measurable in the sense of Lebesgue. QED Theorem 4.3.1 A is measurable in the sense of Lebesgue if and only if for every c > 0 there is an open set U D A and a closed set F c A such that m(U \ F) m,(A) - E = m(A) - E/2. Since U \ F is open, it is measurable in the sense of Lebesgue, and so is F as it is compact. Also F and U \ F are disjoint. Hence by Proposition 4.3.1, m(U \ F) = m(U) - m(F) < m(A) + - {m(A) - =ce. If A is measurable in the sense of Lebesgue, and m(A) = oc, we can apply the above to A n I where I is any compact interval. So we can find open sets Un D A n [-n - 26n+1, n + 26,+1] and closed sets Fn c A n [-n, n] with m(Un \ Fe) < c/2Th. Here the on are sufficiently small positive numbers. We  4.3. LEBESGUE'S DEFINITION OF MEASURABILITY. 101 may enlarge each Fn if necessary so that Fn n [-n + 1, n - 1] D F,_1. We may also decrease the Un if necessary so that Un n(-n+ 1- n, n- 1+ 6n) C Un_1. Indeed, if we set Cn := [-n +1 - 6n, n - 1+6n] n Un_1 then Cn is a closed set with C n A = 0. Then Un n Cn is still an open set containing [-n - 26n+1, n + 26n+1] n A and (Un n Cn) n (-n + 1- on, n- 1+ 4) c Cn n (-n +1- on, n1-1+ 6) c Un_1. Take U UUn so U is open. Take F := U(Fn n ([-n, -n + 1] U [n - 1,1n])). Then F is closed, U D A D F and U \ F C U(Un/Fn) n ([-n, -n + 1] U [n - 1, n]) C U(Un \ Fn) In the other direction, suppose that for each c, there exist U D A D F with m(U \ F) < E. Suppose that m*(A) < oo. Then m(F) < oc and m(U) m(U \ F) + m(F) 0 we conclude that m(A) m*(A) so they are equal and A is measurable in the sense of Lebesgue. If m* (A) = oc, we have Urn (-n - E, n + E) D An [-n, n] D Fn [-n, n] and m((U n (-n - e,1n + E) \ (Fn [-n,1n]) < 2E + E= 3 so we can proceed as before to conclude that m,(A n [-n, n]) = m*(An [-n, n]). QED Several facts emerge immediately from this theorem: Proposition 4.3.3 If A is measurable in the sense of Lebesgue, so is its com- plement Ac= R \ A. Indeed, if F c A c U with F closed and U open, then FC D Ac D Uc with FC open and Uc closed. Furthermore, Fc \ Uc = U \ F so if A satisfies the condition of the theorem so does Ac. Proposition 4.3.4 If A and B are measurable in the sense of Lebesgue so is An B. For c > 0 choose UA D A D FA and UB D B D FB with m(UA \ FA) < E/2 and m(UB \ FB) < E/2. Then (UA n UB) D (A n B) D (FA n FB) and (Un n UB)\(FAn FB) c (Un\ FA)U(UB \ FB).QED Putting the previous two propositions together gives  102 CHAPTER 4. MEASURE THEORY. Proposition 4.3.5 If A and B are measurable in the sense of Lebesgue then so is A U B. Indeed, A U B = (AC n BC)C. Since A\ B = A n BC we also get Proposition 4.3.6 If A and B are measurable in the sense of Lebesgue then so is A\B. 4.4 Caratheodory's definition of measurability. A set E c R is said to be measurable according to Caratheodory if for any set A c R we have m*(A) = m*(A n E) + m*(A n Ec) (4.8) where we recall that Ec denotes the complement of E. In other words, A n Ec A \ E. This definition has many advantages, as we shall see. Our first task is to show that it is equivalent to Lebesgue's: Theorem 4.4.1 A set E is measurable in the sense of Caratheodory if and only if it is measurable in the sense of Lebesgue. Proof. We always have m*(A) m*(A n E) + m*(A \ E) so condition (4.8) is equivalent to m*(An E) + m*(A \ E) m*(A) (4.9) for all A. Suppose E is measurable in the sense of Lebesgue. Let c > 0. Choose U D E D F with U open, F closed and m(U/F) < E which we can do by Theorem 4.3.1. Let V be an open set containing A. Then A \ E C V \ F and AnE c (VnU) so m*(A\ E)+m*(An E) m(V \F)+m(V nU) m(V \U)+m(U\F)+m(V nU) m(V) +cE. (We can pass from the second line to the third since both V \ U and V n U are measurable in the sense of Lebesgue and we can apply Proposition 4.3.1.) Taking the infimum over all open V containing A, the last term becomes m* (A), and as c is arbitrary, we have established (4.9) showing that E is measurable in the sense of Caratheodory.  4.4. CARATHEODORY'S DEFINITION OF MEASURABILITY. 103 In the other direction, suppose that E is measurable in the sense of Caratheodory. First suppose that m*(E) <0o0. Then for any c > 0 there exists an open set U D E with m(U) < m*(E) + E. We may apply condition (4.8) to A = U to get m(U) = m*(Un E) + m*(U \ E) = m*(E) + m*(U \ E) so m*(U \ E) m(U) - E. So there is a closed set F c U \ V with m(F) > m(U) -cE. But since V D U \ E, we have U \ V c E. So F C E. So F C E C U and m(U \ F) = m(U) - m(F) < E. Hence E is measurable in the sense of Lebesgue. If m(E) = oc, we must show that E n [-n, n] is measurable in the sense of Caratheodory, for then it is measurable in the sense of Lebesgue from what we already know. We know that the interval [-n, n] itself, being measurable in the sense of Lebesgue, is measurable in the sense of Caratheodory. So we will have completed the proof of the theorem if we show that the intersection of E with [-n, n] is measurable in the sense of Caratheodory. More generally, we will show that the union or intersection of two sets which are measurable in the sense of Caratheodory is again measurable in the sense of Caratheodory. Notice that the definition (4.8) is symmetric in E and EC so if E is measurable in the sense of Caratheodory so is Ec. So it suffices to prove the next lemma to complete the proof. Lemma 4.4.1 If E1 and E2 are measurable in the sense of Caratheodory so is E1 U E2. For any set A we have m*(A) =m*(A n E1) + m*(A n El) by (4.8) applied to E1. Applying (4.8) to A n El and E2 gives m*(AOEl) =m*(AOEIOnE2) +m*(AOEI OE2).  104 CHAPTER 4. MEASURE THEORY. Substituting this back into the preceding equation gives m*(A) = m*(A n E1) + m*(A n E1 n E2) +m*(An El n E2). (4.10) Since El n E2 = (El U E2) we can write this as m*(A) = m*(A n El1) + m*(A n El n E2) + m*(A n (E1 U E2)c). Now A n(El1 U E2)= (An El1) U (An (Ei nE2) so m*(A n El) + m*(A n El n E2);> m*(A n (E1 U E2)). Substituting this for the two terms on the right of the previous displayed equa- tion gives m*(A) > m*(A n (El1 U E2)) + m*(A n (E1 U E2)c) which is just (4.9) for the set E1 U E2. This proves the lemma and the theorem. We let M denote the class of measurable subsets of R - "measurability" in the sense of Lebesgue or Caratheodory these being equivalent. Notice by induction starting with two terms as in the lemma, that any finite union of sets in M is again in M 4.5 Countable additivity. The first main theorem in the subject is the following description of M and the function m on it: Theorem 4.5.1 M and the function m : M - R have the following properties: " ReM. " E e M -> E" E M. " IfEnE Mforn=1,2,3,... then U, E, E M. " If Fn e M and the Fn are pairwise disjoint, then F := Un Fn e M and 00 m(F) =Z(:m(Fn). n=1 Proof. We already know the first two items on the list, and we know that a finite union of sets in M is again in M. We also know the last assertion which is Proposition 4.3.1. But it will be instructive and useful for us to have a proof starting directly from Caratheodory's definition of measurablity: If F1 e M, F2 e M and F1 n F2 = 0 then taking A=F1UF2, El1=F1, El2 F2  4.5. COUNTABLE ADDITIVITY. 105 in (4.10) gives m(F1 U F2) = m(F1) + m(F2). Induction then shows that if F1, ... , F, are pairwise disjoint elements of M then their union belongs to M and m(F1 G F2UG - GFn) = m(F1) +m(F2) +.-.-+ m(Fn). More generally, if we let A be arbitrary and take E1 = F1, E2 = F2 in (4.10) we get m*(A) = m*(A nF1) + m*(A nF2) + m*(A n (F1 U F2)c) If F3 E M is disjoint from F1 and F2 we may apply (4.8) with A replaced by A n(F1 UF2)c and E by F3 to get m*(A n (F1 U F2)c)) = m*(A n F3) + m*(A n (F1 U F2UF3)c), since (F1 u F2)c n F= F1 n F2 n F =(F1 u F2 u F3)c. Substituting this back into the preceding equation gives m*(A) = m*(A n F1) + m*(A n F2) + m*(A n F3) + m*(A n (F1 U F2U F3)c) Proceeding inductively, we conclude that if F1, ... , Fn are pairwise disjoint ele- ments of M then m*(A) Zm*(An Fi) + m*(An (F1 G -.--.U Fn)c). (4.11) Now suppose that we have a countable family {F} of pairwise disjoint sets belonging to M. Since i=1 (i=1 we conclude from (4.11) that m*(A)Z> m*(A n Fi) + m* A n (jF) 1 (i=1 and hence passing to the limit m*(A) Zm*(A n Fi) + m* A n (jF) . 1 i=1 Now given any collection of sets Bk we can find intervals {Ik,j } with Bk - IlI  106 CHAPTER 4. MEASURE THEORY. and m* (Bk) < ( (Ik,j) -|-2. So U Bk C UIk,i k k,j and hence m* (UBk)Z m*(B, the inequality being trivially true if the sum on the right is infinite. So m*(AnF) m* An ( FZ). i=1 i=1 Thus m*(A) m*(A n Fim) + m* An n F ) 1 i=1 > m* A n F)) +m* (A n F ). i=1 i=1 The extreme right of this inequality is the left hand side of (4.9) applied to E=U Z and so E E M and the preceding string of inequalities must be equalities since the middle is trapped between both sides which must be equal. Hence we have proved that if Fn is a disjoint countable family of sets belonging to M then their union belongs to M and m* (A)=(m*(A n F) + m* A n U F). (4.12) i i=1 If we take A U= F we conclude that m(F) Zm(Fn) (4.13) n=1 if the F3 are disjoint and F = U F. So we have reproved the last assertion of the theorem using Caratheodory's definition. For the third assertion, we need only observe that a countable union of sets in M can be always written as a countable disjoint union of sets in M. Indeed, set F1 := E1, F2 := E2\ E1 =E1 n Ej  4.5. COUNTABLE ADDITIVITY. 107 F3 := E3 \ (E1 UE2) etc. The right hand sides all belong to M since M is closed under taking complements and finite unions and hence intersections, and UFUEj. We have completed the proof of the theorem. A number of easy consequences follow: The symmetric difference between two sets is the set of points belonging to one or the other but not both: AOB := ( A\ B) U (B \ A). Proposition 4.5.1 If A e M and m(A/B) = 0 then B E M and m(A) m(B). Proof. By assumption A \ B has measure zero (and hence is measurable) since it is contained in the set ANB which is assumed to have measure zero. Similarly for B \ A. Also (A n B) E M since AnB =A\(A\B). Thus B=(AnB)U(B\A) EM. Since B \ A and A n B are disjoint, we have m(B) =m(A n B)+m(B\ A) =m(A n B) =m(A n B)+m(A\ B) =m(A). QED Proposition 4.5.2 Suppose that An E M and An C An+1 for n = 1,2,.... Then m (UA n = lim m (A,). Indeed, setting B := An \ An_1 (with B1 = A1) the B2 are pairwise disjoint and have the same union as the Ai so 00nn m (UA m(B) = lim m_(B) = lim m UBi = lim m(An). Tho -0o nT-0o i=1 i 1 i 1 QED Proposition 4.5.3 If Cn D Cn+1 is a decreasing family of sets in M and m(Ci) < oc then m Cn =lim m(Cn).  108 CHAPTER 4. MEASURE THEORY. Indeed, set A1:= 0, A2 := C1 \ C2, A3 := C1 \ C3 etc. The A's are increasing so m (Uci \ CZ)) =lim m(C1 \ Cn) = m(C1) - lim m(Cn) by the preceding proposition. Since m(Ci) 0 we conclude that m* is countably subadditive. So we have verified that m* defined by (4.14) is an outer measure. We must check that it satisfies the two conditions in the theorem. If A E C then the single element collection {A} e ccc(A), so m*(A) £(A), so the first condition is obvious. As to the second condition, suppose n* is an outer measure with n*(D) £(D) for all D E C. Then for any set A and any countable cover D of A by elements of C we have 5:£(D) 5 :n*(D) n* rDn*(A), DED DED DED where in the second inequality we used the countable subadditivity of n* and in the last inequality we used the monotonicity of n*. Minimizing over all D E ccc(A) shows that m*(A) n*(A). QED This argument is basically a repeat performance of the construction of Lebesgue measure we did above. However there is some trouble: 4.7.1 A pathological example. Suppose we take X = R, and let C consist of all half open intervals of the form [a, b). However, instead of taking £ to be the length of the interval, we take it to be the square root of the length: f([, )):=(b -a)i.  4.7. CONSTRUCTING OUTER MEASURES, METHOD I. 111 I claim that any half open interval (say [0, 1)) of length one has m*([a, b)) = 1. (Since £ is translation invariant, it does not matter which interval we choose.) Indeed, m*([0, 1)) < 1 by the first condition in the theorem, since £([0, 1)) = 1. On the other hand, if [0,1) c U[ai, b2) then we know from the Heine-Borel argument that S (b2 - a2) > 1, so squaring gives ( (bi-ai)2 S(bi-ai)+S(b-ai) (bi-aji >1. So m*([0, 1)) = 1. On the other hand, consider an interval [a, b) of length 2. Since it covers itself, m*([a, b)) V2 m*([-1, 1)). In other words, the closed unit interval is not measurable relative to the outer measure m* determined by the theorem. We would like Borel sets to be measur- able, and the above computation shows that the measure produced by Method I as above does not have this desirable property. In fact, if we consider two half open intervals I1 and '2 of length one separated by a small distance of size E, say, then their union I1 U '2 is covered by an interval of length 2 + E, and hence m*(I1 U I2) 2 + E < m*(I1) + m*(I2). In other words, m* is not additive even on intervals separated by a finite dis- tance. It turns out that this is the crucial property that is missing: 4.7.2 Metric outer measures. Let X be a metric space. An outer measure on X is called a metric outer measure if m*(A U B) = m*(A) + m*(B) whenver d(A, B) > 0. (4.15) The condition d(A, B) > 0 means that there is an c > 0 (depending on A and B) so that d(z.y) > c for all i E A, y E B. The main result here is due to Caratheodory:  112 CHAPTER 4. MEASURE THEORY. Theorem 4.7.2 If m* is a metric outer measure on a metric space X, then all Borel sets of X are m* measurable. Proof. Since the o-field of Borel sets is generated by the closed sets, it is enough to prove that every closed set F is measurable in the sense of Caratheodory, i.e. that for any set A m*(A) > m*(A n F) + m*(A \ F). Let A:= e A d(x,F)> We have d(Aj, A n F) > 1/j so, since m* is a metric outer measure, we have m*(An F) + m*(Aj) = m*((An F) U Aj) m*(A) (4.16) since (An F) U AJ c A. Now A\F UAW since F is closed, and hence every point of A not belonging to F must be at a positive distance from F. We would like to be able to pass to the limit in (4.16). If the limit on the left is infinite, there is nothing to prove. So we may assume it is finite. Now if x E A \ (F U Aj+1) there is a z e F with d(x, z) < 1/(j + 1) while if yE A3 we have d(y,z) > 1/j so 1 1 d(x, y)>d(y,z)- d(x,z) >1-.- >0. Let B1 := A1 and B2 := A2 \A1, B3 = A3 \A2 etc. Thus if i > j + 2, then By c A3 and B, c A\(F U Ai_1) c A\(F U Aj+1) and so d(B2, Bj) > 0. So m* is additive on finite unions of even or odd B's: m* ( B2k-1 = m*(B2k-1), m* ( B2k) m*(B2k). k=1 k=1 (k=1 k=1 Both of these are m* (A2n) since the union of the sets involved are contained in A2n. Since m* (A2n) is increasing, and assumed bounded, both of the above  4.8. CONSTRUCTING OUTER MEASURES, METHOD II. 113 series converge. Thus m* (A/F) = m* (U A) = m* A U U B) k>j+1 00 K m* (Aj) + m* (Bi ) k=j+1 00 < lim m*(A,)+ m*(Bj). k=j+1 But the sum on the right can be made as small as possible by choosing j large, since the series converges. Hence m*(A/F) lim m*(An) QED. 4.8 Constructing outer measures, Method II. Let C c E be two covers, and suppose that £ is defined on E, and hence, by restriction, on C. In the definition (4.14) of the outer measure mI c associated to £ and C, we are minimizing over a smaller collection of covers than in computing the metric outer measure m using all the sets of E. Hence me,c(A) m*,(A) for any set A. We want to apply this remark to the case where X is a metric space, and we have a cover C with the property that for every x E X and every c > 0 there is a C E C with x E C and diam(C) 0. Then for every set A the m(e,c (A) are increasing, so we can consider the function on sets given by ml, (A) sup mEc(A). E-o The axioms for an outer measure are preserved by this limit operation, so mi is an outer measure. If A and B are such that d(A, B) > 2E, then any set of CE which intersects A does not intersect B and vice versa, so throwing away extraneous sets in a cover of A U B which does not intersect either, we see that m1 (A U B) = mi T(A) + mi T(B). The method II construction always yields a metric outer measure.  114 CHAPTER 4. MEASURE THEORY. 4.8.1 An example. Let X be the set of all (one sided) infinite sequences of 0's and l's. So a point of X is an expression of the form a1a2a3 ... where each a2 is 0 or 1. For any finite sequence a of 0's or l's, let [a] denote the set of all sequences which begin with a. We also let a denote the length of a, that is, the number of bits in a. For each 0 0 and = 0 if and only if x = y, and dr(y, x) = dr(x, y). Also, for three x, y, and z we claim that dr (x,z) max{dr (x, y), dr (y, z)}. Indeed, if two of the three points are equal this is obvious. Otherwise, let j denote the length of the longest common prefix of x and y, and let k denote the length of the longest common prefix of y and z. Let m = min(j, k). Then the first m bits of x agree with the first m bits of z and so dr(x, z) < rm= max(ri,rk). QED A metric with this property (which is much stronger than the triangle in- equality) is called an ultrametric. Notice that diam [a] = r". (4.17) The metrics for different r are different, and we will make use of this fact shortly. But Proposition 4.8.1 The spaces (X, dr) are all homeomorphic under the identity map. It is enough to show that the identity map is a continuous map from (X, dr) to (X, d8) since it is one to one and we can interchange the role of r and s. So, given c > 0, we must find a 6 > 0 such that if dr(x, y) < 6 then ds(x, y) m*7([az]) where m* denotes the method I outer measure associated with £. What is special about the value 1 is that if k a then 1 k 1 k+1 1 k+1 222 (([Z ) ( 2 = )+ 2((]) ([) So if we also use the metric di, we see, by repeating the above, that every [a] can be written as the disjoint union C1 U-..- U Cn of sets in CE with £([a])) Z f(Ci). Thus m ,c,([a]) £(a) and so m ,c ([a])(A) m*7(A) or m*= m*. It also follows from the above computation that m*([a])=f([a]) There is also something special about the value s = }: Recall that one of the definitions of the Cantor set C is that it consists of all points x E [0, 1] which have a base 3 expansion involving only the symbols 0 and 2. Let h:X-C where h sends the bit 1 into the symbol 2, e.g. h(011001...) =.022002 .... In other words, for any sequence z h(Oz) =h(z) h(lz)= h(z) + 2 (4.18) 3' 3 I claim that: 1 -di (x, y) < h(x) - h(y)| < d (x, y) (4.19) Proof. If x and y start with different bits, say x O=x' and y =ly' then d3 (x, y) = 1 while h(x) lies in the interval [0, 3] and h(y) lies in the interval [3, 1] on the real line. So h(x) and h(y) are at least a distance 1 and at most a distance 1 apart, which is what (4.19) says. So we proceed by induction. Suppose we know that (4.19) is true when x= ax' and y =ay' with 7', y' starting with different digits, and |al L*(A) for all sets A and all c, and so 7* > L*. On the other hand, any bounded half open interval [a, b) can be broken up into a finite union of half open intervals of length < c, whose sum of diameters is b - a. So mi,6([a, b) K b - a. But the method I construction theorem says that L* is the largest outer measure satisfying m*([a, b)) K 0 7sX (F) = oo. Indeed, if diam A < c, then m*,6(A) (diam A)t < ct-s(diam A)s so by the method I construction theorem we have m*, (B) ct-sms,6(B) for all B. If we take B = F in this equality, then the assumption H.(F) < oc implies that the limit of the right hand side tends to 0 as c - 0, so Hti(F) = 0. The second assertion in the theorem is the contrapositive of the first. 4.10 Hausdorff dimension. This last theorem implies that for any Borel set F, there is a unique value so (which might be 0 or oc) such that Ht(F) = oc for all t < so and H.(F) = 0 for all for all s > so. This value is called the Hausdorff dimension of F. It is one of many competing (and non-equivalent) definitions of dimension. Notice that it is a metric invariant, and in fact is the same for two spaces different by a Lipschitz homeomorphism with Lipschitz inverse. But it is not a topological invariant. In fact, we shall show that the space X of all sequences of zeros and one studied above has Hausdorff dimension 1 relative to the metric d1 while it has Hausdorff dimension log 2/log 3 if we use the metric d1. Since we have shown that (X, d±) is Lipschitz equivalent to the Cantor set C, this will also prove that C has Hausdorff dimension log 2/log 3. We first discuss the d_ case and use the following lemma Lemma 4.10.1 If diam(A) > 0, then there is an a such that A C [a] and diam([a]) =diam A.  118 CHAPTER 4. MEASURE THEORY. Proof. Given any set A, it has a "longest common prefix". Indeed, consider the set of lengths of common prefixes of elements of A. This is finite set of non-negative integers since A has at least two distinct elements. Let n be the largest of these, and let a be a common prefix of this length. Then it is clearly the longest common prefix of A. Hence A c [a] and diam([a]) = diam A.QED Let C denote the collection of all sets of the form [a] and let £ be the function on C given by f([a]) = ( )|| and let t* be the associated method I outer measure, and m the associated measure; all these as we introduced above. We have £*(A) <*([a]) = diam([a]) = diam(A). By the method I construction theorem, mlE is the largest outer measure with the property that n*(A) diam A for sets of diameter < c. Hence £* K 0, we conclude that £* <7-(*. On the other hand, for any a and any c > 0, there is an n such that 2-' < c and n ;> a. The set [a] is the disjoint union of all sets [3] c [a] with |3 ;> n, and there are 2k-I of these subsets, each having diameter 2-'. So m*,,([az]) < 2-|"|. However £* is the largest outer measure satisfying this inequality for all [a]. Hence m1, < t* for all c so Xi K <*. In other words 7-1 =M. But since we computed that m(X) = 1, we conclude that The Hausdorff dimension of (X, d 1) is 1. Now let us turn to (X, d±). Then the diameter diam relative to the metric d_ and the diameter diam relative to the metric d are given by 2 3 3 diam ([a]) V(), diam ([a]) () , k= |al. If we choose s so that 2-k =_(3-k)s then diam2([a]) =_(diam1([a]))8. This says that relative to the metric di, the previous computation yields xs(X) = 1.  4.11. PUSH FORWARD. 119 Hence s = log 2/log 3 is the Hausdorff dimension of X. The material above (with some slight changes in notation) was taken from the book Measure, Topology, and Fractal Geometry by Gerald Edgar, where a thorough and delightfully clear discussion can be found of the subjects listed in the title. 4.11 Push forward. The above discussion is a sampling of introductory material to what is known as "geometric measure theory". However the construction of measures that we will be mainly working with will be an abstraction of the "simulation" approach that we have been developing in the problem sets. The setup is as follows: Let (X, F, m) be a set with a c--field and a measure on it, and let (Y, C) be some other set with a o--field on it. A map f:X - Y is called measurable if f-1(B) E F V B E g. We may then define a measure f,(m) on (Y, C) by (f,)m(B) = m(f --1(B)). For example, if YA is the Poisson random variable from the exercises, and u is the uniform measure (the restriction of Lebesgue measure to) on [0, 1], then f, (u) is the measure on the non-negative integers given by f, (u) ({k}) = e- ! It will be this construction of measures and variants on it which will occupy us over the next few weeks. 4.12 The Hausdorff dimension of fractals 4.12.1 Similarity dimension. Contracting ratio lists. A finite collection of real numbers (rin. . .atr ) is called a contracting ratio list if O 1, define the function f on [0, oc) by f(t) Z= r. i=1 We have f(0)= n and lim f(t)=0<1. t-oo Since f is continuous, there is some postive solution to (4.20). To show that this solution is unique, it is enough to show that f is monotone decreasing. This follows from the fact that its derivative is r2log ri < 0. i=1 QED Definition 4.12.1 The number s in (4.20) is called the similarity dimension of the ratio list (ri,...,rn). Iterated function systems and fractals. A map f : X - Y between two metric spaces is called a similarity with similarity ratio r if dy(f (xi), f (x2)) = rdx(xi,x2) Vx1, x2 E X. (Recall that a map is called Lipschitz with Lipschitz constant r if we only had an inequality, <, instead of an equality in the above.) Let X be a complete metric space, and let (ri,. . . , rn) be a contracting ratio list. A collection (f1,. . ., fn), fi : X ' X is called an iterated function system which realizes the contracting ratio list if fi:X X, i = 1,..., n is a similarity with ratio r2. We also say that (fi,....., fn) is a realization of the ratio list (ri, ...,r ). It is a consequence of Hutchinson's theorem, see below, that  4.12. THE HAUSDORFF DIMENSION OF FRACTALS 121 Proposition 4.12.2 If (fi,... , fn) is a realization of the contracting ratio list (r,..... , rn) on a complete metric space, X, then there exists a unique non-empty compact subset K c X such that K = fi1(K) U -.-. U fn (K). In fact, Hutchinson's theorem asserts the corresponding result where the f, are merely assumed to be Lipschitz maps with Lipschitz constants (ri,....., rn). The set K is sometimes called the fractal associated with the realization (fi,... , fn) of the contracting ratio list (ri,... , rn). The facts we want to establish are: First, dim(K) <;s (4.21) where dim denotes Hausdorff dimension, and s is the similarity dimension of (ri,. . . , rn). In general, we can only assert an inequality here, for the the set K does not fix (ri,..., rn) or its realization. For example, we can repeat some of the r2 and the corresponding f2. This will give us a longer list, and hence a larger s, but will not change K. But we can demand a rather strong form of non-redundancy known as Moran's condition: There exists an open set 0 such that O D f2(O) V i and f2(O) n fj(O)=0 Vi j. (4.22) Then Theorem 4.12.1 If (fi,... , fn) is a realization of (ri,... , rn) on Rd and if Moran's condition holds then dim K = s. The method of proof of (4.21) will be to construct a "model" complete metric space E with a realization (gi,....., gn) of (ri,... , rn) on it, which is "universal" in the sense that " E is itself the fractal associated to (gi,. . ., gn) " The Hausdorff dimension of E is s. " If (fi,....., fn) is a realization of (ri,..... , rn) on a complete metric space X then there exists a unique continuous map h:E-X such that h o g2 = f2 o h. (4.23) " The image h(E) of h is K. " The map h is Lipschitz. This is clearly enough to prove (4.21). A little more work will then prove Moran's theorem.  122 CHAPTER 4. MEASURE THEORY. 4.12.2 The string model. Construction of the model. Let (ri,... , rn) be a contracting ratio list, and let A denote the alphabet con- sisting of the letters {1,... ,n}. Let E denote the space of one sided infinite strings of letters from the alphabet A. If a denotes a finite string (word) of letters from A, we let w, denote the product over all i occurring in a of the r2. Thus we 1 where 0 is the empty string, and, inductively, wae =wc -we, eeA. If x / y are two elements of E, they will have a longest common initial string a, and we then define d(x, y) := we. This makes E into a complete ultrametic space. Define the maps gj : E - E by gi(x) = ix. That is, gj shifts the infinite string one unit to the right and inserts the letter i in the initial position. In terms of our metric, clearly (gi,. . . , gn) is a realization of (r1,... , rn) and the space E itself is the corresponding fractal set. We let [a] denote the set of all strings beginning with a, i.e. whose first word (of length equal to the length of a) is a. The diameter of this set is wa. The Hausdorff dimension of E is s. We begin with a lemma: Lemma 4.12.1 Let A c E have positive diameter. Then there exists a word a such that A c [a] and diam(A) = diam[a] = we. Proof. Since A has at least two elements, there will be a ry which is a prefix of one and not the other. So there will be an integer n (possibly zero) which is the length of the longest common prefix of all elements of A. Then every element of A will begin with this common prefix a which thus satisfies the conditions of the lemma. QED The lemma implies that in computing the Hausdorff measure or dimension, we need only consider covers by sets of the form [a]. Now if we choose s to be the solution of (4.20), then (iama]),\=R diam-ai]",=.(iam1a])R 1  4.12. THE HAUSDORFF DIMENSION OF FRACTALS 123 This means that the method II outer measure assosicated to the function A H (diam A)8 coincides with the method I outer measure and assigns to each set [a] the measure w,. In particular the measure of E is one, and so the Hausdorff dimension of E is s. The universality of E. Let (fi,... , fn) a realization of (ri,... , rn) on a complete metric space X. Choose a point a E X and define ho : E - X by ho(z) a. Inductively define the maps hp by defining hp+1 on each of the open sets [{i}] by hp+1(iz) := f2 (hp (z)). The sequence of maps {hp} is Cauchy in the uniform norm. Indeed, if y E [{i}] so y = gj(z) for some z e E then dx (hi(y), h,(y)) = dx (fi(h,(z)),fi(h_1(z))) = ridx (h,(z), hp_1(z))). So if we let c :=max2(ri) so that 0 < c < 1, we have sup dx (hi(y), h,(y)) < csup dx (h,(x), hp_1(x)) yEE xEE for p > 1 and hence sup dx (h,+1(y),h,(y)) s, and hence = s if we can prove that there exists a constant b such that for every Borel set B c K m(h-1(B)) b - diam(B)s, (4.29) where h : E - K is the map we constructed above from the string model to K. Let us introduce the following notation: For any (finite) non-empty string a, let a- denote the string (of cardinality one less) obtained by removing the last letter in a. Lemma 4.14.1 There exists an integer N such that for any subset B C K the set QB of all finite strings a such that O n B 0 and diam O < diam B < diam O- has at most N elements. Proof. Let D := diamO. The map f, is a similarity with similarity ratio diam[a] so diam O,= D - diam[a]. Let r : min r2. Then if a E QB we have diamO,= D - diam[a] > D - r diam[o-] = r diamO_- > r diam B. Let V denote the volume of 0 relative to the d-dimensional Hausdorff measure of Rd, i.e., up to a constant factor the Lebesgue measure. Let V denote the volume of 0, so that V = w V = V -(diam Oc,)/ diam O)d. From the preceding displayed equation it follows that Vrd V > Dd (diam B)d. If x e B, then every y E Oc, is within a distance diam B + diam Oc, 2 diam B of x. So if m denotes the number of elements in QB, we have m disjoint sets with volume at least D (diam B)d all within a ball of radius 2 - diam B. We have normalized our volume so that the unit ball has volume one, and hence the ball of radius 2 - diam B has volume 2d(diam B)d. Hence Vrd m-Dd (diam B)d 2d (diam B)d  4.14. AFFINE EXAMPLES 131 or 2dDd So any integer greater that the right hand side of this inequality (which is independent of B) will do. Q Now we turn to the proof of (4.29) which will then complete the proof of Moran's theorem. Let B be a Borel subset of K. Then B c U Oa aiYQB so h-1(B) c U [a]. cYCQB Now 1 s 1 ([a]) = (diam[a])8 D diam(O,) < D (diam B)8 and so m(h-1(B)) m(a) < N D (diamB)s aiYQB and hence we may take 1 b=N. DS(diam B)8 and then (4.29) will hold.  132 CHAPTER 4. MEASURE THEORY.  Chapter 5 The Lebesgue integral. In what follows, (X, F, m) is a space with a c--field of sets, and m a measure on . The purpose of this chapter is to develop the theory of the Lebesgue integral for functions defined on X. The theory starts with simple functions, that is functions which take on only finitely many non-zero values, say {a1,. . ., an} and where A2 : f -1(ai) E 7 In other words, we start with functions of the form #(x) = a 1A, AZ E F. (5.1) i=1 Then, for any E EF we would like to define the integral of a simple function over E as Sam(An E) (5.2) i= 1 and extend this definition by some sort of limiting process to a broader class of functions. I haven't yet specified what the range of the functions should be. Certainly, even to get started, we have to allow our functions to take values in a vector space over R, in order that the expression on the right of (5.2) make sense. In fact, I will eventually allow f to take values in a Banach space. However the theory is a bit simpler for real valued functions, where the linear order of the reals makes some arguments easier. Of course it would then be no problem to pass to any finite dimensional space over the reals. But we will on occasion need integrals in infinite dimensional Banach spaces, and that will require a little reworking of the theory. 133  134 CHAPTER 5. THE LEBESGUE INTEGRAL. 5.1 Real valued measurable functions. Recall that if (X, F) and (Y, C) are spaces with o-fields, then f :X -Y is called measurable if f-1(E) eF V E e g. (5.3) Notice that the collection of subsets of Y for which (5.3) holds is a u-field, and hence if it holds for some collection C, it holds for the u-field generated by C. For the next few sections we will take Y = R and g = B, the Borel field. Since the collection of open intervals on the line generate the Borel field, a real valued function f : X - R is measurable if and only if f -1(I) E F for all open intervals I. Equally well, it is enough to check this for intervals of the form (-oc, a) for all real numbers a. Proposition 5.1.1 If F : R2 - R is a continuous function and f, g are two measurable real valued functions on X, then F(f, g) is measurable. Proof. The set F-1(-o0, a) is an open subset of the plane, and hence can be written as the countable union of products of open intervals I x J. So if we set h = F(f, g) then h-1((-oo, a)) is the countable union of the sets f--1(I)ng-1 (J) and hence belongs to F. QED From this elementary proposition we conclude that if f and g are measurable real valued functions then " f + g is measurable (since (x, y) H x + y is continuous), " fg is measurable (since (x, y) H xy is continuous), hence " flA is measurable for any A E F hence " f+ is measurable since f-1([0, oc]) E F and similarly for f- so " f is measurable and so is f - gl. Hence " f A g and f V g are measurable and so on. 5.2 The integral of a non-negative function. We are going to allow for the possibility that a function value or an integral might be infinite. We adopt the convention that 0oc =0.  5.2. THE INTEGRAL OF A NON-NEGATIVE FUNCTION. 135 Recall that # is simple if # takes on a finite number of distinct non-negative (finite) values, a1, ... , an, and that each of the sets A2=#-1(a2) is measurable. These sets partition X: X =A1U --UAn. Of course since the values are distinct, AZn A =0 for i 4j. With this definition, a simple function can be written as in (5.1) and this ex- pression is unique. So we may take (5.2) as the definition of the integral of a simple function. We now extend the definition to an arbitrary ([0, oc] valued) function f by fdm :=sup I(E, f) (5.4) JE where I(Elf) ={ #dm :0< <5 # f, # simple}. (5.5) JE In other words, we take all integrals of expressions of simple functions # such that 5(x) f(x) at all x. We then define the integral of f as the supremum of these values. Notice that if A f-1(oo) has positive measure, then the simple functions n lA are all < f and so fX fdm = oc. Proposition 5.2.1 For simple functions, the definition (5.4) coincides with definition (5.2). Proof. Since # is < itself, the right hand side of (5.2) belongs to I(E, #) and hence is fE 5dm as given by (5.5). We must show the reverse inequality: Suppose that = b1B2, <5#. We can write the right hand side of (5.2) as S bym(En B3) = bjm(EnA nB) since E n BJ is the disjoint union of the sets En AZ n By because the AZ partition X, and m is additive on disjoint finite (even countable) unions. On each of the sets AZ n B we must have b3 < a2. Hence 5bjm(EnA2 nB) 5 aim(EnA2 nB)= aim(EnA) since the B partition X. QED In the course of the proof of the above proposition we have also established 5 # ~z for simple functions implies' dm J#~dm. (5.6)  136 CHAPTER 5. THE LEBESGUE INTEGRAL. Suppose that E and F are disjoint measurable sets. Then m(A2 n (E U F)) =m(A2 n E) + m(A2 n F) so each term on the right of (5.2) breaks up into a sum of two terms and we conclude that If # is simple and E n F= 0, then #dm J #dm + #dm. (5.7) J U F JE JF Also, it is immediate from (5.2) that if a > 0 then If # is simple thenJ adm = a #dm. (5.8) JEJE It is now immediate that these results extend to all non-negative measurable functions. We list the results and then prove them. In what follows f and g are non-negative measurable functions, a > 0 is a real number and E and F are measurable sets: f_ 0. This means that all sets which enter into the right hand side of (5.2) with a2 / 0 have measure zero, so the right hand side vanishes. So I(X, f) consists of the single element 0. This proves - in (5.15). We wish to prove the reverse implication. Let A = {xlf(x) > 0}. We wish to show that m(A) = 0. Now A U= A where An := {xf(x)>- n The sets An are increasing, so we know that m(A) = limn, m(An). So it is enough to prove that m(An) = 0 for all n. But 1 1A,, < f n and is a simple function. So 1 1 - lnd =-m(An < fdm= 0 n x 1Adm xm(An) f implying that m(An) = 0. (5.16): Let E {xlf(x) K g(x)}. Then E is measurable and EC is of measure zero. By definition, lEf i1Eg everywhere, hence by (5.11) 1E fdm Egdm. Jx x But I E fdm+ 1ElfdIdm+ Jfdm J X J XJ E J EC J X where we have used (5.14) and (5.13). Similarly for g. QED  138 CHAPTER 5. THE LEBESGUE INTEGRAL. 5.3 Fatou's lemma. This says: Theorem 5.3.1 If {fn} is a sequence of non-negative functions, then lim inf ffk dm> > lim inf fk) dm. (5.17) n-oo k>n nok>n Recall that the limit inferior of a sequence of numbers {an} is defined as follows: Set bn := inf ak k>n so that the sequence {bn} is non-decreasing, and hence has a limit (possibly infinite) which is defined as the lim inf. For a sequence of functions, lim inf fn is obtained by taking lim inf fn (x) for every x. Consider the sequence of simple functions {1[nm+1]}. At each point x the lim inf is 0, in fact 1[n,n+1](x) becomes and stays 0 as soon as n > x. Thus the right hand side of (5.17) is zero. The numbers which enter into the left hand side are all 1, so the left hand side is 1. Similarly, if we take fn = n1(o,1/n], the left hand side is 1 and the right hand side is 0. So without further assumptions, we generally expect to get strict inequality in Fatou's lemma. Proof: Set 9n :=inf fk k>n so that gn 9n+1 and set f := lim inf fn = lim g . n-oo k>n n-oo Let #<_ f be a simple function. We must show that f#Odm < lim inf fk dm. (5.18) n-oo k>J There are two cases to consider: a) m ({x: 5(x) > 0}) = oc. In this case f #dm = oc and hence f fdm = oc since #5 K f. We must show that liminf f fndm = oc. Let D :={x : #(x) >0} so m(D) =oo. Choose some positive number b < all the positive values taken by #5. This is possible since there are only finitely many such values.  5.3. FATO U'S LEMMA. 139 Let D :_{x g(x) > b}. The Dn D since b < q5(x) lim, gn (x) at each point of D. Hence m(Dn) -> m(D)= oc. But bm(Dn)< J Ydm J fkdm k>ri since gn fk for k > n~. Now J fr ~ fkdm since fk is non-negative. Hence urn inf f fndm=oc b) m ({x : 5(x) > O}) < oc. Choose c > 0 so that it is less than the minimum of the positive values taken on by 0z and set JO(x)- if z(x)>0 () 0 if (x) =0. Let C:_{x g(x) > bE} and C ={x :f (x) > 0 } . Then Cn C. We have Idm < Jgmdm JJm k > n So fkdm k~rn. We will next let n -> oc: Let c2 be the non-zero values of cb6 so ZEECi 1 B  140 CHAPTER 5. THE LEBESGUE INTEGRAL. since (BinC) /B nC=B2. So #ydm < liminf fkdm. Now J f dm =s#dm -em({xz(x) > 0}) Since we are assuming that m ({xz5(x) > 0}) < oc, we can let c - 0 and conclude that f #dm < lim inf f fkdm. QED 5.4 The monotone convergence theorem. We assume that {fn} is a sequence of non-negative measurable functions, and that fn(x) is an increasing sequence for each x. Define f(x) to be the limit (possibly +oo) of this sequence. We describe this situation by fn f. The monotone convergence theorem asserts that: fn> 0, fn / f - lim fndm = f dm. (5.19) The fn are increasing and all < f so the f fndm are monotone increasing and all < f fdm. So the limit exists and is < f fdm. On the other hand, Fatou's lemma gives f dm < lim inf fnidm = lim fndm. QED In the monotone convergence theorem we need only know that fn / f a.e. Indeed, let C be the set where convergence holds, so m(C') = 0. Let g= =1cfn and g = lcf. Then gn / g everywhere, so we may apply (5.19) to gn and g. But f gndm = f fndm and f gdm = f fdm so the theorem holds for fn and f as well. 5.5 The space £1(X, R). We will say an R valued measurable function is integrable if both f f+dm < oc and f f-dm < oc. If this happens, we set fdm :=J f+dm - Jf-dm. (5.20) Since both numbers on the right are finite, this difference makes sense. Some authors prefer to allow one or the other numbers (but not both) to be infinite,  5.5. THE SPACE L1(X, R). 141 in which case the right hand side of (5.20) might be = oc or -oc. We will stick with the above convention. We will denote the set of all (real valued) integrable functions by ,1 or L1 (X) or 21 (X, R) depending on how precise we want to be. Notice that if f g then f+ g+ and f g- all of these functions being non-negative. So fr+dm m fn(x) differs from f(x) by at most 2-n. Hence fn / f a.e., since f is finite a.e because its integral is finite. Similarly we can construct gn / g. Also (f+gn) /f+ga.e. By the a.e. monotone convergence theorem (f+g)dm = limJf(fn+gn)dm = limf fndm+limfgndm = f dm+gdm, where we have used (5.23) for simple functions. This argument shows that f(f + g)dm < oo if both integrals f fdm and f gdm are finite. " For any f E i we conclude from the preceding that |ffdm= (f++ f)dm 0 for all A E F then h > 0 a.e. Proof: Let An : {x h(x) < -n}. Then j hdm j dm m(A)  5.6. THE DOMINATED CONVERGENCE THEOREM. 143 so m(An) = 0. But if we let A := {x h(x) < 0} then An / A and hence m(A) O.QED We have defined the integral of any function f as f fdm = f f+dm - f f--dm, and f f dm = f f+dm + f f--dm. Since for any two non-negative real numbers a - b < a + b we conclude that f dm Jfdm. (5.24) If we define f|1 := Ifdm we have verified that f +g||1 |f|1+|g|, and have also verified that cf|1 = c f 1. In other words, 1 is a semi-norm on L1. From the preceding proposition we know that f|1 = 0 if and only if f = 0 a.e. The question of whether we want to pass to the quotient and identify two functions which differ on a set of measure zero is a matter of taste. 5.6 The dominated convergence theorem. This says that Theorem 5.6.1 Let fn be a sequence of measurable functions such that fan l f E I1and fndm Jfdm. Proof. The functions f, are all integrable, since their positive and negative parts are dominated by g. Assume for the moment that f ;> 0. Then Fatou's lemma says that fdm < liminf fndm. Fatou's lemma applied to g - f, says that J (g - f) dm < lim inf J(g - f,)dm = lim inf gdm -J fmdm Jgdm - lim sup frdm.  144 CHAPTER 5. THE LEBESGUE INTEGRAL. Subtracting f gdm gives lim sup frndm < f dm. So lim sup ffndm Jfdm < lim inf ffndm which can only happen if all three are equal. We have proved the result for non-negative fn. For general fn we can write our hypothesis as -gfn < g a.e.. Adding g to both sides gives O < fn + g < 2g a.e.. We now apply the result for non-negative sequences to g + fn and then subtract off f gdm. 5.7 Riemann integrability. Suppose that X = [a, b] is an interval. What is the relation between the Lebesgue integral and the Riemann integral? Let us suppose that [a, b] is bounded and that f is a bounded function, say f 0, the sums under the integral sign are increasing, and by definition converge to gk. The monotone convergence theorem implies the lemma. QED But both sides of the equation in the lemma might be infinite. Theorem 5.8.1 Beppo-Levi. Let fn E Ci and suppose that Z|ftdm < oo. k=1  146 CHAPTER 5. THE LEBESGUE INTEGRAL. Then Z fk(x) converges to a finite limit for almost all x, the sum is integrable, and fk dm -- fkdm. k=1 k=1 Proof. Take g, :=Ifl in the lemma. If we set g = E Z1 gf = En=1|0fnl then the lemma says that gdm Z | fn dm, n=~1 and we are assuming that this sum is finite. So g is integrable, in particular the set of x for which g(x) = oc must have measure zero. In other words, S Ifn(x)| n1. 2 Then choose n2 > ni so that 1 ||n h,|-<22 V >n2  5.10. DENSE SUBSETS OF E1(R, R). 147 Continuing this way, we have produced a subsequence ha- such that 1 |hnj - hn|| < 23' Let fi := hnim - hn2. Then |ffjdm < 1 so the hypotheses of the Beppo-Levy theorem are satisfied, and E fj converges almost everywhere to some limit f E fi,. But k hn1 +Zfj = hnkl. j=1 So the subsequence hk converges almost We must show that this h is the limit we will use Fatou's lemma. For a given c > 0, choose N so that h = lim hn2 we have, for k > N, everywhere to some h e E1. of the hn in the ||- | norm. For this h-hm < cfor k, n > N. Since ||h - hk||1 = h hk dm = lim|hnj - hk dm < lim inff hn, jJ-ooJ hk dm = liminf |hn -hk < e. QED 5.10 Dense subsets of £1(R, R). Up until now we have been studying integration on an arbitrary measure space (X, F, m). In this section and the next, we will take X = R, F to be the c--field of Lebesgue measurable sets, and m to be Lebesgue measure, in order to simplify some of the formulations and arguments. Suppose that f is a Lebesgue integrable non-negative function on R. We know that for any c > 0 there is a simple function # such that #0<; f and J fdm - #dm= (f - )dm= f- 1 < E. To say that # is simple implies that ZailA2  148 CHAPTER 5. THE LEBESGUE INTEGRAL. (finite sum) where each of the a2 > 0 and since f 5dm < oc each AZ has finite measure. Since m(Ain[-n, n]) -- m(A2) as n -- oc, we may choose n sufficiently large so that f - @||1 < 2E where )Zai1A3,[_,,]. For each of the sets AZ n [-n, n] we can find a bounded open set UZ which contains it, and such that m(U /AZ) is as small as we please. So we can find finitely many bounded open sets UZ such that f - 1:a2lyu|1 < 3E. Each UZ is a countable union of disjoint open intervals, U = U3'i,, and since m(U) => m(Ij), we can find finitely many Ifj, j ranging over a finite set of integers, J such that m (uJ) is as close as we like to m(U). So let us call a step function a function of the form E b211 where the I are bounded intervals. We have shown that we can find a step function with positive coefficients which is as close as we like in the ||1 norm to f. If f is not necessarily non-negative, we know (by definition!) that f+ and f- are in ,1, and so we can approximate each by a step function. the triangle inequality then gives Proposition 5.10.1 The step functions are dense in L1(R, R). If [a, b], a < b is a finite interval, we can approximate 1[a,b] as closely as we like in the ||- | norm by continuous functions: just choose n large enough so that < b - a, and take the function which is0 for x < a, rises linearly from 0 to 1 on [a, a+ ], is identically 1on [a+ , b - ], and goes down linearly from 1 to 0 from b - n to b and stays 0 thereafter. As n --> oc this clearly tends to 1[a,b] in the |-1 norm. So Proposition 5.10.2 The continuous functions of compact support are dense in E1(R, R). As a consequence of this proposition, we see that we could have avoided all of measure theory if our sole purpose was to define the space /21(R, R). We could have defined it to be the completion of the space of continuous functions of compact support relative to the |-1 norm. 5.11 The Riemann-Lebesgue Lemma. We will state and prove this in the "generalized form". Let h be a bounded measurable function on R. We say that h satisfies the averaging condition if lm j  5.11. THE RIEMANN-LEBESGUE LEMMA. 149 For example, if h(t) = cos ft, ( f 0, then the expression under the limit sign in the averaging condition is 1. sin t c which tends to zero as c - oc. Here the oscillations in h are what give rise to the averaging condition. As another example, let h (t) = 1 |t|< t ( 1/|t| It|> 1. Then the left hand side of (5.25) is 1 (1+log c), c >1. Here the averaging condition is satisfied because the integral in (5.25) grows more slowly that cl. Theorem 5.11.1 [Generalized Riemann-Lebesgue Lemma]. Let f E 1([c,d], R), -oc < c < d < oc. If h satisfies the averaging condition (5.25) then limj f(t)h(rt)dt = 0. (5.26) Proof. Our proof will use the density of step functions, Proposition 5.10.1. We first prove the theorem when f = 1[a,b] is the indicator function of a finite interval. Suppose for example that 0 < a < b.Then the integral on the right hand side of (5.26) is 1[a.b]h(rt)dt j= h(rt)dt, or setting x = rt 1 br 1 ra - h(x)dx-!jh(x)dx 0 '0i and each of these terms tends to 0 by hypothesis. The same argument will work for any bounded interval [a, b] we will get a sum or difference of terms as above. So we have proved (5.26) for indicator functions of intervals and hence for step functions. Now let M be such that h < M everywhere (or almost everywhere) and choose a step function s so that f- s1 2M Then fh=(f -s)h+sh f(t)h(rt)dt J(f(t) - s(t)h(rt)dt + s(t)h(rt)dt K (f (t) - s(t))h(rt)dt+J s(t)h(rt)dt FMM JI hrtd  150 CHAPTER 5. THE LEBESGUE INTEGRAL. We can make the second term < 2 by choosing r large enough. QED 5.11.1 The Cantor-Lebesgue theorem. This says: Theorem 5.11.2 If a trigonometric series +aO docos(nt - #n) do E R converges on a set E of positive Lebesgue measure then do -0. (I have written the general form of a real trigonometric series as a cosine series with phases since we are talking about only real valued functions at the present. Of course, applied to the real and imaginary parts, the theorem asserts that if E ane2nx converges on a set of positive measure, then the an - 0. Also, the notation suggests - and this is my intention - that the n's are integers. But in the proof below all that we will need is that the n's are any sequence of real numbers tending to oc.) Proof. The proof is a nice application of the dominated convergence theorem, which was invented by Lebesgue in part precisely to prove this theorem. We may assume (by passing to a subset if necessary) that E is contained in some finite interval [a, b]. If d -+ 0 then there is an c > 0 and a subsequence dmk > E for all k. If the series converges, all its terms go to 0, so this means that cos(rnkt --5k) -0 V t E E. So cos2(nkt --5k)-->0 V t E E. Now m(E) < oc and cos2(nkt - 5k) < 1 and the constant 1 is integrable on [a, b]. So we may take the limit under the integral sign using the dominated convergence theorem to conclude that lim cos2(nkt-- #k)dt = lim cos2(nkt - #k)dt = 0. k-wj EJE k-wo But 1 cos2(nkt-k) =-1+cos2(rnkt--k )] so cos2t nk t -- #x)dt = 1 + cos 2 nk t --#x) ] dt E 2IE Sm( E )+Jcos 2(kt --k#] 1 -m(E)+-1Ecos2(nkt-- k)dt- 22  5.12. FUBINI'S THEOREM. 151 But 1E E E12(R, R) so the second term on the last line goes to 0 by the Riemann Lebesgue Lemma. So the limit is jm(E) instead of 0, a contradiction. QED 5.12 Fubini's theorem. This famous theorem asserts that under suitable conditions, a double integral is equal to an iterated integral. We will prove it for real (and hence finite dimen- sional) valued functions on arbitrary measure spaces. (The proof for Banach space valued functions is a bit more tricky, and we shall omit it as we will not need it. This is one of the reasons why we have developed the real valued theory first.) We begin with some facts about product c--fields. 5.12.1 Product u-fields. Let (X, F) and (Y, C) be spaces with c-fields. On X x Y we can consider the collection P of all sets of the form AxB, A eF, Beg. The o--field generated by P will, by abuse of language, be denoted by JFxg9. If E is any subset of X x Y, by an even more serious abuse of language we will let Ex {yl(x,y) e E} and (contradictorily) we will let EY={xl(x,y) E E}. The set Ex will be called the x-section of E and the set EY will be called the y-section of E. Finally we will let C c P denote the collection of cylinder sets, that is sets of the form AxY AeF or XxB, Beg. In other words, an element of P is a cylinder set when one of the factors is the whole space. Theorem 5.12.1 F x 9 is generated by the collection of cylinder sets C.  152 CHAPTER 5. THE LEBESGUE INTEGRAL. " F x C is the smallest o-field on X x Y such that the projections prx : X x Y - X prx(x, y) = x pry : X x Y -Y pry(x, y) = y are measurable maps. " For each E E F x C and all x E X the x-section Ex of E belongs to C and for all y E Y the y-section Ey of E belongs to F. Proof. A x B = (A x Y) n (X x B) so any u-field containing C must also contain P. This proves the first item. Since prXj(A) = A x Y, the map prx is measurable, and similarly for Y. But also, any u-field containing all A x Y and X x B must contain P by what we just proved. This proves the second item. As to the third item, any set E of the form A x B has the desired section properties, since its x section is B if x E A or the empty set if x g A. Similarly for its y sections. So let 7 denote the collection of subsets E which have the property that all Ex E g and all Ey E . If we show that 7 is a u-field we are done. Now Ex = (Ex)c and similarly for y, so C is closed under taking comple- ments. Similarly for countable unions: UEn = U(En).. \n / n QED 5.12.2 ir-systems and A-systems. Recall that the u-field o(C) generated by a collection C of subsets of X is the intersection of all the u-fields containing C. Sometimes the collection C is closed under finite intersection. In that case, we call C a 7r-system. Examples: " X is a topological space, and C is the collection of open sets in X. " X = R, and C consists of all half infinite intervals of the form (-oc, a]. We will denote this 7 system by 7r(R). A collection 7C of subsets of X will be called a A-system if 1. X E H, 2. A,B E withAnB=0 -AUBe7I, 3. A,B ENandBcA -> (A\B)E ,and 4. {A}?C7Land An/A Ae71H.  5.12. FUBINI'S THEOREM. 153 From items 1) and 3) we see that a A-system is closed under complementa- tion, and since 0 = XC it contains the empty set. If B is both a 7-system and a A system, it is closed under any finite union, since AU B = AU (B/(AnB) which is a disjoint union. Any countable union can be written in the form A =/ An where the An are finite disjoint unions as we have already argued. So we have proved Proposition 5.12.1 Iff7 is both a 7-system and a A-system then it is a o-field. Also, we have Proposition 5.12.2 [Dynkin's lemma.] If C is a 7-system, then the o-field generated by C is the smallest A-system containing C. Let M be the o-field generated by C, and 7 the smallest A-system containing C. So M D 7. By the preceding proposition, all we need to do is show that 7 is a 7r-system. Let 711 :={Al AnC e71VC E C}. Clearly 71 is a A-system containing C, so 7 C 71 which means that A nC E 7 for all A E 7 and C E C. Let X2:= {A AnHe71 VHe71}. 712 is again a A-system, and it contains C by what we have just proved. So 72 D 7, which means that the intersection of two elements of 7 is again in 7, i.e. 7 is a 7-system. QED 5.12.3 The monotone class theorem. Theorem 5.12.2 Let B be a class of bounded real valued functions on a space Z satisfying 1. B is a vector space over R. 2. The constant function 1 belongs to B. 3. B contains the indicator functions lA for all A belonging to a 7-system I. 4. If {fn} is a sequence of non-negative functions in B and fn / f where f is a bounded function on Z, then f E B. Then B contains every bounded M measurable function, where M is the u-field generated by I. Proof. Let 71 denote the class of subsets of Z whose indicator functions belong to B. Then Z E 7 by item 2). IfB c A are both in71, then lA\B =1-1B and so A \ B belongs to 71 by item 1). Similarly, if Amn B =0 then lAUB 1A -|1B  154 CHAPTER 5. THE LEBESGUE INTEGRAL. and so if A and B belong to 7 so does A U B when An B = 0. Finally, condition 4) in the theorem implies condition 4) in the definition of a A-system. So we have proved that that 7 is a A-system containing I. So by Dynkin's lemma, it contains M. Now suppose that 0 < f < K is a bounded M measurable function, where we may take K to be an integer. For each integer n ;> 0 divide the interval [0, K] up into subintervals of size 2-n, and let A(n, i) := {zli2 -- f (z) < (i + 1)2--} where i ranges from 0 to K2'. Let K2" sn(z) S: 1A(n,i)- i=0 Since f is assumed to be M-measurable, each A(n, i) E M, so by the preceding, and condition 1), fn E B. But 0 < sn / f, and hence by condition 4), f E B. For a general bounded M measurable f, both f+ and f- are bounded and M measurable, and hence by the preceding and condition 1), f = f+ - f B.QED We now want to apply the monotone class theorem to our situation of a product space. So Z = X x Y, where (X, F) and (Y, C) are spaces with u-fields, and where we take I = P to be the 7r-system consisting of the product sets AxB, AE.F,Beg. Proposition 5.12.3 Let B consist of all bounded real valued functions f on X x Y which are F x g-measurable, and which have the property that e for each x E X, the function y F- f(x, y) is g-measurable, and e for each y E Y the function x F- f(x, y) is ,F-measurable. Then B consists of all bounded F x C measurable functions. Indeed, y H lAx B(x, y) = 1B(y) if x e A and = 0 otherwise; and similarly for x H 1Ax B(x, y). So condition 3) of the monotone class theorem is satisfied, and the other conditions are immediate. Since F x C was defined to be the u-field generated by P, the proposition is an immediate consequence of the monotone class theorem. 5.12.4 Fubini for finite measures and bounded functions. Let (X, F, m) and (Y, g, n) be measure spaces with m(X) < oc and n(Y) < oc. For every bounded F x g-measurable function f, we know that the function f (x, ) : yHf(x, y)  5.12. FUBINI'S THEOREM. 155 is bounded and g measurable. Hence it has an integral with respect to the measure n, which we will denote by f (x, y)n(dy) This is a bounded function of x (which we will prove to be F measurable in just a moment). Similarly we can form f (x, y)m(d) which is a function of y. Proposition 5.12.4 Let B denote the space of bounded F x C measurable func- tions such that " f. f(x, y)n(dy) is a F measurable function on X, e fX f(x, y)m(dx) is a g measurable function on Y and " f (x, Y)n(dy) m(dx) J Jf (x, Y)m(dx) n(dy). (5.27) Then B consists of all bounded F x C measurable functions. Proof. We have verified that the first two items hold for lAxB. Both sides of (5.27) equal m(A)n(B) as is clear from the proof of Proposition 5.12.3. So conditions 1-3 of the monotone class theorem are clearly satisfied, and condition 4) is a consequence of two double applications of the monotone convergence theorem. QED Now for any C E F x g we define (mxn)(C) J ( lC(x, y)n(dy) m(dx) J( lC(x, Y)m(dx) n(dy), Jx\JY /Y \Jx/ (5.28) both sides being equal on account of the preceding proposition. This measure assigns the value m(A)n(B) to any set A x B E P, and since P generates F x g as a sigma field, any two measures which agree on P must agree on F x g. Hence m x n is the unique measure which assigns the value m(A)n(B) to sets of P. Furthermore, we know that f (x, y)(m x n) J ( f (x, Y)n(dY) m(dx) J Jf (x, y)m(dx) n(dy) J XxYJ X \JY /Y \JX (5.29) is true for functions of the form 1Ax B and hence by the monotone class theorem it is true for all bounded functions which are measurable relative to F x g. The above assertions are the content of Fubini's theorem for bounded mea- sures and functions. We summarize:  156 CHAPTER 5. THE LEBESGUE INTEGRAL. Theorem 5.12.3 Let (X, F, m) and (Y, g, n) be measure spaces with m(X) < oc and n(Y) < oc. There exists a unique measure on F x g with the property that (mxrn)(AxB)=m(A)n(B) V A x B EGP. For any bounded F x C measurable function, the double integral is equal to the iterated integral in the sense that (5.29) holds. 5.12.5 Extensions to unbounded functions and to u-finite measures. Suppose that we temporarily keep the condition that m(X) 0 _ 1(f) > 0 or equivalently f g _ 1(f) 1(g). 3. fn\ 0 4 I(fn) \0. For example, we might take S = Rh, L = the space of continuous functions of compact support on Rh, and I to be the Riemann integral. The first two items on the above list are clearly satisfied. As to the third, we recall Dini's lemma from the notes on metric spaces, which says that a sequence of continuous functions of compact support {fn} on a metric space which satisfies fn \ 0 actually converges uniformly to 0. Furthermore the supports of the fn are all contained in a fixed compact set - for example the support of fi. This establishes the third item. 157  158 CHAPTER 6. THE DANIELL INTEGRAL. The plan is now to successively increase the class of functions on which the integral is defined. Define U := {limits of monotone non-decreasing sequences of elements of L}. We will use the word "increasing" as synonymous with "monotone non-decreasing" so as to simplify the language. Lemma 6.1.1 If f, is an increasing sequence of elements of L and if k E L satisfies k < lim fn then lim I(fn) 1(k). Proof. If k E L and lim f, k, then fnAk 0 so 1(h) - 1(g) =I(h + (-g)) =I(h - g) > 0. A function f is called I-summable if for every E > 0, 1 g E -U, h E U with g< f fe.M(g). So L c M(g) for any g e B, and since it is a monotone class B C M(g). This says that f, g e B -> f+g e B, f A g E B and f V g e B. Similarly, let M be the class of functions for which cf E B for all real c. This is a monotone class containing L hence contains B. QED  6.3. MEASURE. 161 Lemma 6.2.2 If f E B there exists a g E U such that f Kg. Proof. The limit of a monotone increasing sequence of functions in U belongs to U. Hence the set of f for which the lemma is true is a monotone class which contains L. hence it contains B. QED A function f is L-bounded if there exists a g E L+ with fl Kg. A class F of functions is said to be L-monotone if F is closed under monotone limits of L-bounded functions. Theorem 6.2.2 The smallest L-monotone class including L+ is B+. Proof. Call this smallest family F. If g E L+, the set of all f e 1+ such that f A g E F form a monotone class containing L+, hence containing B+ hence equal to B+. If f E 8+ and f g then f A g = f E F. So F contains all L bounded functions belonging to B+. Let f E g+. By the lemma, choose g E U such that f g, and choose g, E L+ with gn / g. Then f A gn gn and so is L bounded, so f A gn EF. Since (f A gn) - f we see that f EF. So B+c 7i. We know that B+ is a monotone class, in particular an L-monotone class. Hence F= B+. QED Define L1 := L1nB. Since L1 and B are both closed under the lattice operations, f EL #feL L |f EL1. Theorem 6.2.3 If f E B then f E L <>g E L' with f| Kg. We have proved =#>: simply take g =Ifl. For the converse we may assume that f > 0 by applying the result to f+ and f-. The family of all h E B+ such that h A g E L' is monotone and includes L+ so includes B+. So f= f A g e L1. QED Extend I to all of B+ be setting it = oc on functions which do not belong to L1. 6.3 Measure. Loomis calls a set A integrable if lA E B. The monotone class properties of B imply that the integrable sets form a c--field. Then define p(A) := a  162 CHAPTER 6. THE DANIELL INTEGRAL. Add Stone's axiom f E L -> f A1 e L. Then the monotone class property implies that this is true with L replaced by B. Theorem 6.3.1 f e B and a > 0 -> then Aa := {p f(p) > a} is an integrable set. If f E L1 then Proof. Let Then p(Aa) < 00. fn:=[n(f-fAa)]A1 EB. 1 if f (x) ;> a+ fn(x) = 0 if f (x) < a. n(f(x) - a)if a< f(x) 0 and Aa is integrable for all a > 0 then f e B. Proof. For 6 > 1 define for m E Z and Am := { xlo' < f (x) < om+1} f " Each ft e B. Take on --22--. Then each successive subdivision divides the previous one into "octaves" and f6- / f. QED Also and So we have ft < f of6 I(f6)S 6nuA~ Jfdt I(f) (6p(A) 61fodp. I(fM < I(f) < 6I(fh)  6.4. HOLDER, MINKO WSKI , LP AND LO. 163 and J fsdp< 1 fdp < b5 Jfsdp. So if either of 1(f) or f fdp is finite they both are and I(f) - J f dp < (6 - 1)I(f6) (6 - 1)1(f). So Ifd It(f). If f E g+ and a > 0 then {xf(x)a> b} = {xf(x) > ba}. So f E g+ - fa E 3+ and hence the product of two elements of B+ belongs to B+ because 12 fg = [(f +g2)2_ (f _ 6.4 Holder, Minkowski , LP and L'. The numbers p, q > 1 are called conjugate if 1 1 - + - 1. p q This is the same as pq= p + q or (p -1)(q -1) =1. This last equation says that if y=X- then x = yq-1. The area under the curve y = xp-1 from 0 to a is ap A= p while the area between the same curve and the y-axis up to y = b bq  164 CHAPTER 6. THE DANIELL INTEGRAL. Suppose b < ap-1 to fix the ideas. Then area ab of the rectangle is less than A + B or av bg + > ab p q with equality if and only if b =a-1. Replacing a by ap and b by b i gives aP bq 1 this is a (semi-)norm. If f e LPand g e Lc with fp 0and g q#Otake a =fP b- fp |g q as functions. Then | |g dp |f ||p|g ||qf|dp 6+ |g |dpA |f |, | | J(fg)d - fUP (±>f d+q g qJ/p q This shows that the left hand side is integrable and that Ifgdp ||f|pg|q (6.1) which is known as H6lder's inequality. (If either fll or g|q = 0 then fg = 0 a.e. and H6lder's inequality is trivial.) We write (f,g): fgdp. Proposition 6.4.1 [Minkowski's inequality] If f, g E LP, p > 1 then f+g E LP and f+g|| |f|+|g||p. For p = 1 this is obvious. If p > 1 f+gP<[2max(f,Ig)]P<2P[fP+|gP] implies that f + g e LP. Write f + g|| &I(|f+ g|P |f|)+I(|f+ g|1|g|). Now  6.4. HOLDER, MINKO WSKI , LP AND LO. 165 so f+ gp-1 E Lq and its |-q norm is I(f + gp) (fqP )1P=I(f + g p) P |f+ g||-1. So we can write the preceding inequality as f+g (f, ff+gp-1)+(g, f+gp-1) and apply H6lder's inequality to conclude that f+g < f+g|P-1(f||+|g). We may divide by f + g||-1 to get Minkowski's inequality unless f + g|| = 0 in which case it is obvious. QED Theorem 6.4.1 LP is complete. Proof. Suppose fn ;> 0, fn E LP, and E f < oc Then ka := fj E LP by Minkowski and since k/ f we have k|p / fP and hence by the monotone convergence theorem f : _ faE LP and |f|| = lim||k|| 5 0 so gn is increasing and similarly hn is decreasing. Hence f := lim gn e LP and f - f| ||hn - gn| p 2-n+2 -> 0. So the subsequence has a limit which then must be the limit of the original sequence. QED Proposition 6.4.2 L is dense in LP for any 1 < p < oo. Proof. For p = 1 this was a defining property of L1. More generally, suppose that f E LP and that f > 0. Let 1 A:={x : - < f(x) 0 then flalim |f||q. (6 q-oc (6.2)  6.6. THE RADON-NIKODYM THEOREM. 167 Remark. In the statement of the theorem, both sides of (6.2) are allowed to be oc. Proof. If f||o = 0, then f|q = 0 for all q > 0 so the result is trivial in this case. So let us assume that f|o > 0 and let a be any positive number smaller that f||m. In other words, 0a}. This set has positive measure by the choice of a, and its measure is finite since f E LP. Also 1/q f||q > ( f|q > ap( Aa)*1. JAa Letting q - oc gives lim inf flq;> a q--oo and since a can be any number < f|o we conclude that lim inf f q >f|o. q--oo So we need to prove that lim|flq < f|o. This is obvious if f||o = oc. So suppose that f|o is finite. Then for q > p we have f l IfP l lf|| )4-P almost everywhere. Integrating and taking the q-th root gives fl q <;(lf|p) (llf||0)1--q. Letting q - oc gives the desired result. QED 6.6 The Radon-Nikodym Theorem. Suppose we are given two integrals, I and J on the same space L. That is, both I and J satisfy the three conditions of linearity, positivity, and the monotone limit property that went into our definition of the term "integral". We say that J is absolutely continuous with respect to I if every set which is I null (i.e. has measure zero with respect to the measure associated to I) is J null. The integral I is said to be bounded if I(1) < oc, or, what amounts to the same thing, that A1(S) 1. Taking f = 1N in the preceding string of equalities shows that J(1N) nI(1N). Since n~ is arbitrary, we have proved  6.6. THE RADON-NIKODYM THEOREM. 169 Lemma 6.6.1 The set where g > 1 has I measure zero. We have not yet used the assumption that J is absolutely continuous with respect to I. Let us now use this assumption to conclude that N is also J-null. This means that if f > 0 and f E L1(J) then fg"m\ 0 almost everywhere (J), and hence by the dominated convergence theorem J(fg") \ 0. Plugging this back into the above string of equalities shows (by the monotone convergence theorem for I) that 00 f S i=1 converges in the L1(I) norm to J(f). In particular, since J(1) < oo, we may take f = 1 and conclude that Z> g converges in L1(I). So set 00 fo :=_g E L1(I). i=1 We have 1 fo = almost everywhere 1-g so g f= -° 1 almost everywhere fo and J(f) = I(ffo) for f > 0, f E L1(J). By breaking any f E L1(J) into the difference of its positive and negative parts, we conclude that (6.3) holds for all f E L1(J). The uniqueness of fo (almost everywhere (I)) follows from the uniqueness of g in L2(K). QED The Radon Nikodym theorem can be extended in two directions. First of all, let us continue with our assumption that I and J are bounded, but drop the absolute continuity requirement. Let us say that an integral H is absolutely singular with respect to I if there is a set N of I-measure zero such that J(h) = 0 for any h vanishing on N. Let us now go back to Lemma 6.6.1. Define Jsing by Jsing(f) --J(1Nf)- Then Jsing is singular with respect to I, and we can write J Jcont + Jsing where Jeomt =J -- Jsrng --J(1Ne')-  170 CHAPTER 6. THE DANIELL INTEGRAL. Then we can apply the rest of the proof of the Radon Nikodym theorem to Jcont to conclude that Jcone(f) I(f fo) where fo = E°i(1Ncg)2 is an element of L1(I) as before. In particular, Jcont is absolutely continuous with respect to I. A second extension is to certain situations where S is not of finite measure. We say that a function f is locally L1 if fIA E L1 for every set A with p(A) < oc. We say that S is a-finite with respect to p if S is a countable union of sets of finite p measure. This is the same as saying that 1 =is E B. If S is a-finite then it can be written as a disjoint union of sets of finite measure. If S is a-finite with respect to both I and J it can be written as the disjoint union of countably many sets which are both I and J finite. So if J is absolutely continuous with respect I, we can apply the Radon-Nikodym theorem to each of these sets of finite measure, and conclude that there is an fo which is locally L' with respect to I, such that J(f) = I(ffo) for all f E L1(J), and fo is unique up to almost everywhere equality. 6.7 The dual space of LP. Recall that H6lder's inequality (6.1) says that ffgdp < f|p|g|q if f E Lp and g E Lq where 1 1 - + - 1. p q For the rest of this section we will assume without further mention that this relation between p and q holds. H6lder's inequality implies that we have a map from Lq (LP)* sending g E Lq to the continuous linear function on LP which sends f -' I(fg) J=fgdp. Furthermore, H6lder's inequality says that the norm of this map from Lq' (LP)* is < 1. In particular, this map is injective. The theorem we want to prove is that under suitable conditions on S and I (which are more general even that a-finiteness) this map is surjective for 1 < p < oo. We will first prove the theorem in the case where p(S) < oc, that is when I is a bounded integral. For this we will will need a lemma:  6.7. THE DUAL SPACE OF LP. 171 6.7.1 The variations of a bounded functional. Suppose we start with an arbitrary L and I. For each 1 < p < oc we have the norm || -|| on L which makes L into a real normed linear space. Let F be a linear function on L which is bounded with respect to this norm, so that F(f) Cf l for all f E L where C is some non-negative constant. The least upper bound of the set of C which work is called Fll as usual. If f > 0 E L, define F+(f) := lub{F(g): 0 < g f, g E L}. Then F+(f) > 0 and F+(f) |Fll|f l since F(g) fi(x) we have g(x) - g A fi(x) = g(x) - fi(x) f2(x). So g -g A f1 f2 and so F+(f1 + f2) = lub F(g) 0, we see that F--(f) > 0 if f > 0. Clearly F- F+|+ F| 2|Flp.We have proved: Proposition 6.7.1 Every linear function on L which is bounded with respect to the | | norm can be written as the difference F = F+ - F- of two lin- ear functions which are bounded and take non-negative values on non-negative functions. In fact, we could formulate this proposition more abstractly as dealing with a normed vector space which has an order relation consistent with its metric but we shall refrain from this more abstract formulation. 6.7.2 Duality of LP and IU when ,u(S) 1 and q=oc ifp=1. Proof. Consider the restriction of F to L. We know that F = F+ - F- where both F+ and F- are linear and non-negative and are bounded with respect to the | - norm on L. The monotone convergence theorem implies that if f \ 0 then |fn|| - 0 and the boundedness of F+ with respect to the | | says that |fn||p- -> F+(fn)- 0. So F+ satisfies all the axioms for an integral, and so does F-. If f vanishes outside a set of I measure zero, then f|| = 0. Applied to a function of the form f = lA we conclude that if A has pu =p measure zero, then A has measure zero with respect to the measures determined by F± or F-. We can apply the  6.7. THE DUAL SPACE OF LP. 173 Radon-Nikodym theorem to conclude that there are functions g+ and g- which belong to L1 (I) and such that F'(f) = I(fg') for every f which belongs to L1(F'). In particular, if we set g := g+ - g- then F(f) = I(fg) for every function f which is integrable with respect to both F+ and F-, in particular for any f e LP(I). We must show that g E L . We first treat the case where p > 1. Suppose that 0 < f g and that f is bounded. Then I(fq) < I(fq-1 - sgn(g)g) = F(fq-1 - sgn(g)) |F ll f- ll. So I(N) F (j(f(q-1)p)) P Now (q - 1)p =q so we have I(fq) |F l(fq) I(f)1 This gives f|q |F|| for all 0 < f g with f bounded. We can choose such functions f n with f / g. It follows from the monotone convergence theorem that g and hence g e L (I). This proves the theorem for p > 1. Let us now give the argument for p = 1. We want to show that g|| |F||. Suppose that g > F|1 + E where c > 0. Consider the function lA where A := {x : g(x)> F|1 + }. __ 2 Then ( F|1 + )p(A) I(l lg) = I(lAsgn(g)g) = F(lAsgn(g)) 2 SF|11Asgn(g)|1 =|F|1(A) which is impossible unless p(A) = 0, contrary to our assumption. QED 6.7.3 The case where ,u(S) =0c. Here the cases p > 1 and p = 1 may be different, depending on "how infinite S is". Let us first consider the case where p > 1. If we restrict the functional F to any subspace of LP its norm can only decrease. Consider a subspace consisting of all functions which vanish outside a subset S1 where p(S1) < oc. We get  174 CHAPTER 6. THE DANIELL INTEGRAL. a corresponding function gi defined on Si (and set equal to zero off S1) with 9|q |F l and F(f) = I(fg1) for all f belonging to this subspace. If (S2, g2) is a second such pair, then the uniqueness part of the theorem shows that g1 = 92 almost everywhere on Si n S2. Thus we can consistently define 912 on S1 U S2. Let b := lub{ gc,|q} taken over all such go. Since this set of numbers is bounded by Fll this least upper bound is finite. We can therefore find a nested sequence of sets Sn and corresponding functions gn such that gn /b. By the triangle inequality, if n > m then gnm-gm|q ||gnq --|m|q and so, as in your proof of the L2 Martingale convergence theorem, this sequence is Cauchy in the q norm. Hence there is a limit g e Lq and g is supported on So U=Sn. There can be no pair (S', g') with S disjoint from So and g' f 0 on a subset of positive measure of S'. Indeed, if this were the case, then we could consider g + g' on S U S' and this would have a strictly larger || - ||q norm than g|q = b, contradicting the definition of b. (It is at this point in the argument that we use q < oc which is the same as p > 1.) Thus F vanishes on any function which is supported outside So. We have thus reduced the theorem to the case where S is a-finite. If S is a-finite, decompose S into a disjoint union of sets AZ of finite measure. Let fm denote the restriction of f E LP to Am and let hm denote the restriction of g to Am. Then 00 fm= f m=1 as a convergent series in LP and so F(f) =_1:F(fm) fSfmhm and this last series converges to I(fg) in L1. So we have proved that (LP)* = Lq in complete generality when p > 1, and for a-finite S when p = 1. It may happen (and will happen when we consider the Haar integral on the most general locally compact group) that we don't even have a-finiteness. But we will have the following more complicated condition: Recall that a set A is called integrable (by Loomis) if lA E B. Now suppose that S =U S  6.8. INTEGRATION ON LOCALLY COMPACT HAUSDORFF SPACES.175 where this union is disjoint, but possibly uncountable, of integrable sets, and with the property that every integrable set is contained in at most a countable union of the Sc,. A set A is called measurable if the intersections A n Sc, are all integrable, and a function is called measurable if its restriction to each Sc, has the property that the restriction of f to each Sc, belongs to B, and further, that either the restriction of f+ to every Sc, or the restriction of f- to every Sc, belongs to L1. If we find ourselves in this situation, then we can find a gc, on each Sc, since Sc, is o-finite, and piece these all together to get a g defined on all of S. If f E L1 then the set where f f 0 can have intersections with positive measure with only countably many of the Sc, and so we can apply the result for the o-finite case for p = 1 to this more general case as well. 6.8 Integration on locally compact Hausdorff spaces. Suppose that S is a locally compact Hausdorff space. As in the case of R", we can (and will) take L to be the space of continuous functions of compact support. Dini's lemma then says that if fn E L \ 0 then f -- 0 in the uniform topology. If A is any subset of S we will denote the set of f E L whose support is contained in A by LA. Lemma 6.8.1 A non-negative linear function I is bounded in the uniform norm on LC whenever C is compact. Proof. Choose g > 0 E L so that g(x) > 1 for x E C. If f E Lc then f f|og so I(f)| 0 we can find a k of the above form with h - k a. Since a was any number < p(U), we conclude that p(U) is m*(A n U) + m*(A n U ) (6.10) for any open set U and any set A with m*(A) < oc: If c > 0, choose an open set V D A with m*(V) m*(V n U) + m*(V n UC) - 2E. (6.11) This will then imply that m*(A) > m*(A n U) + m*(A n UC) - 3E and since c > 0 is arbitrary, this will imply (6.10). Using the definition (6.7), we can find an fi E L such that fi<-1Vnu and Supp(fi) c V n U with I(fi) > m*(V n U) - E. Let K := Supp(fi). Then K c U and so Kc D Uc and Kc is open. Hence V n Kc is an open set and VnKC D VnU. Using the definition (6.7), we can find an f2 E L such that f2<-l1VnKc and Supp(f2) c V n KC with I(f2) >m*(VnKc)-E. But m*(V n Kc) ;> m*(V n Uc) since V n Kc D V n Uc. So I(f2) > m*(V n UC) - e. So fi + f2 1K + 1VnKe 1-cE}. Then U, is an open set containing K. If 0 < g E L satisfies g < lu, then g = 0 on Uc, and for x E U6, g(x) < 1 while f(x) > 1 - E. So 1 and hence, by (6.7) m*(Ue) < I(f). ~1- So, by (6.8) 1 p(K) < m*(UE) < 1I fm 1(f) < p(Supp(f)). (6.12) So let V be an open set containing Supp(f). By definition (6.7), P(V) > I(f)  184 CHAPTER 6. THE DANIELL INTEGRAL. and, since V is an arbitrary open set containing Supp(f), we have p(Supp(f)) ;> I (f ) using the definition (6.8) of m*(Supp(f)). In the course of this argument we have proved Proposition 6.10.2 If g E L,O g 1K where K is compact, then 1(g) p(K). 6.10.5 Conclusion of the proof. Finally, we must show that all the elements of L are integrable with respect to p and I(f) =Jf dp. (6.13) Since the elements of L are continuous, they are Borel measurable. As every f E L can be written as the difference of two non-negative elements of L, and as both sides of (6.13) are linear in f, it is enough to prove (6.13) for non-negative functions. Following Lebesgue, divide the "y-axis" up into intervals of size E. That is, let c be a positive number, and, for every positive integer n set 1 0 if f (x) (n - 1)E fn (x):= f(x) - (n - 1)Eif (n-1)c f only the first alternative can occur, so all but finitely many of the fn vanish, and they all are continuous and have compact support so belong to L. Also f =Sfn this sum being finite, as we have observed, and so I(f) =I(fn). Set KO:= Supp(f) and Kn := {x : f(x)>ne} n = 1, 2,.... Then the KZ are a nested decreasing collection of compact sets, and ElKn f ElK_1- By Propositions 6.10.1 and6.10.2 we have Ep(Kn) I(fn) Ep(Km_1).  6.10. EXISTENCE. 185 On the other hand, the monotonicity of the integral (and its definition) imply that i(Kn) Jffndp i(Kn_1). Summing these inequalities gives N N-1 EZ/p(Kn) I(f) EZ< p(Kn) i=1 i=0 N N-1 EZAp(Kn) ffdp EZ(p(Kn) i=1 i=0 where N is sufficiently large. Thus 1(f) and f fdp lie within a distance N-1 N E 3 I(Kn) - EZ:p(Kn) = ep(Ko) - Ep(KN) r}) = 2 ) e-2/2t dx (2x j e_,2/2tdx ()= 1/2 m _e-2/2t 1/2 - pro{W) t)> rT+ -)T+ r  7.1. WIENER MEASURE. 191 For fixed r this tends to zero (very fast) as t - 0. In n-dimensions ||yll > E (in the Euclidean norm) implies that at least one of its coordinates y2 satisfies yl > E/ so we find that pr,({|w(t) - x > E}) ce--2/2nt for a suitable constant depending only on n. In particular, if we let p( , 6) denote the supremum of the above probability over all 0 < t < 6 then p( , 6) = o(6). (7.7) Lemma 7.1.1 Let0 < t1 - -"-" cE for some j = 1, ... m}. Then 1 pr<(A) 2p(-, 6) (7.8) 2' independently of the number m of steps. Proof. Let 1 B:={Wlw(t1)-w(tm)|>2 let 1 C:= {w lw(t2) - W(tm) > -} 2 and let D2 := {w lw(t1) -w(t2)| > eand |w(t1)- w(tk)|iE}  192 CHAPTER 7. WIENER MEASURE, BROWNIAN MOTION AND WHITE NOISE. so that 1C2 = Ft2,tn- Similarly, let G be the indicator function of the subset of RN x R x - - - x R~N (i copies) consisting of all points with Izk-- zl <;E, k =1,..., i- 1, Iz1-xzil> E so that 1D2 G,t,...,ty . Then pr (Ci n Di) ... . p(x, x1; t1) - - -p(zi_1, xi; ti-ti_1)F(x1, . .. , zi)G(zi, zn)p(zi, zn; in -ti)dzi -.-.-dzx. The last integral (with respect to xn) is < p(jE, 6). Thus prx (Ci n Di) < p( , 6) prx (Di). 2' The Di are disjoint by definition, so pr (Di) < prx(JDi) < 1. So 1 1 pr,(A) < pr,(B) + p(-, 6) <2p(- 6). QED Let E:{w w(t) - w(ty) >2 for some 1 2E then either w(ti) - w(ty) > E or w(t1) - w(tk) > E (or both). So 1 pr<(E) 2p(-cE b). (7.10) Lemma 7.1.2 Let 0 < a < b with b - a < 6. Let E(a, b, c) {w w(s) - w(t) > 2E for some s, t E [a, b]}. Then 1 pr (E(a, b.E)) 2p(-, 6). Proof. Here is where we are going to use the regularity of the measure. Let S denote a finite subset of [a, b] and and let E(a, b, E, S) := {w l|w(s) - w(t) > 2E for some s, t e S}.  7.1. WIENER MEASURE. 193 Then E(a, b, c, S) is an open set and pr,(E(a, b, c, S)) < 2p(jc, 6) for any S. The union over all S of the E(a, b, c, S) is E(a, b, c). The regularity of the measure now implies the lemma. QED Let k and n be integers, and set 1 n Let F(k,c, 6) := {w l |w(t) - w(s)| > 4 for some t, s e [0, k], with t - s < b}. Then we claim that p( j,6) prx(F(k, c, ()) < 2k . 6 (7.11) Indeed, [0, k] is the union of the nk = k/6 subintervals [0, 6], [6, 26], ..., [k - 6, 6]. If w e F(k, c, 6) then w(s) - w(t) > 4e for some s and t which lie in either the same or in adjacent subintervals. So w must lie in E(a, b, c) for one of these subintervals, and there are kn of them. QED Let w E Q be a continuous path in R"h. Restricted to any interval [0, k] it is uniformly continuous. This means that for any c > 0 it belongs to the complement of the set F(k, c, 6) for some 6. We can let c = 1/p for some integer p. Let C denote the set of continuous paths from [0, oc) to Rh. Then C =O UF(k,e,6)c k c 6 so the complement Cc of the set of continuous paths is U O OF(k,e, 6), k c 6 a countable union of sets of measure zero since prx (OF(kE,6)) lim2kp(-5,6)/= 0. 6-0 o 2 We have thus proved a famous theorem of Wiener: Theorem 7.1.1 [Wiener.] The measure prx is concentrated on the space of continuous paths, i.e. pr (C) = 1. In particular, there is a probability measure on the space of continuous paths starting at the origin which assigns probability pro (E) = I J p(0, xi; ti)p(xi, x2; t2 - t1) . . p(Xm-1, Xm, tm - tm-1)dxi . . .cdm JE1 J ,-m to the set of all paths w which start at 0 and pass through the set E1 at time t1, the set E~2 at time t2 etc. and we have denoted this set of paths by E.  194CHAPTER 7. WIENER MEASURE, BROWNIAN MOTION AND WHITE NOISE. 7.1.4 Embedding in S'. For convenience in notation let me now specialize to the case n = 1. Let WcC consist of those paths w with w(0) = 0 and (1+t)-2w(t)dt < oo. Proposition 7.1.1 [Stroock] The Wiener measure pr0 is concentrated on WV. Indeed, we let E(|w(t)|) denote the expectation of the function w(t) of w with respect to Wiener measure, so E(|(t)|) ze-X2/2t dx - 1 - t Xe-X2/tdz = Ct1/2. JRJt ot Thus, by Fubini, E (1+ t)-2|w(t)|dt) J(1+t)-2E(|w(t)) < oo. Hence the set of w with fj (1+t)-2|w(t)|dt = oc must have measure zero. QED Now each element of W defines a tempered distribution, i.e. an element of S' according to the rule 00 We claim that this map from W to S' is measurable and hence the Wiener measure pushes forward to give a measure on S'. To see this, let us first put a different topology (of uniform convergence) on WV. In other words, for each w E W let U,(w) consist of all w1 such that sup |wi(t) - w(t)| 0 and take these to form a basis for a topology on W. Since we put the weak topology on S' it is clear that the map (7.12) is continuous relative to this new topology. So it will be sufficient to show that each set UE(w) is of the form A n W where A is in B(Q), the Borel field associated to the (product) topology on f. So first consider the subsets Vs,6(w) of W consisting of all w1 E W such that 1 sup wi(t) -w(t)| e- -. [>0  7.2. STOCHASTIC PROCESSES AND GENERALIZED STOCHASTIC PROCESSES. 195 Clearly U (w) U= V,6(w), a countable union, so it is enough to show that each Vn,E(w) is of the form An n W where An E B(X). Now by the definition of the topology on Q, if r is any real number, the set 1 Am,r : {wi l w1(r) - w(r)| e - - An~r n is closed. So if we let r range over the non-negative rational numbers Q+, then An = n An,r rEQ+ belongs to B(Q). But if wi is continuous, then if wi E An then suptER+Pi1(t) - w(t) <;E -n, and so An n W = Vn,E(w) as was to be proved. 7.2 Stochastic processes and generalized stochas- tic processes. In the standard probability literature a stochastic process is defined as follows: one is given an index set T and for each t E T one has a random variable X(t). More precisely, one has some probability triple (Q, F, P) and for each t E T a real valued measurable function on (Q, F). So a stochastic process X is just a collection X = {X(t), t E T} of random variables. Usually T = Z or Z+ in which case we call X a discrete time random process or T = R or R+ in which case we call X a continuous time random process. Thus the word process means that we are thinking of T as representing the set of all times. A realization of X, that is the set X(t)(w) for some w E Q is called a sample path. If T is one of the above choices, then X is said to have independent increments if for all to < t1