
It’s been too long: a month and a half since my last review, and about three months since Analysis I. I’ve been immersed in my work for CHAI, but reality doesn’t grade on a curve, and I want more mathematical firepower.

Analysis II

12: Metric Spaces

Metric spaces; completeness and compactness.

Proving Completeness

It sucks and I hate it.

13: Continuous Functions on Metric Spaces

Generalized continuity, and how it interacts with the considerations introduced in the previous chapter. Also, a terrible introduction to topology.

There’s a lot I wanted to say here about topology, but I don’t think my understanding is good enough to break things down—I’ll have to read an actual book on the subject.

14: Uniform Convergence

Pointwise and uniform convergence, the Weierstrass $M$-test, and uniform approximation by polynomials.

Breaking Point

Suppose we have some sequence of functions $f^{(n)}:[0,1]\to\mathbb{R}$, $f^{(n)}(x) := x^n$, which converge pointwise to the 1-indicator function $f:[0,1]\to\mathbb{R}$ (i.e. $f(1)=1$ and $f(x)=0$ otherwise). Clearly, each $f^{(n)}$ is (infinitely) differentiable. However, the limiting function $f$ isn’t even continuous at $x=1$, let alone differentiable there! Basically, pointwise convergence isn’t at all strong enough to stop the limit from “snapping” the continuity of its constituent functions.
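To make the "snapping" concrete, here is a minimal numerical sketch. The function names and the witness point $x_n = 2^{-1/n}$ are my own choices for illustration, not anything from the book. At any fixed $x<1$ the values $x^n$ shrink to 0, yet for every $n$ there is a point where $f^{(n)}$ still sits a full $\frac{1}{2}$ away from the limit, so the convergence cannot be uniform.

```python
def f_n(n: int, x: float) -> float:
    """The n-th function in the sequence, f^(n)(x) = x^n."""
    return x ** n


def f_limit(x: float) -> float:
    """Pointwise limit on [0, 1]: 1 at x = 1 and 0 elsewhere."""
    return 1.0 if x == 1.0 else 0.0


def gap_at_witness(n: int) -> float:
    """|f^(n)(x) - f(x)| at the hypothetical witness x_n = 2^(-1/n), where x_n^n = 1/2."""
    x = 0.5 ** (1.0 / n)
    return abs(f_n(n, x) - f_limit(x))


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        # Second column shrinks toward 0; third column stays near 0.5.
        print(n, f_n(n, 0.9), gap_at_witness(n))
```

The witness point drifts toward 1 as $n$ grows, which is exactly why no single $n$ works for all $x$ at once.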

Progress

As in previous posts, I mark my progression by sharing a result derived without outside help.

Already proven: $\int_{-1}^1 (1-x^2)^N \,dx \geq \frac{1}{\sqrt{N}}$.

Definition: $(\epsilon, \delta)$-approximation to the identity

Let $\epsilon>0$ and $0 < \delta < 1$. A function $f: \mathbb{R} \to \mathbb{R}$ is said to be an $(\epsilon, \delta)$-approximation to the identity if it obeys the following three properties:

  • $f$ is compactly supported on $[-1,1]$.
  • $f$ is continuous, and $\int_{-\infty}^\infty f=1$.
  • $|f(x)|\leq \epsilon$ for all $\delta \leq |x| \leq 1$.

Lemma

For every $\epsilon>0$ and $0 < \delta < 1$, there exists an $(\epsilon, \delta)$-approximation to the identity which is a polynomial $P$ on $[-1,1]$.

Proof of Exercise 14.8.2(c). Suppose $c\in\mathbb{R},N\in\mathbb{N}$; define $f(x) := c(1-x^2)^N$ for $x \in [-1,1]$ and 0 otherwise. Clearly, $f$ is compactly supported on $[-1,1]$ and is continuous. We want to find $c,N$ such that the second and third properties are satisfied. Since $(1-x^2)^N$ is non-negative on $[-1,1]$, $c$ must be positive, as $f$ must integrate to 1. Therefore, $f$ is non-negative.

We want to show that $|c(1-x^2)^N| \leq \epsilon$ for all $\delta \leq |x| \leq 1$. Since $f$ is non-negative, we may simplify to $(1-x^2)^N\leq \frac{\epsilon}{c}$. Since the left-hand side is strictly monotone increasing on $[-1,-\delta]$ and strictly monotone decreasing on $[\delta,1]$, it is largest at $x=\pm\delta$, so we substitute $x=\delta$ without loss of generality. As $\epsilon > 0$ and $c > 0$, we may take the reciprocal and multiply by $\epsilon$, arriving at $\epsilon(1-\delta^2)^{-N} \geq c$.

We want $\int_{-\infty}^\infty f = 1$; as $f$ is compactly supported on $[-1,1]$, this is equivalent to $\int_{-1}^1 f(x)\, dx = 1$. Using basic properties of the Riemann integral, we have $\int_{-1}^1 (1-x^2)^N \, dx=\frac{1}{c}$. Substituting in for $c$,

$$\epsilon^{-1}(1-\delta^2)^N \leq \frac{1}{\sqrt{N}} \leq \int_{-1}^1 (1-x^2)^N\,dx,$$

with the second inequality already having been proven earlier. Note that although the first inequality is not always true, we can make it so: since $\epsilon$ is fixed and $1-\delta^2 \in (0,1)$, the left-hand side approaches 0 more quickly than $\frac{1}{\sqrt{N}}$ does. Therefore, we can make $N$ as large as necessary; isolating $\epsilon$,

$$\epsilon \geq (1-\delta^2)^N\sqrt{N}.$$

The right-hand side tends to 0 as $N\to\infty$, because the exponential decay of $(1-\delta^2)^N$ outpaces the growth of $\sqrt{N}$. Then set $N$ to be any positive natural number such that this inequality is satisfied. Finally, we set $c = \frac{1}{\int_{-1}^1 (1-x^2)^N \, dx}$. By construction, these values of $c,N$ satisfy the second and third properties. □
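As a sanity check on the construction, here is a small sketch that picks $N$ and $c$ for concrete $\epsilon,\delta$ and confirms the third property numerically. The helper names are hypothetical. The reduction formula $I_N = \frac{2N}{2N+1}I_{N-1}$ with $I_0=2$ for $I_N=\int_{-1}^1(1-x^2)^N\,dx$ is a standard integration-by-parts identity that I am assuming here; it is not part of the proof above.

```python
import math
from typing import Tuple


def bump_integral(N: int) -> float:
    """Integral of (1 - x^2)^N over [-1, 1], via I_0 = 2 and I_k = I_{k-1} * 2k / (2k + 1)."""
    value = 2.0
    for k in range(1, N + 1):
        value *= 2.0 * k / (2.0 * k + 1.0)
    return value


def choose_N_and_c(eps: float, delta: float) -> Tuple[int, float]:
    """Smallest N >= 1 with (1 - delta^2)^N * sqrt(N) <= eps, plus the normalizing constant c."""
    if not (eps > 0 and 0 < delta < 1):
        raise ValueError("need eps > 0 and 0 < delta < 1")
    N = 1
    # The left-hand side tends to 0 as N grows, so this loop terminates.
    while (1.0 - delta ** 2) ** N * math.sqrt(N) > eps:
        N += 1
    return N, 1.0 / bump_integral(N)


if __name__ == "__main__":
    eps, delta = 0.1, 0.5
    N, c = choose_N_and_c(eps, delta)
    # Value of f at x = delta, which bounds f on delta <= |x| <= 1.
    print(N, c, c * (1.0 - delta ** 2) ** N)
```

Taking the smallest such $N$ is only a convenience; any larger $N$ satisfying the inequality would do just as well.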

Convoluted No Longer

Those looking for an excellent explanation of convolutions, look no further!

Weierstrass Approximation Theorem

Theorem

Suppose $f : [a,b] \to \mathbb{R}$ is continuous and compactly supported on $[a,b]$. Then for every $\epsilon > 0$, there exists a polynomial $P$ such that $\|P - f\|_\infty < \epsilon$.

In other words, any continuous, real-valued $f$ on a closed, bounded interval can be approximated with arbitrary precision by polynomials.
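To watch the theorem in action, here is a rough sketch using Bernstein polynomials on $[0,1]$. This is a swapped-in technique: the book's proof goes through convolution with approximations to the identity, whereas Bernstein polynomials are just an explicit family that is easy to compute. The target function `kink` and the grid-based estimate of the sup norm are my own assumptions for illustration.

```python
import math
from typing import Callable


def bernstein(f: Callable[[float], float], n: int, x: float) -> float:
    """Degree-n Bernstein polynomial of f on [0, 1], evaluated at x."""
    return sum(
        f(k / n) * math.comb(n, k) * x ** k * (1.0 - x) ** (n - k)
        for k in range(n + 1)
    )


def sup_error(f: Callable[[float], float], n: int, grid_size: int = 201) -> float:
    """Largest |B_n f(x) - f(x)| over an evenly spaced grid on [0, 1] (a stand-in for the sup norm)."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(abs(bernstein(f, n, x) - f(x)) for x in grid)


def kink(x: float) -> float:
    """A continuous but non-differentiable target, |x - 1/2|."""
    return abs(x - 0.5)


if __name__ == "__main__":
    for n in (10, 50, 200):
        print(n, sup_error(kink, n))
```

The error shrinks slowly here because of the corner at $x=\frac{1}{2}$, but it does shrink, which is all the theorem promises.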

Why I’m talking about this. On one hand, this result makes sense, especially after taking machine learning and seeing how polynomials can be contorted into basically whatever shape you want.

On the other hand, I find this theorem intensely beautiful. The proof that $\overline{P[a,b]}=C[a,b]$ was slowly constructed, much to the reader’s benefit. I remember the exact moment the proof sketch came to me, newly installed gears whirring happily.

15: Power Series

Real analytic functions, Abel’s theorem, $\exp$ and $\log$, complex numbers, and trigonometric functions.

EXP

Cached thought from my CS undergrad: Exponential functions always end up growing more quickly than polynomials, no matter the degree. Now, I finally have the gears to see why:

$$\exp(x) := \sum_{k=0}^\infty \frac{x^k}{k!}.$$

$\exp$ has all the degrees, so no polynomial (of necessarily finite degree) could ever hope to compete! This also suggests why $\frac{d}{dx}e^x=e^x$.
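One way to cash out "has all the degrees": for $x>0$, every term of the series is positive, so $\exp(x) > \frac{x^{d+1}}{(d+1)!}$, and hence $\frac{x^d}{\exp(x)} < \frac{(d+1)!}{x}$, which goes to 0 as $x\to\infty$. The sketch below just checks that bound numerically; the function names are mine.

```python
import math


def ratio(d: int, x: float) -> float:
    """x^d / exp(x): how a degree-d monomial compares with exp at x."""
    return x ** d / math.exp(x)


def bound(d: int, x: float) -> float:
    """(d + 1)! / x, an upper bound for ratio(d, x) when x > 0.

    It comes from keeping only the degree-(d + 1) term of the series:
    exp(x) > x^(d + 1) / (d + 1)!.
    """
    return math.factorial(d + 1) / x


if __name__ == "__main__":
    for x in (10.0, 100.0, 500.0):
        print(x, ratio(10, x), bound(10, x))
```

The bound is very loose, but it only needs to go to 0, and a single higher-degree term of the series is enough to force that.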

Complex Exponentiation

The book
You can multiply a number by itself some number of times.
Me
🙂‍↕️
The book
You can multiply a number by itself a negative number of times.
Me
Sure.
The book
You can multiply a number by itself an irrational number of times.
Me
… OK, I understand limits.
The book
You can multiply a number by itself an imaginary number of times.
Me
😠 Out. Now.

Seriously, this one’s weird (rather, it seems weird, but how can “how the world is” be “weird”?).

Suppose we have some $c \in\mathbb{C}$, where $c=a+bi$. Then $e^c=e^{a}e^{bi}$, so “all” we need to figure out is how to take an imaginary exponent. Brian Slesinsky has us covered.
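For the skeptical, a quick numerical sketch: feed a complex number into the very same power series that defines $\exp$ on the reals, and compare the result with $e^a(\cos b + i\sin b)$. That closed form is Euler's formula, a standard result I am assuming here rather than something derived above. The helper names are hypothetical.

```python
import cmath
import math


def exp_series(z: complex, terms: int = 40) -> complex:
    """Partial sum of sum_k z^k / k!, the same series used to define exp on the reals."""
    total = 0j
    term = 1 + 0j  # z^0 / 0!
    for k in range(terms):
        total += term
        term *= z / (k + 1)  # turn z^k / k! into z^(k+1) / (k+1)!
    return total


def exp_polar(z: complex) -> complex:
    """e^a * (cos b + i sin b) for z = a + bi (Euler's formula)."""
    return math.exp(z.real) * complex(math.cos(z.imag), math.sin(z.imag))


if __name__ == "__main__":
    z = 1 + 2j
    print(exp_series(z), exp_polar(z), cmath.exp(z))
    print(exp_series(1j * math.pi))  # close to -1
```

So "multiplying a number by itself an imaginary number of times" turns out to be a rotation: the imaginary part of the exponent only moves the result around the unit circle.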

Years before becoming involved with the rationalist community, Nate asks this question, and Qiaochu answers.

This isn’t a coincidence, because nothing is ever a coincidence.

Or maybe it is a coincidence, because Qiaochu answered every question on StackExchange.

16: Fourier Series

Periodic functions, trigonometric polynomials, periodic convolutions, and the Fourier theorem.

17: Several Variable Differential Calculus

A beautiful unification of Linear Algebra and calculus: linear maps as derivatives of multivariate functions, partial and directional derivatives, Clairaut’s theorem, contractions and fixed points, and the inverse and implicit function theorems.

Implicit Function Theorem

If you have a set of points in $\mathbb{R}^n$, when do you know if it’s secretly a function $g:\mathbb{R}^{n-1} \to \mathbb{R}$? For functions $\mathbb{R}\to\mathbb{R}$, we can just use the geometric “vertical line test” to figure this out, but that’s a bit harder when you only have an algebraic definition. Also, sometimes we can implicitly define a function locally by restricting its domain (even if no explicit form exists for the whole set).

The implicit function theorem

Let $E$ be an open subset of $\mathbb{R}^n$, let $f:E \to \mathbb{R}$ be continuously differentiable, and let $y=(y_1,\dots,y_n)$ be a point in $E$ such that $f(y)=0$ and $\frac{\partial f}{\partial x_n}(y)\neq0$. Then there exists an open $U \subseteq \mathbb{R}^{n-1}$ containing $(y_1, \dots, y_{n-1})$, an open $V \subseteq E$ containing $y$, and a function $g: U \to \mathbb{R}$ such that $g(y_1, \dots, y_{n-1})=y_n$, and

$$\begin{aligned} &\left\{(x_1, \dots, x_n)\in V: f(x_1, \dots,x_n)=0\right\}\\ &=\left\{(x_1, \dots, x_{n-1}, g(x_1,\dots, x_{n-1})): (x_1, \dots, x_{n-1})\in U\right\}. \end{aligned}$$

So, I think what’s really going on here is that we’re using the derivative at this known zero to locally linearize the manifold we’re operating on (similar to Newton’s approximation), which lets us have some neighborhood $U$ in which we can derive an implicit function, even if we can’t always write it out.
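Here is a toy instance of that picture, assuming the unit circle $f(x,y)=x^2+y^2-1$ and the known zero $(0.6, 0.8)$ as my example (neither comes from the book). Since $\frac{\partial f}{\partial y}(0.6,0.8)=1.6\neq 0$, the theorem promises a local $g$ with $f(x,g(x))=0$. The sketch recovers $g$ numerically by running Newton's method in the last variable, started at the known $y$-coordinate.

```python
def f(x: float, y: float) -> float:
    """Unit circle as a zero set: f(x, y) = x^2 + y^2 - 1."""
    return x * x + y * y - 1.0


def df_dy(x: float, y: float) -> float:
    """Partial derivative of f in the last variable."""
    return 2.0 * y


def g(x: float, y_guess: float = 0.8, steps: int = 50) -> float:
    """Solve f(x, y) = 0 for y by Newton's method, started at the known y-coordinate."""
    y = y_guess
    for _ in range(steps):
        y -= f(x, y) / df_dy(x, y)
    return y


if __name__ == "__main__":
    for x in (0.5, 0.6, 0.7):
        print(x, g(x), f(x, g(x)))
```

For the circle we happen to know $g(x)=\sqrt{1-x^2}$ near this point, but the code never uses that formula. Near $(1,0)$, where $\frac{\partial f}{\partial y}=0$, the Newton step divides by something close to zero, matching the theorem's hypothesis failing there.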

18: Lebesgue Measure

Outer measure; measurable sets and functions.

Tao lists desiderata for an ideal measure before deriving it. Imagine that.

19: Lebesgue Integration

Building up the Lebesgue integral, culminating with Fubini’s theorem.

Conceptual Rotation

Suppose $\Omega \subseteq \mathbb{R}^n$ is measurable, and let $f:\Omega \to [0,\infty]$ be a measurable, non-negative function. The Lebesgue integral of $f$ is then defined as:

$$\int_\Omega f := \sup\left\{\int_\Omega s: s \text{ is simple and non-negative, and minorizes }f\right\}.$$

This hews closely to how we defined the lower Riemann integral in Chapter 11; however, we don’t need the equivalent of the upper Riemann integral for the Lebesgue integral.

To see why, let’s review why Riemann integrability demands the equality of the lower and upper Riemann integrals of a function $g$. Suppose that we integrate over $[0,1]$, and that $g$ is the indicator function for the rationals. As the rationals are dense in the reals, any interval $[a,b]\subseteq[0,1]$ ($b>a$) contains rational numbers, no matter how much the interval shrinks! Therefore, the upper Riemann integral equals 1, while the lower equals 0 (for similar reasons: every such interval also contains irrational numbers). So $g$ is not Riemann integrable. However, $g$ is Lebesgue integrable; since it’s 0 almost everywhere (as the rationals have measure 0), its integral is 0.

Lebesgue integration marks a fundamental shift in how we integrate. With the Riemann integral, we consider the $\limsup$ and $\liminf$ of increasingly refined upper and lower Riemann sums (the length approach). In Lebesgue integration, however, we consider which $E\subseteq \Omega$ is responsible for each value $y$ in the range (i.e. $f^{-1}(y)=E$), multiplying $y$ by the measure of $E$ (inversion).
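A small sketch of the "inversion" idea, assuming $f(x)=x^2$ on $\Omega=[0,1]$ as my own example. Instead of chopping up the domain, we chop up the range into bands of height $\frac{1}{\text{levels}}$, round $f$ down to the bottom of each band (giving a simple function that minorizes $f$), and weight each value by the measure of its preimage. For this $f$ the preimage of a band is an interval, so its measure is just a difference of square roots.

```python
import math


def preimage_measure(lo: float, hi: float) -> float:
    """Measure of {x in [0, 1] : lo <= x^2 < hi}, an interval [sqrt(lo), sqrt(hi))."""
    return math.sqrt(min(hi, 1.0)) - math.sqrt(min(lo, 1.0))


def lower_lebesgue_sum(levels: int) -> float:
    """Integral of the simple function that rounds f(x) = x^2 down to a multiple of 1/levels."""
    total = 0.0
    for k in range(levels):
        y = k / levels  # value taken by the simple function on this band
        total += y * preimage_measure(y, (k + 1) / levels)
    return total


if __name__ == "__main__":
    for levels in (10, 100, 1000):
        print(levels, lower_lebesgue_sum(levels))  # creeps up toward 1/3
```

These sums approach $\frac{1}{3}$ from below, which is the supremum in the definition at work. The single point $x=1$ is ignored, which is harmless since it has measure 0.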

In a sense, the Lebesgue integral more cleanly strikes at the heart of what it means to integrate. Surely, Riemann integration was not far from the mark; however, if you rotate the problem slightly in your mind, you will find a better, cleaner way of structuring your thinking.

Final Thoughts

Although Tao botches a few exercises and the section on topology, I’m a big fan of Analysis I and II. Do note, however, that II is far more difficult than I (not just in content, but in terms of the exercises). He generally provides relevant, appropriately difficult problems, and is quite adept at helping the reader develop rigorous and intuitive understanding of the material.

Tips

  • To avoid getting hung up in Chapter 17, this book should be read after a linear algebra text.
  • Don’t do exercise 17.6.3—it’s wrong.
  • Deep understanding comes from sweating it out. Don’t hide, don’t wave away bothersome details; stay and explore. If you follow my strategy of quickly generating proof outlines, check yourself: can you formally and precisely write out each step?

Verification

I completed every exercise in this book; in the second half, I started avoiding looking at the hints provided by problems until I’d already thought for a few minutes. Often, I’d solve the problem and then turn to the hint: “be careful when doing X—don’t forget edge case Y; hint: use lemma Z”! A pit would form in my stomach as I prepared to locate my mistake and back-propagate where-I-should-have-looked, before realizing that I’d already taken care of that edge case using that lemma.

Why Bother?

One can argue that my time would be better spent picking up things as I work on problems in alignment. However, while I’ve made, uh, quite a bit of progress with impact measures this way, concept-shaped holes are impossible to notice. If there’s some helpful information-theoretic way of viewing a problem that I’d only realize if I had already taken information theory, I’m out of luck.

Also, developing mathematical maturity brings with it a more rigorous thought process.

Fairness

There’s a sense I get where even though I’ve made immense progress over the past few months, it still might not be enough. The standard isn’t “am I doing impressive things for my reference class?”, but rather the stricter “am I good enough to solve serious problems that might not get solved in time otherwise?”. This standard is hard to meet, and even given my textbook and research progress (including the upcoming posts), I don’t think I meet it.

In a way, this excites me. I welcome any advice for buckling down further and becoming yet stronger.

Thanks

Thank you to everyone who has helped me. In particular, TheMajor has been incredibly generous with their explanations and encouragement.


Thoughts? Email me at alex@turntrout.com (pgp)