What is functional analysis? A satisfactory answer requires going back to where it all started.

Once upon a time…

“All are present; the meeting convenes,” intoned Fredholm. Intent were the gathered faces, their thoughts fixed on their students. “What do we know of their weaknesses?”

Hilbert leaned back, torch’s light flickering across his features. “Lots of dimensions, especially when they need to find the Hessian. What if… what if we made them deal with infinitely many dimensions?”…

It was Banach who finally spoke. “David, they already know about the vector space for the polynomials.”

Hilbert smirked. “Who said anything about countably infinite?” More silence, then glances, then grins.

It was Riesz’s voice which next broke the silence. “And we can make them do analysis in that space. And linear algebra, but not the easy parts. Of course, they’ll need to also deal with complex numbers. Sprinkle a little topology and abstract algebra on top, because… they deserve—“

“Frigyes, some of them might actually be able to do that. We need more.” After a pause, Fredholm continued: “We’ll tell them that they only need to know basic calculus.”

A Friendly Approach to Functional Analysis

I didn’t actually find the book overly hard (it took me seven days to complete, which is how long it took for my first book, Naïve Set Theory), although there were some parts I skipped due to unclear exposition. It’s actually one of my favorite books I’ve read in a while; it’s for sure my favorite since the last one. That said, I’m glad I didn’t attempt this early in my book-reading journey.

My brain won’t stop lying to me

Some part of me insisted that the left-shift mapping:

$$(x_1, x_2, \ldots)\mapsto (x_2, x_3, \ldots) : \ell^{\infty}\to \ell^{\infty}$$

is “non-linear” because it incinerates $x_1$! But wait, brain, this totally is linear, and it’s also continuous with respect to the ambient supremum norm!

Formally, a map $T$ is linear when $T(\alpha x + \beta y)=\alpha T(x) +\beta T(y)$ for all vectors $x, y$ and all scalars $\alpha, \beta$.

Informally, linearity is about being able to split a problem into small parts which can be solved individually. It doesn’t have to “look like a line,” or something. In fact, lines¹ $y=mx$ are linear because putting in $\Delta x$ more $x$ gets you $m\cdot \Delta x$ more $y$!
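Here is a minimal Python sketch of the left-shift claim, under my own assumptions: a sequence is modelled as a function from the index $n \geq 1$ to its $n$-th term, and the helper names (`left_shift`, `combine`, `prefix`) are hypothetical. It only compares a finite prefix of two sample bounded sequences, so it is a sanity check rather than a proof.

```python
from fractions import Fraction

def left_shift(x):
    """Drop the first term: (x_1, x_2, ...) -> (x_2, x_3, ...)."""
    return lambda n: x(n + 1)

def combine(alpha, x, beta, y):
    """Termwise linear combination alpha*x + beta*y."""
    return lambda n: alpha * x(n) + beta * y(n)

def prefix(x, count=10):
    """First `count` terms of a sequence indexed from 1."""
    return [x(n) for n in range(1, count + 1)]

# Two sample bounded sequences (illustrative choices).
x = lambda n: Fraction(1, n)
y = lambda n: Fraction((-1) ** n)
alpha, beta = Fraction(3), Fraction(-2, 5)

# Linearity: shifting a combination equals combining the shifts.
lhs = prefix(left_shift(combine(alpha, x, beta, y)))
rhs = prefix(combine(alpha, left_shift(x), beta, left_shift(y)))
linear_on_prefix = lhs == rhs

# Dropping a term can only shrink the supremum (checked on the prefix).
sup_before = max(abs(v) for v in prefix(x))
sup_after = max(abs(v) for v in prefix(left_shift(x)))
```

Dropping $x_1$ can never increase the supremum, which is the inequality $\|Sx\|_\infty \leq \|x\|_\infty$ behind the continuity claim (writing $S$ for the shift).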

Linearity and continuity

Two things surprised me.

First, a(n infinite-dimensional) linear function can be discontinuous. 🤨

Second, a linear function $T$ is continuous if and only if it is bounded; that is, there is an $M>0$ such that $\forall x,x_0: \|T(x-x_0)\| \leq M \|x-x_0\|$.

  • The if is easy: this is just Lipschitz continuity, which obviously implies normal continuity.
  • The other direction follows because continuity at $0$ implies that for $\epsilon := 1$, there is a $\delta>0$ with $\|Tx\|\leq 1$ whenever $\|x\|\leq\delta$. Rescaling any nonzero $x$ to norm $\delta$ and applying linearity then gives $\|Tx\|\leq \frac{1}{\delta}\|x\|$.
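To make the first surprise concrete, here is a sketch of one standard unbounded linear map: differentiation on polynomials over $[0,1]$ with the supremum norm. The example is my own choice for illustration and not necessarily the one the book uses; the function names are hypothetical, and the supremum is approximated on a grid. The monomial $t^n$ has norm $1$ while its derivative $n t^{n-1}$ has norm $n$, so no single $M$ can work.

```python
def derivative(coeffs):
    """Differentiate a polynomial given as [c0, c1, c2, ...] (c_k multiplies t**k)."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def evaluate(coeffs, t):
    """Evaluate the polynomial at t."""
    return sum(c * t ** k for k, c in enumerate(coeffs))

def sup_norm(coeffs, samples=1001):
    """Approximate the sup norm on [0, 1] using a uniform grid with both endpoints."""
    return max(abs(evaluate(coeffs, i / (samples - 1))) for i in range(samples))

def monomial(n):
    """Coefficient list of t**n."""
    return [0] * n + [1]

# Ratio ||p'|| / ||p|| for p(t) = t**n grows without bound as n grows.
ratios = []
for n in (1, 5, 25, 125):
    p = monomial(n)
    ratios.append(sup_norm(derivative(p)) / sup_norm(p))
```

The ratios grow like $n$, so differentiation is linear but not bounded in this norm, and therefore not continuous.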

What the hell are functional derivatives?

Derivatives tell you how quickly a function is changing in each input dimension. In single-variable calculus, the derivative of a function $f:\mathbb{R}\to\mathbb{R}$ is a function $f':\mathbb{R}\to\mathbb{R}$.

In multi-variable calculus, the derivative of a function $g:\mathbb{R}^n\to\mathbb{R}$ is a function $g':\mathbb{R}^n\to\mathbb{R}^{n}$: for a given $n$-dimensional input vector, the real-valued output of $g$ can change differently depending on the input dimension in which the change occurs.

You can go even further and consider the derivative of $h:\mathbb{R}^n\to\mathbb{R}^m$, which is the function $h':\mathbb{R}^n\to\mathbb{R}^{n\times m}$: for a given $n$-dimensional input vector, $h$ again can change its vector-valued output differently depending on the input dimension in which the change occurs.
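As a rough illustration of those shapes, here is a finite-difference sketch for a made-up map $h:\mathbb{R}^2\to\mathbb{R}^3$. The function `jacobian`, the sample `h`, and the step size are all assumptions for this example. The layout follows the $n\times m$ convention used above (one row per input dimension); many texts use the transposed $m\times n$ layout instead.

```python
def jacobian(h, x, step=1e-6):
    """Central-difference Jacobian; entry [i][j] approximates d h_j / d x_i."""
    rows = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += step
        backward[i] -= step
        h_forward, h_backward = h(forward), h(backward)
        rows.append([(a - b) / (2 * step) for a, b in zip(h_forward, h_backward)])
    return rows

def h(v):
    """Sample map from R^2 to R^3 (illustrative only)."""
    x, y = v
    return [x * y, x + y, x * x]

# At (2, 3) the exact answer is [[3, 1, 4], [2, 1, 0]].
J = jacobian(h, [2.0, 3.0])
```

Each row answers "how does every output change if I nudge this one input?", which is the reading the function-space version generalizes.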

What if we want to differentiate the following function $L$, with range $\mathbb{R}$ and domain the set of continuous functions on $[a,b]$, $C[a,b]$:

$$L(\mathbf{f}) := \int_{0}^{1} (\mathbf{f}(t))^{2}\, dt.$$

How do you differentiate with respect to a function? I’m going to claim that:

$$L'_{\mathbf{f}}(\mathbf{g})=\int_{0}^{1} 2\mathbf{f}(t)\mathbf{g}(t)\, dt.$$

It’s not clear why this is true, or what it even means. Here’s an intuition: at any given point, there are uncountably many partial derivatives in the function space $C[a,b]$; there are many, many “directions” in which we could “push” a function $\mathbf{f}$ around. $L'_{\mathbf{f}}(\mathbf{g})$ gives us the partial derivative at $\mathbf{f}$ with respect to $\mathbf{g}$.

This concept is important because it’s what you use to prove e.g. that a line is the shortest continuous path between two points.
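For readers who want to see where the claimed formula comes from, here is a short worked derivation. It assumes the directional definition that the exchange below arrives at (differentiate $\lambda \mapsto L(\mathbf{f}+\lambda \mathbf{g})$ at $\lambda = 0$) and takes $\lambda$ to be real.

$$
\begin{aligned}
L(\mathbf{f}+\lambda \mathbf{g}) &= \int_{0}^{1} \big(\mathbf{f}(t)+\lambda \mathbf{g}(t)\big)^{2}\, dt \\
&= L(\mathbf{f}) + \lambda \int_{0}^{1} 2\mathbf{f}(t)\mathbf{g}(t)\, dt + \lambda^{2}\int_{0}^{1} (\mathbf{g}(t))^{2}\, dt, \\
\frac{L(\mathbf{f}+\lambda \mathbf{g}) - L(\mathbf{f})}{\lambda} &= \int_{0}^{1} 2\mathbf{f}(t)\mathbf{g}(t)\, dt + \lambda\int_{0}^{1} (\mathbf{g}(t))^{2}\, dt \;\xrightarrow{\;\lambda\to 0\;}\; \int_{0}^{1} 2\mathbf{f}(t)\mathbf{g}(t)\, dt.
\end{aligned}
$$

The $\lambda^2$ term is the only part that does not survive the limit, which mirrors how $(x^2)' = 2x$ falls out of $(x+\lambda)^2$ in single-variable calculus.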

Below is an exchange between me and TheMajor, reproduced and slightly edited with permission:

Alex
I’m having trouble understanding functional derivatives. I’m used to thinking about derivatives as with respect to time, or with respect to variations along the input dimensions. But when I think about a derivative on function space, I’m not sure what the “time” is, even though I can think about the topology and the neighborhoods around a given function. And I know the answer is that there isn’t “time,” but I’m not sure what there is.

An interesting concept that comes to mind is thinking about a functional derivative with respect to e.g. a straight-line homotopy, where you really could say how a function is changing at every point with respect to time. But I don’t think that’s the same concept.

TheMajor
The concept is as follows:
Let’s say we have some (a priori non-linear) map $L$, which takes a function as an input and gives a number as an output. i.e. it maps from a vector space $X$ of functions to the complex numbers $\mathbb{C}$. Now fix a function $f\in X$, and a second function $g\in X$. We can then consider the 1-dimensional linear subspace $f + \mathbb{C}g := \{f + \lambda g: \lambda \in \mathbb{C}\}$. The map $L$ on this subspace is just a normal map, and if it is differentiable at the point $f$ in this subspace then its derivative is called the functional derivative of $L$ at $f$ with respect to $g$.
Alex
By normal map, is that something like a normal operator?
TheMajor
sorry, I didn’t mean normal in a technical context. Since the subspace I introduced is one-dimensional (as a complex vector space), and it maps to the complex numbers as well, we have good old introduction to complex analysis derivatives here. If you like you can work with reals instead of complex variables too, in which case it would be the familiar real derivative.
Alex
Wouldn’t it still output a function, $g'$ maybe? wait. Would the derivative with respect to $\lambda$ just be $g$?
TheMajor
there is no derivative with respect to $\lambda$.
Alex
ah ya. duh (my brain was still acting as if differentiation had to be from the real numbers to the real numbers, so it searched for a real / complex number in the problem formalization and found $\lambda$.)
TheMajor
let me know if this part is clear, because unfortunately it’s the next few steps where it gets really confusing.
Alex
Unfortunately, I don’t think it’s clear yet. So I see how this is a one-dimensional subspace,² because it’s generated by one basis function ($g$). However, I don’t see how this translates to a normal complex derivative; in particular, I don’t quite understand what the range of this function is.
TheMajor
No problem, and it’s very good that you share that it’s unclear. The range of $L$ is the complex numbers, $L$ maps from $X$ (our vector space of functions) to $\mathbb{C}$ (the complex numbers).
Alex
I guess I’m confused why we’re using that type signature if we’re taking a derivative on the whole function—but maybe that’ll be clear after I get the rest.
TheMajor
that is exactly the heart of the confusion surrounding functional derivatives, and we’ll have to get there in a few steps. we’ll start with defining functional derivatives for easy maps, i.e. the ones that take on complex values, and then work towards more complicated settings. so back to the example above; we have a vector space $X$ (our ‘function space’), we have a (possibly non-linear) map $L: X \to \mathbb{C}$. we will now introduce the derivative of $L$ at $f$ with respect to $g$, with $f,g\in X$. This derivative is just a complex number.

To find this we consider the 1-dimensional subspace $f + \mathbb{C}g$ that I introduced above, and we note that the map from $\mathbb{C}$ to this subspace, given by $\lambda\mapsto f + \lambda g$, is a bijection that goes through $f$ at 0. this gives us a map from $\mathbb{C}$ to $\mathbb{C}$, by sending $\lambda$ to $L(f+\lambda g)$. We take the derivative of that at $\lambda = 0$, and that is the derivative of $L$ at $f$ with respect to $g$.

Alex
Okay, that makes sense so far.
TheMajor
Nice 😃 this map has a few properties that I just want to remark and then ignore. For example it need not be linear in $f$ (which makes sense, since $f$ is only the point we’re evaluating at). And by doing some work with chain rules it does have some linear properties in $g$. now there are two ways in which we can make this story complicated again, and most authors do both simultaneously.

Firstly we can try to extend the “derivative of $L$ at $f$ with respect to $g$” to something like “derivative of $L$ at $f$.” We’ll do this first. Secondly we can try to take a different map, say $M$, which maps from $X$ into another vector space $Y$ (instead of the complex numbers). We can then try and define a derivative of $M$ at $f$ with respect to $g$.

The first step is conceptually simple, but formally and computationally very difficult. Given a point $f\in X$ and our map $L$ from before, we can simply say that “the derivative of $L$ at $f$” is the map that sends $g \in X$ to “the derivative of $L$ at $f$ with respect to $g$.” So “the derivative of $L$ at $f$” is a map from $X$ to $\mathbb{C}$.

this is formally difficult because usually you want this derivative to have some nice properties, but because it was defined pointwise it’s very difficult to establish this! Frequently these derivatives are not continuous, and mathematicians resort to horrible tricks (like throwing out a bunch of points of the domain $X$ on which our derivative is annoying) to recover some structure here.

Alex
So, given some arbitrary function $L : X \to\mathbb{C}$ which is “differentiable” at $f$, we define a function $L'_{f}: g \mapsto$ (derivative of $L$ at $f$ with respect to $g$)?
TheMajor
yes, exactly.
Alex
You could even maybe think of each input $g$ as projecting the derivative of $L$ at $f$? Or specifying one of many possible directions.
TheMajor

Yes, this is 100% correct. This is related to the “nice linear properties in $g$” that I mentioned above.

I also stated that this is computationally difficult. This is actually quite funny: the best way to find “The derivative of $L$ at $f$” is to take a ‘test function’ $g \in X$ (arbitrarily), compute (the derivative of $L$ at $f$ with respect to $g$), and then tahdah, you have now found the map that sends $g$ to (the derivative of $L$ at $f$ with respect to $g$), i.e. exactly what you were looking for.

Alex
this sounds pretty computationally easy? Or are you calculating $L'$ for a general test function $g$, in which case, how do you get any nontrivial information out of that?
TheMajor
Yes, you need to calculate it for a general test function. also something that may help with gaining insight: in multivariable calculus (let’s say 2 dimensions, that’s already plenty difficult) there is a clear divide between the [existence of a partial derivative of a function at a point] and [the function being differentiable at that point].
Alex
yeah, because $L'$ has to exist for… all $g$? That seems a little tough.
Edited after posting

Back in my Topology review, I discussed a similar phenomenon: continuity in multiple input dimensions requires not just continuity in each input variable, but in all sequences converging to the point in question:

[Figure: the topological definition of continuity. A function $f$ maps a domain $X$ to a codomain $Y$; a sequence of points $x_n$ converging to $x$ in $X$ is sent to a sequence $f(x_n)$ converging to $f(x)$ in $Y$.]

Continuity in the variables says that paths along the axes converge in the right way. But for continuity overall, we need all paths to converge in the right way. Directional continuity when the domain is $\mathbb{R}$ is a special case of this: continuity from below and from above if and only if continuity for all sequences converging topologically to $x$.

Similarly, for a function to be differentiable, the existence of all of its partial derivatives isn’t enough: you need derivatives for every possible approach to the point in question. Here, the existence of all of the partials does cover every straight-line direction of approach, because there’s a partial for every function $g$.

here we have the same, except we have (in an infinite-dimensional function space $X$) infinitely many “partial derivatives.” so from that point of view it’s not that surprising that a function “having a derivative at $f$” is actually quite rare / complicated.
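The two-dimensional divide mentioned above can be made concrete with a classic counterexample, sketched here in Python. The function $f(x,y) = \frac{x^2 y}{x^4+y^2}$ (with $f(0,0)=0$) is my choice of illustration and does not come from the exchange; the helper names are hypothetical. Every straight-line difference quotient at the origin settles down, yet along the parabola $y=x^2$ the function sits at $\tfrac12$, so it is not even continuous there.

```python
def f(x, y):
    """Classic example: all directional derivatives exist at 0, but f is not continuous at 0."""
    if x == 0 and y == 0:
        return 0.0
    return x * x * y / (x ** 4 + y * y)

def directional_quotient(a, b, t):
    """Difference quotient of f at the origin along direction (a, b) with step t."""
    return (f(t * a, t * b) - f(0.0, 0.0)) / t

# Along straight lines the quotient tends to a*a/b (or 0 when b == 0).
quotients = {
    (1.0, 1.0): directional_quotient(1.0, 1.0, 1e-4),
    (1.0, 2.0): directional_quotient(1.0, 2.0, 1e-4),
    (1.0, 0.0): directional_quotient(1.0, 0.0, 1e-4),
}

# Along the curved path y = x**2 the value stays near 1/2 instead of tending to f(0, 0) = 0.
along_parabola = [f(t, t * t) for t in (1e-1, 1e-2, 1e-3)]
```

Straight lines are only some of the ways to approach a point, which is why "a partial in every direction" and "differentiable" come apart.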

TheMajor
It exists for all $g$, and then $L'_f$ exists as a formal map. But usually you want something stronger, for example that $L'_f: X\to \mathbb{C}$ is continuous.

as an important but relatively trivial aside: if $L$ is a linear map, then $L'_f$ does not actually depend on $f$. So usually it is just called “the derivative of $L$” instead of “the derivative of $L$ at $f$.” This is confusing, because for non-linear $L$ there is also something called “the derivative of $L$,” namely “the map that sends $f$ to [the derivative of $L$ at $f$].”

Alex
hm. That’s because of the definition of linearity, right? it’s a homomorphism for both the operations of addition and scalar multiplication… Wait, I intuitively understand why linearity means it’s the same everywhere, but I’m having trouble coming up with the formal justification…
TheMajor
Yes, the point is to look at the definition of “derivative of $L$ at $f$ with respect to $g$,” which is given by $\lim_{\lambda\to 0}\frac{L(f + \lambda g) - L(f)}{\lambda}$.
Alex
ah, got it!
TheMajor
ok, so this was all the first way to make it confusing again. Ready for the second?
Alex
I’m ready to be reconfused.
TheMajor
Ok, so now let’s pick a range not inside the complex numbers $\mathbb{C}$, but inside a second normed vector space $Y$. So we have a map $M: X\to Y$, not necessarily linear. Again fix points $f, g\in X$. We are going to define the derivative of $M$ at $f$ with respect to $g$. so we repeat our trick from before, consider the map from $\mathbb{C}$ via $X$ to $Y$ given by $\lambda\mapsto M(f + \lambda g)$. We wish to differentiate it at $\lambda = 0$.

unfortunately, its image is now in $Y$, not in $\mathbb{C}$, so we don’t really know what the derivative means. But because $Y$ is a normed vector space, the expression $\frac{M(f + \lambda g) - M(f)}{\lambda}$ makes sense for all non-zero $\lambda$.

if this function can be continuously extended to $\lambda = 0$ then we define its image at 0 as the derivative of $M$ at $f$ with respect to $g$. Note that this notion of continuity has to do with the norm of $Y$.

this is now a vector in $Y$, so if this works we have: [the derivative of $M$ at $f$ with respect to $g$] which is an element of $Y$, [the derivative of $M$ at $f$] which is a (linear! usually horrible and not continuous!) map from $X$ to $Y$.

btw if the “continuously extending” part is new, you can also just think of it as the limit of that fraction as $\lambda$ approaches 0. The only point is that (as long as we’re working with complex vector spaces) there are a lot of different ways for $\lambda$ to approach 0, and it has to work for all of them.

if we’re working over the reals it’s simply the notion of “right limit” and “left limit” (the only two ways to approach 0 in $\mathbb{R}$) that you may have seen before, except that the convergence is now happening in $Y$.
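The scalar-valued recipe from the exchange (differentiate $\lambda\mapsto L(f+\lambda g)$ at $\lambda=0$) can be sanity-checked numerically. The sketch below is an illustration under my own assumptions: it works over the reals, replaces the limit with a symmetric difference quotient, approximates integrals with a midpoint rule, and uses hypothetical names (`integrate`, `directional_derivative`, `linear_functional`) and sample functions. It compares the quotient for $L(f)=\int_0^1 f(t)^2\,dt$ against the claimed $\int_0^1 2f(t)g(t)\,dt$, and checks that for a linear functional the answer does not depend on $f$.

```python
import math

def integrate(func, a=0.0, b=1.0, panels=2000):
    """Composite midpoint rule on [a, b]."""
    width = (b - a) / panels
    return width * sum(func(a + (i + 0.5) * width) for i in range(panels))

def L(f):
    """The functional L(f) = integral over [0, 1] of f(t)**2."""
    return integrate(lambda t: f(t) ** 2)

def linear_functional(f):
    """A sample linear functional: integral over [0, 1] of t * f(t)."""
    return integrate(lambda t: t * f(t))

def directional_derivative(functional, f, g, lam=1e-5):
    """Symmetric difference quotient of lam -> functional(f + lam * g) at lam = 0."""
    plus = functional(lambda t: f(t) + lam * g(t))
    minus = functional(lambda t: f(t) - lam * g(t))
    return (plus - minus) / (2 * lam)

f, g = math.sin, math.exp

# Nonlinear case: the quotient should match the integral of 2 * f * g.
numeric = directional_derivative(L, f, g)
claimed = integrate(lambda t: 2 * f(t) * g(t))

# Linear case: the derivative at any f in direction g is just the functional applied to g.
at_sin = directional_derivative(linear_functional, math.sin, g)
at_cos = directional_derivative(linear_functional, math.cos, g)
applied_to_g = linear_functional(g)
```

The linear case is the formal justification asked about earlier: for linear $L$, the quotient $\frac{L(f+\lambda g)-L(f)}{\lambda}$ equals $L(g)$ for every non-zero $\lambda$, so $f$ drops out entirely.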

Other notes

  • The operator norm is really cool.
  • Linear combinations always involve finitely many terms, but using the orthonormal basis of an infinite dimensional space, you can take the limit as $n\to \infty$.
  • I was really happy to see watered-down versions of symmetry / conservation law correspondences (aka Noether’s theorem). Can’t wait to learn the real version.

Final thoughts

The book is pretty nice overall, with some glaring road bumps—apparently, the Euler-Lagrange equation is one of the most important equations of all time, and Sasane barely spends any effort explaining it to the reader!

And if I didn’t have the help of TheMajor, I wouldn’t have understood the functional derivative, which, in my opinion, was the profoundly important insight I got from this book. My models of function space structure feel qualitatively improved. I can look at a Fourier transform and see what it’s doing—I can feel it, to an extent. Without a doubt, that single insight makes it all worth it.

Forward

I’m probably going to finish up an epidemiology textbook, before moving on to complex analysis, microeconomics, or… something else—who knows!


Thoughts? Email me at alex@turntrout.com (pgp)

Footnotes

  1. Lines $y=mx+b$ ($b\neq 0$) aren’t actually linear functions, because they don’t go through the origin. Instead, they’re affine.

  2. To be more specific, $f + \mathbb{C}g := \{f + \lambda g: \lambda \in \mathbb{C}\}$ is often an affine subspace, because the zero function is not necessarily a member.