Some Notes on Metric Spaces

This post contains some summary informal notes of key ideas from my reading of Mícheál Ó Searcóid’s Metric Spaces (Springer, 2007). These notes are here as a reference for me, my students, and any others who may be interested. They are by no means exhaustive, but rather cover topics that seemed interesting to me on first reading. By way of a brief book review, it’s worth noting that Ó Searcóid’s approach is excellent for learning a subject. He has a few useful tricks up his sleeve, in particular:

  • Chapters will often start with a theorem proving equivalence of various statements (e.g. Theorem 8.1.1, Criteria for Continuity at a Point). Only then will he choose one of these statements as a definition, and he explains this choice carefully, often via reference to other mathematics.
  • The usual definition-theorem-proof style is supplemented with ‘question’ – these are relatively informally-stated questions and their answers. They have been carefully chosen to highlight some questions the reader might be wondering about at that point in the text and to demonstrate key (and sometimes surprising) answers before the formal theorem statement.
  • The writing is pleasant, even playful at times though never lacking formality. This is a neat trick to pull off.
  • There are plenty of exercises, and solutions are provided.

These features combine to produce an excellent learning experience.

1. Some Basic Definitions

A metric on a set X is a function d : X \times X \to {\mathbb R} such that:

  • Positivity: d(a,b) \geq 0 with equality iff a = b
  • Symmetry: d(a,b) = d(b,a)
  • Triangle inequality: d(a,b) \leq d(a,c) + d(c,b)

The combination of such a metric and a the corresponding set is a metric space.

Given a metric space (X,d), the point function at z is \delta_z : x \mapsto d(z,x).

A pointlike function u : X \to {\mathbb R}^\oplus is one where u(a) - u(b) \leq d(a,b) \leq u(a) + u(b)

For metric spaces (X,d) and (Y,e), X is a metric subspace of Y iff X \subseteq Y and d is a restriction of e.

For metric spaces (X,d) and (Y,e), an isometry \phi : X \to Y is a function such that e(\phi(a),\phi(b)) = d(a,b). The metric subspace (\phi(X),e) is an isometric copy of (X,d).

Some standard constructions of metrics for product spaces:

  1. \mu_1 : (a,b) \mapsto \sum_{i=1}^n \tau_i(a_i,b_i)
  2. \mu_2 : (a,b) \mapsto \sqrt{\sum_{i=1}^n \left(\tau_i(a_i,b_i)\right)^2}
  3. \mu_\infty : (a,b) \mapsto \max\left\{\tau_i(a_i,b_i) | i \in {\mathbb N}\right\}

A conserving metric e on a product space is one where \mu_\infty(a,b) \leq e(a,b) \leq \mu_1(a,b). Ó Searcóid calls these conserving metrics because they conserve an isometric copy of the individual spaces, recoverable by projection (I don’t think this is a commonly used term). This can be seen because fixing elements of all-but-one of the constituent spaces makes the upper and lower bound coincide, resulting in recovery of the original metric.

A norm on a linear space V over {\mathbb R} or {\mathbb C} is a real function such that for x, y \in V and \alpha scalar:

  • ||x|| \geq 0 with equality iff x = 0
  • ||\alpha x|| = |\alpha|\; ||x||
  • ||x + y|| \leq ||x|| + ||y||

The metric defined by the norm is d(a,b) = ||a - b||.

2. Distances

The diameter of a set A \subseteq X of metric space (X,d) is \text{diam}(A) = \sup\{d(r,s) | r, s \in A\}.

The distance of a point x \in X from a set A \subseteq X is \text{dist}(x, A) = \inf\{ d(x,a) | a \in A\}.

An isolated point z \in S where S \subseteq X is one for which \text{dist}(z, S \setminus \{z\}) \neq 0.

An accumulation point or limit point z \in X of S \subseteq X is one for which \text{dist}(z, S \setminus \{z\}) = 0. Note that z doesn’t need to be in S. A good example is z = 0, X = {\mathbb R}, S = \{1/n | n \in {\mathbb N}\}.

The distance from subset A to subset B of a metric space is defined as \text{dist}(A,B) = \inf\{ d(a,b) | a \in A, b \in B\}.

A nearest point s \in S of S to z \in X is one for which d(z,s) = \text{dist}(z,S). Note that nearest points don’t need to exist, because \text{dist} is defined via the infimum. If a metric space is empty or admits a nearest point to each point in every metric superspace, it is said to have the nearest-point property.

3. Boundaries

A point a is a boundary point of S in X iff \text{dist}(a,S) = \text{dist}(a,S^c) = 0. The collection of these points is the boundary \partial S.

Metric spaces with no proper non-trivial subset with empty boundary are connected. An example of a disconnected metric space is X = (0,1) \cup (7,8) as a metric subspace of {\mathbb R}, while {\mathbb R} itself is certainly connected.

Closed sets are those that contain their boundary.

The closure of S in X is \bar{S} \triangleq X \cup \partial S. The interior is S \setminus \partial S. The exterior is (\bar{S})^c.

Interior, boundary, and exterior are mutually disjoint and their union is X.

4. Sub- and super-spaces

A subset S \subseteq X is dense in X iff \bar{S} = X, or equivalently if for every x \in X, \text{dist}(x,S) = 0. The archetypal example is that \mathbb{Q} is dense in \mathbb{R}.

A complete metric space X is one that is closed in every metric superspace of X. An example is \mathbb{R}.

5. Balls

Let b[a;r) = \{ x \in X | d(a,x) < r \} denote an open ball and similarly b[a;r] = \{ x \in X | d(a,x) \leq r \} denote a closed ball. In the special case of normed linear spaces, b[a;r) = a + rb[0;1) and similarly for closed balls, so the important object is this unit ball – all others have the same shape. A norm on a space V is actually defined by three properties such balls U must have:

  • Convexity
  • Balanced (i.e. x \in U \Rightarrow -x \in U)
  • For each x \in V \setminus \{0\}, the set \{ t \in \mathbb{R}^+ | t x \in U \},
    • is nonempty
    • must have real supremum s
    • sx \notin U

6. Convergence

The mth tail of a sequence x = (x_n) is the set \mbox{tail}_m(x) = \{x_m | n \in {\mathbb N}, n \geq m \}.

Suppose X is a metric space, z \in X and x= (x_n) is a sequence in X. Sequence x converges to z in X, denoted x_n \to z iff every open subset of X that contains z includes a tail of x. In this situation, z is unique and is called the limit of the sequence, denoted \mbox{lim }x_n.

It follows that for (X,d) a metric space, z \in X and (x_n) a sequence in X, the sequence (x_n) converges to z in X iff the real sequence (d(x_n,z))_{n \in \mathbb{N}} converges to 0 in {\mathbb R}.

For real sequences, we can define the:

  • limit superior, \mbox{lim sup } x_n = \mbox{inf } \{ \mbox{sup } \mbox{tail}_n(x) | n \in \mathbb{N} \} and
  • limit inferior, \mbox{lim inf } x_n = \mbox{sup } \{ \mbox{inf } \mbox{tail}_n(x) | n \in \mathbb{N} \}.

It can be shown that x_n \to z iff \mbox{lim sup } x_n = \mbox{lim inf } x_n = z.

Clearly sequences in superspaces converge to the same limit – the same is true in subspaces if the limit point is in the subspace itself. Sequences in finite product spaces equipped with product metrics converge in the product space iff their projections onto the individual spaces converge.

Every subsequence of a convergent sequence converges to the same limit as the parent sequence, but the picture for non-convergent parent sequences is more complicated, as we can still have convergent subsequences. There are various equivalent ways of characterising these limits of subsequences, e.g. centres of balls containing an infinite number of terms of the parent sequence.

A sequence (x_n) is Cauchy iff for every r \in \mathbb{R}^+, there is a ball of radius r that includes a tail of (x_n). Every convergent sequence is Cauchy. The converse is not true, but only if the what should be the limit point is missing from the space — adding this point and extending the metric appropriately yields a convergent sequence. It can be shown that a space is complete (see above for definition) iff every Cauchy sequence is also a convergent sequence in that space.

7. Bounds

A subset S of a metric space X is a bounded subset iff S = X = \emptyset or S is included in some ball of X. A metric space X is bounded iff it is a bounded subset of itself. An alternative characterisation of a bounded subset S is that it has finite diameter.

The Hausdorff metric is defined on the set S(X) of all non-empty closed bounded subsets of a set X equipped with metric d. It is given by h(A,B) = \max \{ \sup\{ \mbox{dist}(b, A) | b \in B\}, \sup\{ \mbox{dist}(a, B) | a \in A\} \}.

Given a set X and a metric space Y, f : X \to Y is a bounded function iff f(X) is a bounded subset of Y. The set of bounded functions from X to Y is denoted B(X,Y). There is a standard metric on bounded functions, s(f,g) = \sup \{ e(f(x),g(x)) | x \in X \} where e is the metric on Y.

Let X be a nonempty set and Y be a nonempty metric space. Let (f_n) be a sequence of functions from X to Y and g: X \to Y. Then:

  • (f_n) converges pointwise to g iff (f_n(z)) converges to g(z) for all z \in X
  • (f_n) converges uniformly to g iff \sup\{ e(f_n(x),g(x)) | x \in X \} is real for each n \in {\mathbb N} and the sequence ( \sup\{ e(f_n(x),g(x) | x \in X \})_{n \in {\mathbb N}} converges to zero in {\mathbb R}.

It’s interesting to look at these two different notions of convergence because the second is stronger. Every uniformly-convergent sequence of functions converges pointwise, but the converse is not true. An example is the sequence f_n : \mathbb{R}^+ \to \mathbb{R} given by f_n(x) = 1/nx. This converges pointwise but not uniformly to the zero function.

A stronger notion than boundedness is total boundedness. A subset S of a metric space X is totally bounded iff for each r \in {\mathbb R}^+, there is a finite collection of balls of X of radius r that covers S. An example of a bounded but not totally bounded subset is any infinite subset of a space with the discrete metric. Total boundedness carries over to subspaces and finite unions.

Conserving metrics play an important role in bounds, allowing bounds on product spaces to be equivalent to bounds on the projections to the individual spaces. This goes for both boundedness and total boundedness.

8. Continuity

Given metric spaces X and Y, a point z \in X and a function f: X \to Y, the function is said to be continuous at z iff for each open subset V \subseteq Y with f(z) \in V, there exists and open subset U of X with z \in U such that f(U) \subseteq V.

Extending from points to the whole domain, the function is said to be continuous on X iff for each open subset V \subseteq Y, f^{-1}(V) is open in X.

Continuity is not determined by the codomain, in the sense that a continuous function is continuous on any metric superspace of its range. It is preserved by function composition and by restriction.

Continuity plays well with product spaces, in the sense that if the product space is endowed with a product metric, a function mapping into the product space is continuous iff its compositions with the natural projections are all continuous.

For (X,d) and (Y,e) metric spaces, \mathcal{C}(X,Y) denotes the metric space of continuous bounded functions from X to Y with the supremum metric (f,g) \mapsto \sup\{ e(g(x),f(x)) | x \in X \}. \mathcal{C}(X,Y) is closed in the space of bounded functions from X to Y.

Nicely, we can talk about convergence using the language of continuity. In particular, let X be a metric space, and \tilde{\mathbb{N}} = \mathbb{N} \cup \{ \infty \}. Endow \tilde{\mathbb{N}} with the inverse metric (a,b) \mapsto |a^{-1} - b^{-1} | for a,b \in {\mathbb N}, (n,\infty) \mapsto n^{-1} and (\infty, \infty) \mapsto 0. Let \tilde{x} : \tilde{\mathbb{N}} \to X. Then \tilde{x} is continuous iff the sequence (x_n) converges in X to x_{\infty}. In particular, the function extending each convergent sequence with its limit is an isometry from the space of convergent sequences in X to the metric space of continuous bounded functions from \tilde{\mathbb{N}} to X.

9. Uniform Continuity

Here we explore increasing strengths of continuity: Lipschitz continuity > uniform continuity > continuity. Ó Searcóid also adds strong contractions into this hierarchy, as the strongest class studied.

Uniform continuity requires the \delta in the epsilon-delta definition of continuity to extend across a whole set. Consider metric spaces (X,d) and (Y,e), a function f : X \to Y, and a metric subspace S \subseteq X. The function f is uniformly continuous on S iff for every \epsilon \in \mathbb{R}^+ there exists a \delta \in \mathbb{R}^+ s.t. for every x, z \in S for which d(z,x) < \delta, it holds that e( f(z), f(x) ) < \epsilon.

If (X,d) is a metric space with the nearest-point property and f is continuous, then f is also uniformly continuous on every bounded subset of X. A good example might be a polynomial on \mathbb{R}.

Uniformly continuous functions map compact metric spaces into compact metric spaces. They preserve total boundedness and Cauchy sequences. This isn’t necessarily true for continuous functions, e.g. x \mapsto 1/x on (0,1] does not preserve the Cauchy property of the sequence (1/n).

There is a remarkable relationship between the Cantor Set and uniform continuity. Consider a nonempty metric space (X,d). Then X is totally bounded iff there exists a bijective uniformly continuous function from a subset of the Cantor Set to X. As Ó Searcóid notes, this means that totally bounded metric spaces are quite small, in the sense that none can have cardinality greater than that of the reals.

Consider metric spaces (X,d) and (Y,e) and function f: X \to Y. The function is called Lipschitz with Lipschitz constant k \in \mathbb{R}^+ iff e( f(a), f(b) ) \leq k d(a,b) for all a, b \in X.

Note here the difference to uniform continuity: Lipschitz continuity restricts uniform continuity by describing a relationship that must exist between the \epsilons and \deltas – uniform leaves this open. A nice example from Ó Searcóid of a uniformly continuous non-Lipschitz function is x \mapsto \sqrt{1 - x^2} on [0,1).

Lipschitz functions preserve boundedness, and the Lipschitz property is preserved by function composition.

There is a relationship between Lipschitz functions on the reals and their differentials. Let I be a non-degenerate intervals of \mathbb{R} and f: I \to \mathbb{R}. Then f is Lipschitz on I iff f' is bounded on I.

A function with Lipschitz constant less than one is called a strong contraction.

Unlike the case for continuity, not every product metric gives rise to uniformly continuous natural projections, but this does hold for conserving metrics.

10. Completeness

Let (X,d) be a metric space and u : X \to \mathbb{R}. The function u is called a virtual point iff:

  • u(a) - u(b) \leq d(a,b) \leq u(a) + u(b) for all a,b \in X
  • \text{inf} \; u(X) = 0
  • 0 \notin u(X)

We saw earlier that a metric space X is complete iff it is closed in every metric superspace of X. There are a number of equivalent characterisations, including that every Cauchy sequence in X converses in X.

Consider a metric space (X,d). A subset of S \subseteq X is a complete subset of X iff (S,d) is a complete metric space.

If X is a complete metric space and S \subseteq X, then S is complete iff S is closed in X.

Conserving metrics ensure that finite products of complete metric spaces are complete.

A non-empty metric space (X,d) is complete iff (\mathcal{S},h) is complete, where \mathcal{S}(X) denotes the collection of all non-empty closed bounded subsets of X and h denotes the Hausdorff metric.

For X a non-empty set and (Y,e) a metric space, the metric space B(X,Y) of bounded functions from X to Y with the supremum metric is a complete metric space iff Y is complete. An example is that the space of bounded sequences in \mathbb{R} is complete due to completeness of \mathbb{R}.

We can extend uniformly continuous functions from dense subsets to complete spaces to unique uniformly continuous functions from the whole: Consider metric spaces (X,d) and (Y,e) with the latter being complete. Let S \subseteq X be a dense subset of X and f : S \to Y be a uniformly continuous function. Then there exists a uniformly continuous function \tilde{f} : X \to Y such that \tilde{f}|_S = f. There are no other continuous extensions of f to X.

(Banach’s Fixed-Point Theorem). Let (X,d) be a non-empty complete metric space and f : X \to X be a strong contraction on X with Lipschitz constant k \in (0,1). Then f has a unique fixed point in X and, for each w \in X, the sequence (f^n(w)) converges to the fixed point. Beautiful examples of this abound, of course. Ó Searcóid discusses IFS fractals – computer scientists will be familiar with applications in the semantics of programming languages.

A metric space (Y,e) is called a completion of metric space (X,d) iff (Y,e) is complete and (X,d) is isometric to a dense subspace of (Y,e).

We can complete any metric space. Let (X,d) be a metric space. Define \tilde{X} = \delta(X) \cup \text{vp}(X) where \delta(X) denotes the set of all point functions in X and \text{vp}(X) denotes the set of all virtual points in X. We can endow \tilde{X} with the metric s given by (u,v) \mapsto \sup\{ |u(x) - v(x)| | x \in X \}. Then \tilde{X} is a completion of X.

Here the subspace (\delta(X),s) of (\tilde{X},s) forms the subspace isometric to (X,d).

11. Connectedness

A metric space X is a connected metric space iff X cannot be expressed as the union of two disjoint nonempty open subsets of itself. An example is \mathbb{R} with its usual metric. As usual, Ó Searcóid gives a number of equivalent criteria:

  • Every proper nonempty subset of X has nonempty boundary in X
  • No proper nonempty subset of X is both open and closed in X
  • X is not the union of two disjoint nonempty closed subsets of itself
  • Either X = \emptyset or the only continuous functions from X to the discrete space \{0,1\} are the two constant functions

Connectedness is not a property that is relative to any metric superspace. In particular, if X is a metric space, Z is a metric subspace of X and S \subseteq Z, then the subspace S of Z is a connected metric space iff the subspace S of X is a connected metric space. Moreover, for a connected subspace X of X with S \subseteq A \subseteq \bar{S}, the subspace A is connected. In particular, \bar{S} itself is connected.

Every continuous image of a connected metric space is connected. In particular, for nonempty S \subseteq \mathbb{R}, S is connected iff S is an interval. This is a generalisation of the Intermediate Value Theorem (to see this, consider the continuous functions f : X \to \mathbb{R}).

Finite products of connected subsets endowed with a product metric are connected. Unions of chained collections (i.e. sequences of subsets whose sequence neighbours are non-disjoint) of connected subsets are themselves connected.

A connected component U of a metric space X is a subset that is connected and which has no proper superset that is also connected – a kind of maximal connected subset. It turns out that the connected components of a metric space X are mutually disjoint, all closed in X, and X is the union of its connected components.

A path in metric space X is a continuous function f : [0, 1] \to X. (These functions turn out to be uniformly continuous.) This definition allows us to consider a stronger notion of connectedness: a metric space X is pathwise connected iff for each a, b \in X there is a path in X with endpoints a and b. An example given by Ó Searcóid of a space that is connected but not pathwise connected is the closure in \mathbb{R}^2 of \Gamma = \{ (x, \sin (1/x) | x \in \mathbb{R}^+ \}. From one of the results above, \bar{\Gamma} is connected because \Gamma is connected. But there is no path from, say, (0,0) (which nevertheless is in \bar{\Gamma}) to any point in \Gamma.

Every continuous image of a pathwise connected metric space is itself pathwise connected.

For a linear space, an even stronger notion of connectedness is polygonal connectedness. For a linear space X with subset S and a, b \in S, a polygonal connection from a to b in X is an n-tuple of points (c_1, \ldots c_n) s.t. c_1 = a, c_n = b and for each i \in \{1, 2, \ldots, n-1\}, \{(1 - t)c_i + t c_{i+1} | t \in [0,1] \} \subseteq S. We then say a space is polygonally connected iff there exists a polygonal connection between every two points in the space. Ó Searcóid gives the example of \{ z \in \mathbb{C} | \; |z|= 1 \} as a pathwise connected but not polygonally connected subset of \mathbb{C}.

Although in general these three notions of connectedness are distinct, they coincide for open connected subsets of normed linear spaces.

12. Compactness

Ó Searcóid gives a number of equivalent characterisations of compact non-empty metric spaces X, some of the ones I found most interesting and useful for the following material include:

  • Every open cover for X has a finite subcover
  • X is complete and totally bounded
  • X is a continuous image of the Cantor set
  • Every real continuous function defined on X is bounded and attains its bounds

The example is given of closed bounded intervals of \mathbb{R} as archetypal compact sets. An interesting observation is given that ‘most’ metric spaces cannot be extended to compact metric spaces, simply because there aren’t many compact metric spaces — as noted above in the section on bounds, there are certainly no more than |\mathbb{R}|, given they’re all images of the Cantor set.

If X is a compact metric space and S \subseteq X then S is compact iff S is closed in X. This follows because S inherits total boundedness from X, and completeness follows also if S is closed.

The Inverse Function Theorem states that for X and Y metric spaces with X compact, and for f : X \to Y injective and continuous, f^{-1}: f(X) \to X is uniformly continuous.

Compactness plays well with intersections, finite unions, and finite products endowed with a product metric. The latter is interesting, given that we noted above that for non conserving product metrics, total boundedness doesn’t necessarily carry forward.

Things get trickier when dealing with infinite-dimension spaces. The following statement of the Arzelà-Ascoli Theorem is given, which allows us to characterise the compactness of a closed, bounded subset of \mathcal{C}(X,Y) for compact metric spaces X and Y:

For each x \in X, define \hat{x}: S \to Y by \hat{x}(f) = f(x) for each f \in S. Let \hat{X} = \{\hat{x} | x \in X \}. Then:

  • \hat{X} \subseteq B(S,Y) and
  • S is compact iff x \to \hat{x} from X to B(S,Y) is continuous

13. Equivalence

Consider a set X and the various metrics we can equip it with. We can define a partial order \succeq on these metrics in the following way. d is topologically stronger than e, d \succeq e iff every open subset of (X,e) is open in (X,d). We then get an induced notion of topological equivalence of two metrics, when d \succeq e and e \succeq d.

As well as obviously admitting the same open subsets, topologically equivalent metrics admit the same closed subsets, dense subsets, compact subsets, connected subsets, convergent sequences, limits, and continuous functions to/from that set.

It turns out that two metrics are topologically equivalent iff the identity functions from (X,d) to (X,e) and vice versa are both continuous. Following the discussion above relating to continuity, this hints at potentially stronger notions of comparability – and hence of equivalence – of metrics, which indeed exist. In particular d is uniformly stronger than e iff the identify function from (X,d) to (X,e) is uniformly continuous. Also, d is Lipschitz stronger than e iff the identity function from (X,d) to (X,e) is Lipschitz.

The stronger notion of a uniformly equivalent metric is important because these metrics additionally admit the same Cauchy sequences, totally bounded subsets and complete subsets.

Lipschitz equivalence is even stronger, additionally providing the same bounded subsets and subsets with the nearest-point property.

The various notions of equivalence discussed here collapse to a single one when dealing with norms. For a linear space X, two norms on X are topologically equivalent iff they are Lipschitz equivalent, so we can just refer to norms as being equivalent. All norms on finite-dimensional linear spaces are equivalent.

Finally, some notes on the more general idea of equivalent metric spaces (rather than equivalent metrics.) Again, these are provided in three flavours:

  • topologically equivalent metric spaces (X,d) and (Y,e) are those for which there exists a continuous bijection with continuous inverse (a homeomorphism) from X to Y.
  • for uniformly equivalent metric spaces, we strengthen the requirement to uniform continuity
  • for Lipschitz equivalent metric spaces, we strengthen the requirement to Lipschitz continuity
  • strongest of all, isometries are discussed above

Note that given the definitions above, the metric space (X,d) is equivalent to the metric space (X,e) if d and e are equivalent, but the converse is not necessarily true. For equivalent metric spaces, we require existence of a function — for equivalent metrics this is required to be the identity.

Book Review: Category Theory for the Sciences

I have been reading Spivak’s book ‘Category Theory for the Sciences’ on-and-off for the last 18 months during vacations as some of my ‘holiday reading.’ I’ve wanted to learn something about category theory for a while, mainly because I feel it’s something that looks fun and different enough to my day-to-day activity not to feel like work. I found Spivak’s book on one of my rather-too-regular bookshop trips a couple of years ago, and since it seemed like a more applied introduction, I thought I’d give it a try. This blog post combines my views on the strengths of this book from the perspective of a mathematically-minded engineer, together with a set of notes on the topic which might be useful for me – and maybe other readers – to refer back to in the future.

Screenshot 2019-08-01 at 07.40.30.png
Category Theory by the Pool, Summer 2019

The book has a well-defined structure. The first few chapters cover topics that should be familiar to applied mathematicians and engineers: sets, monoids, graphs, orders, etc., but often presented in an unusual way. This sets us up for the middle section of the book which presents basic category theory, showing how the structures previously discussed can be viewed as special cases within a more general framework. The final chapters of the book discuss some topics like monads that can then be explored using the machinery of this framework.

Set

Chapter 3 covers the category \mathbf{Set} of sets.

One of the definitions here that’s less common in engineering is that of coproduct. The coproduct of sets X and Y, denoted X \sqcup Y is defined as the disjoint union of X and Y. I first came across disjoint union as a concept when doing some earlier holiday reading on topology. We also here get the first experience of the author’s use of informal ‘slogans’ to capture the ideas present in more formal definitions and theorems: “any time behavior is determined by cases, there is a coproduct involved.” I really like this approach – it’s the kind of marginalia I usually write in books myself, but pre-digested by the author!

The notation \text{Hom}_{\mathbf{Set}}(X,Y) is used for the set of functions from X to Y, which we are encouraged to refer to as morphisms, all without really explaining why at this stage – it of course all becomes clear later in the book.

To give a flavour of the kind of exercises provided, we are asked to prove an isomorphism \text{Hom}_{\text{\bf Set}}(X \sqcup Y, A) \xrightarrow{\cong} \text{Hom}_{\text{\bf Set}}(X, A) \times \text{Hom}_{\text{\bf Set}}(Y, A). This reminds me of digital circuit construction: the former corresponds to a time-multiplexed circuit capable of producing outputs from A with inputs from either X or Y, together with an identifier for which (to feed a multiplexer select line); the latter corresponds to two circuits producing both the outputs rather than just the selected one.

The fibre product, X \times_Z Y, given functions f : X \to Z and g : Y \to Z is defined as the set X \times_Z Y = \{ (x, z, y) | f(x) = z = g(y)\}. Any set isomorphic to this fibre product is called a pullback of the diagram X \xrightarrow{f} Z \xleftarrow{g} Y.

The fibre sum, X \sqcup_W Y, given functions f : W \to X and g : W \to Y is the quotient of X \sqcup W \sqcup Y by the equivalence relation \sim generated by w \sim f(w) and w \sim g(w) for all w \in W. A pushout of X and Y over W is any set Z which is isomorphic to X \sqcup_W Y.

A good intuitive description in the text is to think of the fibre sum of X and Y as gluing the two sets together, with W as the glue. A nice example from one of the exercises is W = \mathbb{N}, X = \mathbb{Z}, Y a singleton set, f : W \to X defined by w \mapsto -(w+1). Then we’re left with X \sqcup_W Y \cong \{ -, 0, 1, \ldots \}, where - is an element corresponding to the collapsing of all negative elements in X to a single point.

The familiar notions of surjective and injective functions are recast as monomorphisms and epimorphisms, respectively. A function f : X \to Y is a monomorphism if for all sets A and pairs of functions g, g' : A \to X, if fg = fg' then g = g'. A function f : X \to Y is an epimorphism if for all sets B and pairs of functions h, h' : Y \to B, if hf = h'f then h = h'. By this time in the book, we get the hint that we’re learning this new language as it will generalise later beyond functions from sets to sets – the chapter ends with various slightly more structured concepts, like set-indexed functions, where the idea is always to define the object and the type of morphism that might make sense for that object.

Monoids, Groups, Graphs, Orders & Databases

In Chapter 4, the reader is introduced to several other structures. With the exception of the databases part, this was familiar material to me, but with slightly unfamiliar presentation.

We learn the classical definition of a monoid and group, and also discuss the free monoid generated by a set X as being the monoid consisting of set \mbox{List}(X) of lists of elements of X, with the empty list as the identity and list concatenation as the operation. We discuss the idea of a monoid action on a set S, i.e. a function \circlearrowleft: M \times S \to S associated with the monoid M having the properties e \circlearrowleft s = s for any s \in M, where e is the identity, and m \circlearrowleft (n \circlearrowleft s) = (m \star n) \circlearrowleft s, where m and n are any elements of M, s is any element of S, and \star is the monoid operation. The discussion of monoids is nice and detailed, and I enjoyed the exercises, especially those leading up to one of the author’s slogans: “A finite state machine is an action of a free monoid on a finite set.”

Graphs are dealt with in a slightly unusual way (for me) as quadruples G := (V, A, \mbox{src}, \mbox{tgt}) of a set of vertices V, a set of arrows A, a functions \mbox{src}, \mbox{tgt} mapping each arrow into its source and destination vertex. This corresponds then to what I would call a multigraph, as parallel edges can exist, though it is called a graph in the book.

For each of the structures discussed, we finish by looking at their morphisms.

As an example, let {\mathcal M} := (M, e, \star) and {\mathcal M'} := (M', e', \star') be monoids (the tuple is of the set, identity element, and operation). Then a monoid homomorphism f: {\mathcal M} \to {\mathcal M'} is a function f: M \to M' preserving identity and operation, i.e. (i) f(e) = e' and f(m_1 \star m_2) = f(m_1) \star' f(m_2) for all m_1, m_2 \in M.

Similarly, for graphs, graph homomorphisms f: G \to G' for G := (V, A, \mbox{src}, \mbox{tgt}) and G' := (V', A', \mbox{src}', \mbox{tgt}') are defined as pairs of functions f_0: V \to V' and f_1: A \to A' preserving the endpoints of arrows, i.e. such that the diagrams below commute.

Order morphisms are similarly defined, this time as those preserving order.

Thus we are introduced to several different (somewhat) familiar mathematical structures, and in each case we’re learning about how to describe morphisms on these structures that somehow preserve the structure.

Categories, Functors, and Natural Transformations

Things really heat up in Chapter 5, where we actually begin to cover the meat of category theory. This is exciting and kept me hooked by the pool. The unusual presentation of certain aspects in previous chapters (e.g. the graphs) all made sense during this chapter.

A category \mathcal{C} is defined as a collection of various things: a set of objects \text{Ob}(\mathcal{C}), a set of morphisms \text{Hom}_{\mathcal{C}}(x,y) for every pair of objects x and y, a specified identity morphism \text{id}_x for every object x, and compositions \circ: \text{Hom}_{\mathcal{C}}(y,z) \times \text{Hom}_{\mathcal{C}}(x,y) \to \text{Hom}_{\mathcal{C}}(x,z). To qualify as a category, such things must satisfy identity laws and also associativity of composition.

Examples of the category of Monoids, of Groups, of Sets, of Graphs, of Pre-Orders, etc., are explored. Isomorphisms between objects in a category are carefully defined.

Functors are introduced as transformations from one category to another, via a function transforming objects and a function transforming morphisms, such that identities and composition and preserved. Functors are then used to rigorously explore the connections between preorders, graphs, and categories, and I finally understood why the author had focused so much on preorders in the previous chapters!

Things started to take a surprising and exciting turn for me when the author shows, via the use of functors, that a monoid is a category with one object (the set of the monoid becomes the morphisms in the category), and that groups can therefore be understood as categories with one object for which each morphism is an isomorphism. This leads to a playful discussion, out of which pops groupoids, amongst other things. Similar gymnastics lead to preorders reconstituted as categories in which hom-sets have one or fewer elements, and graphs reconstituted as functors from a certain two-object category to \mathbf{Set}. Propositional logic is re-examined from a categorical perspective, though I suspect this is just scratching the surface of the interplay between category theory and logic, and I’d love to read more about this. For me, this was a highlight – that by playing with these definitions and trying things out, all these wonderful familiar structures arise.

The final part of this chapter deals with natural transformations. Just as functors transform categories to categories, natural transformations transform functors to functors. Specifically, a natural transformation between functor F : {\mathcal C} \to {\mathcal D} and G : {\mathcal C} \to {\mathcal D} is defined as a morphism (in {\mathcal D}) \alpha_X for each object X in {\mathcal C}, such that the diagram below commutes for every morphism f : X \to Y in {\mathcal C}.

Screenshot 2019-08-01 at 04.52.46

I think this section on natural transformations is where the huge number of exercises in the book really came into its own; I doubt very much I would’ve grasped some of the subtleties of natural transformations without them.

We learn about two kinds of composition of natural transformations. The first combines two natural transformations on functors F,G,H : {\mathcal C} \to {\mathcal D}, say one \alpha: F \to G and one \beta: G \to H to obtain a natural transformation \beta \circ \alpha: F \to H. The other combines a natural transformation \gamma_1 between functors F_1, F_2 : {\mathcal C} \to {\mathcal D} with one \gamma_2 between functors G_1, G_2 : {\mathcal D} \to {\mathcal E}, to produce one, \gamma_2 \Diamond \gamma_1: G_1 \circ F_1 \to G_2 \circ F_2.

An equivalence relation {\mathcal C}' \simeq {\mathcal C} – weaker than isomorphism – is introduced on categories as a functor F : {\mathcal C} \to {\mathcal C}' such that there exists a functor F' : {\mathcal C}' \to {\mathcal C} and natural isomorphisms id_{\mathcal C} \xrightarrow{\cong} F' \circ F and id_{{\mathcal C}'} \xrightarrow{\cong} F \circ F'. This seemed odd to me — the lack of insistence on isomorphism for equivalence — but the examples all make sense. More importantly it is shown that this requirement is sufficient for the on-homomorphism part of the functor F to be bijective for every fixed pair of objects in {\mathcal C}, which feels right. I think my understanding could benefit from further interesting examples of usefully equivalent but non-isomorphic categories though – suggestions welcome!

Limits and Colimits

Chapter 6 is primarily given over to the definitions and discussion of limits and colimits, generalising the ideas in \mathbf{Set} to other categories. The chapter begins by looking again at product and coproduct. As an example, for a category {\mathcal C}, with objects X, Y \in \text{Ob}({\mathcal C}), a span is defined as a triple (Z, p, q) where Z is an object in {\mathcal C} and p : Z \to X and q : Z \to Y are morphisms in {\mathcal C}. A product X \times Y is then defined as as span on X and Y such that for any other span (Z, p, q) on X and Y, there exists a unique morphism t_{p,q} such that the following diagram commutes. Again, I rather like the author’s slogan for this: “none shall map to X and Y except through me!”

Screenshot 2019-10-06 at 15.36.29

The chapter then further generalises these ideas to the the ideas of limit and colimit. This section was a step up in abstraction and also one of the least “applied” parts of the book. It has certainly piqued my interest – what are limits and colimits in various structures I might use / encounter in my work? – but I found the examples here fairly limited and abstract compared to the copious ones in previous chapters. It’s in this chapter that universal properties begin to rear their head in a major way, and again I’m still not totally intuitively comfortable with this idea – I can crank the handle and answer the exercises, but I don’t feel like I’m yet at home with the idea of universal properties.

Categories at Work

In the final section of the book, we first study adjunctions. Given categories {\mathcal B} and {\mathcal A}, an adjunction is defined as a pair of functors L: {\mathcal B} \to {\mathcal A} and R: {\mathcal A} \to {\mathcal B} together with a natural isomorphism \alpha whose component for A \in \text{Ob}({\mathcal A}), B \in \text{Ob}({\mathcal B}) is \alpha_{B,A}: \text{Hom}_{\mathcal A}(L(B), A) \xrightarrow{\cong} \text{Hom}_{\mathcal B}(B, R(A)). Several preceding results and examples can then be re-expressed in this framework, and understood as examples of adjunctions. Adjuctions are also revisited later in the chapter to show a close link with monads.

Categories of functors are discussed, leading up to a discussion of the Yoneda lemma. This is really interesting stuff, as it seems to cross huge boundaries of abstraction. Consider a category {\mathcal C}. Spivak first defines a functor Y_c := \text{Hom}_{\mathcal{C}}(c, -) : {\mathcal C} \to \mathbf{Set} sending objects in {\mathcal C} to the set \text{Hom}_{\mathcal C}(c,d) and similarly for morphisms. This is then used in the following theorem:

Let {\mathcal C} be a category, c \in \text{Ob}(\mathcal{C}) be an object, and I: {\mathcal C} \to \mathbf{Set} be a set-valued functor. There is a natural bijection \text{Hom}_{\mathcal{C}-\mathbf{Set}}(Y_c, I) \xrightarrow{\cong} I(c).

This looks super weird: the right-hand side is just a plain old set and yet the left hand-side seems to be some tower of abstraction. I worked through a couple of examples I dreamt up, namely \mathcal{C} = \mathbf{1} (the category with one object) with I mapping that object to an arbitrary set, and \mathcal{C} = \mathbf{Set} with I = \text{id}_{\mathbf{Set}} and convinced myself this works just fine for those two examples. Unfortunately, the general proof is not covered in the book – readers are referred to Mac Lane – and while some discussion is provided regarding links to databases, I felt like I need a proof to properly grasp what’s going on here under the hood.

Monads are then covered. I’ve briefly seen monads in a very practical context of functional programming, specifically for dealing with I/O in Haskell, but I’ve never looked at them from a theoretical perspective before.

A monad on \mathbf{Set} (monads are only defined in the book on \mathbf{Set}, for reasons not totally apparent to me) is defined as a triple (T, \eta, \mu) consisting of a functor T: \mathbf{Set} \to \mathbf{Set}, a natural transformation \eta: \text{id}_{\mathbf{Set}} \to T, and natural transformation \mu: T \circ T \to T such that the following diagrams commute:

 

Screenshot 2019-12-08 at 14.52.30Screenshot 2019-12-08 at 14.52.41Screenshot 2019-12-08 at 14.52.50

I found this discussion enlightening and relatively straight-forward. Firstly, exception monads are discussed, like Maybe in Haskell, and a nice example of wrapping sets and their functions into power sets and their functions, for which I again enjoyed the exercises. Kleisli categories of monads are defined, and it then becomes apparent why this is such a powerful formalism to think about concepts like program state and I/O.

The Kleisli category \mathbf{Kls}(\mathcal{T}) of a monad \mathcal{T} = (T, \eta, \mu) on \mathbf{Set} is defined as a category with objects \text{Ob}(\mathbf{Kls}(\mathcal{T}) = \text{Ob}(\mathbf{Set}) and morphisms \text{Hom}_{\mathbf{Kls}(\mathcal{T})}(X,Y) = \text{Hom}_{\mathbf{Set}}(X,T(Y)). Identity is \text{id}_X: X \to X in \mathbf{Kls}(\mathcal{T}) is exactly \eta_X in \mathbf{Set}  and composition of morphisms f: X \to Y and g: Y \to Z in \mathbf{Kls}(\mathcal{T}) is given by \mu_z \circ T(g) \circ f in \mathbf{Set}. You can intuitively see the ‘baggage’ of T being carried through in \mathbf{Set} yet hidden from view in \mathbf{Kls}.

Finally, the book ends with a (very) brief discussion of operads. I’ve not come across operads before, and from this brief discussion I’m not really sure what they ‘buy’ me beyond monoidal categories.

General

The exercises are copious and great, especially in the earlier chapters. They kept me engaged throughout and helped me to really understand what I was reading, rather than just think I had. In the later exercises there are many special cases that turn out to be more familiar things. This is both enlightening “oh, an X is just a Y!” and frustrating “yeah, but I want to see Xs that aren’t Ys, otherwise what’s this all heavyweight general machinery for?” I guess this comes from trying to cover a lot of ground in one textbook.

I am struck by just how much mental gymnastics I feel I had to go through to see mathematical structures from the “outside in” as well as from the “inside out”, and how category theory helped me do that. This is a pleasing feeling.

The slogans are great and often amusing, e.g. when describing a functor from the category of Monoids to the category of Sets, “you can use a smartphone as a paperweight.”

The book ends rather abruptly after the brief discussion of operads, with no conclusion. I felt it would be nice to have a summary discussion, and perhaps point out the current directions the field of applied category theory is taking.

Overall, this book has been an enlightening read. It has helped me to see the commonality behind several disparate mathematical structures, and has encouraged me to look for such commonality and relationships in my own future work. Recommended!

Review: Verified Functional Programming in Agda

Agda is a dependently typed programming language. I’ve read about dependent types on and off for the past few years in magazine articles as well as discussed them a little with my collaborator Dan Ghica and others, but never really had the time to properly experiment until this Easter holiday. This book – by Aaron Stump – is interesting because it’s basic, practical, and jumps right into coding. I thought it would be good for someone like me who wants to play with the practice of dependent types and only delve into theoretical concerns as necessary or motivated to do so.

The book comes with its own Agda library, the Iowa Agda Library (IAL), a basic library of Booleans, lists, etc., simply written for the beginner to understand. It has a lot of exercises and code snippets based on that library which I found useful and relatively easy to follow.

The book assumes no functional programming expertise, and since I’ve done a (very) little bit of functional programming in Haskell in the distant past, this made the introductory material easy to follow and a useful revision aid.

Introductory Agda

The early parts of the book – Chapters 1 to 4 – are based on various data structures: Booleans, natural numbers, lists. Here the functional programming adds little to what I’ve seen before. The neat thing is the ability to use Agda as a theorem prover. This is what is described in the book as “external verification” – you write code and then you prove – separately – theorems about the code.

Firstly, let’s take a look at some example Agda code, taken from Chapter 4:


take : ∀{ℓ}{A : Set ℓ} → (n : ℕ) → 𝕃 A → 𝕃 A
take 0       _         = []
take (suc n) []        = []
take (suc n) (x :: xs) = x :: take n xs

There’s a few interesting things here already. Firstly, unicode! It feels surprisingly liberating to write unicode characters in code. Secondly the type declaration declares (via ∀), the function to be polymorphic – it operates on lists (𝕃 from IAL) of any type A (of any level ℓ within a hierarchy of type levels).

The real excitement comes when we start proving theorems. For example, given the function nthTail from IAL which returns all elements after the first n elements in a list (and the empty list if the list has fewer than n elements), we can write


ex46 : ∀{ℓ}{A : Set ℓ} (n : ℕ)(l : 𝕃 A) → (take n l) ++ (nthTail n l) ≡ l
ex46 0        _                           = refl
ex46 (suc n) []                           = refl
ex46 (suc n)  (x :: xs) rewrite ex46 n xs = refl

Here ex46 is just the name of my proof (it was Exercise 4.6 in the book). The type declaration for the function ex46 can be read as “I’ll produce a proof that concatenating (take n l) with (nthTail n l) just produces l for all natural numbers n and for all lists of elements of any type A. Here , taken from IAL, is propositional equality, and the only property of it we’re using is reflexivity; in fact the only inhabitant of this type is refl. The first two lines of the proof work because the definitions of take and nthTail are such that when the first argument is zero or the second argument is empty, the left and right hand side normalize to the same result, namely .l ≡ .l in the first case (.l being the name assigned to the second unnamed argument _) and [] ≡ [] in the second case. The third case explicitly uses induction: rewrite is a directive that applies the induction hypothesis ex46 n xs to the proof of ex46 (suc n) (x :: xs). Very pleasing!

Dependent Types Revealed

Chapter 5 is where we really start to see the power of dependent types. Up until this point, we’ve written functional programs and then proved properties about those programs. We have definitely used dependent types in the process, but they were somehow “invisible” to us as a user, and – at least for the exercises in the book – we could ignore the dependent-type based implementation without really affecting the usability of the approach. Chapter 5 shows how the proofs can be part of the programs themselves, using the type system to express the properties we wish to enforce. The simplest and clearest example given is a declaration of a vector datatype:


data 𝕍 {ℓ} (A : Set ℓ) : ℕ → Set ℓ where
[] : 𝕍 A 0
_::_ : {n : ℕ} (x : A) (xs : 𝕍 A n) → 𝕍 A (suc n)

Note here that a vector, unlike a list, has a specific length built into its type. As a result, you get various properties for free, for example Agda’s type system verifies that the length of the concatenation of two vectors is the sum of the individual lengths from a straight-forward definition of a concatenation operation; the code below type checks, and that’s all that necessary.


_++𝕍_ ∀{ℓ}{A : Set ℓ}{n m : ℕ} → 𝕍 A n → 𝕍 A m → 𝕍 A (n + m)
[] ++𝕍 ys = ys
(x :: xs) ++𝕍 ys = x :: (xs ++𝕍 ys)

A more complex example of Braun trees is described in detail in the book. These are binary trees where the data at a node has a value less than or equal to that of its children while the size of the node left and right subtrees are either equal or the left subtree is exactly one node larger than the right subtree. The author chooses to encode this latter property within the declaration of the Braun tree type, and then shows how to write code for tree insertion, deletion, etc.

I enjoyed this chapter, though it had me puzzling for quite some time over sigma types, a subtle and powerful idea that I think deserves a bit more discussion than the space given in the book; I ended up reading around this topic a bit before I felt comfortable. An example given considers a function that turns lists into vectors. Immediately we have a problem – what is the return type of this function? It should be (𝕍 A n) but n is not fixed! The solution presented is to return a type written in Agda as Σ ℕ (λ n → 𝕍 A n); an inhabitant of this type should be interpreted as “a pair consisting of a natural number and a function which, when applied to that natural number, returns the vector type we’re actually interested in.” This takes a little getting used to!

Type-Level Computation

Chapter 6, introduces the idea of proof by reflection, amongst other things. This looks super-powerful, and I’m motivated to dig further into this topic when I have the time. The author illustrates the concept by creating a list simplifier, which simplifies expressions on lists. By reflecting the operations on lists as corresponding constructors for list expressions, and then proving the soundness of the simplifier, he is able to use this soundness proof to, inter alia, prove properties about the operations operating on lists. The example is really interesting, though unlike in previous chapters of the book, it’s not really discussed in detail in Chapter 6, and you have to look carefully at the code accompanying the book to see how it all fits together.

The other primary example of Chapter 6 is a statically typed version of printf, the archetypal type-evasive function of my youth

int printf( const char *restrict format, ... );

Here, the author presents an example credited to Augustsson, showing that dependent types can come to the rescue essentially by providing a type for C’s “…” which depends on the format string.

The integer example from this chapter is the closest to my own interests, showing how integers can be defined using naturals in an interesting way: an integer is a natural together with a sign which is unit (trivial) if the natural is zero but has Boolean type otherwise, i.e. the type of the sign is dependent on the value of the natural, leading to the following type declaration and definition of ℤ [taken from here].


ℤ-pos-t : ℕ → Set
ℤ-pos-t 0 = ⊤
ℤ-pos-t (suc _) = 𝔹

data ℤ : Set where
mkℤ : (n : ℕ) → ℤ-pos-t n → ℤ

I had followed through the discussion in the book easily, but came totally unstuck when trying to implement a seemingly obvious exercise I set myself: prove that integer multiplication commutes using a pre-existing proof of the commutativity of multiplication over the naturals.

I had defined integer multiplication as:


_*ℤ_ : ℤ → ℤ → ℤ
_*ℤ_ (mkℤ 0 _)        _                = mkℤ 0 triv
_*ℤ_ (mkℤ (suc n) _)  (mkℤ 0 _)        = mkℤ 0 triv
_*ℤ_ (mkℤ (suc n) sn) (mkℤ (suc m) sm) = mkℤ ((suc n) * (suc m)) (sn xor sm)

where triv is the single inhabitant of the type.  I had therefore naively assumed that one could simply write the proof as:

*ℤcommutes (mkℤ 0 _)        (mkℤ 0 _) = refl
*ℤcommutes (mkℤ 0 _) (mkℤ (suc bN) _) = refl
*ℤcommutes (mkℤ (suc aN) _) (mkℤ 0 _) = refl
*ℤcommutes (mkℤ (suc a) sa) (mkℤ (suc b) sb) rewrite xor-comm sa sb | (*comm (suc a) (suc b)) = refl

However, this fails – for the final pattern – for a very interesting reason that was, however, totally incomprehensible to me at first. Agda reports:

𝔹 !=< ℤ-pos-t w of type Set
when checking that the type
(b a w : ℕ) →
w ≡ suc (a + b * suc a) →
(sa sb : 𝔹) →
mkℤ w (sb xor sa) ≡ mkℤ (suc (a + b * suc a)) (sb xor sa)
of the generated with function is well-formed

“Huh? ‘Generated with function?'”

Thus began a small odyssey of Easter holiday evenings spent learning how rewrites are implemented as withs, and how withs in turn are implemented using generated auxiliary functions. I found Norrell and Chapman’s tutorial, together with the good quality documentation provided me with the necessary additional background required to teach myself what was going on here and to produce a final, working proof:


*ℤcommutes : ∀ (a b : ℤ) → a *ℤ b ≡ b *ℤ a
*ℤcommutes (mkℤ 0 _)        (mkℤ 0 _)        = refl
*ℤcommutes (mkℤ 0 _)        (mkℤ (suc m) sm) = refl
*ℤcommutes (mkℤ (suc n) _)  (mkℤ 0 _)        = refl
*ℤcommutes (mkℤ (suc n) sn) (mkℤ (suc m) sm)
with m + n * suc m | *comm (suc n) (suc m)
... | .(n + m * suc n) | refl rewrite xor-comm sn sm = refl

I have mixed feelings about this experience. On the one hand, I’ve learnt a lot about Agda specifically (and dependent types in general) by being forced to go through this process to understand what’s going on under the hood. On the other hand, this seems like a lot of hoops to jump through for this simple proof, and the book under review didn’t have all the answers I was looking for to solve the exercise. I don’t see the latter point as a big deal, but if you’re going to do your explorations on holiday, make sure you have a wifi connection available so you can look beyond the book for help!

Selected Topics

The remainder of the book is eclectic but interesting.

Chapters 7 and 8 combine to produce a larger example piece of code – a Huffman coder / decoder. It’s good to see the ideas put into practice in a larger case study, of course, though these chapters were less interesting to me personally.

Chapter 9 really recaptured my attention, though! The author uses Agda’s types to reason about termination. He explicitly sets up a general datatype representing what it means for a relation DAG to not have any infinite chain of successor nodes starting from a given named node, something he calls a terminating relation. He then proves in Agda that > over ℕ is terminating in this sense. The chapter then goes on to apply this to producing a termination proof for a recursively defined division function. I’m aware of this kind of reasoning, and have even played a bit with tools making use of these ideas like Byron Cook et al.‘s Terminator. So it’s interesting to see this in the context of the type system of Agda where in fact it becomes rather important since all functions must terminate in Agda – otherwise none of the proofs discussed above make sense!

The final chapter of the book, Chapter 10, formalises some ideas in a particular logic within Agda, using Agda to prove soundness and completeness of the logic.

Conclusion

The book ends rather abruptly with Chapter 10 on intuitionistic logic and Kripke semantics. There is no rounding off, summarising lessons learnt or directions for future research or indeed practical programming advice.

Nevertheless, I learnt a lot from reading this book – and a lot more from doing its exercises! It’s given me an initial understanding of one of the most recently talked-about developments in programming. I’m left keen to try to apply this to some real-world problems in my research! Well recommended holiday reading.

Book Review: Complexity and Real Computation

Over several holiday and train journeys, I’ve been slowly reading and digesting this book by Blum, Cucker, Shub, and Smale. It’s one of a number of books I’ve read that are definitively not in my research field – hence holiday reading – and yet touches it tangentially enough to get me excited. I’ve been interested in computation defined in terms of real numbers ever since my own PhD thesis back in 1998-2001, which looked at ways to take a specific class of computation defined on the reals and synthesise hardware circuits operating in very efficient fixed-point arithmetic. While my own work has often involved specifying computation on the reals, including recently exploiting the differences between real and finite precision computation for circuits, this book takes a fundamental look at computation over the reals themselves. In particular, it looks at what happens to notions of computability and complexity if you consider real addition, multiplication, division, and subtraction as basic operators in a computation model: what is computable, what resulting complexity bounds can be proved? Ever since my first exposure to algebraic geometry, especially real algebraic geometry, I see it everywhere – and have even made some small forays [1,2] into harnessing its power for hardware design. It’s a delight to see computability and algebraic geometry coming together so neatly in this text.

Part I: Basic Development

The authors define a computational model that is very natural for analysis: a computation consists of a control-flow graph consisting of input nodes, computation nodes (polynomial or rational maps), branch nodes (checking for equality or inequality, depending on whether the ring / field being computed over is ordered), and output nodes. For the remainder of this review, I will consider only computation over the reals, though some of the results are more general. Theorems naturally flow from this setting. My favourites: the halting set of a machine is a countable union of semialgebraic sets; there are “small” systems of polynomial inequalities, of low degree, describing behaviour of a machine up to T time steps. As a by product, the first practical uses of this theory are illustrated, e.g. a proof that the Mandelbrot set is not decidable over {\mathbb R}. The interplay between the uncountable (program state) and the countable (program counter state) leads to some intricate and enjoyable analysis.

Chapter 3 generalises classical results for universal Turing machines, typically expressed in terms of unbounded tapes with binary storage at each location, to the case where an element of an arbitrary ring is stored at each location (imagine a real number stored at each location). Additional shift operations are introduced to deal with moving along the tape. It is shown that operation over unbounded sequences allows for uniformity of algorithm across dimensionality but does not add additional computational power for fixed dimensionality. In finite time we can only explore a finite portion of the tape, so we still retain the semi-algebraic (or countable union of semi-algebraic) sets for the halting sets derived in Chapter 2.

Chapter 4 extends the usual complexity-theoretic classes P and EXP to computation over a ring. Computation over {\mathbb Z}_2 corresponds to the classical case. There are no great surprises here for a reader who has read a basic text in computational complexity together with Chapters 1-3 of this book. The authors round off the chapter by using this new machinery to cast results from Goedel and Tarski from an interesting perspective. For example, Goedel becomes there exist definable undecidable sets over {\mathbb Z}, while Tarski becomes every definable set over {\mathbb R} is decidable over {\mathbb R}. Questions of definability are then put off until the very end of this book – and this review.

Chapter 5 shows how classical NP, NP-complete and NP-hard definitions carry forward into computation over a general ring. Then, specialising the results to particular rings and particular costs of elementary operations (unit? dependent on size of numbers?), the NP-completeness of various key problems are shown. In particular, it is shown that the classical 3-SAT NP-completeness result can be obtained by working over the ring {\mathbb Z}_2. The chapter ends with an interesting discussion of the P ?= NP question over various rings and with various costs. It’s interesting, for example, that the problem of determining whether there is a zero of a degree-4 polynomial is NP-complete over {\mathbb R} while that of a degree 3 is in P; analogous to the situation with 3-SAT versus 2-SAT in the classical setting.

Chapter 6 moves on to look specifically at machines over {\mathbb Z} with bit cost, i.e. the bigger the number, the more it costs to operate on it. It is shown that these machines are polynomially equivalent to those operating over {\mathbb Z}_2 with unit cost, i.e. anything you can write in terms of algebraic operations over {\mathbb Z} you can also write under the classical Turing model of computation, and vice versa, with only polynomial blow up in execution steps. This is not too surprising, and ultimately derives from the standard possibility of writing an integer as its binary expansion. The chapter then takes a detour via Farkas’s Lemma to show that linear programming feasibility (LPF) is in NP, a rather intricate result I first saw in Schrijver’s encyclopaedic book when working with my former PhD student Qiang Liu on discrete linear algebra for hardware memory optimization. It is noted that LPF’s status depends strongly on the choice of computation model and ring: in the (equivalent) classic setting of integers with bit cost it is NP-complete, over rationals with bit cost it is in P, while its status over {\mathbb R} is apparently open.

Part II: Some Geometry of Numerical Algorithms

This part of the book turns to look at specific algorithms such as Newton’s method and a homotopy method for nonlinear optimization. In each case, complexity results are given in terms of the number of basic real arithmetic operations required. Some fun analysis and quite a clear exposition of homotopy methods, which I’ve worked with before in the context of convex optimization, especially when using barrier methods for linear and convex quadratic programming, something I looked at quite carefully when trying to design FPGA-based computer architectures for this purpose.

The chapter on Bézout’s Theorem assumed a rather more thorough grounding in differential topology than I have, and I wasn’t entirely sure how it sat with the rest of the work, as the constructive / algorithmic content of the theorems was less apparent to me.

The final chapters of this part of the book took me back to more familiar territory – first introduced to me by the outstandingly clear encyclopaedic textbook by Nick Higham – the discussion of condition numbers for linear equations and the relationship to the loss of precision induced by solving such systems (roughly, the relative error in the right hand side of Ax = b gets amplified by the condition number of the matrix A). What was interesting here is the authors’ probabilistic analysis – looking at the expected condition number if the matrix elements are iid Gaussian variables, for example. I have skimmed another book also co-authored by Cucker – on this very topic, and I’m waiting to have the time to read it. Condition for polynomial problems is covered, leading to complexity results for logarithmic barrier interior point methods for linear equations over \mathbb{Q} (spoiler: it’s in \mathbb{P}).

Part III: Complexity Classes over the Reals

After Part II, which was only interesting to me in parts, I started to get excited again by Part III. Lower bounds on the computational complexity of various problem are derived here based on some very interesting ideas. The basic scheme is to bound the number of different connected components in a (semi-)algebraic set in terms of the degree of the polynomials and the number of variables involved. The number of different connected components can then be used to bound the number of steps an (algebraic) machine takes to decide membership of that set.

In Chapter 17, the framework is extended to probabilistic machines, and the standard definition of BPP is extended to computation over an arbitrary ring, giving BPP_{R}. The chapter then goes on to show that probabilistic machines over \mathbb{R} can be simulated by deterministic machines over \mathbb{R} without blowing up execution time, i.e. that P_\mathbb{R} = BPP_\mathbb{R}, whereas this is unknown for the classical setting. The essence of the argument makes use of a special real number (shown, non constructively, to exist), encoding all the different “coin flips” that a probabilistic program makes during its execution, which is then encoded as a machine constant in the simulating machine.

Parallelism – close to my heart – is covered in Chapter 18, through the introduction of complexity classes PL^k_\mathbb{R}, parallel polylogarithmic time, for sets that can be decided in time O(\log^k n) using a polynomial number of processors, and PAR_\mathbb{R}, parallel polynomial time, for sets that can be decided in polynomial time using an exponential number of processors (definitions following a SPMD computational model). Algorithms are derived for matrix inversion and product and a result is cited for parallel quantifier elimination over the reals.

Chapter 22 looks at the relative power of machines over \mathbb{Z}_2, i.e. the classical model, compared to real machines which take inputs restricted to be drawn from \{0,1\}. It’s fairly obvious that the latter setting can provide a lot of additional power, for example deciding membership of a set in \mathbb{Z}^\infty_2 can be done by encoding the characteristic function of the set as the digits of some real constant in the program. It is then shown that additive real machines (a restricted form of the machines discussed above, containing no multiplication or division) that additionally contain branching only on equality can solve exactly the same problems in polynomial time as in the classical model. This is unlike the case where branching can be over inequality, where some additional computational power is provided by this model.

Descriptive Complexity, Chapter 23 is interesting: the descriptive power of (a particular) first order logic extended with fixed point and maximisation operations and of existential second order logic are shown to correspond to a particular subclass of P_\mathbb{R} problems and EXP_\mathbb{R}, respectively. In the latter case, I think this essentially says that known results in the classical case carry forward to computation over the reals despite the move away from a finite ring. I am not familiar enough with computational complexity literature to comment on how the former result compares with the classical setting, and I didn’t feel that this point is was very well described in the book.

Conclusion

Overall, I found this book a little eclectic, especially Part II, reflective of the fact that it has clearly been put together drawing from a variety of different primary sources spanning several decades. This should not be seen as a criticism. The choice of material was wonderful for me – as holiday reading at least! – tangentially touching so many different topics I’ve seen during my career and drawing me in. In places, the development was over an arbitrary ring, in places over \mathbb{C} and in places over \mathbb{R}, and although for some sections it was absolutely clear why, for others it was less clear. Overall, though, I would very much recommend anyone who likes to keep a foot in both the continuous and discrete worlds to take a look at this book.

Book Reviews: Popular Science

A friend recently reintroduced me to the genre of popular science – specifically, popular physics – something I had left behind in my mid teens. I remember voraciously reading the books of John Gribbin as a teenager, sitting in bed in my parents’ house thinking about black holes and quantum physics. Since then I’ve been more focused on – and interested in – my particular field.

However, following prompting, I’ve recently read two popular science books: Seven Brief Lessons on Physics by Carlo Rovelli, and A Beautiful Question by Frank Wilczek. Both are good books, though very different. The former is a brief, almost poetically written waltz through relatively, quantum physics, and more recent unification efforts – it can easily be read in a single afternoon. The second is a more weighty tome, a very personal view from a Nobel Prize winner of beauty and its embodiment in the symmetries present in nature. The chapters of the latter vary in terms of how much I enjoyed them, primarily because many I felt some had too much information not to take a mathematical approach, yet this was not forthcoming. Meanwhile the former was a joy to read because it its brevity skimmed the surface and left me wanting to dig further, specifically into ideas of quantum loop gravity and into what  modern neuroscience may or may not be able to tell us about the emergence of notions of “self”.

Book Review: Essential Topology

Topology is an area of mathematics with which I have no prior experience. I guess it was felt historically that engineers don’t really need topology, and this has filtered down the generations of study. Yet I’ve always found the ideas of topology intriguing, even if I have not deeply understood them. They seem to pop up in the most wide variety of places. While running a math circle for primary school kids, we ended up discussing Euler’s Polyhedron Formula, and I realised that if I wanted to explore these ideas more deeply, I would need a proper grounding in topology. From the wonderful lectures of Tadashi Tokieda which I watch with my son, to the more theoretical end of computer science I bump up against in my day-to-day research, topology seems to be everywhere. I found this book, Essential Topology by Martin D. Crossley, while browsing in a bookshop and decided to take it on holiday as holiday reading.

As presented, the book naturally falls into three parts: an introductory section, a section on basic topology, and a section on more advanced algebraic topology. Although short, the book is written as a sequence of a good number of chapters, 11 in total, which make the material more digestible. Moreover, I found the two “interludes” between sections of the book to be a great innovation – invaluable as a mechanism for orienting my reading, providing the appropriate informal guidance about the journey on which the text was taking me.

The introductory section begins with looking at the question of continuity from a rigorous perspective, bringing in both the epsilon-delta approach to continuity with which I am familiar, and an approach based on open sets with which I was not – but which is easily generalised to functions other than from \mathbb{R} to \mathbb{R}. It then moves on to axiomatically defines topological spaces, some standard topologies, and bases for topologies.

The second section begins to explore some of the properties that can be proved using the equipment set up in the previous chapters: connectedness, compactness, the Hausdorff property, homeomorphisms, disjoint unions, product and quotient spaces. I enjoyed this section greatly.

The third section then goes on to discuss various more advanced topics, looking at various topological invariants that have been well studied. Some highlight for me included:

Chapter 6 discusses the idea of homotopy as an equivalence between maps: two maps f,g: S \rightarrow T are homotopic iff there is a continuous function F: S \times [0,1] \rightarrow T allowing for a kind of continuous deformation of one function into the other, i.e. with F(s,0) = f(s) and F(s,1) = g(s). This is shown to be rather a broad equivalence, for example all continuous functions on \mathbb{R} are homotopic. However, working with other topological spaces, things get quite interesting. A big part of the chapter is given over to working with circles \mathbb{S}^1, where it is shown that there is a countable set of homotopy classes of functions from \mathbb{S}^1 to \mathbb{S}^1. This is very clearly described, and I seem to remember through the mists of time some of these issues cropping up informally in the context of my control theory courses as an undergraduate (or, for that matter, fathoming {\tt unwrap} in Matlab as an undergraduate!) The chapter ends with a proof of the fantastically named “Hairy ball theorem,” from which – amongst other things – it follows that at any given point in time, there’s a part of the world with zero wind speed!

Chapter 7 discusses the Euler characteristic as a topological invariant and introduces some interesting theorems. After introducing the idea of a ‘triangulable space’, it is stated that if two such spaces are homotopy equivalent then they have the same Euler number. More remarkable (to me!) is that when restricted to surfaces, it is sufficient for two such surfaces to have the same Euler number and the same orientability in order for them to be homotopy equivalent. Unfortunately, the proofs of both these results are omitted – apparently they are rather involved – references are given. I certainly appreciated the results collected in this chapter, but I found the exercises quite hard compared to other chapters, possibly partly because of the proof issue, but also because I found it hard to visualise some aspects, e.g. triangulation of a surface of genus 2, and since such surfaces had only been defined informally in the preceding chapters I could not (easily) fall back on a purely algebraic approach to the geometry. I did find it particularly interesting to see the Euler characteristic rigorously defined for a general class of spaces – the very definition in the primary math circle that had brought me to this book in the first place!

Chapter 8 discusses homotopy groups. The basic idea is that one can work with homotopy classes of maps from {\mathbb S}^n, the n-dimensional sphere, to a space X (actually pointed homotopy classes of pointed maps, but lets keep things simple) and it turns out that these classes form a group structure: they have an identity element (constant maps), an addition operation, etc. I guess the purpose here is to bring out the topological structure of X, though the special role played by {\mathbb S}^n was not totally apparent to me – why is this particular space so fruitful to study? I wonder if any readers can comment on this point for me.

Chapter 9 provides an introduction to Simplicial Homology, the idea – roughly – that those combinations of simplices within a simplicial complex which ‘could’ form boundaries of other simplices but ‘do’ not, tell us something about the topology of the simplicial complex. Homology is then introduced in a different form (Singular Homology) in Chapter 10, and it is stated that the two approaches coincide for triangulable spaces.

In these latter chapters of the book, some theorems tend to be stated without proof. The author is always careful to refer to other textbooks for proof, and I guess this is a necessary aspect of trying to introduce a very large subject in a brief textbook, but for me this made them somewhat less appealing than those in the first two sections. Nevertheless, I now feel suitably armed to begin looking at the world through a more topological lens. I wonder what I will see. For now I’m off to eat a coffee cup.

Book Review: Out of the Labyrinth: Setting Mathematics Free

This book, by Kaplan and Kaplan, a husband and wife team, discusses the authors’ experience running “The Math Circle”. Given my own experience setting up and running a math circle with my wife, I was very interested in digging into this.

The introductory chapters make the authors’ perspective clear: mathematics is something for kids to enjoy and create independently, together, with guides but not with instructors. The following quote gets across their view on the difference between this approach and their perception of “school maths”:

Now math serves that purpose in many schools: your task is to try to follow rules that make sense, perhaps, to some higher beings; and in the end to accept your failure with humbled pride. As you limp off with your aching mind and bruised soul, you know that nothing in later life will ever be as difficult.

What a perverse fate for one of our kind’s greatest triumphs! Think how absurd it would be were music treated this way (for math and music are both excursions into sensuous structure): suffer through playing your scales, and when you’re an adult you’ll never have to listen to music again.

I find the authors’ perspective on mathematics education, and their anti-competitive emphasis, appealing. Later in the book, when discussing competition, Math Olympiads, etc., they note two standard arguments in favour of competition: that mathematics provides an outlet for adolescent competitive instinct and – more perniciously – that mathematics is not enjoyable, but since competition is enjoyable, competition is a way to gain a hearing for mathematics. Both perspectives are roundly rejected by the authors, and in any case are very far removed from the reality of mathematics research. I find the latter of the two perspectives arises sometimes in primary school education in England, and I find it equally distressing. There is a third argument, though, which is that some children who don’t naturally excel at competitive sports do excel in mathematics, and competitions provide a route for them to be winners. There appears to be a tension here which is not really explored in the book; my inclination would be that mathematics as competition diminishes mathematics, and that should competition for be needed for self-esteem, one can always find some competitive strategy game where mathematical thought processes can be used to good effect. However, exogenous reward structures, I am told by my teacher friends, can sometimes be a prelude to endogenous rewards in less mature pupils. This is an area of psychology that interests me, and I’d be very keen to read any papers readers could suggest on the topic.

The first part of the book offers the reader a detailed (and sometimes repetitive) view of the authors’ understanding of what it means to do mathematics and to learn mathematics, peppered with very useful and interesting anecdotes from their math circle. The authors take the reader through the process of doing mathematics: analysing a problem, breaking it down, generalising, insight, and describe the process of mathematics hidden behind theorems on a page. They are insistent that the only requirement to be a mathematician is to be human, and that by developing analytical thinking skills, anyone can tackle mathematical problems, a mathematics for The Growth Mindset if you will. In the math circles run by the authors, children create and use their own definitions and theorems – you can see some examples of this from my math circle here, and from their math circles here.

I can’t say I share the authors’ view of the lack of beauty of common mathematical notation, explored in Chapter 5. As a child, I fell in love with the square root symbol, and later with the integral, as extremely elegant forms of notation – I can even remember practising them so they looked particularly beautiful. This is clearly not a view held by the authors! However, the main point they were making: that notation invented by the children, will be owned and understood by the children, is a point well made. One anecdote made me laugh out loud: a child who invented the symbol “w” to stand for the unknown in an equation because the letter ‘w’ looks like a person shrugging, as if to say “I don’t know!”

In Chapter 6, the authors begin to explore the very different ways that mathematics has been taught in schools: ‘learning stuff’ versus ‘doing stuff’, emphasis on theorem or emphasis on proof, math circles in the Soviet Union, competitive versus collaborative, etc. In England, in my view the Government has been slowly shifting the emphasis of primary school mathematics towards ‘learning stuff,’ which cuts against the approach taken by the authors. The recent announcement by the Government on times tables is a case in point. To quote the authors, “in math, the need to memorize testifies to not understanding.”

Chapter 7 is devoted to trying to understand how mathematicians think, with the idea that everyone should be exposed to this interesting thought process. An understanding of how mathematicians think (generally accepted to be quite different to the way they write) is a very interesting topic. Unfortunately, I found the language overblown here, for example:

Instead of evoking an “unconscious,” with its inaccessible turnings, this explanation calls up a layered consciousness, the old arena of thought made into a stable locale that the newer one surrounds with a relational, dynamic context – which in its turn will contract and be netted into higher-order relations.

I think this is simply arguing for mathematical epistemology as akin to – in programming terms – summarizing functions by their pre and post conditions. I think. Though I can’t be sure what a “stable locale” or a “static” context would be, what “contraction” means, or how “higher order relations” might differ from “first order” ones in this context. Despite the writing not being to my taste, interesting questions are still raised regarding the nature of mathematical thought and how the human mind makes deductive discoveries. This is often contrasted in the text to ‘mechanical’ approaches, without ever exploring the question of either artificial intelligence or automated theorem proving, which would seem to naturally arise in this context. But maybe I am just demonstrating the computing lens through which I tend to see the world.

The authors write best when discussing the functioning of their math circle, and their passion clearly comes across.

The authors provide, in Chapter 8, a fascinating discussion of the ages at which they have seen various forms of abstract mathematical reasoning emerge: generalisation of when one can move through a 5×5 grid, one step at a time, visiting each square only once, at age 5 but not 4; proof by induction at age 9 but not age 8. (The topics, again, are a far cry from England’s primary national curriculum). I have recently become interested in the question of child development in mathematics, especially with regard to number and the emergence of place value understanding, and I’d be very interested to follow up on whether there is a difference between this between the US, where the authors work, and the UK, what kind of spread is observed in both places, and how age-of-readiness for various abstractions correlates with other aspects of a child’s life experience.

Other very valuable information includes their experience on the ideal size of a math circle: 5 minimum, 10 maximum, as they expect children to end up taking on various roles “doubter, conjecturer, exemplifier, prover, and critic.” If I run a math circle again, I would like to try exploring a single topic in greater depth (the authors use ten one hour sessions) rather than a topic per session as I did last time, in order to let the children explore the mathematics at their own rate.

The final chapter of the book summarises some ideas for math circle style courses, broken down by age range. Those the authors believe can appeal to any age include Cantorian set theory and knots, while those they put off until 9-13 years old include complex numbers, solution of polynomials by radicals, and convexity – heady but exciting stuff for a nine year old!

I found this book to be a frustrating read. And yet it still provided me with inspiration and a desire to restart the math circle I was running last academic year. Whatever my beef with the way the authors present their ideas, their core love – allowing children to explore and create mathematics by themselves, in their own space and time – resonates with me deeply. It turns out that the authors run a Summer School for teachers to learn their approach, practising on kids of all ages. I think this must be some of the best maths CPD going.

Book Review: Surely You’re Joking, Mr Feynman

This Summer I had a long flight to Singapore. A few hours into long flights I tend to lose the will to read anything challenging, so I decided to return to a book I first read as a young teen when it was lent to me by family friend fred harris, the book Surely You’re Joking, Mr Feynman. I remember at the time that this book was an eye opener. Here was a Nobel prize winning physicist with a real lust for life coupled with an inherent enthusiasm for applying the scientific method to all aspects of life and a healthy lack of respect for rules and authority. As a teenager equally keen to apply the scientific method to anything and everything, and keen to work out my own ideas on authority and rules, yet somewhat lacking the personal confidence of Feynman, I found the book hugely enjoyable.

I still find the book hugely enjoyable. I was chuckling throughout – the practical jokes, the encounters with non-scientific “experts” and authority figures. However, I was also somewhat taken aback by Feynman’s attitude to women, which I found misogynistic in places, something that had totally passed me by as a teenager. Feynman’s attitudes were probably unremarkable for his time (Feynman was born in 1918) so I’m not attributing blame here, but I find it odd that I didn’t notice this in the 1980s. Googling for it now, I find commentary about this point is all over the Internet. It just goes to show how many subtleties pass children and teens by, something parents and educators would do well to remember.

Review: Love and Math by Edward Frenkel

I have a friend who thinks she does not understand maths (apologies to the American readers, I will use “maths” in this review rather than the “math” in the book title). She thinks maths is “all about numbers”. She thinks she’s not particularly good at it. She is wrong. Actually, I think she is remarkably gifted, primarily because she asks amazingly prescient questions. She has an inquisitive nature that I fear is often drilled out of most people by our “rigorous” schooling in the tedium of official school maths.

Over the past few months, I have spent quite a bit of time discussing maths with this friend, introducing her to set theory and trying to overcome a fear of algebra. (It is remarkable how many super bright people can reason about insanely complex general phenomena in their head without algebra in a way I never could, yet stumble over a simple equation!)

I saw this book reviewed elsewhere – I can’t quite remember where. It was a short but positive review, and the title is attractive to me, given my recent discussions. The author, Edward Frenkel, is a professor of mathematics at UC Berkeley, has made considerable contributions to mathematics himself, and also has a direct personal knowledge of many of the larger-than-life characters he talks about in the book. I knew nothing about his work, or his life, both of which turn out to be rather interesting, before reading this book.

The book is part autobiography and part introduction to the joy of mathematics. We learn about the author’s early career, the anti-Semitism he faced in Russia and his conversion as a teen from physics to mathematics, before the two come back together through quantum physics later in his career. We learn about his immersion in the Langlands Program.

As a book about “doing mathematics”, I think he does quite well at getting his feelings across. The detail of much of the mathematics is missing; sketches of subjects are given: Galois Groups, the Shimura-Taniyama-Weil Conjecture, Lie groups and Lie algebras, etc., but only very brief sketches. While this gives the reader insight into the depth of the subject areas covered by modern mathematics (and certainly helps to show the breadth of maths beyond school arithmetic) it doesn’t really show the reader mathematics. We read about proof, but we don’t experience proof. We read about revelation, but we don’t feel the revelation ourselves. We are told that mathematics rests on clear and unequivocal definitions, but these are absent from the book. Initially, this left me feeling unsatisfied. Slowly, I put aside my usual desire to understand everything I was reading, and began to enjoy the book as a novel, a love affair between the author and his subject, in which specific theorems appear only as partially sketched characters. From this perspective, the bulk of the book was very enjoyable, and provides insight not only into the beauty of mathematics, but also the ways in which mathematicians think. As an engineer, albeit one who has a lot of time for – and use for – pure mathematics, my way of thinking remains quite distinct from some of the methods and approaches outlined in this book. However, late in the book, where the author covers the links between this mathematics and quantum physics, I feel the unexplained mathematics again gets in the way. The author seems to acknowledge this, when he says (p.221), “All this stuff, as my dad put it, is quite heavy: we’ve got Hitchin moduli spaces, mirror symmetry, A-branes, B-branes, automorphic sheaves […] But my point is not for you to learn them all…”

The final chapter of the book takes a violent switch in direction as we learn of Frenkel’s artistic projects, in particular his film Rites of Love and Math. Had I known about this work beforehand, the book would have made a lot more sense! In this chapter, Frenkel provides his philosophical opinions on the nature of the material world, the world of our consciousness, and – in his view – the entirely separate Platonic world of mathematics. He says his approach to making the film was “let the viewers first feel rather than understand” – much the same could be said of this book.

The experience of reading this book has got me thinking in two directions.

Firstly, this is not the book I initially expected from the first few chapters. But how easy would it be to write a book which both conveys the joy of solving problems, say in Lie algebras, which Frenkel explores in some depth, and at the same time covers enough mathematics for a reasonably rigorous approach to the problem for a general reader? Perhaps such a task is impossible, and I ask too much. Does this mean that the beauty of modern mathematics is necessarily harder to access than than the beauty of modern art? Does this mean that we must necessarily pay the price of “boring maths” in order to be able to access “beautiful maths”? Or is there actually no “boring maths” and the prerequisite knowledge can also be described in a way that appeals and excites, even if not in one volume. I hope for the latter.

Secondly, the book has made me realise that I am far less certain of my own philosophical viewpoint on reality, Platonism, and knowledge than I used to be. It has made me want to explore these issues again and read more. Perhaps I will start with Penrose.

Is this the book I wanted for my friend? Probably not, but it’s certainly stimulated my thinking, and probably would also stimulate the thinking of others, regardless of their current level of mathematical maturity.

Review: Three Views of Logic: Mathematics, Philosophy, and Computer Science

I read this book largely by accident, because I was attracted by the title. As an academic in (or rather, on the edge of) computer science, I come across logic regularly, both in teaching and research. Mathematics underpins almost everything I do, and I’m certainly interested in whether a mathematician’s view of logic differs significantly from that of a theoretical computer scientist (as a keen reader of mathematics, I’m well aware that standard mathematical practice of proof differs quite strongly from the formal approach studied in computer science, but this isn’t quite the same thing!) I once had a strong interest in philosophy, most significantly in epistemology, which is being rekindled by my involvement in education at the school level, and so combining all these factors, the title was very appealing. What I actually discovered when I started reading wasn’t exactly what I expected. But this book turns out to be an excellent, crystal clear, textbook suitable for undergraduates and those with just a limited level of mathematical maturity. The book is explicitly partitioned into three sections, but in practice I found that the first two sections, proof theory and computability theory (the “maths” and the “computer science”) were very familiar material for any computer scientist, and from my perspective there was no very clear difference in approach taken, just a difference in the range of topics covered.

Part 1, by Loveland, covers propositional and predicate logic, with a particular focus on automated proof by resolution. I found the description of resolution to be very clear, with a particular focus on the difference between resolution for propositional and predicate logic, and with one of the clearest descriptions of completeness results I’ve seen.

Part 2, by Hodel, covers computability theory. Again, the clarity is exemplary. The first chapter discusses concepts informally, the second precisely defines two machine models in a way very accessible to a computer engineer (effectively one supporting general recursion and one supporting only primitive recursion) and discusses their relative computational power. The insistence of an informal discussion first makes these distinctions come to life, and allows Hodel to frame the discussion around the Church-Turing thesis. The focus on compositionality when preserving partial recursiveness, and the emphasis on the ‘minimalisation’ operator (bounded and unbounded) was new to me, and again very clearly presented. Most introductory texts I’ve read only tend to hint at the link between Gödel’s Incompleteness Theorem and Church’s Undecidability Theorem, whereas Hodel makes this link precise and explicit in Section 6.6, Computability and the Incompleteness Theorem.

Part 3, by Sterrett, covers philosophical logic. In particular, Sterrett considers the existence of alternatives to (rather than extensions of) classical logics. She focuses on the logic of entailment aka relevance logic introduced by Anderson and Belnap, into which she goes into depth. This logic comes from rethinking the meaning ascribed to the connective, \rightarrow, logical implication. In classical logic, this is a most confusing connective (or at least was to my students when I taught an introductory undergraduate course in the subject long ago). I would give my students examples of true statements such as “George is the Pope” implies “George is Hindu” to emphasise the difference between the material implication of classical logic and our everyday use of the word. It is exactly this discrepancy addressed by Anderson and Belnap’s logic. I was least familiar with the content of this part of the book, therefore the initial sections came as something of a surprise, as I found them rather drawn out and repetitive for my taste, certainly compared to the crisp presentation in Parts 1 and 2. However, things got exciting and much more fast moving by Chapter 8, Natural Deduction, where there are clear philosophical arguments to be had on both sides. In general, I found the discussion very interesting. Clearly a major issue with the line of reasoning given by my Pope/Hindu example above is that of relevance. Relevance might be a slippery notion to formalise, but it is done so here in a conceptually simple way: “in order to derive an implication, there must be a proof of the consequent in which the antecedent was actually used to prove the consequent.” Actually making this work in practice requires a significant amount of baggage, based on tagging wffs with relevance indices, which get propagated through the rules of deduction, recalling to my mind, my limited reading on the Curry-Howard correspondence. The book finishes with a semantics of Anderson and Belnap’s logic, based on four truth values rather than the two (true/false) of classical logic.

I can’t emphasise enough how easy this book is to read compared to most I’ve read on the subject. For example, I read all of Part 1 on a single plane journey.  I will be recommending this book to any of my research students who need a basic grounding in computability or logic.