Some Notes on Metric Spaces

This post contains some informal summary notes of key ideas from my reading of Mícheál Ó Searcóid’s Metric Spaces (Springer, 2007). These notes are here as a reference for me, my students, and any others who may be interested. They are by no means exhaustive, but rather cover topics that seemed interesting to me on first reading. By way of a brief book review, it’s worth noting that Ó Searcóid’s approach is excellent for learning a subject. He has a few useful tricks up his sleeve, in particular:

  • Chapters will often start with a theorem proving equivalence of various statements (e.g. Theorem 8.1.1, Criteria for Continuity at a Point). Only then will he choose one of these statements as a definition, and he explains this choice carefully, often via reference to other mathematics.
  • The usual definition-theorem-proof style is supplemented with ‘questions’ – these are relatively informally stated questions and their answers. They have been carefully chosen to highlight questions the reader might be wondering about at that point in the text and to demonstrate key (and sometimes surprising) answers before the formal theorem statement.
  • The writing is pleasant, even playful at times, though never lacking formality. This is a neat trick to pull off.
  • There are plenty of exercises, and solutions are provided.

These features combine to produce an excellent learning experience.

1. Some Basic Definitions

A metric on a set X is a function d : X \times X \to {\mathbb R} such that, for all a, b, c \in X:

  • Positivity: d(a,b) \geq 0 with equality iff a = b
  • Symmetry: d(a,b) = d(b,a)
  • Triangle inequality: d(a,b) \leq d(a,c) + d(c,b)

The combination of such a metric and the corresponding set is a metric space.
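To make the axioms concrete, here is a minimal Python sketch (my own illustration, not from the book) that tests all three on a finite sample of points, using the taxicab metric on {\mathbb R}^2 as the candidate:

    import itertools

    def is_metric_on_sample(d, points, tol=1e-12):
        """Check the three metric axioms on a finite sample of points."""
        for a, b in itertools.product(points, repeat=2):
            if d(a, b) < 0:                          # positivity
                return False
            if (d(a, b) == 0) != (a == b):           # equality iff a = b
                return False
            if abs(d(a, b) - d(b, a)) > tol:         # symmetry
                return False
        for a, b, c in itertools.product(points, repeat=3):
            if d(a, b) > d(a, c) + d(c, b) + tol:    # triangle inequality
                return False
        return True

    taxicab = lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1])
    sample = [(0, 0), (1, 2), (-3, 0.5), (1, -1)]
    print(is_metric_on_sample(taxicab, sample))      # True

Of course, passing on a sample proves nothing in general – but failing on one is a quick way to spot a non-metric.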

Given a metric space (X,d), the point function at z is \delta_z : x \mapsto d(z,x).

A pointlike function u : X \to {\mathbb R}^\oplus (the non-negative reals) is one where u(a) - u(b) \leq d(a,b) \leq u(a) + u(b) for all a, b \in X.

For metric spaces (X,d) and (Y,e), X is a metric subspace of Y iff X \subseteq Y and d is the restriction of e to X \times X.

For metric spaces (X,d) and (Y,e), an isometry \phi : X \to Y is a function such that e(\phi(a),\phi(b)) = d(a,b) for all a, b \in X. The metric subspace (\phi(X),e) is an isometric copy of (X,d).

Some standard constructions of metrics on a finite product of metric spaces (X_i,\tau_i), where a = (a_1,\ldots,a_n) and b = (b_1,\ldots,b_n):

  1. \mu_1 : (a,b) \mapsto \sum_{i=1}^n \tau_i(a_i,b_i)
  2. \mu_2 : (a,b) \mapsto \sqrt{\sum_{i=1}^n \left(\tau_i(a_i,b_i)\right)^2}
  3. \mu_\infty : (a,b) \mapsto \max\left\{\tau_i(a_i,b_i) | i \in \{1,\ldots,n\}\right\}

A conserving metric e on a product space is one where \mu_\infty(a,b) \leq e(a,b) \leq \mu_1(a,b). Ó Searcóid calls these conserving metrics because they conserve an isometric copy of the individual spaces, recoverable by projection (I don’t think this is a commonly used term). This can be seen because fixing elements of all-but-one of the constituent spaces makes the upper and lower bound coincide, resulting in recovery of the original metric.
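To make these inequalities concrete, here is a small sketch of my own (taking each \tau_i to be the usual metric on \mathbb{R}) checking numerically that \mu_2 is conserving, i.e. that \mu_\infty \leq \mu_2 \leq \mu_1:

    import math

    tau = lambda s, t: abs(s - t)     # the metric on each coordinate space

    def mu1(a, b):   return sum(tau(ai, bi) for ai, bi in zip(a, b))
    def mu2(a, b):   return math.sqrt(sum(tau(ai, bi)**2 for ai, bi in zip(a, b)))
    def muinf(a, b): return max(tau(ai, bi) for ai, bi in zip(a, b))

    a, b = (1.0, -2.0, 0.5), (0.0, 4.0, 0.5)
    assert muinf(a, b) <= mu2(a, b) <= mu1(a, b)     # mu2 is conserving

    # Fixing all-but-one coordinate recovers the original metric:
    print(mu2((1.0, -2.0, 0.5), (3.0, -2.0, 0.5)))   # 2.0 == tau(1.0, 3.0)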

A norm on a linear space V over {\mathbb R} or {\mathbb C} is a real function such that for x, y \in V and \alpha scalar:

  • ||x|| \geq 0 with equality iff x = 0
  • ||\alpha x|| = |\alpha|\; ||x||
  • ||x + y|| \leq ||x|| + ||y||

The metric defined by the norm is d(a,b) = ||a - b||.

2. Distances

The diameter of a subset A \subseteq X of a metric space (X,d) is \text{diam}(A) = \sup\{d(r,s) | r, s \in A\}.

The distance of a point x \in X from a set A \subseteq X is \text{dist}(x, A) = \inf\{ d(x,a) | a \in A\}.

An isolated point z \in S where S \subseteq X is one for which \text{dist}(z, S \setminus \{z\}) \neq 0.

An accumulation point or limit point z \in X of S \subseteq X is one for which \text{dist}(z, S \setminus \{z\}) = 0. Note that z doesn’t need to be in S. A good example is z = 0, X = {\mathbb R}, S = \{1/n | n \in {\mathbb N}\}.
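A quick numerical sketch of that example (my own): over ever-larger finite truncations of S, the minimum distance from 0 shrinks towards the infimum 0, even though 0 \notin S:

    S = lambda N: [1 / n for n in range(1, N + 1)]

    # dist(0, S) is an infimum; over a finite truncation it is a minimum,
    # and it tends to 0 as the truncation grows:
    for N in (10, 100, 1000):
        print(N, min(abs(0 - s) for s in S(N)))   # 0.1, 0.01, 0.001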

The distance from subset A to subset B of a metric space is defined as \text{dist}(A,B) = \inf\{ d(a,b) | a \in A, b \in B\}.

A nearest point s \in S of S to z \in X is one for which d(z,s) = \text{dist}(z,S). Note that nearest points don’t need to exist, because \text{dist} is defined via the infimum. A metric space X is said to have the nearest-point property iff X is empty or every point of every metric superspace of X has a nearest point in X.

3. Boundaries

A point a is a boundary point of S in X iff \text{dist}(a,S) = \text{dist}(a,S^c) = 0. The collection of these points is the boundary \partial S.

Metric spaces in which no proper nonempty subset has empty boundary are connected. An example of a disconnected metric space is X = (0,1) \cup (7,8) as a metric subspace of {\mathbb R}, while {\mathbb R} itself is certainly connected.

Closed sets are those that contain their boundary.

The closure of S in X is \bar{S} \triangleq S \cup \partial S. The interior is S \setminus \partial S. The exterior is (\bar{S})^c.

Interior, boundary, and exterior are mutually disjoint and their union is X.

4. Sub- and super-spaces

A subset S \subseteq X is dense in X iff \bar{S} = X, or equivalently if for every x \in X, \text{dist}(x,S) = 0. The archetypal example is that \mathbb{Q} is dense in \mathbb{R}.

A complete metric space X is one that is closed in every metric superspace of X. An example is \mathbb{R}.

5. Balls

Let b[a;r) = \{ x \in X | d(a,x) < r \} denote an open ball and similarly b[a;r] = \{ x \in X | d(a,x) \leq r \} denote a closed ball. In the special case of normed linear spaces, b[a;r) = a + rb[0;1) and similarly for closed balls, so the important object is the unit ball – all other balls have the same shape. In fact, a norm on a space V is determined by three properties its open unit ball U must have:

  • Convexity
  • Balanced (i.e. x \in U \Rightarrow -x \in U)
  • For each x \in V \setminus \{0\}, the set \{ t \in \mathbb{R}^+ | t x \in U \} is nonempty, has a real supremum s, and sx \notin U

6. Convergence

The mth tail of a sequence x = (x_n) is the set \mbox{tail}_m(x) = \{x_n | n \in {\mathbb N}, n \geq m \}.

Suppose X is a metric space, z \in X and x = (x_n) is a sequence in X. The sequence x converges to z in X, denoted x_n \to z, iff every open subset of X that contains z includes a tail of x. In this situation, z is unique and is called the limit of the sequence, denoted \mbox{lim } x_n.

It follows that for (X,d) a metric space, z \in X and (x_n) a sequence in X, the sequence (x_n) converges to z in X iff the real sequence (d(x_n,z))_{n \in \mathbb{N}} converges to 0 in {\mathbb R}.

For real sequences, we can define the:

  • limit superior, \mbox{lim sup } x_n = \mbox{inf } \{ \mbox{sup } \mbox{tail}_n(x) | n \in \mathbb{N} \} and
  • limit inferior, \mbox{lim inf } x_n = \mbox{sup } \{ \mbox{inf } \mbox{tail}_n(x) | n \in \mathbb{N} \}.

It can be shown that x_n \to z iff \mbox{lim sup } x_n = \mbox{lim inf } x_n = z.
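As a concrete non-convergent example, here is a sketch of my own approximating lim sup and lim inf for x_n = (-1)^n(1 + 1/n) by taking sup and inf over distant (truncated) tails; the true values are 1 and -1, so the sequence does not converge:

    x = lambda n: (-1)**n * (1 + 1/n)

    def tail(m, N=100000):            # a finite truncation of tail_m(x)
        return [x(n) for n in range(m, N)]

    # sup/inf of distant tails approximate lim sup and lim inf:
    print(max(tail(10000)), min(tail(10000)))   # roughly 1.0 and -1.0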

A sequence that converges in a metric space converges to the same limit in every metric superspace; the same holds in subspaces provided the limit point is in the subspace itself. Sequences in finite product spaces equipped with product metrics converge in the product space iff their projections onto the individual spaces converge.

Every subsequence of a convergent sequence converges to the same limit as the parent sequence, but the picture for non-convergent parent sequences is more complicated, as we can still have convergent subsequences. There are various equivalent ways of characterising these limits of subsequences, e.g. centres of balls containing an infinite number of terms of the parent sequence.

A sequence (x_n) is Cauchy iff for every r \in \mathbb{R}^+, there is a ball of radius r that includes a tail of (x_n). Every convergent sequence is Cauchy. The converse is not true, but only because what would be the limit point may be missing from the space — adding this point and extending the metric appropriately yields a convergent sequence. It can be shown that a space is complete (see above for definition) iff every Cauchy sequence is also a convergent sequence in that space.

7. Bounds

A subset S of a metric space X is a bounded subset iff S = \emptyset or S is included in some ball of X. A metric space X is bounded iff it is a bounded subset of itself. An alternative characterisation of a bounded subset S is that it has finite diameter.

The Hausdorff metric is defined on the set \mathcal{S}(X) of all non-empty closed bounded subsets of a metric space (X,d). It is given by h(A,B) = \max \{ \sup\{ \mbox{dist}(b, A) | b \in B\}, \sup\{ \mbox{dist}(a, B) | a \in A\} \}.
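For finite subsets of \mathbb{R} (where the sups and infs become maxima and minima), the Hausdorff metric is easy to compute directly; a sketch of my own:

    def dist(x, A):                    # dist(x, A) = inf { d(x,a) | a in A }
        return min(abs(x - a) for a in A)

    def hausdorff(A, B):
        return max(max(dist(b, A) for b in B),
                   max(dist(a, B) for a in A))

    A, B = [0.0, 1.0], [0.0]
    print(hausdorff(A, B))             # 1.0 -- the two directed suprema are 0.0 and 1.0

Note the max over the two directed distances: dropping it would break symmetry, and hence the metric axioms.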

Given a set X and a metric space Y, f : X \to Y is a bounded function iff f(X) is a bounded subset of Y. The set of bounded functions from X to Y is denoted B(X,Y). There is a standard metric on bounded functions, s(f,g) = \sup \{ e(f(x),g(x)) | x \in X \} where e is the metric on Y.

Let X be a nonempty set and Y be a nonempty metric space. Let (f_n) be a sequence of functions from X to Y and g: X \to Y. Then:

  • (f_n) converges pointwise to g iff (f_n(z)) converges to g(z) for all z \in X
  • (f_n) converges uniformly to g iff \sup\{ e(f_n(x),g(x)) | x \in X \} is real for each n \in {\mathbb N} and the sequence ( \sup\{ e(f_n(x),g(x)) | x \in X \})_{n \in {\mathbb N}} converges to zero in {\mathbb R}.

It’s interesting to look at these two different notions of convergence because the second is stronger. Every uniformly-convergent sequence of functions converges pointwise, but the converse is not true. An example is the sequence f_n : \mathbb{R}^+ \to \mathbb{R} given by f_n(x) = 1/(nx). This converges pointwise but not uniformly to the zero function.
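A sketch of that example (my own): for each fixed x the values f_n(x) tend to 0, but the supremum over x \in \mathbb{R}^+ is infinite for every n, so the sups cannot converge to zero:

    f = lambda n, x: 1 / (n * x)

    # Pointwise: at any fixed x, f_n(x) -> 0 as n grows.
    print([f(n, 2.0) for n in (1, 10, 100)])    # 0.5, 0.05, 0.005

    # Not uniform: the sup over x in R+ blows up near 0 for every n.
    xs = [10.0**(-k) for k in range(1, 8)]
    print(max(f(100, x) for x in xs))           # about 1e5 -- grows without bound as x -> 0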

A stronger notion than boundedness is total boundedness. A subset S of a metric space X is totally bounded iff for each r \in {\mathbb R}^+, there is a finite collection of balls of X of radius r that covers S. An example of a bounded but not totally bounded subset is any infinite subset of a space with the discrete metric. Total boundedness carries over to subspaces and finite unions.

Conserving metrics play an important role in bounds, allowing bounds on product spaces to be equivalent to bounds on the projections to the individual spaces. This goes for both boundedness and total boundedness.

8. Continuity

Given metric spaces X and Y, a point z \in X and a function f: X \to Y, the function is said to be continuous at z iff for each open subset V \subseteq Y with f(z) \in V, there exists an open subset U of X with z \in U such that f(U) \subseteq V.

Extending from points to the whole domain, the function is said to be continuous on X iff for each open subset V \subseteq Y, f^{-1}(V) is open in X.

Continuity is not determined by the codomain, in the sense that a continuous function remains continuous when its codomain is extended to any metric superspace of its range. It is preserved by function composition and by restriction.

Continuity plays well with product spaces, in the sense that if the product space is endowed with a product metric, a function mapping into the product space is continuous iff its compositions with the natural projections are all continuous.

For (X,d) and (Y,e) metric spaces, \mathcal{C}(X,Y) denotes the metric space of continuous bounded functions from X to Y with the supremum metric (f,g) \mapsto \sup\{ e(g(x),f(x)) | x \in X \}. \mathcal{C}(X,Y) is closed in the space of bounded functions from X to Y.

Nicely, we can talk about convergence using the language of continuity. In particular, let X be a metric space, and \tilde{\mathbb{N}} = \mathbb{N} \cup \{ \infty \}. Endow \tilde{\mathbb{N}} with the inverse metric (a,b) \mapsto |a^{-1} - b^{-1} | for a,b \in {\mathbb N}, (n,\infty) \mapsto n^{-1} and (\infty, \infty) \mapsto 0. Let \tilde{x} : \tilde{\mathbb{N}} \to X. Then \tilde{x} is continuous iff the sequence (x_n) converges in X to x_{\infty}. In particular, the function extending each convergent sequence with its limit is an isometry from the space of convergent sequences in X to the metric space of continuous bounded functions from \tilde{\mathbb{N}} to X.

9. Uniform Continuity

Here we explore increasing strengths of continuity: Lipschitz continuity > uniform continuity > continuity. Ó Searcóid also adds strong contractions into this hierarchy, as the strongest class studied.

Uniform continuity requires the \delta in the epsilon-delta definition of continuity to extend across a whole set. Consider metric spaces (X,d) and (Y,e), a function f : X \to Y, and a metric subspace S \subseteq X. The function f is uniformly continuous on S iff for every \epsilon \in \mathbb{R}^+ there exists a \delta \in \mathbb{R}^+ s.t. for every x, z \in S for which d(z,x) < \delta, it holds that e( f(z), f(x) ) < \epsilon.

If (X,d) is a metric space with the nearest-point property and f is continuous, then f is also uniformly continuous on every bounded subset of X. A good example might be a polynomial on \mathbb{R}.

Uniformly continuous functions map compact metric spaces into compact metric spaces. They preserve total boundedness and Cauchy sequences. This isn’t necessarily true for continuous functions, e.g. x \mapsto 1/x on (0,1] does not preserve the Cauchy property of the sequence (1/n).

There is a remarkable relationship between the Cantor Set and uniform continuity. Consider a nonempty metric space (X,d). Then X is totally bounded iff there exists a bijective uniformly continuous function from a subset of the Cantor Set to X. As Ó Searcóid notes, this means that totally bounded metric spaces are quite small, in the sense that none can have cardinality greater than that of the reals.

Consider metric spaces (X,d) and (Y,e) and function f: X \to Y. The function is called Lipschitz with Lipschitz constant k \in \mathbb{R}^+ iff e( f(a), f(b) ) \leq k d(a,b) for all a, b \in X.

Note here the difference from uniform continuity: Lipschitz continuity restricts uniform continuity by prescribing a relationship that must exist between the \epsilons and \deltas – uniform continuity leaves this open. A nice example from Ó Searcóid of a uniformly continuous non-Lipschitz function is x \mapsto \sqrt{1 - x^2} on [0,1).
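A numerical sketch (my own) of why this function fails to be Lipschitz: its difference quotients near 1 are unbounded, so no single constant k can work:

    import math

    f = lambda x: math.sqrt(1 - x**2)

    # |f(x) - f(y)| / |x - y| blows up as x, y -> 1:
    for eps in (1e-2, 1e-4, 1e-6):
        x, y = 1 - eps, 1 - 2 * eps
        print(abs(f(x) - f(y)) / abs(x - y))   # roughly 5.8, then 59, then 586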

Lipschitz functions preserve boundedness, and the Lipschitz property is preserved by function composition.

There is a relationship between Lipschitz functions on the reals and their derivatives. Let I be a non-degenerate interval of \mathbb{R} and f: I \to \mathbb{R} be differentiable. Then f is Lipschitz on I iff f' is bounded on I.

A function with Lipschitz constant less than one is called a strong contraction.

Unlike the case for continuity, not every product metric gives rise to uniformly continuous natural projections, but this does hold for conserving metrics.

10. Completeness

Let (X,d) be a metric space and u : X \to \mathbb{R}. The function u is called a virtual point iff:

  • u(a) - u(b) \leq d(a,b) \leq u(a) + u(b) for all a,b \in X
  • \text{inf} \; u(X) = 0
  • 0 \notin u(X)

We saw earlier that a metric space X is complete iff it is closed in every metric superspace of X. There are a number of equivalent characterisations, including that every Cauchy sequence in X converges in X.

Consider a metric space (X,d). A subset S \subseteq X is a complete subset of X iff (S,d) is a complete metric space.

If X is a complete metric space and S \subseteq X, then S is complete iff S is closed in X.

Conserving metrics ensure that finite products of complete metric spaces are complete.

A non-empty metric space (X,d) is complete iff (\mathcal{S}(X),h) is complete, where \mathcal{S}(X) denotes the collection of all non-empty closed bounded subsets of X and h denotes the Hausdorff metric.

For X a non-empty set and (Y,e) a metric space, the metric space B(X,Y) of bounded functions from X to Y with the supremum metric is a complete metric space iff Y is complete. An example is that the space of bounded sequences in \mathbb{R} is complete due to completeness of \mathbb{R}.

We can extend uniformly continuous functions from dense subsets to complete spaces to unique uniformly continuous functions from the whole: Consider metric spaces (X,d) and (Y,e) with the latter being complete. Let S \subseteq X be a dense subset of X and f : S \to Y be a uniformly continuous function. Then there exists a uniformly continuous function \tilde{f} : X \to Y such that \tilde{f}|_S = f. There are no other continuous extensions of f to X.

(Banach’s Fixed-Point Theorem). Let (X,d) be a non-empty complete metric space and f : X \to X be a strong contraction on X with Lipschitz constant k \in (0,1). Then f has a unique fixed point in X and, for each w \in X, the sequence (f^n(w)) converges to the fixed point. Beautiful examples of this abound, of course. Ó Searcóid discusses IFS fractals – computer scientists will be familiar with applications in the semantics of programming languages.
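A sketch of the theorem in action (my own toy example): cos is a strong contraction on [0,1], since it maps [0,1] into [\cos 1, 1] \subseteq [0,1] and its derivative is bounded in magnitude there by \sin 1 \approx 0.84 < 1, so iterating from any start point converges to the unique fixed point:

    import math

    f = math.cos        # a strong contraction on [0, 1] with k = sin(1) < 1

    w = 0.0
    for _ in range(100):
        w = f(w)        # the sequence (f^n(w)) of the theorem
    print(w)            # approx 0.7390851332, the unique fixed point of cos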

A metric space (Y,e) is called a completion of metric space (X,d) iff (Y,e) is complete and (X,d) is isometric to a dense subspace of (Y,e).

We can complete any metric space. Let (X,d) be a metric space. Define \tilde{X} = \delta(X) \cup \text{vp}(X) where \delta(X) denotes the set of all point functions in X and \text{vp}(X) denotes the set of all virtual points in X. We can endow \tilde{X} with the metric s given by (u,v) \mapsto \sup\{ |u(x) - v(x)| | x \in X \}. Then \tilde{X} is a completion of X.

Here the subspace (\delta(X),s) of (\tilde{X},s) forms the subspace isometric to (X,d).
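A quick numerical check of this isometry claim (a sketch of my own, approximating the sup over a finite grid in \mathbb{R}): for point functions, s(\delta_a, \delta_b) = \sup\{ |d(a,x) - d(b,x)| | x \in X \} recovers d(a,b) exactly:

    d = lambda p, q: abs(p - q)
    delta = lambda z: (lambda x: d(z, x))   # the point function at z

    a, b = 1.0, 4.0
    xs = [i / 10 for i in range(-100, 101)] # finite stand-in for X = R
    s = max(abs(delta(a)(x) - delta(b)(x)) for x in xs)
    print(s, d(a, b))                       # both 3.0

The sup is attained at any x outside [a,b]; the reverse inequality s \leq d(a,b) is just the triangle inequality.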

11. Connectedness

A metric space X is a connected metric space iff X cannot be expressed as the union of two disjoint nonempty open subsets of itself. An example is \mathbb{R} with its usual metric. As usual, Ó Searcóid gives a number of equivalent criteria:

  • Every proper nonempty subset of X has nonempty boundary in X
  • No proper nonempty subset of X is both open and closed in X
  • X is not the union of two disjoint nonempty closed subsets of itself
  • Either X = \emptyset or the only continuous functions from X to the discrete space \{0,1\} are the two constant functions

Connectedness is intrinsic, in the sense that it does not depend on the ambient metric superspace. In particular, if X is a metric space, Z is a metric subspace of X and S \subseteq Z, then the subspace S of Z is a connected metric space iff the subspace S of X is a connected metric space. Moreover, for a connected subspace S of X with S \subseteq A \subseteq \bar{S}, the subspace A is connected. In particular, \bar{S} itself is connected.

Every continuous image of a connected metric space is connected. In particular, for nonempty S \subseteq \mathbb{R}, S is connected iff S is an interval. This is a generalisation of the Intermediate Value Theorem (to see this, consider the continuous functions f : X \to \mathbb{R}).

Finite products of connected subsets endowed with a product metric are connected. Unions of chained collections (i.e. sequences of subsets whose sequence neighbours are non-disjoint) of connected subsets are themselves connected.

A connected component U of a metric space X is a subset that is connected and which has no proper superset that is also connected – a kind of maximal connected subset. It turns out that the connected components of a metric space X are mutually disjoint, all closed in X, and X is the union of its connected components.

A path in metric space X is a continuous function f : [0, 1] \to X. (These functions turn out to be uniformly continuous.) This definition allows us to consider a stronger notion of connectedness: a metric space X is pathwise connected iff for each a, b \in X there is a path in X with endpoints a and b. An example given by Ó Searcóid of a space that is connected but not pathwise connected is the closure in \mathbb{R}^2 of \Gamma = \{ (x, \sin(1/x)) | x \in \mathbb{R}^+ \}. From one of the results above, \bar{\Gamma} is connected because \Gamma is connected. But there is no path from, say, (0,0) (which nevertheless is in \bar{\Gamma}) to any point in \Gamma.

Every continuous image of a pathwise connected metric space is itself pathwise connected.

For a linear space, an even stronger notion of connectedness is polygonal connectedness. For a linear space X with subset S and a, b \in S, a polygonal connection from a to b in X is an n-tuple of points (c_1, \ldots, c_n) s.t. c_1 = a, c_n = b and, for each i \in \{1, 2, \ldots, n-1\}, \{(1 - t)c_i + t c_{i+1} | t \in [0,1] \} \subseteq S. We then say a space is polygonally connected iff there exists a polygonal connection between every two points in the space. Ó Searcóid gives the example of \{ z \in \mathbb{C} | \; |z| = 1 \} as a pathwise connected but not polygonally connected subset of \mathbb{C}.

Although in general these three notions of connectedness are distinct, they coincide for open connected subsets of normed linear spaces.

12. Compactness

Ó Searcóid gives a number of equivalent characterisations of compact non-empty metric spaces X; those I found most interesting and useful for the following material include:

  • Every open cover for X has a finite subcover
  • X is complete and totally bounded
  • X is a continuous image of the Cantor set
  • Every real continuous function defined on X is bounded and attains its bounds

The example is given of closed bounded intervals of \mathbb{R} as archetypal compact sets. An interesting observation is that ‘most’ metric spaces cannot be extended to compact metric spaces, simply because there aren’t many compact metric spaces — as noted above in the section on uniform continuity, there are certainly no more than |\mathbb{R}| of them, given they’re all images of the Cantor set.

If X is a compact metric space and S \subseteq X then S is compact iff S is closed in X. This follows because S inherits total boundedness from X, and completeness follows also if S is closed.

The Inverse Function Theorem states that for X and Y metric spaces with X compact, and for f : X \to Y injective and continuous, f^{-1}: f(X) \to X is uniformly continuous.

Compactness plays well with intersections, finite unions, and finite products endowed with a product metric. The latter is interesting, given that we noted above that for non-conserving product metrics, total boundedness doesn’t necessarily carry forward.

Things get trickier when dealing with infinite-dimensional spaces. The following statement of the Arzelà-Ascoli Theorem is given, which allows us to characterise the compactness of a closed, bounded subset S of \mathcal{C}(X,Y) for compact metric spaces X and Y:

For each x \in X, define \hat{x}: S \to Y by \hat{x}(f) = f(x) for each f \in S. Let \hat{X} = \{\hat{x} | x \in X \}. Then:

  • \hat{X} \subseteq B(S,Y) and
  • S is compact iff x \mapsto \hat{x} from X to B(S,Y) is continuous

13. Equivalence

Consider a set X and the various metrics we can equip it with. We can define a partial order \succeq on these metrics in the following way: d is topologically stronger than e, written d \succeq e, iff every open subset of (X,e) is open in (X,d). We then get an induced notion of topological equivalence of two metrics, when d \succeq e and e \succeq d.

As well as obviously admitting the same open subsets, topologically equivalent metrics admit the same closed subsets, dense subsets, compact subsets, connected subsets, convergent sequences, limits, and continuous functions to/from that set.

It turns out that two metrics are topologically equivalent iff the identity functions from (X,d) to (X,e) and vice versa are both continuous. Following the discussion above relating to continuity, this hints at potentially stronger notions of comparability – and hence of equivalence – of metrics, which indeed exist. In particular, d is uniformly stronger than e iff the identity function from (X,d) to (X,e) is uniformly continuous. Also, d is Lipschitz stronger than e iff the identity function from (X,d) to (X,e) is Lipschitz.

The stronger notion of a uniformly equivalent metric is important because these metrics additionally admit the same Cauchy sequences, totally bounded subsets and complete subsets.

Lipschitz equivalence is even stronger, additionally providing the same bounded subsets and subsets with the nearest-point property.

The various notions of equivalence discussed here collapse to a single one when dealing with norms. For a linear space X, two norms on X are topologically equivalent iff they are Lipschitz equivalent, so we can just refer to norms as being equivalent. All norms on finite-dimensional linear spaces are equivalent.

Finally, some notes on the more general idea of equivalent metric spaces (rather than equivalent metrics). Again, these are provided in three flavours:

  • topologically equivalent metric spaces (X,d) and (Y,e) are those for which there exists a continuous bijection with continuous inverse (a homeomorphism) from X to Y.
  • for uniformly equivalent metric spaces, we strengthen the requirement to uniform continuity
  • for Lipschitz equivalent metric spaces, we strengthen the requirement to Lipschitz continuity
  • strongest of all are isometries, discussed above

Note that given the definitions above, the metric space (X,d) is equivalent to the metric space (X,e) if d and e are equivalent, but the converse is not necessarily true. For equivalent metric spaces, we require existence of a function — for equivalent metrics this is required to be the identity.

ResearchED on Curriculum

A colleague recently pointed me to the ResearchED Guide to the Curriculum, a volume of essays edited by Clare Sealy. Following the guidance of Lemov and Badillo’s essay in this volume that ‘reading a book should be a writing-intensive experience’, I’ve written down some of my thoughts after reading this book. They come from the perspective of someone who teaches (albeit in higher education rather than in a school) and is a researcher (but not in education). Of course, my background undoubtedly skews my perspective and limits my awareness of much educational theory.

The context here is that schools in England have been very busy over the last couple of years rethinking their curriculum, not least because the new Ofsted school inspection framework places it centre-stage. So now is a good moment to engage with schools over some of the more tricky questions involved.

I found this collection of essays very thought-provoking, and would recommend engaging with it, whether or not you consider yourself to be a fan of the “curriculum revolution” underway in English schools.

Knowledge and Teaching

Many of the contributions relate to a knowledge-based curriculum, but none give a working definition of knowledge. I think it’s useful for educators to reflect on, and leaders to engage with, epistemology at some level. When ideas like a “knowledge-based curriculum” start to be prioritised, we need to understand the various meanings this term might have, and precisely which other ideas it may be being used in opposition to. Of course this becomes even more important when politics enters the picture: I find it hard to envisage a definition of a knowledge-based curriculum that is broad enough to encompass both Michael Gove’s approach to knowledge in history and the principles Michael F.D. Young espouses in this book. A central problem in the theory of knowledge, of course, is the theory of truth; it’s interesting that Ashbee’s essay in the same volume riffs on this theory when asking whether it is right that students should learn something false (the presence of electron shells in an atom) in order to facilitate understanding later on. I think this could do with a more sophisticated analysis of truth and falsity – it is by no means universally accepted that the presence of electron shells can be said to be ‘false’, and I do think the philosophical standpoint on such questions has implications for curriculum design – especially in the sciences.

The same holds for teaching. The role of teaching in imparting knowledge needs to be fully explored. Even if we accept the premise that, to quote Young, ‘schools in a democracy should all be working towards access to powerful knowledge for all their pupils’ (and also leave to one side the definition of a democracy), this leaves open the question of the role of the teacher in providing that access. At one extreme seems to lie the Gradgrindian approach best summarised by Dickens in Hard Times of students as ‘little vessels then and there arranged in order, ready to have imperial gallons of facts poured into them until they were full to the brim’, at the other an unschooling approach. But both can legitimately claim to be pursuing this aim. In the middle of these extremes, the teacher’s role in setting up experiences, and in developing understanding, for example through Adey and Shayer’s concept of ‘cognitive conflict’, could be explored more deeply.

It’s interesting that in his essay in this book, Young – described by Sealy as one of the ‘godfathers’ of the knowledge-based curriculum – has plenty to say about problematic ways this concept has been interpreted, in particular that “a school adopting a knowledge-led curriculum can spend too much time on testing whether students have memorised the knowledge of previous years”, and that “a focus on memorisation does not necessarily encourage students to develop a ‘relationship to knowledge’ that leads to new questions.” These concerns echo my own fears, and I see the latter also arise in higher education as students make the leap between undergraduate and postgraduate work.

My own teaching as an academic has spanned the full range from largely chalk-and-talk unidirectional presentations to undergraduate students to fairly laissez-faire mentoring of PhD students through their own discovery of the background material required for their research. It’s interesting to reflect on the different levels of resourcing required to follow these models, a topic Young and Aurora both pick up in their essays: the need for a curriculum model that incorporates how teachers might engage with external constraints (resource, externally imposed exam syllabuses, etc.) in the short term, even as we work towards a better long-term future for our students.

Memorisation is mentioned by several authors, and of course can be important, but – as Young says – it’s also important that students come to view it as “a step to acquiring new knowledge”, not as acquiring new knowledge. So my question to schools is this: how is that desirable student perception reflected in your curriculum? How does your curriculum help develop that view in students?

My concerns over some of the ‘knowledge-based’ or ‘knowledge-led’ work in schools in recent years are broadly in line with Young’s view in this volume that teaching viewed as the transmission of knowledge excludes the process by which students develop a relationship with knowledge (my emphasis). I was also pleased by Young’s assertion that schools should treat subjects not just as bodies of knowledge but as communities of teachers and researchers, and pupils as neophyte members of such communities. To me, this is wholly consistent with the exciting ideas behind Claxton’s Building Learning Power framework I reviewed some years ago here.

What Do We Want Students to Be Able to Do?

In addition to more traditional answers, Ashbee suggests some that should make schools pause for thought. For example, she suggests that while others may have chosen the curriculum content, students should be equipped to critically evaluate the inclusion of the knowledge in the curriculum they have studied and to ask what else could have been included. I like this idea: a key question for schools, though, is where do our curricula equip students for this task?

One aspect largely absent from this volume is a critical discussion of assessment, including testing, and its role in the curriculum, both in obvious terms of shaping the curriculum in the long- and short-term, and the – perhaps less obvious – nature of forms of assessments in themselves driving students’ behaviour and understanding of the nature of learning.

Planning Lessons

In an essay on Curriculum Coherence, Neil Almond discusses the role of sequencing in a subject curriculum. Almond uses the analogy of a box set, contrasting The Simpsons (minimal ordering required) with Game of Thrones (significant ordering required).

Three aspects of the timing of lessons are, I think, missing in this discussion and deserve more explicit consideration.

Firstly, any necessary order of discussion of topics is rarely a total (linear) order, to put it mathematically. It’s maybe not even a partial order. Explicit dependencies between topic areas have been explored in depth by Cambridge Mathematics, who have built a directed graph representation of a curriculum. The lack of totality of the order provides significant freedom in practice; it is less clear what best practice might be for a department or teacher in taking advantage of this freedom. It’s also less clear when to take advantage of it: should this be a department-level once-and-for-all decision, one delegated to teachers to determine on the fly, or something in between? And why?
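To make the mathematical point concrete, here is a toy sketch (with entirely hypothetical topic names and dependencies, not Cambridge Mathematics’ data): once dependencies are captured as a directed graph, any topological sort is a valid teaching order, and there are usually many – this is exactly the freedom referred to above:

    from graphlib import TopologicalSorter   # Python 3.9+

    # Hypothetical topics, each mapped to its prerequisites:
    deps = {
        "fractions": {"counting"},
        "ratio":     {"fractions"},
        "algebra":   {"fractions"},
        "graphs":    {"algebra", "ratio"},
    }
    print(list(TopologicalSorter(deps).static_order()))
    # e.g. ['counting', 'fractions', 'ratio', 'algebra', 'graphs']
    # -- 'ratio' and 'algebra' could equally be swapped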

Secondly, even once this freedom has been exploited by mapping the dependence structure of concepts into a total order, there still remains the question of mapping from that order into time. Again, there is flexibility: should this unit take one week or two, or a whole term? Importantly, curricula need to consider who should exercise this flexibility, when, and why. Within mathematics, this has come under a lot of scrutiny in recent years, through discussions around the various definitions of mastery.

Finally, one aspect that the box set analogy obscures is the extent to which the sequencing of lessons is to be co-created with the students. The Simpsons and Game of Thrones writers don’t have the option to co-create sequencing on the fly with their audience – schools and universities do. To what extent should this freedom be utilised, by whom, when, and to what end?

Linking Universities with Schools and Secondaries with Primaries

Ashbee discusses the very interesting question of how school curricula can engage with the mechanisms for knowledge generation in the broader discipline. For example, experiment, peer review, and art exhibitions all help reflect the norms of the discipline’s cutting edge back into the school curriculum. This is why I was sad, recently, to see Ofqual consulting on the removal of student-conducted experiments from 2021 GCSE science examinations, to give teachers time to cram in more facts: ‘science’ without experiment is not science.

Since one of the key venues for knowledge generation is the academy, increasing interaction between schools and universities should be very productive at the moment, given how much schools are thinking about curriculum fundamentals. I am pleased to have played a very small part in the engagement of my university, both through my own outreach and through discussions leading up to our recent announcement of the opening of a new Imperial Maths School. More of all this, please, universities!

The theme of linking phases of education also appears in Andrew Percival’s case study of primary curriculum development, where he emphasises the benefit his primary school obtained through teachers joining subject associations, e.g. in Design and Technology, making links with secondary specialists, and introducing self-directed study time for primary teaching staff to develop their subject knowledge. Those of us in all sectors should seek out links through joint professional associations.

Lessons for Leaders

Christine Counsell’s essay tackles the topic of how school senior leadership teams (SLTs) should engage with developing and monitoring their departments under a knowledge-based curriculum. The main issue here is in the secondary sector, as SLTs will not include all subject specialisms. My colleague probably had this essay in mind when pointing out this edited volume, as many of the lessons and ideas here apply equally well to school governors engaging with subject leaders. I would agree with this. But actually, I would go further and say that many of Counsell’s suggestions for SLTs echo previous best practice in governance from before the new national curriculum. In meetings between subject leaders and school governors, governors have always played the role of knowledgeable outsider, whose aim is to guide a subject leader in conversation to reflect on their role in the development of the teaching of their subject and its norms. It’s quite interesting to see this convergence.

I was also struck by Counsell’s insistence on the importance of discussing curriculum content with middle leaders rather than relying on proxies such as attainment results alone, which can actually act to conceal the curriculum; in my day job this mirrors the importance we try to give in staff appraisal to discussing the research discoveries of academic staff, not focusing on the number or venue of publications. I think many of the arguments made are transferrable between schools and universities. Counsell also identifies positive and negative roles played by SLTs in developing a positive culture in middle leadership, and hence provides some useful material around which governance questions can be posed to probe SLTs themselves.

When are Digits Correct?

Often, we compute with iterative algorithms. Start with some value, often an initial guess to be refined, and keep iterating until some stopping criterion is met. If we actually think about what goes on in a modern digital computer when we execute these algorithms, we soon see that – often – the same digits end up being computed time and again. As we converge to a value, it’s reasonable to expect that most of the time the most significant digits become stable. But we still compute them, time and again at each iteration, wasting computational resource.

In general, in standard binary representations, this re-computation may not be avoidable – most-significant digits might be stable for 1000 iterations and then flip, e.g. from 0.99999 to 1.00000. As a child, I used to play with such iterations using my HP32S calculator – a gift from fred harris – and it provided endless entertainment.
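As a toy illustration (my own sketch, not the method of the paper discussed below), watch the leading decimal digits of a convergent iteration stabilise from step to step – while remembering that, per the example above, a leading digit can in general still flip arbitrarily late:

    # Fixed-point iteration x <- (x + 2/x) / 2, converging to sqrt(2):
    x = 1.0
    for i in range(6):
        x = (x + 2 / x) / 2
        print(i, f"{x:.10f}")   # more and more leading digits agree between iterations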

There is, however, a class of number representations in which these digit flips can be avoided: redundant number representations. These representations have a long history – indeed, as my friend and colleague Miloš Ercegovac has identified, they can be traced back as far as a 1727 publication in Phil. Trans. Royal Soc by John Colson FRS. Miloš developed these ideas to a mature state between the 1970s and today, in the guise of Online Arithmetic.

Together with my PhD students He Li (now research staff at Cambridge) and Ian McInerney, and our collaborator James Davis, I have recently done some work on methods to detect and establish exactly when digits become stable using such schemes, and what the implications might be for hardware that makes use of this stability. In our recent IEEE Transactions on Computers paper, we adapt standard forward error analyses of stationary iterative methods to this setting. We mathematically derive some conditions that can be checked at run-time to determine when you don’t need to compute certain digits in any future iteration, and also present a toy hardware implementation taking advantage of this approach using a non-standard arithmetic processor design.

We hope that – in the future – only what needs to be computed will be computed.

Award of GCSEs and A-Levels in 2020

Readers of my blog based in England may know that due to COVID-19, GCSEs (typically taken at age 16) and A-Levels (age 18) are not going ahead as exams this year. Yesterday, the Office for Qualifications and Examinations Regulation (Ofqual) published a consultation on the methods to be used to ensure fairness in the award of these important qualifications. I intend to respond to this consultation, which is only open for two weeks, and have produced a draft response below. Before I submit it, I would welcome any feedback. Equally, others should feel free to borrow from my response if it helps them.

Centre Assessment Grades

To what extent do you agree or disagree that we should incorporate the requirement for exam boards to collect information from centres on centre assessment grades and their student rank order, in line with our published information document, into our exceptional regulatory requirements for this year?

Agree

To what extent do you agree or disagree that exam boards should only accept centre assessment grades and student rank orders from a centre when the Head of Centre or their nominated deputy has made a declaration as to their accuracy and integrity?

Strongly Agree

To what extent do you agree or disagree that Heads of Centre should not need to make a specific declaration in relation to Equalities Law?

Disagree

To what extent do you agree or disagree that students in year 10 and below who had been entered to complete exams this summer should be issued results on the same basis as students in year 11 and above?

Strongly Agree

To what extent do you agree or disagree that inappropriate disclosure of centre assessment judgements or rank order information should be investigated by exam boards as potential malpractice?

Neither Agree nor Disagree

Do you have any comments about our proposals for centre assessment grades?

  1. While a separate Equalities Law declaration is not necessary, the Head of Centre should be able to declare that they have taken equality law into consideration as part of their declaration.
  2. Ofqual should liaise with the National Governance Association and with teaching unions to provide guidance to governing bodies and staff on appropriate challenge and support to schools in order to ensure processes underlying Head of Centre declaration are appropriately evidenced.
  3. While I understand and support the motivation for labelling inappropriate disclosure of centre assessments as malpractice, care must be taken and guidance given to centres over what is deemed “inappropriate”. I would not want to be in the situation where a teacher is unable to calm a student in a way they normally would, for example by telling them that “I can’t see any way you won’t get a Grade 7”. There may be an equalities implication for those students suffering from extreme anxiety, and this should be considered when drawing up guidance for centres.
  4. While I accept that there is little time to provide detailed guidance for centres to follow when drawing up rank-order lists, the publication of examples of good practice may help centres, and I would recommend this is considered.

Issuing Results

To what extent do you agree or disagree that we should incorporate into the regulatory framework a requirement for all exam boards to issue results in the same way this summer, in accordance with the approach we will finalise after this consultation, and not by any other means?

Strongly Agree

Do you have any comments about our proposal for the issuing of results?

None

Impact on Students

To what extent do you agree or disagree that we should only allow exam boards to issue results for private candidates for whom a Head of Centre considers that centre assessment grades and a place in a rank order can properly be submitted?

Agree

To what extent do you agree or disagree that the arrangements we put in place to secure the issue of results this summer should extend to students in the rest of the UK?

Strongly agree

To what extent do you agree or disagree that the arrangements we put in place to secure the issue of results this summer should extend to all students, wherever they are taking the qualifications?

Neither agree nor disagree

Do you have any comments about the impact of our proposals on any particular groups of students?

  1. Unfortunately, I see no other option than that proposed for private candidates. However, I am concerned that the definition of “properly” in the criterion given should be made much more explicit, and in objective terms, to the heads of centres.
  2. I suggest legal advice is sought over the enforceability of arrangements within centres outside the UK, in particular over the implications of breach of a head of centre’s declaration before proceeding with treating them the same as those within the UK.
  3. I am concerned over the impact of the proposed arrangements for some groups of students who may be differentially affected by the change in routine due to lockdown, e.g. those with Autistic Spectrum Conditions (ASC). In order to be as fair as possible to these students, I suggest that explicit guidance be given to centres emphasising that centres are free to disregard any dip in attainment since lockdown when coming up with their rank-order list, and again emphasising their duties under equalities legislation.

Statistical standardisation of centre assessment grades

To what extent do you agree or disagree with the aims outlined above?

Agree

To what extent do you agree or disagree that using an approach to statistical standardisation which emphasises historical evidence of centre performance given the prior attainment of students is likely to be fairest for all students?

Agree

To what extent do you agree or disagree that the trajectory of centres’ results should NOT be included in the statistical standardisation process?

Agree

To what extent do you agree or disagree that the individual rank orders provided by centres should NOT be modified to account for bias regarding different students according to their particular protected characteristics or their socio-economic backgrounds?

Agree

To what extent do you agree or disagree that we should incorporate the standardisation approach into our regulatory framework?

Agree

Do you have any comments about our proposals for the statistical standardisation of centre assessment grades?

  1. I am unclear from the consultation on whether standardisation is to occur on an exam-board basis or across exam boards. If it is on an exam-board basis, it is not clear what will happen when centres have changed exam board over the time window used to judge prior grade distribution at the school, especially if the change is for the first time this year.
  2. I have several statistical concerns over the proposed methodology, given the level of detail discussed so far. In particular,
    (i) there is a recognition that small centres or small cohorts will be difficult to deal with – this is a significant issue, and may be exacerbated depending on the definition of cohort (see #3, below), leading to significant statistical uncertainty;
    (ii) it is hugely important to avoid 2020 results being affected by outlier results in previous years. One possibility is to use median results from the previous three years – I would avoid using mean results or a single year’s results.
    Given these concerns, my view is that it would be more appropriate to award a “grade range” to students (e.g. “9-7”, which may of course include degenerate ranges like just “7”). This allows statistical uncertainty arising from the various measures integrated into the standardisation algorithm to be explicitly quantified, providing a transparent per-pupil result. It would allow universities and sixth-forms to decide for themselves whether to admit a pupil optimistically, pessimistically, or on the basis of the interval midpoint.
  3. It is unclear from the consultation whether the past grade distributions used will be on a per-subject basis. If not, this is likely to violate proposed Aim 1 of the standardisation process. However, if so, this is likely to result in some very small cohorts for optional subjects at particular centres, so extreme statistical care must be taken in using these cohorts as the basis for grading in 2020. A possible solution is to produce grade ranges, as above.
  4. From a statistical perspective, estimation of grade distributions at a per-centre level (rather than estimation of mean grade, for example) is fraught with danger and highly sensitive to cohort size. It is very important that you do not treat the empirical frequency distribution of grades in a centre over the last 1, 2, or 3 years as the underlying probability distribution but rather as a sample from the latter, using an appropriate statistical method to estimate the distribution from the sample. Such methods would also allow the incorporation of notions of variance, which could be factored into the “grade ranges” for students, explained in #2. As an extreme example: if a centre had no Grade 6’s last year, only 5’s and 7’s, we should not bias our model to no Grade 6’s this year, surely.
  5. There is an additional option for standardisation, not considered in the consultation document, which is less subject to the statistical problems of distribution estimation. You could extract just one or two parameters from your model (e.g. desired mean, desired standard deviation) and use these to normalise the distribution from each centre, rather than fit the complete distributions. Such aggregate statistics will be less susceptible to variation, especially for smaller cohorts.
  6. I am unclear how it is possible to award grades to students at centres without any historical outcomes and with no prior attainment data or prior attainment data covering a statistically-insignificant portion of the cohort. For these centres, some form of moderation or relying on Autumn term exam results may be required.
  7. I am concerned by the statement in the consultation that “we will evaluate the optimal span of historical centre outcomes (one, 2 or 3 years). We will select the approach that is likely to be the most accurate in standardising students’ grades.” There is no discussion of how “most accurate” can be judged; there is no data upon which to make this decision, so I would urge caution and an outlier-rejection strategy (see #2 above).
  8. While I broadly agree that there is insufficient data upon which to base rank-order modification based on protected characteristics or socio-economic backgrounds, of the three approaches discussed in the consultation document, the “second approach” is currently very vague and needs further refinement before I can offer an opinion on it. I am happy to be contacted for further comment on this in the future.
  9. I am concerned by the absence of a mechanism to flag unusual rank order differences between subjects in a centre. It should be possible to identify, for example, pupils ranked very high in Subject A and very low in Subject B compared to the typical centile differences in rankings between these subjects, for further investigation by the exam boards. The sensitivity of such a test could be set at an appropriate level given the amount of staff time available to investigate.


Appealing the results

To what extent do you agree or disagree that we should not provide for a review or appeals process premised on scrutiny of the professional judgements on which a centre’s assessment grades are determined?

Agree

To what extent do you agree or disagree that we should not provide for a student to challenge their position in a centre’s rank order?

Agree

To what extent do you agree or disagree that we should not provide for an appeal in respect of the process or procedure used by a centre?

Strongly disagree

To what extent do you agree or disagree that we should provide for a centre to appeal to an exam board on the grounds that the exam board used the wrong data when calculating a grade, and/or incorrectly allocated or communicated the grades calculated?

Strongly Agree

To what extent do you agree or disagree that for results issued this summer, exam boards should only consider appeals submitted by centres and not those submitted by individual students?

Strongly disagree

To what extent do you agree or disagree that we should not require an exam board to ensure consent has been obtained from all students who might be affected by the outcome of an appeal before that appeal is considered?

Agree

To what extent do you agree or disagree that exam boards should not put down grades of other students as a result of an appeal submitted on behalf of another student?

Strongly agree

To what extent do you agree or disagree that exam boards should be permitted to ask persons who were involved in the calculation of results to be involved in the evaluation of appeals in relation to those results?

Disagree

To what extent do you agree or disagree that exam boards should be able to run a simplified appeals process?

Neither agree nor disagree

To what extent do you agree or disagree that we should not provide for appeals in respect of the operation or outcome of the statistical standardisation model?

Strongly agree

To what extent do you agree or disagree with our proposal to make the Exam Procedures Review Service (EPRS) available to centres for results issued this summer?

Strongly agree

Do you have any comments about our proposals for appealing results?

  1. I disagree with the absence of an appeal procedure against centre procedure. While recognising the difficulties faced by centres and the exceptional circumstances, there is an element of natural justice that must be maintained. Without such an appeal process, there is no safeguard against centres using completely inappropriate mechanisms to derive grade and rank orders, beyond the signed statement from the head of centre. While the consultation suggests that detailed guidance will not be sent to centres on the procedures they should follow, it is reasonable to expect a centre – if challenged by a sufficient number of candidates – to explain the procedure they did follow, and for an appeal body to find this to be reasonable or unreasonable in the circumstances. The outcome of any successful appeal may have to be the cancelling of all grades in a certain subject at a certain centre, requiring a fall-back to the Autumn 2020 exams, but the mere existence of such a mechanism may help focus centres on ensuring justifiable procedures are in place.
  2. The consultation document leaves open the question of what role staff of exam boards who were involved in the calculation of results would have in appeals. It appears proper for them to be involved in providing evidence to an independent appeals committee, but not to form such a committee.


An Autumn exam series

To what extent do you agree or disagree that entries to the autumn series should be limited to those who were entered for the summer series, or those who the exam board believes have made a compelling case about their intention to have entered for the summer series (as well as to students who would normally be permitted to take GCSEs in English language and mathematics in November)?

Agree

To which qualifications the emergency regulations will apply

To what extent do you agree or disagree that we should apply the same provisions as GCSE, AS and A level qualifications to all Extended Project Qualifications and to the Advanced Extension Award qualification?

Strongly agree

Do you have any comments about the qualifications to which the exceptional regulatory measures will apply?

None

Building the arrangements into our regulatory framework

To what extent do you agree or disagree that we should confirm that exam boards will not be permitted to offer opportunities for students to take exams in May and June 2020?

Disagree

To what extent do you agree or disagree with our proposals that exam boards will not be permitted to offer exams for the AEA qualification or to moderate Extended Project Qualifications this summer?

Disagree

Do you have any comments about our proposals for building our arrangements into our regulatory framework?

I have sympathy with the proposals in this section, but they need to be balanced against the harm done to those candidates who will be unable to use centre-based assessments and against Ofqual’s duties under the Equalities legislation, given that this may disproportionately affect disabled students (see pp. 51-52 of the consultation document). On balance, it may be better to leave this as an operational decision between exam boards and exam centres to allow exams in May and June, if possible, only for these students.


Equality impact assessment

Are there other potential equality impacts that we have not explored? What are they?

As previously noted, I am concerned over the impact of the proposed arrangements for some groups of students who may be differentially affected by the change in routine due to lockdown, e.g. those with Autistic Spectrum Conditions (ASC). In order to be as fair as possible to these students, I suggest that explicit guidance be given to centres emphasising that centres are free to disregard any dip in attainment since lockdown when coming up with their rank-order list, and again emphasising their duties under equalities legislation.

We would welcome your views on how any potential negative impacts on particular groups of students could be mitigated:

If Ofqual were to adopt a “grade range” approach, outlined above, then the quoted research into the reliability of predicted GCSE, AS and A-levels prior to 2015 could be used to inform the degree of uncertainty in the range, mitigating the impact on particular groups of students.

Learning in Vienna


I was delighted to be asked by Axel Jantsch to speak recently at the opening of the new Christian Doppler Laboratory (CDL) for Embedded Machine Learning. This is a new collaborative laboratory between TU Wien (led by Jantsch) and TU Graz (led by Bischof), as well as several strong industrial partners from Germany and Austria. Its scope and content are very closely aligned to those of the EPSRC Centre for Spatial Computational Learning, which I launched last November; the latter brings together Imperial and Southampton in the UK with Toronto in Canada, UCLA in the USA, and – just announced – Sydney in Australia. I was therefore very glad to bring fraternal greetings from our centre, and to begin discussions over how we could work together in the future to build a truly global collaborative research effort in this space.

In addition to hearing about the work plan and objectives for the new CDL, the meeting heard from three external academics (Christoph Lampert, Bernt Schiele and myself) on their recent relevant research. I spoke about the work I recently published in my article in the Philosophical Transactions of the Royal Society. I found both Christoph and Bernt’s talks inspiring – I briefly summarise the key points of interest to me, below.

Christoph Lampert from IST spoke on Embedded and Adaptive Models for Visual Scene Understanding. He explained that while others are working on more efficient hardware and more efficient ML models, his group is focusing on matching models to problems and making models adaptive to the environment in which they’re applied. With this in mind, he focused on two recent papers, one from WACV 2020 and one from ICCV 2019. The WACV paper is on object detection, proposing a compute-efficient method that avoids the twin pitfalls of proposal-based systems like Faster R-CNN and grid-based methods like YOLO. The ICCV paper is on multi-exit architectures, where latency constraints can cause early termination of deep neural network inference and we still want a useful result in these circumstances. The paper discusses how to train such networks using ideas from Hinton et al.’s “Knowledge Distillation”. This also reminded me of some work from our research group [1,2] featuring early exits or a focus on getting good results quickly in latency-constrained systems.

Bernt Schiele from MPI and Saarland University spoke on Bright and Dark Sides of Computer Vision and Machine Learning. He spoke about several very interesting topics, including scene context (e.g. ‘Not Using the Car to See the Sidewalk’, CVPR 2019), Disentangling Adversarial Robustness and Generalization (CVPR 2019), and reverse engineering and stealing deep models (e.g. ‘Knock-off Nets’, CVPR 2019). All are very interesting and timely topics, but the work on disentangling adversarial robustness and generalisation was particularly interesting to me, since I’ve given the topic of generalisation some thought in the context of efficient DNN accelerator hardware. Schiele argued for a stronger distinction in the research community between different classes of adversarial examples. He focused on the idea of “on-manifold adversarial examples”, where an adversarial example must actually be a correct instance of the class, in contrast to an arbitrary perturbation of an image, which is the notion commonly used in the literature and which Schiele referred to as a ‘regular’ adversarial example. His talk showed how on-manifold examples can be studied. The main take-home messages were that ‘regular’ robustness and generalisation are not contradictory, that ‘on-manifold’ adversarial examples can be found, and that robustness to on-manifold examples is essentially a form of generalisation.

Highlights from FPGA 2020

Reblogged from John’s Blog:

Here are a few personal highlights from the FPGA 2020 conference, which took place this week in Seaside, California. (Main photo credit: George Constantinides.)

Jakub Szefer’s invited talk on “Thermal and Voltage Side Channels and Attacks in Cloud FPGAs” described a rather nifty side-channel through which secrets could be leaked in the context of cloud-based FPGAs. The context is that one user of an FPGA would like to transmit information secretly to the next user of the same FPGA. Jakub suggests transmitting this information via the physical temperature of the FPGA. His proposal is that if the first user would like to transmit a ‘1’, they heat up the FPGA by running a circuit that uses a lot of current, such as a ring oscillator; if they would like to transmit a ‘0’, they don’t. The second user, once they arrive, measures the temperature of…


When to Schedule?

On Tuesday, Jianyi Cheng will present our recent work Combining Dynamic and Static Scheduling in High-level Synthesis at the ACM International Symposium on FPGAs in Monterey. This is joint work between Jianyi and his two supervisors, John Wickerson and myself, as well as our collaborators from EPFL, Lana Josipović and her PhD supervisor Paolo Ienne.

As I’ve described in previous blog posts [1,2], Lana has been doing interesting work over the last few years, developing a tool flow for dynamically-scheduled high-level synthesis (HLS). Typically in modern HLS tools like VivadoHLS or LegUp, scheduling decisions are made statically – at compile time. However, Lana’s tool flow makes these decisions dynamically, at run time, using handshaking circuitry, reminiscent of Page and Luk’s work compiling occam into FPGAs.

In our paper, we have teamed up with EPFL to build a flow that offers the best of both worlds. Static scheduling can be very efficient, sharing resources and leading to low-area designs. Dynamic scheduling can be very fast, working around actual rather than potential data dependencies. Jianyi’s paper allows the definition of statically scheduled functions within a dynamically scheduled program. He shows that over a range of benchmarks, the results are about half the area of the fully dynamically-scheduled designs while being about 1.7x faster than the fully statically-scheduled designs.


Arithmetic for Neural Networks

Last month, the Royal Society Phil Trans A published my paper Rethinking Arithmetic for Deep Neural Networks, part of a special issue put together by Jack Dongarra, Laura Grigori and Nick Higham. In this blog post, I aim to briefly introduce the ideas in the paper. The paper is open access, so please read it for further detail. In addition, my slides from a talk given on an early version of this work are available from Nick Higham’s blog, and an mp3 recording of me talking to these slides has been made available by the Royal Society here.

The central theme of the paper is that hardware accelerator circuits for neural networks can actually be treated as neural networks. Consider the two graphs below. One of them represents a simple deep neural network where each node performs an inner product and a ReLU operation. The other represents the result of (i) deciding to use 4-bit fixed-point arithmetic, and then (ii) synthesising the resulting network into a circuit, technology-mapped to 2-input Boolean gates.

Although there are obvious differences (in structure and in number of nodes), there is a core commonality: a computation described as a graph operating on parameterisable functional nodes.

So what can we gain from this perspective?

1. Binarized Neural Networks are universal. The paper proves that any network you want to compute can be computed using binarized neural networks with zero loss in accuracy. It’s simply not the case that some problems need high precision. But, as a corollary, it is necessary not to be tied too closely to the original network topology if you want to be guaranteed not to lose accuracy. (A minimal sketch of a binarized node appears after this list.)

2. Boolean topologies are tricky things. So if we want to rethink topologies, what first principles should we use to do so? In the paper, I suggest looking to inspiration from the theory of metric spaces as one step towards producing networks that generalise well. Topology, node functionality, and input / output encoding interact in subtle, interesting, and under-explored ways.

3. This viewpoint pays practical dividends. My PhD student Erwei Wang, our collaborators James Davis and Peter Cheung, and I have developed a neural network flow called LUTNet, which uses Boolean lookup tables as the main computational node in a neural network, leading to very efficient FPGA implementations.
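To make point 1 concrete, here is a minimal sketch of a single binarized node in Haskell – my illustration, not code from the paper. I assume the usual \pm 1 encoding as Bool, so that multiplication becomes XNOR and the activation is the sign of the sum.

```haskell
-- A binarized neuron: weights and activations live in {-1, +1},
-- encoded as Bool (True = +1, False = -1).
-- Multiplication in {-1, +1} is then XNOR, i.e. (==) on Bool.
binNeuron :: [Bool] -> [Bool] -> Bool
binNeuron weights inputs = 2 * popcount > length weights
  where
    products = zipWith (==) weights inputs -- elementwise XNOR
    popcount = length (filter id products) -- number of +1 products
    -- The sum over {-1, +1} is 2 * popcount - n, so its sign is
    -- positive exactly when 2 * popcount > n (a tie maps to -1 here).
```

A node along these lines is exactly the kind of small Boolean function that maps naturally onto FPGA lookup tables, which is the direction LUTNet takes.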

I’m hugely excited by where this work could go, as well as the breadth of the fields it draws on for inspiration. Please do get in touch if you would like to collaborate on any of the open questions in this paper, or any other topic inspired by this work.


Book Review: Category Theory for the Sciences

I have been reading Spivak’s book ‘Category Theory for the Sciences’ on-and-off for the last 18 months during vacations as some of my ‘holiday reading.’ I’ve wanted to learn something about category theory for a while, mainly because I feel it’s something that looks fun and different enough from my day-to-day activity not to feel like work. I found Spivak’s book on one of my rather-too-regular bookshop trips a couple of years ago, and since it seemed like a more applied introduction, I thought I’d give it a try. This blog post combines my views on the strengths of this book, from the perspective of a mathematically-minded engineer, with a set of notes on the topic which might be useful for me – and maybe other readers – to refer back to in the future.

Category Theory by the Pool, Summer 2019

The book has a well-defined structure. The first few chapters cover topics that should be familiar to applied mathematicians and engineers: sets, monoids, graphs, orders, etc., but often presented in an unusual way. This sets us up for the middle section of the book which presents basic category theory, showing how the structures previously discussed can be viewed as special cases within a more general framework. The final chapters of the book discuss some topics like monads that can then be explored using the machinery of this framework.

Set

Chapter 3 covers the category \mathbf{Set} of sets.

One of the definitions here that’s less common in engineering is that of coproduct. The coproduct of sets X and Y, denoted X \sqcup Y, is defined as the disjoint union of X and Y. I first came across disjoint union as a concept when doing some earlier holiday reading on topology. Here we also get our first experience of the author’s use of informal ‘slogans’ to capture the ideas present in more formal definitions and theorems: “any time behavior is determined by cases, there is a coproduct involved.” I really like this approach – it’s the kind of marginalia I usually write in books myself, but pre-digested by the author!
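To connect this with something familiar from programming – my gloss, not the book’s – the coproduct of two types in Haskell is Either, and the slogan is visible every time we pattern-match on it:

```haskell
-- Either x y is the coproduct (disjoint union) of the types x and y.
-- "Behaviour determined by cases" is exactly case analysis on it.
describe :: Either Int String -> String
describe (Left n)  = "a number: " ++ show n
describe (Right s) = "a string: " ++ s
```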

The notation \text{Hom}_{\mathbf{Set}}(X,Y) is used for the set of functions from X to Y, which we are encouraged to refer to as morphisms, all without really explaining why at this stage – it of course all becomes clear later in the book.

To give a flavour of the kind of exercises provided, we are asked to prove an isomorphism \text{Hom}_{\text{\bf Set}}(X \sqcup Y, A) \xrightarrow{\cong} \text{Hom}_{\text{\bf Set}}(X, A) \times \text{Hom}_{\text{\bf Set}}(Y, A). This reminds me of digital circuit construction: the former corresponds to a time-multiplexed circuit capable of producing outputs from A with inputs from either X or Y, together with an identifier for which (to feed a multiplexer select line); the latter corresponds to two circuits producing both the outputs rather than just the selected one.
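This isomorphism also has a direct Haskell rendering – a sketch with my names, using either from the Prelude:

```haskell
-- Hom(X ⊔ Y, A) ≅ Hom(X, A) × Hom(Y, A):
-- a function out of a disjoint union is exactly a pair of functions.
toPair :: (Either x y -> a) -> (x -> a, y -> a)
toPair h = (h . Left, h . Right)

fromPair :: (x -> a, y -> a) -> (Either x y -> a)
fromPair (f, g) = either f g
```

The two maps are mutually inverse, witnessing the isomorphism.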

The fibre product, X \times_Z Y, given functions f : X \to Z and g : Y \to Z is defined as the set X \times_Z Y = \{ (x, z, y) | f(x) = z = g(y)\}. Any set isomorphic to this fibre product is called a pullback of the diagram X \xrightarrow{f} Z \xleftarrow{g} Y.
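For finite sets represented as lists, this definition can be transcribed almost verbatim – a minimal sketch, with my encoding:

```haskell
-- The fibre product X ×_Z Y: triples (x, z, y) with f x = z = g y.
fibreProduct :: Eq z => [x] -> [y] -> (x -> z) -> (y -> z) -> [(x, z, y)]
fibreProduct xs ys f g = [ (x, f x, y) | x <- xs, y <- ys, f x == g y ]
```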

The fibre sum, X \sqcup_W Y, given functions f : W \to X and g : W \to Y, is the quotient of X \sqcup W \sqcup Y by the equivalence relation \sim generated by w \sim f(w) and w \sim g(w) for all w \in W. A pushout of X and Y over W is any set Z which is isomorphic to X \sqcup_W Y.

A good intuitive description in the text is to think of the fibre sum of X and Y as gluing the two sets together, with W as the glue. A nice example from one of the exercises is W = \mathbb{N}, X = \mathbb{Z}, Y a singleton set, f : W \to X defined by w \mapsto -(w+1). Then we’re left with X \sqcup_W Y \cong \{ -, 0, 1, \ldots \}, where - is an element corresponding to the collapsing of all negative elements in X to a single point.
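An isomorphic copy of this particular pushout is easy to write down concretely – my encoding, just to make the gluing tangible – as one fresh point for the collapsed negatives together with the naturals:

```haskell
import Numeric.Natural (Natural)

-- An isomorphic copy of X ⊔_W Y from the example:
-- all negative integers are glued to a single point.
data Glued = NegPoint | NonNeg Natural deriving (Eq, Show)

-- The canonical map from X = Z into the pushout.
project :: Integer -> Glued
project n
  | n < 0     = NegPoint
  | otherwise = NonNeg (fromInteger n)
```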

The familiar notions of injective and surjective functions are recast as monomorphisms and epimorphisms, respectively. A function f : X \to Y is a monomorphism if for all sets A and pairs of functions g, g' : A \to X, if fg = fg' then g = g'. A function f : X \to Y is an epimorphism if for all sets B and pairs of functions h, h' : Y \to B, if hf = h'f then h = h'. By this time in the book, we get the hint that we’re learning this new language as it will generalise later beyond functions from sets to sets – the chapter ends with various slightly more structured concepts, like set-indexed functions, where the idea is always to define the object and the type of morphism that might make sense for that object.

Monoids, Groups, Graphs, Orders & Databases

In Chapter 4, the reader is introduced to several other structures. With the exception of the databases part, this was familiar material to me, but with slightly unfamiliar presentation.

We learn the classical definition of a monoid and group, and also discuss the free monoid generated by a set X as being the monoid consisting of the set \mbox{List}(X) of lists of elements of X, with the empty list as the identity and list concatenation as the operation. We discuss the idea of a monoid action on a set S, i.e. a function \circlearrowleft: M \times S \to S associated with the monoid M having the properties e \circlearrowleft s = s for any s \in S, where e is the identity, and m \circlearrowleft (n \circlearrowleft s) = (m \star n) \circlearrowleft s, where m and n are any elements of M, s is any element of S, and \star is the monoid operation. The discussion of monoids is nice and detailed, and I enjoyed the exercises, especially those leading up to one of the author’s slogans: “A finite state machine is an action of a free monoid on a finite set.”
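The slogan has a neat Haskell transcription – a sketch with my names. The free monoid on x is the list type [x], and an action on a state set is generated by folding a one-step transition function:

```haskell
-- An action of the free monoid [x] on states s, generated by one step.
act :: (s -> x -> s) -> s -> [x] -> s
act = foldl
-- By construction:
--   act step s []       == s                           (identity acts trivially)
--   act step s (m ++ n) == act step (act step s m) n   (compatibility)

-- A tiny finite state machine: have we seen an even number of 'a's?
evenAs :: Bool -> Char -> Bool
evenAs parity c = if c == 'a' then not parity else parity
-- act evenAs True "banana" == False   (three 'a's)
```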

Graphs are dealt with in a slightly unusual way (for me) as quadruples G := (V, A, \mbox{src}, \mbox{tgt}) of a set of vertices V, a set of arrows A, and functions \mbox{src}, \mbox{tgt} mapping each arrow to its source and target vertex respectively. This corresponds to what I would call a multigraph, as parallel edges can exist, though it is called a graph in the book.
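The definition transcribes directly – a sketch, with my names. Keeping the arrows as their own set is precisely what permits parallel edges:

```haskell
-- A graph in the book's sense: arrows form their own set, so parallel
-- arrows between the same pair of vertices are allowed.
data Graph v a = Graph
  { vertices :: [v]
  , arrows   :: [a]
  , src      :: a -> v
  , tgt      :: a -> v
  }
```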

For each of the structures discussed, we finish by looking at their morphisms.

As an example, let {\mathcal M} := (M, e, \star) and {\mathcal M'} := (M', e', \star') be monoids (the tuple is of the set, identity element, and operation). Then a monoid homomorphism f: {\mathcal M} \to {\mathcal M'} is a function f: M \to M' preserving identity and operation, i.e. (i) f(e) = e' and (ii) f(m_1 \star m_2) = f(m_1) \star' f(m_2) for all m_1, m_2 \in M.

Similarly, for graphs, graph homomorphisms f: G \to G' for G := (V, A, \mbox{src}, \mbox{tgt}) and G' := (V', A', \mbox{src}', \mbox{tgt}') are defined as pairs of functions f_0: V \to V' and f_1: A \to A' preserving the endpoints of arrows, i.e. such that f_0 \circ \mbox{src} = \mbox{src}' \circ f_1 and f_0 \circ \mbox{tgt} = \mbox{tgt}' \circ f_1 (the commuting square conditions).

Order morphisms are similarly defined, this time as those preserving order.

Thus we are introduced to several different (somewhat) familiar mathematical structures, and in each case we’re learning about how to describe morphisms on these structures that somehow preserve the structure.

Categories, Functors, and Natural Transformations

Things really heat up in Chapter 5, where we actually begin to cover the meat of category theory. This is exciting and kept me hooked by the pool. The unusual presentation of certain topics in earlier chapters (e.g. the graphs) finally made sense in this chapter.

A category \mathcal{C} is defined as a collection of various things: a set of objects \text{Ob}(\mathcal{C}), a set of morphisms \text{Hom}_{\mathcal{C}}(x,y) for every pair of objects x and y, a specified identity morphism \text{id}_x for every object x, and compositions \circ: \text{Hom}_{\mathcal{C}}(y,z) \times \text{Hom}_{\mathcal{C}}(x,y) \to \text{Hom}_{\mathcal{C}}(x,z). To qualify as a category, such things must satisfy identity laws and also associativity of composition.
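Haskell’s base library captures a typed version of this definition – the class below is essentially Control.Category, reproduced as a sketch. The identity and associativity laws are obligations on instances, not checked by the compiler:

```haskell
import Prelude hiding (id, (.))

-- Objects are types; Hom(a, b) is the type `cat a b`.
class Category cat where
  id  :: cat a a                        -- identity morphism for each object
  (.) :: cat b c -> cat a b -> cat a c  -- composition

-- Ordinary functions form a category.
instance Category (->) where
  id x  = x
  g . f = \x -> g (f x)
```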

Examples of the category of Monoids, of Groups, of Sets, of Graphs, of Pre-Orders, etc., are explored. Isomorphisms between objects in a category are carefully defined.

Functors are introduced as transformations from one category to another, via a function transforming objects and a function transforming morphisms, such that identities and composition are preserved. Functors are then used to rigorously explore the connections between preorders, graphs, and categories, and I finally understood why the author had focused so much on preorders in the previous chapters!
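Haskell’s Functor class is the special case of functors from the category of types to itself – my gloss again. The type constructor is the action on objects; fmap is the action on morphisms:

```haskell
data Pair a = Pair a a

-- Pair is the action on objects; fmap is the action on morphisms.
instance Functor Pair where
  fmap f (Pair x y) = Pair (f x) (f y)
-- The preservation conditions are the functor laws:
--   fmap id == id
--   fmap (g . f) == fmap g . fmap f
```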

Things started to take a surprising and exciting turn for me when the author shows, via the use of functors, that a monoid is a category with one object (the set of the monoid becomes the morphisms in the category), and that groups can therefore be understood as categories with one object for which each morphism is an isomorphism. This leads to a playful discussion, out of which pop groupoids, amongst other things. Similar gymnastics lead to preorders reconstituted as categories in which hom-sets have at most one element, and graphs reconstituted as functors from a certain two-object category to \mathbf{Set}. Propositional logic is re-examined from a categorical perspective, though I suspect this is just scratching the surface of the interplay between category theory and logic, and I’d love to read more about this. For me, this was a highlight – that by playing with these definitions and trying things out, all these wonderful familiar structures arise.

The final part of this chapter deals with natural transformations. Just as functors transform categories to categories, natural transformations transform functors to functors. Specifically, a natural transformation between functors F : {\mathcal C} \to {\mathcal D} and G : {\mathcal C} \to {\mathcal D} is defined as a morphism (in {\mathcal D}) \alpha_X for each object X in {\mathcal C}, such that the diagram below commutes for every morphism f : X \to Y in {\mathcal C}.

[Diagram: the naturality square – \alpha_Y \circ F(f) = G(f) \circ \alpha_X for each f : X \to Y.]

I think this section on natural transformations is where the huge number of exercises in the book really came into its own; I doubt very much I would’ve grasped some of the subtleties of natural transformations without them.

We learn about two kinds of composition of natural transformations. The first (vertical composition) combines two natural transformations between functors F,G,H : {\mathcal C} \to {\mathcal D}, say \alpha: F \to G and \beta: G \to H, to obtain a natural transformation \beta \circ \alpha: F \to H. The other (horizontal composition) combines a natural transformation \gamma_1 between functors F_1, F_2 : {\mathcal C} \to {\mathcal D} with one \gamma_2 between functors G_1, G_2 : {\mathcal D} \to {\mathcal E}, to produce \gamma_2 \Diamond \gamma_1: G_1 \circ F_1 \to G_2 \circ F_2.
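In Haskell – my rendering – a natural transformation between Functors is a polymorphic function, with naturality coming for free from parametricity; vertical composition is then just composition of components:

```haskell
{-# LANGUAGE RankNTypes #-}

-- One component per type; naturality (fmap k . alpha == alpha . fmap k)
-- holds automatically by parametricity.
type Nat f g = forall a. f a -> g a

safeHead :: Nat [] Maybe
safeHead []      = Nothing
safeHead (x : _) = Just x

-- Vertical composition of natural transformations:
vert :: Nat g h -> Nat f g -> Nat f h
vert beta alpha = beta . alpha
```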

An equivalence {\mathcal C}' \simeq {\mathcal C} of categories – a notion weaker than isomorphism – is introduced: a functor F : {\mathcal C} \to {\mathcal C}' such that there exists a functor F' : {\mathcal C}' \to {\mathcal C} and natural isomorphisms \text{id}_{\mathcal C} \xrightarrow{\cong} F' \circ F and \text{id}_{{\mathcal C}'} \xrightarrow{\cong} F \circ F'. This seemed odd to me – the lack of insistence on isomorphism for equivalence – but the examples all make sense. More importantly, it is shown that this requirement is sufficient for the action of F on each hom-set to be bijective for every fixed pair of objects in {\mathcal C} (i.e. F is full and faithful), which feels right. I think my understanding could benefit from further interesting examples of usefully equivalent but non-isomorphic categories though – suggestions welcome!

Limits and Colimits

Chapter 6 is primarily given over to the definitions and discussion of limits and colimits, generalising the ideas in \mathbf{Set} to other categories. The chapter begins by looking again at product and coproduct. As an example, for a category {\mathcal C}, with objects X, Y \in \text{Ob}({\mathcal C}), a span is defined as a triple (Z, p, q) where Z is an object in {\mathcal C} and p : Z \to X and q : Z \to Y are morphisms in {\mathcal C}. A product X \times Y is then defined as a span on X and Y such that for any other span (Z, p, q) on X and Y, there exists a unique morphism t_{p,q} : Z \to X \times Y such that the following diagram commutes. Again, I rather like the author’s slogan for this: “none shall map to X and Y except through me!”

[Diagram: the universal property of the product – every span (Z, p, q) on X and Y factors through X \times Y via the unique t_{p,q}.]

The chapter then further generalises these ideas to those of limit and colimit. This section was a step up in abstraction and also one of the least “applied” parts of the book. It has certainly piqued my interest – what are limits and colimits in various structures I might use / encounter in my work? – but I found the examples here fairly limited and abstract compared to the copious ones in previous chapters. It’s in this chapter that universal properties begin to rear their head in a major way, and again I’m still not totally intuitively comfortable with this idea – I can crank the handle and answer the exercises, but I don’t feel like I’m yet at home with the idea of universal properties.

Categories at Work

In the final section of the book, we first study adjunctions. Given categories {\mathcal B} and {\mathcal A}, an adjunction is defined as a pair of functors L: {\mathcal B} \to {\mathcal A} and R: {\mathcal A} \to {\mathcal B} together with a natural isomorphism \alpha whose component for A \in \text{Ob}({\mathcal A}), B \in \text{Ob}({\mathcal B}) is \alpha_{B,A}: \text{Hom}_{\mathcal A}(L(B), A) \xrightarrow{\cong} \text{Hom}_{\mathcal B}(B, R(A)). Several preceding results and examples can then be re-expressed in this framework, and understood as examples of adjunctions. Adjunctions are also revisited later in the chapter to show a close link with monads.
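A familiar instance – the standard example, in my rendering – is that for a fixed type b, the functor L(a) = (a, b) is left adjoint to R(c) = b \to c, with the natural isomorphism \alpha given by currying:

```haskell
-- Hom((a, b), c) ≅ Hom(a, b -> c): the components of α and its inverse
-- are curry and uncurry from the Prelude.
alpha :: ((a, b) -> c) -> (a -> (b -> c))
alpha = curry

alphaInv :: (a -> (b -> c)) -> ((a, b) -> c)
alphaInv = uncurry
```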

Categories of functors are discussed, leading up to a discussion of the Yoneda lemma. This is really interesting stuff, as it seems to cross huge boundaries of abstraction. Consider a category {\mathcal C}. Spivak first defines a functor Y_c := \text{Hom}_{\mathcal{C}}(c, -) : {\mathcal C} \to \mathbf{Set} sending each object d to the set \text{Hom}_{\mathcal C}(c,d), and similarly for morphisms. This is then used in the following theorem:

Let {\mathcal C} be a category, c \in \text{Ob}(\mathcal{C}) be an object, and I: {\mathcal C} \to \mathbf{Set} be a set-valued functor. There is a natural bijection \text{Hom}_{\mathcal{C}-\mathbf{Set}}(Y_c, I) \xrightarrow{\cong} I(c).

This looks super weird: the right-hand side is just a plain old set and yet the left-hand side seems to be some tower of abstraction. I worked through a couple of examples I dreamt up, namely \mathcal{C} = \mathbf{1} (the category with one object) with I mapping that object to an arbitrary set, and \mathcal{C} = \mathbf{Set} with I = \text{id}_{\mathbf{Set}}, and convinced myself this works just fine for those two examples. Unfortunately, the general proof is not covered in the book – readers are referred to Mac Lane – and while some discussion is provided regarding links to databases, I felt I needed a proof to properly grasp what’s going on here under the hood.
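For what it’s worth, the analogous statement for Haskell Functors is short enough to write out – the names below are mine. Polymorphic functions out of the hom-functor (a ->) correspond exactly to values of f a:

```haskell
{-# LANGUAGE RankNTypes #-}

-- Yoneda: natural transformations Hom(a, -) -> f correspond to values of f a.
toYoneda :: Functor f => f a -> (forall x. (a -> x) -> f x)
toYoneda fa = \k -> fmap k fa

fromYoneda :: (forall x. (a -> x) -> f x) -> f a
fromYoneda t = t id   -- probe the transformation with the identity on a
-- The two maps are mutually inverse, by the functor laws and parametricity.
```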

Monads are then covered. I’ve briefly seen monads in a very practical context of functional programming, specifically for dealing with I/O in Haskell, but I’ve never looked at them from a theoretical perspective before.

A monad on \mathbf{Set} (monads are only defined in the book on \mathbf{Set}, for reasons not totally apparent to me) is defined as a triple (T, \eta, \mu) consisting of a functor T: \mathbf{Set} \to \mathbf{Set}, a natural transformation \eta: \text{id}_{\mathbf{Set}} \to T, and a natural transformation \mu: T \circ T \to T such that the following diagrams commute:

[Diagrams: the monad unit laws \mu \circ T\eta = \text{id}_T = \mu \circ \eta T, and the associativity law \mu \circ T\mu = \mu \circ \mu T.]

I found this discussion enlightening and relatively straightforward. Firstly, exception monads are discussed, like Maybe in Haskell, along with a nice example of wrapping sets and their functions into power sets and their functions, for which I again enjoyed the exercises. Kleisli categories of monads are defined, and it then becomes apparent why this is such a powerful formalism for thinking about concepts like program state and I/O.
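The (T, \eta, \mu) presentation maps directly onto Haskell, where \eta is return and \mu is join. Haskell’s actual Monad class is built on >>= instead, with join m = m >>= id; the class below is my sketch of the book’s presentation:

```haskell
-- A monad as (T, η, μ): T is the type constructor m,
-- η is `unit`, and μ is `join`.
class Functor m => MonadJoin m where
  unit :: a -> m a         -- η : id -> T
  join :: m (m a) -> m a   -- μ : T ∘ T -> T

-- The exception monad Maybe:
instance MonadJoin Maybe where
  unit           = Just
  join (Just mx) = mx
  join Nothing   = Nothing
```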

The Kleisli category \mathbf{Kls}(\mathcal{T}) of a monad \mathcal{T} = (T, \eta, \mu) on \mathbf{Set} is defined as a category with objects \text{Ob}(\mathbf{Kls}(\mathcal{T})) = \text{Ob}(\mathbf{Set}) and morphisms \text{Hom}_{\mathbf{Kls}(\mathcal{T})}(X,Y) = \text{Hom}_{\mathbf{Set}}(X,T(Y)). The identity \text{id}_X: X \to X in \mathbf{Kls}(\mathcal{T}) is exactly \eta_X in \mathbf{Set}, and composition of morphisms f: X \to Y and g: Y \to Z in \mathbf{Kls}(\mathcal{T}) is given by \mu_Z \circ T(g) \circ f in \mathbf{Set}. You can intuitively see the ‘baggage’ of T being carried through in \mathbf{Set} yet hidden from view in \mathbf{Kls}.
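In Haskell, this composite \mu_Z \circ T(g) \circ f is Kleisli composition, provided by the standard library as (>=>) in Control.Monad; in terms of the MonadJoin sketch above (with my own operator name, to avoid a clash):

```haskell
-- Kleisli composition: morphisms X -> T(Y) compose via μ ∘ T(g) ∘ f.
(>=>!) :: MonadJoin m => (x -> m y) -> (y -> m z) -> (x -> m z)
f >=>! g = join . fmap g . f

-- Two partial functions chained in Kls(Maybe):
recipSqrt :: Double -> Maybe Double
recipSqrt = recip' >=>! sqrt'
  where recip' v = if v == 0 then Nothing else Just (1 / v)
        sqrt' v  = if v < 0  then Nothing else Just (sqrt v)
```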

Finally, the book ends with a (very) brief discussion of operads. I’ve not come across operads before, and from this brief discussion I’m not really sure what they ‘buy’ me beyond monoidal categories.

General

The exercises are copious and great, especially in the earlier chapters. They kept me engaged throughout and helped me to really understand what I was reading, rather than just think I had. In the later exercises there are many special cases that turn out to be more familiar things. This is both enlightening (“oh, an X is just a Y!”) and frustrating (“yeah, but I want to see Xs that aren’t Ys, otherwise what’s all this heavyweight general machinery for?”). I guess this comes from trying to cover a lot of ground in one textbook.

I am struck by just how much mental gymnastics I feel I had to go through to see mathematical structures from the “outside in” as well as from the “inside out”, and how category theory helped me do that. This is a pleasing feeling.

The slogans are great and often amusing, e.g. when describing a functor from the category of Monoids to the category of Sets, “you can use a smartphone as a paperweight.”

The book ends rather abruptly after the brief discussion of operads, with no conclusion. I felt it would be nice to have a summary discussion, and perhaps point out the current directions the field of applied category theory is taking.

Overall, this book has been an enlightening read. It has helped me to see the commonality behind several disparate mathematical structures, and has encouraged me to look for such commonality and relationships in my own future work. Recommended!

Machine Learning at FPT 2019

Next week, the IEEE International Conference on Field-Programmable Technology (FPT) will take place in Tianjin in China. I’m proud that my former PhD student Qiang Liu will be General Chair of the conference.

I am a coauthor of two papers to be presented at FPT, one led by my former BEng student Aaron Zhao, now a PhD student at Cambridge supervised by my colleague Rob Mullins, and one led by my former postdoc, Ameer Abdelhadi, now with COHESA / UofT. The paper by Aaron is also in collaboration with two of my former PhD students, Xitong Gao, now with the Chinese Academy of Sciences, and Junyi Liu, now with Microsoft Research.

The first paper, led by Aaron, is entitled ‘Automatic Generation of Multi-precision Multi-arithmetic CNN Accelerators for FPGAs’, and can be found on arXiv here. This paper is a serious look at getting an automated CNN flow for FPGAs that makes good use of some of the arithmetic flexibility available on these devices. Powers-of-two (“free” multiplication) and fixed-point (“cheap” multiplication) are both leveraged.

The second paper, led by Ameer, looks at the computation of a set of approximate nearest neighbours. This is useful in a number of machine learning settings, both directly as a non-neural deep learning inference algorithm and indirectly within sophisticated deep learning algorithms like Neural Turing Machines. Ameer has shown that this task can be successfully accelerated in an FPGA design, and explores some interesting ways to parameterise the algorithm to make the most of the hardware, leading to tradeoffs between algorithm accuracy and performance.

If you’re at FPT this year, please go and say hello to Aaron, Ameer and Qiang.