A-Levels and GCSEs in 2020

This week was A-Level results day. It was also the day that Ofqual published its long-awaited standardisation algorithm. Full details can be found in the 319-page report. In this blog post, I’ve set down my initial thoughts after reading the report.

Prelude

I would like to begin by saying that Ofqual was not given an easy task: produce a system to devise A-level and GCSE grades without exams or coursework. Reading the report, it is clear that they worked hard to do the best they could within the confines in which they operate, and I respect that work. Nevertheless, I have several concerns to share.

Concerns

1. Accounting for Prior Attainment

The model corrects for differences between historical prior attainment and prior attainment of the 2020 cohort in the following way (first taking into account any learners without prior attainment measures). For any particular grade, the proportion to be awarded is equal to the historical proportion at that grade adjusted by a factor referred to in the report as q_{kj} - p_{kj}. (See p.92-93 of the report, which incidentally has a typo here: c_k should read c_{kj}.) As noted by the Fischer Family Trust, it appears that this factor is based solely on national differences in value added, and this could cause a problem. Illustrating this requires an artificial example. Imagine that Centre A has a historical transition matrix looking like this: all of its 200 students have walked away with A*s in this subject in recent years, whether they were in the first or second GCSE decile (and half were in each). Well done Centre A!

GCSE Decile    A*     A
1              100    0
2              100    0

Meanwhile, let’s say the national transition matrix looks more like this:

GCSE Decile    A*     A
1              90%    10%
2              10%    90%

Let’s now look at 2020 outcomes. Assume that this year, Centre A has an unusual cohort: all students were second decile in prior attainment. It seems natural to expect that it would still get mainly A*s, consistent with its prior performance, but this is not the outcome of the model. Instead, its historical distribution of 100% A*s is adjusted downwards because of the national transition matrix. The proportion of A*s at Centre A will be reduced by 40 percentage points – now only 60% of its students will get A*s! This happens because the national transition matrix expects a 50/50 split of Decile 1 and Decile 2 students to end up with 50% A*s and a Decile 2-only cohort to end up with 10% A*s, resulting in a downgrade of 40 percentage points.
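The arithmetic of this artificial example can be reproduced in a few lines. This is only a sketch of the adjustment as I have described it, not Ofqual's implementation: the function name `predicted_share` and the dictionaries are my own illustrative choices, and the numbers come from the two tables above.

```python
# Sketch of the artificial Centre A example; not Ofqual's code.
# National transition matrix: share of A*s by GCSE decile (from the example table).
national_a_star = {1: 0.9, 2: 0.1}

def predicted_share(decile_mix, transition):
    """Share of A*s the national matrix predicts for a cohort with this decile mix."""
    return sum(share * transition[decile] for decile, share in decile_mix.items())

historical_mix = {1: 0.5, 2: 0.5}   # Centre A's past cohorts: half in each decile
mix_2020 = {1: 0.0, 2: 1.0}         # Centre A's 2020 cohort: all second decile

p = predicted_share(historical_mix, national_a_star)   # national prediction, historical cohort
q = predicted_share(mix_2020, national_a_star)         # national prediction, 2020 cohort
historical_a_star = 1.0                                # Centre A's historical A* share

adjusted = historical_a_star + (q - p)
print(f"p = {p:.2f}, q = {q:.2f}, adjusted A* share = {adjusted:.2f}")
```

The adjustment q - p depends only on the national matrix and the change in decile mix, which is why Centre A's own strong record with second-decile students plays no part in it.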

2. Model accuracy

Amongst the various possible standardisation options, Ofqual evaluated accuracy by trying to predict 2019 exam grades and seeing how well the predictions matched the grades actually awarded. This immediately presents a problem: no rank orders were submitted for 2019 students, so how is this possible? The answer provided is “the actual rank order within the centre based on the marks achieved in 2019 were used as a replacement”, i.e. they back-fitted 2019 marks to rank orders. This only provides a reasonable idea of accuracy if we assume that teacher-submitted rank orders in 2020 would exactly correspond to mark orders of their pupils, as noted by Guy Nason. Of course this will not be the case, so the accuracy estimates in the Ofqual report are likely to be significant overestimates. And they’re already not great, even under a perfect-ranking assumption: Ofqual report that only 12 out of 22 GCSE subjects were accurate to within one grade, with some subjects having only 40% accuracy in terms of predicting the attained grade – so one is left wondering what the accuracy might actually be for 2020 once rank-order uncertainty is taken into account.

There may also be a systematic variation in the accuracy of the model across different grades, but this is obscured by using the probability of successful classification across any grade as the primary measure of accuracy. Graphs presented in the Ofqual report suggest, for example, that the models are far less accurate at Grade 4 than at Grade 7 in GCSE English.

3. When is a large cohort a large cohort?

A large cohort, and therefore one for which teacher-assessed grades are not used at all, is defined in the algorithm to be one with at least 15 students. But how do we count these 15 students? The current cohort or the historic cohort, or something else? The answer is given in Ofqual’s report: the harmonic mean of the two. As an extreme example of this, centre cohorts can be considered “large” with only 8 pupils this year – so long as they had at least 120 in the recent past. It seems remarkable that a centre could have fewer pupils than GCSE grades and still be “large”!
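A quick check of the harmonic-mean arithmetic follows. It is a minimal sketch assuming the plain two-value harmonic mean and the threshold of 15 quoted above; the function and constant names are mine.

```python
def harmonic_mean(current, historical):
    """Harmonic mean of the current and historical cohort sizes."""
    return 2 * current * historical / (current + historical)

LARGE_COHORT_THRESHOLD = 15  # threshold quoted in the text above

for current, historical in [(8, 120), (8, 119), (15, 15), (7, 1000)]:
    n = harmonic_mean(current, historical)
    print(current, historical, round(n, 2), n >= LARGE_COHORT_THRESHOLD)
```

With 7 current pupils the harmonic mean stays below 14 however large the historical cohort, so 8 really is the smallest current cohort that can count as “large” under this definition.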

4. Imputed marks fill grade ranges

As the penultimate step in the Ofqual algorithm, “imputed marks” are calculated for each student – a kind of proxy mark equally spaced between grade end-points. So, for example, if Centre B only has one student heading for a Grade C at this stage then – by definition – it’s a mid-C. If they had two Grade C students, they’d be equally spaced across the “C spectrum”. This means that in the next step of the algorithm, cut-score setting, these students are vulnerable to changing grades. For centres which tend to fill the full grade range anyway, this may not be an issue. But I worry that we may see some big changes at the edges of centre distributions as a result of this quirk.

5. No uncertainty quantification

Underlying many of these concerns is, perhaps, a more fundamental one. Grades awarded this year come with different levels of uncertainty, depending on factors like how volatile attainment at the centre has been in the past, the size of the cohorts, known uncertainty in grading, etc. Yet none of this is visible in the awarded grade. In practice, this means that some Grade Cs are really “B/C”s while some are “A-E”, and we don’t know the difference. It is not beyond possibility to quantify the uncertainty – in fact I proposed awarding grade ranges in my original consultation response to Ofqual. This issue has been raised independently by the Royal Statistical Society and even for normal exam years, given the inherent unreliability of exam grades, by Dennis Sherwood. For small centres, rather than a statistically reasonable approach to widen the grade range, the impact of only awarding a single grade with unquantified uncertainty is that Ofqual have had to revert to teacher-assessed grades, leading to an unfair “mix and match” system where some centres have had their teacher-assessed grades awarded while some haven’t.

What Must Happen Now?

I think everyone can agree that centres need to immediately receive all the intermediate steps in the calculations of their grades. Many examinations officers are currently scratching their heads, after having received only a small part of this information. The basic principle must be that centres are able to recalculate their grades from first principles if they want to. This additional information should include the proportion of pupils in both historical and current cohorts with matched prior attainment data for each subject and which decile each student falls into, the national transition matrices used for each subject, the values of q_{kj} and p_{kj} for each subject / grade combination, the imputed marks for each 2020 student, and the national imputed mark cut-points for each grade boundary in each subject.

At a political level, serious consideration should now be given to awarding teacher-assessed grades (CAGs) this year. While I was initially supportive of a standardisation approach – and I support the principles of Ofqual’s “meso-standardisation” – I fear that problems with the current standardisation algorithm are damaging rather than preserving public perception of A-Level grades. We may have now reached the point that the disadvantages of sticking to the current system are worse than the disadvantages of simply accepting CAGs for A-Levels.

Ofqual states in their report that “A key motivation for the design of the approach to standardisation [was] as far as possible [to] ensure that a grade represents the same standard, irrespective of the school or college they attended.” Unfortunately, my view is that this has not been achieved by the Ofqual algorithm. However, despite my concerns over Ofqual’s algorithm, it is also questionable whether any methodology meeting this objective could be implemented in time under a competitive education system culture driven by high-stakes accountability systems. Something to think about for our post-COVID world.

Some Notes on Metric Spaces

This post contains some summary informal notes of key ideas from my reading of Mícheál Ó Searcóid’s Metric Spaces (Springer, 2007). These notes are here as a reference for me, my students, and any others who may be interested. They are by no means exhaustive, but rather cover topics that seemed interesting to me on first reading. By way of a brief book review, it’s worth noting that Ó Searcóid’s approach is excellent for learning a subject. He has a few useful tricks up his sleeve, in particular:

  • Chapters will often start with a theorem proving equivalence of various statements (e.g. Theorem 8.1.1, Criteria for Continuity at a Point). Only then will he choose one of these statements as a definition, and he explains this choice carefully, often via reference to other mathematics.
  • The usual definition-theorem-proof style is supplemented with ‘questions’ – these are relatively informally-stated questions and their answers. They have been carefully chosen to highlight some questions the reader might be wondering about at that point in the text and to demonstrate key (and sometimes surprising) answers before the formal theorem statement.
  • The writing is pleasant, even playful at times though never lacking formality. This is a neat trick to pull off.
  • There are plenty of exercises, and solutions are provided.

These features combine to produce an excellent learning experience.

1. Some Basic Definitions

A metric on a set X is a function d : X \times X \to {\mathbb R} such that:

  • Positivity: d(a,b) \geq 0 with equality iff a = b
  • Symmetry: d(a,b) = d(b,a)
  • Triangle inequality: d(a,b) \leq d(a,c) + d(c,b)

The combination of such a metric and the corresponding set is a metric space.
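To get a feel for the three axioms, here is a brute-force check on a small finite set of points. It is only an illustrative sketch: the helper `is_metric` and the candidate functions are my own, and passing the check on a finite sample is of course not a proof that a function is a metric on all of \mathbb{R}.

```python
from itertools import product

def is_metric(points, d, tol=1e-12):
    """Brute-force check of the three metric axioms on a finite set of points."""
    for a, b in product(points, repeat=2):
        if d(a, b) < 0 or (d(a, b) == 0) != (a == b):
            return False                      # positivity fails
        if abs(d(a, b) - d(b, a)) > tol:
            return False                      # symmetry fails
    for a, b, c in product(points, repeat=3):
        if d(a, b) > d(a, c) + d(c, b) + tol:
            return False                      # triangle inequality fails
    return True

points = [-2.0, 0.0, 0.5, 3.0]
usual = lambda a, b: abs(a - b)
squared = lambda a, b: (a - b) ** 2           # fails the triangle inequality
discrete = lambda a, b: 0.0 if a == b else 1.0

print(is_metric(points, usual), is_metric(points, squared), is_metric(points, discrete))
```

The squared difference fails because, for example, the "distance" from -2 to 3 is 25, which exceeds the sum 4 + 9 obtained by going via 0.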

Given a metric space (X,d), the point function at z is \delta_z : x \mapsto d(z,x).

A pointlike function u : X \to {\mathbb R}^\oplus is one where u(a) - u(b) \leq d(a,b) \leq u(a) + u(b) for all a, b \in X.

For metric spaces (X,d) and (Y,e), X is a metric subspace of Y iff X \subseteq Y and d is a restriction of e.

For metric spaces (X,d) and (Y,e), an isometry \phi : X \to Y is a function such that e(\phi(a),\phi(b)) = d(a,b). The metric subspace (\phi(X),e) is an isometric copy of (X,d).

Some standard constructions of metrics for product spaces:

  1. \mu_1 : (a,b) \mapsto \sum_{i=1}^n \tau_i(a_i,b_i)
  2. \mu_2 : (a,b) \mapsto \sqrt{\sum_{i=1}^n \left(\tau_i(a_i,b_i)\right)^2}
  3. \mu_\infty : (a,b) \mapsto \max\left\{\tau_i(a_i,b_i) | 1 \leq i \leq n\right\}

A conserving metric e on a product space is one where \mu_\infty(a,b) \leq e(a,b) \leq \mu_1(a,b). Ó Searcóid calls these conserving metrics because they conserve an isometric copy of the individual spaces, recoverable by projection (I don’t think this is a commonly used term). This can be seen because fixing elements of all-but-one of the constituent spaces makes the upper and lower bound coincide, resulting in recovery of the original metric.
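The sandwich \mu_\infty \leq \mu_2 \leq \mu_1, and the way the bounds coincide when all but one coordinate is fixed, can be seen numerically. This is a small sketch under my own assumptions: the product is three copies of \mathbb{R} with the usual metric, and the function names are illustrative.

```python
import math

def mu_1(a, b, taus):
    return sum(t(x, y) for t, x, y in zip(taus, a, b))

def mu_2(a, b, taus):
    return math.sqrt(sum(t(x, y) ** 2 for t, x, y in zip(taus, a, b)))

def mu_inf(a, b, taus):
    return max(t(x, y) for t, x, y in zip(taus, a, b))

# Assumption for illustration: three copies of R, each with the usual metric.
taus = [lambda x, y: abs(x - y)] * 3

a, b = (1.0, 2.0, 3.0), (4.0, 6.0, 3.0)
print(mu_inf(a, b, taus), mu_2(a, b, taus), mu_1(a, b, taus))

# Fix every coordinate except the second: all three metrics now agree
# with the metric of that single constituent space.
c, d = (1.0, 2.0, 3.0), (1.0, 9.0, 3.0)
print(mu_inf(c, d, taus), mu_2(c, d, taus), mu_1(c, d, taus))
```

In the second call the lower and upper bounds coincide, so any conserving metric is forced to take the same value, which is the recovery-by-projection property described above.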

A norm on a linear space V over {\mathbb R} or {\mathbb C} is a real function such that for x, y \in V and \alpha scalar:

  • ||x|| \geq 0 with equality iff x = 0
  • ||\alpha x|| = |\alpha|\; ||x||
  • ||x + y|| \leq ||x|| + ||y||

The metric defined by the norm is d(a,b) = ||a - b||.

2. Distances

The diameter of a set A \subseteq X of metric space (X,d) is \text{diam}(A) = \sup\{d(r,s) | r, s \in A\}.

The distance of a point x \in X from a set A \subseteq X is \text{dist}(x, A) = \inf\{ d(x,a) | a \in A\}.

An isolated point z \in S where S \subseteq X is one for which \text{dist}(z, S \setminus \{z\}) \neq 0.

An accumulation point or limit point z \in X of S \subseteq X is one for which \text{dist}(z, S \setminus \{z\}) = 0. Note that z doesn’t need to be in S. A good example is z = 0, X = {\mathbb R}, S = \{1/n | n \in {\mathbb N}\}.

The distance from subset A to subset B of a metric space is defined as \text{dist}(A,B) = \inf\{ d(a,b) | a \in A, b \in B\}.

A nearest point s \in S of S to z \in X is one for which d(z,s) = \text{dist}(z,S). Note that nearest points don’t need to exist, because \text{dist} is defined via the infimum. If a metric space is empty or admits a nearest point to each point in every metric superspace, it is said to have the nearest-point property.
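The example S = \{1/n\} with z = 0 can be explored on finite truncations, where the infimum and supremum are attained. This is a sketch assuming the usual metric on \mathbb{R}; the function names and the use of exact fractions are my own choices.

```python
from fractions import Fraction

def dist(z, S, d=lambda a, b: abs(a - b)):
    """dist(z, S) for a finite nonempty S, where the infimum is a minimum."""
    return min(d(z, s) for s in S)

def diam(S, d=lambda a, b: abs(a - b)):
    """diam(S) for a finite nonempty S, where the supremum is a maximum."""
    return max(d(r, s) for r in S for s in S)

for N in (10, 100, 400):
    S_N = [Fraction(1, n) for n in range(1, N + 1)]   # finite truncation of {1/n}
    print(N, dist(Fraction(0), S_N), diam(S_N))
```

Each truncation has a nearest point to 0, namely 1/N, but the distances shrink towards 0 as N grows. For the full set S the infimum is 0 and is not attained, so S has no nearest point to 0.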

3. Boundaries

A point a is a boundary point of S in X iff \text{dist}(a,S) = \text{dist}(a,S^c) = 0. The collection of these points is the boundary \partial S.

Metric spaces with no proper non-trivial subset with empty boundary are connected. An example of a disconnected metric space is X = (0,1) \cup (7,8) as a metric subspace of {\mathbb R}, while {\mathbb R} itself is certainly connected.

Closed sets are those that contain their boundary.

The closure of S in X is \bar{S} \triangleq S \cup \partial S. The interior is S \setminus \partial S. The exterior is (\bar{S})^c.

Interior, boundary, and exterior are mutually disjoint and their union is X.

4. Sub- and super-spaces

A subset S \subseteq X is dense in X iff \bar{S} = X, or equivalently if for every x \in X, \text{dist}(x,S) = 0. The archetypal example is that \mathbb{Q} is dense in \mathbb{R}.

A complete metric space X is one that is closed in every metric superspace of X. An example is \mathbb{R}.

5. Balls

Let b[a;r) = \{ x \in X | d(a,x) < r \} denote an open ball and similarly b[a;r] = \{ x \in X | d(a,x) \leq r \} denote a closed ball. In the special case of normed linear spaces, b[a;r) = a + rb[0;1) and similarly for closed balls, so the important object is this unit ball – all others have the same shape. A norm on a space V is actually defined by three properties such balls U must have:

  • Convexity
  • Balanced (i.e. x \in U \Rightarrow -x \in U)
  • For each x \in V \setminus \{0\}, the set \{ t \in \mathbb{R}^+ | t x \in U \},
    • is nonempty
    • must have real supremum s
    • sx \notin U

6. Convergence

The mth tail of a sequence x = (x_n) is the set \mbox{tail}_m(x) = \{x_n | n \in {\mathbb N}, n \geq m \}.

Suppose X is a metric space, z \in X and x= (x_n) is a sequence in X. Sequence x converges to z in X, denoted x_n \to z iff every open subset of X that contains z includes a tail of x. In this situation, z is unique and is called the limit of the sequence, denoted \mbox{lim }x_n.

It follows that for (X,d) a metric space, z \in X and (x_n) a sequence in X, the sequence (x_n) converges to z in X iff the real sequence (d(x_n,z))_{n \in \mathbb{N}} converges to 0 in {\mathbb R}.

For real sequences, we can define the:

  • limit superior, \mbox{lim sup } x_n = \mbox{inf } \{ \mbox{sup } \mbox{tail}_n(x) | n \in \mathbb{N} \} and
  • limit inferior, \mbox{lim inf } x_n = \mbox{sup } \{ \mbox{inf } \mbox{tail}_n(x) | n \in \mathbb{N} \}.

It can be shown that x_n \to z iff \mbox{lim sup } x_n = \mbox{lim inf } x_n = z.

Clearly sequences in superspaces converge to the same limit – the same is true in subspaces if the limit point is in the subspace itself. Sequences in finite product spaces equipped with product metrics converge in the product space iff their projections onto the individual spaces converge.

Every subsequence of a convergent sequence converges to the same limit as the parent sequence, but the picture for non-convergent parent sequences is more complicated, as we can still have convergent subsequences. There are various equivalent ways of characterising these limits of subsequences, e.g. centres of balls containing an infinite number of terms of the parent sequence.

A sequence (x_n) is Cauchy iff for every r \in \mathbb{R}^+, there is a ball of radius r that includes a tail of (x_n). Every convergent sequence is Cauchy. The converse is not true, but only because what should be the limit point may be missing from the space: adding this point and extending the metric appropriately yields a convergent sequence. It can be shown that a space is complete (see above for definition) iff every Cauchy sequence is also a convergent sequence in that space.

7. Bounds

A subset S of a metric space X is a bounded subset iff S = X = \emptyset or S is included in some ball of X. A metric space X is bounded iff it is a bounded subset of itself. An alternative characterisation of a bounded subset S is that it has finite diameter.

The Hausdorff metric is defined on the set S(X) of all non-empty closed bounded subsets of a set X equipped with metric d. It is given by h(A,B) = \max \{ \sup\{ \mbox{dist}(b, A) | b \in B\}, \sup\{ \mbox{dist}(a, B) | a \in A\} \}.
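For finite subsets of \mathbb{R} (which are closed and bounded) the suprema and infima in this definition become maxima and minima, so the Hausdorff distance is easy to compute directly. A minimal sketch, assuming the usual metric on \mathbb{R}, with function names of my own:

```python
def dist(x, A):
    """Distance from the point x to the finite nonempty set A."""
    return min(abs(x - a) for a in A)

def hausdorff(A, B):
    """h(A, B) for finite nonempty subsets of R with the usual metric."""
    return max(max(dist(b, A) for b in B), max(dist(a, B) for a in A))

A = [0.0, 1.0]
B = [0.0, 1.0, 5.0]
print(hausdorff(A, B))   # driven entirely by the point 5.0 in B
print(hausdorff(A, A))
```

Note the contrast with \text{dist}(A,B) from the section on distances: here the two sets overlap, so \text{dist}(A,B) = 0, yet the Hausdorff distance is positive because B contains a point far from A.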

Given a set X and a metric space Y, f : X \to Y is a bounded function iff f(X) is a bounded subset of Y. The set of bounded functions from X to Y is denoted B(X,Y). There is a standard metric on bounded functions, s(f,g) = \sup \{ e(f(x),g(x)) | x \in X \} where e is the metric on Y.

Let X be a nonempty set and Y be a nonempty metric space. Let (f_n) be a sequence of functions from X to Y and g: X \to Y. Then:

  • (f_n) converges pointwise to g iff (f_n(z)) converges to g(z) for all z \in X
  • (f_n) converges uniformly to g iff \sup\{ e(f_n(x),g(x)) | x \in X \} is real for each n \in {\mathbb N} and the sequence ( \sup\{ e(f_n(x),g(x)) | x \in X \})_{n \in {\mathbb N}} converges to zero in {\mathbb R}.

It’s interesting to look at these two different notions of convergence because the second is stronger. Every uniformly-convergent sequence of functions converges pointwise, but the converse is not true. An example is the sequence f_n : \mathbb{R}^+ \to \mathbb{R} given by f_n(x) = 1/(nx). This converges pointwise but not uniformly to the zero function.
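The difference between the two notions shows up numerically for f_n(x) = 1/(nx). This is an illustrative sketch using exact fractions; the sample points are my own choice.

```python
from fractions import Fraction

def f(n, x):
    """f_n(x) = 1/(n x) on the positive reals."""
    return 1 / (n * x)

ns = (1, 10, 100, 1000)

# Pointwise: at a fixed x the values shrink towards 0 as n grows.
x0 = Fraction(1, 2)
print([str(f(n, x0)) for n in ns])

# Not uniform: evaluating f_n at the moving point x = 1/n always gives 1,
# so the supremum over x cannot tend to 0.
print([str(f(n, Fraction(1, n))) for n in ns])
```

In fact f_n(1/(nk)) = k for any positive integer k, so the supremum in the definition of uniform convergence is not even real here.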

A stronger notion than boundedness is total boundedness. A subset S of a metric space X is totally bounded iff for each r \in {\mathbb R}^+, there is a finite collection of balls of X of radius r that covers S. An example of a bounded but not totally bounded subset is any infinite subset of a space with the discrete metric. Total boundedness carries over to subspaces and finite unions.

Conserving metrics play an important role in bounds, allowing bounds on product spaces to be equivalent to bounds on the projections to the individual spaces. This goes for both boundedness and total boundedness.

8. Continuity

Given metric spaces X and Y, a point z \in X and a function f: X \to Y, the function is said to be continuous at z iff for each open subset V \subseteq Y with f(z) \in V, there exists an open subset U of X with z \in U such that f(U) \subseteq V.

Extending from points to the whole domain, the function is said to be continuous on X iff for each open subset V \subseteq Y, f^{-1}(V) is open in X.

Continuity is not determined by the codomain, in the sense that a continuous function remains continuous when its codomain is replaced by any metric superspace of its range. It is preserved by function composition and by restriction.

Continuity plays well with product spaces, in the sense that if the product space is endowed with a product metric, a function mapping into the product space is continuous iff its compositions with the natural projections are all continuous.

For (X,d) and (Y,e) metric spaces, \mathcal{C}(X,Y) denotes the metric space of continuous bounded functions from X to Y with the supremum metric (f,g) \mapsto \sup\{ e(g(x),f(x)) | x \in X \}. \mathcal{C}(X,Y) is closed in the space of bounded functions from X to Y.

Nicely, we can talk about convergence using the language of continuity. In particular, let X be a metric space, and \tilde{\mathbb{N}} = \mathbb{N} \cup \{ \infty \}. Endow \tilde{\mathbb{N}} with the inverse metric (a,b) \mapsto |a^{-1} - b^{-1} | for a,b \in {\mathbb N}, (n,\infty) \mapsto n^{-1} and (\infty, \infty) \mapsto 0. Let \tilde{x} : \tilde{\mathbb{N}} \to X. Then \tilde{x} is continuous iff the sequence (x_n) converges in X to x_{\infty}. In particular, the function extending each convergent sequence with its limit is an isometry from the space of convergent sequences in X to the metric space of continuous bounded functions from \tilde{\mathbb{N}} to X.

9. Uniform Continuity

Here we explore increasing strengths of continuity: Lipschitz continuity > uniform continuity > continuity. Ó Searcóid also adds strong contractions into this hierarchy, as the strongest class studied.

Uniform continuity requires the \delta in the epsilon-delta definition of continuity to extend across a whole set. Consider metric spaces (X,d) and (Y,e), a function f : X \to Y, and a metric subspace S \subseteq X. The function f is uniformly continuous on S iff for every \epsilon \in \mathbb{R}^+ there exists a \delta \in \mathbb{R}^+ s.t. for every x, z \in S for which d(z,x) < \delta, it holds that e( f(z), f(x) ) < \epsilon.

If (X,d) is a metric space with the nearest-point property and f is continuous, then f is also uniformly continuous on every bounded subset of X. A good example might be a polynomial on \mathbb{R}.

Uniformly continuous functions map compact metric spaces into compact metric spaces. They preserve total boundedness and Cauchy sequences. This isn’t necessarily true for continuous functions, e.g. x \mapsto 1/x on (0,1] does not preserve the Cauchy property of the sequence (1/n).

There is a remarkable relationship between the Cantor Set and uniform continuity. Consider a nonempty metric space (X,d). Then X is totally bounded iff there exists a bijective uniformly continuous function from a subset of the Cantor Set to X. As Ó Searcóid notes, this means that totally bounded metric spaces are quite small, in the sense that none can have cardinality greater than that of the reals.

Consider metric spaces (X,d) and (Y,e) and function f: X \to Y. The function is called Lipschitz with Lipschitz constant k \in \mathbb{R}^+ iff e( f(a), f(b) ) \leq k d(a,b) for all a, b \in X.

Note here the difference to uniform continuity: Lipschitz continuity restricts uniform continuity by describing a relationship that must exist between the \epsilons and \deltas – uniform leaves this open. A nice example from Ó Searcóid of a uniformly continuous non-Lipschitz function is x \mapsto \sqrt{1 - x^2} on [0,1).
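One way to see why x \mapsto \sqrt{1 - x^2} fails to be Lipschitz on [0,1) is to watch its difference quotients grow without bound near 1. The following is a numerical sketch only; the sample points 1 - h and 1 - 2h are my own choice.

```python
import math

def f(x):
    return math.sqrt(1 - x * x)

def quotient(x, y):
    """|f(x) - f(y)| / |x - y|, which a Lipschitz constant k would have to bound."""
    return abs(f(x) - f(y)) / abs(x - y)

quotients = []
for k in range(1, 7):
    h = 10.0 ** (-k)
    quotients.append(quotient(1 - h, 1 - 2 * h))
    print(k, quotients[-1])
```

The quotients grow roughly like 1/\sqrt{h}, so no single constant k can bound them, even though the function is uniformly continuous on [0,1).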

Lipschitz functions preserve boundedness, and the Lipschitz property is preserved by function composition.

There is a relationship between Lipschitz functions on the reals and their differentials. Let I be a non-degenerate interval of \mathbb{R} and f: I \to \mathbb{R} be differentiable. Then f is Lipschitz on I iff f' is bounded on I.

A function with Lipschitz constant less than one is called a strong contraction.

Unlike the case for continuity, not every product metric gives rise to uniformly continuous natural projections, but this does hold for conserving metrics.

10. Completeness

Let (X,d) be a metric space and u : X \to \mathbb{R}. The function u is called a virtual point iff:

  • u(a) - u(b) \leq d(a,b) \leq u(a) + u(b) for all a,b \in X
  • \text{inf} \; u(X) = 0
  • 0 \notin u(X)

We saw earlier that a metric space X is complete iff it is closed in every metric superspace of X. There are a number of equivalent characterisations, including that every Cauchy sequence in X converges in X.

Consider a metric space (X,d). A subset S \subseteq X is a complete subset of X iff (S,d) is a complete metric space.

If X is a complete metric space and S \subseteq X, then S is complete iff S is closed in X.

Conserving metrics ensure that finite products of complete metric spaces are complete.

A non-empty metric space (X,d) is complete iff (\mathcal{S}(X),h) is complete, where \mathcal{S}(X) denotes the collection of all non-empty closed bounded subsets of X and h denotes the Hausdorff metric.

For X a non-empty set and (Y,e) a metric space, the metric space B(X,Y) of bounded functions from X to Y with the supremum metric is a complete metric space iff Y is complete. An example is that the space of bounded sequences in \mathbb{R} is complete due to completeness of \mathbb{R}.

We can extend a uniformly continuous function from a dense subset into a complete space to a unique uniformly continuous function on the whole space. Consider metric spaces (X,d) and (Y,e) with the latter being complete. Let S \subseteq X be a dense subset of X and f : S \to Y be a uniformly continuous function. Then there exists a uniformly continuous function \tilde{f} : X \to Y such that \tilde{f}|_S = f. There are no other continuous extensions of f to X.

(Banach’s Fixed-Point Theorem). Let (X,d) be a non-empty complete metric space and f : X \to X be a strong contraction on X with Lipschitz constant k \in (0,1). Then f has a unique fixed point in X and, for each w \in X, the sequence (f^n(w)) converges to the fixed point. Beautiful examples of this abound, of course. Ó Searcóid discusses IFS fractals – computer scientists will be familiar with applications in the semantics of programming languages.
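The iteration in the theorem is easy to try out. The sketch below uses a toy strong contraction of my own choosing, f(x) = x/2 + 1 on \mathbb{R}, which has Lipschitz constant 1/2 and fixed point 2; the helper name and stopping rule are illustrative assumptions, not anything from the book.

```python
def iterate_to_fixed_point(f, w, tol=1e-12, max_iter=10_000):
    """Iterate w, f(w), f(f(w)), ... until successive terms are within tol."""
    for _ in range(max_iter):
        nxt = f(w)
        if abs(nxt - w) < tol:
            return nxt
        w = nxt
    raise RuntimeError("no convergence within max_iter iterations")

f = lambda x: 0.5 * x + 1.0     # Lipschitz constant 1/2 on R; fixed point at x = 2

results = [iterate_to_fixed_point(f, start) for start in (-100.0, 0.0, 7.5)]
print(results)
```

Whatever the starting point w, the iterates settle on the same value, as the theorem promises. The distance to the fixed point at least halves at every step because k = 1/2.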

A metric space (Y,e) is called a completion of metric space (X,d) iff (Y,e) is complete and (X,d) is isometric to a dense subspace of (Y,e).

We can complete any metric space. Let (X,d) be a metric space. Define \tilde{X} = \delta(X) \cup \text{vp}(X) where \delta(X) denotes the set of all point functions in X and \text{vp}(X) denotes the set of all virtual points in X. We can endow \tilde{X} with the metric s given by (u,v) \mapsto \sup\{ |u(x) - v(x)| | x \in X \}. Then \tilde{X} is a completion of X.

Here the subspace (\delta(X),s) of (\tilde{X},s) forms the subspace isometric to (X,d).

11. Connectedness

A metric space X is a connected metric space iff X cannot be expressed as the union of two disjoint nonempty open subsets of itself. An example is \mathbb{R} with its usual metric. As usual, Ó Searcóid gives a number of equivalent criteria:

  • Every proper nonempty subset of X has nonempty boundary in X
  • No proper nonempty subset of X is both open and closed in X
  • X is not the union of two disjoint nonempty closed subsets of itself
  • Either X = \emptyset or the only continuous functions from X to the discrete space \{0,1\} are the two constant functions

Connectedness is not a property that is relative to any metric superspace. In particular, if X is a metric space, Z is a metric subspace of X and S \subseteq Z, then the subspace S of Z is a connected metric space iff the subspace S of X is a connected metric space. Moreover, for a connected subspace S of X with S \subseteq A \subseteq \bar{S}, the subspace A is connected. In particular, \bar{S} itself is connected.

Every continuous image of a connected metric space is connected. In particular, for nonempty S \subseteq \mathbb{R}, S is connected iff S is an interval. This is a generalisation of the Intermediate Value Theorem (to see this, consider the continuous functions f : X \to \mathbb{R}).

Finite products of connected subsets endowed with a product metric are connected. Unions of chained collections (i.e. sequences of subsets whose sequence neighbours are non-disjoint) of connected subsets are themselves connected.

A connected component U of a metric space X is a subset that is connected and which has no proper superset that is also connected – a kind of maximal connected subset. It turns out that the connected components of a metric space X are mutually disjoint, all closed in X, and X is the union of its connected components.

A path in metric space X is a continuous function f : [0, 1] \to X. (These functions turn out to be uniformly continuous.) This definition allows us to consider a stronger notion of connectedness: a metric space X is pathwise connected iff for each a, b \in X there is a path in X with endpoints a and b. An example given by Ó Searcóid of a space that is connected but not pathwise connected is the closure in \mathbb{R}^2 of \Gamma = \{ (x, \sin (1/x)) | x \in \mathbb{R}^+ \}. From one of the results above, \bar{\Gamma} is connected because \Gamma is connected. But there is no path from, say, (0,0) (which nevertheless is in \bar{\Gamma}) to any point in \Gamma.

Every continuous image of a pathwise connected metric space is itself pathwise connected.

For a linear space, an even stronger notion of connectedness is polygonal connectedness. For a linear space X with subset S and a, b \in S, a polygonal connection from a to b in S is an n-tuple of points (c_1, \ldots, c_n) s.t. c_1 = a, c_n = b and for each i \in \{1, 2, \ldots, n-1\}, \{(1 - t)c_i + t c_{i+1} | t \in [0,1] \} \subseteq S. We then say S is polygonally connected iff there exists a polygonal connection in S between every two points of S. Ó Searcóid gives the example of \{ z \in \mathbb{C} | \; |z|= 1 \} as a pathwise connected but not polygonally connected subset of \mathbb{C}.

Although in general these three notions of connectedness are distinct, they coincide for open connected subsets of normed linear spaces.

12. Compactness

Ó Searcóid gives a number of equivalent characterisations of compact non-empty metric spaces X. Some of the ones I found most interesting and useful for the following material are:

  • Every open cover for X has a finite subcover
  • X is complete and totally bounded
  • X is a continuous image of the Cantor set
  • Every real continuous function defined on X is bounded and attains its bounds

The example is given of closed bounded intervals of \mathbb{R} as archetypal compact sets. An interesting observation is given that ‘most’ metric spaces cannot be extended to compact metric spaces, simply because there aren’t many compact metric spaces: as noted above in the section on uniform continuity, there are certainly no more than |\mathbb{R}|, given they’re all images of the Cantor set.

If X is a compact metric space and S \subseteq X then S is compact iff S is closed in X. This follows because S inherits total boundedness from X, and completeness follows also if S is closed.

The Inverse Function Theorem states that for X and Y metric spaces with X compact, and for f : X \to Y injective and continuous, f^{-1}: f(X) \to X is uniformly continuous.

Compactness plays well with intersections, finite unions, and finite products endowed with a product metric. The latter is interesting, given that we noted above that for non-conserving product metrics, total boundedness doesn’t necessarily carry forward.

Things get trickier when dealing with infinite-dimensional spaces. The following statement of the Arzelà-Ascoli Theorem is given, which allows us to characterise the compactness of a closed, bounded subset S of \mathcal{C}(X,Y) for compact metric spaces X and Y:

For each x \in X, define \hat{x}: S \to Y by \hat{x}(f) = f(x) for each f \in S. Let \hat{X} = \{\hat{x} | x \in X \}. Then:

  • \hat{X} \subseteq B(S,Y) and
  • S is compact iff x \mapsto \hat{x} from X to B(S,Y) is continuous

13. Equivalence

Consider a set X and the various metrics we can equip it with. We can define a partial order \succeq on these metrics in the following way. d is topologically stronger than e, d \succeq e iff every open subset of (X,e) is open in (X,d). We then get an induced notion of topological equivalence of two metrics, when d \succeq e and e \succeq d.

As well as obviously admitting the same open subsets, topologically equivalent metrics admit the same closed subsets, dense subsets, compact subsets, connected subsets, convergent sequences, limits, and continuous functions to/from that set.

It turns out that two metrics are topologically equivalent iff the identity functions from (X,d) to (X,e) and vice versa are both continuous. Following the discussion above relating to continuity, this hints at potentially stronger notions of comparability – and hence of equivalence – of metrics, which indeed exist. In particular d is uniformly stronger than e iff the identity function from (X,d) to (X,e) is uniformly continuous. Also, d is Lipschitz stronger than e iff the identity function from (X,d) to (X,e) is Lipschitz.

The stronger notion of a uniformly equivalent metric is important because these metrics additionally admit the same Cauchy sequences, totally bounded subsets and complete subsets.

Lipschitz equivalence is even stronger, additionally providing the same bounded subsets and subsets with the nearest-point property.

The various notions of equivalence discussed here collapse to a single one when dealing with norms. For a linear space X, two norms on X are topologically equivalent iff they are Lipschitz equivalent, so we can just refer to norms as being equivalent. All norms on finite-dimensional linear spaces are equivalent.

Finally, some notes on the more general idea of equivalent metric spaces (rather than equivalent metrics). Again, these are provided in three flavours:

  • topologically equivalent metric spaces (X,d) and (Y,e) are those for which there exists a continuous bijection with continuous inverse (a homeomorphism) from X to Y.
  • for uniformly equivalent metric spaces, we strengthen the requirement to uniform continuity
  • for Lipschitz equivalent metric spaces, we strengthen the requirement to Lipschitz continuity
  • strongest of all, isometries are discussed above

Note that given the definitions above, the metric space (X,d) is equivalent to the metric space (X,e) if d and e are equivalent, but the converse is not necessarily true. For equivalent metric spaces, we require existence of a function — for equivalent metrics this is required to be the identity.