It Probably Works!

Followers of my research will know that I’ve long been interested in rounding errors and how they can be controlled to best achieve efficient hardware designs. Going back 20 years, I published a book on this topic based on my PhD dissertation, where I addressed the question of how to choose the precision / word-length (often called ‘bit width’ in the literature) of fixed point variables in a digital signal processing algorithm, in order to achieve a controlled tradeoff between signal-to-noise ratio and implementation cost.

Fast forward several years, and my wonderful collaborators Fredrik Dahlqvist, Rocco Salvia, Zvonimir Rakamarić and I have a new paper out on this topic, to be presented by Rocco and Fredrik at CAV 2021 next week. In this post, I summarise what’s new here – after all, the topic has been studied extensively since Turing!

I would characterise the key elements of this work as: (i) probabilistic, i.e. we’re interested in showing that computation probably achieves its goal, (ii) floating point (especially of the low custom-precision variety), and (iii) small-scale computation on straight-line code, i.e. we’re interested in deep analysis of small kernels rather than very large code, code with complex control structures, or code operating on very large data structures.

Why would one be interested in showing that something probably works, rather than definitely works? In short, because the worst-case behaviour of numerical algorithms is often very far from their average-case behaviour, a point discussed in depth in Higham and Mary’s SIAM paper. Often, ‘probably works’ is good enough, as we’ve seen recently with the huge growth of machine learning techniques predicated on this assumption.

In recent work targeting large-scale computation, Higham and Mary and, independently, Ipsen, have considered models of rounding error that are largely / partially independent of the statistical distribution of the error induced by a specific rounding operation. Fredrik was keen to take a fresh look at the kind of distributions one might see in practice, and in our paper has derived a ‘typical distribution’ that holds under fairly common assumptions.
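To make the worst-case versus average-case gap concrete, here is a small Python experiment of my own (not from the paper): accumulating ten thousand values in a low-precision fixed-point format, the observed rounding error typically sits far below the classical worst-case bound, which grows linearly in the number of roundings.

```python
# A toy illustration (mine, not the paper's analysis): accumulate n terms with
# one fixed-point rounding per addition and compare the observed error with
# the worst-case bound of one half-ULP per rounding.
import random

def quantise(v, frac_bits):
    """Round v to the nearest multiple of 2^-frac_bits."""
    scale = 2 ** frac_bits
    return round(v * scale) / scale

random.seed(0)
n, frac_bits = 10_000, 12
u = 2 ** -(frac_bits + 1)          # half an ULP: max error of a single rounding

xs = [random.uniform(0.0, 1.0) for _ in range(n)]
acc = 0.0
for x in xs:
    acc = quantise(acc + x, frac_bits)   # one rounding per addition

observed = abs(acc - sum(xs))
worst_case = n * u                       # bound if every rounding were adversarial
print(observed, worst_case)              # observed error sits far below the bound
```

The individual rounding errors behave roughly like independent zero-mean noise, so they largely cancel rather than accumulate linearly.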

Rocco and Fredrik then decided that a great way to approximate the probabilistic behaviour of the program is to sandwich whatever distribution is of interest between two other easy-to-compute distributions, utilising the prior idea of a p-box.

One of the core problems of automated analysis of numerical programs has always been that of ‘dependence’. Imagine adding together two variables, each in the range [-1,1]. Clearly their sum is in the range [-2,2]. But what if we knew, a priori, that these two variables were related somehow? For example, in the expression X + (-X), which is clearly always zero. Ideally, an automated system should be able to produce a tighter result than [-2,2] for this! Over the years, many approaches to dealing with this issue have arisen, from the very simple approach of affine arithmetic to the more complex semialgebraic techniques Magron, Donaldson and I developed using sequences of semidefinite relaxations. In our CAV paper, we take the practical step of cutting out regions of the resulting probability space with zero probability using modern SMT solver technology. Another interesting aspect of our paper is the decision of which nonlinear dependences to keep and which to throw away for scalability reasons. Similar to my work with Magron, we keep first-order dependence on small rounding error variables but higher-order dependence on input program variables.
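A minimal sketch of how affine arithmetic tracks this kind of dependence (my illustration, not our tool): each quantity is represented as a centre plus coefficients on shared noise symbols ranging over [-1,1], so X + (-X) cancels exactly instead of widening to [-2,2].

```python
# Minimal affine-arithmetic sketch: a value is c0 + sum(ci * eps_i) with each
# eps_i in [-1,1]; shared noise symbols are what make dependence visible.
class Affine:
    def __init__(self, c0, terms=None):
        self.c0 = c0
        self.terms = dict(terms or {})    # noise symbol -> coefficient

    def __add__(self, other):
        t = dict(self.terms)
        for k, v in other.terms.items():
            t[k] = t.get(k, 0.0) + v      # coefficients on shared symbols combine
        return Affine(self.c0 + other.c0, t)

    def __neg__(self):
        return Affine(-self.c0, {k: -v for k, v in self.terms.items()})

    def interval(self):
        r = sum(abs(v) for v in self.terms.values())
        return (self.c0 - r, self.c0 + r)

x = Affine(0.0, {"e1": 1.0})      # x in [-1, 1]
y = Affine(0.0, {"e2": 1.0})      # an unrelated value in [-1, 1]
print((x + y).interval())         # (-2.0, 2.0): no shared symbols, full range
print((x + (-x)).interval())      # (0.0, 0.0): dependence tracked exactly
```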

I am really excited by the end result: not only a wonderful blend of ideas from numerical analysis, programming languages, automated reasoning and hardware, but also a practical open-source tool people can use. Please give it a try!

Readers interested in learning more about the deeply fascinating topic of numerical properties of floating point would be well advised to read Higham’s outstanding book on the topic. Readers interested in the proofs of the theorems presented in our CAV paper should take a look at the extended version we have on arXiv. Those interested in some of the issues arising (in the worst case setting) when moving beyond straight-line code could consult this paper with Boland. Those interested in the history of this profoundly beautiful topic, especially in its links to linear algebra, would do well to read Wilkinson.

Scheduling with Probabilities

Readers of this blog may remember that Jianyi Cheng, my PhD student jointly supervised by John Wickerson, has been investigating ways to combine dynamic and static scheduling in high-level synthesis (HLS). The basic premise has been that static scheduling, when it works well due to static control, works very well indeed. Meanwhile, for programs exhibiting highly dynamic control flow, static scheduling can be very conservative, a problem addressed by our colleagues Lana Josipović, Radhika Ghosal and Paolo Ienne at EPFL. Together with Lana and Paolo, we developed a scheme to combine the best of both worlds, which we published at FPGA 2020 (and recently extended in IEEE Transactions on CAD). I blogged about this work previously here. We provided a tool flow allowing us to stitch large efficient statically-scheduled components into a dynamic circuit.

However, when scheduling a circuit statically, there are many design choices that can be made, typically to trade off time (throughput, latency) against area. So while our previous work was useful to stitch pre-existing statically-scheduled components into a dynamically-scheduled environment, we had no way of automatically designing those components to optimally fit the dynamic environment.

Enter Jianyi’s latest contribution – to be presented at FCCM 2021 next week.

In his paper “Probabilistic Scheduling in High-Level Synthesis”, Jianyi tackles this problem. He demonstrates that the dynamic environment, including data-dependent decisions and even load-store queues, can be adequately modelled using a Petri net formalism, and uses the PRISM model checker from Kwiatkowska et al. to extract an appropriate initiation interval for each statically-scheduled component.

One of Jianyi’s Petri net models of some memory accesses.
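To give a flavour of the formalism (a toy sketch of my own, far simpler than Jianyi’s models): a Petri net is a set of places holding tokens and transitions that fire when all their input places are marked. A shared resource modelled as a single token naturally limits how often a ‘start’ transition can fire again, which is exactly the kind of structure from which an initiation interval emerges.

```python
# Minimal Petri-net sketch: places hold token counts; a transition is enabled
# when every input place holds a token, and firing moves tokens along.
class PetriNet:
    def __init__(self, marking):
        self.marking = dict(marking)       # place -> token count
        self.transitions = {}              # name -> (input places, output places)

    def add_transition(self, name, inputs, outputs):
        self.transitions[name] = (inputs, outputs)

    def enabled(self, name):
        ins, _ = self.transitions[name]
        return all(self.marking.get(p, 0) > 0 for p in ins)

    def fire(self, name):
        ins, outs = self.transitions[name]
        assert self.enabled(name)
        for p in ins:
            self.marking[p] -= 1
        for p in outs:
            self.marking[p] = self.marking.get(p, 0) + 1

# A loop body needing a shared resource: with one resource token, 'start'
# cannot fire again until 'finish' returns the token.
net = PetriNet({"ready": 2, "resource": 1})
net.add_transition("start", ["ready", "resource"], ["busy"])
net.add_transition("finish", ["busy"], ["done", "resource"])

net.fire("start")
assert not net.enabled("start")   # second iteration blocked by the resource
net.fire("finish")
assert net.enabled("start")       # resource returned; next iteration may start
```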

The initiation intervals inferred by Jianyi’s tool can then be given to a commercial HLS tool – in our case Vitis HLS – to schedule each component. The components – together with any remaining dynamically-scheduled code – are then integrated using our previously published framework, producing the complete FPGA-ready design. The whole process provides a quality of result very close to an exhaustive search of possible initiation intervals, without having to perform multiple scheduling runs, and so in a fraction of the time.

Easter Coq

This Easter I set myself a little challenge to learn a little bit of Coq – enough to construct a proof of a simple but useful theorem in computer arithmetic. Long-time readers of this blog will know that this is not my first outing with dependent types, though I’ve never used them in anger. Four years ago – also during the Easter break! – I read Stump’s book on Agda and spent some time playing with proofs and programming, as I documented here.

This blog post documents some of the interesting things in Coq I observed over the last few days. I’ve decided to write the majority of this post in Coq itself, below, before finishing off with some concluding remarks. In this way, anyone really interested can step through the definitions and proofs themselves.

(* A first datapath identity
 * George A. Constantinides, 2/4/21
 * This is an attempt to learn some basic Coq by proving a standard identity used in computer arithmetic,
 * namely \overline{x} + 1 = \overline{x - 1}.
 * This identity is useful because it allows Boolean operations to move through arithmetic operations.
 * The code is for learning and teaching purposes only. It is not intended to be an efficient or elegant
 * approach, nor is it intended to make best use of existing Coq libraries. On the contrary, I have often used
 * many steps when one would do, so we can step through execution and see how it works.
 *)

Require Import Coq.Program.Equality.
Require Import Coq.Logic.Eqdep_dec.
Require Import Coq.Arith.Peano_dec.

(* Create my own bitvector type. It has a length, passed as a nat, and consists of bools. *)
Inductive bv : nat -> Set :=
| nilbv : bv 0
| consbv : forall n : nat, bool -> bv n -> bv (S n).

(* Head and tail of a bitvector, with implicit length arguments *)
Definition hdi {n : nat} (xs : bv (S n)) :=
  match xs with
  | consbv _ head _ => head
  end.

Definition tli {n : nat} (xs : bv (S n)) :=
  match xs with
  | consbv _ _ tail => tail
  end.

(* The basic carry and sum functions of a Boolean full adder *)
Definition carryfunc (a : bool) (b: bool) (c: bool) : bool :=
  orb (orb (andb a b) (andb a c)) (andb b c).

Definition sumfunc (a : bool) (b : bool) (c : bool) : bool :=
  xorb (xorb a b) c.

(* A ripple carry adder, with implicit length argument
 * Note that this definition makes use of a trick known as the ‘convoy pattern’ [1]
 * to get the dependent typing to work in a match clause. We use a ‘return’ clause
 * to make the type of the match result be a function which is then applied to the unmatched
 * argument. In this way the type system can understand that x and y have the same dependent type.
 * Note also the use of Fixpoint for a recursive definition.
 *)

Fixpoint rcai {n : nat} (x : bv n) (y : bv n) (cin : bool) : (bool * bv n) :=
  match x in bv n return bv n -> ( bool * bv n ) with
  | nilbv => fun _ => (cin, nilbv) (* an empty adder passes its carry in to its carry out *)
  | consbv n1 xh xt => fun y1 =>
       let (cout, sumout) := rcai xt (tli y1) (carryfunc cin xh (hdi y1)) in
                        (cout, consbv n1 (sumfunc cin xh (hdi y1)) sumout)
  end y.

(* We define addition modulo 2^n by throwing away the carry out, using snd, and then define an infix operator *)
Definition moduloadder {n : nat} (x : bv n) (y : bv n) : (bv n) :=
  snd (rcai x y false).

Infix "+" := moduloadder.

(* Bitwise negation of a word *)
Fixpoint neg {n : nat} (x : bv n) : (bv n) :=
  match x with
  | nilbv => nilbv
  | consbv n1 xh xt => consbv n1 (negb xh) (neg xt)
  end.

(* The word-level constant zero made of n zeros *)
Fixpoint bvzero {n : nat} : (bv n) :=
  match n with
  | O => nilbv
  | (S n1) => consbv n1 false bvzero
  end.

(* The word-level constant one with n leading zeros *)
Definition bvone {n : nat} :=
  consbv n true bvzero.

(* Additive inverse of a word, defined as ‘negate all the bits and add one’  *)
Definition addinv {n : nat} (x : bv (S n)) : (bv (S n)) :=
  neg(x) + bvone.

(* Subtraction modulo 2^n is defined as addition with the additive inverse and given its own infix operator *)
Definition modulosub {n : nat} (x : bv (S n)) (y : bv (S n)) :=
  x + (addinv y).

Infix "-" := modulosub.

(* a bit vector of just ones *)
Fixpoint ones {n : nat} : (bv n) :=
  match n with
  | O => nilbv
  | S n1 => consbv n1 true ones
  end.

(* OK, now we have some definitions, let’s prove some theorems! *)

(* Our first lemma (‘Lemma’ versus ‘Theorem’ has no language significance in Coq) says that inverting a
 * bitvector of ones gives us a bitvector of zeros.
 * There’s a couple of interesting points to note even in this simple proof by induction:
 * 1. I had to use ‘dependent destruction’,
 *    which is defined in the Coq.Program.Equality library, to get the destruction of variable x to take into account
 *    the length of the bitvector.
 * 2. The second use of inversion here didn’t get me what I wanted / expected, again due to dependent typing, for
 *    reasons I found explained in [2]. The solution was to use a theorem inj_pair2_eq_dec, defined in
 *    Coq.Logic.Eqdep_dec. This left me needing to prove that equality on the naturals is decidable. Thankfully,
 *    Coq.Arith.Peano_dec has done that.
 *)

Lemma invertzeros : forall {n : nat} (x : bv n),
  x = bvzero -> neg x = ones.
  intros n x H.
  induction n.
  dependent destruction x.
  auto. (* base case proved *)
  dependent destruction x.
  simpl bvzero in H.

  inversion H.

  simpl bvzero in H.
  inversion H. (* inversion with dependent type starts here…          *)
  apply inj_pair2_eq_dec in H2. (* goes via this theorem                                 *)
  2: apply eq_nat_dec. (* and completes via a proof of decidability of equality *)

  apply IHn.
  apply H2.
Qed.

(* The next lemma says that if you fix one input to a ripple carry adder to zero and feed in the carry-in as zero
 * too, then the carry out will not be asserted and the sum will just equal the fixed input.
 * I proved this by induction, reasoning by case on the possible Boolean values of the LSB.
 * The wrinkle to notice here is that I didn’t know how to deal with a ‘let’ clause, but thanks to Yann Herklotz,
 * who came to my aid by explaining that a ‘let’ is syntactic sugar for a match.
 *)

Lemma rcai_zero: forall (n : nat) (x : bv n),
  rcai x bvzero false = (false, x).
  intros n x.
  induction n.
  dependent destruction x.
  auto. (* base case proved *)
  dependent destruction x.
  simpl bvzero.
  simpl rcai.
  destruct b.
  unfold sumfunc. simpl.
  unfold carryfunc. simpl.

  destruct (rcai x bvzero false) eqn: H.

  rewrite IHn in H.
  inversion H.

  rewrite IHn in H.
  inversion H.

  unfold sumfunc. simpl.
  unfold carryfunc. simpl.

  destruct (rcai x bvzero false) eqn: H. (* The trick Yann taught me *)

  rewrite IHn in H.
  inversion H.

  rewrite IHn in H.
  inversion H.
Qed.

(* The next lemma proves that -1 is a vector of ones.
 * One thing to note here is that I needed to explicitly supply the implicit argument n to addinv using @.
 *)

Lemma allones: forall {n : nat}, @addinv n bvone = ones.
  intros n.
  induction n.
  auto. (* base case proved *)

  unfold bvone.
  unfold addinv.
  unfold bvone.
  unfold "+".


  unfold carryfunc.
  unfold sumfunc.

  destruct (rcai (neg bvzero) bvzero false) eqn: H.


  rewrite rcai_zero in H.
  inversion H.

  apply invertzeros.
  reflexivity.
Qed.

(* This lemma captures the fact that one way you can add one to a bitvector using a ripple carry adder is
 * to add zero and assert the carry in port.
 *)

Lemma increment_with_carry : forall (n : nat) (x : bv (S n)),
  x + bvone = snd (rcai x bvzero true).
  intros n x.
  dependent destruction x.

  (* first peel off the LSB from the two operands *)

  simpl bvzero.
  simpl rcai.

  unfold bvone.
  unfold "+".
  simpl rcai.

  (* now case split by the LSB of x to show the same thing *)

  destruct b.

  unfold carryfunc.
  unfold sumfunc.
  reflexivity.

  unfold carryfunc.
  unfold sumfunc.
  reflexivity.
Qed.

(* This lemma says that if you add a vector of ones to a value x using a ripple carry adder, while asserting the
 * carry in port, then the sum result will just be x. Of course this is because -1 + 1 = 0, though I didn’t prove
 * it that way.
 * A neat trick I found to use in this proof is to use the tactic ‘apply (f_equal snd)’ on one of the hypotheses
 * in order to isolate the sum component in the tuple produced by the ripple carry function rcai.
 *)

Lemma rcai_ones_cin_identity : forall (n : nat) (x : bv n),
  snd (rcai x ones true) = x.
  intros n x.
  induction n.
  dependent destruction x.
  dependent destruction x.
  simpl ones.
  simpl rcai.

  (* case analysis *)
  destruct b.
  unfold carryfunc.
  unfold sumfunc.
  destruct (rcai x ones true) eqn: H.
  apply (f_equal snd) in H. (* a neat trick *)
  simpl in H.
  rewrite IHn in H.
  rewrite H.
  reflexivity.

  unfold carryfunc.
  unfold sumfunc.
  destruct (rcai x ones true) eqn: H.
  apply (f_equal snd) in H.
  simpl in H.
  rewrite IHn in H.
  rewrite H.
  reflexivity.
Qed.

(* This lemma is actually the main content of what we’re trying to prove, just not wrapped up in
 * very readable form yet.
 * Note the use of ‘rewrite <-’ to use an existing lemma to rewrite a term from the RHS of the equality
 * in the lemma to the LHS. Without the ‘<-’ it would do it the other way round.
 *)

Lemma main_helper : forall (n : nat) (x : bv (S n)),
  neg (x + ones) = neg x + bvone.
  intros n x.
  induction n.
  dependent destruction x.
  destruct b.
  dependent destruction x.
  dependent destruction x.
  auto. (* base case proved *)

  dependent destruction x.
  unfold bvone.
  unfold "+".
  simpl rcai.

  destruct b.
  unfold carryfunc.
  unfold sumfunc.

  rewrite rcai_zero.

  destruct (rcai x (consbv n true ones) true) eqn: H.
  simpl neg.
  simpl snd.

  apply (f_equal snd) in H.
  simpl snd in H.
  rewrite rcai_ones_cin_identity in H.

  unfold carryfunc.
  unfold sumfunc.

  destruct (rcai (neg x) (consbv n false bvzero) true) eqn: H.
  apply (f_equal snd) in H.
  simpl snd in H.

  rewrite <- increment_with_carry in H.

  simpl snd.

  destruct (rcai x (consbv n true ones) false) eqn: H1.
  simpl snd.
  simpl neg.

  apply (f_equal snd) in H1.
  simpl snd in H1.

  rewrite <- H1.
  rewrite <- H.

  apply IHn.
Qed.

Theorem main_theorem: forall (n : nat) (x : bv (S n)),
  neg x + bvone = neg (x - bvone).
  intros n x.
  unfold "-".
  rewrite allones.
  rewrite <- main_helper.
  reflexivity.
Qed.

(* References
 * [1]
 * [2]
 *)

Some Lessons

So what have I learned from this experience, beyond a little bit of Coq? Firstly, it was fun. It was a nice way to spend a couple of days of my Easter holiday. I am not sure I would want to do it under time pressure, though, as it was also frustrating at times. If I ever wanted to use Coq in anger for my work, I would want to take a couple of months – or more – to really spend time with it.

On the positive side, Coq really forced me to think about foundations. What do I actually mean when I write \overline{x} + 1 = \overline{x - 1}? Should I be thinking in {\mathbb Z}, in {\mathbb Z}/n\mathbb{Z}, or in digits, and when? How should bitvector arithmetic behave on zero-sized bitvectors? (Oh, and I certainly did not expect to be digging out a proof of decidability of natural equality from Coq’s standard library to prove this theorem!) The negative side is the same: Coq really forced me to think about foundations. And I remain to be convinced that I want to do that when I’m not on Easter holiday and in a philosophical mood.
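For what it’s worth, the identity itself is easy to sanity-check exhaustively in {\mathbb Z}/2^n\mathbb{Z} for small word lengths (a quick finite check of my own, complementing rather than replacing the proof):

```python
# Exhaustive check of ~x + 1 == ~(x - 1) over Z/2^nZ for small n, using an
# n-bit mask to stay within the word length.
for n in (4, 8):
    mask = (1 << n) - 1
    for x in range(1 << n):
        lhs = ((x ^ mask) + 1) & mask    # bitwise complement, then add one
        rhs = ((x - 1) & mask) ^ mask    # subtract one (mod 2^n), then complement
        assert lhs == rhs
print("identity holds for n = 4 and n = 8")
```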

I loved the type system and the expression of theorems. I’m lukewarm about the proof process. At least the way I wrote the proofs – which was probably intolerably amateur – it felt like someone could come along and change the tactics at some point and my proof would be broken. Maybe this is not true, but this is what it felt like. This was a different feeling from the one I remember when playing with Agda four years ago, which felt like everything needed to be explicit but somehow felt more nailed down and permanent. In Agda, the proofs are written in the same language as the types, and I enjoyed that, too. Both languages are based on dependent types, and so – as I understand – is Lean. My colleague Kevin Buzzard is a strong advocate of Lean. Perhaps that’s one for another Easter holiday!

Thinking about this proof from a hardware perspective – designing efficient bit-parallel arithmetic hardware – it is clear that we do not need to have proved the theorem for all n. Each bit slice occupies silicon area, and as this is a finite resource, it would be sufficient to have one proof for each feasible value of n. Of course, this makes things much easier to prove, even if it comes with much more baggage. I can fire up an SMT solver and prove the theorem completely automatically for a specific value of n. As an example, if you paste the code below into the Z3 prover (hosted at rise4fun), the solver will report unsat, i.e. there is provably no satisfying value of the variable x violating the theorem for n = 4.

(declare-fun x () (_ BitVec 4))
(assert (not (= (bvadd (bvnot x) #x1) (bvnot (bvadd x #xF)))))
(check-sat)

There are pluses and minuses to this. On the plus side, the SMT query is fast and automatic. On the minus side, in addition to only being valid for n = 4, it gives me – and perhaps some future AI – none of the intuition as to why this theorem holds. When I read mathematics, the proofs are not incidental, they are core to the understanding of what I’m reading.

Will this also be true for future AI-driven EDA tools?


In case this is useful to anyone (or to me in the future): I got syntax highlighting playing well for Coq by using coqdoc to generate HTML and CSS, then hacking at the CSS so that it didn’t affect the rest of my WordPress theme, pasting it into the CSS customiser, and putting the generated HTML in an HTML block. Take care to avoid the CSS class .comment, used by coqdoc for code comments but also used by WordPress for blog post comment formatting!

Thanks again to Yann Herklotz for help understanding let bindings in Coq.

False Negatives and False Positives

Yesterday, I posted some comments on Twitter regarding the press focus on lateral flow test (LFT) false positives on the eve of the return of most students to school in England. It seems that I should probably have written a short blog post about this rather than squeezing it into a few tweets, given the number of questions I’ve had about the post since then. This is my attempt.

The press seem to be focusing on the number of false positives we are likely to see on return to school and the unnecessarily lost learning this may cause. My view is that given the current high case numbers and slow rate of decline, this is not the primary issue we should be worried about. Primarily, we should be worried about the false negative rate of these tests. My concern is that the small number of true positives caught by these tests may have less impact on reducing the rate of infection than behavioural relaxation induced by these tests has on increasing the rate of infection. Time will tell, of course.

Let me explain some of the data that has been released, the conclusions I draw from it, and why false positive rate is important for these conclusions regarding false negative rates.

For any given secondary-age pupil, if given both a LFT and a PCR test, excluding void tests there are four possible outcomes: LFT+ & PCR+ (LFT positive & PCR positive), LFT+ & PCR-, LFT- & PCR+ and LFT- & PCR-. We can imagine these in a table of probabilities, as below.

         LFT+     LFT-     Total
PCR+     ?        ?        0.34% to 1.45%
PCR-     ?        ?        ?
Total    0.19%    ?        100%

Here the total LFT+ figure of 0.19% comes from the SchoolsWeek article for secondary school students, based on Test and Trace data for late February, while the total PCR+ figure is the confidence interval provided in the REACT-1 round 9b data, which says “In the latter half of round 9 (9b), prevalence varied from 0.21% (0.14%, 0.31%) in those aged 65 and over to 0.71% (0.34%, 1.45%) in those aged 13 to 17 years.” Note that REACT-1 9b ran over almost the same period as the Test and Trace data on which the SchoolsWeek article is based.

What I think we all really want to know is what is the probability that a lateral flow test would give me a negative result when a PCR test would give me a positive result? We cannot get this information directly from the table, but we can start to fill in some of the question marks. Clearly the total LFT- probability will be 100% – 0.19% = 99.81%, and the total PCR- probability will be 98.55% to 99.66%. What about the four remaining question marks?

Let’s consider the best case specificity of these tests: that all the 0.19% of LFT+ detected were true positives, i.e. the tests were producing no false positives at all. In that case, the table would look something like this:

         LFT+      LFT-                Total
PCR+     ~0.19%    ~0.15% to ~1.26%    0.34% to 1.45%
PCR-     0.00%     98.55% to 99.66%    98.55% to 99.66%
Total    0.19%     99.81%              100%

Under these circumstances, we can see from the table that the LFTs are picking up 0.19% out of 0.34% to 1.45%, so we can estimate that the most they’re picking up is 0.19/0.34 ≈ 56% of the true positive cases.

However, this assumed no false positives at all, which is a highly unrealistic assumption. What if we consider a more realistic assumption on false positives? The well-cited Oxford study gives a confidence interval for self-trained operatives of 0.24% to 0.60% false positives. Note that the lower end of this confidence interval would suggest that we should see at least 0.24% x 98.55% = ~0.24% of positive LFTs just from false positives alone. This is a higher value than the LFT positive rate we saw over this period of 0.19% (as noted by Deeks here). So this means it’s also entirely feasible that none of the LFT+ results were true positives, i.e. the results table could look more like this:

         LFT+      LFT-                  Total
PCR+     0%        0.34% to 1.45%        0.34% to 1.45%
PCR-     ~0.19%    ~98.36% to ~99.47%    98.55% to 99.66%
Total    0.19%     99.81%                100%

Now this time round, we can see that the tests are picking up 0% out of 0.34% to 1.45%, so we can estimate that they’re picking up 0% of the true positive cases (i.e. 100% false negatives).
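For anyone who wants to check the arithmetic, here is the back-of-envelope calculation above as a few lines of Python (all figures are the population percentages quoted in this post):

```python
# Redoing the back-of-envelope arithmetic from the tables above.
lft_pos = 0.19                 # LFT+ rate (Test and Trace, late February)
pcr_lo, pcr_hi = 0.34, 1.45    # PCR+ prevalence interval (REACT-1 round 9b)

# Best case (perfect specificity): every LFT+ is a true positive, so the
# fraction of true positives caught is at most lft_pos / pcr_lo.
print(round(100 * lft_pos / pcr_lo))   # -> 56 (per cent, at best)

# With a false-positive rate of at least 0.24% (Oxford study, self-trained
# operatives), false positives alone would exceed the observed LFT+ rate,
# which is consistent with the tests catching no true positives at all.
fp_floor = 0.24 / 100 * 98.55
print(fp_floor > lft_pos)              # -> True
```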

This is why I think we do need to have a conversation about false positives. Not because of a few days of missed school, as reported in the press, but because hiding behind these numbers may be a more significant issue of a much higher false negative rate than we thought, leading to higher infections in schools as people relax after receiving negative lateral flow tests.

Perhaps most importantly, I think the Government needs to commit to following the recommendations of the Royal Statistical Society, which would enable us to get to the bottom of exactly what is going on here with false positive and false negative rates.

(Note that I have assumed throughout this post that LFTs are being used as a ‘quick and easy’ substitute for PCRs, so that the ideal LFT outcome is to mirror a PCR outcome. I am aware that there are those who do not think this is the case, and I am not a medical expert so will not pass further comment on this issue.)

GCSEs and A-Levels in 2021: Current Plans

This week, a flurry of documents were released by Ofqual and the Department for Education in response to the consultation over what is to replace GCSEs and A-Levels in 2021, which I blogged about previously.

In this post, I examine these documents to draw out the main lessons and evaluate them against my own submission to the consultation, which was based on my earlier post. The sources of information I draw on for this post are:

  1. A letter (‘Direction’) from the Secretary of State for Education, sent to Ofqual on the 23rd February (and published on the 25th).
  2. Guidance on awarding qualifications in Summer 2021 published on the 25th February.
  3. Ofqual’s new Consultation on the general qualifications alternative awarding framework published on the 25th February, and to close on 11th March.

In my post on 16th January, I concluded that the initial proposals were complex and ill-defined, with scope to produce considerable workload for the education sector while still delivering a lack of comparability. The announcements this week from the Secretary of State and Ofqual have not helped allay my fears.

Curriculum Coverage

The decision has been made by Government that “teachers’ judgements this year should only be made on the content areas that have been taught.” However, the direction from the Secretary of State also insists that “teachers should assess as much course content as possible to ensure in the teachers’ judgement that there has been sufficient coverage of the curriculum to enable progression to further education, training, or employment, where relevant.”

Presumably, these two pieces of information are supposed to be combined to form a grade, or at least the latter modulates the decision over whether a grade is awardable. However, in what way this should happen is totally opaque. Let’s assume we have Student A who has only covered a small part of the GCSE maths curriculum, in the judgement of teachers ‘insufficient to enable progression to further education’ (In what? A-Level maths, or something else? Surely the judgement may depend on this.) However, Student A is ‘performing at’ a high grade (say Grade 8) in that small part of the curriculum. What do they get? A Grade 8? Lower than a Grade 8? No grade? How well will they do compared to Student B who has broad curriculum coverage, but little depth?

The issue of incomplete curriculum coverage has not been adequately addressed by these proposals.

The Return of Criterion Referencing

Several of us made the point in our submissions to the consultation that GCSE grades are norm-referenced, not criterion-referenced (with the exception of Grades 8, 5 and 2, as per the Grade Descriptors). As a result, if national comparability of exams is to be removed, then solid grade descriptors would need to be produced. Several of us suggested that the existence of so many GCSE grades, with such a low level of repeatability between assessors (see, e.g. Dennis Sherwood’s articles on this topic), suggests that grades should be thinned out this year. It seems that the Government agrees with this principle, as they have directed Ofqual to produce ‘grade descriptors for at least alternate grades’. This is good as far as it goes, but if grade descriptors for only alternate grades are produced – quite rightly – then surely only alternate grades should be awarded!

In addition, if grade descriptors are to be usable no matter what narrow range of the curriculum has been covered, they will necessarily be very broad in nature, further limiting the precision with which such judgements can be made. I await exemplar grade descriptors with some trepidation.

As I pointed out in my submission, calling 2021 and 2020 results GCSEs and A-Levels just misleadingly suggests comparability to previous years – better to wipe the slate clean and call them something else.

Fairness to Students: Lack of Comparability

One of my main concerns in the original proposal – not addressed in the final outcome – is that of fairness between centres and even between individual pupils in a centre. Centres will be allowed to use work conducted under a wide variety of circumstances, and while they are supposed to sign off that they are ‘confident’ that this work was that of the student without ‘inappropriate’ levels of support, I am not clear how they are supposed to gain that confidence for work conducted during lockdown, for example – which is explicitly allowed. There are many opportunities for unfairness here even within a centre.

Now if we bring in different centres having covered different parts of the curriculum and using different methodologies for quality assurance, the scope for unfairness is dramatic. The problem for students is that such unfairness will be much harder to identify come Summer compared to 2020, when ‘mutant algorithms’ could be blamed.

It seems odd to me that centres are asked to “use consistent sources of evidence for a class or cohort” yet there does not appear to be any reasonable attempt to maintain that consistency between cohorts at different centres. Exam boards will be asked to sample some subjects in some centres to check grades, but given the points I raise above over curriculum coverage it is very unclear how it will be possible to reach the conclusion that the incorrect grade has been awarded in all but the most extreme cases.

The guarantee of an Autumn exam series (as in 2020) is, however, to be welcomed.

Kicking the Can to Exam Boards

It is now the job of exam boards to come up with “a list of those sources of and approaches to collecting evidence that are considered most effective in determining grades” by the end of March. Good luck to them.

Exam boards must also undertake checks of all centres’ internal quality assurance processes before grades are submitted to them on the 18th of June. It is unclear what these checks will entail – I find this hard to imagine. Schools: please do share your experience of this process with me.

Teacher Workload

One of the big concerns I had in my original consultation response was over the increase in teacher workload. One aspect of this has been addressed: unlike in the draft proposals, teachers will no longer be responsible for appeals. However, there is still a very considerable additional workload involved in: getting to grips with the assessment materials released by the exam boards, developing an in-house quality assurance system, getting that agreed by the exam board, making an assessment of students, showing the students the evidence on which this assessment is based (this is a requirement), and submitting the grades, all over the window 1st April 2021 – 18th June 2021. I asked in my consultation response whether additional funding will be made available for schools, e.g. to provide cover for the release time required for this work. No answer has been forthcoming.

The development of an in-house quality assurance system is non-trivial. The proposed GQAA framework requires such a system to have:

  • ‘a set policy on its approach to making judgements in relation to each Teacher Assessed Grade, including how Additional Assessment Materials and any other evidence will be used,’
  • ‘internal arrangements to standardise the judgements made in respect of the Centre’s Learners and a process for internal sign-off of each Teacher Assessed Grade,’
  • ‘a comparison of the Teacher Assessed Grades to results for previous cohorts at the Centre taking the same qualification to provide a high-level cross-check to ensure that Teacher Assessed Grades overall are not overly lenient or harsh compared to results in previous years,’
  • ‘specific support for newly qualified Teachers and Teachers less familiar with assessment,’ and
  • ‘a declaration by the head of Centre.’

The third bullet point seems logically impossible to achieve. Results this year will not be comparable to previous years as they will be using a different system based on different evidence. So there appears to be no way to check whether TAGs (not CAGs this year!) are comparable to those in previous years.

Private Candidates

Private candidates got a really bad deal last year. This year we are told that private candidates “should be assessed in a similar way to other students”, but that this will be “using an adapted range of evidence”. I’m not completely convinced that these two statements are logically consistent. It will be interesting to hear the experience of private candidates this year.

What about 2022?

It is unfortunate that schools will be left with such a narrow window of time, so we must start to think about 2022 right now. However, I note that in the current Consultation on the General Qualifications Alternative Awarding Framework, there is scope for whatever is decided now to bleed into 2022: “We have not proposed a specific end date for the framework because it is possible some measures will be required for longer than others. Instead we propose that the GQAA Framework will apply until we publish a notice setting an end date.”

All the more reason to get it right now.

GCSEs and A-Levels in 2021

I have collected my initial thoughts after reading the Ofqual consultation, released on the 15th January 2021, over GCSE and A-Level replacements for this year. Alongside many others, I submitted proposals for 2020 which I felt would have avoided some of the worst outcomes we saw in Summer last year. My hope is that, this year, some of the suggestions will be given greater weight.

The basic principle underlying the Ofqual consultation is that teachers will be asked to grade students, that they can use a range of different evidence sources to do so, and that exam boards will be asked to produce mini tests / exams as one such source of evidence. This is not unlike the approach used in Key Stage 1 assessments (“SATs”) in primary schools in recent years. The actual process to be used to come up with a summary grade based on various sources of information is not being consulted over now, and it appears this will come from exam boards in guidance issued to teachers at some undetermined time in the future. This is a significant concern, as the devil really will be in the detail.

Overall, I am concerned that the proposed process is complex and ill-defined. There is scope to produce considerable workload for the education sector while still delivering a lack of comparability between centres / schools. I outline my concerns in more detail below.

Exam Board Papers – What are They For?

Ofqual is proposing that exam boards provide teachers with papers (‘mini exams’) to “support consistency within and between schools and colleges” and that they “could also help with appeals”. However, it is very unclear how these papers will achieve these objectives. Papers might be sat at school or at home (p.18), and might or might not be taken under supervision; teachers might be asked to ‘remotely supervise’ these tests (p.18). These choices could vary on a per-pupil basis. The taking of tests may even be optional, and certainly teachers “should have some choice” over which questions are answered by their students. Grades will not be determined by these papers, so at best they will form one piece of evidence. If consistency is challenged, will the grades on these papers (combined in some as-yet-undetermined way) overrule other sources of information? This could be a cause of some confusion and needs significant clarity. If undue weight is placed on these papers, the scope for lack of comparability of results between centres is significant, and I am left wondering whether the additional workload for teachers and exam boards required to implement this proposal is really worth it.

If tests are to be taken (there are good reasons to suggest that they may be a bad idea in their currently-envisaged form – see below), then I agree with Ofqual that – in principle – the ideal place to take them is in school (p.18). However, it is absolutely essential that school leaders do not end up feeling pressured to open to all students in an unsafe environment, due to the need for these tests. This is a basic principle, and I would resist any move to place further pressure on school leaders to fully open their schools until it is safe to do so.

Quality Assurance by Exam Boards

The main mechanism being proposed to ensure comparability and fairness between two centres / schools is random sampling (p.20-21). The exam board will sample the evidence base of a particular school for a particular subject, and query this with the school if they feel there is inadequate evidence to support the grades (it is not clear in the consultation whether the sampling will be of all pupils or individual pupils at that centre). This is a reasonable methodology for that particular subject / centre / student, but there is a major piece of information missing to enable judgement of whether this is sufficient for quality assurance of the system as a whole: what proportion of student grades will be sampled in this way? My concern is that the resources available to exam boards will be too small for this to be a large enough sample and that therefore the vast majority of grades awarded will be effectively unmoderated. This approach appears to be motivated by avoiding the bungled attempt at algorithmic moderation proposed in 2020, but without adequate resourcing, comparability between centres is not guaranteed to be better than it was under the abandoned 2020 scheme, and may even be worse.

Moreover, the bar for changing school grades appears to be set very high: “where robust investigation indicates that guidance has not been followed, or malpractice is found” (p.21), so I suspect we are heading towards a system of largely unmoderated centre-assessed grades. In 2020, centres were not aware at the point of returning CAGs that these would end up being given in unmoderated form, and therefore many centres appear to have been cautious when awarding high grades. Will this still be the case in 2021?

Curriculum Coverage

It is acknowledged throughout the consultation that centres / schools will have been unable to cover the entire curriculum in many cases. There appear to be two distinct issues to be dealt with here:

A. How to assess a subject with incomplete coverage

There are many ways this could be done. For the sake of argument, consider this question in the simplest setting of an exam. Here, the most direct approach would be simply to assess the entire curriculum, acknowledging that many more students would be unable to answer all questions this year, but re-adjusting grade boundaries to compensate. This may not be the best approach for student wellbeing, however, and in any case the proposal to use non-controlled assessment methods opens up much more flexibility. My concern is that flexibility almost always comes at the cost of comparability.

Ofqual are proposing that teachers have the ability to differentially weight different forms of assessment (e.g. practicals in the sciences). It is unclear in the consultation whether this is on a per-student or a per-centre basis – either brings challenges to fairness and transparency, and this point needs to be clarified quite urgently. They are also effectively proposing that teachers can give zero weight to some elements of the curriculum by choosing not to set / use assessments based on these elements. It is as yet undecided whether past work and tests can be used, or only work produced from now on, once students are aware it can be used for these purposes. It is opaque in the consultation how they are proposing to combine these various partial assessments. One approach I would not like to see is a weighted average of the various pieces of evidence available. A more robust approach, and one which may overcome some objections to using prior work, may be to allow teachers to select a number of the highest-graded pieces of work produced to date – a ‘curated portfolio’ approach. This may mitigate both incomplete curriculum coverage and different student attitudes to summatively-assessed work versus standard class / homework.

B. How to ensure fairness

The consultation acknowledges that students in different parts of the country may have covered different amounts of the curriculum, due to local COVID restrictions. There is an unavoidable tension, therefore, between ‘assessment as a measure of what you can do’ and ‘assessment as a measure of what you can do, under the circumstances you were in’. This tension will not go away, and the Government needs to pick an option as a political decision. Some forms of assessment may mitigate this problem, to a degree, such as the ‘curated portfolio’ proposal made above, but none will solve it.


Appeals

It is proposed that students are able to appeal to the exam board only ‘on the grounds that the school or college had not acted in line with the exam board’s procedural requirements’ (p.23). I am rather unclear how students are supposed to obtain information over the procedures followed at the school / college, so this sets a very high bar for appeals to the board. Meanwhile, the procedure for appeal to the school (p.23) appears to have a very low bar, and thus could potentially involve a significant extra workload for school staff. There is some suggestion that schools could be allowed to engage staff from other schools to handle marking appeals. If adequately financially resourced, Ofqual may wish to make this mandatory, to avoid conflicts of interest.

It is unclear in the consultation whether students will be able to appeal on the basis of an unfair weighting being applied to different elements of the curriculum (p.14). This could add an additional layer of complexity.

Grade Boundary Cliff-Edges

Grade boundaries have always been problematic. Can we really say that a student one mark either side of a Grade A boundary is that different in attainment? Last year, a bungled attempt was made to address this concern by requiring submission of student rankings within grade boundaries. Centre-Assessed Grades (CAGs) last year were optimistic, but this should come as no surprise – given a candidate I believe has a 50/50 chance of either getting an A or a B, why on earth would I choose a B? This issue will persist under the proposals for 2021, and I believe may be amplified by the knowledge that an algorithmic standardisation process will not be used. I suspect we may see even more complaints about ‘grade inflation’ in 2021, with significant knock-on effects for university admissions and funding. The root cause of this problem appears to be the aim to maintain the illusion of comparability between years for GCSE and A-Level results.


Workload

There are very significant workload implications for teachers, for school leaders, and for exam boards in these proposals – far more so than in 2020 arrangements. This workload has explicitly not yet been quantified in the consultation. I believe it needs to be quantified and funded: centres should receive additional funding to support this work, and teachers need to be guaranteed additional non-contact time to undertake the considerable additional work being requested of them.

Private Candidates

Private candidates, such as home-educated students, got a very poor deal last year. This must not be repeated, especially since many of the students who would be taking GCSEs and A-Levels this year are exactly the same home-educated students who decided to postpone for one year as a result of the changes last year. I am concerned to ensure comparability of outcomes between private candidates and centre-based candidates, and I am worried that two of the four proposed mechanisms for private candidates essentially propose a completely different form of qualification for these candidates.

Are 2021 (and 2020) qualifications actually GCSEs and A-Levels?

By labelling the qualifications of 2021 as GCSEs / A-Levels, rather than giving them a different title, there is an implicit statement of comparability between grades awarded in 2021 and those in previous years, which is rather questionable. Others made the point that in 2020 it may have been better to label these qualifications differently – the same argument applies in 2021. Even Ofqual implicitly make this point (p.27) when arguing against overseas candidates taking exams as normal, on the grounds that this “might give rise to comments that there were 2 types of grades awarded”. The reality is that there will be at least three types of grades awarded in recent years: pre-2020, 2020, and 2021. Is it time to face up to this and avoid the pretence of comparability between these different systems?

Equality Considerations

Ofqual seem to believe that if exam boards publish the papers / tests / mini-exams ‘shortly before’ they are taken, then this will avoid leaking information without putting some students at a disadvantage, because ‘students would not know which one(s) they would be required to complete’. I can envisage a situation where some students try to prepare for all published papers the moment they are released online – potentially a much greater number of papers than they will be required to sit – leading to considerable stress and anxiety, with potential equalities implications.

From the consultation, it is not clear how exam board sampling will work, but there is the opportunity to bias the sampling process to help detect and correct for unconscious bias, if equalities information is available to exam boards. This could be considered.


On p.29, Ofqual state that ‘The usual assurances of comparability between years, between individual students, between schools and colleges and between exam boards will not be possible.’ This is not inspiring of confidence, but is honest. The question is how we can mitigate these impacts as far as possible. I hope Ofqual will listen carefully to the suggestions for 2021, and publish the approach taken in plenty of time. Releasing the algorithm used in 2020 on the day of A-level result release was unacceptable, and I hope Ofqual have learnt from this experience.

Watch Where You’re Pointing That!

This week Nadesh Ramanathan, a member of research staff in my group, will be presenting a paper at the virtual FPL 2020 conference entitled “Precise Pointer Analysis in High Level Synthesis” (jointly with John Wickerson and myself). This blog post is intended as an accessible summary of the key message of the paper.

People are now aiming to generate hardware accelerators for more complex algorithms than the classical CNNs, low-level image processing kernels, and other bread-and-butter tasks of hardware acceleration. Inevitably, this is a difficult task to get right, and the prevalence of C/C++-based high-level synthesis (HLS) tools offers a great opportunity to experiment with the design space. Sophisticated algorithms written in C/C++ often incorporate pointers, which have long been difficult for HLS tools. Previously, together with my PhD student Felix Winterstein, I proposed a relatively sophisticated analysis using separation logic – an intensive analysis specialised to certain data structures. Nadesh’s most recent work can, in some sense, be viewed as the opposite. He is trying to make simpler, but more generally applicable, pointer analyses more widely understood and used within HLS, while trying to quantify how much they might bring to hardware accelerator design.

The basic idea is that since FPGA compile times are long, we can afford to spend a bit more time being precise about which variables can point to which other variables. The question is: what are the benefits of being more precise in the context of HLS? Nadesh has studied two different types of ‘sensitivity’ of pointer analyses – to flow and to context. Flow-sensitive analyses consider the ordering of memory operations; context-sensitive analyses consider the calling context of functions. The most common form of analysis in HLS is Andersen analysis, which is neither flow- nor context-sensitive. So how much do we gain by utilising more precise analyses?
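To make the distinction concrete, here is a toy sketch of my own (not the paper’s implementation – the mini-IR is invented purely for illustration). An Andersen-style, flow-insensitive analysis unions points-to facts over all statements regardless of their order, while a flow-sensitive analysis respects ordering and can perform strong updates:

```python
# Toy points-to analyses over a straight-line 'program' in an invented mini-IR:
# ("addr", p, a) means p = &a, ("copy", q, p) means q = p.
prog = [
    ("addr", "p", "a"),   # p = &a
    ("addr", "p", "b"),   # p = &b  (overwrites the earlier assignment)
    ("copy", "q", "p"),   # q = p
]

def andersen(stmts):
    """Flow-insensitive: union facts over all statements, ignoring order.
    (In general Andersen analysis needs a fixpoint over copy edges; a
    single pass suffices for this straight-line, defs-before-uses example.)"""
    pts = {}
    for op, lhs, rhs in stmts:
        if op == "addr":
            pts.setdefault(lhs, set()).add(rhs)
        elif op == "copy":
            pts.setdefault(lhs, set()).update(pts.get(rhs, set()))
    return pts

def flow_sensitive(stmts):
    """Flow-sensitive: later assignments kill earlier points-to facts."""
    pts = {}
    for op, lhs, rhs in stmts:
        if op == "addr":
            pts[lhs] = {rhs}
        elif op == "copy":
            pts[lhs] = set(pts.get(rhs, set()))
    return pts

print(sorted(andersen(prog)["q"]))        # ['a', 'b'] -- q may point to either
print(sorted(flow_sensitive(prog)["q"]))  # ['b'] -- only the last write to p survives
```

The extra precision matters for HLS because a smaller points-to set means fewer false memory dependences, and hence more scheduling and partitioning freedom.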

Nadesh studies this question by modifying the LegUp source code, showing that over the PTABen benchmark set, area utilisation can be halved and performance doubled by using these analyses. This suggests that as we move towards greater diversity in hardware accelerators, HLS tool developers should think carefully about their pointer analyses.

A-Levels and GCSEs in 2020

This week was A-Level results day. It was also the day that Ofqual published its long-awaited standardisation algorithm. Full details can be found in the 319-page report. In this blog post, I’ve set down my initial thoughts after reading the report.


I would like to begin by saying that Ofqual was not given an easy task: produce a system to devise A-level and GCSE grades without exams or coursework. Reading the report, it is clear that they worked hard to do the best they could within the confines in which they operate, and I respect that work. Nevertheless, I have several concerns to share.


1. Accounting for Prior Attainment

The model corrects for differences between historical prior attainment and the prior attainment of the 2020 cohort in the following way (after first taking into account any learners without prior attainment measures). For any particular grade, the proportion to be awarded is equal to the historical proportion at that grade adjusted by a factor referred to in the report as q_{kj} - p_{kj}. (See p.92-93 of the report, which incidentally has a typo here – c_k should read c_{kj}.) As noted by the Fischer Family Trust, it appears that this factor is based solely on national differences in value added, and this could cause a problem. To illustrate this requires an artificial example. Imagine that Centre A has a historical transition matrix looking like this – all of its 200 students have walked away with A*s in this subject in recent years, whether they were in the first or second GCSE decile (and half were in each). Well done Centre A!

GCSE Decile   A*     A
1             100%   0%
2             100%   0%

Meanwhile, let’s say the national transition matrix looks more like this:

GCSE Decile   A*     A
1             90%    10%
2             10%    90%

Let’s now look at 2020 outcomes. Assume that this year, Centre A has an unusual cohort: all students were second decile in prior attainment. It seems natural to expect that it would still get mainly A*s, consistent with its prior performance, but this is not the outcome of the model. Instead, its historical distribution of 100% A*s is adjusted downwards because of the national transition matrix. The proportion of A*s at Centre A will be reduced by 40 percentage points – now only 60% of them will get A*s! This happens because the national transition matrix expects a 50/50 split of Decile 1 and Decile 2 students to end up with 50% A*s, and a Decile 2-only cohort to end up with 10% A*s, resulting in a downgrade of 40 percentage points.
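The arithmetic of this artificial example can be sketched in a few lines (my own illustration of the q - p correction described above, not Ofqual’s code; the transition probabilities are those of the made-up national matrix):

```python
# National P(A* | GCSE decile) from the artificial national transition matrix.
national_astar = {1: 0.90, 2: 0.10}

def expected_astar(decile_mix):
    """National expected A* proportion for a cohort mix {decile: fraction}."""
    return sum(frac * national_astar[d] for d, frac in decile_mix.items())

p = expected_astar({1: 0.5, 2: 0.5})  # historical cohort: 50/50 split -> 0.5
q = expected_astar({2: 1.0})          # 2020 cohort: all Decile 2 -> 0.1

centre_a_historical = 1.0             # Centre A: 100% A* in recent years
adjusted = centre_a_historical + (q - p)
print(adjusted)                       # approx. 0.6: only 60% now get A*
```

The key point is that the correction q - p is computed from the national matrix alone, so Centre A’s own perfect track record with Decile 2 students never enters the calculation.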

2. Model accuracy

Amongst the various possible standardisation options, Ofqual evaluated accuracy based on trying to predict 2019 exam grades and seeing how well they matched the grades actually awarded. This immediately presents a problem: no rank orders were submitted for 2019 students, so how is this possible? The answer provided is that “the actual rank order within the centre based on the marks achieved in 2019 were used as a replacement”, i.e. they back-fitted 2019 marks to rank orders. This only provides a reasonable idea of accuracy if we assume that teacher-submitted rank orders in 2020 would exactly correspond to the mark orders of their pupils, as noted by Guy Nason. Of course this will not be the case, so the accuracy estimates in the Ofqual report are likely to be significant overestimates. And they’re already not great, even under a perfect-ranking assumption: Ofqual report that only 12 out of 22 GCSE subjects were accurate to within one grade, with some subjects having only 40% accuracy in terms of predicting the attained grade – so one is left wondering what the accuracy might actually be for 2020 once rank-order uncertainty is taken into account.
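To illustrate why the perfect-ranking assumption matters, here is a toy simulation of my own (not Ofqual’s model): grades are awarded by slicing a rank order against a fixed grade distribution. Awarding down the back-fitted (true mark) order defines the reference grades by construction; perturbing the order, as a teacher-submitted rank inevitably would, lowers classification accuracy:

```python
import random
random.seed(0)

N = 100
marks = random.sample(range(1000), N)   # distinct 'true' marks for a cohort
dist = [0.2, 0.3, 0.3, 0.2]             # fraction of the cohort at grades A..D

def grade_by_rank(order):
    """Award grades A..D down a rank order according to the fixed distribution."""
    grades, start = {}, 0
    for g, frac in zip("ABCD", dist):
        cut = start + round(frac * N)
        for student in order[start:cut]:
            grades[student] = g
        start = cut
    return grades

# Back-fitted rank (what was done for 2019): the true mark order.
true_order = sorted(range(N), key=lambda s: -marks[s])
true_grades = grade_by_rank(true_order)

# A teacher-submitted rank: true marks perturbed by noise, then re-sorted.
noisy_order = sorted(range(N), key=lambda s: -(marks[s] + random.gauss(0, 200)))
noisy_grades = grade_by_rank(noisy_order)

accuracy = sum(true_grades[s] == noisy_grades[s] for s in range(N)) / N
print(accuracy)   # below 1.0: rank noise alone degrades classification accuracy
```

Evaluating the model on back-fitted ranks is exactly the noiseless case, so any real-world ranking noise can only push the 2020 accuracy below the figures reported.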

There may also be a systematic variation in the accuracy of the model across different grades, but this is obscured by using the probability of successful classification across any grade as the primary measure of accuracy. Graphs presented in the Ofqual report suggest, for example, that the models are far less accurate at Grade 4 than at Grade 7 in GCSE English.

3. When is a large cohort a large cohort?

A large cohort, and therefore one for which teacher-assessed grades are used at all, is defined in the algorithm to be one with at least 15 students. But how do we count these 15 students? The current cohort or the historic cohort, or something else? The answer is given in Ofqual’s report: the harmonic mean of the two. As an extreme example of this, centre cohorts can be considered “large” with only 8 pupils this year – so long as they had at least 120 in the recent past. It seems remarkable that a centre could have fewer pupils than GCSE grades and still be “large”!
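The arithmetic behind the extreme example is easy to check (harmonic mean of this year’s and the historical cohort size, against the threshold of 15):

```python
def harmonic_mean(a, b):
    """Harmonic mean of two cohort sizes."""
    return 2 * a * b / (a + b)

# 8 pupils this year, 120 historically: exactly on the 'large cohort' threshold.
print(harmonic_mean(8, 120))  # 15.0
```

Because the harmonic mean is dominated by the smaller value only gradually, a long history can drag a tiny current cohort over the threshold.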

4. Imputed marks fill grade ranges

As the penultimate step in the Ofqual algorithm, “imputed marks” are calculated for each student – a kind of proxy mark equally spaced between grade end-points. So, for example, if Centre B only has one student heading for a Grade C at this stage then – by definition – it’s a mid-C. If they had two Grade C students, they’d be equally spaced across the “C spectrum”. This means that in the next step of the algorithm, cut-score setting, these students are vulnerable to changing grades. For centres which tend to fill the full grade range anyway, this may not be an issue. But I worry that we may see some big changes at the edges of centre distributions as a result of this quirk.
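Here is one plausible reading of the equal-spacing rule (my own sketch; the report’s exact placement of students within the grade’s mark range may differ):

```python
def imputed_marks(lo, hi, k):
    """Spread k students' imputed marks evenly across the grade range [lo, hi)."""
    width = hi - lo
    return [lo + (i + 0.5) * width / k for i in range(k)]

print(imputed_marks(50, 60, 1))  # [55.0]: a lone Grade C student is a mid-C
print(imputed_marks(50, 60, 2))  # [52.5, 57.5]: two students fill the 'C spectrum'
```

Whatever the precise spacing, students pushed towards the ends of the range by this construction sit closest to the cut-scores set in the next step, which is exactly the vulnerability described above.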

5. No uncertainty quantification

Underlying many of these concerns is, perhaps, a more fundamental one. Grades awarded this year come with different levels of uncertainty, depending on factors like how volatile attainment at the centre has been in the past, the size of the cohorts, known uncertainty in grading, etc. Yet none of this is visible in the awarded grade. In practice, this means that some Grade Cs are really “B/C”s while some are “A-E”, and we don’t know the difference. It is not beyond possibility to quantify the uncertainty – in fact I proposed awarding grade ranges in my original consultation response to Ofqual. This issue has been raised independently by the Royal Statistical Society and, even for normal exam years, given the inherent unreliability of exam grades, by Dennis Sherwood. For small centres, rather than a statistically reasonable approach to widen the grade range, the impact of only awarding a single grade with unquantified uncertainty is that Ofqual have had to revert to teacher-assessed grades, leading to an unfair “mix and match” system where some centres have had their teacher-assessed grades awarded while some haven’t.

What Must Happen Now?

I think everyone can agree that centres need to immediately receive all the intermediate steps in the calculations of their grades. Many examinations officers are currently scratching their heads, after having received only a small part of this information. The basic principle must be that centres are able to recalculate their grades from first principles if they want to. This additional information should include the proportion of pupils in both historical and current cohorts with matched prior attainment data for each subject and which decile each student falls into, the national transition matrices used for each subject, the values of q_{kj} and p_{kj} for each subject / grade combination, the imputed marks for each 2020 student, and the national imputed mark cut-points for each grade boundary in each subject.

At a political level, serious consideration should now be given to awarding teacher-assessed grades (CAGs) this year. While I was initially supportive of a standardisation approach – and I support the principles of Ofqual’s “meso-standardisation” – I fear that problems with the current standardisation algorithm are damaging rather than preserving public perception of A-Level grades. We may have now reached the point that the disadvantages of sticking to the current system are worse than the disadvantages of simply accepting CAGs for A-Levels.

Ofqual states in their report that “A key motivation for the design of the approach to standardisation [was] as far as possible [to] ensure that a grade represents the same standard, irrespective of the school or college they attended”. Unfortunately, my view is that this has not been achieved by the Ofqual algorithm. However, despite my concerns over Ofqual’s algorithm, it is also questionable whether any methodology meeting this objective could be implemented in time under a competitive education system culture driven by high-stakes accountability systems. Something to think about for our post-COVID world.

Some Notes on Metric Spaces

This post contains some summary informal notes of key ideas from my reading of Mícheál Ó Searcóid’s Metric Spaces (Springer, 2007). These notes are here as a reference for me, my students, and any others who may be interested. They are by no means exhaustive, but rather cover topics that seemed interesting to me on first reading. By way of a brief book review, it’s worth noting that Ó Searcóid’s approach is excellent for learning a subject. He has a few useful tricks up his sleeve, in particular:

  • Chapters will often start with a theorem proving equivalence of various statements (e.g. Theorem 8.1.1, Criteria for Continuity at a Point). Only then will he choose one of these statements as a definition, and he explains this choice carefully, often via reference to other mathematics.
  • The usual definition-theorem-proof style is supplemented with ‘questions’ – these are relatively informally-stated questions and their answers. They have been carefully chosen to highlight some questions the reader might be wondering about at that point in the text and to demonstrate key (and sometimes surprising) answers before the formal theorem statement.
  • The writing is pleasant, even playful at times though never lacking formality. This is a neat trick to pull off.
  • There are plenty of exercises, and solutions are provided.

These features combine to produce an excellent learning experience.

1. Some Basic Definitions

A metric on a set X is a function d : X \times X \to {\mathbb R} such that:

  • Positivity: d(a,b) \geq 0 with equality iff a = b
  • Symmetry: d(a,b) = d(b,a)
  • Triangle inequality: d(a,b) \leq d(a,c) + d(c,b)

The combination of such a metric and the corresponding set is a metric space.
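As a quick sanity check of these axioms, here is a small Python sketch of my own: it tests a candidate distance function on a finite sample of points, so passing is necessary but not sufficient evidence that the function is a metric.

```python
import itertools

def is_metric(d, points, tol=1e-12):
    """Check positivity, symmetry, and the triangle inequality of d
    over a finite sample of points."""
    for a, b in itertools.product(points, repeat=2):
        if d(a, b) < 0 or (d(a, b) == 0) != (a == b):   # positivity / identity
            return False
        if abs(d(a, b) - d(b, a)) > tol:                # symmetry
            return False
    for a, b, c in itertools.product(points, repeat=3):
        if d(a, b) > d(a, c) + d(c, b) + tol:           # triangle inequality
            return False
    return True

euclid = lambda a, b: abs(a - b)
print(is_metric(euclid, [0.0, 1.0, 2.5, -3.0]))             # True
print(is_metric(lambda a, b: (a - b) ** 2, [0.0, 1.0, 3.0]))  # False: d(0,3) > d(0,1)+d(1,3)
```

The second example shows why squared distance is not a metric: it satisfies positivity and symmetry but violates the triangle inequality.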

Given a metric space (X,d), the point function at z is \delta_z : x \mapsto d(z,x).

A pointlike function u : X \to {\mathbb R}^\oplus is one where u(a) - u(b) \leq d(a,b) \leq u(a) + u(b) for all a, b \in X.

For metric spaces (X,d) and (Y,e), X is a metric subspace of Y iff X \subseteq Y and d is a restriction of e.

For metric spaces (X,d) and (Y,e), an isometry \phi : X \to Y is a function such that e(\phi(a),\phi(b)) = d(a,b). The metric subspace (\phi(X),e) is an isometric copy of (X,d).

Some standard constructions of metrics for product spaces:

  1. \mu_1 : (a,b) \mapsto \sum_{i=1}^n \tau_i(a_i,b_i)
  2. \mu_2 : (a,b) \mapsto \sqrt{\sum_{i=1}^n \left(\tau_i(a_i,b_i)\right)^2}
  3. \mu_\infty : (a,b) \mapsto \max\left\{\tau_i(a_i,b_i) \,|\, i \in \{1,\dots,n\}\right\}

A conserving metric e on a product space is one where \mu_\infty(a,b) \leq e(a,b) \leq \mu_1(a,b). Ó Searcóid calls these conserving metrics because they conserve an isometric copy of the individual spaces, recoverable by projection (I don’t think this is a commonly used term). This can be seen because fixing elements of all-but-one of the constituent spaces makes the upper and lower bound coincide, resulting in recovery of the original metric.
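A quick numerical sketch of my own of the conserving sandwich: the Euclidean product metric \mu_2 satisfies \mu_\infty \leq \mu_2 \leq \mu_1 pointwise, so it is itself conserving.

```python
import math

# The three standard product metrics on R^n, taking tau_i = |.| on each factor.
def mu1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def mu2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mu_inf(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

a, b = (1.0, -2.0, 4.0), (3.0, 1.0, 0.5)
print(mu_inf(a, b), mu2(a, b), mu1(a, b))   # increasing: 3.5 <= ~5.02 <= 8.5
assert mu_inf(a, b) <= mu2(a, b) <= mu1(a, b)
```

Fixing all coordinates but one makes all three values collapse to the single factor's distance, which is the projection/recovery argument in the paragraph above.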

A norm on a linear space V over {\mathbb R} or {\mathbb C} is a real function such that for x, y \in V and \alpha scalar:

  • ||x|| \geq 0 with equality iff x = 0
  • ||\alpha x|| = |\alpha|\; ||x||
  • ||x + y|| \leq ||x|| + ||y||

The metric defined by the norm is d(a,b) = ||a - b||.

2. Distances

The diameter of a set A \subseteq X of metric space (X,d) is \text{diam}(A) = \sup\{d(r,s) | r, s \in A\}.

The distance of a point x \in X from a set A \subseteq X is \text{dist}(x, A) = \inf\{ d(x,a) | a \in A\}.

An isolated point z \in S where S \subseteq X is one for which \text{dist}(z, S \setminus \{z\}) \neq 0.

An accumulation point or limit point z \in X of S \subseteq X is one for which \text{dist}(z, S \setminus \{z\}) = 0. Note that z doesn’t need to be in S. A good example is z = 0, X = {\mathbb R}, S = \{1/n | n \in {\mathbb N}\}.
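A small numerical sketch of my own for this example, approximating the infimum by a minimum over ever-larger finite samples of S:

```python
def dist_sample(x, S):
    """min over a finite sample of S stands in for the inf over all of S."""
    return min(abs(x - s) for s in S)

# S = {1/n : n in N}: the distance from 0 shrinks towards 0 as more of S
# is sampled, so 0 is an accumulation point of S even though 0 is not in S.
for N in (10, 100, 10000):
    S = [1 / n for n in range(1, N + 1)]
    print(N, dist_sample(0, S))   # 0.1, 0.01, 0.0001
```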

The distance from subset A to subset B of a metric space is defined as \text{dist}(A,B) = \inf\{ d(a,b) | a \in A, b \in B\}.

A nearest point s \in S of S to z \in X is one for which d(z,s) = \text{dist}(z,S). Note that nearest points don’t need to exist, because \text{dist} is defined via the infimum. If a metric space is empty or admits a nearest point to each point in every metric superspace, it is said to have the nearest-point property.

3. Boundaries

A point a is a boundary point of S in X iff \text{dist}(a,S) = \text{dist}(a,S^c) = 0. The collection of these points is the boundary \partial S.

Metric spaces with no proper non-trivial subset with empty boundary are connected. An example of a disconnected metric space is X = (0,1) \cup (7,8) as a metric subspace of {\mathbb R}, while {\mathbb R} itself is certainly connected.

Closed sets are those that contain their boundary.

The closure of S in X is \bar{S} \triangleq S \cup \partial S. The interior is S \setminus \partial S. The exterior is (\bar{S})^c.

Interior, boundary, and exterior are mutually disjoint and their union is X.

4. Sub- and super-spaces

A subset S \subseteq X is dense in X iff \bar{S} = X, or equivalently if for every x \in X, \text{dist}(x,S) = 0. The archetypal example is that \mathbb{Q} is dense in \mathbb{R}.

A complete metric space X is one that is closed in every metric superspace of X. An example is \mathbb{R}.

5. Balls

Let b[a;r) = \{ x \in X | d(a,x) < r \} denote an open ball and similarly b[a;r] = \{ x \in X | d(a,x) \leq r \} denote a closed ball. In the special case of normed linear spaces, b[a;r) = a + rb[0;1) and similarly for closed balls, so the important object is the unit ball – all other balls have the same shape. A norm on a space V can in fact be characterised by three properties that its unit ball U must have:

  • Convexity
  • Balanced (i.e. x \in U \Rightarrow -x \in U)
  • For each x \in V \setminus \{0\}, the set \{ t \in \mathbb{R}^+ | t x \in U \} is nonempty and has a real supremum s, with sx \notin U

6. Convergence

The mth tail of a sequence x = (x_n) is the set \mbox{tail}_m(x) = \{x_n | n \in {\mathbb N}, n \geq m \}.

Suppose X is a metric space, z \in X and x= (x_n) is a sequence in X. Sequence x converges to z in X, denoted x_n \to z iff every open subset of X that contains z includes a tail of x. In this situation, z is unique and is called the limit of the sequence, denoted \mbox{lim }x_n.

It follows that for (X,d) a metric space, z \in X and (x_n) a sequence in X, the sequence (x_n) converges to z in X iff the real sequence (d(x_n,z))_{n \in \mathbb{N}} converges to 0 in {\mathbb R}.

For real sequences, we can define the:

  • limit superior, \mbox{lim sup } x_n = \mbox{inf } \{ \mbox{sup } \mbox{tail}_n(x) | n \in \mathbb{N} \} and
  • limit inferior, \mbox{lim inf } x_n = \mbox{sup } \{ \mbox{inf } \mbox{tail}_n(x) | n \in \mathbb{N} \}.

It can be shown that x_n \to z iff \mbox{lim sup } x_n = \mbox{lim inf } x_n = z.
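These tail formulas can be evaluated directly on finite truncations. A toy numerical check (my own illustration, not a proof) for x_n = (-1)^n(1 + 1/n), whose lim sup is 1 and lim inf is -1:

```python
# x_n = (-1)^n * (1 + 1/n), n = 1, ..., N: oscillates, so it has
# distinct limits superior (1) and inferior (-1) and does not converge.
N = 10000
x = [(-1) ** n * (1 + 1 / n) for n in range(1, N + 1)]

def tail(seq, m):
    """The m-th tail {x_n : n >= m} (1-indexed), truncated at N."""
    return seq[m - 1:]

# Finite approximations of inf_m sup tail_m(x) and sup_m inf tail_m(x).
approx_lim_sup = min(max(tail(x, m)) for m in range(1, 200))
approx_lim_inf = max(min(tail(x, m)) for m in range(1, 200))

assert abs(approx_lim_sup - 1) < 0.01
assert abs(approx_lim_inf + 1) < 0.01
```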

A sequence that converges in a space converges, with the same limit, in every metric superspace; the same holds in a subspace provided the limit point lies in the subspace. Sequences in finite product spaces equipped with product metrics converge in the product space iff their projections onto the individual spaces converge.

Every subsequence of a convergent sequence converges to the same limit as the parent sequence, but the picture for non-convergent parent sequences is more complicated, as we can still have convergent subsequences. There are various equivalent ways of characterising these limits of subsequences, e.g. centres of balls containing an infinite number of terms of the parent sequence.

A sequence (x_n) is Cauchy iff for every r \in \mathbb{R}^+, there is a ball of radius r that includes a tail of (x_n). Every convergent sequence is Cauchy. The converse is not true, but can only fail because the point that should be the limit is missing from the space; adding this point and extending the metric appropriately yields a convergent sequence. It can be shown that a space is complete (see above for the definition) iff every Cauchy sequence in that space is also convergent in that space.
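The decimal truncations of \sqrt{2} give a concrete sketch of this: they form a Cauchy sequence of rationals whose would-be limit, \sqrt{2}, is missing from \mathbb{Q} (a float-based illustration of mine, with the usual floating-point caveats):

```python
import math

def x(n):
    """sqrt(2) truncated to n decimal places: a rational number."""
    return math.floor(math.sqrt(2) * 10 ** n) / 10 ** n

# Cauchy check: for radius r = 10**-k, the tail from index k+1 onwards
# fits inside a ball of radius r.
for k in range(1, 8):
    tail = [x(n) for n in range(k + 1, k + 20)]
    assert max(tail) - min(tail) < 10 ** -k

# Yet the limit point, sqrt(2), is irrational: it is not in Q, so the
# sequence is Cauchy but not convergent in the subspace Q.
```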

7. Bounds

A subset S of a metric space X is a bounded subset iff S = \emptyset or S is included in some ball of X. A metric space X is bounded iff it is a bounded subset of itself. An alternative characterisation of a bounded subset S is that it has finite diameter.

The Hausdorff metric is defined on the set S(X) of all non-empty closed bounded subsets of a set X equipped with metric d. It is given by h(A,B) = \max \{ \sup\{ \mbox{dist}(b, A) | b \in B\}, \sup\{ \mbox{dist}(a, B) | a \in A\} \}.
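For finite point sets the suprema and infima become maxima and minima, so h is directly computable. A minimal sketch of mine in \mathbb{R} with the usual metric:

```python
def dist(z, S):
    """dist(z, S): infimum (here minimum) distance from z to finite S."""
    return min(abs(z - s) for s in S)

def hausdorff(A, B):
    """h(A, B): the larger of the two directed sup-of-dist terms."""
    return max(max(dist(b, A) for b in B),
               max(dist(a, B) for a in A))

A = {0.0, 1.0}
B = {0.0, 1.0, 5.0}
# Every point of A is close to B, but 5 in B is at distance 4 from A.
assert hausdorff(A, B) == 4.0
assert hausdorff(A, A) == 0.0
assert hausdorff(A, B) == hausdorff(B, A)  # h is symmetric
```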

Given a set X and a metric space Y, f : X \to Y is a bounded function iff f(X) is a bounded subset of Y. The set of bounded functions from X to Y is denoted B(X,Y). There is a standard metric on bounded functions, s(f,g) = \sup \{ e(f(x),g(x)) | x \in X \} where e is the metric on Y.

Let X be a nonempty set and Y be a nonempty metric space. Let (f_n) be a sequence of functions from X to Y and g: X \to Y. Then:

  • (f_n) converges pointwise to g iff (f_n(z)) converges to g(z) for all z \in X
  • (f_n) converges uniformly to g iff \sup\{ e(f_n(x),g(x)) | x \in X \} is real for each n \in {\mathbb N} and the sequence ( \sup\{ e(f_n(x),g(x)) | x \in X \})_{n \in {\mathbb N}} converges to zero in {\mathbb R}.

It’s interesting to look at these two different notions of convergence because the second is stronger. Every uniformly-convergent sequence of functions converges pointwise, but the converse is not true. An example is the sequence f_n : \mathbb{R}^+ \to \mathbb{R} given by f_n(x) = 1/(nx). This converges pointwise but not uniformly to the zero function.
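A numeric sketch of this gap (my own illustration): at any fixed x the values f_n(x) = 1/(nx) shrink to zero, but for every n there is a point (here x = 1/n^2) at which f_n is still large, so the supremum over \mathbb{R}^+ never shrinks.

```python
def f(n, x):
    """f_n(x) = 1/(nx) on the positive reals."""
    return 1.0 / (n * x)

# Pointwise convergence: at fixed x = 1, f_n(1) = 1/n -> 0.
assert f(10 ** 6, 1.0) < 1e-5

# Failure of uniform convergence: sup_x f_n(x) does not shrink --
# witnessed at x = 1/n**2, where f_n takes the value n.
for n in (10, 100, 1000):
    assert f(n, 1.0 / n ** 2) > 0.9 * n
```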

A stronger notion than boundedness is total boundedness. A subset S of a metric space X is totally bounded iff for each r \in {\mathbb R}^+, there is a finite collection of balls of X of radius r that covers S. An example of a bounded but not totally bounded subset is any infinite subset of a space with the discrete metric. Total boundedness carries over to subspaces and finite unions.

Conserving metrics play an important role in bounds, allowing bounds on product spaces to be equivalent to bounds on the projections to the individual spaces. This goes for both boundedness and total boundedness.

8. Continuity

Given metric spaces X and Y, a point z \in X and a function f: X \to Y, the function is said to be continuous at z iff for each open subset V \subseteq Y with f(z) \in V, there exists an open subset U of X with z \in U such that f(U) \subseteq V.

Extending from points to the whole domain, the function is said to be continuous on X iff for each open subset V \subseteq Y, f^{-1}(V) is open in X.

Continuity is not determined by the codomain, in the sense that a continuous function is continuous on any metric superspace of its range. It is preserved by function composition and by restriction.

Continuity plays well with product spaces, in the sense that if the product space is endowed with a product metric, a function mapping into the product space is continuous iff its compositions with the natural projections are all continuous.

For (X,d) and (Y,e) metric spaces, \mathcal{C}(X,Y) denotes the metric space of continuous bounded functions from X to Y with the supremum metric (f,g) \mapsto \sup\{ e(g(x),f(x)) | x \in X \}. \mathcal{C}(X,Y) is closed in the space of bounded functions from X to Y.

Nicely, we can talk about convergence using the language of continuity. In particular, let X be a metric space, and \tilde{\mathbb{N}} = \mathbb{N} \cup \{ \infty \}. Endow \tilde{\mathbb{N}} with the inverse metric (a,b) \mapsto |a^{-1} - b^{-1} | for a,b \in {\mathbb N}, (n,\infty) \mapsto n^{-1} and (\infty, \infty) \mapsto 0. Let \tilde{x} : \tilde{\mathbb{N}} \to X and write x_n = \tilde{x}(n). Then \tilde{x} is continuous iff the sequence (x_n) converges in X to x_{\infty}. In particular, the function extending each convergent sequence with its limit is an isometry from the space of convergent sequences in X to the metric space of continuous bounded functions from \tilde{\mathbb{N}} to X.

9. Uniform Continuity

Here we explore increasing strengths of continuity: Lipschitz continuity > uniform continuity > continuity. Ó Searcóid also adds strong contractions into this hierarchy, as the strongest class studied.

Uniform continuity requires the \delta in the epsilon-delta definition of continuity to extend across a whole set. Consider metric spaces (X,d) and (Y,e), a function f : X \to Y, and a metric subspace S \subseteq X. The function f is uniformly continuous on S iff for every \epsilon \in \mathbb{R}^+ there exists a \delta \in \mathbb{R}^+ s.t. for every x, z \in S for which d(z,x) < \delta, it holds that e( f(z), f(x) ) < \epsilon.

If (X,d) is a metric space with the nearest-point property and f : X \to Y is continuous, then f is also uniformly continuous on every bounded subset of X. A good example is a polynomial on \mathbb{R}.

Uniformly continuous functions map compact metric spaces into compact metric spaces. They preserve total boundedness and Cauchy sequences. This isn’t necessarily true for continuous functions, e.g. x \mapsto 1/x on (0,1] does not preserve the Cauchy property of the sequence (1/n).

There is a remarkable relationship between the Cantor Set and uniform continuity. Consider a nonempty metric space (X,d). Then X is totally bounded iff there exists a bijective uniformly continuous function from a subset of the Cantor Set to X. As Ó Searcóid notes, this means that totally bounded metric spaces are quite small, in the sense that none can have cardinality greater than that of the reals.

Consider metric spaces (X,d) and (Y,e) and function f: X \to Y. The function is called Lipschitz with Lipschitz constant k \in \mathbb{R}^+ iff e( f(a), f(b) ) \leq k d(a,b) for all a, b \in X.

Note here the difference to uniform continuity: Lipschitz continuity restricts uniform continuity by describing a relationship that must exist between the \epsilons and \deltas – uniform leaves this open. A nice example from Ó Searcóid of a uniformly continuous non-Lipschitz function is x \mapsto \sqrt{1 - x^2} on [0,1).

Lipschitz functions preserve boundedness, and the Lipschitz property is preserved by function composition.

There is a relationship between Lipschitz functions on the reals and their derivatives. Let I be a non-degenerate interval of \mathbb{R} and f: I \to \mathbb{R} be differentiable. Then f is Lipschitz on I iff f' is bounded on I.
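This can be probed numerically with difference quotients (an illustrative finite sample of mine, not a proof): slopes of sin stay below its derivative bound of 1, while slopes of the non-Lipschitz example x \mapsto \sqrt{1-x^2} blow up as x approaches 1.

```python
import math

def max_slope(f, xs):
    """Largest difference-quotient magnitude over consecutive samples."""
    return max(abs(f(b) - f(a)) / (b - a) for a, b in zip(xs, xs[1:]))

xs = [i / 1000 for i in range(1000)]  # samples in [0, 0.999]

# sin' = cos is bounded by 1, so sin is Lipschitz with constant 1.
assert max_slope(math.sin, xs) <= 1.0

# The derivative of sqrt(1 - x^2) is unbounded as x -> 1, and the
# sampled slopes grow accordingly: not Lipschitz on [0, 1).
assert max_slope(lambda x: math.sqrt(1 - x * x), xs) > 10.0
```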

A function with Lipschitz constant less than one is called a strong contraction.

Unlike the case for continuity, not every product metric gives rise to uniformly continuous natural projections, but this does hold for conserving metrics.

10. Completeness

Let (X,d) be a metric space and u : X \to \mathbb{R}. The function u is called a virtual point iff:

  • u(a) - u(b) \leq d(a,b) \leq u(a) + u(b) for all a,b \in X
  • \text{inf} \; u(X) = 0
  • 0 \notin u(X)

We saw earlier that a metric space X is complete iff it is closed in every metric superspace of X. There are a number of equivalent characterisations, including that every Cauchy sequence in X converges in X.

Consider a metric space (X,d). A subset S \subseteq X is a complete subset of X iff (S,d) is a complete metric space.

If X is a complete metric space and S \subseteq X, then S is complete iff S is closed in X.

Conserving metrics ensure that finite products of complete metric spaces are complete.

A non-empty metric space (X,d) is complete iff (\mathcal{S}(X),h) is complete, where \mathcal{S}(X) denotes the collection of all non-empty closed bounded subsets of X and h denotes the Hausdorff metric.

For X a non-empty set and (Y,e) a metric space, the metric space B(X,Y) of bounded functions from X to Y with the supremum metric is a complete metric space iff Y is complete. An example is that the space of bounded sequences in \mathbb{R} is complete due to completeness of \mathbb{R}.

We can extend uniformly continuous functions from dense subsets to complete spaces to unique uniformly continuous functions from the whole: Consider metric spaces (X,d) and (Y,e) with the latter being complete. Let S \subseteq X be a dense subset of X and f : S \to Y be a uniformly continuous function. Then there exists a uniformly continuous function \tilde{f} : X \to Y such that \tilde{f}|_S = f. There are no other continuous extensions of f to X.

(Banach’s Fixed-Point Theorem). Let (X,d) be a non-empty complete metric space and f : X \to X be a strong contraction on X with Lipschitz constant k \in (0,1). Then f has a unique fixed point in X and, for each w \in X, the sequence (f^n(w)) converges to the fixed point. Beautiful examples of this abound, of course. Ó Searcóid discusses IFS fractals – computer scientists will be familiar with applications in the semantics of programming languages.
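A minimal sketch of the theorem in action (my own toy example), iterating the strong contraction f(x) = x/2 + 1 on \mathbb{R}, which has Lipschitz constant k = 1/2 and unique fixed point 2:

```python
def f(x):
    """A strong contraction on R with Lipschitz constant k = 1/2."""
    return 0.5 * x + 1.0

# From any starting point w, the iterates f^n(w) converge to the
# unique fixed point x* = 2; the error halves at every step.
w = 100.0
for _ in range(60):
    w = f(w)

assert abs(w - 2.0) < 1e-12       # converged to the fixed point
assert abs(f(w) - w) < 1e-12      # and it is (numerically) fixed
```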

A metric space (Y,e) is called a completion of metric space (X,d) iff (Y,e) is complete and (X,d) is isometric to a dense subspace of (Y,e).

We can complete any metric space. Let (X,d) be a metric space. Define \tilde{X} = \delta(X) \cup \text{vp}(X) where \delta(X) denotes the set of all point functions in X and \text{vp}(X) denotes the set of all virtual points in X. We can endow \tilde{X} with the metric s given by (u,v) \mapsto \sup\{ |u(x) - v(x)| | x \in X \}. Then \tilde{X} is a completion of X.

Here the subspace (\delta(X),s) of (\tilde{X},s) forms the subspace isometric to (X,d).

11. Connectedness

A metric space X is a connected metric space iff X cannot be expressed as the union of two disjoint nonempty open subsets of itself. An example is \mathbb{R} with its usual metric. As usual, Ó Searcóid gives a number of equivalent criteria:

  • Every proper nonempty subset of X has nonempty boundary in X
  • No proper nonempty subset of X is both open and closed in X
  • X is not the union of two disjoint nonempty closed subsets of itself
  • Either X = \emptyset or the only continuous functions from X to the discrete space \{0,1\} are the two constant functions

Connectedness is not a property that is relative to any metric superspace. In particular, if X is a metric space, Z is a metric subspace of X and S \subseteq Z, then the subspace S of Z is a connected metric space iff the subspace S of X is a connected metric space. Moreover, for a connected subspace S of X with S \subseteq A \subseteq \bar{S}, the subspace A is connected. In particular, \bar{S} itself is connected.

Every continuous image of a connected metric space is connected. In particular, for nonempty S \subseteq \mathbb{R}, S is connected iff S is an interval. This is a generalisation of the Intermediate Value Theorem (to see this, consider the continuous functions f : X \to \mathbb{R}).

Finite products of connected subsets endowed with a product metric are connected. Unions of chained collections (i.e. sequences of subsets whose sequence neighbours are non-disjoint) of connected subsets are themselves connected.

A connected component U of a metric space X is a subset that is connected and which has no proper superset that is also connected – a kind of maximal connected subset. It turns out that the connected components of a metric space X are mutually disjoint, all closed in X, and X is the union of its connected components.

A path in metric space X is a continuous function f : [0, 1] \to X. (These functions turn out to be uniformly continuous.) This definition allows us to consider a stronger notion of connectedness: a metric space X is pathwise connected iff for each a, b \in X there is a path in X with endpoints a and b. An example given by Ó Searcóid of a space that is connected but not pathwise connected is the closure in \mathbb{R}^2 of \Gamma = \{ (x, \sin (1/x)) | x \in \mathbb{R}^+ \}. From one of the results above, \bar{\Gamma} is connected because \Gamma is connected. But there is no path from, say, (0,0) (which nevertheless is in \bar{\Gamma}) to any point in \Gamma.

Every continuous image of a pathwise connected metric space is itself pathwise connected.

For a linear space, an even stronger notion of connectedness is polygonal connectedness. For a linear space X with subset S and a, b \in S, a polygonal connection from a to b in X is an n-tuple of points (c_1, \ldots c_n) s.t. c_1 = a, c_n = b and for each i \in \{1, 2, \ldots, n-1\}, \{(1 - t)c_i + t c_{i+1} | t \in [0,1] \} \subseteq S. We then say a space is polygonally connected iff there exists a polygonal connection between every two points in the space. Ó Searcóid gives the example of \{ z \in \mathbb{C} | \; |z|= 1 \} as a pathwise connected but not polygonally connected subset of \mathbb{C}.

Although in general these three notions of connectedness are distinct, they coincide for open connected subsets of normed linear spaces.

12. Compactness

Ó Searcóid gives a number of equivalent characterisations of compact non-empty metric spaces X; those I found most interesting and useful for the material that follows include:

  • Every open cover for X has a finite subcover
  • X is complete and totally bounded
  • X is a continuous image of the Cantor set
  • Every real continuous function defined on X is bounded and attains its bounds

The example is given of closed bounded intervals of \mathbb{R} as archetypal compact sets. An interesting observation is given that ‘most’ metric spaces cannot be extended to compact metric spaces, simply because there aren’t many compact metric spaces — as noted above in the section on bounds, there are certainly no more than |\mathbb{R}|, given they’re all images of the Cantor set.

If X is a compact metric space and S \subseteq X then S is compact iff S is closed in X. This follows because S inherits total boundedness from X, and completeness follows also if S is closed.

The Inverse Function Theorem states that for X and Y metric spaces with X compact, and for f : X \to Y injective and continuous, f^{-1}: f(X) \to X is uniformly continuous.

Compactness plays well with intersections, finite unions, and finite products endowed with a product metric. The latter is interesting, given that we noted above that for non-conserving product metrics, total boundedness doesn’t necessarily carry forward.

Things get trickier when dealing with infinite-dimensional spaces. The following statement of the Arzelà-Ascoli Theorem is given, which allows us to characterise the compactness of a closed, bounded subset S of \mathcal{C}(X,Y) for compact metric spaces X and Y:

For each x \in X, define \hat{x}: S \to Y by \hat{x}(f) = f(x) for each f \in S. Let \hat{X} = \{\hat{x} | x \in X \}. Then:

  • \hat{X} \subseteq B(S,Y) and
  • S is compact iff x \mapsto \hat{x} from X to B(S,Y) is continuous

13. Equivalence

Consider a set X and the various metrics we can equip it with. We can define a partial order \succeq on these metrics in the following way: d is topologically stronger than e, written d \succeq e, iff every open subset of (X,e) is open in (X,d). We then get an induced notion of topological equivalence of two metrics, holding when d \succeq e and e \succeq d.

As well as obviously admitting the same open subsets, topologically equivalent metrics admit the same closed subsets, dense subsets, compact subsets, connected subsets, convergent sequences, limits, and continuous functions to/from that set.

It turns out that two metrics are topologically equivalent iff the identity functions from (X,d) to (X,e) and vice versa are both continuous. Following the discussion above relating to continuity, this hints at potentially stronger notions of comparability – and hence of equivalence – of metrics, which indeed exist. In particular, d is uniformly stronger than e iff the identity function from (X,d) to (X,e) is uniformly continuous. Also, d is Lipschitz stronger than e iff the identity function from (X,d) to (X,e) is Lipschitz.

The stronger notion of a uniformly equivalent metric is important because these metrics additionally admit the same Cauchy sequences, totally bounded subsets and complete subsets.

Lipschitz equivalence is even stronger, additionally providing the same bounded subsets and subsets with the nearest-point property.

The various notions of equivalence discussed here collapse to a single one when dealing with norms. For a linear space X, two norms on X are topologically equivalent iff they are Lipschitz equivalent, so we can just refer to norms as being equivalent. All norms on finite-dimensional linear spaces are equivalent.
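For instance, the 1-norm and max-norm on \mathbb{R}^2 satisfy the explicit Lipschitz bounds \|x\|_\infty \leq \|x\|_1 \leq 2\|x\|_\infty, which is exactly what Lipschitz equivalence demands. A quick randomised sanity check of mine:

```python
import random

def norm1(v):
    """The 1-norm: sum of absolute values of the components."""
    return sum(abs(c) for c in v)

def norm_inf(v):
    """The max-norm: largest absolute value of the components."""
    return max(abs(c) for c in v)

random.seed(0)
for _ in range(1000):
    v = (random.uniform(-10, 10), random.uniform(-10, 10))
    # Equivalence constants 1 and 2 for these two norms on R^2:
    assert norm_inf(v) <= norm1(v) <= 2 * norm_inf(v)
```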

Finally, some notes on the more general idea of equivalent metric spaces (rather than equivalent metrics.) Again, these are provided in three flavours:

  • topologically equivalent metric spaces (X,d) and (Y,e) are those for which there exists a continuous bijection with continuous inverse (a homeomorphism) from X to Y.
  • for uniformly equivalent metric spaces, we strengthen the requirement to uniform continuity
  • for Lipschitz equivalent metric spaces, we strengthen the requirement to Lipschitz continuity
  • strongest of all, isometries are discussed above

Note that given the definitions above, the metric space (X,d) is equivalent to the metric space (X,e) if d and e are equivalent, but the converse is not necessarily true. For equivalent metric spaces, we require existence of a function — for equivalent metrics this is required to be the identity.

ResearchED on Curriculum

A colleague recently pointed me to the ResearchED Guide to the Curriculum, a volume of essays edited by Clare Sealy. Following the guidance of Lemov and Badillo's essay in this volume that ‘reading a book should be a writing-intensive experience’, I’ve written down some of my thoughts after reading this book. They come from the perspective of someone who teaches (albeit in higher education rather than in a school) and is a researcher (but not in education). Of course, my background undoubtedly skews my perspective and limits my awareness of much educational theory.

The context here is that schools in England have been very busy over the last couple of years rethinking their curriculum, not least because the new Ofsted school inspection framework places it centre-stage. So now is a good moment to engage with schools over some of the more tricky questions involved.

I found this collection of essays very thought-provoking, and would recommend engaging with it, whether or not you consider yourself to be a fan of the “curriculum revolution” underway in English schools.

Knowledge and Teaching

Many of the contributions relate to a knowledge-based curriculum, but none give a working definition of knowledge. I think it’s useful for educators to reflect on, and leaders to engage with, epistemology at some level. When ideas like a “knowledge-based curriculum” start to be prioritised, then we need to understand the various meanings this might have, and precisely which other ideas these terms are being used in opposition to. Of course this becomes even more important when politics enters the picture: I find it hard to envisage a definition of a knowledge-based curriculum that is broad enough to encompass both Michael Gove’s approach to knowledge in history and Michael F.D. Young's principles espoused in this book. A central problem in the theory of knowledge, of course, is the theory of truth; it’s interesting that Ashbee's essay in the same volume riffs on this theory when asking whether it is right that students should learn something false (the presence of electron shells in an atom) in order to facilitate understanding later on. Again, I think this could do with a more sophisticated analysis of truth and falsity – it is by no means universally accepted that the presence of electron shells can be said to be ‘false’, and I do think the philosophical standpoint on such questions has implications for curriculum design – especially in the sciences.

The same holds for teaching. The role of teaching in imparting knowledge needs to be fully explored. Even if we accept the premise that, to quote Young, ‘schools in a democracy should all be working towards access to powerful knowledge for all their pupils’ (and also leave to one side the definition of a democracy) this leaves open the question of the role of the teacher in providing that access. At one extreme seems to lie the Gradgrindian approach best summarised by Dickens in Hard Times of students as ‘little vessels then and there arranged in order, ready to have imperial gallons of facts poured into them until they were full to the brim’, at the other an unschooling approach. But both can legitimately claim to be pursuing this aim. In the middle of these extremes, the teacher’s role in setting up experiences, and in developing understanding for example through Adey and Shayer’s concept of ‘cognitive conflict’, could be explored more deeply.

It’s interesting that in his essay in this book, Young – described by Sealy as one of the ‘godfathers’ of the knowledge-based curriculum – has plenty to say about problematic ways this concept has been interpreted, in particular that “a school adopting a knowledge-led curriculum can spend too much time on testing whether students have memorised the knowledge of previous years“, and that “a focus on memorisation does not necessarily encourage students to develop a ‘relationship to knowledge’ that leads to new questions.” These concerns echo my own fears, and I see the latter also arise in higher education as students make the leap between undergraduate and postgraduate work.

My own teaching as an academic has spanned the full range from largely chalk-and-talk unidirectional presentations to undergraduate students to fairly laissez-faire mentoring of PhD students through their own discovery of the background material required for their research. It’s interesting to reflect on the different level of resourcing required to follow these models, a topic Young and Aurora both pick up in their essays: the need for a curriculum model that incorporates how teachers might engage with external constraints (resource, externally imposed exam syllabuses, etc.) in the short-term, even as we work towards a better long-term future for our students.

Memorisation is mentioned by several authors, and of course can be important, but – as Young says – it’s also important that students come to view it as “a step to acquiring new knowledge“, not as acquiring new knowledge. So my question to schools is this: how is that desirable student perception reflected in your curriculum? How does your curriculum help students develop that view?

My concerns over some of the ‘knowledge-based’ or ‘knowledge-led’ work in schools in recent years is broadly in line with Young’s view in this volume that teaching viewed as the transmission of knowledge excludes the process by which students develop a relationship with knowledge (my emphasis). I was also pleased by Young’s assertion that schools should treat subjects not just as bodies of knowledge but as communities of teachers and researchers and pupils as neophyte members of such communities. To me, this is wholly consistent with the exciting ideas behind Claxton’s Building Learning Power framework I reviewed some years ago here.

What Do We Want Students to Be Able to Do?

In addition to more traditional answers, Ashbee suggests some that should make schools pause for thought. For example, she suggests that while others may have chosen the curriculum content, students should be equipped to critically evaluate the inclusion of the knowledge in the curriculum they have studied and to ask what else could have been included. I like this idea: a key question for schools, though, is where do our curricula equip students for this task?

One aspect largely absent from this volume is a critical discussion of assessment, including testing, and its role in the curriculum, both in obvious terms of shaping the curriculum in the long- and short-term, and the – perhaps less obvious – nature of forms of assessments in themselves driving students’ behaviour and understanding of the nature of learning.

Planning Lessons

In an essay on Curriculum Coherence, Neil Almond discusses the role of sequencing in a subject curriculum. Almond uses the analogy of a box set, contrasting The Simpsons (minimal ordering required) to Game of Thrones (significant ordering required.)

Three aspects of the timing of lessons are, I think, missing in this discussion and deserve more explicit consideration.

Firstly, any necessary order of discussion of topics is rarely a total (linear) order, to put it mathematically. It’s maybe not even a partial order. Explicit dependencies between topic areas have been explored in depth by Cambridge Mathematics, who have built a directed graph representation of a curriculum. The lack of totality of the order provides significant freedom in practice; it is less clear what best practice might be, as a department or teacher, of how to take advantage of this freedom. It’s also less clear when to take advantage of this freedom: should this be a department-level once-and-for-all decision, one delegated to teachers to determine on the fly, or something in between? And why?

Secondly, even once this freedom has been exploited by mapping the dependence structure of concepts into a total order, there still remains the question of mapping from that order into time. Again, there is flexibility: should this unit take one week or two, or a whole term, and – importantly – curricula need to consider who should exercise this flexibility, when and why. Within mathematics, this has come under a lot of scrutiny in recent years, through discussions around the various definitions of mastery.

Finally, one aspect that the box set analogy obscures is the extent to which the sequencing of lessons is to be co-created with the students. Simpsons and Game of Thrones writers don’t have the option to co-create sequencing on the fly with their audience – schools and universities do. To what extent should this freedom be utilised, by whom, when, and to what end?

Linking Universities with Schools and Secondaries with Primaries

Ashbee discusses the very interesting question of how school curricula can engage with the mechanisms for knowledge generation in the broader discipline. For example, experiment, peer review, art exhibitions, all help reflect the norms of the discipline’s cutting edge back into the school curriculum. This is why I was sad, recently, to see Ofqual consulting on the removal of student-conducted experiments from 2021 GCSE science examinations, to give teachers time to cram in more facts: ‘science’ without experiment is not science.

Since one of the key venues for knowledge generation is the academy, increasing interaction between schools and universities should be very productive at the moment, with schools thinking hard about curriculum fundamentals. I am pleased to have played a very small part in the engagement of my university, both through my own outreach and through discussions leading up to our recent announcement of the opening of a new Imperial Maths School. More of all this, please, universities!

The theme of linking phases of education also appears in Andrew Percival's case study of primary curriculum development, where he emphasises the benefit his primary school obtained through teachers joining subject associations, e.g. in Design and Technology, making links with secondary specialists, and introducing self-directed study time for primary teaching staff to develop their subject knowledge. Those of us in all sectors should seek out links through joint professional associations.

Lessons for Leaders

Christine Counsell's essay tackles the topic of how school senior leadership teams (SLTs) should engage with developing and monitoring their departments under a knowledge-based curriculum. The main issue here is in the secondary sector, as SLTs will not include all subject specialisms. My colleague probably had this essay in mind when pointing out this edited volume, as many of the lessons and ideas here apply equally well to school governors engaging with subject leaders. I would agree with this. But actually, I would go further and say that many of Counsell's suggestions for SLTs actually echo previous best practice in governance from before the new national curriculum. In meetings between subject leaders and school governors, governors have always played the role of knowledgeable outsider, whose aim is to guide a subject leader in conversation to reflect on their role in the development of the teaching of their subject and its norms. It’s quite interesting to see this convergence. I was also struck by Counsell's insistence on the importance of discussing curriculum content with middle leaders rather than relying on proxies such as attainment results alone, which can actually act to conceal the curriculum; in my day job this mirrors the importance we try to give in staff appraisal to discussing the research discoveries of academic staff, not focusing on the number or venue of publications. I think many of the arguments made are transferable between schools and universities. Counsell also identifies positive and negative roles played by SLTs in developing a positive culture in middle leadership, and hence provides some useful material around which governance questions can be posed to probe SLTs themselves.