Easter Coq

This Easter I set myself a little challenge to learn a little bit of Coq – enough to construct a proof of a simple but useful theorem in computer arithmetic. Long-time readers of this blog will know that this is not my first outing with dependent types, though I’ve never used them in anger. Four years ago – also during the Easter break! – I read Stump‘s book on Agda and spent some time playing with proofs and programming, as I documented here.

This blog post documents some of the interesting things in Coq I observed over the last few days. I’ve decided to write the majority of this post in Coq itself, below, before finishing off with some concluding remarks. In this way, anyone really interested can step through the definitions and proofs themselves.

 * A first datapath identity
 * George A. Constantinides, 2/4/21
 * This is an attempt to learn some basic Coq by proving a standard identity used in computer arithmetic,
 * namely \bar{x} + 1 = \bar{x – 1}. 
 * This identity is useful because it allows Boolean operations to move through arithmetic operations.
 * The code is for learning and teaching purposes only. It is not intended to be an efficient or elegant
 * approach, nor is it intended to make best use of existing Coq libraries. On the contrary, I have often used
 * many steps when one would do, so we can step through execution and see how it works.

Require Import Coq.Program.Equality.
Require Import Coq.Logic.Eqdep_dec.
Require Import Coq.Arith.Peano_dec.

(* Create my own bitvector type. It has a length, passed as a nat, and consists of bools. *)
Inductive bv : nat -> Set :=
| nilbv : bv 0
| consbv : forall n : nat, bool -> bv n -> bv (S n).

(* Head and tail of a bitvector, with implicit length arguments *)
Definition hdi {n : nat} (xs : bv (S n)) :=
  match xs with
  | consbv _ head _ => head

Definition tli {n : nat} (xs : bv (S n)) :=
  match xs with
  | consbv _ _ tail => tail

(* The basic carry and sum functions of a Boolean full adder *)
Definition carryfunc (a : bool) (b: bool) (c: bool) : bool :=
  orb (orb (andb a b) (andb a c)) (andb b c).

Definition sumfunc (a : bool) (b : bool) (c : bool) : bool :=
  xorb (xorb a b) c.

 * A ripple carry adder, with implicit length argument
 * Note that this definition makes use of a trick known as the ‘convoy pattern’ [1]
 * to get the dependent typing to work in a match clause. We use a ‘return’ clause
 * to make the type of the match result be a function which is then applied to the unmatched
 * argument. In this way the type system can understand that x and y have the same dependent type.
 * Note also the use of Fixpoint for a recursive definition.

Fixpoint rcai {n : nat} (x : bv n) (y : bv n) (cin : bool) : (bool * bv n) :=
  match x in bv n return bv n -> ( bool * bv n ) with
  | nilbv => fun _ => (cin, nilbv) (* an empty adder passes its carry in to its carry out *)
  | consbv n1 xh xt => fun y1 =>
       let (cout, sumout) := rcai xt (tli y1) (carryfunc cin xh (hdi y1)) in
                        (cout, consbv n1 (sumfunc cin xh (hdi y1)) sumout)
  end y.

(* We define addition modulo 2^n by throwing away the carry out, using snd, and then define an infix operator *)
Definition moduloadder {n : nat} (x : bv n) (y : bv n) : (bv n) :=
  snd (rcai x y false).

Infix “+” := moduloadder.

(* Bitwise negation of a word *)
Fixpoint neg {n : nat} (x : bv n) : (bv n) :=
  match x with
  | nilbv => nilbv
  | consbv n1 xh xt => consbv n1 (negb xh) (neg xt)

(* The word-level constant zero made of n zeros *)
Fixpoint bvzero {n : nat} : (bv n) :=
  match n with
  | O => nilbv
  | (S n1) => consbv n1 false bvzero

(* The word-level constant one with n leading zeros *)
Definition bvone {n : nat} :=
  consbv n true bvzero.

(* Additive inverse of a word, defined as ‘negate all the bits and add one’  *)
Definition addinv {n : nat} (x : bv (S n)) : (bv (S n)) :=
  neg(x) + bvone.

(* Subtraction modulo 2^n is defined as addition with the additive inverse and given its own infix operator *)
Definition modulosub {n : nat} (x : bv (S n)) (y : bv (S n)) :=
  x + (addinv y).

Infix “-” := modulosub.

(* a bit vector of just ones *)
Fixpoint ones {n : nat} : (bv n) :=
  match n with
  | O => nilbv
  | S n1 => consbv n1 true ones

(* OK, now we have some definitions, let’s prove some theorems! *)

(* Our first lemma (‘Lemma’ versus ‘Theorem’ has no language significance in Coq) says that inverting a
 * bitvector of ones gives us a bitvector of zeros.
 * There’s a couple of interesting points to note even in this simple proof by induction:
 * 1. I had to use ‘dependent destruction’,
 *    which is defined in the Coq.Program.Equality library, to get the destruction of variable x to take into account
 *    the length of the bitvector.
 * 2. The second use of inversion here didn’t get me what I wanted / expected, again due to dependent typing, for
 *    reasons I found explained in [2]. The solution was to use a theorem inj_pair_eq_dec, defined in 
 *    Coq.Logic.Eqdep_dec. This left me needing to prove that equality on the naturals is decidable. Thankfully,
 *    Coq.Arith.Peano_dec has done that.

Lemma invertzeros : forall {n : nat} (x : bv n),
  x = bvzero -> neg x = ones.
  intros n x H.
  induction n.
  dependent destruction x.
  auto. (* base case proved *)
  dependent destruction x.
  simpl bvzero in H.

  inversion H.

  simpl bvzero in H.
  inversion H. (* inversion with dependent type starts here…          *)
  apply inj_pair2_eq_dec in H2. (* goes via this theorem                                 *)
  2: apply eq_nat_dec. (* and completes via a proof of decidability of equality *)

  apply IHn.
  apply H2.

 * The next lemma says that if you fix one input to a ripple carry adder to zero and feed in the carry-in as zero
 * too, then the carry out will not be asserted and the sum will just equal the fixed input.
 * I proved this by induction, reasoning by case on the possible Boolean values of the LSB.
 * The wrinkle to notice here is that I didn’t know how to deal with a ‘let’ clause, but thanks to Yann Herklotz
 * (https://yannherklotz.com) who came to my aid by explaining that a ‘let’ is syntactic sugar for a match.

Lemma rcai_zero: forall (n : nat) (x : bv n),
  rcai x bvzero false = (false, x).
  intros n x.
  induction n.
  dependent destruction x.
  auto. (* base case proved *)
  dependent destruction x.
  simpl bvzero.
  simpl rcai.
  destruct b.
  unfold sumfunc. simpl.
  unfold carryfunc. simpl.

  destruct (rcai x bvzero false) eqn: H.

  rewrite IHn in H.
  inversion H.

  rewrite IHn in H.
  inversion H.

  unfold sumfunc. simpl.
  unfold carryfunc. simpl.

  destruct (rcai x bvzero false) eqn: H. (* The trick Yann taught me *)

  rewrite IHn in H.
  inversion H.

  rewrite IHn in H.
  inversion H.

 * The next lemma proves that -1 is a vector of ones
 * One thing to note here is that I needed to explicitly supply the implicit argument n to addinv using @.

Lemma allones: forall {n : nat}, @addinv n bvone = ones.
  intros n.
  induction n.
  auto. (* base case proved *)

  unfold bvone.
  unfold addinv.
  unfold bvone.
  unfold “+”.


  unfold carryfunc.
  unfold sumfunc.

  destruct (rcai (neg bvzero) bvzero false) eqn: H.


  rewrite rcai_zero in H.
  inversion H.

  apply invertzeros.

 * This lemma captures the fact that one way you can add one to a bitvector using a ripple carry adder is
 * to add zero and assert the carry in port. 

Lemma increment_with_carry : forall (n : nat) (x : bv (S n)),
  x + bvone = snd (rcai x bvzero true).
  intros n x.
  dependent destruction x.

  (* first peel off the LSB from the two operands *)

  simpl bvzero.
  simpl rcai.

  unfold bvone.
  unfold “+”.
  simpl rcai.

  (* now case split by the LSB of x to show the same thing *)

  destruct b.

  unfold carryfunc.
  unfold sumfunc.

  unfold carryfunc.
  unfold sumfunc.

(* This lemma says that if you add a vector of ones to a value x using a ripple carry adder, while asserting the
 * carry in port, then the sum result will just be x. Of course this is because -1 + 1 = 0, though I didn’t prove
 * it that way.
 * A neat trick I found to use in this proof is to use the tactic ‘apply (f_equal snd)’ on one of the hypotheses
 * in order to isolate the sum component in the tuple produced by the ripple carry function rcai.

Lemma rcai_ones_cin_identity : forall (n : nat) (x : bv n),
  snd (rcai x ones true) = x.
  intros n x.
  induction n.
  dependent destruction x.
  dependent destruction x.
  simpl ones.
  simpl rcai.

  (* case analysis *)
  destruct b.
  unfold carryfunc.
  unfold sumfunc.
  destruct (rcai x ones true) eqn: H.
  apply (f_equal snd) in H. (* a neat trick *)
  simpl in H.
  rewrite IHn in H.

  unfold carryfunc.
  unfold sumfunc.
  destruct (rcai x ones true) eqn: H.
  apply (f_equal snd) in H.
  simpl in H.
  rewrite IHn in H.

 * This lemma is actually the main content of what we’re trying to prove, just not wrapped up in
 * very readable form yet.
 * Note the use of ‘rewrite <-‘ to use an existing lemma to rewrite a term from the RHS of the equality
 * in the lemma to the LHS. Without the ‘<-‘ it would do it the other way round.

Lemma main_helper : forall (n : nat) (x : bv (S n)),
  neg (x + ones) = neg x + bvone.
  intros n x.
  induction n.
  dependent destruction x.
  destruct b.
  dependent destruction x.
  dependent destruction x.
  auto. (* base case proved *)

  dependent destruction x.
  unfold bvone.
  unfold “+”.
  simpl rcai.

  destruct b.
  unfold carryfunc.
  unfold sumfunc.

  rewrite rcai_zero.

  destruct (rcai x (consbv n true ones) true) eqn: H.
  simpl neg.
  simpl snd.

  apply (f_equal snd) in H.
  simpl snd in H.
  rewrite rcai_ones_cin_identity in H.

  unfold carryfunc.
  unfold sumfunc.

  destruct (rcai (neg x) (consbv n false bvzero)) eqn: H.
  apply (f_equal snd) in H.
  simpl snd in H.

  rewrite <- increment_with_carry in H.

  simpl snd.

  destruct (rcai x (consbv n true ones) false) eqn: H1.
  simpl snd.
  simpl neg.

  apply (f_equal snd) in H1.
  simpl snd in H1.

  rewrite <- H1.
  rewrite <- H.

  apply IHn.

Theorem main_theorem: forall (n : nat) (x : bv (S n)),
  neg x + bvone = neg (xbvone).
  intros n x.
  unfold “-“.
  rewrite allones.
  rewrite <- main_helper.

 * References
 * [1] http://adam.chlipala.net/cpdt/html/MoreDep.html
 * [2] https://stackoverflow.com/questions/24720137/inversion-produces-unexpected-existt-in-coq&nbsp;

Some Lessons

So what have I learned from this experience, beyond a little bit of Coq? Firstly, it was fun. It was a nice way to spend a couple of days of my Easter holiday. I am not sure I would want to do it under time pressure, though, as it was also frustrating at times. If I ever wanted to use Coq in anger for my work, I would want to take a couple of months – or more – to really spend time with it.

On the positive side, Coq really forced me to think about foundations. What do I actually mean when I write \overline{x} + 1 = \overline{x - 1}? Should I be thinking in {\mathbb Z}, in {\mathbb Z}/n\mathbb{Z}, or in digits, and when? How should bitvector arithmetic behave on zero-sized bitvectors? (Oh, and I certainly did not expect to be digging out a proof of decidability of natural equality from Coq’s standard library to prove this theorem!) The negative side is the same: Coq really forced me to think about foundations. And I remain to be convinced that I want to do that when I’m not on Easter holiday and in a philosophical mood.

I loved the type system and the expression of theorems. I’m luke warm about the proof process. At least the way I wrote the proofs – which was probably intolerably amateur – it felt like someone could come along and change the tactics at some point and my proof would be broken. Maybe this is not true, but this is what it felt like. This was a different feeling to that I remember when playing with Agda four years ago, which felt like everything needed to be explicit but somehow felt more nailed down and permanent. In Agda, the proofs are written in the same language as the types and I enjoyed that, too. Both languages are based on dependent types, and so as – I understand – is Lean. My colleague Kevin Buzzard is a strong advocate of Lean. Perhaps that’s one for another Easter holiday!

Thinking about this proof from a hardware perspective – designing efficient bit-parallel arithmetic hardware – it is clear that we do not need to have proved the theorem for all n. Each bit slice occupies silicon area, and as this is a finite resource, it would be sufficient to have one proof for each feasible value of n. Of course, this makes things much easier to prove, even if it comes with much more baggage. I can fire up an SMT solver and prove the theorem completely automatically for a specific value of n. As an example, if you paste the code below into the Z3 prover (hosted at rise4fun), the solver will report unsat, i.e. there is provably no satisfying value of the variable x violating the theorem for n = 4.

(declare-fun x () (_ BitVec 4))
(assert (not (= (bvadd (bvneg x) #x1) (bvneg (bvadd x #xF)))))

There are pluses and minuses to this. On the plus side, the SMT query is fast and automatic. On the minus side, in addition to only being valid for n = 4, it gives me – and perhaps some future AI – none of the intuition as to why this theorem holds. When I read mathematics, the proofs are not incidental, they are core to the understanding of what I’m reading.

Will this also be true for future AI-driven EDA tools?


In case this is useful to anyone (or to me in the future): I got syntax highlighting playing well for Coq with WordPress.com by using coqdoc to generate HTML and CSS, then hacking at the CSS so that it didn’t affect the rest of my WordPress theme, pasting it into the WordPress.com CSS customiser, and putting the generated HTML in a WordPress.com HTML block. Take care to avoid the CSS class .comment, used by coqdoc for code comments but also used by WordPress for blog post comment formatting!

Thanks again to Yann Herklotz for help understanding let bindings in Coq.

False Negatives and False Positives

Yesterday, I posted some comments on Twitter regarding the press focus on lateral flow test (LFT) false positives on the eve of the return of most students to school in England. It seems that I should probably have written a short blog post about this rather than squeezing it into a few tweets, given the number of questions I’ve had about the post since then. This is my attempt.

The press seem to be focusing on the number of false positives we are likely to see on return to school and the unnecessarily lost learning this may cause. My view is that given the current high case numbers and slow rate of decline, this is not the primary issue we should be worried about. Primarily, we should be worried about the false negative rate of these tests. My concern is that the small number of true positives caught by these tests may have less impact on reducing the rate of infection than behavioural relaxation induced by these tests has on increasing the rate of infection. Time will tell, of course.

Let me explain some of the data that has been released, the conclusions I draw from it, and why false positive rate is important for these conclusions regarding false negative rates.

For any given secondary-age pupil, if given both a LFT and a PCR test, excluding void tests there are four possible outcomes: LFT+ & PCR+ (LFT positive & PCR positive), LFT+ & PCR-, LFT- & PCR+ and LFT- & PCR-. We can imagine these in a table of probabilities, as below.

PCR+??0.34% to 1.45%

Here the total LFT+ figure of 0.19% comes from the SchoolsWeek article for secondary school students based on Test and Trace data for late February, while the total PCR+ figure is the confidence interval provided in the REACT-1 9b data from, which says “In the latter half of round 9 (9b), prevalence varied from 0.21% (0.14%, 0.31%) in those aged 65 and over to 0.71% (0.34%, 1.45%) in those aged 13 to 17 years.” Note that REACT-1 9b ran over almost the same period as the Test and Trace data on which the SchoolsWeek article is based.

What I think we all really want to know is what is the probability that a lateral flow test would give me a negative result when a PCR test would give me a positive result? We cannot get this information directly from the table, but we can start to fill in some of the question marks. Clearly the total LFT- probability will be 100% – 0.19% = 99.81%, and the total PCR- probability will be 98.55% to 99.66%. What about the four remaining question marks?

Let’s consider the best case specificity of these tests: that all the 0.19% of LFT+ detected were true positives, i.e. the tests were producing no false positives at all. In that case, the table would look something like this:

PCR+~0.19%~0.15% to ~1.26%0.34% to 1.45%
PCR-0.00%98.55% to 99.66%98.55% to 99.66%

Under these circumstances, we can see from the table that the LFTs are picking up 0.19% out of 0.34% to 1.45%, so we can estimate that the most they’re picking up is 0.19/0.34 = 56% of the true positive cases.

However, this assumed no false positives at all, which is a highly unrealistic assumption. What if we consider a more realistic assumption on false positives? The well-cited Oxford study gives a confidence interval for self-trained operatives of 0.24% to 0.60% false positives. Note that the lower end of this confidence interval would suggest that we should see at least 0.24% x 98.55% = ~0.24% of positive LFTs just from false positives alone. This is a higher value than the LFT positive rate we saw over this period of 0.19% (as noted by Deeks here). So this means it’s also entirely feasible that none of the LFT+ results were true positives, i.e. the results table could look more like this:

PCR+0%0.34% to 1.45%0.34% to 1.45%
PCR-~0.19%~98.36% to ~99.47%98.55% to 99.66%

Now this time round, we can see that the tests are picking up 0% out of 0.34% to 1.45%, so we can estimate that they’re picking up 0% of the true positive cases (i.e. 100% false negatives).

This is why I think we do need to have a conversation about false positives. Not because of a few days of missed school, as reported in the press, but because hiding behind these numbers may be a more significant issue of a much higher false negative rate than we thought, leading to higher infections in schools as people relax after receiving negative lateral flow tests.

Perhaps most importantly, I think the Government needs to commit to following the recommendations of the Royal Statistical Society, which would enable us to get to the bottom of exactly what is going on here with false positive and false negative rates.

(Note that I have assumed throughout this post that LFTs are being used as a ‘quick and easy’ substitute for PCRs, so that the ideal LFT outcome is to mirror a PCR outcome. I am aware that there are those who do not think this is the case, and I am not a medical expert so will not pass further comment on this issue.)

GCSEs and A-Levels in 2021: Current Plans

This week, a flurry of documents were released by Ofqual and the Department for Education in response to the consultation over what is to replace GCSEs and A-Levels in 2021 I blogged about previously.

In this post, I examine these documents to draw out the main lessons and evaluate them against my own submission to the consultation, which was based on my earlier post. The sources of information I draw on for this post are:

  1. A letter (‘Direction’) from the Secretary of State for Education to Ofqual sent to Ofqual on the 23rd February (and published on the 25th).
  2. Guidance on awarding qualifications in Summer 2021 published on the 25th February.
  3. Ofqual’s new Consultation on the general qualifications alternative awarding framework published on the 25th February, and to close on 11th March.

In my post on 16th January, I concluded that the initial proposals were complex and ill-defined, with scope to produce considerable workload for the education sector while still delivering a lack of comparability. The announcements this week from the Secretary of State and Ofqual have not helped allay my fears.

Curriculum Coverage

The decision has been made by Government that “teachers’ judgements this year should only be made on the content areas that have been taught.” However, the direction from the Secretary of State also insists that “teachers should assess as much course content as possible to ensure in the teachers’ judgement that there has been sufficient coverage of the curriculum to enable progression to further education, training, or employment, where relevant.”

Presumably, these two pieces of information are supposed to be combined to form a grade, or at least the latter modulates the decision over whether a grade is awardable. However, in what way this should happen is totally opaque. Let’s assume we have Student A who has only covered a small part of the GCSE maths curriculum, in the judgement of teachers ‘insufficient to enable progression to further education’ (In what? A-Level maths, or something else? Surely the judgement may depend on this.) However, Student A is ‘performing at’ a high grade (say Grade 8) in that small part of the curriculum. What do they get? A Grade 8? Lower than a Grade 8? No grade? How well will they do compared to Student B who has broad curriculum coverage, but little depth?

The issue of incomplete curriculum coverage has not nearly been addressed by these proposals.

The Return of Criterion Referencing

Several of us made the point in our submissions to the consultation that GCSE grades are norm-referenced, not criterion-referenced (with the exception of Grade 8, 5 and 2, as per the Grade Descriptors). As a result, if national comparability of exams is to be removed, then solid grade descriptors would need to be produced. Several of us suggested that the existence of so many GCSE grades, with such a low level of repeatability between assessors (see, e.g. Dennis Sherwood’s articles on this topic), suggests that grades should be thinned out this year. It seems that the Government agrees with this principle, as they have directed Ofqual to produce ‘grade descriptors for at least alternate grades’. This is good in as far as it goes, but if grade descriptors for only alternate grades are produced – quite rightly – then surely only alternate grades should be awarded!

In addition, if grade descriptors are to be useable no matter what narrow range of the curriculum has been covered, they will necessarily be very broad in nature, further narrowing the precision with which such judgements can be made. I await exemplar grade descriptors with some trepidation.

As I pointed out in my submission, calling 2021 and 2020 results GCSEs and A-Levels just misleadingly suggests comparability to previous years – better to wipe the slate clean and call them something else.

Fairness to Students: Lack of Comparability

One of my main concerns in the original proposal – not addressed in the final outcome – is that of fairness between centres and even between individual pupils in a centre. Centres will be allowed to use work conducted under a wide variety of circumstances, and while they are supposed to sign off that they are ‘confident’ that this work was that of the student without ‘inappropriate’ levels of support, I am not clear how they are supposed to gain that confidence for work conducted during lockdown, for example – which is explicitly allowed. There are many opportunities for unfairness here even within a centre.

Now if we bring in different centres having covered different parts of the curriculum and using different methodologies for quality assurance, the scope for unfairness is dramatic. The problem for students is that such unfairness will be much harder to identify come Summer compared to 2020, when ‘mutant algorithms’ could be blamed.

It seems odd to me that centres are asked to “use consistent sources of evidence for a class or cohort” yet there does not appear to be any reasonable attempt to maintain that consistency between cohorts at different centres. Exam boards will be asked to sample some subjects in some centres to check grades, but given the points I raise above over curriculum coverage it is very unclear how it will be possible to reach the conclusion that the incorrect grade has been awarded in all but the most extreme cases.

The guarantee of an Autumn exam series (as in 2020) is, however, to be welcomed.

Kicking the Can to Exam Boards

It is now the job of exam boards to come up with “a list of those sources of and approaches to collecting evidence that are considered most effective in determining grades” by the end of March. Good luck to them.

Exam boards must also undertake checks of all centres’ internal quality assurance processes before grades are submitted to them on the 18th of June. It is unclear what these checks will entail – I find this hard to imagine. Schools: please do share your experience of this process with me.

Teacher Workload

One of the big concerns I had in my original consultation response was over the increase in teacher workload. One aspect of this has been addressed: unlike in the draft proposals, teachers will no longer be responsible for appeals. However, there is still a very considerable additional workload involved in: getting to grips with the assessment materials released by the exam boards, developing an in-house quality assurance system, getting that agreed by the exam board, making an assessment of students, showing the students the evidence on which this assessment is based (this is a requirement), and submitting the grades, all over the window 1st April 2021 – 18th June 2021. I asked in my consultation response whether additional funding will be made available for schools, e.g. to provide cover for the release time required for this work. No answer has been forthcoming.

The development of an in-house quality assurance system is non trivial. The proposed GQAA framework imposes the requirements for such a system to have:

  • ‘a set policy on its approach to making judgements in relation to each Teacher Assessed Grade, including how Additional Assessment Materials and any other evidence will be used,’
  • ‘internal arrangements to standardise the judgements made in respect of the Centre’s Learners and a process for internal sign-off of each Teacher Assessed Grade,’
  • ‘a comparison of the Teacher Assessed Grades to results for previous cohorts at the Centre taking the same qualification to provide a high-level cross-check to ensure that Teacher Assessed Grades overall are not overly lenient or harsh compared to results in previous years,’
  • ‘specific support for newly qualified Teachers and Teachers less familiar with assessment, and
  • ‘a declaration by the head of Centre.’

The third bullet point seems logically impossible to achieve. Results this year will not be comparable to previous years as they will be using a different system based on different evidence. So there appears to be no way to check whether TAGs (not CAGs this year!) are comparable to those in previous years.

Private Candidates

Private candidates got a really bad deal last year. This year we are told that private candidates “should be assessed in a similar way to other students”, but that this will be “using an adapted range of evidence”. I’m not completely convinced that these two statements are logically consistent. It will be interesting to hear the experience of private candidates this year.

What about 2022?

It is unfortunate that schools will be left with such a narrow window of time, so we must start to think about 2022 right now. However, I note that in the current Consultation on the General Qualifications Alternative Awarding Framework, there is scope for whatever is decided now to bleed into 2022: ”We have not proposed a specific end date for the framework because it is possible some measures will be required for longer than others. Instead we propose that the GQAA Framework will apply until we publish a notice setting an end date.”

All the more reason to get it right now.

GCSEs and A-Levels in 2021

I have collected my initial thoughts after reading the Ofqual consultation, released on the 15th January 2021, over GCSE and A-Level replacements for this year. Alongside many others, I submitted proposals for 2020 which I felt would have avoided some of the worst outcomes we saw in Summer last year. My hope is that, this year, some of the suggestions will be given greater weight.

The basic principle underlying the Ofqual consultation is that teachers will be asked to grade students, that they can use a range of different evidence sources to do so, and that exam boards will be asked to produce mini tests / exams as one such source of evidence. This is not unlike the approach used in Key Stage 1 assessments (“SATs”) in primary schools in recent years. The actual process to be used to come up with a summary grade based on various sources of information is not being consulted over now, and it appears this will come from exam boards in guidance issued to teachers at some undetermined time in the future. This is a significant concern, as the devil really will be in the detail.

Overall, I am concerned that the proposed process is complex and ill-defined. There is scope to produce considerable workload for the education sector while still delivering a lack of comparability between centres / schools. I outline my concerns in more detail below.

Exam Board Papers – What are They For?

Ofqual is proposing that exam boards provide teachers papers (‘mini exams’) to “support consistency within and between schools and colleges” and that they “could also help with appeals”. However, it is very unclear how these papers will achieve these objectives. Papers might be sat at school or at home (p.18), they might be under supervision and they might not. Teachers might be asked to ‘remotely supervise’ these tests (p.18). These choices could vary on a per-pupil basis. The taking of tests may even be optional, and certainly teachers “should have some choice” over which questions are answered by their students. Grades will not be determined by these papers, so at best they will form one piece of evidence. If consistency is challenged, will the grades on these papers (when combined in some, as yet undetermined way) overrule other sources of information? This could be a cause of some confusion and needs significant clarity. The scope for lack of comparability of results between centres is significant when placing undue weight on these papers, and I am left wondering whether the additional workload for teachers and exam boards required to implement this proposal is really worth it.

If tests are to be taken (there are good reasons to suggest that they may be bad idea in their currently-envisaged form – see below), then I agree with Ofqual that – in principle – the ideal place to take them is in school (p.18). However, it is absolutely essential that school leaders do not end up feeling pressured to open to all students in an unsafe environment, due to the need for these tests. This is a basic principle, and I would resist any move to place further pressure on school leaders to fully open their schools until it is safe to do so.

Quality Assurance by Exam Boards

The main mechanism being proposed to ensure comparability and fairness between two centres / schools is random sampling (p.20-21). The exam board will sample the evidence base of a particular school for a particular subject, and query this with the school if they feel there is inadequate evidence to support the grades (it is not clear in the consultation whether the sampling will be of all pupils or individual pupils at that centre). This is a reasonable methodology for that particular subject / centre / student, but there is a major piece of information missing to enable judgement of whether this is sufficient for quality assurance of the system as a whole: what proportion of student grades will be sampled in this way? My concern is that the resources available to exam boards will be too small for this to be a large enough sample and that therefore the vast majority of grades awarded will be effectively unmoderated. This approach appears to be motivated by avoiding the bungled attempt at algorithmic moderation proposed in 2020, but without adequate resourcing, comparability between centres is not guaranteed to be better than it was under the abandoned 2020 scheme, and may even be worse.

Moreover, the bar for changing school grades appears to be set very high: “where robust investigation indicates that guidance has not been followed, or malpractice is found” (p.21), so I suspect we are heading towards a system of largely unmoderated centre-assessed grades. In 2020, centres were not aware at the point of returning CAGs that these would end up being given in unmoderated form, and therefore many centres appear to have been cautious when awarding high grades. Will this still be the case in 2021?

Curriculum Coverage

It is acknowledged throughout the consultation that centres / schools will have been unable to cover the entire curriculum in many cases. There appear to be two distinct issues to be dealt with here:

A. How to assess a subject with incomplete coverage

There are many ways this could be done. For the sake of argument, consider this question in the simplest setting of an exam. Here, the most direct approach would be simply to assess the entire curriculum, acknowledging that many more students would be unable to answer all questions this year, but re-adjusting grade boundaries to compensate. This may not be the best approach for student wellbeing, however, and in any case the proposal to use non-controlled assessment methods opens up much more flexibility. My concern is that flexibility almost always comes at the cost of comparability.

Ofqual are proposing that teachers have the ability to differentially weight different forms of assessment (e.g. practicals in the sciences). Is is unclear in the consultation whether this is on a per-student or on a per-centre basis – either brings challenges to fairness and transparency, and this point needs to be clarified quite urgently. They are also effectively proposing that teachers can give zero weight to some elements of the curriculum by choosing not to set / use assessments based on these elements. It is as yet undecided whether past work and tests can be used, or whether only work from now on – once students are aware it can be used for these purposes. It is opaque in the consultation how they are proposing to combine these various partial assessments. One approach I would not like to see is a weighted average of the various pieces of evidence available. A more robust approach, and one which may overcome some objections to using prior work, may be to allow teachers to select a number of the highest-graded pieces of work produced to date – a ‘curated portfolio’ approach. This may mitigate against both incomplete curriculum coverage and different student attitudes to summatively-assessed work versus standard class / homework.

B. How to ensure fairness

The consultation acknowledges that students in different parts of the country may have covered different amounts of the curriculum, due to local COVID restrictions. There is an unavoidable tension, therefore, between ‘assessment as a measure of what you can do’ and ‘assessment as a measure of what you can do, under the circumstances you were in’. This tension will not go away, and the Government needs to pick an option as a political decision. Some forms of assessment may mitigate this problem, to a degree, such as the ‘curated portfolio’ proposal made above, but none will solve it.


It is proposed that students are able to appeal to the exam board only ‘on the grounds that the school or college had not acted in line with the exam board’s procedural requirements’ (p.23). I am rather unclear how students are supposed to obtain information over the procedures followed at the school / college, so this sets a very high bar for appeals to the board. Meanwhile, the procedure for appeal to the school (p.23) appears to have a very low bar, and thus could potentially involve a significant extra workload for school staff. There is some suggestion that schools could be allowed to engage staff from other schools to handle marking appeals. If adequately financially resourced, Ofqual may wish to make this mandatory, to avoid conflicts of interest.

It is unclear in the consultation whether students will be able to appeal on the basis of an unfair weighting being applied to different elements of the curriculum (p.14). This could add an additional layer of complexity.

Grade Boundary Cliff-Edges

Grade boundaries have always been problematic. Can we really say that a student one mark either side of a Grade A boundary is that different in attainment? Last year, a bungled attempt was made to address this concern by requiring submission of student rankings within grade boundaries. Centre-Assessed Grades (CAGs) last year were optimistic, but this should come as no surprise – given a candidate I believe has a 50/50 chance of either getting an A or a B, why on earth would I choose a B? This issue will persist under the proposals for 2021, and I believe may be amplified by the knowledge that an algorithmic standardisation process will not be used. I suspect we may see even more complaints about ‘grade inflation’ in 2021, with significant knock-on effects for university admissions and funding. The root cause of this problem appears to be the aim to maintain the illusion of comparability between years for GCSE and A-Level results.


There are very significant workload implications for teachers, for school leaders, and for exam boards in these proposals – far more so than in 2020 arrangements. This workload has explicitly not yet been quantified in the consultation. I believe it needs to be quantified and funded: centres should receive additional funding to support this work, and teachers need to be guaranteed additional non-contact time to undertake the considerable additional work being requested of them.

Private Candidates

Private candidates, such as home-educated students, got a very poor deal last year. This must not be repeated, especially since many of the students who would be taking GCSEs and A-Levels this year are exactly the same home-educated students who decided to postpone for one year as a result of the changes last year. I am concerned to ensure comparability of outcomes between private candidates and centre-based candidates, and I am worried that two of the four proposed mechanisms for private candidates essentially propose a completely different form of qualification for these candidates.

Are 2021 (and 2020) qualifications actually GCSEs and A-Levels?

By labelling the qualifications of 2021 as GCSEs / A-Levels, rather than giving them a different title, there is an implicit statement of comparability between grades awarded in 2021 and those in previous years, which is rather questionable. Others made the point that in 2020 it may have been better to label these qualifications differently – the same argument applies in 2021. Even Ofqual implicitly make this point (p.27) when presenting the argument against overseas candidates taking exams as normal “might give rise to comments that there were 2 types of grades awarded”. The reality is that there will at least three types of grades awarded in recent years, pre-2020, 2020, and 2021. Is it time to face up to this and avoid the pretence of comparability between these different systems?

Equality Considerations

Ofqual seem to believe that if exam boards publish the papers / tests / mini-exams ‘shortly before’ they are taken then that will avoid leaking information but won’t put some students at a disadvantage because ‘students would not know which one(s) they would be required to complete’. I can envisage a situation where some students try to prepare for all published papers the moment they are released online, potentially a much greater number of papers than they will be required to sit, leading to considerable stress and anxiety, with potential equalities implications.

From the consultation, is not clear how exam board sampling will work, but there is the opportunity to bias the sampling process to help detect and correct for unconscious bias, if equalities information is available to exam boards. This could be considered.


On p.29, Ofqual state that ‘The usual assurances of comparability between years, between individual students, between schools and colleges and between exam boards will not be possible.’ This is not inspiring of confidence, but is honest. The question is how we can mitigate these impacts as far as possible. I hope Ofqual will listen carefully to the suggestions for 2021, and publish the approach taken in plenty of time. Releasing the algorithm used in 2020 on the day of A-level result release was unacceptable, and I hope Ofqual have learnt from this experience.

A-Levels and GCSEs in 2020

This week was A-Level results day. It was also the day that Ofqual published its long-awaited standardisation algorithm. Full details can be found in the 319-page report. In this blog post, I’ve set down my initial thoughts after reading the report.


I would like to begin by saying that Ofqual was not given an easy task: produce a system to devise A-level and GCSE grades without exams or coursework. Reading the report, it is clear that they worked hard to do the best they could within the confines they operate, and I respect that work. Nevertheless, I have several concerns to share.


1. Accounting for Prior Attainment

The model corrects for differences between historical prior attainment and prior attainment of the 2020 cohort in the following way (first taking into account any learners without prior attainment measures.) For any particular grade, the proportion to be awarded is equal to the historical proportion at that grade adjusted by a factor referred to in the report as q_{kj} - p_{kj}. (See p.92-93 of the report, which incidentally has a typo here — c_k should read c_{kj}.) As noted by the Fischer Family Trust, it appears that this factor is based solely on national differences in value added, and this could cause a problem. To illustrate this requires an artificial example. Imagine that Centre A has a historical transition matrix looking like this – all of its 200 students have walked away with A*s in this subject in recent years, whether they were in the first or second GCSE decile (and half were in each). Well done Centre A!

GCSE DecileA*A

Meanwhile, let’s say the national transition matrix looks more like this:

GCSE DecileA*A

Let’s now look at 2020 outcomes. Assume that this year, Centre A has an unusual cohort: all students were second decile in prior attainment. It seems natural to expect that it would still get mainly A*s, consistent with its prior performance, but this is not the outcome of the model. Instead, its historical distribution of 100% A*s is adjusted downwards because of the national transition matrix. The proportion of A*s at Centre A will be reduced by 40% – now only 60% of them will get A*s! This happens because the national transition matrix expects a 50/50 split of Decile 1 and Decile 2 students to end up with 50% A* and a Decile 2-only cohort to end up with 10% A*, resulting in a downgrade of 40%.

2. Model accuracy

Amongst the various possible standardisation options, Ofqual evaluated accuracy based on trying to predict 2019 exam grades and seeing how well they matched to awarded exams. This immediately presents a problem: no rank orders were submitted for 2019 students, so how is this possible? The answer provided is “the actual rank order within the centre based on the marks achieved in 2019 were used as a replacement“, i.e. they back-fitted 2019 marks to rank orders. This only provides a reasonable idea of accuracy if we assume that teacher-submitted rank orders in 2020 would exactly correspond to mark orders of their pupils, as noted by Guy Nason. Of course this will not be the case, so the accuracy estimates in the Ofqual report are likely to be significant overestimates. And they’re already not great, even under a perfect-ranking assumption: Ofqual report that only 12 out of 22 GCSE subjects were accurate to within one grade, with some subjects having only 40% accuracy in terms of predicting the attained grade – so one is left wondering what the accuracy might actually be for 2020 once rank-order uncertainty is taken into account.

There may also be a systematic variation in the accuracy of the model across different grades, but this is obscured by using the probability of successful classification across any grade as the primary measure of accuracy. Graphs presented in the Ofqual report suggest, for example, that the models are far less accurate at Grade 4 than at Grade 7 in GCSE English.

3. When is a large cohort a large cohort?

A large cohort, and therefore one for which teacher-assessed grades are used at all, is defined in the algorithm to be one with at least 15 students. But how do we count these 15 students? The current cohort or the historic cohort, or something else? The answer is given in Ofqual’s report: the harmonic mean of the two. As an extreme example of this, centre cohorts can be considered “large” with only 8 pupils this year – so long as they had at least 120 in the recent past. It seems remarkable that a centre could have fewer pupils than GCSE grades and still be “large”!

4. Imputed marks fill grade ranges

As the penultimate step in the Ofqual algorithm, “imputed marks” are calculated for each student – a kind of proxy mark equally spaced between grade end-points. So, for example, if Centre B only has one student heading for a Grade C at this stage then – by definition – it’s a mid-C. If they had two Grade C students, they’d be equally spaced across the “C spectrum”. This means that in the next step of the algorithm, cut-score setting, these students are vulnerable to changing grades. For centres which tend to fill the full grade range anyway, this may not be an issue. But I worry that we may see some big changes at the edges of centre distributions as a result of this quirk.

5. No uncertainty quantification

Underlying many of these concerns is, perhaps, a more fundamental one. Grades awarded this year come with different levels of uncertainty, depending on factors like how volatile attainment at the centre has been in the past, the size of the cohorts, known uncertainty in grading, etc. Yet none of this is visible in the awarded grade. In practice, this means that some Grade Cs are really “B/C”s while some are “A-E”, and we don’t know the difference. It is not beyond possibility to quantify the uncertainty – in fact I proposed awarding grade ranges in my original consultation response to Ofqual. This issue has been raised independently by the Royal Statistical Society and even for normal exam years, given the inherent unreliability of exam grades, by Dennis Sherwood. For small centres, rather than a statistically reasonable approach to widen the grade range, the impact of only awarding a single grade with unquantified uncertainty is that Ofqual have had to revert to teacher-assessed grades, leading to an unfair a “mix and match” system where some centres have had their teacher-assessed grades awarded while some haven’t.

What Must Happen Now?

I think everyone can agree that centres need to immediately receive all the intermediate steps in the calculations of their grades. Many examinations officers are currently scratching their heads, after having received only a small part of this information. The basic principle must be that centres are able to recalculate their grades from first principles if they want to. This additional information should include the proportion of pupils in both historical and current cohorts with matched prior attainment data for each subject and which decile each student falls into, the national transition matrices used for each subject, the values of q_{kj} and p_{kj} for each subject / grade combination, the imputed marks for each 2020 student, and the national imputed mark cut-points for each grade boundary in each subject.

At a political level, serious consideration should now be given to awarding teacher-assessed grades (CAGs) this year. While I was initially supportive of a standardisation approach – and I support the principles of Ofqual’s “meso-standardisation” – I fear that problems with the current standarisation algorithm are damaging rather than preserving public perception of A-Level grades. We may have now reached the point that the disadvantages of sticking to the current system are worse than the disadvantages of simply accepting CAGs for A-Levels.

Ofqual states in their report that “A key motivation for the design of the approach to standardisation [was] as far as possible [to] ensure that a grade represents the same standard, irrespective of the school or college they attended.”.  Unfortunately, my view is that this has not been achieved by the Ofqual algorithm. However, despite my concerns over Ofqual’s algorithm, it is also questionable whether any methodology meeting this objective could be implemented in time under a competitive education system culture driven by high-stakes accountability systems. Something to think about for our post-COVID world.

Some Notes on Metric Spaces

This post contains some summary informal notes of key ideas from my reading of Mícheál Ó Searcóid’s Metric Spaces (Springer, 2007). These notes are here as a reference for me, my students, and any others who may be interested. They are by no means exhaustive, but rather cover topics that seemed interesting to me on first reading. By way of a brief book review, it’s worth noting that Ó Searcóid’s approach is excellent for learning a subject. He has a few useful tricks up his sleeve, in particular:

  • Chapters will often start with a theorem proving equivalence of various statements (e.g. Theorem 8.1.1, Criteria for Continuity at a Point). Only then will he choose one of these statements as a definition, and he explains this choice carefully, often via reference to other mathematics.
  • The usual definition-theorem-proof style is supplemented with ‘question’ – these are relatively informally-stated questions and their answers. They have been carefully chosen to highlight some questions the reader might be wondering about at that point in the text and to demonstrate key (and sometimes surprising) answers before the formal theorem statement.
  • The writing is pleasant, even playful at times though never lacking formality. This is a neat trick to pull off.
  • There are plenty of exercises, and solutions are provided.

These features combine to produce an excellent learning experience.

1. Some Basic Definitions

A metric on a set X is a function d : X \times X \to {\mathbb R} such that:

  • Positivity: d(a,b) \geq 0 with equality iff a = b
  • Symmetry: d(a,b) = d(b,a)
  • Triangle inequality: d(a,b) \leq d(a,c) + d(c,b)

The combination of such a metric and a the corresponding set is a metric space.

Given a metric space (X,d), the point function at z is \delta_z : x \mapsto d(z,x).

A pointlike function u : X \to {\mathbb R}^\oplus is one where u(a) - u(b) \leq d(a,b) \leq u(a) + u(b)

For metric spaces (X,d) and (Y,e), X is a metric subspace of Y iff X \subseteq Y and d is a restriction of e.

For metric spaces (X,d) and (Y,e), an isometry \phi : X \to Y is a function such that e(\phi(a),\phi(b)) = d(a,b). The metric subspace (\phi(X),e) is an isometric copy of (X,d).

Some standard constructions of metrics for product spaces:

  1. \mu_1 : (a,b) \mapsto \sum_{i=1}^n \tau_i(a_i,b_i)
  2. \mu_2 : (a,b) \mapsto \sqrt{\sum_{i=1}^n \left(\tau_i(a_i,b_i)\right)^2}
  3. \mu_\infty : (a,b) \mapsto \max\left\{\tau_i(a_i,b_i) | i \in {\mathbb N}\right\}

A conserving metric e on a product space is one where \mu_\infty(a,b) \leq e(a,b) \leq \mu_1(a,b). Ó Searcóid calls these conserving metrics because they conserve an isometric copy of the individual spaces, recoverable by projection (I don’t think this is a commonly used term). This can be seen because fixing elements of all-but-one of the constituent spaces makes the upper and lower bound coincide, resulting in recovery of the original metric.

A norm on a linear space V over {\mathbb R} or {\mathbb C} is a real function such that for x, y \in V and \alpha scalar:

  • ||x|| \geq 0 with equality iff x = 0
  • ||\alpha x|| = |\alpha|\; ||x||
  • ||x + y|| \leq ||x|| + ||y||

The metric defined by the norm is d(a,b) = ||a - b||.

2. Distances

The diameter of a set A \subseteq X of metric space (X,d) is \text{diam}(A) = \sup\{d(r,s) | r, s \in A\}.

The distance of a point x \in X from a set A \subseteq X is \text{dist}(x, A) = \inf\{ d(x,a) | a \in A\}.

An isolated point z \in S where S \subseteq X is one for which \text{dist}(z, S \setminus \{z\}) \neq 0.

An accumulation point or limit point z \in X of S \subseteq X is one for which \text{dist}(z, S \setminus \{z\}) = 0. Note that z doesn’t need to be in S. A good example is z = 0, X = {\mathbb R}, S = \{1/n | n \in {\mathbb N}\}.

The distance from subset A to subset B of a metric space is defined as \text{dist}(A,B) = \inf\{ d(a,b) | a \in A, b \in B\}.

A nearest point s \in S of S to z \in X is one for which d(z,s) = \text{dist}(z,S). Note that nearest points don’t need to exist, because \text{dist} is defined via the infimum. If a metric space is empty or admits a nearest point to each point in every metric superspace, it is said to have the nearest-point property.

3. Boundaries

A point a is a boundary point of S in X iff \text{dist}(a,S) = \text{dist}(a,S^c) = 0. The collection of these points is the boundary \partial S.

Metric spaces with no proper non-trivial subset with empty boundary are connected. An example of a disconnected metric space is X = (0,1) \cup (7,8) as a metric subspace of {\mathbb R}, while {\mathbb R} itself is certainly connected.

Closed sets are those that contain their boundary.

The closure of S in X is \bar{S} \triangleq X \cup \partial S. The interior is S \setminus \partial S. The exterior is (\bar{S})^c.

Interior, boundary, and exterior are mutually disjoint and their union is X.

4. Sub- and super-spaces

A subset S \subseteq X is dense in X iff \bar{S} = X, or equivalently if for every x \in X, \text{dist}(x,S) = 0. The archetypal example is that \mathbb{Q} is dense in \mathbb{R}.

A complete metric space X is one that is closed in every metric superspace of X. An example is \mathbb{R}.

5. Balls

Let b[a;r) = \{ x \in X | d(a,x) < r \} denote an open ball and similarly b[a;r] = \{ x \in X | d(a,x) \leq r \} denote a closed ball. In the special case of normed linear spaces, b[a;r) = a + rb[0;1) and similarly for closed balls, so the important object is this unit ball – all others have the same shape. A norm on a space V is actually defined by three properties such balls U must have:

  • Convexity
  • Balanced (i.e. x \in U \Rightarrow -x \in U)
  • For each x \in V \setminus \{0\}, the set \{ t \in \mathbb{R}^+ | t x \in U \},
    • is nonempty
    • must have real supremum s
    • sx \notin U

6. Convergence

The mth tail of a sequence x = (x_n) is the set \mbox{tail}_m(x) = \{x_m | n \in {\mathbb N}, n \geq m \}.

Suppose X is a metric space, z \in X and x= (x_n) is a sequence in X. Sequence x converges to z in X, denoted x_n \to z iff every open subset of X that contains z includes a tail of x. In this situation, z is unique and is called the limit of the sequence, denoted \mbox{lim }x_n.

It follows that for (X,d) a metric space, z \in X and (x_n) a sequence in X, the sequence (x_n) converges to z in X iff the real sequence (d(x_n,z))_{n \in \mathbb{N}} converges to 0 in {\mathbb R}.

For real sequences, we can define the:

  • limit superior, \mbox{lim sup } x_n = \mbox{inf } \{ \mbox{sup } \mbox{tail}_n(x) | n \in \mathbb{N} \} and
  • limit inferior, \mbox{lim inf } x_n = \mbox{sup } \{ \mbox{inf } \mbox{tail}_n(x) | n \in \mathbb{N} \}.

It can be shown that x_n \to z iff \mbox{lim sup } x_n = \mbox{lim inf } x_n = z.

Clearly sequences in superspaces converge to the same limit – the same is true in subspaces if the limit point is in the subspace itself. Sequences in finite product spaces equipped with product metrics converge in the product space iff their projections onto the individual spaces converge.

Every subsequence of a convergent sequence converges to the same limit as the parent sequence, but the picture for non-convergent parent sequences is more complicated, as we can still have convergent subsequences. There are various equivalent ways of characterising these limits of subsequences, e.g. centres of balls containing an infinite number of terms of the parent sequence.

A sequence (x_n) is Cauchy iff for every r \in \mathbb{R}^+, there is a ball of radius r that includes a tail of (x_n). Every convergent sequence is Cauchy. The converse is not true, but only if the what should be the limit point is missing from the space — adding this point and extending the metric appropriately yields a convergent sequence. It can be shown that a space is complete (see above for definition) iff every Cauchy sequence is also a convergent sequence in that space.

7. Bounds

A subset S of a metric space X is a bounded subset iff S = X = \emptyset or S is included in some ball of X. A metric space X is bounded iff it is a bounded subset of itself. An alternative characterisation of a bounded subset S is that it has finite diameter.

The Hausdorff metric is defined on the set S(X) of all non-empty closed bounded subsets of a set X equipped with metric d. It is given by h(A,B) = \max \{ \sup\{ \mbox{dist}(b, A) | b \in B\}, \sup\{ \mbox{dist}(a, B) | a \in A\} \}.

Given a set X and a metric space Y, f : X \to Y is a bounded function iff f(X) is a bounded subset of Y. The set of bounded functions from X to Y is denoted B(X,Y). There is a standard metric on bounded functions, s(f,g) = \sup \{ e(f(x),g(x)) | x \in X \} where e is the metric on Y.

Let X be a nonempty set and Y be a nonempty metric space. Let (f_n) be a sequence of functions from X to Y and g: X \to Y. Then:

  • (f_n) converges pointwise to g iff (f_n(z)) converges to g(z) for all z \in X
  • (f_n) converges uniformly to g iff \sup\{ e(f_n(x),g(x)) | x \in X \} is real for each n \in {\mathbb N} and the sequence ( \sup\{ e(f_n(x),g(x) | x \in X \})_{n \in {\mathbb N}} converges to zero in {\mathbb R}.

It’s interesting to look at these two different notions of convergence because the second is stronger. Every uniformly-convergent sequence of functions converges pointwise, but the converse is not true. An example is the sequence f_n : \mathbb{R}^+ \to \mathbb{R} given by f_n(x) = 1/nx. This converges pointwise but not uniformly to the zero function.

A stronger notion than boundedness is total boundedness. A subset S of a metric space X is totally bounded iff for each r \in {\mathbb R}^+, there is a finite collection of balls of X of radius r that covers S. An example of a bounded but not totally bounded subset is any infinite subset of a space with the discrete metric. Total boundedness carries over to subspaces and finite unions.

Conserving metrics play an important role in bounds, allowing bounds on product spaces to be equivalent to bounds on the projections to the individual spaces. This goes for both boundedness and total boundedness.

8. Continuity

Given metric spaces X and Y, a point z \in X and a function f: X \to Y, the function is said to be continuous at z iff for each open subset V \subseteq Y with f(z) \in V, there exists and open subset U of X with z \in U such that f(U) \subseteq V.

Extending from points to the whole domain, the function is said to be continuous on X iff for each open subset V \subseteq Y, f^{-1}(V) is open in X.

Continuity is not determined by the codomain, in the sense that a continuous function is continuous on any metric superspace of its range. It is preserved by function composition and by restriction.

Continuity plays well with product spaces, in the sense that if the product space is endowed with a product metric, a function mapping into the product space is continuous iff its compositions with the natural projections are all continuous.

For (X,d) and (Y,e) metric spaces, \mathcal{C}(X,Y) denotes the metric space of continuous bounded functions from X to Y with the supremum metric (f,g) \mapsto \sup\{ e(g(x),f(x)) | x \in X \}. \mathcal{C}(X,Y) is closed in the space of bounded functions from X to Y.

Nicely, we can talk about convergence using the language of continuity. In particular, let X be a metric space, and \tilde{\mathbb{N}} = \mathbb{N} \cup \{ \infty \}. Endow \tilde{\mathbb{N}} with the inverse metric (a,b) \mapsto |a^{-1} - b^{-1} | for a,b \in {\mathbb N}, (n,\infty) \mapsto n^{-1} and (\infty, \infty) \mapsto 0. Let \tilde{x} : \tilde{\mathbb{N}} \to X. Then \tilde{x} is continuous iff the sequence (x_n) converges in X to x_{\infty}. In particular, the function extending each convergent sequence with its limit is an isometry from the space of convergent sequences in X to the metric space of continuous bounded functions from \tilde{\mathbb{N}} to X.

9. Uniform Continuity

Here we explore increasing strengths of continuity: Lipschitz continuity > uniform continuity > continuity. Ó Searcóid also adds strong contractions into this hierarchy, as the strongest class studied.

Uniform continuity requires the \delta in the epsilon-delta definition of continuity to extend across a whole set. Consider metric spaces (X,d) and (Y,e), a function f : X \to Y, and a metric subspace S \subseteq X. The function f is uniformly continuous on S iff for every \epsilon \in \mathbb{R}^+ there exists a \delta \in \mathbb{R}^+ s.t. for every x, z \in S for which d(z,x) < \delta, it holds that e( f(z), f(x) ) < \epsilon.

If (X,d) is a metric space with the nearest-point property and f is continuous, then f is also uniformly continuous on every bounded subset of X. A good example might be a polynomial on \mathbb{R}.

Uniformly continuous functions map compact metric spaces into compact metric spaces. They preserve total boundedness and Cauchy sequences. This isn’t necessarily true for continuous functions, e.g. x \mapsto 1/x on (0,1] does not preserve the Cauchy property of the sequence (1/n).

There is a remarkable relationship between the Cantor Set and uniform continuity. Consider a nonempty metric space (X,d). Then X is totally bounded iff there exists a bijective uniformly continuous function from a subset of the Cantor Set to X. As Ó Searcóid notes, this means that totally bounded metric spaces are quite small, in the sense that none can have cardinality greater than that of the reals.

Consider metric spaces (X,d) and (Y,e) and function f: X \to Y. The function is called Lipschitz with Lipschitz constant k \in \mathbb{R}^+ iff e( f(a), f(b) ) \leq k d(a,b) for all a, b \in X.

Note here the difference to uniform continuity: Lipschitz continuity restricts uniform continuity by describing a relationship that must exist between the \epsilons and \deltas – uniform leaves this open. A nice example from Ó Searcóid of a uniformly continuous non-Lipschitz function is x \mapsto \sqrt{1 - x^2} on [0,1).

Lipschitz functions preserve boundedness, and the Lipschitz property is preserved by function composition.

There is a relationship between Lipschitz functions on the reals and their differentials. Let I be a non-degenerate intervals of \mathbb{R} and f: I \to \mathbb{R}. Then f is Lipschitz on I iff f' is bounded on I.

A function with Lipschitz constant less than one is called a strong contraction.

Unlike the case for continuity, not every product metric gives rise to uniformly continuous natural projections, but this does hold for conserving metrics.

10. Completeness

Let (X,d) be a metric space and u : X \to \mathbb{R}. The function u is called a virtual point iff:

  • u(a) - u(b) \leq d(a,b) \leq u(a) + u(b) for all a,b \in X
  • \text{inf} \; u(X) = 0
  • 0 \notin u(X)

We saw earlier that a metric space X is complete iff it is closed in every metric superspace of X. There are a number of equivalent characterisations, including that every Cauchy sequence in X converses in X.

Consider a metric space (X,d). A subset of S \subseteq X is a complete subset of X iff (S,d) is a complete metric space.

If X is a complete metric space and S \subseteq X, then S is complete iff S is closed in X.

Conserving metrics ensure that finite products of complete metric spaces are complete.

A non-empty metric space (X,d) is complete iff (\mathcal{S},h) is complete, where \mathcal{S}(X) denotes the collection of all non-empty closed bounded subsets of X and h denotes the Hausdorff metric.

For X a non-empty set and (Y,e) a metric space, the metric space B(X,Y) of bounded functions from X to Y with the supremum metric is a complete metric space iff Y is complete. An example is that the space of bounded sequences in \mathbb{R} is complete due to completeness of \mathbb{R}.

We can extend uniformly continuous functions from dense subsets to complete spaces to unique uniformly continuous functions from the whole: Consider metric spaces (X,d) and (Y,e) with the latter being complete. Let S \subseteq X be a dense subset of X and f : S \to Y be a uniformly continuous function. Then there exists a uniformly continuous function \tilde{f} : X \to Y such that \tilde{f}|_S = f. There are no other continuous extensions of f to X.

(Banach’s Fixed-Point Theorem). Let (X,d) be a non-empty complete metric space and f : X \to X be a strong contraction on X with Lipschitz constant k \in (0,1). Then f has a unique fixed point in X and, for each w \in X, the sequence (f^n(w)) converges to the fixed point. Beautiful examples of this abound, of course. Ó Searcóid discusses IFS fractals – computer scientists will be familiar with applications in the semantics of programming languages.

A metric space (Y,e) is called a completion of metric space (X,d) iff (Y,e) is complete and (X,d) is isometric to a dense subspace of (Y,e).

We can complete any metric space. Let (X,d) be a metric space. Define \tilde{X} = \delta(X) \cup \text{vp}(X) where \delta(X) denotes the set of all point functions in X and \text{vp}(X) denotes the set of all virtual points in X. We can endow \tilde{X} with the metric s given by (u,v) \mapsto \sup\{ |u(x) - v(x)| | x \in X \}. Then \tilde{X} is a completion of X.

Here the subspace (\delta(X),s) of (\tilde{X},s) forms the subspace isometric to (X,d).

11. Connectedness

A metric space X is a connected metric space iff X cannot be expressed as the union of two disjoint nonempty open subsets of itself. An example is \mathbb{R} with its usual metric. As usual, Ó Searcóid gives a number of equivalent criteria:

  • Every proper nonempty subset of X has nonempty boundary in X
  • No proper nonempty subset of X is both open and closed in X
  • X is not the union of two disjoint nonempty closed subsets of itself
  • Either X = \emptyset or the only continuous functions from X to the discrete space \{0,1\} are the two constant functions

Connectedness is not a property that is relative to any metric superspace. In particular, if X is a metric space, Z is a metric subspace of X and S \subseteq Z, then the subspace S of Z is a connected metric space iff the subspace S of X is a connected metric space. Moreover, for a connected subspace X of X with S \subseteq A \subseteq \bar{S}, the subspace A is connected. In particular, \bar{S} itself is connected.

Every continuous image of a connected metric space is connected. In particular, for nonempty S \subseteq \mathbb{R}, S is connected iff S is an interval. This is a generalisation of the Intermediate Value Theorem (to see this, consider the continuous functions f : X \to \mathbb{R}).

Finite products of connected subsets endowed with a product metric are connected. Unions of chained collections (i.e. sequences of subsets whose sequence neighbours are non-disjoint) of connected subsets are themselves connected.

A connected component U of a metric space X is a subset that is connected and which has no proper superset that is also connected – a kind of maximal connected subset. It turns out that the connected components of a metric space X are mutually disjoint, all closed in X, and X is the union of its connected components.

A path in metric space X is a continuous function f : [0, 1] \to X. (These functions turn out to be uniformly continuous.) This definition allows us to consider a stronger notion of connectedness: a metric space X is pathwise connected iff for each a, b \in X there is a path in X with endpoints a and b. An example given by Ó Searcóid of a space that is connected but not pathwise connected is the closure in \mathbb{R}^2 of \Gamma = \{ (x, \sin (1/x) | x \in \mathbb{R}^+ \}. From one of the results above, \bar{\Gamma} is connected because \Gamma is connected. But there is no path from, say, (0,0) (which nevertheless is in \bar{\Gamma}) to any point in \Gamma.

Every continuous image of a pathwise connected metric space is itself pathwise connected.

For a linear space, an even stronger notion of connectedness is polygonal connectedness. For a linear space X with subset S and a, b \in S, a polygonal connection from a to b in X is an n-tuple of points (c_1, \ldots c_n) s.t. c_1 = a, c_n = b and for each i \in \{1, 2, \ldots, n-1\}, \{(1 - t)c_i + t c_{i+1} | t \in [0,1] \} \subseteq S. We then say a space is polygonally connected iff there exists a polygonal connection between every two points in the space. Ó Searcóid gives the example of \{ z \in \mathbb{C} | \; |z|= 1 \} as a pathwise connected but not polygonally connected subset of \mathbb{C}.

Although in general these three notions of connectedness are distinct, they coincide for open connected subsets of normed linear spaces.

12. Compactness

Ó Searcóid gives a number of equivalent characterisations of compact non-empty metric spaces X, some of the ones I found most interesting and useful for the following material include:

  • Every open cover for X has a finite subcover
  • X is complete and totally bounded
  • X is a continuous image of the Cantor set
  • Every real continuous function defined on X is bounded and attains its bounds

The example is given of closed bounded intervals of \mathbb{R} as archetypal compact sets. An interesting observation is given that ‘most’ metric spaces cannot be extended to compact metric spaces, simply because there aren’t many compact metric spaces — as noted above in the section on bounds, there are certainly no more than |\mathbb{R}|, given they’re all images of the Cantor set.

If X is a compact metric space and S \subseteq X then S is compact iff S is closed in X. This follows because S inherits total boundedness from X, and completeness follows also if S is closed.

The Inverse Function Theorem states that for X and Y metric spaces with X compact, and for f : X \to Y injective and continuous, f^{-1}: f(X) \to X is uniformly continuous.

Compactness plays well with intersections, finite unions, and finite products endowed with a product metric. The latter is interesting, given that we noted above that for non conserving product metrics, total boundedness doesn’t necessarily carry forward.

Things get trickier when dealing with infinite-dimension spaces. The following statement of the Arzelà-Ascoli Theorem is given, which allows us to characterise the compactness of a closed, bounded subset of \mathcal{C}(X,Y) for compact metric spaces X and Y:

For each x \in X, define \hat{x}: S \to Y by \hat{x}(f) = f(x) for each f \in S. Let \hat{X} = \{\hat{x} | x \in X \}. Then:

  • \hat{X} \subseteq B(S,Y) and
  • S is compact iff x \to \hat{x} from X to B(S,Y) is continuous

13. Equivalence

Consider a set X and the various metrics we can equip it with. We can define a partial order \succeq on these metrics in the following way. d is topologically stronger than e, d \succeq e iff every open subset of (X,e) is open in (X,d). We then get an induced notion of topological equivalence of two metrics, when d \succeq e and e \succeq d.

As well as obviously admitting the same open subsets, topologically equivalent metrics admit the same closed subsets, dense subsets, compact subsets, connected subsets, convergent sequences, limits, and continuous functions to/from that set.

It turns out that two metrics are topologically equivalent iff the identity functions from (X,d) to (X,e) and vice versa are both continuous. Following the discussion above relating to continuity, this hints at potentially stronger notions of comparability – and hence of equivalence – of metrics, which indeed exist. In particular d is uniformly stronger than e iff the identify function from (X,d) to (X,e) is uniformly continuous. Also, d is Lipschitz stronger than e iff the identity function from (X,d) to (X,e) is Lipschitz.

The stronger notion of a uniformly equivalent metric is important because these metrics additionally admit the same Cauchy sequences, totally bounded subsets and complete subsets.

Lipschitz equivalence is even stronger, additionally providing the same bounded subsets and subsets with the nearest-point property.

The various notions of equivalence discussed here collapse to a single one when dealing with norms. For a linear space X, two norms on X are topologically equivalent iff they are Lipschitz equivalent, so we can just refer to norms as being equivalent. All norms on finite-dimensional linear spaces are equivalent.

Finally, some notes on the more general idea of equivalent metric spaces (rather than equivalent metrics.) Again, these are provided in three flavours:

  • topologically equivalent metric spaces (X,d) and (Y,e) are those for which there exists a continuous bijection with continuous inverse (a homeomorphism) from X to Y.
  • for uniformly equivalent metric spaces, we strengthen the requirement to uniform continuity
  • for Lipschitz equivalent metric spaces, we strengthen the requirement to Lipschitz continuity
  • strongest of all, isometries are discussed above

Note that given the definitions above, the metric space (X,d) is equivalent to the metric space (X,e) if d and e are equivalent, but the converse is not necessarily true. For equivalent metric spaces, we require existence of a function — for equivalent metrics this is required to be the identity.

ResearchED on Curriculum

A colleague recently pointed me to the ResearchED Guide to the Curriculum, a volume of essays edited by Clare Sealy. Following the guidance of Lemov and Badillo‘s essay in this volume that ‘reading a book should be a writing-intensive experience’, I’ve written down some of my thoughts after reading this book. They come from the perspective of someone who teaches (albeit in higher education rather than in a school) and is a researcher (but not in education). Of course, my background undoubtedly skews my perspective and limits my awareness of much educational theory.

The context here is that schools in England have been very busy over the last couple of years rethinking their curriculum, not least because the new Ofsted school inspection framework places it centre-stage. So now is a good moment to engage with schools over some of the more tricky questions involved.

I found this collection of essays very thought provoking, and would recommend engaging, whether or not you consider yourself to a fan of the “curriculum revolution” underway in English schools.

Knowledge and Teaching

Many of the contributions relate to a knowledge-based curriculum, but none give a working definition of knowledge. I think it’s useful for educators to reflect on, and leaders to engage with, epistemology at some level. When ideas like a “knowledge-based curriculum” start to be prioritised, then we need to understand what various meanings this might have, and precisely to what other ideas these terms may be being used in opposition. Of course this becomes even more important when politics enters the picture: I find it hard to envisage a definition of a knowledge-based curriculum that is broad enough to encompass both Michael Gove’s approach to knowledge in history and Michael F.D. Young‘s principles espoused in this book. A central problem in the theory of knowledge, of course, is the theory of truth; it’s interesting that Ashbee‘s essay in the same volume riffs on this theory when asking whether it is right that students should learn something false (the presence of electron shells in an atom) in order to facilitate understanding later on. Again, I think this could do with a more sophisticated analysis of truth and falsity here – it is by no means universally accepted that the presence of electron shells can be said to be ‘false’, and I do think the philosophical standpoint on such questions has implications for curriculum design – especially in the sciences.

The same holds for teaching. The role of teaching in imparting knowledge needs to be fully explored. Even if we accept the premise that, to quote Young, ‘schools in a democracy should all be working towards access to powerful knowledge for all their pupils’ (and also leave to one side the definition of a democracy) this leaves open the question of the role of the teacher in providing that access. At one extreme seems to lie the Gradgrindian approach best summarised by Dickens in Hard Times of students as ‘little vessels then and there arranged in order, ready to have imperial gallons of facts poured into them until them were full to the brim’, at the other an unschooling approach. But both can legitimately claim to be pursuing this aim. In the middle of these extremes, the teacher’s role in setting up experiences, and in developing understanding for example through Adey and Shayer’s concept of ‘cognitive conflict’ could explored more deeply.

It’s interesting that in his essay in this book, Young – described by Sealy as one of the ‘godfathers’ of the knowledge-based curriculum – has plenty to say about problematic ways this concept has been interpreted, in particular that “a school adopting a knowledge-led curriculum can spend too much time on testing whether students have memorised the knowledge of previous years“, and that “a focus on memorisation does not necessarily encourage students to develop a ‘relationship to knowledge’ that leads to new questions.” These concerns echo my own fears, and I see the latter also arise in higher education as students make the leap between undergraduate and postgraduate work.

My own teaching as an academic has spanned the full range from largely chalk-and-talk unidirectional presentations to undergraduate students to fairly laissez-faire mentoring of PhD students through their own discovery of the background material required for their research. It’s interesting to reflect on the different level of resourcing required to follow these models, a topic Young and Aurora both pick up in their essays: the need for a curriculum model that incorporates how teachers might engage with external constraints (resource, externally imposed exam syllabuses, etc.) in the short-term, even as we work towards a better long-term future for our students.

Memorisation is mentioned by several authors, and of course can be important, but – as Young says – it’s also important that students come to view it as “a step to acquiring new knowledge“, not as acquiring new knowledge. So my question to schools is this: how is that desirable outcome student perception reflected in your curriculum? How does your curriculum help develop that view by students?

My concerns over some of the ‘knowledge-based’ or ‘knowledge-led’ work in schools in recent years is broadly in line with Young’s view in this volume that teaching viewed as the transmission of knowledge excludes the process by which students develop a relationship with knowledge (my emphasis). I was also pleased by Young’s assertion that schools should treat subjects not just as bodies of knowledge but as communities of teachers and researchers and pupils as neophyte members of such communities. To me, this is wholly consistent with the exciting ideas behind Claxton’s Building Learning Power framework I reviewed some years ago here.

What Do We Want Students to Be Able to Do?

In addition to more traditional answers, Ashbee suggests some that should make schools pause for thought. For example, she suggests that while others may have chosen the curriculum content, students should be equipped to critically evaluate the inclusion of the knowledge in the curriculum they have studied and to ask what else could have been included. I like this idea: a key question for schools, though, is where do our curricula equip students for this task?

One aspect largely absent from this volume is a critical discussion of assessment, including testing, and its role in the curriculum, both in obvious terms of shaping the curriculum in the long- and short-term, and the – perhaps less obvious – nature of forms of assessments in themselves driving students’ behaviour and understanding of the nature of learning.

Planning Lessons

In an essay on Curriculum Coherence, Neil Almond discusses the role of sequencing in a subject curriculum. Almond uses the analogy of a box set, contrasting The Simpsons (minimal ordering required) to Game of Thrones (significant ordering required.)

Three aspects of the timing of lessons are, I think, missing in this discussion and deserve more explicit consideration.

Firstly, any necessary order of discussion of topics is rarely a total (linear) order, to put it mathematically. It’s maybe not even a partial order. Explicit dependencies between topic areas have been explored in depth by Cambridge Mathematics, who have built a directed graph representation of a curriculum. The lack of totality of the order provides significant freedom in practice; it is less clear what best practice might be, as a department or teacher, of how to take advantage of this freedom. It’s also less clear when to take advantage of this freedom: should this be a department-level once-and-for-all decision, one delegated to teachers to determine on the fly, or something in between? And why?

Secondly, even once this freedom has been exploited by mapping the dependence structure of concepts into a total order, there still remains the question of mapping from that order into time. Again, there is flexibility: should this unit take one week or two, or a whole term, and – importantly – curricula need to consider who should exercise this flexibility, when and why. Within mathematics, this has come under a lot of scrutiny in recent years, through discussions around the various definitions of mastery.

Finally, one aspect that the box set analogy obscures is the extent to which the sequencing of lessons is to be co-created with the students. Simpsons and Game of Thrones writers don’t have the option to co-create sequencing on the fly with their audience – schools and universities do. To what extent should this freedom be utilised, by whom, when, and to what end?

Linking Universities with Schools and Secondaries with Primaries

Ashbee discusses the very interesting question of how school curricula can engage with the mechanisms for knowledge generation in the broader discipline. For example, experiment, peer review, art exhibitions, all help reflect the norms of the discipline’s cutting edge back into the school curriculum. This is why I was sad, recently, to see Ofqual consulting on the removal of student-conducted experiments from 2021 GCSE science examinations, to give teachers time to cram in more facts: ‘science’ without experiment is not science.

Since one of the key venues for knowledge generation is the academy, increasing interaction between schools and universities should be very productive at the moment with increased school thinking about curriculum fundamentals. I am pleased to have played a very small part in the engagement of my university, both through my own outreach and through discussions leading up to our recent announcement of the opening of a new Imperial Maths School. More of all this, please, universities!

The theme of linking phases of education also appears in Andrew Percival‘s case study of primary curriculum development, where he emphasises the benefit his primary school obtained through teachers joining subject associations, e.g. in Design and Technology, making links with secondary specialists, and introducing self-directed study time for primary teaching staff to develop their subject knowledge. Those of us in all sectors should seek out links through joint professional associations.

Lessons for Leaders

Christine Counsell‘s essay tackles the topic of how school senior leadership teams (SLTs) should engage with developing and monitoring their departments under a knowledge-based curriculum. The main issue here is in the secondary sector, as SLTs will not include all subject specialisms. My colleague probably had this essay in mind when pointing out this edited volume, as many of the lessons and ideas here apply equally well to school governors engaging with subject leaders. I would agree with this. But actually, I would go further and say that many of Counsell’s suggestions for SLTs actually echo previous best practice in governance from before the new national curriculum. In meetings between subject leaders and school governors, governors have always played the role of knowledgable outsider, whose aim is to guide a subject leader in conversation to reflect on their role in the development of the teaching of their subject and its norms. It’s quite interesting to see this convergence. I was also struck by Counsell’s insistence on the importance of discussing curriculum content with middle leaders rather than relying on proxies such as attainment results alone, which can actually act to conceal the curriculum; in my day job this mirrors the importance we try to give in staff appraisal to discussing the research discoveries of academic staff, not focusing on the number or venue of publications. I think many of the arguments made are transferrable between schools and universities. Counsell also identifies positive and negative roles played by SLTs in developing a positive culture in middle leadership, and hence provides some useful material around which governance questions can be posed to probe SLT themselves.

Award of GCSEs and A-Levels in 2020

Readers of my blog based in England may know that due to COVID-19, GCSEs (typically taken at age 16) and A-Levels (age 18) are not going ahead as exams this year. Yesterday, the Office for Qualifications and Examinations Regulation (Ofqual) published a consultation on the methods to be used to ensure fairness in the award of these important qualifications. I intend to respond to this consultation, which is only open for two weeks, and have produced a draft response below. Before I submit it, I would welcome any feedback. Equally, others should feel free to borrow from my response if it helps them.

Centre Assessment Grades

To what extent do you agree or disagree that we should incorporate the requirement for exam boards to collect information from centres on centre assessment grades and their student rank order, in line with our published information document, into our exceptional regulatory requirements for this year?


To what extent do you agree or disagree that exam boards should only accept centre assessment grades and student rank orders from a centre when the Head of Centre or their nominated deputy has made a declaration as to their accuracy and integrity?

Strongly Agree

To what extent do you agree or disagree that Heads of Centre should not need to make a specific declaration in relation to Equalities Law?


To what extent do you agree or disagree that students in year 10 and below who had been entered to complete exams this summer should be issued results on the same basis as students in year 11 and above?

Strongly Agree

To what extent do you agree or disagree that inappropriate disclosure of centre assessment judgements or rank order information should be investigated by exam boards as potential malpractice?

Neither Agree not Disagree

Do you have any comments about our proposals for centre assessment grades?

  1. While a separate Equalities Law declaration is not necessary, the Head of Centre should be able to declare that they have taken equality law into consideration as part of their declaration.
  2. Ofqual should liaise with the National Governance Association and with teaching unions to provide guidance to governing bodies and staff on appropriate challenge and support to schools in order to ensure processes underlying Head of Centre declaration are appropriately evidenced.
  3. While I understand and support the motivation for labelling inappropriate disclosure of centre assessments as malpractice, care must be taken and guidance given to centres over what is deemed “inappropriate”. I would not want to be in the situation where a teacher is unable to calm a student in a way they normally would, for example by telling them that “I can’t see any way you won’t get a Grade 7”. There may be an equalities implication for those students suffering from extreme anxiety, and this should be considered when drawing up guidance for centres.
  4. While I accept that there is little time to provide detailed guidance for centres to follow when drawing up rank-order lists, the publication of examples of good practice may help centres, and I would recommend this is considered.

Issuing Results

To what extent do you agree or disagree that we should incorporate into the regulatory framework a requirement for all exam boards to issue results in the same way this summer, in accordance with the approach we will finalise after this consultation, and not by any other means?

Strongly Agree

Do you have any comments about our proposal for the issuing of results?


Impact on Students

To what extent do you agree or disagree that we should only allow exam boards to issue results for private candidates for whom a Head of Centre considers that centre assessment grades and a place in a rank order can properly be submitted?


To what extent do you agree or disagree that the arrangements we put in place to secure the issue of results this summer should extend to students in the rest of the UK?

Strongly agree

To what extent do you agree or disagree that the arrangements we put in place to secure the issue of results this summer should extend to all students, wherever they are taking the qualifications?

Neither agree nor disagree

Do you have any comments about the impact of our proposals on any particular groups of students?

  1. Unfortunately, I see no other option than that proposed for private candidates. However, I am concerned that the definition of “properly” in the criterion given is made much more explicit and in objective terms to the heads of centres.
  2. I suggest legal advice is sought over the enforceability of arrangements within centres outside the UK, in particular over the implications of breach of a head of centre’s declaration before proceeding with treating them the same as those within the UK.
  3. I am concerned over the impact of the proposed arrangements for some groups of students who may be differentially affected by the change in routine due to lockdown, e.g. those with Autistic Spectrum Conditions (ASC). In order to be as fair as possible to these students, I suggest that explicit guidance be given to centres emphasising that centres are free to disregard any dip in attainment since lockdown when coming up with their rank-order list, and again emphasising their duties under equalities legislation.

Statistical standardisation of centre assessment grades

To what extent do you agree or disagree with the aims outlined above?


To what extent do you agree or disagree that using an approach to statistical standardisation which emphasises historical evidence of centre performance given the prior attainment of students is likely to be fairest for all students?


To what extent do you agree or disagree that the trajectory of centres’ results should NOT be included in the statistical standardisation process?


To what extent do you agree or disagree that the individual rank orders provided by centres should NOT be modified to account for bias regarding different students according to their particular protected characteristics or their socio-economic backgrounds?


To what extent do you agree or disagree that we should incorporate the standardisation approach into our regulatory framework?


Do you have any comments about our proposals for the statistical standardisation of centre assessment grades?

  1. I am unclear from the consultation on whether standardisation is to occur on an exam-board basis or across exam boards. If it is on an exam-board basis, it is not clear what will happen when centres have changed exam board over the time window used to judge prior grade distribution at the school, especially if the change is for the first time this year.
  2. I have several statistical concerns over the proposed methodology, given the level of detail discussed so far. In particular,
    (i) there is a recognition that small centres or small cohorts will be difficult to deal with – this is a significant issue, and may be exacerbated depending on the definition of cohort (see #3, below), leading to significant statistical uncertainty;
    (ii) it is hugely important to avoid 2020 results being affected by outlier results in previous years. One possibility is to use median results from the previous three years – I would avoid using mean results or a single year’s results.
    Given these concerns, my view is that it would be more appropriate to award a “grade range” to students (e.g. “9-7”, which may of course include degenerate ranges like just “7”). This allows statistical uncertainty arising from the various measures integrated into the standardisation algorithm to be explicitly quantified and provide a transparent per-pupil result. It would allow universities and sixth-forms to decide for themselves whether to admit a pupil optimistically, pessimistically or on the basis of the interval midpoint.
  3. It is unclear from the consultation whether the past grade distributions used will be on a per-subject basis. If not, this is likely to violate proposed Aim 1 of the standardisation process. However, if so, this is likely to result in some very small cohorts for optional subjects at particular centres, so extreme statistical care must be taken in using these cohorts as the basis for grading in 2020. A possible solution us to produce grade ranges, as above.
  4. From a statistical perspective, estimation of grade distributions at a per-centre level (rather than estimation of mean grade, for example) is fraught with danger and highly sensitive to cohort size. It is very important that you do not consider the empirical frequency distribution of grades in a centre over the last 1,2 or 3 years as the underlying probability distribution but rather as a sample from the latter, using an appropriate statistical method to estimate the distribution from the sample. Such methods would also allow the incorporation of notions of variance, which could be factored into the “grade ranges” for students, explained in #2. As an extreme example: if a centre had no Grade 6’s last year, only 5’s and 7’s, we should not bias our model to no Grade 6’s this year, surely.
  5. There is an additional option for standardisation, not considered in the consultation document, which is less subject to the statistical problems of distribution estimation. You could extract just one or two parameters from your model (e.g. desired mean, desired standard deviation) and use these to normalise the distribution from each centre, rather than fit the complete distributions. Such aggregate statistics will be less susceptible to variation, especially for smaller cohorts.
  6. I am unclear how it is possible to award grades to students at centres without any historical outcomes and with no prior attainment data or prior attainment data covering a statistically-insignificant portion of the cohort. For these centres, some form of moderation or relying on Autumn term exam results may be required.
  7. I am concerned by the statement in the consultation that “we will evaluate the optimal span of historical centre outcomes (one, 2 or 3 years). We will select the approach that is likely to be the most accurate in standardising students’ grades.” There is no discussion of how “most accurate” can be judged; there is no data upon which to make this decision, so I would urge caution and an outlier-rejection strategy (see #2 above).
  8. While I broadly agree that there is insufficient data upon which to base rank-order modification based on protected characteristics or socio-economic backgrounds, of the three approaches discussed in the consultation document, the “second approach” is currently very vague and needs further refinement before I can offer an opinion on it. I am happy to be contacted for further comment on this in the future.
  9. I am concerned by the absence of a mechanism to flag unusual rank order differences between subjects in a centre. It should be possible to identify, for example, pupils ranked very high in Subject A and very low in Subject B compared to the typical centile differences in rankings between these subjects, for further investigation by the exam boards. The sensitivity of such a test could be set an appropriate level to the amount of staff time available to investigate.


Appealing the results

To what extent do you agree or disagree that we should not provide for a review or appeals process premised on scrutiny of the professional judgements on which a centre’s assessment grades are determined?


To what extent do you agree or disagree that we should not provide for a student to challenge their position in a centre’s rank order?


To what extent do you agree or disagree that we should not provide for an appeal in respect of the process or procedure used by a centre?

Strongly disagree

To what extent do you agree or disagree that we should provide for a centre to appeal to an exam board on the grounds that the exam board used the wrong data when calculating a grade, and/or incorrectly allocated or communicated the grades calculated?

Strongly Agree

To what extent do you agree or disagree that for results issued this summer, exam boards should only consider appeals submitted by centres and not those submitted by individual students?

Strongly disagree

To what extent do you agree or disagree that we should not require an exam board to ensure consent has been obtained from all students who might be affected by the outcome of an appeal before that appeal is considered?


To what extent do you agree or disagree that exam boards should not put down grades of other students as a result of an appeal submitted on behalf of another student?

Strongly agree

To what extent do you agree or disagree that exam boards should be permitted to ask persons who were involved in the calculation of results to be involved in the evaluation of appeals in relation to those results?


To what extent do you agree or disagree that exam boards should be able to run a simplified appeals process?

Neither agree nor disagree

To what extent do you agree or disagree that we should not provide for appeals in respect of the operation or outcome of the statistical standardisation model?

Strongly agree

To what extent do you agree or disagree with our proposal to make the Exam Procedures Review Service (EPRS) available to centres for results issued this summer?

Strongly agree

Do you have any comments about our proposals for appealing results?

  1. I disagree with the absence of an appeal procedure against centre procedure. While recognising the difficulties faced by centres and the exceptional circumstances, there is an element of natural justice that must be maintained. Without such an appeal process, there is no safeguard against centres using completely inappropriate mechanisms to derive grade and rank orders, beyond the signed statement from the head of centre. While the consultation suggests that detailed guidance will not be sent to centres on the procedures they should follow, it is reasonable to expect a centre – if challenged by a sufficient number of candidates – to explain the procedure they did follow, and for an appeal body to find this to be reasonable or unreasonable in the circumstances. The outcome of any successful appeal may have to be the cancelling of all grades in a certain subject at a certain centre, requiring a fall-back to the Autumn 2020 exams, but the mere existence of such a mechanism may help focus centres on ensuring justifiable procedures are in place.
  2. The consultation document leaves open the question of what role staff of exam boards who were involved in the calculation of results would have in appeals. It appears proper for them to be involved in providing evidence to an independent appeals committee, but not to form such a committee.


An Autumn exam series

To what extent do you agree or disagree that entries to the autumn series should be limited to those who were entered for the summer series, or those who the exam board believes have made a compelling case about their intention to have entered for the summer series (as well as to students who would normally be permitted to take GCSEs in English language and mathematics in November)?


To which qualifications the emergency regulations will apply

To what extent do you agree or disagree that we should apply the same provisions as GCSE, AS and A level qualifications to all Extended Project Qualifications and to the Advanced Extension Award qualification?

Strongly agree

Do you have any comments about the qualifications to which the exceptional regulatory measures will apply?


Building the arrangements into our regulatory framework

To what extent do you agree or disagree that we should confirm that exam boards will not be permitted to offer opportunities for students to take exams in May and June 2020?


To what extent do you agree or disagree with our proposals that exam boards will not be permitted to offer exams for the AEA qualification or to moderate Extended Project Qualifications this summer?


Do you have any comments about our proposals for building our arrangements into our regulatory framework?

I have sympathy with the proposals in this section, but they need to be balanced against the harm done to those candidates who will be unable to use centre-based assessments and against Ofqual’s duties under the Equalities legislation, given that this may disproportionately affect disabled students (see pp.51-52 of the consultation document.) On balance, it may be better to leave this as an operational decision between exam boards and exam centres to allow exams in May and June, if possible, only for these students.


Equality impact assessment

Are there other potential equality impacts that we have not explored? What are they?

As previously noted, I am concerned over the impact of the proposed arrangements for some groups of students who may be differentially affected by the change in routine due to lockdown, e.g. those with Autistic Spectrum Conditions (ASC). In order to be as fair as possible to these students, I suggest that explicit guidance be given to centres emphasising that centres are free to disregard any dip in attainment since lockdown when coming up with their rank-order list, and again emphasising their duties under equalities legislation.

We would welcome your views on how any potential negative impacts on particular groups of students could be mitigated:

If Ofqual were to adopt a “grade range” approach, outlined above, then the quoted research into the reliability of predicted GCSE, AS and A-levels prior to 2015 could be used to inform the degree of uncertainty in the range, mitigating the impact on particular groups of students.

The Growth Mindset

Over the last 5-10 years, the Growth Mindset has become a very popular feature of many schools across England. I have seen it implemented in a couple of schools, and I’m also aware that its initiator, Carol Dweck, gave an interview a couple of years ago where she criticised some implementations as “false growth mindset”.

In order to learn a bit more about the original research conducted by Dweck, I decided over the holiday to read her early book, ‘Self-theories: Their role in motivation, personality, and development’, Psychology Press, 1999. I have no background in psychology and a very limited background in educational theory, but I still want to know how much I can get from this as a parent, as an educator, and as a member of a school board.

As notes to myself, and for others who may be interested, I’m reporting the main take-away messages I got from the book in this post. I do not question the validity of any claims – I am not knowledgeable enough to do so – and I’m also very conscious that I have not had time to follow up the references to read the primary research literature. Instead, I cite below the chapters of the book in which the references can be found, should blog readers be interested in following up more deeply.

Two Theories of Intelligence

Dweck defines the seeking of challenge, the value of effort, and persistence in the face of obstacles as ‘mastery-oriented approaches’. She aims to knock down several ‘commonly held’ beliefs about what fosters such approaches: they are not more common in students with high ability, they are not necessarily improved by success in tasks, they are not improved by praise of students’ intelligence, and they are not even typically associated with students who have a high confidence in their intelligence. So what are the best approaches to fostering such qualities?

Dweck contrasts two theories of intelligence, which I’ve heard referred to in schools as “the fixed mindset” and “the growth mindset”. In the original research in this book, she refers to these as “The Theory of Fixed Intelligence” / “The Entity Theory” and “The Theory of Malleable Intelligence” / “The Incremental Theory”. In an experimental setting, failure is reported to motivate some students and demotivate others, in an apparently fairly bimodal distribution (Chapter 2).

To my mind, what’s missing from this discussion is a shared understanding of what intelligence actually is (Dweck picks this up much later in Chapter 9, on IQ tests). Intelligence, to me, describes the ability to learn and think – this seems to be a qualitative rather than a quantitative property. We could, of course, talk about speed or depth or some other quantification, and I’m aware that there’s a huge volume of work on this topic, about which I know little (any pointers for good books on this?) A principled definition of intelligence seems relevant because while I think nobody would say that a person’s knowledge is fixed, there is clearly a difference of opinion over the ability to gain such knowledge and skills – do people differ solely in the rate of development of knowledge / skills, or in the maximum level of knowledge / skills, or something else? And if there are such limits on the rate of change today for Person X, will those limits be different in the future for the same person? If the rate of change can change, can the rate of change of the rate of change change? And so, ad infinitum. And should we even care? Chapter 9 discusses pupils’ own views, with Dweck suggesting that entity theorists associate intelligence with inherent capacity or potential, while incremental theorists associate intelligence with knowledge, skills and effort. This actually surprised me – it seems that the perspective of the incremental theorists makes the very concept of intelligence – as distinct from knowledge, skills, and effort, superfluous. But it also seems to be somewhat inconsistent, because in Chapter 11 we learn that incremental theorists tend not to judge their classmates’ intelligence based on their performance in school. Perhaps the incremental theorists just have a hazier conception of intelligence in the first place?

What’s clear is that Dweck has no truck with those claiming that Growth Mindset means that “everyone can be an Einstein if you put in the effort” – it’s just that she strongly argues that potential cannot be readily measured based on current attainment – that there may well be undiscovered Einsteins in bottom set classes. These are not the same thing at all.

The Impact of Theories of Intelligence

Dweck then goes on to show that students’ theories of intelligence impact their choice of goals, with students holding the entity theory more likely to chose performance goals, given an option. She shows this to be a causal link, via appropriately designed experiments to temporarily alter students’ theories of intelligence.

Dweck shows that the goals given to students impact on whether they react with a “helpless” or a “mastery” response, even for the same task. Students given a “performance goal” are much more likely to produce a helpless response than those given a “learning goal”. Performance goals are fairly ubiquitous in the English education system, as individual target grades shared with pupils. I wonder whether her observation carries forward into this setting?

Dweck argues that pupils holding an entity model can sabotage their own attainment – withholding effort so that if they do poorly, they can blame their own lack of effort whereas if they do well, they feel validated in their innate intelligence (Chapter 6).

In Chapter 12, Dweck discusses pupils’ views of the belief in the potential to change and improve, and the impact of intelligence models on this belief – which plays out unsurprisingly. I’m more interested in similar beliefs held by teaching staff and how / whether they impact on their practice (does anyone know of any studies on this topic?)

One area where I found the book less precise is whether students can simultaneously be “more of an entity-theorist” in some subjects and “more of an incremental-theorist” in others. Often this was dealt with as if these were universal theories, but my limited experience suggests that students may, for example, hold largely incremental theories in sport while largely entity theories in maths. (Again, anyone know of studies on this topic?)

Changing Theories of Intelligence

So how do we change mindsets? One method Dweck refers to throughout, is to actually teach pupils about theories of intelligence. Another is to focus on the type of praise given: to emphasise an incremental model, praise successful strategies used on tasks they’ve clearly found challenging; quick correct answers should be responded to with apologies for wasting their time, and by setting more appropriate and challenging problems. This is subtly different advice to “praising only effort”, an approach I’ve seen some schools adopting when trying to apply the growth mindset. The best approach seems to be to ensure that challenge level is appropriate for each pupil, ensuring alignment between effort and outcome. Unfortunately, many primary schools in England are running in directly the opposite direction at the moment (see my blog post here); I do wonder what impact this is likely to have on the mindset of high-attaining pupils in the English education system.

In Chapter 15, Dweck looks at the kind of criticism and praise that reinforces these differing views. Criticism suggesting alternatives, e.g. “You’ve not quite done that completely. Maybe you should think of another way,” caused a reinforcement of incremental theories, whereas criticisms of the individual, e.g. “I’m disappointed in you”, tended to emphasise entity theories. More strikingly, Dweck argues strongly that positive praise targeted at inherent traits, e.g. “you’re smart!”, “you’re very good at this” or “I’m proud of you” can reinforce the entity theory, whereas praise such as “you’ve found a great way to do that – can you think of any other ways?” reinforces the incremental theory. While the former type of praise is definitely well received, and gives a temporary boost, Dweck argues that it sets pupils up for failure when they encounter difficulties and draw the inverse conclusion – “if I’ve not been successful, then I’m not smart, and you’re not proud of me”.

Finally, we only need to consider changing mindsets after mindsets are embedded. Dweck spends some space (Chapter 14) on arguing that the helpless-/mastery- dichotomy in responses is present even in 3.5-year-olds (where she associates this with a ‘theory of badness’ held by the children, rather than a ‘theory of intelligence’) so the mindset issue seems to be an issue for all phases of education.


Praise and Criticism. Students receive criticism and praise throughout their learning journey, and trying to change verbal feedback through training of staff is one thing to look at. However, it strikes me that one formalised arena for feedback, shared across parents, children and teachers, is in written “reports home”. I suspect it would be relatively easy to survey these reports for the type of language used, and compare this against the evidence Dweck presents on forms of criticism and praise. I’d be very interested in any schools that may have tried to survey or manage report language to align it with growth mindset principles. This also extends to grades: following Dweck’s results in Chapter 16 on “process praise”, it would seem far better to send home a report saying “worked on some great methods for X” rather than “Grade B”, or “could try alternative strategies for staying focussed” rather than “Grade C”.

Elective Remedial (Catch-up) Classes. Another interesting implication for schools and universities alike is the use of elective remedial classes. Several of Dweck’s studies seem to show that for those pupils who hold an entity theory of intelligence, it’s precisely those pupils who don’t need the remedial classes who are happy to attend them. Institutions should think about how to get around this problem.

School Transitions. There are implications for managing the transition from primary to secondary school, revealed by Dweck’s study of grade school to junior-high transition in the US; perhaps secondaries – jointly with primaries, even – could explicitly teach about theories of intelligence as part of the induction process, like the study at UC Berkeley reported in Chapter 5. I wonder whether any secondaries have tried this?

Mental Health. Mental health in educational settings is a hot topic at the moment. Given Dweck’s theories about self-esteem and its link to mindset, can recent work of schools and universities on mental health be improved by engaging with these ideas? For example, can mental health issues be avoided by trying to foster a growth mindset, and has any significant evidence been collected in this regard?

Grouping by attainment. I have seen many discussions of Growth Mindset that have suggested that grouping pupils by attainment runs counter to the principles outlined here. But interestingly, this is not what Dweck says (Chapter 17). She says that within the entity framework, this might be true, but attainment grouping within the incremental framework is not inherently problematic – it’s just an acknowledgement of fact. I would note that such groups are often referred to in education as “ability groups” rather than “attainment groups” – perhaps reflective of the entity theory. This issue potentially becomes even more acute when considering streaming and/or selective entry testing.

Gifted and Talented Programmes. There appear to be several implications for gifted and talented programmes (G&T) in schools (Dweck deals explicitly with this in Chapter 16, but does not draw out all the conclusions). Firstly, and essentially, we need to ensure all students are challenged, or they will not experience difficulty and effort; at the high-attaining end, this may or may not come from a G&T programme, depending on the pupil and the school approach to differentiation, but it cannot be absent. Secondly, perhaps the name G&T is problematic – Dweck herself says that “the term ‘gifted’ conjures up an entity theory,” and it’s not hard to imagine children in G&T programmes worrying more about losing G&T status than improving their knowledge and skills.

Teacher Mindsets. Although it would seem natural for teachers to have an incremental theory / growth mindset, my observations suggest this is not always the case. I wonder whether any schools have undertaken studies of their own teaching staff in this regard – this could be very interesting.

Beyond Intelligence

Chapter 10 shows that very similar observations apply to personal and social relationships, and Chapter 13 argues that theories of intelligence are also closely associated with the formation of stereotypes. Chapter 17 describes a link with self-esteem, and suggests that parents and teachers alike can model feeling good about effortful tasks, as a route to self-esteem within the incremental model. and that entity models are correlated with depression and anxiety (Chapter 7).

Overall, this book has given me plenty to think about as a parent, and a fair bit to think about as an educator too. I’d be really interested in hearing people’s suggestions for more reading on the topics above, especially if any of the studies I suggest above have already been done in the psychology or education literature.

Readers who enjoyed this post might be interested in my other educational posts.

Teaching Curriculum Beyond Year Group

I have lost track of the number of times that I’ve been told by parents of primary-age children in England that schools are claiming that they are “not allowed” to teach content beyond that set out for the child’s year group in the English National Curriculum, ever since the curriculum reforms in 2014.

This myth seems to be so embedded that I have heard it myself from numerous headteachers and teaching staff.

Instead of spending the time explaining the actual situation afresh each time I am asked, I have instead put it down as this brief explanatory blog post. I hope people find it helpful.

Firstly, different schools will have different policies. It may be school policy to do / not to do something with the curriculum, but this is determined by the school alone, acting in line with the statutory framework. For academies, the statutory framework is typically minimal. Maintained schools must follow the statutory National Curriculum, and – in practice – every academy I’ve come across also abides by these regulations.

Presumably, the myth started because the National Curriculum Programmes of Study [Maths, English] are set out as expectations by year group. However, the programmes very clearly state:

“Within each key stage, schools therefore have the flexibility to introduce content earlier or later than set out in the programme of study.”

“schools can introduce key stage content during an earlier key stage, if appropriate.”

(see Section “School Curriculum” in either the Maths or the English Programme of Study.)

This must be read in the context of the broader thrust of the programmes, which state:

The expectation is that the majority of pupils will move through the programmes of study at broadly the same pace. However, decisions about when to progress should always be based on the security of pupils’ understanding and their readiness to progress to the next stage. Pupils who grasp concepts rapidly should be challenged through being offered rich and sophisticated problems before any acceleration through new content. Those who are not sufficiently fluent with earlier material should consolidate their understanding, including through additional practice, before moving on.

So, put simply, schools can certainly teach children content above their year group. But only if they’re ready for it. Common sense, really.

If you really want to know more about my views on education, then please click on the “Education” link on this blog post to find related posts.