# Schools Bill 2022

On the 11th May 2022 the Government published its Schools Bill, providing the legislation required to implement several of the ideas explained in their earlier White Paper which I analysed in a previous blog post. On the 12th May the Bill was followed up with a raft of explanatory policy statements. I have been going through all these documents, and summarise here points of key importance or that stood out for me.

It is worth noting at the outset that the policy statements go well beyond what’s in the Bill, as they set out plans for intended standards to be set in the future, with the Bill simply enforcing these standards as they arise. As such, the Bill provides for quite sweeping powers for the Secretary of State to amend these standards and duties over time without the requirement for additional legislation.

#### 1.1. Intervention in Existing Academies

Part 1 of the Bill deals with changes to the landscape of Academies in England. I would characterise the scope here as two-fold: firstly, the aim is to move many existing powers from contractual funding agreements into primary legislation, and secondly to be able to apply these powers at the Trust level (‘proprietor’ in the Bill) rather than only at individual Academy level.

Currently, the DfE’s powers of intervention depend on the version of the Funding Agreement and Articles of Association used by a particular academy, which depends on when they converted to Academy status – a patchwork which is both inequitable and very hard to manage. The Bill would place all Academies under the same legislative framework, which is to be welcomed.

Notices to Improve, an existing mechanism (formerly known as ‘Financial Notices to Improve’) seem to be envisaged as the main approach for ensuring change at a Trust or Academy without using the ‘nuclear’ Termination option. It’s proposed that these notices can be issued under wider circumstances than currently, e.g. “[if the SoS is satisfied] that there are significant weaknesses in the proprietor’s governance procedures” Part 1 6(1)(b). I think on balance this is a good idea: it is much better for the DfE to be able to use appropriate levers to effect change, especially in cases of inadequate governance, rather than have to fall back on Termination as the only route.

Along the same lines, Part 1 Section 7 includes powers to appoint or require appointment of directors to an Academy either due to failure to act on a Notice to Improve, a serious breakdown in governance procedures, or for safeguarding reasons. This happens by directing the Trust to appoint a named person or, if the DfE prefers, a person with named skills and experience. The accompanying briefing paper discusses using this power to create Interim Trust Boards, not unlike existing Interim Executive Boards for maintained schools causing concern. It’s interesting that similar provisions existed at academy level in early versions of the model Articles of Association of academies, see e.g. Article 62 allowing the appointment of ‘Additional Governors’ for very similar reasons, and the consequent Article 64 requiring existing governors to stand down in these circumstances. These no longer exist in the current model Articles, and I have always thought this odd. I am pleased to see these intervention powers returning, and perhaps more importantly all schools placed on an equal footing in this regard.

The main new power over LA maintained schools is the power for LAs to apply for an Academy Order via a new Section 3A of the Academies Act 2010. This appears as Part 1 29(2) of the Bill. This seems to be the way the Government is implementing its stated aim to allow LAs to establish MATs (see my analysis of the White Paper for more detail). In areas where the LA is aligned with Government plans, we can expect to see beginning of the end of maintained schools over the 2023/24 school year, I expect. However, there is no obligation for LAs to apply for academy orders, so I believe we will still see a mixed economy of MATs, maintained schools, and isolated SATs well into the future, but perhaps with more regional variation than currently.

#### 1.3. Grammar Schools

Grammar schools and schools with religious character get significant emphasis in the Bill. At least for grammar schools, I read this as primarily a desire to reassure them that the only way they will lose their ability to select is through a parental ballot, not through one of the sweeping powers introduced in the rest of the Bill: essentially the Secretary of State appears to want to explicitly limit his own powers in this regard without bringing in further primary legislation. This makes sense in the context of the White Paper, as a way to bring more grammar schools into MATs.

#### 1.4. Not in the Bill

There are a few things I noted previously as mentioned in the White Paper that the Bill gives no real insight into, such as:

• the requirement to (re)broker schools into strong MATs after two consecutive Ofsted reports below Good
• the setting up of an “arms length” curriculum body
• mechanisms for local governance (though this is mentioned in the explanatory briefing)
• MAT inspection
• collaboration standards for MATs

There is also talk in the accompanying briefings of improvements to complaint handling by academies, another issue I have commented on before and where I agree that significant improvements need to be made.

I suspect the broad powers introduced by the Bill are likely to be used as an “Enabling Act” to introduce these various points via the school standards provisions outlined in Part 1 1(2) of the Bill. We are told that the Academy Trust Standards will come to parliament separately (see explanatory briefing), and I look forward to reading these.

### 2. The Rest of the Bill

The rest of the Bill deals with school funding, school attendance, independent educational institutions, and various miscellaneous items.

I was involved in determining the school funding formula for Essex LA for many years, and I don’t read this portion of the Bill as particularly radical. Long-term readers of my blog will know that I’m not very happy with the approach taken to the National Funding Formula (see, for example, my 2017 consultation response and the links to earlier posts going back some years therein). But the new content in this Bill is essentially simply about making the default funding position the “direct” (previously known as “hard”) National Funding Formula, with the Secretary of State authorising exceptions (presumably because they recognise that actually one size really doesn’t fit all!). I’m pleased to see a continued role for Schools Forums in the proposed legislation, including around de-delegation for maintained schools, in Part 2 (3)(a).

The school attendance provisions introduce a duty on LAs to maintain a register of children not in school and to exchange information between LAs when children on the register move, as well as various provisions to help clamp down on illegal schools.

## Conclusion

From my perspective, the meat of the Bill is Part 1 on Academies, standards, and interventions. I am cautiously optimistic about the measures proposed to intervene in existing academies. In particular, I welcome the move away from high stakes “hands off” versus “terminate” interventions. Whether a particular intervention is best made at Trust or Academy level is worthy of considerable debate, and I think practice is likely to evolve here over time, but it appears to me that if the Government really wants MATs to take active responsibility for their Academies, as discussed in the White Paper, the balance will move toward Trust-level intervention and towards softer forms of intervention.

I am less comfortable with the proposed amendments to the Academies Act 2010 to allow LAs to issue Academy Orders. There appears to be no mechanism to provide for the LAs to decide which MATs academised schools will end up in following one of these orders. While there is no requirement for LAs to issue Academy Orders to all their maintained schools, I do wonder what will happen in those areas which keep significant maintained schools: how will the Government achieve their stated aim of full academisation here? Future legislation, or increasingly constraining LAs ability to adequately maintain their schools via funding or other levers?

Finally, in order to make the intervention measures work well in practice, good resourcing of Regional Director (currently Regional Schools Commissioner) offices will be absolutely essential. It therefore somewhat unfortunate that these papers arrived in the same week as a drive to cut civil service jobs.

# Deciding Theories

Every once in a while I start modelling a problem using a logical formalism and need to remind myself of the decidability of various first-order theories and the complexity results for those that are decidable. Inevitably, I end up spending a long time looking back over literature I once read, with results sprinkled in different places.

This post summarises some of this information in one place for me or others to find in the future, with a skew towards those theories of interest for arithmetic. The information here is primarily taken from Bradley and Manna supplemented with various other sources linked. The usual definitions of $\Omega$ and $O$ are used.

Notes on the table:

1. $T_{\mathrm{E}}$ is as defined in Bradley and Manna, that is it has a signature consisting of $=$ (equality) and all constant, function and predicate symbols, a.k.a. ‘equality with uninterpreted functions’. Reflexivity, symmetry, transitivity, function congruence and predicate congruence are axioms for $=$, but all other functions and predicates are uninterpreted (except w.r.t. these congruence and predicate axioms). Note that this is not the theory of pure equality, for which the full theory is decidable and admits a weak form of quantifier elimination (pure equality doesn’t have the functions or predicates, see Bradley and Manna Section 10.4 for the definition of weak quantifier elimination).
2. Presburger arithmetic as described on Wikipedia does not admit quantifier elimination (counter-example: $\exists x. 2x = y$). However, adding an additional countable set of predicates capturing divisibility (one per divisor) together with an appropriate axiom leads to a version admitting quantifier elimination as per this table.
3. $T_{\mathbb{R}}$ is here taken to have a signature of $\{0, 1, +, -, \times, =, \geq\}$ (with $-$ unary) and axioms corresponding to those of a real closed field (theory of reals in SMT-LIB).
4. $T_{\mathbb{Q}}$ is here taken to have the signature $\{0, 1, +, -, = \geq\}$ (again with $-$ unary). Its axioms are those of an ordered torsion-free abelian group, together with an additional axiom schema asserting divisibility: $\forall x. \exists y. x = ny$ for every positive integer $n$.
5. Using the notation of Bradley and Manna, the theory of extensional arrays is designed to capture array data structures. It has the signature $\{\cdot[\cdot], \cdot\langle\cdot\vartriangleleft\cdot\rangle, =\}$, with the first two symbols denoting a binary and ternary function (respectively) for accessing and modifying array elements; arrays are immutable and so the ternary operator returns the modified array. ‘Extensional’ here denotes that there is an an axiom capturing that arrays are equal iff all their elements are equal in all places. (Theory of ArraysEx theory in SMT-LIB).

A decision procedure for the union of quantifier-free fragments can be obtained by combining the decision procedures for the individual fragments, via the Nelsen-Oppen method, under the following conditions:

1. Their signatures only share equality.
2. Their theories must (individually) be stably infinite, i.e. every $T$-satisifiable quantifier-free formula is satisfied by a $T$-interpretation with a domain of infinite cardinality.

If deciding each individual theory is in NP, then deciding the combination theory is also in NP.

While investigating the quantifier elimination column of the table above, I came across the Ultimate Eliminator tool which looks like great fun.

Please let me know if you spot any errors or significant omissions that may be of interest to readers of this blog.

# Schools White Paper 2022

Today saw the launch of the Department for Education’s long-awaited schools white paper “Opportunity for all: strong schools with great teachers for your child”, alongside some accompanying documents which are well worth reading, especially “The Case for a Fully Trust-Led System”. This post collects some initial thoughts on the content published today. I will focus here on the specific future plans laid out, what they’re trying to achieve, and some elements of what’s missing. I will not engage deeply with areas I know little about such as teacher training, and my reading will of course be biased to my interests in school improvement.

## A Fully Trust-Led System by 2030

The paper sets out the Government’s plan for all schools to be in multi-academy trusts within a single regulatory system by 2030 and makes clear that LAs will be allowed to establish MATs as part of this process (a sensible idea, if we’re going down the all-MAT route). I am slightly worried by the wording that LAs will be able to establish MATs “where too few strong trusts exist” – I don’t think special conditions should be placed on this ability. A small capacity fund of £86m has been put aside to help expand MATs – it is not clear from the paper whether LAs can bid for access to this fund too, or how this budget was arrived at.

The white paper calls for a clear role for every part of the school system (LAs, MATs, ESFA, DfE, etc.) but is rather unclear on what it sees these roles as being. I blogged about my own views on this last month (here and here), and Sam Freedman gives his view in an Institute for Government report also last month.

The paper begins to flesh out the government’s idea of what a good MAT looks like. Firstly, on size: “We know that trusts typically start to develop central capacity when they have more than 10 schools … We expect that most trusts will be on a trajectory to either serve a minimum of 7,500 pupils or run at least 10 schools.” They also provide a list of features of a strong trust, as previously discussed in the CST’s useful discussion paper setting out their view.

I welcome the moves to address the isolation of some schools, especially those stuck in a single academy trust model: “Many of our best schools operate alone, and not enough attention has been paid to harnessing the expertise already in the system”. I blogged about some problems with SATs in one of my February posts and last year the government published views of SATs who had joined MATs, citing improved governance, leadership, sense of direction, and ability to direct financial resources where they are needed.

So how will all schools end up in MATs? Clearly the government would like them to choose to join. For those that don’t and are graded inadequate by Ofsted, there is already the expectation in the RSC guidance that they will be (re-)brokered into strong MATs. It seems that these powers are likely to be strengthened: “We will shortly be consulting on moving schools that have received two consecutive below ‘Good’ judgements from Ofsted into strong trusts to tackle underperformance.”

Whatever our views on the relative merits of MATs, the government’s reports make “interesting” use of statistics: “If all children did as well as pupils in a trust performing at the 90th percentile, national performance at key stage 2 would be 14 percentage points higher and 19 percentage points higher for disadvantaged pupils”. Here, “higher” seems to refer to a baseline of the national average rather than of the 90th percentile of LA maintained schools (say) – quite a bizarre use of statistics.

## Curriculum and Assessment

The government is proposing to create a new arms-length curriculum body, building on Oak National Academy. This could be of value, but could also see a de-professionalisation of teachers: the devil will be in the implementation. I for one would love to see university engagement with this body, and will try to engage enthusiastically if possible.

I’m disappointed that “we will maintain our current system of primary assessment and world-class GCSEs and A levels”. If there’s some things that the pandemic has brought into sharp relief, they certainly include some of the more unsuitable aspects of our exam system, its lack of robustness to disruption, its reliance on norm referencing and on a huge number of GCSE grades that are of limited reliability even in the best of times.

Much is made of a “parent pledge” that schools will provide evidence-based support “if your child falls behind in English or maths”. But behind in what sense? In attainment compared to year expectations (in which case how does this address the claim to support ‘the most able’) or in progress — but if so, compared to what benchmark? And how will this be identified in any uniform way across schools? This will, apparently, be via “robust assessment” — further guidance will be issued in partnership with Ofsted in due course.

## Objectives

The DfE hopes to increase to 90% the proportion of children achieving expected standard in reading, writing and mathematics at primary, and to increase the national average GCSE grade in English and maths to grade 5, both by 2030. These are consistent with “Mission 5” outlined in the previously-published Levelling-up the UK technical annex. As they note in that annex, it’s important to try to understand the level of ambition embedded in this objective. Unfortunately, despite the annex including a section “Is this mission ambitious, specific and achievable?” the section doesn’t actually provide an argument that it is at an appropriate level of ambition. The best I could find in the white paper is a citation to Slater et al. to argue that significant GCSE grade improvements can come from good teachers, but the same paper also says:

We now finally explore whether any of the few observable teacher characteristics that we have are correlated with estimated teaching effectiveness: gender, age, experience and education…. In fact, none of these variables play any statistically significant role in explaining teacher effectiveness, other than very low levels of experience showing a negative effect.

Helen Slater, Neil M. Davies, Simon Burgess, Do Teachers Matter? Measuring the Variation in Teacher Effectiveness in England

In other words, students gain significantly from good teachers, but spotting good teachers before the event — rather than as a backward-looking explanation — is hard to do. So this seems to support the idea that retention of good teachers and a focus on quality teacher CPD and mentoring is key.

## What’s Missing

Quite a lot of information appears to be ‘to follow’. Some examples:

• On governance. “So that trusts continue to be responsive to parents and local communities, all trusts should have local governance arrangements for their schools. We will discuss how to implement this with the sector.”
• On MAT inspection. “We will launch a regulatory review in May 2022 looking at accountability and regulation – including how we will hold trusts to account through inspection in the future.”
• On achieving full academisation. “We … will engage with the sector on how best to achieve a fully trust led system.”
• On new collaboration expectations. “we will introduce a new collaborative standard – one of the new statutory academy trust standards – requiring that trusts work constructively with each other, their local authorities and the wider public and third sectors. We will engage with the sector, through the wider regulatory review, as we develop the detail.”

In addition, there are several other aspects that I had hoped would be better addressed within the white paper:

• Teacher retention. There is a focus on mechanisms to attract new teachers, e.g. with the previously-announced £30,000 starting salary, but very little to retain existing good teachers. I believe this is short-sighted.
• Teacher time. Funding for greater time ‘off timetable’ to engage in the all-important CPD, peer observation, lesson study, etc. has the potential to retain excellent staff as well as improve practice.
• Capital. We know that many of our schools are in bad shape physically. Now we also know the huge impact that fresh air can have on the health of our staff and students alike, I would like to see a one-off significant cash injection to bring all our schools up to the high standard of ventilation we should expect for a healthy future for our school communities.

I will be following future developments with interest.

# MATs, SATs and Maintained Schools

This post forms the second in a short series of blog posts about the current governance structure of schools in England, ahead of the Government’s expected while paper this spring. The previous post was about the review of the Education and Skills Funding Agency. In this post I focus on the landscape of school structures: multi-academy trusts (MATs), single-academy trusts (SATs) and local authority (LA) maintained schools from the perspective of a governor / trustee.

I will draw heavily on Sam Freedman’s report this month released by the Institute for Government as well as the excellent discussion in the Confederation of School Trusts’ report “What is a Strong Trust?“. One does not have to completely agree with Freedman’s or the CST’s perspectives to find both reports clear, well-argued, and a very useful step towards a greater debate in the education sector.

Freedman notes what is obvious to anyone who has worked in or with the sector – that the current dual system of maintained schools and academies leads to significant duplication and inefficiency. He notes that the regulatory system for academies is ‘incoherent’, largely as a result of the ESFA/DfE split I dealt with in my last blog post. He mentions – quite rightly – that LAs’ statutory responsibility for SEND and certain safeguarding requirements further complicates the picture when they have no effective oversight or intervention powers over academy trusts.

My experience is that the rapid expansion of the academies programme after the Academies Act 2010 was mismanaged. We have been left with a patchwork of schools and a web of contractual agreements. Schools which converted early have often been left with legacy articles of association based on the early model articles which demonstrated little insight into how schools could evolve under poor governance (modern model articles are much improved, though not perfect). Regulators have been left with very limited powers to improve matters, and it should not have come as a surprise to government that – as Freedman states – “[By 2012/3] with things moving so fast it quickly became apparent that the Department for Education (DfE) was becoming overwhelmed and could not properly oversee that many schools.” The unfortunate reality is that many schools are still stuck with the legacy of poor decisions made during this period.

## Three School Structures

There are currently three main models for schools in England: those which are part of a multi-academy trust (MAT), those which are a single-academy trust (SAT) and those maintained by local authorities. Often in these discussions the first two types of school are lumped together, and it becomes a discussion about academies versus non-academies, but I think these three situations deserve distinct analysis. In particular, oversight of an individual school’s performance in the maintained sector and in the MAT sector is, in my experience, stronger than in the SAT sector, which is the outlier in terms of sufficient oversight. I wonder how many SATs are maintaining an Ofsted grade above inadequate, and therefore not subject to intervention, but are nevertheless not performing at the level they might be if they had closer interaction with other schools.

It is clear to anyone who has worked with or for the education sector that schools benefit immensely from working together and supporting each other, and I agree with Leora Cruddas’s argument made at a governance meeting I attended last year that, in times of difficulty for a school, a compulsion to work with other schools is important. At the moment, this primarily comes through membership of a strong multi-academy trust, though I do not see why a strong, well-resourced and empowered Local Authority could not form an equally viable mechanism to drive school improvement.

## Pragmatics: Control and Intervention

Unsurprisingly, Freedman’s paper seeks to advise Zahawi on an appropriate way forward to a more coherent fully academised education system, and without entering the discussion over whether academy trusts are the best structure to deliver school education in the first place, it is worth engaging with the recommendations he makes.

Freedman sees the future of LAs – as per the 2016 white paper – as ensuring that every child has a school place, ensuring the needs of vulnerable children are met, and acting as champions for all parents and families, and – quite reasonably – proposes greater powers for LAs to ensure they can actually fulfil these objectives.

There are some proposals made in the paper that I would fully support:

• Setting a transparent framework of expectations for MATs and giving the regulator powers to intervene, close or merge MATs for either financial/compliance reasons or educational reasons, not only tied to Ofsted grades.
• Ensuring that MATs take ownership of educational improvement and are not simply back-office consolidation bodies as is sometimes the case currently.
• A single regulator for academies, ideally organised as an arm’s length body.

There are also some proposals that are less clear cut for me:

• Giving LAs the power to direct a change in individual academy PANs and admissions policies. Let’s assume that we move to an “all-MAT” system with LA’s still retaining the statutory duty to ensure a place for every pupil. To ensure clear lines of accountability, it seems appropriate for these to be at MAT level not individual academy level: surely mechanisms can still be put in place for intervention at MAT level to ensure they play their part in the strategic place-planning of LAs, rather than micromanaging a MAT’s academies over the head of its trustee board?
• Moving SEND and exclusions policy monitoring / appeals from ESFA to LAs. I agree that ESFA is an odd place for this to sit at the moment in terms of ensuring joined up working between the RSC offices, ESFA and the Local Authority. But moving this to LAs rather than to the DfE again seems to introduce dual lines of accountability for MATs; might it not be better for RSCs to be required to ensure MATs meet the LA’s planning needs?
• Giving individual academies a mechanism to argue to ‘break free’ from a MAT, involving giving schools legal status independent from the MAT. I agree that there may be very good reasons for academies to want to move to another MAT if the MAT is not functioning well, however under an all-MAT system it seems that a more appropriate approach is to provide the regulator with powers to intervene at MAT level than to provide independent legal status to individual academies.

There is an important question of democratic control, which I believe is required to balance some of these suggestions. In particular: who gets to appoint trustees (and members) of an academy trust, and what geographical area is it reasonable for a MAT to cover? On the first point, in the early days of academisation, academies needed to have some staff trustees, some (elected) parent trustees and a trustee appointed by the Local Authority. The Secretary of State was empowered to change the composition under specific conditions laid out in the academy’s funding agreement / articles of association. Government views on this composition has changed over time, with parent trustees going out of and then back into fashion, while staff trustees are definitely out of fashion at the moment. Local Authorities no longer get to appoint trustees at all in recent model articles. The situation locally will vary from trust to trust, depending on when their articles were approved — differences that cannot be justified, in my view. I would suggest that trusts articles are updated and that the ability (though not the requirement) for local authorities and the DfE (via RSC offices) to appoint trustees is included in the new model. This would provide LAs and the DfE direct information on trusts, rather than having to rely on existing trust boards to provide accurate information, in addition to providing a powerful mechanism for spreading best practice across trusts.

There is a huge opportunity for development of the schools sector in England. I look forward to publication of the white paper!

#### Appendix: Minor Quibbles

It’s probably worth pointing out a couple of very minor inaccuracies in the Institute for Government report:

• Financial Notices to Improve, mentioned in the report, no longer exist since September 2021, precisely in recognition of the broader ESFA role currently; they are now subsumed within the broader “Notice to Improve” title.
• A few times in the report, the ‘Special Measures’ category is cited as giving the Regional Schools Commissioners power to re-broker academies. While there may be additional powers – depend on the trust’s funding agreement – under a Special Measures Ofsted category, it’s clear in the Schools Causing Concern guidance that RSCs have the power to re-broker any inadequate school, i.e. also those judged to have ‘Serious Weaknesses’.

# Restructuring Education

In the run-up to the Government’s planned white paper on education, I hope to be publishing a few brief blog posts on the landscape of education leadership and management in England. This post focuses on the summary findings from the ESFA review led by Sir David Bell and published this week.

Summary: this review is very welcome and I am supportive of all the key recommendations.

My perspective is one of governance: I have been a long-term member of Essex Schools Forum, first as an elected representative of maintained primary schools, then as an elected representative of secondary academies. Throughout this time (and beyond) I have been involved in helping Essex shape its local funding formula and respond to the national funding formula. I have governance experience both in the maintained sector and in a single academy trust, and work with a new post-16 multi-academy trust through my workplace.

From the perspective of accountability of academy trusts, I think such a review has long been overdue, in particular over clarity of roles between the offices of the regional schools commissioners and the lines of accountability through academy funding agreements.

The findings suggest retaining ESFA’s funding delivery function as an Arms Length Body (ALB) while moving a considerable number of responsibilities to the DfE. This seems sensible. In particular, from a governance perspective, I wholeheartedly endorse the finding that “The Regional School Commissioners (RSCs) and ESFA work together to provide oversight of the school system; the RSCs focus on educational performance, ESFA on financial management, with both contributing to governance. This sometimes creates points of friction internally and a lack of clarity externally”. The proposal, to move all governance oversight not required to be at ESFA to the new regional DfE structures, also seems entirely reasonable. The review also recommends clarifying the relationship between Ofsted, the DfE and ESFA – my experience of this is that there is already clarity over the different roles of Ofsted versus the DfE and ESFA, although admittedly this knowledge is not widespread, even amongst school leaders.

In line with this move to the DfE, the proposal to move ownership of the Academy Trust Handbook to the DfE (unless scaled back to a purely financial management document) is also to be welcomed by governors and trustees.

The final sensible proposal I would highlight from the review aims to achieve greater alignment in dealing with complaints between the maintained and academy sector. As part of this process, I would urge the DfE to consider mandating a more uniform complaints policy at trust level for academies: although the model DfE policies are entirely reasonable, they are not statutory, and academies minimally complying with legislation set out in the Education (Independent School Standards)(England) Regulations 2014 essentially force complaints to be dropped or escalated to Ofsted or ESFA which could be dealt with at trust level under better procedures.

Of course there are bigger questions relating to the role of multi-academy trusts and local authorities and their interaction with the department for education, and I hope to cover some of these issues in future blog posts. But within the confines of our current system, these reforms seem very much worthwhile.

# Easter Coq

This Easter I set myself a little challenge to learn a little bit of Coq – enough to construct a proof of a simple but useful theorem in computer arithmetic. Long-time readers of this blog will know that this is not my first outing with dependent types, though I’ve never used them in anger. Four years ago – also during the Easter break! – I read Stump‘s book on Agda and spent some time playing with proofs and programming, as I documented here.

This blog post documents some of the interesting things in Coq I observed over the last few days. I’ve decided to write the majority of this post in Coq itself, below, before finishing off with some concluding remarks. In this way, anyone really interested can step through the definitions and proofs themselves.

(*
* A first datapath identity
* George A. Constantinides, 2/4/21
*
* This is an attempt to learn some basic Coq by proving a standard identity used in computer arithmetic,
* namely \bar{x} + 1 = \bar{x – 1}.
*
* This identity is useful because it allows Boolean operations to move through arithmetic operations.
*
* The code is for learning and teaching purposes only. It is not intended to be an efficient or elegant
* approach, nor is it intended to make best use of existing Coq libraries. On the contrary, I have often used
* many steps when one would do, so we can step through execution and see how it works.
*)

Require Import Coq.Program.Equality.
Require Import Coq.Logic.Eqdep_dec.
Require Import Coq.Arith.Peano_dec.

(* Create my own bitvector type. It has a length, passed as a nat, and consists of bools. *)
Inductive bv : nat -> Set :=
| nilbv : bv 0
| consbv : forall n : nat, bool -> bv n -> bv (S n).

(* Head and tail of a bitvector, with implicit length arguments *)
Definition hdi {n : nat} (xs : bv (S n)) :=
match xs with
end.

Definition tli {n : nat} (xs : bv (S n)) :=
match xs with
| consbv _ _ tail => tail
end.

(* The basic carry and sum functions of a Boolean full adder *)
Definition carryfunc (a : bool) (b: bool) (c: bool) : bool :=
orb (orb (andb a b) (andb a c)) (andb b c).

Definition sumfunc (a : bool) (b : bool) (c : bool) : bool :=
xorb (xorb a b) c.

(*
* A ripple carry adder, with implicit length argument
* Note that this definition makes use of a trick known as the ‘convoy pattern’ [1]
* to get the dependent typing to work in a match clause. We use a ‘return’ clause
* to make the type of the match result be a function which is then applied to the unmatched
* argument. In this way the type system can understand that x and y have the same dependent type.
* Note also the use of Fixpoint for a recursive definition.
*)

Fixpoint rcai {n : nat} (x : bv n) (y : bv n) (cin : bool) : (bool * bv n) :=
match x in bv n return bv n -> ( bool * bv n ) with
| nilbv => fun _ => (cin, nilbv) (* an empty adder passes its carry in to its carry out *)
| consbv n1 xh xt => fun y1 =>
let (cout, sumout) := rcai xt (tli y1) (carryfunc cin xh (hdi y1)) in
(cout, consbv n1 (sumfunc cin xh (hdi y1)) sumout)
end y.

(* We define addition modulo 2^n by throwing away the carry out, using snd, and then define an infix operator *)
Definition moduloadder {n : nat} (x : bv n) (y : bv n) : (bv n) :=
snd (rcai x y false).

(* Bitwise negation of a word *)
Fixpoint neg {n : nat} (x : bv n) : (bv n) :=
match x with
| nilbv => nilbv
| consbv n1 xh xt => consbv n1 (negb xh) (neg xt)
end.

(* The word-level constant zero made of n zeros *)
Fixpoint bvzero {n : nat} : (bv n) :=
match n with
| O => nilbv
| (S n1) => consbv n1 false bvzero
end.

(* The word-level constant one with n leading zeros *)
Definition bvone {n : nat} :=
consbv n true bvzero.

(* Additive inverse of a word, defined as ‘negate all the bits and add one’  *)
Definition addinv {n : nat} (x : bv (S n)) : (bv (S n)) :=
neg(x) + bvone.

(* Subtraction modulo 2^n is defined as addition with the additive inverse and given its own infix operator *)
Definition modulosub {n : nat} (x : bv (S n)) (y : bv (S n)) :=

Infix “-” := modulosub.

(* a bit vector of just ones *)
Fixpoint ones {n : nat} : (bv n) :=
match n with
| O => nilbv
| S n1 => consbv n1 true ones
end.

(* OK, now we have some definitions, let’s prove some theorems! *)

(* Our first lemma (‘Lemma’ versus ‘Theorem’ has no language significance in Coq) says that inverting a
* bitvector of ones gives us a bitvector of zeros.
* There’s a couple of interesting points to note even in this simple proof by induction:
* 1. I had to use ‘dependent destruction’,
*    which is defined in the Coq.Program.Equality library, to get the destruction of variable x to take into account
*    the length of the bitvector.
* 2. The second use of inversion here didn’t get me what I wanted / expected, again due to dependent typing, for
*    reasons I found explained in [2]. The solution was to use a theorem inj_pair_eq_dec, defined in
*    Coq.Logic.Eqdep_dec. This left me needing to prove that equality on the naturals is decidable. Thankfully,
*    Coq.Arith.Peano_dec has done that.
*)

Lemma invertzeros : forall {n : nat} (x : bv n),
x = bvzero -> neg x = ones.
Proof.
intros n x H.
induction n.
dependent destruction x.
auto. (* base case proved *)
dependent destruction x.
simpl.
f_equal.
simpl bvzero in H.

inversion H.
reflexivity.

simpl bvzero in H.
inversion H. (* inversion with dependent type starts here…          *)
apply inj_pair2_eq_dec in H2. (* goes via this theorem                                 *)
2: apply eq_nat_dec. (* and completes via a proof of decidability of equality *)

apply IHn.
apply H2.
Qed.

(*
* The next lemma says that if you fix one input to a ripple carry adder to zero and feed in the carry-in as zero
* too, then the carry out will not be asserted and the sum will just equal the fixed input.
* I proved this by induction, reasoning by case on the possible Boolean values of the LSB.
* The wrinkle to notice here is that I didn’t know how to deal with a ‘let’ clause, but thanks to Yann Herklotz
* (https://yannherklotz.com) who came to my aid by explaining that a ‘let’ is syntactic sugar for a match.
*)

Lemma rcai_zero: forall (n : nat) (x : bv n),
rcai x bvzero false = (false, x).
Proof.
intros n x.
induction n.
dependent destruction x.
auto. (* base case proved *)
dependent destruction x.
simpl bvzero.
simpl rcai.
destruct b.
unfold sumfunc. simpl.
unfold carryfunc. simpl.

destruct (rcai x bvzero false) eqn: H.
f_equal.

rewrite IHn in H.
inversion H.
reflexivity.

rewrite IHn in H.
inversion H.
f_equal.

unfold sumfunc. simpl.
unfold carryfunc. simpl.

destruct (rcai x bvzero false) eqn: H. (* The trick Yann taught me *)
f_equal.

rewrite IHn in H.
inversion H.
reflexivity.

rewrite IHn in H.
inversion H.
f_equal.
Qed.

(*
* The next lemma proves that -1 is a vector of ones
* One thing to note here is that I needed to explicitly supply the implicit argument n to addinv using @.
*)

Lemma allones: forall {n : nat}, @addinv n bvone = ones.
Proof.
intros n.
induction n.
auto. (* base case proved *)

simpl.
unfold bvone.
simpl.
unfold bvone.
unfold “+”.

simpl.

unfold carryfunc.
simpl.
unfold sumfunc.
simpl.

destruct (rcai (neg bvzero) bvzero false) eqn: H.
simpl.

f_equal.
f_equal.

rewrite rcai_zero in H.
inversion H.

apply invertzeros.
reflexivity.
Qed.

(*
* This lemma captures the fact that one way you can add one to a bitvector using a ripple carry adder is
* to add zero and assert the carry in port.
*)

Lemma increment_with_carry : forall (n : nat) (x : bv (S n)),
x + bvone = snd (rcai x bvzero true).
Proof.
intros n x.
dependent destruction x.

(* first peel off the LSB from the two operands *)

simpl bvzero.
simpl rcai.

unfold bvone.
unfold “+”.
simpl rcai.

(* now case split by the LSB of x to show the same thing *)

destruct b.

unfold carryfunc.
simpl.
unfold sumfunc.
simpl.
reflexivity.

unfold carryfunc.
simpl.
unfold sumfunc.
simpl.
reflexivity.
Qed.

(* This lemma says that if you add a vector of ones to a value x using a ripple carry adder, while asserting the
* carry in port, then the sum result will just be x. Of course this is because -1 + 1 = 0, though I didn’t prove
* it that way.
* A neat trick I found to use in this proof is to use the tactic ‘apply (f_equal snd)’ on one of the hypotheses
* in order to isolate the sum component in the tuple produced by the ripple carry function rcai.
*)

Lemma rcai_ones_cin_identity : forall (n : nat) (x : bv n),
snd (rcai x ones true) = x.
Proof.
intros n x.
induction n.
dependent destruction x.
simpl.
reflexivity.
dependent destruction x.
simpl ones.
simpl rcai.

(* case analysis *)
destruct b.
unfold carryfunc.
unfold sumfunc.
simpl.
destruct (rcai x ones true) eqn: H.
simpl.
f_equal.
apply (f_equal snd) in H. (* a neat trick *)
simpl in H.
rewrite IHn in H.
auto.

unfold carryfunc.
unfold sumfunc.
simpl.
destruct (rcai x ones true) eqn: H.
simpl.
f_equal.
apply (f_equal snd) in H.
simpl in H.
rewrite IHn in H.
auto.
Qed.

(*
* This lemma is actually the main content of what we’re trying to prove, just not wrapped up in
* Note the use of ‘rewrite <-‘ to use an existing lemma to rewrite a term from the RHS of the equality
* in the lemma to the LHS. Without the ‘<-‘ it would do it the other way round.
*)

Lemma main_helper : forall (n : nat) (x : bv (S n)),
neg (x + ones) = neg x + bvone.
Proof.
intros n x.
induction n.
dependent destruction x.
destruct b.
dependent destruction x.
auto.
dependent destruction x.
auto. (* base case proved *)

dependent destruction x.
simpl.
unfold bvone.
unfold “+”.
simpl rcai.

destruct b.
unfold carryfunc.
unfold sumfunc.
simpl.

rewrite rcai_zero.

destruct (rcai x (consbv n true ones) true) eqn: H.
simpl neg.
simpl snd.
f_equal.
f_equal.

apply (f_equal snd) in H.
simpl snd in H.
rewrite rcai_ones_cin_identity in H.
auto.

unfold carryfunc.
unfold sumfunc.
simpl.

destruct (rcai (neg x) (consbv n false bvzero)) eqn: H.
apply (f_equal snd) in H.
simpl snd in H.

rewrite <- increment_with_carry in H.

simpl snd.

destruct (rcai x (consbv n true ones) false) eqn: H1.
simpl snd.
simpl neg.
f_equal.

apply (f_equal snd) in H1.
simpl snd in H1.

rewrite <- H1.
rewrite <- H.

apply IHn.
Qed.

Theorem main_theorem: forall (n : nat) (x : bv (S n)),
neg x + bvone = neg (xbvone).
Proof.
intros n x.
unfold “-“.
rewrite allones.
rewrite <- main_helper.
reflexivity.
Qed.

## Some Lessons

So what have I learned from this experience, beyond a little bit of Coq? Firstly, it was fun. It was a nice way to spend a couple of days of my Easter holiday. I am not sure I would want to do it under time pressure, though, as it was also frustrating at times. If I ever wanted to use Coq in anger for my work, I would want to take a couple of months – or more – to really spend time with it.

On the positive side, Coq really forced me to think about foundations. What do I actually mean when I write $\overline{x} + 1 = \overline{x - 1}$? Should I be thinking in ${\mathbb Z}$, in ${\mathbb Z}/n\mathbb{Z}$, or in digits, and when? How should bitvector arithmetic behave on zero-sized bitvectors? (Oh, and I certainly did not expect to be digging out a proof of decidability of natural equality from Coq’s standard library to prove this theorem!) The negative side is the same: Coq really forced me to think about foundations. And I remain to be convinced that I want to do that when I’m not on Easter holiday and in a philosophical mood.

I loved the type system and the expression of theorems. I’m luke warm about the proof process. At least the way I wrote the proofs – which was probably intolerably amateur – it felt like someone could come along and change the tactics at some point and my proof would be broken. Maybe this is not true, but this is what it felt like. This was a different feeling to that I remember when playing with Agda four years ago, which felt like everything needed to be explicit but somehow felt more nailed down and permanent. In Agda, the proofs are written in the same language as the types and I enjoyed that, too. Both languages are based on dependent types, and so as – I understand – is Lean. My colleague Kevin Buzzard is a strong advocate of Lean. Perhaps that’s one for another Easter holiday!

Thinking about this proof from a hardware perspective – designing efficient bit-parallel arithmetic hardware – it is clear that we do not need to have proved the theorem for all n. Each bit slice occupies silicon area, and as this is a finite resource, it would be sufficient to have one proof for each feasible value of n. Of course, this makes things much easier to prove, even if it comes with much more baggage. I can fire up an SMT solver and prove the theorem completely automatically for a specific value of n. As an example, if you paste the code below into the Z3 prover (hosted at rise4fun), the solver will report unsat, i.e. there is provably no satisfying value of the variable x violating the theorem for n = 4.

(declare-fun x () (_ BitVec 4))
(check-sat)
(exit)


There are pluses and minuses to this. On the plus side, the SMT query is fast and automatic. On the minus side, in addition to only being valid for n = 4, it gives me – and perhaps some future AI – none of the intuition as to why this theorem holds. When I read mathematics, the proofs are not incidental, they are core to the understanding of what I’m reading.

Will this also be true for future AI-driven EDA tools?

### Notes

In case this is useful to anyone (or to me in the future): I got syntax highlighting playing well for Coq with WordPress.com by using coqdoc to generate HTML and CSS, then hacking at the CSS so that it didn’t affect the rest of my WordPress theme, pasting it into the WordPress.com CSS customiser, and putting the generated HTML in a WordPress.com HTML block. Take care to avoid the CSS class .comment, used by coqdoc for code comments but also used by WordPress for blog post comment formatting!

Thanks again to Yann Herklotz for help understanding let bindings in Coq.

# False Negatives and False Positives

Yesterday, I posted some comments on Twitter regarding the press focus on lateral flow test (LFT) false positives on the eve of the return of most students to school in England. It seems that I should probably have written a short blog post about this rather than squeezing it into a few tweets, given the number of questions I’ve had about the post since then. This is my attempt.

The press seem to be focusing on the number of false positives we are likely to see on return to school and the unnecessarily lost learning this may cause. My view is that given the current high case numbers and slow rate of decline, this is not the primary issue we should be worried about. Primarily, we should be worried about the false negative rate of these tests. My concern is that the small number of true positives caught by these tests may have less impact on reducing the rate of infection than behavioural relaxation induced by these tests has on increasing the rate of infection. Time will tell, of course.

Let me explain some of the data that has been released, the conclusions I draw from it, and why false positive rate is important for these conclusions regarding false negative rates.

For any given secondary-age pupil, if given both a LFT and a PCR test, excluding void tests there are four possible outcomes: LFT+ & PCR+ (LFT positive & PCR positive), LFT+ & PCR-, LFT- & PCR+ and LFT- & PCR-. We can imagine these in a table of probabilities, as below.

Here the total LFT+ figure of 0.19% comes from the SchoolsWeek article for secondary school students based on Test and Trace data for late February, while the total PCR+ figure is the confidence interval provided in the REACT-1 9b data from, which says “In the latter half of round 9 (9b), prevalence varied from 0.21% (0.14%, 0.31%) in those aged 65 and over to 0.71% (0.34%, 1.45%) in those aged 13 to 17 years.” Note that REACT-1 9b ran over almost the same period as the Test and Trace data on which the SchoolsWeek article is based.

What I think we all really want to know is what is the probability that a lateral flow test would give me a negative result when a PCR test would give me a positive result? We cannot get this information directly from the table, but we can start to fill in some of the question marks. Clearly the total LFT- probability will be 100% – 0.19% = 99.81%, and the total PCR- probability will be 98.55% to 99.66%. What about the four remaining question marks?

Let’s consider the best case specificity of these tests: that all the 0.19% of LFT+ detected were true positives, i.e. the tests were producing no false positives at all. In that case, the table would look something like this:

Under these circumstances, we can see from the table that the LFTs are picking up 0.19% out of 0.34% to 1.45%, so we can estimate that the most they’re picking up is 0.19/0.34 = 56% of the true positive cases.

However, this assumed no false positives at all, which is a highly unrealistic assumption. What if we consider a more realistic assumption on false positives? The well-cited Oxford study gives a confidence interval for self-trained operatives of 0.24% to 0.60% false positives. Note that the lower end of this confidence interval would suggest that we should see at least 0.24% x 98.55% = ~0.24% of positive LFTs just from false positives alone. This is a higher value than the LFT positive rate we saw over this period of 0.19% (as noted by Deeks here). So this means it’s also entirely feasible that none of the LFT+ results were true positives, i.e. the results table could look more like this:

Now this time round, we can see that the tests are picking up 0% out of 0.34% to 1.45%, so we can estimate that they’re picking up 0% of the true positive cases (i.e. 100% false negatives).

This is why I think we do need to have a conversation about false positives. Not because of a few days of missed school, as reported in the press, but because hiding behind these numbers may be a more significant issue of a much higher false negative rate than we thought, leading to higher infections in schools as people relax after receiving negative lateral flow tests.

Perhaps most importantly, I think the Government needs to commit to following the recommendations of the Royal Statistical Society, which would enable us to get to the bottom of exactly what is going on here with false positive and false negative rates.

(Note that I have assumed throughout this post that LFTs are being used as a ‘quick and easy’ substitute for PCRs, so that the ideal LFT outcome is to mirror a PCR outcome. I am aware that there are those who do not think this is the case, and I am not a medical expert so will not pass further comment on this issue.)

# GCSEs and A-Levels in 2021: Current Plans

This week, a flurry of documents were released by Ofqual and the Department for Education in response to the consultation over what is to replace GCSEs and A-Levels in 2021 I blogged about previously.

In this post, I examine these documents to draw out the main lessons and evaluate them against my own submission to the consultation, which was based on my earlier post. The sources of information I draw on for this post are:

1. A letter (‘Direction’) from the Secretary of State for Education to Ofqual sent to Ofqual on the 23rd February (and published on the 25th).
2. Guidance on awarding qualifications in Summer 2021 published on the 25th February.
3. Ofqual’s new Consultation on the general qualifications alternative awarding framework published on the 25th February, and to close on 11th March.

In my post on 16th January, I concluded that the initial proposals were complex and ill-defined, with scope to produce considerable workload for the education sector while still delivering a lack of comparability. The announcements this week from the Secretary of State and Ofqual have not helped allay my fears.

## Curriculum Coverage

The decision has been made by Government that “teachers’ judgements this year should only be made on the content areas that have been taught.” However, the direction from the Secretary of State also insists that “teachers should assess as much course content as possible to ensure in the teachers’ judgement that there has been sufficient coverage of the curriculum to enable progression to further education, training, or employment, where relevant.”

Presumably, these two pieces of information are supposed to be combined to form a grade, or at least the latter modulates the decision over whether a grade is awardable. However, in what way this should happen is totally opaque. Let’s assume we have Student A who has only covered a small part of the GCSE maths curriculum, in the judgement of teachers ‘insufficient to enable progression to further education’ (In what? A-Level maths, or something else? Surely the judgement may depend on this.) However, Student A is ‘performing at’ a high grade (say Grade 8) in that small part of the curriculum. What do they get? A Grade 8? Lower than a Grade 8? No grade? How well will they do compared to Student B who has broad curriculum coverage, but little depth?

The issue of incomplete curriculum coverage has not nearly been addressed by these proposals.

## The Return of Criterion Referencing

In addition, if grade descriptors are to be useable no matter what narrow range of the curriculum has been covered, they will necessarily be very broad in nature, further narrowing the precision with which such judgements can be made. I await exemplar grade descriptors with some trepidation.

As I pointed out in my submission, calling 2021 and 2020 results GCSEs and A-Levels just misleadingly suggests comparability to previous years – better to wipe the slate clean and call them something else.

## Fairness to Students: Lack of Comparability

One of my main concerns in the original proposal – not addressed in the final outcome – is that of fairness between centres and even between individual pupils in a centre. Centres will be allowed to use work conducted under a wide variety of circumstances, and while they are supposed to sign off that they are ‘confident’ that this work was that of the student without ‘inappropriate’ levels of support, I am not clear how they are supposed to gain that confidence for work conducted during lockdown, for example – which is explicitly allowed. There are many opportunities for unfairness here even within a centre.

Now if we bring in different centres having covered different parts of the curriculum and using different methodologies for quality assurance, the scope for unfairness is dramatic. The problem for students is that such unfairness will be much harder to identify come Summer compared to 2020, when ‘mutant algorithms’ could be blamed.

It seems odd to me that centres are asked to “use consistent sources of evidence for a class or cohort” yet there does not appear to be any reasonable attempt to maintain that consistency between cohorts at different centres. Exam boards will be asked to sample some subjects in some centres to check grades, but given the points I raise above over curriculum coverage it is very unclear how it will be possible to reach the conclusion that the incorrect grade has been awarded in all but the most extreme cases.

The guarantee of an Autumn exam series (as in 2020) is, however, to be welcomed.

## Kicking the Can to Exam Boards

It is now the job of exam boards to come up with “a list of those sources of and approaches to collecting evidence that are considered most effective in determining grades” by the end of March. Good luck to them.

Exam boards must also undertake checks of all centres’ internal quality assurance processes before grades are submitted to them on the 18th of June. It is unclear what these checks will entail – I find this hard to imagine. Schools: please do share your experience of this process with me.

One of the big concerns I had in my original consultation response was over the increase in teacher workload. One aspect of this has been addressed: unlike in the draft proposals, teachers will no longer be responsible for appeals. However, there is still a very considerable additional workload involved in: getting to grips with the assessment materials released by the exam boards, developing an in-house quality assurance system, getting that agreed by the exam board, making an assessment of students, showing the students the evidence on which this assessment is based (this is a requirement), and submitting the grades, all over the window 1st April 2021 – 18th June 2021. I asked in my consultation response whether additional funding will be made available for schools, e.g. to provide cover for the release time required for this work. No answer has been forthcoming.

The development of an in-house quality assurance system is non trivial. The proposed GQAA framework imposes the requirements for such a system to have:

• ‘a set policy on its approach to making judgements in relation to each Teacher Assessed Grade, including how Additional Assessment Materials and any other evidence will be used,’
• ‘internal arrangements to standardise the judgements made in respect of the Centre’s Learners and a process for internal sign-off of each Teacher Assessed Grade,’
• ‘a comparison of the Teacher Assessed Grades to results for previous cohorts at the Centre taking the same qualification to provide a high-level cross-check to ensure that Teacher Assessed Grades overall are not overly lenient or harsh compared to results in previous years,’
• ‘specific support for newly qualified Teachers and Teachers less familiar with assessment, and
• ‘a declaration by the head of Centre.’

The third bullet point seems logically impossible to achieve. Results this year will not be comparable to previous years as they will be using a different system based on different evidence. So there appears to be no way to check whether TAGs (not CAGs this year!) are comparable to those in previous years.

## Private Candidates

Private candidates got a really bad deal last year. This year we are told that private candidates “should be assessed in a similar way to other students”, but that this will be “using an adapted range of evidence”. I’m not completely convinced that these two statements are logically consistent. It will be interesting to hear the experience of private candidates this year.

It is unfortunate that schools will be left with such a narrow window of time, so we must start to think about 2022 right now. However, I note that in the current Consultation on the General Qualifications Alternative Awarding Framework, there is scope for whatever is decided now to bleed into 2022: ”We have not proposed a specific end date for the framework because it is possible some measures will be required for longer than others. Instead we propose that the GQAA Framework will apply until we publish a notice setting an end date.”

All the more reason to get it right now.

# GCSEs and A-Levels in 2021

I have collected my initial thoughts after reading the Ofqual consultation, released on the 15th January 2021, over GCSE and A-Level replacements for this year. Alongside many others, I submitted proposals for 2020 which I felt would have avoided some of the worst outcomes we saw in Summer last year. My hope is that, this year, some of the suggestions will be given greater weight.

The basic principle underlying the Ofqual consultation is that teachers will be asked to grade students, that they can use a range of different evidence sources to do so, and that exam boards will be asked to produce mini tests / exams as one such source of evidence. This is not unlike the approach used in Key Stage 1 assessments (“SATs”) in primary schools in recent years. The actual process to be used to come up with a summary grade based on various sources of information is not being consulted over now, and it appears this will come from exam boards in guidance issued to teachers at some undetermined time in the future. This is a significant concern, as the devil really will be in the detail.

Overall, I am concerned that the proposed process is complex and ill-defined. There is scope to produce considerable workload for the education sector while still delivering a lack of comparability between centres / schools. I outline my concerns in more detail below.

## Exam Board Papers – What are They For?

Ofqual is proposing that exam boards provide teachers papers (‘mini exams’) to “support consistency within and between schools and colleges” and that they “could also help with appeals”. However, it is very unclear how these papers will achieve these objectives. Papers might be sat at school or at home (p.18), they might be under supervision and they might not. Teachers might be asked to ‘remotely supervise’ these tests (p.18). These choices could vary on a per-pupil basis. The taking of tests may even be optional, and certainly teachers “should have some choice” over which questions are answered by their students. Grades will not be determined by these papers, so at best they will form one piece of evidence. If consistency is challenged, will the grades on these papers (when combined in some, as yet undetermined way) overrule other sources of information? This could be a cause of some confusion and needs significant clarity. The scope for lack of comparability of results between centres is significant when placing undue weight on these papers, and I am left wondering whether the additional workload for teachers and exam boards required to implement this proposal is really worth it.

If tests are to be taken (there are good reasons to suggest that they may be bad idea in their currently-envisaged form – see below), then I agree with Ofqual that – in principle – the ideal place to take them is in school (p.18). However, it is absolutely essential that school leaders do not end up feeling pressured to open to all students in an unsafe environment, due to the need for these tests. This is a basic principle, and I would resist any move to place further pressure on school leaders to fully open their schools until it is safe to do so.

## Quality Assurance by Exam Boards

The main mechanism being proposed to ensure comparability and fairness between two centres / schools is random sampling (p.20-21). The exam board will sample the evidence base of a particular school for a particular subject, and query this with the school if they feel there is inadequate evidence to support the grades (it is not clear in the consultation whether the sampling will be of all pupils or individual pupils at that centre). This is a reasonable methodology for that particular subject / centre / student, but there is a major piece of information missing to enable judgement of whether this is sufficient for quality assurance of the system as a whole: what proportion of student grades will be sampled in this way? My concern is that the resources available to exam boards will be too small for this to be a large enough sample and that therefore the vast majority of grades awarded will be effectively unmoderated. This approach appears to be motivated by avoiding the bungled attempt at algorithmic moderation proposed in 2020, but without adequate resourcing, comparability between centres is not guaranteed to be better than it was under the abandoned 2020 scheme, and may even be worse.

Moreover, the bar for changing school grades appears to be set very high: “where robust investigation indicates that guidance has not been followed, or malpractice is found” (p.21), so I suspect we are heading towards a system of largely unmoderated centre-assessed grades. In 2020, centres were not aware at the point of returning CAGs that these would end up being given in unmoderated form, and therefore many centres appear to have been cautious when awarding high grades. Will this still be the case in 2021?

## Curriculum Coverage

It is acknowledged throughout the consultation that centres / schools will have been unable to cover the entire curriculum in many cases. There appear to be two distinct issues to be dealt with here:

A. How to assess a subject with incomplete coverage

There are many ways this could be done. For the sake of argument, consider this question in the simplest setting of an exam. Here, the most direct approach would be simply to assess the entire curriculum, acknowledging that many more students would be unable to answer all questions this year, but re-adjusting grade boundaries to compensate. This may not be the best approach for student wellbeing, however, and in any case the proposal to use non-controlled assessment methods opens up much more flexibility. My concern is that flexibility almost always comes at the cost of comparability.

Ofqual are proposing that teachers have the ability to differentially weight different forms of assessment (e.g. practicals in the sciences). Is is unclear in the consultation whether this is on a per-student or on a per-centre basis – either brings challenges to fairness and transparency, and this point needs to be clarified quite urgently. They are also effectively proposing that teachers can give zero weight to some elements of the curriculum by choosing not to set / use assessments based on these elements. It is as yet undecided whether past work and tests can be used, or whether only work from now on – once students are aware it can be used for these purposes. It is opaque in the consultation how they are proposing to combine these various partial assessments. One approach I would not like to see is a weighted average of the various pieces of evidence available. A more robust approach, and one which may overcome some objections to using prior work, may be to allow teachers to select a number of the highest-graded pieces of work produced to date – a ‘curated portfolio’ approach. This may mitigate against both incomplete curriculum coverage and different student attitudes to summatively-assessed work versus standard class / homework.

B. How to ensure fairness

The consultation acknowledges that students in different parts of the country may have covered different amounts of the curriculum, due to local COVID restrictions. There is an unavoidable tension, therefore, between ‘assessment as a measure of what you can do’ and ‘assessment as a measure of what you can do, under the circumstances you were in’. This tension will not go away, and the Government needs to pick an option as a political decision. Some forms of assessment may mitigate this problem, to a degree, such as the ‘curated portfolio’ proposal made above, but none will solve it.

## Appeals

It is proposed that students are able to appeal to the exam board only ‘on the grounds that the school or college had not acted in line with the exam board’s procedural requirements’ (p.23). I am rather unclear how students are supposed to obtain information over the procedures followed at the school / college, so this sets a very high bar for appeals to the board. Meanwhile, the procedure for appeal to the school (p.23) appears to have a very low bar, and thus could potentially involve a significant extra workload for school staff. There is some suggestion that schools could be allowed to engage staff from other schools to handle marking appeals. If adequately financially resourced, Ofqual may wish to make this mandatory, to avoid conflicts of interest.

It is unclear in the consultation whether students will be able to appeal on the basis of an unfair weighting being applied to different elements of the curriculum (p.14). This could add an additional layer of complexity.

Grade boundaries have always been problematic. Can we really say that a student one mark either side of a Grade A boundary is that different in attainment? Last year, a bungled attempt was made to address this concern by requiring submission of student rankings within grade boundaries. Centre-Assessed Grades (CAGs) last year were optimistic, but this should come as no surprise – given a candidate I believe has a 50/50 chance of either getting an A or a B, why on earth would I choose a B? This issue will persist under the proposals for 2021, and I believe may be amplified by the knowledge that an algorithmic standardisation process will not be used. I suspect we may see even more complaints about ‘grade inflation’ in 2021, with significant knock-on effects for university admissions and funding. The root cause of this problem appears to be the aim to maintain the illusion of comparability between years for GCSE and A-Level results.

There are very significant workload implications for teachers, for school leaders, and for exam boards in these proposals – far more so than in 2020 arrangements. This workload has explicitly not yet been quantified in the consultation. I believe it needs to be quantified and funded: centres should receive additional funding to support this work, and teachers need to be guaranteed additional non-contact time to undertake the considerable additional work being requested of them.

## Private Candidates

Private candidates, such as home-educated students, got a very poor deal last year. This must not be repeated, especially since many of the students who would be taking GCSEs and A-Levels this year are exactly the same home-educated students who decided to postpone for one year as a result of the changes last year. I am concerned to ensure comparability of outcomes between private candidates and centre-based candidates, and I am worried that two of the four proposed mechanisms for private candidates essentially propose a completely different form of qualification for these candidates.

## Are 2021 (and 2020) qualifications actually GCSEs and A-Levels?

By labelling the qualifications of 2021 as GCSEs / A-Levels, rather than giving them a different title, there is an implicit statement of comparability between grades awarded in 2021 and those in previous years, which is rather questionable. Others made the point that in 2020 it may have been better to label these qualifications differently – the same argument applies in 2021. Even Ofqual implicitly make this point (p.27) when presenting the argument against overseas candidates taking exams as normal “might give rise to comments that there were 2 types of grades awarded”. The reality is that there will at least three types of grades awarded in recent years, pre-2020, 2020, and 2021. Is it time to face up to this and avoid the pretence of comparability between these different systems?

## Equality Considerations

Ofqual seem to believe that if exam boards publish the papers / tests / mini-exams ‘shortly before’ they are taken then that will avoid leaking information but won’t put some students at a disadvantage because ‘students would not know which one(s) they would be required to complete’. I can envisage a situation where some students try to prepare for all published papers the moment they are released online, potentially a much greater number of papers than they will be required to sit, leading to considerable stress and anxiety, with potential equalities implications.

From the consultation, is not clear how exam board sampling will work, but there is the opportunity to bias the sampling process to help detect and correct for unconscious bias, if equalities information is available to exam boards. This could be considered.

## Conclusion

On p.29, Ofqual state that ‘The usual assurances of comparability between years, between individual students, between schools and colleges and between exam boards will not be possible.’ This is not inspiring of confidence, but is honest. The question is how we can mitigate these impacts as far as possible. I hope Ofqual will listen carefully to the suggestions for 2021, and publish the approach taken in plenty of time. Releasing the algorithm used in 2020 on the day of A-level result release was unacceptable, and I hope Ofqual have learnt from this experience.

# A-Levels and GCSEs in 2020

This week was A-Level results day. It was also the day that Ofqual published its long-awaited standardisation algorithm. Full details can be found in the 319-page report. In this blog post, I’ve set down my initial thoughts after reading the report.

## Prelude

I would like to begin by saying that Ofqual was not given an easy task: produce a system to devise A-level and GCSE grades without exams or coursework. Reading the report, it is clear that they worked hard to do the best they could within the confines they operate, and I respect that work. Nevertheless, I have several concerns to share.

## Concerns

1. Accounting for Prior Attainment

The model corrects for differences between historical prior attainment and prior attainment of the 2020 cohort in the following way (first taking into account any learners without prior attainment measures.) For any particular grade, the proportion to be awarded is equal to the historical proportion at that grade adjusted by a factor referred to in the report as $q_{kj} - p_{kj}$. (See p.92-93 of the report, which incidentally has a typo here — $c_k$ should read $c_{kj}$.) As noted by the Fischer Family Trust, it appears that this factor is based solely on national differences in value added, and this could cause a problem. To illustrate this requires an artificial example. Imagine that Centre A has a historical transition matrix looking like this – all of its 200 students have walked away with A*s in this subject in recent years, whether they were in the first or second GCSE decile (and half were in each). Well done Centre A!

Meanwhile, let’s say the national transition matrix looks more like this:

Let’s now look at 2020 outcomes. Assume that this year, Centre A has an unusual cohort: all students were second decile in prior attainment. It seems natural to expect that it would still get mainly A*s, consistent with its prior performance, but this is not the outcome of the model. Instead, its historical distribution of 100% A*s is adjusted downwards because of the national transition matrix. The proportion of A*s at Centre A will be reduced by 40% – now only 60% of them will get A*s! This happens because the national transition matrix expects a 50/50 split of Decile 1 and Decile 2 students to end up with 50% A* and a Decile 2-only cohort to end up with 10% A*, resulting in a downgrade of 40%.

2. Model accuracy

Amongst the various possible standardisation options, Ofqual evaluated accuracy based on trying to predict 2019 exam grades and seeing how well they matched to awarded exams. This immediately presents a problem: no rank orders were submitted for 2019 students, so how is this possible? The answer provided is “the actual rank order within the centre based on the marks achieved in 2019 were used as a replacement“, i.e. they back-fitted 2019 marks to rank orders. This only provides a reasonable idea of accuracy if we assume that teacher-submitted rank orders in 2020 would exactly correspond to mark orders of their pupils, as noted by Guy Nason. Of course this will not be the case, so the accuracy estimates in the Ofqual report are likely to be significant overestimates. And they’re already not great, even under a perfect-ranking assumption: Ofqual report that only 12 out of 22 GCSE subjects were accurate to within one grade, with some subjects having only 40% accuracy in terms of predicting the attained grade – so one is left wondering what the accuracy might actually be for 2020 once rank-order uncertainty is taken into account.

There may also be a systematic variation in the accuracy of the model across different grades, but this is obscured by using the probability of successful classification across any grade as the primary measure of accuracy. Graphs presented in the Ofqual report suggest, for example, that the models are far less accurate at Grade 4 than at Grade 7 in GCSE English.

3. When is a large cohort a large cohort?

A large cohort, and therefore one for which teacher-assessed grades are used at all, is defined in the algorithm to be one with at least 15 students. But how do we count these 15 students? The current cohort or the historic cohort, or something else? The answer is given in Ofqual’s report: the harmonic mean of the two. As an extreme example of this, centre cohorts can be considered “large” with only 8 pupils this year – so long as they had at least 120 in the recent past. It seems remarkable that a centre could have fewer pupils than GCSE grades and still be “large”!

4. Imputed marks fill grade ranges

As the penultimate step in the Ofqual algorithm, “imputed marks” are calculated for each student – a kind of proxy mark equally spaced between grade end-points. So, for example, if Centre B only has one student heading for a Grade C at this stage then – by definition – it’s a mid-C. If they had two Grade C students, they’d be equally spaced across the “C spectrum”. This means that in the next step of the algorithm, cut-score setting, these students are vulnerable to changing grades. For centres which tend to fill the full grade range anyway, this may not be an issue. But I worry that we may see some big changes at the edges of centre distributions as a result of this quirk.

5. No uncertainty quantification

## What Must Happen Now?

I think everyone can agree that centres need to immediately receive all the intermediate steps in the calculations of their grades. Many examinations officers are currently scratching their heads, after having received only a small part of this information. The basic principle must be that centres are able to recalculate their grades from first principles if they want to. This additional information should include the proportion of pupils in both historical and current cohorts with matched prior attainment data for each subject and which decile each student falls into, the national transition matrices used for each subject, the values of $q_{kj}$ and $p_{kj}$ for each subject / grade combination, the imputed marks for each 2020 student, and the national imputed mark cut-points for each grade boundary in each subject.

At a political level, serious consideration should now be given to awarding teacher-assessed grades (CAGs) this year. While I was initially supportive of a standardisation approach – and I support the principles of Ofqual’s “meso-standardisation” – I fear that problems with the current standarisation algorithm are damaging rather than preserving public perception of A-Level grades. We may have now reached the point that the disadvantages of sticking to the current system are worse than the disadvantages of simply accepting CAGs for A-Levels.

Ofqual states in their report that “A key motivation for the design of the approach to standardisation [was] as far as possible [to] ensure that a grade represents the same standard, irrespective of the school or college they attended.”.  Unfortunately, my view is that this has not been achieved by the Ofqual algorithm. However, despite my concerns over Ofqual’s algorithm, it is also questionable whether any methodology meeting this objective could be implemented in time under a competitive education system culture driven by high-stakes accountability systems. Something to think about for our post-COVID world.