Pruning Circuits

On Tuesday, my former PhD student Erwei Wang (now at AMD) will present our recent paper “Logic Shrinkage: Learned FPGA Netlist Sparsity for Efficient Neural Network Inference” at the ACM International Symposium on FPGAs. This is joint work with our collaborator Mohamed Abdelfattah from Cornell Tech as well as James Davis, George-Ilias Stavrou and Peter Y.K. Cheung at Imperial College.

In 2019, I published a paper in Phil. Trans. Royal Soc. A, suggesting that it would be fruitful to explore the possibilities opened up when considering the graph of a Boolean circuit – known as a netlist – as a neural network topology. The same year, in a paper at FCCM 2019 (and then with a follow-up article in IEEE Transactions on Computers), Erwei, James, Peter and I showed how to put some of these ideas into practice by learning the content of the Boolean lookup tables that form the reconfigurable fabric of an FPGA, for a fixed circuit topology.

Our new paper takes these ideas significantly further and actually learns the topology itself, by pruning circuit elements. Those working on deep neural networks will be very familiar with the idea of pruning – removing certain components of a network to achieve a more efficient implementation. Our new paper shows how to apply these ideas to prune circuits made of lookup tables, leaving a simplified circuit capable of highly-efficient inference. Such pruning can consist of reducing the number of inputs of each Boolean LUT and, in the limit, removing the LUT completely from the netlist. We show that very significant savings are possible compared to binarising a neural network, pruning that network, and only then mapping it into reconfigurable logic – the standard approach to date.

We have open-sourced our flow, and our paper has been given a number of ACM reproducibility badges. Please try it out at https://github.com/awai54st/Logic-Shrinkage and let us know what you think. And, if you’re attending FPGA next week, reach out and say hello.

Better accuracy / silicon area tradeoffs through pruning circuits

Islands of Certainty

Readers of this blog may remember that my PhD student Jianyi Cheng (jointly supervised by John Wickerson) has been working on high-level synthesis, combining dynamic scheduling with static scheduling. His latest contribution, to be presented on 28th February at the ACM FPGA conference, is about finding islands of static control flow in a sea of dynamic behaviour.

Here’s the story so far:

So now, two years later, we are back at the same conference to present a method to do just that. We now have an automated flow to select parts of a program to statically schedule, resulting in a 4x reduction in area combined with a 13% boost in performance compared to a fully dynamic circuit, a result that is close to the best achievable — as shown by exhaustively enumerating different parts of the program to schedule statically.

The basic idea of the paper is to develop the concept of a static island — a part of a dataflow graph where making decisions on scheduling of operations once, at compile time, is likely to have minimal impact on performance (or may even improve it) while opening the door to static resource sharing. We can throw a boundary around these islands, synthesise them efficiently with commercial HLS tools (we use Xilinx Vitis HLS), and integrate the result into the overall dynamic circuit using our previous open-source compiler flow.

So what makes a good static island? Unsurprisingly, these islands should exhibit static control flow or control flow with balanced path timing, e.g. in a conditional statement the if and else branch should take the same time, and loops should have constant dependence distances (or none at all). Jianyi also shows that there is an advantage to having these islands consume their inputs at offset-times, e.g. for a two-input island we may wish the static scheduler to be aware that second input is available — on average — two cycles after the first. He shows precisely how to generate ‘wrapper’ circuits for these components, allowing them to communicate with a dynamically scheduled environment.

The overall design flow, shown below, is now fully automated – freeing the user from writing the pragmas we required two years ago.

MATs, SATs and Maintained Schools

This post forms the second in a short series of blog posts about the current governance structure of schools in England, ahead of the Government’s expected while paper this spring. The previous post was about the review of the Education and Skills Funding Agency. In this post I focus on the landscape of school structures: multi-academy trusts (MATs), single-academy trusts (SATs) and local authority (LA) maintained schools from the perspective of a governor / trustee.

I will draw heavily on Sam Freedman’s report this month released by the Institute for Government as well as the excellent discussion in the Confederation of School Trusts’ report “What is a Strong Trust?“. One does not have to completely agree with Freedman’s or the CST’s perspectives to find both reports clear, well-argued, and a very useful step towards a greater debate in the education sector.

Freedman notes what is obvious to anyone who has worked in or with the sector – that the current dual system of maintained schools and academies leads to significant duplication and inefficiency. He notes that the regulatory system for academies is ‘incoherent’, largely as a result of the ESFA/DfE split I dealt with in my last blog post. He mentions – quite rightly – that LAs’ statutory responsibility for SEND and certain safeguarding requirements further complicates the picture when they have no effective oversight or intervention powers over academy trusts.

Early Academisation Errors

My experience is that the rapid expansion of the academies programme after the Academies Act 2010 was mismanaged. We have been left with a patchwork of schools and a web of contractual agreements. Schools which converted early have often been left with legacy articles of association based on the early model articles which demonstrated little insight into how schools could evolve under poor governance (modern model articles are much improved, though not perfect). Regulators have been left with very limited powers to improve matters, and it should not have come as a surprise to government that – as Freedman states – “[By 2012/3] with things moving so fast it quickly became apparent that the Department for Education (DfE) was becoming overwhelmed and could not properly oversee that many schools.” The unfortunate reality is that many schools are still stuck with the legacy of poor decisions made during this period.

Three School Structures

There are currently three main models for schools in England: those which are part of a multi-academy trust (MAT), those which are a single-academy trust (SAT) and those maintained by local authorities. Often in these discussions the first two types of school are lumped together, and it becomes a discussion about academies versus non-academies, but I think these three situations deserve distinct analysis. In particular, oversight of an individual school’s performance in the maintained sector and in the MAT sector is, in my experience, stronger than in the SAT sector, which is the outlier in terms of sufficient oversight. I wonder how many SATs are maintaining an Ofsted grade above inadequate, and therefore not subject to intervention, but are nevertheless not performing at the level they might be if they had closer interaction with other schools.

It is clear to anyone who has worked with or for the education sector that schools benefit immensely from working together and supporting each other, and I agree with Leora Cruddas’s argument made at a governance meeting I attended last year that, in times of difficulty for a school, a compulsion to work with other schools is important. At the moment, this primarily comes through membership of a strong multi-academy trust, though I do not see why a strong, well-resourced and empowered Local Authority could not form an equally viable mechanism to drive school improvement.

Pragmatics: Control and Intervention

Unsurprisingly, Freedman’s paper seeks to advise Zahawi on an appropriate way forward to a more coherent fully academised education system, and without entering the discussion over whether academy trusts are the best structure to deliver school education in the first place, it is worth engaging with the recommendations he makes.

Freedman sees the future of LAs – as per the 2016 white paper – as ensuring that every child has a school place, ensuring the needs of vulnerable children are met, and acting as champions for all parents and families, and – quite reasonably – proposes greater powers for LAs to ensure they can actually fulfil these objectives.

There are some proposals made in the paper that I would fully support:

  • Setting a transparent framework of expectations for MATs and giving the regulator powers to intervene, close or merge MATs for either financial/compliance reasons or educational reasons, not only tied to Ofsted grades.
  • Ensuring that MATs take ownership of educational improvement and are not simply back-office consolidation bodies as is sometimes the case currently.
  • Giving local authorities the right of access to MAT data.
  • A single regulator for academies, ideally organised as an arm’s length body.

There are also some proposals that are less clear cut for me:

  • Giving LAs the power to direct a change in individual academy PANs and admissions policies. Let’s assume that we move to an “all-MAT” system with LA’s still retaining the statutory duty to ensure a place for every pupil. To ensure clear lines of accountability, it seems appropriate for these to be at MAT level not individual academy level: surely mechanisms can still be put in place for intervention at MAT level to ensure they play their part in the strategic place-planning of LAs, rather than micromanaging a MAT’s academies over the head of its trustee board?
  • Moving SEND and exclusions policy monitoring / appeals from ESFA to LAs. I agree that ESFA is an odd place for this to sit at the moment in terms of ensuring joined up working between the RSC offices, ESFA and the Local Authority. But moving this to LAs rather than to the DfE again seems to introduce dual lines of accountability for MATs; might it not be better for RSCs to be required to ensure MATs meet the LA’s planning needs?
  • Giving individual academies a mechanism to argue to ‘break free’ from a MAT, involving giving schools legal status independent from the MAT. I agree that there may be very good reasons for academies to want to move to another MAT if the MAT is not functioning well, however under an all-MAT system it seems that a more appropriate approach is to provide the regulator with powers to intervene at MAT level than to provide independent legal status to individual academies.

There is an important question of democratic control, which I believe is required to balance some of these suggestions. In particular: who gets to appoint trustees (and members) of an academy trust, and what geographical area is it reasonable for a MAT to cover? On the first point, in the early days of academisation, academies needed to have some staff trustees, some (elected) parent trustees and a trustee appointed by the Local Authority. The Secretary of State was empowered to change the composition under specific conditions laid out in the academy’s funding agreement / articles of association. Government views on this composition has changed over time, with parent trustees going out of and then back into fashion, while staff trustees are definitely out of fashion at the moment. Local Authorities no longer get to appoint trustees at all in recent model articles. The situation locally will vary from trust to trust, depending on when their articles were approved — differences that cannot be justified, in my view. I would suggest that trusts articles are updated and that the ability (though not the requirement) for local authorities and the DfE (via RSC offices) to appoint trustees is included in the new model. This would provide LAs and the DfE direct information on trusts, rather than having to rely on existing trust boards to provide accurate information, in addition to providing a powerful mechanism for spreading best practice across trusts.


There is a huge opportunity for development of the schools sector in England. I look forward to publication of the white paper!


Appendix: Minor Quibbles

It’s probably worth pointing out a couple of very minor inaccuracies in the Institute for Government report:

  • Financial Notices to Improve, mentioned in the report, no longer exist since September 2021, precisely in recognition of the broader ESFA role currently; they are now subsumed within the broader “Notice to Improve” title.
  • A few times in the report, the ‘Special Measures’ category is cited as giving the Regional Schools Commissioners power to re-broker academies. While there may be additional powers – depend on the trust’s funding agreement – under a Special Measures Ofsted category, it’s clear in the Schools Causing Concern guidance that RSCs have the power to re-broker any inadequate school, i.e. also those judged to have ‘Serious Weaknesses’.

Restructuring Education

In the run-up to the Government’s planned white paper on education, I hope to be publishing a few brief blog posts on the landscape of education leadership and management in England. This post focuses on the summary findings from the ESFA review led by Sir David Bell and published this week.

Summary: this review is very welcome and I am supportive of all the key recommendations.

My perspective is one of governance: I have been a long-term member of Essex Schools Forum, first as an elected representative of maintained primary schools, then as an elected representative of secondary academies. Throughout this time (and beyond) I have been involved in helping Essex shape its local funding formula and respond to the national funding formula. I have governance experience both in the maintained sector and in a single academy trust, and work with a new post-16 multi-academy trust through my workplace.

From the perspective of accountability of academy trusts, I think such a review has long been overdue, in particular over clarity of roles between the offices of the regional schools commissioners and the lines of accountability through academy funding agreements.

The findings suggest retaining ESFA’s funding delivery function as an Arms Length Body (ALB) while moving a considerable number of responsibilities to the DfE. This seems sensible. In particular, from a governance perspective, I wholeheartedly endorse the finding that “The Regional School Commissioners (RSCs) and ESFA work together to provide oversight of the school system; the RSCs focus on educational performance, ESFA on financial management, with both contributing to governance. This sometimes creates points of friction internally and a lack of clarity externally”. The proposal, to move all governance oversight not required to be at ESFA to the new regional DfE structures, also seems entirely reasonable. The review also recommends clarifying the relationship between Ofsted, the DfE and ESFA – my experience of this is that there is already clarity over the different roles of Ofsted versus the DfE and ESFA, although admittedly this knowledge is not widespread, even amongst school leaders.

In line with this move to the DfE, the proposal to move ownership of the Academy Trust Handbook to the DfE (unless scaled back to a purely financial management document) is also to be welcomed by governors and trustees.

The final sensible proposal I would highlight from the review aims to achieve greater alignment in dealing with complaints between the maintained and academy sector. As part of this process, I would urge the DfE to consider mandating a more uniform complaints policy at trust level for academies: although the model DfE policies are entirely reasonable, they are not statutory, and academies minimally complying with legislation set out in the Education (Independent School Standards)(England) Regulations 2014 essentially force complaints to be dropped or escalated to Ofsted or ESFA which could be dealt with at trust level under better procedures.

Of course there are bigger questions relating to the role of multi-academy trusts and local authorities and their interaction with the department for education, and I hope to cover some of these issues in future blog posts. But within the confines of our current system, these reforms seem very much worthwhile.

What’s the Rush?

At FPL 2021, my PhD student Jianyi Cheng (jointly supervised by John Wickerson) will present our short paper “Exploiting the Correlation between Dependence Distance and Latency in Loop Pipelining for HLS”. In this post, I explain the simple idea behind this paper and how it can significantly accelerate certain neglected corner cases in high-level synthesis (HLS).

By far the most significant way to extract high performance from a hardware accelerator in high-level synthesis is to use loop pipelining. Loop pipelining is the idea of starting the next iteration of a loop before the previous one finishes, allowing multiple iterations to be executing simultaneously. However, some loop iterations may need a result produced by earlier loop iterations, limiting the extent to which this can be done. HLS tools generally determine a ‘safe’ initiation interval – the number of clock cycles between starting two adjacent loop iterations – and then schedule the iterations statically at multiples of this interval.

This limit on initiation interval of the loop essentially derives from two properties. Firstly, if it takes a long time for the computation of a loop iteration to execute, then any iterations waiting on its result must be delayed. But secondly if an iteration’s result is only needed many iterations later, it can afford to take a long time to compute: what’s the rush? These two factors – latency and dependence distance – together determine the safe initiation interval.

The simple observation of our paper is that typically HLS tools will generally independently over-approximate latency and under-approximate dependence distance. However, there are some examples of programs where there is a correlation between dependence distance and latency. Jianyi gives this nice motivating example in the paper:

double f( double a ) {
  return (((((a+0.64)*a+0.7)*a+0.21)*a+0.33)*a+0.25)*a+0.125;
}

void example( double vec[M] ) {

  for (int i = 0; i < N; i++) {
    double e = vec[i];
    if (e > 0) vec[i+63] = f(e);
    else vec[i*i+9] = e * e;
  }

}

In this code snippet, you can see two control paths in the loop. The if branch has a long latency (it computes the Horner scheme polynomial f) but also writes to elements of vec that only get read many iterations later. Meanwhile the else branch has a short latency but can write – in the early stages of the loop at least – to values read in nearby iterations.

The end result is that the commercial tools Jianyi tried don’t cope very well with scheduling this loop. However, Jianyi has developed an approach that uses the formal verification tool Boogie to show that this loop can actually be scheduled very efficiently by exploiting this correlation.

He has developed an LLVM pass called iiProver that proves that it is safe to use a certain II with the commercial Vitis HLS tool from Xilinx. iiProver and our benchmarks are available – please take a look: https://github.com/JianyiCheng/iiProver. And you can hear Jianyi talking about his work on Youtube here: https://www.youtube.com/watch?v=SdQeBBc85jc.

It Probably Works!

Followers of my research will know that I’ve long been interested in rounding errors and how they can be controlled to best achieve efficient hardware designs. Going back 20 years, I published a book on this topic based on my PhD dissertation, where I addressed the question of how to choose the precision / word-length (often called ‘bit width’ in the literature) of fixed point variables in a digital signal processing algorithm, in order to achieve a controlled tradeoff between signal-to-noise ratio and implementation cost.

Fast forward several years, and my wonderful collaborators Fredrik Dahlqvist, Rocco Salvia, Zvonimir Rakamarić and I have a new paper out on this topic, to be presented by Rocco and Fredrik at CAV 2021 next week. In this post, I summarise what’s new here – after all, the topic has been studied extensively since Turing!

I would characterise the key elements of this work as: (i) probabilistic, i.e. we’re interested in showing that computation probably achieves its goal, (ii) floating point (especially of the low custom-precision variety), and (iii) small-scale computation on straight-line code, i.e. we’re interested in deep analysis of small kernels rather than very large code, code with complex control structures, or code operating on very large data structures.

Why would one be interested in showing that something probably works, rather than definitely works? In short because worst-case behaviour is often very far from average case behaviour of numerical algorithms, a point discussed with depth in Higham and Mary’s SIAM paper. Often, ‘probably works’ is good enough, as we’ve seen recently with the huge growth of machine learning techniques predicated on this assumption.

In recent work targeting large-scale computation, Higham and Mary and, independently, Ipsen, have considered models of rounding error that are largely / partially independent of the statistical distribution of the error induced by a specific rounding operation. Fredrik was keen to take a fresh look at the kind of distributions one might see in practice, and in our paper has derived a ‘typical distribution’ that holds under fairly common assumptions.

Rocco and Fredrik then decided that a great way to approximate the probabilistic behaviour of the program is to sandwich whatever distribution is of interest between two other easy to compute distributions, utilising the prior idea of a p-box.

One of the core problems of automated analysis of numerical programs has always been that of ‘dependence’. Imagine adding together two variables each in the range [-1,1]. Clearly their sum is in the range [-2,2]. But what if we knew, a priori, that these two variables were related somehow? For example in the expression X + (-X), which is clearly always zero. Ideally, an automated system should be able to produce a tighter result that [-2,2] for this! Over the years, many approaches to dealing with this issue have arisen, from very the very simple approach of affine arithmetic to the more complex semialgebraic techniques Magron, Donaldson and myself developed using sequences of semidefinite relaxations. In our CAV paper, we take the practical step of cutting-out regions of the resulting probability space with zero probability using modern SMT solver technology. Another interesting approach used in our paper is in the decision of which nonlinear dependences to keep and which to throw away for scalability reasons. Similar to my work with Magron, we keep first-order dependence on small rounding error variables but higher-order dependence on input program variables.

I am really excited by the end result: not only a wonderful blend of ideas from numerical analysis, programming languages, automated reasoning and hardware, but also a practical open-source tool people can use: https://github.com/soarlab/paf. Please give it a try!

Readers interested in learning more about the deeply fascinating topic of numerical properties of floating point would be well advised to read Higham’s outstanding book on the topic. Readers interested in the proofs of the theorems presented in our CAV paper should take a look at the extended version we have on arXiv. Those interested in some of the issues arising (in the worst case setting) when moving beyond straight-line code could consult this paper with Boland. Those interested in the history of this profoundly beautiful topic, especially in its links to linear algebra, would do well to read Wilkinson.

Scheduling with Probabilities

Readers of this blog may remember that Jianyi Cheng, my PhD student jointly supervised by John Wickerson, has been investigating ways to combine dynamic and static scheduling in high-level synthesis (HLS). The basic premise has been that static scheduling, when it works well due to static control, works very well indeed. Meanwhile, for programs exhibiting highly dynamic control flow, static scheduling can be very conservative, a problem addressed by our colleagues Lana Josipović, Radhika Ghosal and Paolo Ienne at EPFL. Together with Lana and Paolo, we developed a scheme to combine the best of both worlds, which we published at FPGA 2020 (and recently extended in IEEE Transactions on CAD). I blogged about this work previously here. We provided a tool flow allowing us to stitch large efficient statically-scheduled components into a dynamic circuit.

However, when scheduling a circuit statically, there are many design choices that can be made, typically to trade off time (throughput, latency) against area. So while our previous work was useful to stitch pre-existing statically-scheduled components into a dynamically-scheduled environment, we had no way of automatically designing those components to optimally fit the dynamic environment.

Enter Jianyi’s latest contribution – to be presented at FCCM 2021 next week.

In his paper “Probabilistic Scheduling in High-Level Synthesis”, Jianyi tackles this problem. He demonstrates that the dynamic environment, including data-dependent decisions and even load-store queues, can be adequately modelled using a Petri net formalism, and uses the PRISM model checker from Kwiatowska et al. to extract an appropriate initiation interval for each statically-scheduled component.

One of Jianyi’s Petri net models of some memory accesses.

The initiation intervals inferred by Jianyi’s tool can then be given to a commercial HLS tool – in our case Vitis HLS – to schedule each component. The components – together with any remaining dynamically-scheduled code – is then integrated using our previously published framework, producing the complete FPGA-ready design. The whole process provides a quality of result very close to an exhaustive search of possible initiation intervals, without having to perform multiple scheduling runs, and so in a fraction of the time.

Easter Coq

This Easter I set myself a little challenge to learn a little bit of Coq – enough to construct a proof of a simple but useful theorem in computer arithmetic. Long-time readers of this blog will know that this is not my first outing with dependent types, though I’ve never used them in anger. Four years ago – also during the Easter break! – I read Stump‘s book on Agda and spent some time playing with proofs and programming, as I documented here.

This blog post documents some of the interesting things in Coq I observed over the last few days. I’ve decided to write the majority of this post in Coq itself, below, before finishing off with some concluding remarks. In this way, anyone really interested can step through the definitions and proofs themselves.


(* 
 * A first datapath identity
 * George A. Constantinides, 2/4/21
 *
 * This is an attempt to learn some basic Coq by proving a standard identity used in computer arithmetic,
 * namely \bar{x} + 1 = \bar{x – 1}. 
 * 
 * This identity is useful because it allows Boolean operations to move through arithmetic operations.
 *
 * The code is for learning and teaching purposes only. It is not intended to be an efficient or elegant
 * approach, nor is it intended to make best use of existing Coq libraries. On the contrary, I have often used
 * many steps when one would do, so we can step through execution and see how it works.
 *)


Require Import Coq.Program.Equality.
Require Import Coq.Logic.Eqdep_dec.
Require Import Coq.Arith.Peano_dec.

(* Create my own bitvector type. It has a length, passed as a nat, and consists of bools. *)
Inductive bv : nat -> Set :=
| nilbv : bv 0
| consbv : forall n : nat, bool -> bv n -> bv (S n).

(* Head and tail of a bitvector, with implicit length arguments *)
Definition hdi {n : nat} (xs : bv (S n)) :=
  match xs with
  | consbv _ head _ => head
  end.

Definition tli {n : nat} (xs : bv (S n)) :=
  match xs with
  | consbv _ _ tail => tail
  end.

(* The basic carry and sum functions of a Boolean full adder *)
Definition carryfunc (a : bool) (b: bool) (c: bool) : bool :=
  orb (orb (andb a b) (andb a c)) (andb b c).

Definition sumfunc (a : bool) (b : bool) (c : bool) : bool :=
  xorb (xorb a b) c.

(*
 * A ripple carry adder, with implicit length argument
 * Note that this definition makes use of a trick known as the ‘convoy pattern’ [1]
 * to get the dependent typing to work in a match clause. We use a ‘return’ clause
 * to make the type of the match result be a function which is then applied to the unmatched
 * argument. In this way the type system can understand that x and y have the same dependent type.
 * Note also the use of Fixpoint for a recursive definition.
 *)

Fixpoint rcai {n : nat} (x : bv n) (y : bv n) (cin : bool) : (bool * bv n) :=
  match x in bv n return bv n -> ( bool * bv n ) with
  | nilbv => fun _ => (cin, nilbv) (* an empty adder passes its carry in to its carry out *)
  | consbv n1 xh xt => fun y1 =>
       let (cout, sumout) := rcai xt (tli y1) (carryfunc cin xh (hdi y1)) in
                        (cout, consbv n1 (sumfunc cin xh (hdi y1)) sumout)
  end y.

(* We define addition modulo 2^n by throwing away the carry out, using snd, and then define an infix operator *)
Definition moduloadder {n : nat} (x : bv n) (y : bv n) : (bv n) :=
  snd (rcai x y false).

Infix “+” := moduloadder.

(* Bitwise negation of a word *)
Fixpoint neg {n : nat} (x : bv n) : (bv n) :=
  match x with
  | nilbv => nilbv
  | consbv n1 xh xt => consbv n1 (negb xh) (neg xt)
  end.

(* The word-level constant zero made of n zeros *)
Fixpoint bvzero {n : nat} : (bv n) :=
  match n with
  | O => nilbv
  | (S n1) => consbv n1 false bvzero
  end.

(* The word-level constant one with n leading zeros *)
Definition bvone {n : nat} :=
  consbv n true bvzero.

(* Additive inverse of a word, defined as ‘negate all the bits and add one’  *)
Definition addinv {n : nat} (x : bv (S n)) : (bv (S n)) :=
  neg(x) + bvone.

(* Subtraction modulo 2^n is defined as addition with the additive inverse and given its own infix operator *)
Definition modulosub {n : nat} (x : bv (S n)) (y : bv (S n)) :=
  x + (addinv y).

Infix “-” := modulosub.

(* a bit vector of just ones *)
Fixpoint ones {n : nat} : (bv n) :=
  match n with
  | O => nilbv
  | S n1 => consbv n1 true ones
  end.

(* OK, now we have some definitions, let’s prove some theorems! *)

(* Our first lemma (‘Lemma’ versus ‘Theorem’ has no language significance in Coq) says that inverting a
 * bitvector of ones gives us a bitvector of zeros.
 * There’s a couple of interesting points to note even in this simple proof by induction:
 * 1. I had to use ‘dependent destruction’,
 *    which is defined in the Coq.Program.Equality library, to get the destruction of variable x to take into account
 *    the length of the bitvector.
 * 2. The second use of inversion here didn’t get me what I wanted / expected, again due to dependent typing, for
 *    reasons I found explained in [2]. The solution was to use a theorem inj_pair_eq_dec, defined in 
 *    Coq.Logic.Eqdep_dec. This left me needing to prove that equality on the naturals is decidable. Thankfully,
 *    Coq.Arith.Peano_dec has done that.
 *)


Lemma invertzeros : forall {n : nat} (x : bv n),
  x = bvzero -> neg x = ones.
Proof.
  intros n x H.
  induction n.
  dependent destruction x.
  auto. (* base case proved *)
  dependent destruction x.
  simpl.
  f_equal.
  simpl bvzero in H.

  inversion H.
  reflexivity.

  simpl bvzero in H.
  inversion H. (* inversion with dependent type starts here…          *)
  apply inj_pair2_eq_dec in H2. (* goes via this theorem                                 *)
  2: apply eq_nat_dec. (* and completes via a proof of decidability of equality *)

  apply IHn.
  apply H2.
Qed.

(* 
 * The next lemma says that if you fix one input to a ripple carry adder to zero and feed in the carry-in as zero
 * too, then the carry out will not be asserted and the sum will just equal the fixed input.
 * I proved this by induction, reasoning by case on the possible Boolean values of the LSB.
 * The wrinkle to notice here is that I didn’t know how to deal with a ‘let’ clause, but thanks to Yann Herklotz
 * (https://yannherklotz.com) who came to my aid by explaining that a ‘let’ is syntactic sugar for a match.
 *)


Lemma rcai_zero: forall (n : nat) (x : bv n),
  rcai x bvzero false = (false, x).
Proof.
  intros n x.
  induction n.
  dependent destruction x.
  auto. (* base case proved *)
  dependent destruction x.
  simpl bvzero.
  simpl rcai.
  destruct b.
  unfold sumfunc. simpl.
  unfold carryfunc. simpl.

  destruct (rcai x bvzero false) eqn: H.
  f_equal.

  rewrite IHn in H.
  inversion H.
  reflexivity.

  rewrite IHn in H.
  inversion H.
  f_equal.

  unfold sumfunc. simpl.
  unfold carryfunc. simpl.

  destruct (rcai x bvzero false) eqn: H. (* The trick Yann taught me *)
  f_equal.

  rewrite IHn in H.
  inversion H.
  reflexivity.

  rewrite IHn in H.
  inversion H.
  f_equal.
Qed.

(* 
 * The next lemma proves that -1 is a vector of ones
 * One thing to note here is that I needed to explicitly supply the implicit argument n to addinv using @.
 *)

Lemma allones: forall {n : nat}, @addinv n bvone = ones.
Proof.
  intros n.
  induction n.
  auto. (* base case proved *)

  simpl.
  unfold bvone.
  unfold addinv.
  simpl.
  unfold bvone.
  unfold “+”.

  simpl.

  unfold carryfunc.
  simpl.
  unfold sumfunc.
  simpl.

  destruct (rcai (neg bvzero) bvzero false) eqn: H.
  simpl.

  f_equal.
  f_equal.

  rewrite rcai_zero in H.
  inversion H.

  apply invertzeros.
  reflexivity.
Qed.

(* 
 * This lemma captures the fact that one way you can add one to a bitvector using a ripple carry adder is
 * to add zero and assert the carry in port. 
 *)

Lemma increment_with_carry : forall (n : nat) (x : bv (S n)),
  x + bvone = snd (rcai x bvzero true).
Proof.
  intros n x.
  dependent destruction x.

  (* first peel off the LSB from the two operands *)

  simpl bvzero.
  simpl rcai.

  unfold bvone.
  unfold “+”.
  simpl rcai.

  (* now case split by the LSB of x to show the same thing *)

  destruct b.

  unfold carryfunc.
  simpl.
  unfold sumfunc.
  simpl.
  reflexivity.

  unfold carryfunc.
  simpl.
  unfold sumfunc.
  simpl.
  reflexivity.
Qed.

(* This lemma says that if you add a vector of ones to a value x using a ripple carry adder, while asserting the
 * carry in port, then the sum result will just be x. Of course this is because -1 + 1 = 0, though I didn’t prove
 * it that way.
 * A neat trick I found to use in this proof is to use the tactic ‘apply (f_equal snd)’ on one of the hypotheses
 * in order to isolate the sum component in the tuple produced by the ripple carry function rcai.
 *)

Lemma rcai_ones_cin_identity : forall (n : nat) (x : bv n),
  snd (rcai x ones true) = x.
Proof.
  intros n x.
  induction n.
  dependent destruction x.
  simpl.
  reflexivity.
  dependent destruction x.
  simpl ones.
  simpl rcai.

  (* case analysis *)
  destruct b.
  unfold carryfunc.
  unfold sumfunc.
  simpl.
  destruct (rcai x ones true) eqn: H.
  simpl.
  f_equal.
  apply (f_equal snd) in H. (* a neat trick *)
  simpl in H.
  rewrite IHn in H.
  auto.

  unfold carryfunc.
  unfold sumfunc.
  simpl.
  destruct (rcai x ones true) eqn: H.
  simpl.
  f_equal.
  apply (f_equal snd) in H.
  simpl in H.
  rewrite IHn in H.
  auto.
Qed.

(* 
 * This lemma is actually the main content of what we’re trying to prove, just not wrapped up in
 * very readable form yet.
 * Note the use of ‘rewrite <-‘ to use an existing lemma to rewrite a term from the RHS of the equality
 * in the lemma to the LHS. Without the ‘<-‘ it would do it the other way round.
 *)

Lemma main_helper : forall (n : nat) (x : bv (S n)),
  neg (x + ones) = neg x + bvone.
Proof.
  intros n x.
  induction n.
  dependent destruction x.
  destruct b.
  dependent destruction x.
  auto.
  dependent destruction x.
  auto. (* base case proved *)

  dependent destruction x.
  simpl.
  unfold bvone.
  unfold “+”.
  simpl rcai.

  destruct b.
  unfold carryfunc.
  unfold sumfunc.
  simpl.

  rewrite rcai_zero.

  destruct (rcai x (consbv n true ones) true) eqn: H.
  simpl neg.
  simpl snd.
  f_equal.
  f_equal.

  apply (f_equal snd) in H.
  simpl snd in H.
  rewrite rcai_ones_cin_identity in H.
  auto.

  unfold carryfunc.
  unfold sumfunc.
  simpl.

  destruct (rcai (neg x) (consbv n false bvzero)) eqn: H.
  apply (f_equal snd) in H.
  simpl snd in H.

  rewrite <- increment_with_carry in H.

  simpl snd.

  destruct (rcai x (consbv n true ones) false) eqn: H1.
  simpl snd.
  simpl neg.
  f_equal.

  apply (f_equal snd) in H1.
  simpl snd in H1.

  rewrite <- H1.
  rewrite <- H.

  apply IHn.
Qed.

Theorem main_theorem: forall (n : nat) (x : bv (S n)),
  neg x + bvone = neg (xbvone).
Proof.
  intros n x.
  unfold “-“.
  rewrite allones.
  rewrite <- main_helper.
  reflexivity.
Qed.

(* 
 * References
 * [1] http://adam.chlipala.net/cpdt/html/MoreDep.html
 * [2] https://stackoverflow.com/questions/24720137/inversion-produces-unexpected-existt-in-coq&nbsp;
 *)

Some Lessons

So what have I learned from this experience, beyond a little bit of Coq? Firstly, it was fun. It was a nice way to spend a couple of days of my Easter holiday. I am not sure I would want to do it under time pressure, though, as it was also frustrating at times. If I ever wanted to use Coq in anger for my work, I would want to take a couple of months – or more – to really spend time with it.

On the positive side, Coq really forced me to think about foundations. What do I actually mean when I write \overline{x} + 1 = \overline{x - 1}? Should I be thinking in {\mathbb Z}, in {\mathbb Z}/n\mathbb{Z}, or in digits, and when? How should bitvector arithmetic behave on zero-sized bitvectors? (Oh, and I certainly did not expect to be digging out a proof of decidability of natural equality from Coq’s standard library to prove this theorem!) The negative side is the same: Coq really forced me to think about foundations. And I remain to be convinced that I want to do that when I’m not on Easter holiday and in a philosophical mood.

I loved the type system and the expression of theorems. I’m luke warm about the proof process. At least the way I wrote the proofs – which was probably intolerably amateur – it felt like someone could come along and change the tactics at some point and my proof would be broken. Maybe this is not true, but this is what it felt like. This was a different feeling to that I remember when playing with Agda four years ago, which felt like everything needed to be explicit but somehow felt more nailed down and permanent. In Agda, the proofs are written in the same language as the types and I enjoyed that, too. Both languages are based on dependent types, and so as – I understand – is Lean. My colleague Kevin Buzzard is a strong advocate of Lean. Perhaps that’s one for another Easter holiday!

Thinking about this proof from a hardware perspective – designing efficient bit-parallel arithmetic hardware – it is clear that we do not need to have proved the theorem for all n. Each bit slice occupies silicon area, and as this is a finite resource, it would be sufficient to have one proof for each feasible value of n. Of course, this makes things much easier to prove, even if it comes with much more baggage. I can fire up an SMT solver and prove the theorem completely automatically for a specific value of n. As an example, if you paste the code below into the Z3 prover (hosted at rise4fun), the solver will report unsat, i.e. there is provably no satisfying value of the variable x violating the theorem for n = 4.

(declare-fun x () (_ BitVec 4))
(assert (not (= (bvadd (bvneg x) #x1) (bvneg (bvadd x #xF)))))
(check-sat)
(exit)

There are pluses and minuses to this. On the plus side, the SMT query is fast and automatic. On the minus side, in addition to only being valid for n = 4, it gives me – and perhaps some future AI – none of the intuition as to why this theorem holds. When I read mathematics, the proofs are not incidental, they are core to the understanding of what I’m reading.

Will this also be true for future AI-driven EDA tools?


Notes

In case this is useful to anyone (or to me in the future): I got syntax highlighting playing well for Coq with WordPress.com by using coqdoc to generate HTML and CSS, then hacking at the CSS so that it didn’t affect the rest of my WordPress theme, pasting it into the WordPress.com CSS customiser, and putting the generated HTML in a WordPress.com HTML block. Take care to avoid the CSS class .comment, used by coqdoc for code comments but also used by WordPress for blog post comment formatting!

Thanks again to Yann Herklotz for help understanding let bindings in Coq.

False Negatives and False Positives

Yesterday, I posted some comments on Twitter regarding the press focus on lateral flow test (LFT) false positives on the eve of the return of most students to school in England. It seems that I should probably have written a short blog post about this rather than squeezing it into a few tweets, given the number of questions I’ve had about the post since then. This is my attempt.

The press seem to be focusing on the number of false positives we are likely to see on return to school and the unnecessarily lost learning this may cause. My view is that given the current high case numbers and slow rate of decline, this is not the primary issue we should be worried about. Primarily, we should be worried about the false negative rate of these tests. My concern is that the small number of true positives caught by these tests may have less impact on reducing the rate of infection than behavioural relaxation induced by these tests has on increasing the rate of infection. Time will tell, of course.

Let me explain some of the data that has been released, the conclusions I draw from it, and why false positive rate is important for these conclusions regarding false negative rates.

For any given secondary-age pupil, if given both a LFT and a PCR test, excluding void tests there are four possible outcomes: LFT+ & PCR+ (LFT positive & PCR positive), LFT+ & PCR-, LFT- & PCR+ and LFT- & PCR-. We can imagine these in a table of probabilities, as below.

LFT+LFT-Total
PCR+??0.34% to 1.45%
PCR-???
Total~0.19%?100%

Here the total LFT+ figure of 0.19% comes from the SchoolsWeek article for secondary school students based on Test and Trace data for late February, while the total PCR+ figure is the confidence interval provided in the REACT-1 9b data from, which says “In the latter half of round 9 (9b), prevalence varied from 0.21% (0.14%, 0.31%) in those aged 65 and over to 0.71% (0.34%, 1.45%) in those aged 13 to 17 years.” Note that REACT-1 9b ran over almost the same period as the Test and Trace data on which the SchoolsWeek article is based.

What I think we all really want to know is what is the probability that a lateral flow test would give me a negative result when a PCR test would give me a positive result? We cannot get this information directly from the table, but we can start to fill in some of the question marks. Clearly the total LFT- probability will be 100% – 0.19% = 99.81%, and the total PCR- probability will be 98.55% to 99.66%. What about the four remaining question marks?

Let’s consider the best case specificity of these tests: that all the 0.19% of LFT+ detected were true positives, i.e. the tests were producing no false positives at all. In that case, the table would look something like this:

LFT+LFT-Total
PCR+~0.19%~0.15% to ~1.26%0.34% to 1.45%
PCR-0.00%98.55% to 99.66%98.55% to 99.66%
Total~0.19%~99.81%100%

Under these circumstances, we can see from the table that the LFTs are picking up 0.19% out of 0.34% to 1.45%, so we can estimate that the most they’re picking up is 0.19/0.34 = 56% of the true positive cases.

However, this assumed no false positives at all, which is a highly unrealistic assumption. What if we consider a more realistic assumption on false positives? The well-cited Oxford study gives a confidence interval for self-trained operatives of 0.24% to 0.60% false positives. Note that the lower end of this confidence interval would suggest that we should see at least 0.24% x 98.55% = ~0.24% of positive LFTs just from false positives alone. This is a higher value than the LFT positive rate we saw over this period of 0.19% (as noted by Deeks here). So this means it’s also entirely feasible that none of the LFT+ results were true positives, i.e. the results table could look more like this:

LFT+LFT-Total
PCR+0%0.34% to 1.45%0.34% to 1.45%
PCR-~0.19%~98.36% to ~99.47%98.55% to 99.66%
Total~0.19%~99.81%100%

Now this time round, we can see that the tests are picking up 0% out of 0.34% to 1.45%, so we can estimate that they’re picking up 0% of the true positive cases (i.e. 100% false negatives).

This is why I think we do need to have a conversation about false positives. Not because of a few days of missed school, as reported in the press, but because hiding behind these numbers may be a more significant issue of a much higher false negative rate than we thought, leading to higher infections in schools as people relax after receiving negative lateral flow tests.

Perhaps most importantly, I think the Government needs to commit to following the recommendations of the Royal Statistical Society, which would enable us to get to the bottom of exactly what is going on here with false positive and false negative rates.

(Note that I have assumed throughout this post that LFTs are being used as a ‘quick and easy’ substitute for PCRs, so that the ideal LFT outcome is to mirror a PCR outcome. I am aware that there are those who do not think this is the case, and I am not a medical expert so will not pass further comment on this issue.)

GCSEs and A-Levels in 2021: Current Plans

This week, a flurry of documents were released by Ofqual and the Department for Education in response to the consultation over what is to replace GCSEs and A-Levels in 2021 I blogged about previously.

In this post, I examine these documents to draw out the main lessons and evaluate them against my own submission to the consultation, which was based on my earlier post. The sources of information I draw on for this post are:

  1. A letter (‘Direction’) from the Secretary of State for Education to Ofqual sent to Ofqual on the 23rd February (and published on the 25th).
  2. Guidance on awarding qualifications in Summer 2021 published on the 25th February.
  3. Ofqual’s new Consultation on the general qualifications alternative awarding framework published on the 25th February, and to close on 11th March.

In my post on 16th January, I concluded that the initial proposals were complex and ill-defined, with scope to produce considerable workload for the education sector while still delivering a lack of comparability. The announcements this week from the Secretary of State and Ofqual have not helped allay my fears.

Curriculum Coverage

The decision has been made by Government that “teachers’ judgements this year should only be made on the content areas that have been taught.” However, the direction from the Secretary of State also insists that “teachers should assess as much course content as possible to ensure in the teachers’ judgement that there has been sufficient coverage of the curriculum to enable progression to further education, training, or employment, where relevant.”

Presumably, these two pieces of information are supposed to be combined to form a grade, or at least the latter modulates the decision over whether a grade is awardable. However, in what way this should happen is totally opaque. Let’s assume we have Student A who has only covered a small part of the GCSE maths curriculum, in the judgement of teachers ‘insufficient to enable progression to further education’ (In what? A-Level maths, or something else? Surely the judgement may depend on this.) However, Student A is ‘performing at’ a high grade (say Grade 8) in that small part of the curriculum. What do they get? A Grade 8? Lower than a Grade 8? No grade? How well will they do compared to Student B who has broad curriculum coverage, but little depth?

The issue of incomplete curriculum coverage has not nearly been addressed by these proposals.

The Return of Criterion Referencing

Several of us made the point in our submissions to the consultation that GCSE grades are norm-referenced, not criterion-referenced (with the exception of Grade 8, 5 and 2, as per the Grade Descriptors). As a result, if national comparability of exams is to be removed, then solid grade descriptors would need to be produced. Several of us suggested that the existence of so many GCSE grades, with such a low level of repeatability between assessors (see, e.g. Dennis Sherwood’s articles on this topic), suggests that grades should be thinned out this year. It seems that the Government agrees with this principle, as they have directed Ofqual to produce ‘grade descriptors for at least alternate grades’. This is good in as far as it goes, but if grade descriptors for only alternate grades are produced – quite rightly – then surely only alternate grades should be awarded!

In addition, if grade descriptors are to be useable no matter what narrow range of the curriculum has been covered, they will necessarily be very broad in nature, further narrowing the precision with which such judgements can be made. I await exemplar grade descriptors with some trepidation.

As I pointed out in my submission, calling 2021 and 2020 results GCSEs and A-Levels just misleadingly suggests comparability to previous years – better to wipe the slate clean and call them something else.

Fairness to Students: Lack of Comparability

One of my main concerns in the original proposal – not addressed in the final outcome – is that of fairness between centres and even between individual pupils in a centre. Centres will be allowed to use work conducted under a wide variety of circumstances, and while they are supposed to sign off that they are ‘confident’ that this work was that of the student without ‘inappropriate’ levels of support, I am not clear how they are supposed to gain that confidence for work conducted during lockdown, for example – which is explicitly allowed. There are many opportunities for unfairness here even within a centre.

Now if we bring in different centres having covered different parts of the curriculum and using different methodologies for quality assurance, the scope for unfairness is dramatic. The problem for students is that such unfairness will be much harder to identify come Summer compared to 2020, when ‘mutant algorithms’ could be blamed.

It seems odd to me that centres are asked to “use consistent sources of evidence for a class or cohort” yet there does not appear to be any reasonable attempt to maintain that consistency between cohorts at different centres. Exam boards will be asked to sample some subjects in some centres to check grades, but given the points I raise above over curriculum coverage it is very unclear how it will be possible to reach the conclusion that the incorrect grade has been awarded in all but the most extreme cases.

The guarantee of an Autumn exam series (as in 2020) is, however, to be welcomed.

Kicking the Can to Exam Boards

It is now the job of exam boards to come up with “a list of those sources of and approaches to collecting evidence that are considered most effective in determining grades” by the end of March. Good luck to them.

Exam boards must also undertake checks of all centres’ internal quality assurance processes before grades are submitted to them on the 18th of June. It is unclear what these checks will entail – I find this hard to imagine. Schools: please do share your experience of this process with me.

Teacher Workload

One of the big concerns I had in my original consultation response was over the increase in teacher workload. One aspect of this has been addressed: unlike in the draft proposals, teachers will no longer be responsible for appeals. However, there is still a very considerable additional workload involved in: getting to grips with the assessment materials released by the exam boards, developing an in-house quality assurance system, getting that agreed by the exam board, making an assessment of students, showing the students the evidence on which this assessment is based (this is a requirement), and submitting the grades, all over the window 1st April 2021 – 18th June 2021. I asked in my consultation response whether additional funding will be made available for schools, e.g. to provide cover for the release time required for this work. No answer has been forthcoming.

The development of an in-house quality assurance system is non trivial. The proposed GQAA framework imposes the requirements for such a system to have:

  • ‘a set policy on its approach to making judgements in relation to each Teacher Assessed Grade, including how Additional Assessment Materials and any other evidence will be used,’
  • ‘internal arrangements to standardise the judgements made in respect of the Centre’s Learners and a process for internal sign-off of each Teacher Assessed Grade,’
  • ‘a comparison of the Teacher Assessed Grades to results for previous cohorts at the Centre taking the same qualification to provide a high-level cross-check to ensure that Teacher Assessed Grades overall are not overly lenient or harsh compared to results in previous years,’
  • ‘specific support for newly qualified Teachers and Teachers less familiar with assessment, and
  • ‘a declaration by the head of Centre.’

The third bullet point seems logically impossible to achieve. Results this year will not be comparable to previous years as they will be using a different system based on different evidence. So there appears to be no way to check whether TAGs (not CAGs this year!) are comparable to those in previous years.

Private Candidates

Private candidates got a really bad deal last year. This year we are told that private candidates “should be assessed in a similar way to other students”, but that this will be “using an adapted range of evidence”. I’m not completely convinced that these two statements are logically consistent. It will be interesting to hear the experience of private candidates this year.

What about 2022?

It is unfortunate that schools will be left with such a narrow window of time, so we must start to think about 2022 right now. However, I note that in the current Consultation on the General Qualifications Alternative Awarding Framework, there is scope for whatever is decided now to bleed into 2022: ”We have not proposed a specific end date for the framework because it is possible some measures will be required for longer than others. Instead we propose that the GQAA Framework will apply until we publish a notice setting an end date.”

All the more reason to get it right now.