Put Your Resource Where it’s Needed!

We all know that we have finite resources to accomplish a variety of tasks. Some tasks are easier than others, so we tend to spend less time and energy on them. Yet – when we survey the landscape of traditional neural networks – this observation does not apply to them. Take a convolutional neural network classifying pictures: does it do any less work (floating point operations, perhaps) if the picture is “obviously a cat” than if it’s a “cat hiding in the bushes”? Traditionally, the answer is a clear “no”, because there is no input dependence in the control-flow of the inference algorithm: every single image is processed in exactly the same way and takes exactly the same work.

But researchers have looked to address this fundamental shortcoming by designing algorithms that put in less work when possible. Chief amongst these techniques is an approach known as “early-exit neural networks”. The idea of these networks is that the network itself generates a confidence measure: if it’s confident early on, then it stops computing further and produces an early result.
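To make the idea concrete, here is a minimal sketch of a generic early-exit inference loop. Everything in it (the stage functions, the exit heads and the 0.9 confidence threshold) is made up for illustration; it is not the mechanism of any particular paper.

def early_exit_infer(x, stages, exit_heads, threshold=0.9):
    """Run stages in sequence, stopping at the first sufficiently confident exit.

    stages: list of feature extractors applied in sequence.
    exit_heads: one classifier per stage, each returning class probabilities.
    """
    probs = None
    for stage, head in zip(stages, exit_heads):
        x = stage(x)
        probs = head(x)
        if max(probs) >= threshold:   # confident enough: exit early
            return probs
    return probs                      # otherwise fall through to the last head

# Toy usage: the first head is already confident, so the second stage never runs.
stages = [lambda v: v, lambda v: v]
exit_heads = [lambda v: [0.95, 0.05], lambda v: [0.60, 0.40]]
print(early_exit_infer("image", stages, exit_heads))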

My student Ben Biggs (jointly advised by Christos Bouganis) has been exploring the implications of bringing FPGAs to automatically accelerate such networks. There exist several good tool flows (e.g. fpgaConvNet, HPIPE) for automatically generating neural network implementations on FPGAs, but to the best of our knowledge, none of them support such dynamic control flow. Until now. Next week at FCCM 2023, Ben will be presenting our paper ATHEENA: A Toolflow for Hardware Early-Exit Network Automation.

The basic idea of ATHEENA is that early exit networks come in chunks, each of which is utilised a certain proportion of the time (e.g. 10% of images are hard to classify, 90% are not). So we make use of an existing neural network generator to generate each chunk, but allocate different FPGA resources to the different parts of the network, in order to maintain a steady throughput with no back pressure build-up points in the inference pipeline.

This principle is illustrated below. Imagine that the first stage of a network executes no matter what the input. Then I can create a throughput / area Pareto curve, shown on the top left. Equally, I can create a throughput / area Pareto curve for the second stage, and then scale up the throughput achievable by a function of how frequently I actually need to use that stage: if it’s only needed 10% of the time, then I can support a classification rate 10x higher than the nominal throughput of the design returned by a standard DNN generator. By combining these two throughput / area curves for the same nominal throughput, I get an allocation of resources to each part of the network. Of course, if the nominal proportion p of ‘hard’ samples I used to characterise the network differs from the actual proportion q encountered in practice, then I may end up with somewhat different behaviour, as indicated by the purple region.
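As a toy sketch of this balancing act (the Pareto points, the 10% ‘hard’ proportion and the area budget below are all illustrative, not from the paper, and the real ATHEENA flow is considerably more sophisticated), one could search the two Pareto curves for the best balanced pairing like this:

def combine(stage1_curve, stage2_curve, p_hard, area_budget):
    """Pick one design point per stage to maximise balanced pipeline throughput.

    Each curve is a list of (area, nominal_throughput) Pareto points, as might
    be produced by an existing DNN-to-FPGA generator for that chunk.
    Stage 2 only sees a fraction p_hard of the samples, so a point with
    nominal throughput t can service an input rate of t / p_hard.
    """
    best = None
    for a1, t1 in stage1_curve:
        for a2, t2 in stage2_curve:
            if a1 + a2 > area_budget:
                continue
            rate = min(t1, t2 / p_hard)   # pipeline limited by the slower stage
            if best is None or rate > best[0]:
                best = (rate, (a1, t1), (a2, t2))
    return best

stage1 = [(10, 400), (20, 800), (40, 1500)]   # (area, images/s)
stage2 = [(10, 100), (20, 220), (40, 450)]
print(combine(stage1, stage2, p_hard=0.1, area_budget=60))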

In practice, it turns out that this works really well. The graph below from Ben’s paper shows that on a real example, running on a real FPGA board, he’s able to improve throughput by more than 2x for the same area or reduce area by more than half for the same throughput.

I’m delighted that Ben’s work has been nominated for the best paper prize at FCCM this year and that it has received all three reproducibility badges available: Open Research Objects, Research Objects Reviewed and Results Reproduced, indicating Ben’s commitment to high quality reproducible research in our field.

If you plan to be in Los Angeles next week, come and hear Ben talk about it!

On a definition of Frege

Over the last few weekends I’ve been reading Frege’s “The Foundations of Arithmetic”, a wonderful book mixing philosophy, logic and mathematics from 1884. I hope I will find time to post on the book more generally, but for now I was intrigued enough by this interesting passage to want to play a little with it.

Frege writes:


The proposition

“if every object to which x stands in the relation to \phi falls under the concept F, and if from the proposition that d falls under the concept F it follows universally, whatever d may be, that every object to which d stands in the relation to \phi falls under the concept F, then y falls under the concept F, whatever concept F may be”

is to mean the same as

y follows in the \phi-series after x

G. Frege, “The Foundations of Arithmetic”, Section 79.

Let’s write “y follows in the \phi-series after x” as (x,y) \in G. Then I read Frege’s definition as:

(x,y) \in G iff

\forall F. (\forall a. (x,a) \in \phi \to F(a)) \wedge (\forall d. F(d) \to (\forall e. (d,e) \in \phi \to F(e))) \to F(y)

Frege’s definition appears to be a characterisation of the transitive closure of \phi, that is the smallest transitive relation containing \phi. Or is it? Perhaps this is not immediately apparent. I thought it might be a fun exercise to formalise this and explore it in Coq, a formal proof assistant I’ve played with a little in the past.

I enclose the Coq code I wrote below (note I am very much an amateur with Coq) that shows that Frege’s definition, once we tie down an object universe ({\mathtt A} in the code), indeed (i) contains \phi, (ii) is transitive and (iii) is contained in any other transitive relation containing \phi.

I hope you enjoy.

(* Coq analysis of a definition of Frege                                                  *)
(* Formalises Frege’s ‘follows’ definition and proves it equivalent to transitive closure *)
(* George A. Constantinides, 25/03/23                                                     *)

Require Import Coq.Relations.Relation_Definitions. (* for inclusion and transitive *)
Require Import Coq.Relations.Relation_Operators.
Require Import Coq.Unicode.Utf8.                   (* for the ∀, → and ∧ notations used below *)

Variable A : Type.
Variable phi : A → A → Prop.

Definition FregeFollows (x : A) (y : A) :=
  ∀ (F : A → Prop),
  (∀ (a : A), (phi x a) → (F a)) ∧
  (∀ (d : A), (F d) → (∀ (e : A), (phi d e) → (F e)))
  → (F y).

(* We will separately show:                                                    *)
(* 1. phi is included in FregeFollows                                          *)
(* 2. FregeFollows is transitive                                               *)
(* 3. Amongst all relations meeting conditions 1 & 2, FregeFollows is minimal. *)

(* 1. phi is included in Frege’s relation *)

Theorem IncludesPhi : inclusion A phi FregeFollows.
Proof.
  unfold inclusion.
  intros x y.
  unfold FregeFollows.
  intros H F.
  intros H1.
  destruct H1. (* we only need base case *)
  apply (H0 y).
  auto.
Qed.

(* 2. Frege’s relation is transitive *)

Theorem IsTransitive : transitive A FregeFollows.
Proof.
  unfold transitive.
  intros x y z.
  unfold FregeFollows.
  intros H0 H1.
  intros F.
  specialize (H0 F).
  specialize (H1 F).
  intros H2.

  apply H1.

  pose proof H2 as G. (* keep a copy of H2 to use to prove F y *)

  apply H0 in G.

  destruct H2.
  split.
  2: assumption.

  intros a.
  intros H3.

  specialize (H2 y).

  specialize (H2 G).
  specialize (H2 a).
  specialize (H2 H3).
  assumption.

Qed.

(* 3. Amongst all transitive relations including phi, Frege’s relation is minimal *)

Theorem IsMinimal : ∀ R : relation A,
  (inclusion A phi R) ∧ (transitive A R) → (inclusion A FregeFollows R).
Proof.
  intros R.
  intros H.
  unfold inclusion.
  intros x y.
  unfold FregeFollows.
  intros H1.

  specialize (H1 (R x)). (* Choose F to be R(x,.) *)
  apply H1.

  split.

  destruct H.
  unfold inclusion in H.

  specialize (H x).
  1: assumption.

  intros d Rxd e.

  destruct H.
  unfold transitive in H0.
  specialize (H0 x d e).
  intros.
  apply H0.
  1: assumption.

  unfold inclusion in H.
  specialize (H d e).

  apply H.
  assumption.
Qed.

Industry & Academia Unite!

I’ve just returned from a very busy week in California visiting Intel and AMD in Folsom before chairing the SpatialML annual meeting in Monterey and then attending the first in-person ACM FPGA conference, also in Monterey. It was a great trip, and I think the theme for me was just how well the tech industry and academia can work together, building on each other’s strengths.

Industry Visits

I was delighted to have the first chance to pay an in-person visit to the team built by my former PhD student Theo Drane at Intel in Folsom, where he has been able to gather an exceptionally strong and diverse mix of skills. Intel is sponsoring my PhD student Sam Coward (far left of photo), who is also a member of staff in the group. Together we have been working on some really exciting new ways to optimise datapath designs using word-level rewrite rules. While on the trip, we learnt that Sam’s DAC paper has been accepted, so there will be more I will want to say publicly about the development of this work soon!

With the team at Intel

Also in Folsom, I was able to meet members of the graphics team at AMD, who have recently agreed to support two new PhD students in my lab working on the interaction between microarchitecture and machine learning. I expect them to start in October 2023, with one co-supervised by Deniz Gündüz at Imperial and one co-supervised by Wayne Luk, also at Imperial. We’re planning exciting work, so watch this space.

SpatialML

In 2019 I was awarded a grant by EPSRC to bring together Imperial and Southampton (led there by Geoff Merrett) in the UK with Toronto (Anderson, Betz, Chow) and UCLA (Cong, Ercegovac) and industry (Imagination, Xilinx, Maxeler, NVIDIA, Google, Intel, Arm & Corerain) to reimagine the full stack of machine learning hardware when computing across space, not just time. You can visit the centre website here and see an up-to-date feed of our work on Twitter. Due to COVID, the last time we were able to hold an all-hands meeting was just after kick off, in February 2020, so it was truly a delight to be able to do so again this year. In the meantime, we have been very active and have also expanded the centre to include Cornell & Cornell Tech (Abdelfattah, Zhang) as well as Sydney (Boland, Leong) and Samsung AI; since then, Maxeler has been acquired by Groq and Xilinx by AMD. Whereas in 2020 I had primarily focused the workshop on faculty and industry talks, this year I re-focused on hearing the progress that various PhD researchers had made since 2020, as well as very successful 2-minute lightning talks from all participants to aid networking and build a solid foundation for our researcher exchange programme. In addition, the new participants were given the chance to highlight their work in extended talks.

Members of the EPSRC-funded International Centre for Spatial Computational Learning

We had the following speakers:

At the meeting dinner, we were also able to celebrate the announcement that our member, Jason Cong, has just won the EDAA Achievement Award, to be presented at DATE 2023. We were able to end the workshop with a productive session, planning collaboration and researcher exchange between our sites over the coming year.

FPGA 2023

The ACM International Symposium on FPGAs (known as ‘FPGA’) is one of the key dates in my calendar. As a former chair of the conference, I sit on its steering committee, and I have always been a great supporter of the genuine community around this conference, combining a strong academic and industrial presence. It was wonderful to be back in person after two years of virtual FPGAs.

This year two of my collaborators were chairing: Zhiru Zhang (Cornell) was Program Chair and Paolo Ienne (EPFL) was General Chair. They put together a great event – you can read the program here. There were many highlights, but I would particularly mention the paper by my collaborator Lana Josipović and her PhD student Jiahui Xu and collaborators on the use of model checking to remove overhead in dynamically-scheduled HLS, closely related to my student Jianyi’s stream of work which I’ve described in other blog posts. I also had the privilege to serve on the best paper award panel this year, alongside Deming Chen and Viktor Prasanna, an award that went to some work from Jane Li‘s group for their development of a plug-and-play approach to integrating SSD storage into high-level synthesis, presented by the lead author Linus Wong, during the session I chaired on “High-Level Abstraction and Tools”.

It was a personal delight to see so many of my former PhD students, and the definite highlight was getting to meet the first child of one of my former students, born over the period I’ve been out of the loop due to COVID. My former student Sam Bayliss was in attendance and presenting a tutorial on programming the AMD AI Engines using MLIR. My former student Kan Shi (Chinese Academy of Sciences) was also presenting his work together with my other former student David Boland (Sydney) and my current undergrad student Yuhan Diao (Imperial College) on on-chip debugging. Kan is also a part-time YouTuber, and his recent YouTube interview with me was mentioned to me by several attendees – I suspect it has had far wider reach than my publications!

Some of the Imperial College staff, students and alumni attending FPGA 2023

Overall, I have come away from the trip energised from the technical conversations, with far more ideas for collaboration than I’ll be able to follow up on, and with an even further strengthened belief in the immense promise of industrial / academic collaboration in my field.

Probably Approximately Correct

This Christmas, one of the books I read during my holiday was Leslie Valiant’s book Probably Approximately Correct. It is not quite a book for a truly general readership, but it is certainly suitable for anyone with a scientific background, and equally suitable for adults and teens.

Valiant needs no introduction to those with a background in computer science – for those outside the field, you can read about his deep contributions on his Wikipedia page. Inspired by the work of Turing, in this book (from 2013) he presents some wide-ranging thoughts on the Probably Approximately Correct (PAC) framework for understanding learning, kicked off by his paper from 1984, and argues that the concept is fundamental to learning as it occurs naturally. He deals with both “learning of the species, through evolution” and the more commonly considered “learning of the individual” within the same algorithmic framework – which he calls “ecorithms” – identifying the weaker learning behaviours possible in the former setting.

This is a great book, full of ideas and interesting philosophical discussion points. The key exciting elements to me relate to the use of computational complexity arguments to shed light on the different classes of function that can be learnt in a resource-constrained setting, as per the original PAC technical work, and the broader philosophical conclusions Valiant draws from this. Some particular ideas that stood out to me as worthy of further thought (my own list, Valiant does not prioritise these ideas):

  • The modularity of biological systems as a side-effect of the limited learnability (through evolution) of large-scale non-modular systems
  • The idea that “artificial” in “artificial intelligence” should not be considered as defining “the material basis of the realisation” (an intelligence realised in a computer may be either natural or artificial, for example), but rather in the way knowledge is obtained from the environment. The key distinction for Valiant is whether we are trying to reproduce learning the same way this happens in nature (this is not “artificial” for Valiant) or whether we are trying to do something different, e.g. shortcutting evolutionary processes (imagine a classical Chess-playing AI).
  • The idea that unsupervised learning is not a natural phenomenon, because a training signal is present even in the natural state, e.g. via evolutionary fitness for evolution-as-learning.
  • Some interesting thoughts on the role of ecorithms in the development of human culture and knowledge, and how humans have tipped the balance from evolutionary learning to stronger forms of algorithmic learning through their ability to pass on culture and knowledge to future generations.

In addition to Valiant’s ideas, a few thoughts that struck me while reading the book, which I wonder whether others have considered in more depth:

  • During our work together on resource-constrained model-predictive control, my friend and colleague Eric Kerrigan would often say to me that the best resource-constrained model for an open-loop system may be very different to the best resource-constrained model for that same system in a closed-loop setting for control purposes. It strikes me that the notion of fitness used by Valiant to pose evolvability as a form of learning might be further enriched by thinking along these lines, especially as the species being evolved changes the environment within which it lives.
  • Valiant argues that questions such as “can a computer have free will?” and “is a termite conscious?” are meaningless “because the concepts the words describe have meanings only for the distribution of examples from which they have been PAC learned”. This seems a step too far for me, not least because many of the most creative leaps I’ve seen in thought have come from applying ideas in one area to a completely distinct one. It would be useful, I think, to make the assertion more precise and to explore it a bit more deeply.
  • Throughout the book, Valiant makes heavy use of the classical “polynomial versus non-polynomial” approach to characterising complexity (e.g. arguing that useful learning takes polynomial time to learn to classify exponentially many items). This is, as one would expect, well justified in terms of the robustness of these classes, but it leaves me wondering what we lose by taking an asymptotic approach to drawing conclusions in the non-asymptotic settings of animals and their behaviours.

I would recommend this book to all. When I visit sixth forms to talk about my work and about engineering at Imperial, I’m often asked by teachers and students if I have recommendations for accessible but deep reading. I will be recommending Valiant’s book in the future.

Redundant Life Totals

It’s nearly Christmas, and despite illness hitting our house this year, we’re still managing to play games. One of the games I enjoy playing with my family is Star Realms; however, the built-in mechanism for keeping track of life totals – known in this game as “Authority” – is a little clunky, so my son and I quickly decided to use spin-down life counters from Magic: The Gathering instead.

Authority starts at 50, and it can go up and down during a game of Star Realms. This means we need at least three spin down life counters to represent our Authority, assuming that the total authority is equal to the sum of the values displayed on the life counters. We did briefly consider using one to represent tens and one to represent units, but this was too fiddly to operate in practice.

Two ways of representing 22 Authority: 12+10 and 20+2.

Over the years, I’ve had many disputes with my son and wife over how I choose to represent my Authority total. If they have a total Authority of n, then they like to have d = \lceil n/20 \rceil spin-downs on the battlefield. For d > 1, if n is not a multiple of 20, then they keep precisely d-1 of these spin downs at 20 Authority, with the remaining one at n \mod 20. They get very annoyed with me, as that’s not how I keep them at all: I choose a seemingly random collection of values that adds up to n.

But there is method in my madness. When I want to deduct, say, 10 Authority from a total represented as 20+3, it’s easy for me: I find a spin-down showing at least the amount I wish to deduct (if one exists) and take the whole amount from that one alone, so I’d get 10+3. In the same position, they’d deduct the 3, remove that spin-down from the battlefield, and then deduct the remaining 7 from the final spin-down to get a single spin-down showing 13.
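For the curious, here is a small sketch of the two bookkeeping schemes. It is purely illustrative: the function names and the fallback behaviour in my scheme are mine, not a claim about how anyone actually plays.

def deduct_canonical(counters, amount):
    """Their scheme: keep all counters at 20 except possibly the last."""
    total = sum(counters) - amount
    if total <= 0:
        return []                      # player is out of the game
    d, r = divmod(total, 20)
    return [20] * d + ([r] if r else [])

def deduct_redundant(counters, amount):
    """My scheme: flip a single counter that covers the amount, if one exists."""
    counters = sorted(counters)
    for i, c in enumerate(counters):
        if c >= amount:
            rest = counters[:i] + counters[i + 1:]
            return rest + ([c - amount] if c > amount else [])
    # Otherwise spill across counters (rarely needed in practice).
    return deduct_canonical(counters, amount)

print(deduct_canonical([20, 3], 10))   # -> [13]
print(deduct_redundant([20, 3], 10))   # -> [3, 10]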

They complain that it’s hard to figure out how much Authority I have and whether I’m in the lead or not. And they’re right. In my system, arithmetic tends to be easier – I tend to need to flip fewer spin downs – but comparison is harder.

This is exactly the same situation, and for essentially the same reason, as the one we face in computer arithmetic. They have chosen a canonical number representation, leading to efficient storage (few spin-downs on the table) and efficient comparison. I have chosen a redundant number system, leading to fast computation but complex comparison.

Such redundant number systems have a long history. My friend and colleague Miloš Ercegovac, who is an expert in this field, has pointed out to me that an account of such a system was described in a 1726 paper by Mr John Colson FRS (please do download the PDF – the language is a joy). More recently, Miloš developed a deep and profoundly appealing system known as online arithmetic based on exploiting the properties of redundancy in computer arithmetic, a topic we continue to work on to this day. I strongly recommend Miloš’s book for details on this arithmetic and much much more.

Computer arithmetic pops up everywhere. Even when ill and on a Christmas holiday!

(Fellow gamers who also like spin down life counters might also be interested in this paper my son and I published some years ago.)

St. Anselm’s Ontological Argument

Over the past few weeks, my son has been telling me about St. Anselm’s ontological argument for the existence of God. Last weekend I decided to play around with trying to formalise a version of this argument mathematically. This post briefly records my thoughts on the matter. I make no claim to originality, nor have I undertaken a review of what other people have done in this area; this was just a fun way to spend some of the weekend. I would also caution that I am neither a philosopher nor a logician, however I am a user of formal logic. If any philosophers or logicians happen across this blog post and strongly disagree with anything I’ve said, it would be fun to hear about it and I’d love to learn more. It is entirely possible that I’ve committed some cardinal sin. Pun intended.

The version of the argument given in the Internet Encyclopaedia of Philosophy is as follows:

  1. It is a conceptual truth (or, so to speak, true by definition) that God is a being than which none greater can be imagined (that is, the greatest possible being that can be imagined).
  2. God exists as an idea in the mind.
  3. A being that exists as an idea in the mind and in reality is, other things being equal, greater than a being that exists only as an idea in the mind.
  4. Thus, if God exists only as an idea in the mind, then we can imagine something that is greater than God (that is, a greatest possible being that does exist).
  5. But we cannot imagine something that is greater than God (for it is a contradiction to suppose that we can imagine a being greater than the greatest possible being that can be imagined.)
  6. Therefore, God exists.
https://iep.utm.edu/anselm-ontological-argument/#H4

Our first challenge is to understand what it means to be “an idea in the mind”. I tend to think of an idea in the mind as a description of something and, formally, as a logical formula defining a predicate. (I am not sure if this is what others think!) For the sake of simplicity, I will take “an idea in the mind” below to simply mean a predicate defined on “physical things”, but keep in mind that when I refer to a predicate below, I’m really referring to the equivalence class of formulae defining this predicate.

This reading of ideas as formulae (or predicates) makes it hard to say what we mean in Line 3 of Anselm’s argument when we compare two things, one of which exists and one of which doesn’t; this is an issue we will address below.

Finally, to my way of thinking, there appears to be an unnecessary implicit focus on uniqueness in the way the argument is presented above, so let’s strip that away to simplify matters: we just want to focus on a predicate which may be taken to describe ‘godliness’, or ‘godlike things’ – call it \Theta.

Premises

Taking the above arguments together, I will redefine the first premise as “the concept of godliness is a concept of which there is none greater.”

The other unnecessary element in this argument seems to be the use of language to draw the reader into trying to define ‘greater’ as some kind of second-order predicate denoting an order (of some kind) on first-order predicates. We will see below that this is not necessary, so we will consider ‘greater’ as simply referring here to an arbitrary 2-place second-order predicate G. Thus we obtain

1. \neg \exists P G( P, \Theta ).

We obtain Premise 2 directly by simply stating Premise 1 formally, given our conception of ‘an idea in the mind’.

Premise 3 is interesting. In our reformulation, we have a few approaches to this premise depending on what is meant by “other things being equal”. It seems that the weakest interpretation would be “for any concept describing something that doesn’t exist, there’s some greater concept”:

3a. \forall P. (\neg \exists x P x) \rightarrow \exists Q G(Q, P).

(Note that here, and in the following, I’ve used the convention that a ‘dot’ indicates an opening bracket that is closed at the end of the statement.)

Proof

The proof then follows Lines 4-6 directly:

From Premise 3a, conclude (\neg \exists x\, \Theta x) \to \exists Q\, G(Q,\Theta) via universal instantiation. Then using Premise 1 and modus tollens, one obtains \exists x\, \Theta x.

In English: we may conclude from Premise 3a that if a god does not exist then there exists a concept greater than godliness. But since there is no concept greater than godliness (Premise 1), a god exists.

What does this mean?

This argument is valid, but what is its semantic content? Let’s try to build some ‘unorthodox’ models to see if we can satisfy these premises and understand the rigidity these premises provide. In this section I will use \llbracket \phi \rrbracket to denote the interpretation of a formula \phi in a model, and I will use \Xi to denote the false predicate, i.e. \llbracket \Xi \rrbracket(x) is false for all x.

One simple model has (M1) \llbracket G( P, \Xi ) \rrbracket true for any predicate P with an interpretation different from that of \Xi, (M2) \llbracket G( P, Q ) \rrbracket false for all other predicate pairs, and (M3) \llbracket \Theta \rrbracket(x) true for all x. Let’s verify the premises under this model. Since \Theta and \Xi differ in interpretation, Premise 1 is true by (M2 + M3). Equally, Premise 3a is true by (M1). What’s this model (one of several) actually saying? “Everything is godlike! A godlike thing exists!”. So quite reasonable logically, if perhaps not in line with a typical reading – it can be taken as a pantheistic reading!
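Since the second-order quantifiers range over all predicates, it is easy to machine-check this model on a small finite universe, where ‘all predicates’ just means ‘all subsets’. Here is a quick sanity check; the encoding is my own and purely illustrative.

from itertools import chain, combinations

A = [0, 1]                                    # a two-element universe
preds = [frozenset(s) for s in
         chain.from_iterable(combinations(A, r) for r in range(len(A) + 1))]

Theta = frozenset(A)                          # (M3): everything is godlike
Xi = frozenset()                              # the false predicate

def G(P, Q):
    # (M1)/(M2): G(P, Q) holds iff Q is interpreted like Xi and P is not
    return Q == Xi and P != Xi

premise1 = not any(G(P, Theta) for P in preds)
premise3a = all(any(G(Q, P) for Q in preds)
                for P in preds if not any(x in P for x in A))
conclusion = any(x in Theta for x in A)

print(premise1, premise3a, conclusion)        # True True True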

Another simple model has all the same interpretations as above but with \llbracket \Theta \rrbracket(x) true for exactly one element. This model is saying “There’s a unique existent God! But the concept of God is no greater than the concept of any other existent thing.”

Both of these models do actually have G as a partial order, though there are also models where it is not. Even if we were to impose strict totality on G, there are minimal restrictions on the order: we require only that godliness is at the top.

In general, in every model, the interpretation of G is one for which there is an epimorphism \delta from \llbracket G \rrbracket onto a specific relation on \{\top, \bot\}. The relation in question is \Gamma = \{ (\top,\bot), (\bot,\bot) \}. We also require \delta(\Theta) = \top and \delta(\Xi) = \bot. Another way of saying this is that any relation G will do, so long as its digraph can be partitioned into two vertex subsets: a nonempty set of zero-indegree nodes (including \Theta), and a nonempty subset containing the rest (including \Xi). Note that to achieve this, the interpretation of \Theta and \Xi must necessarily differ, i.e. a god must exist.

It appears that Anselm’s argument has basically shifted the problem ‘existence of God’ onto the problem of “what does ‘greater’ mean?” The special relation described in the preceding paragraph forms the archetype of acceptable answers to the latter question: ideas that are greatest must be those that describe an existing thing.

“Other things being equal”

Let’s go back to the line ‘other things being equal’. The best I can muster here as an alternative interpretation of Premise 3 taking this line into account is that “for every concept P describing something that doesn’t exist, there’s some greater concept describing only those things described by P that also exist”:

3b. \forall P. (\neg \exists x P x) \rightarrow \exists Q. G(Q, P) \wedge \exists x Q(x) \wedge \forall x. Q(x) \rightarrow P(x).

However, (3b) is clearly unsatisfiable. Take a predicate \Pi with empty extension (the false predicate will do), so that the antecedent \neg \exists x\, \Pi x holds. Then the third conjunct enforces \neg \exists x\, Q(x), which directly contradicts the second conjunct.

Discussion

I have provided two readings of St. Anselm’s argument, based on simplifying away uniqueness of “things possessing godliness” and using formulas (second-order predicates) to discuss “ideas in the mind”. The difference between these two readings hinges on the interpretation of the phrase “other things being equal”. One reading leads to a variety of models, including a model in which every (real) thing has godliness. The other reading renders Premise 3 of St. Anselm’s argument unsound. Neither reading makes strong use of any order properties relating to the phrase ‘greater than’; however, the first reading is satisfiable exactly when existence is encoded within the concept ‘greater than’.

Quite an interesting way to spend a few hours – thanks to my son for interesting me in this!

Equivalent. But better.

Ever since primary (elementary) school, we’ve known that multiplying an integer by 10 is easy. No need for the long written calculations we churn through when doing multiplication by hand. Just stick an extra zero on the end, and we’re done. Multiplication is (relatively) hard, concatenation of digits is easy. And yet, in this case, they’re equivalent in terms of the operation they perform.

Similar equivalences abound in the design of digital hardware for arithmetic computation, and my PhD student Sam Coward (jointly supervised by Theo Drane from Intel) has been devising ways to automatically take advantage of such equivalences to make Intel hardware smaller and more efficient. He will be presenting our work on this topic at the main computer arithmetic conference, ARITH, next week. The conference will be online, and registration is free: https://arith2022.arithsymposium.org.

Let’s explore this example from our early school years a bit more. I’ll use Verilog notation \{\cdot,\cdot\} to denote a function taking two bit vectors and concatenating them together. Of course the ‘multiplication by 10 is easy’ becomes ‘multiplication by 2 is easy’ in binary. Putting this together, we can write 2*x \simeq \{x,0\}, meaning that multiplication by two is the same as concatenation with a zero. But what does ‘the same as’ actually mean here? Clearly they are not the same expression syntactically, and one is cheap to compute whereas the other is expensive. What we mean is that no matter which value of x I choose, the value computed on the left hand side is the same as the value computed on the right hand side. This is why I’ve chosen to write \simeq rather than =. \simeq clearly defines a relation on the set of expressions. This is a special kind of relation called a congruence: it’s an equivalence relation, i.e. it is symmetric, transitive, and reflexive, but it also ‘plays well’ with function application: if x \simeq y then it necessarily follows that f(x) \simeq f(y) for every function symbol f. Like any equivalence relation on a set, \simeq partitions the set into a set of equivalence classes: in our setting a class corresponds to expressions that can be freely interchanged without changing the functionality of our hardware, even if it changes the performance, area or energy consumption of the resulting design.
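To see the congruence in action, here is a tiny brute-force check of the ‘Mult by Two’ equivalence for one particular width (4-bit x, 5-bit result; the widths are chosen purely for illustration):

N = 4
for x in range(2 ** N):
    mult = (2 * x) % 2 ** (N + 1)   # 2*x computed in an (N+1)-bit datapath
    concat = (x << 1) | 0           # Verilog-style {x, 1'b0}
    assert mult == concat
print("2*x and {x,0} agree on all", 2 ** N, "inputs")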

Our colleagues Willsey, Nandi, Wang, Flatt, Tatlock and Panchekha recently published egg, a wonderful open source library for building and exploring data structures known as ‘e-graphs’, specifically designed to capture these relations on expressions. Sam, Theo and I have developed a set of ‘rewrites’ capturing some of the important intuition that Intel designers apply manually, and encoded these for use within egg. To give you a flavour of these rewrites, here’s the table from Sam’s paper; you can see the example we started with is hiding in there by the name ‘Mult by Two’. The subscripts are used to indicate how many digits we’re dealing with; not all these rules are true for arbitrarily-sized integers, and Sam has gone to some lengths to discover simple rules – listed here as ‘sufficient condition’ – for when they can be applied. This is really important in hardware, where we can use as few or as many bits as the job requires.

You can imagine that, when you have this many equivalences, they all interact and you can very quickly build up a very large set of possible equivalent expressions. e-graphs help us to compactly represent this large set.

Once our tool has spent enough time building such a representation of equivalences, we need to extract an efficient hardware implementation. This is actually a hard problem in itself, because common subexpressions change the hardware cost. For example, if I’m calculating (x+1)*(x+1) then I wouldn’t bother to calculate x+1 twice. We describe in our paper how we address this problem via an optimisation formulation. Our tool solves this optimisation and produces synthesisable Verilog code for the resulting circuit.
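As a tiny illustration of why sharing matters during extraction (with made-up unit costs, and nothing like the actual optimisation formulation in the paper), compare the cost of (x+1)*(x+1) counted as a tree versus as a DAG:

COST = {"+": 1, "*": 8}   # illustrative unit costs per operator

def tree_cost(node):
    """Cost counting every occurrence of a subexpression separately."""
    if isinstance(node, str):            # variable or constant
        return 0
    op, *args = node
    return COST[op] + sum(tree_cost(a) for a in args)

def dag_cost(node, seen=None):
    """Cost counting each distinct subexpression only once (i.e. shared)."""
    seen = set() if seen is None else seen
    if isinstance(node, str) or node in seen:
        return 0
    seen.add(node)
    op, *args = node
    return COST[op] + sum(dag_cost(a, seen) for a in args)

x_plus_1 = ("+", "x", "1")
square = ("*", x_plus_1, x_plus_1)       # x+1 appears twice but is shared
print(tree_cost(square), dag_cost(square))   # 10 vs 9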

So, does it generate good circuits? It certainly does! The graph below shows the possible circuit area and performance achievable before (blue) and after (orange) the application of our tool flow before standard logic synthesis tools. For this example, silicon area can be reduced by up to around 70% – a very significant saving.

Area/Delay tradeoff for a smoothing kernel before and after our work

I’ve really enjoyed working on this topic with Sam and Theo. Lots more exciting content to follow. In the meantime, please tune in to hear Sam talk about it next week.

Notes on Model Theory

This summer, I decided to read something about Model Theory. Aware of my limited time, I decided to aim for a slim volume as a guide, so went for An Invitation to Model Theory by Jonathan Kirby.

Here I collect some notes I made while reading Kirby, completely biased to my own interests. Definitions and theorems largely follow the book, but have often been tweaked by me to make my notes more self-contained. I’ve also tried to ‘sloganise’ each of the originally-unnamed theorems in a way that helps me grasp their essence. Additional commentary around the definitions and theorems is mine, and I’ve added a couple of extra definitions to make things easier for me to read back in the future. Of course there is some risk that I’ve introduced errors or lack of clarity, in which case I’m sure the fault lies with me not with the book – please do reach out if you spot anything.

What this post will discuss

We will look at defining basic syntactical objects, terms and formulas, and their interpretations (all in first-order logic). We will look at ideas relating to whether mathematical structures can be distinguished via the truth of formal sentences, including embedding of one structure inside another. We will look at results guaranteeing the existence of structures (especially infinite structures) satisfying sets of such sentences. Turning this around, we’ll also look at some results on which structures can be axiomatised by such sets of sentences. We will consider the extent to which one can define a subset using first-order sentences and the relationship to quantifier elimination (see also my post including algorithms for quantifier elimination). Finally, we will look at the idea of ‘how uniquely’ mathematical structures can be described in this way.

What this post will not discuss

I have purposely not given proofs or examples. Kirby’s book of course has (most) proofs and is rich in examples, through which much of the material is developed. This means I have not discussed some really interesting results on, say, algebraically closed fields, on which Kirby spends considerable time. Those interested in how to apply some of these ideas would do well to consult the book. In addition, I have avoided the topic of types in this simple set of notes, which Kirby highlights as a “central concept in model theory”, despite the interesting results they enable later in the book, e.g. saturated models. The exception is that I included a simplified version of the Ryll-Nardzewski theorem at the end of this post (the proof given uses types) because this particular part of the theorem may be of use to me in the future and can be stated without recourse to types.

I hope to do a later blog post with some of the material using types in due course.


Languages and Structures

Definition (Language). A language L consists of a set of relation symbols, a set of function symbols, a set of constant symbols and, for each relation and function symbol, a positive number known as its arity. The set of all these symbols is denoted \text{Symb}(L).

Definition (L-structure). An L-structure \mathcal{A} consists of a set A called its domain, together with interpretations of the symbols in L: for each relation symbol R of arity n, a subset R^\mathcal{A} of A^n; for each function symbol f of arity n, a function f^\mathcal{A} : A^n \to A; for each constant symbol c, a value c^\mathcal{A} \in A.

Definition (Embedding). Let \mathcal{A} and \mathcal{B} be L-structures. An embedding of \mathcal{A} into \mathcal{B} is an injection \pi: A \to B s.t.: for all relation symbols R of L and all a_1,\ldots,a_n \in A, (a_1,\ldots,a_n) \in R^\mathcal{A} iff (\pi(a_1),\ldots,\pi(a_n)) \in R^\mathcal{B}; for all function symbols f of L and all a_1,\ldots,a_n \in A, \pi(f^\mathcal{A}(a_1,\ldots,a_n)) = f^\mathcal{B}(\pi(a_1),\ldots,\pi(a_n)); for all constant symbols c of L, \pi(c^\mathcal{A}) = c^\mathcal{B}.

Definition (Isomorphism). An embedding \pi: \mathcal{A} \to \mathcal{B} is an isomorphism iff there is an embedding \sigma: \mathcal{B} \to \mathcal{A} s.t. \pi \circ \sigma is the identity on \mathcal{B} and \sigma \circ \pi is the identity on \mathcal{A}.

Definition (Automorphism). An automorphism is an isomorphism from a structure to itself.

Terms, Formulas and their Models

Definition (L-term). The set of terms of the language L is defined recursively as follows: every variable is a term; every constant symbol of L is a term; if f is a function symbol of L of arity n and t_1, \ldots, t_n are terms of L, then f(t_1, \ldots, t_n) is a term; only something built from the preceding clauses in finitely many steps is a term.

Definition (L-formula). The set of formulas of L is defined recursively as follows: if t_1 and t_2 are terms, then (t_1 = t_2) is a formula; if t_1, \ldots, t_n are terms and R is a relation symbol of L of arity n, then R(t_1, \ldots, t_n) is a formula; if \varphi and \psi are formulas, then \neg \varphi and \varphi \wedge \psi are formulas; if \varphi is a formula and x is a variable, then \exists x[\varphi] is a formula; only something built from the preceding four clauses in finitely many steps is a formula. We will write \varphi(x) for a formula with free variables x = (x_1, \ldots, x_n) in some defined order.

Definition (L-sentence). An L-sentence is an L-formula with no free variables.

Definition (Atomic formula). Formulas of the form t_1 = t_2 for some L-terms t_1, t_2 or of the form R(t_1,\ldots,t_n) for some relation symbol R and terms t_1,\ldots,t_n are atomic formulas.

Definition (Interpretation of terms). Let \mathcal{A} be an L-structure, t be a term of L, and x = (x_1, \ldots, x_r) be a list of variables including all those appearing in t. The interpretation of t, written t^\mathcal{A} is a function from A^r to A defined by: if t is a constant symbol c then t^\mathcal{A} : x \mapsto c^\mathcal{A}; if t is x_i then t^\mathcal{A} : x \mapsto x_i; if t is f(t_1, \ldots, t_n) then t^\mathcal{A} : x \mapsto f^\mathcal{A}(t_1^\mathcal{A}(x), \ldots, t_n^\mathcal{A}(x)).

Definition (Interpretation of formulas). We define \mathcal{A} \vDash \phi(a) (read \vDash as ‘models’) recursively as: \mathcal{A} \vDash t_1(a) = t_2(a) iff t_1^\mathcal{A}(a) = t_2^\mathcal{A}(a); \mathcal{A} \vDash R(t_1(a), \ldots, t_n(a)) iff (t_1^\mathcal{A}(a), \ldots, t_n^\mathcal{A}(a)) \in R^\mathcal{A}; \mathcal{A} \vDash \neg \varphi(a) iff \mathcal{A} \nvDash \varphi(a); \mathcal{A} \vDash \varphi(a) \wedge \psi(a) iff \mathcal{A} \vDash \varphi(a) and \mathcal{A} \vDash \psi(a); \mathcal{A} \vDash \exists x[\varphi(x,a)] iff there is some b \in A s.t. \mathcal{A} \vDash \varphi(b,a).

Definition (Models of sentences). For a sentence \varphi, if \mathcal{A} \vDash \varphi we say \mathcal{A} is a model of \varphi. For a set of sentences \Sigma, we say that \mathcal{A} is a model of \Sigma (\mathcal{A} \vDash \Sigma) iff it is a model of all sentences in \Sigma.

Definition (Entailment). Let \Sigma be a set of L-sentences and \varphi be an L-sentence. \Sigma entails \varphi, written \Sigma \vdash \varphi if every model of \Sigma is also a model of \varphi.

Note that I am using the notation for entailment used by Kirby’s book, for consistency, but computer scientist readers of my blog will want to note that this symbol is commonly also used for (the more restricted, assuming sound) ‘proves’ relation for a formal deductive system.

Definition (Deductively closed). A set \Sigma of L-sentences is deductively closed iff, for every L-sentence \sigma, if \Sigma \vdash \sigma then \sigma \in \Sigma.

(Again note no formal deduction in play here.)

Definition (Satisfiable). A set of sentences \Sigma is satisfiable iff there exists some L-structure \mathcal{A} such that \mathcal{A} \vDash \Sigma.

Definition (Complete). A set \Sigma of L-sentences is complete if, for every L-sentence \varphi, either \Sigma \vdash \varphi or \Sigma \vdash \neg \varphi.

Definition (Theory). A theory is a satisfiable and deductively closed set of L-sentences.

Definition (Theory of a structure). The theory of an L-structure \mathcal{A}, denoted \text{Th}(\mathcal{A}) is the set of all L-sentences which are true in \mathcal{A}.

Elementary Equivalence

It’s clearly interesting to know whether structures can be distinguished by sentences:

Definition (Elementary equivalence). Two L-structures \mathcal{A} and \mathcal{B} are elementarily equivalent, denoted \mathcal{A} \equiv \mathcal{B}, iff for every L-sentence \varphi, \mathcal{A} \vDash \varphi iff \mathcal{B} \vDash \varphi.

Compactness

Theorem (Compactness). Let \Sigma be a set of L-sentences. If every finite subset of \Sigma has a model then \Sigma has a model.

A huge amount of excitement flows from the compactness theorem, including many of the results that follow.

Proposition (Elementarily-equivalent non-isomorphic structures). Let \mathcal{A} be any infinite L-structure. Then there is another L-structure which is elementarily equivalent to \mathcal{A} but not isomorphic to it.

And so, via this route, we get wonderful non-standard models. As we will see in a moment, via the Löwenheim-Skolem theorems, there are actually many such structures.

Axiomatisable Classes

Definition (Axiomatisable class). If \Sigma is a set of L-sentences then denote by \text{Mod}(\Sigma) the class of all L-structures which are models of \Sigma. The class \text{Mod}(\Sigma) is axiomatised by \Sigma and is an axiomatisable class. The class is finitely axiomatised if it is axiomatised by a finite set of sentences.

By compactness, we can’t axiomatise a class containing arbitrarily large finite models but no infinite ones (e.g. the class of all finite groups):

Proposition (Large models for axiomatisable classes ensures infinite models). Let C be an axiomatisable class with arbitrarily large finite models (i.e. for every n \in \mathbb{N} there is a model \mathcal{A}_n \in C whose domain is finite and of size at least n). Then C also contains an infinite model.

If a class is finitely axiomatisable, and we know it is axiomatised by some (possibly infinite) set of sentences, then it suffices to choose an appropriate finite subset of those sentences:

Proposition (Finite subsets of sentences axiomatise). If C = \text{Mod}(\Sigma) and C is finitely axiomatisable, then it is axiomatised by a finite subset of \Sigma.

Cardinality of Languages

Definition (Language cardinality). We will write |L| for the cardinality of language L, which we define to be the cardinality of the set of all L-formulas containing only countably many variables.

Unsurprisingly, given this definition, the language cardinality is countable if the set of symbols is countable, otherwise it’s the same cardinality as the set of symbols:

Proposition (Language cardinality). |L| = \max( \aleph_0, |\text{Symb}(L)| ).

Substructures/Extensions and Elementary Substructures/Extensions

Definition (Substructure). Let \mathcal{A} and \mathcal{B} be L-structures, and suppose A \subseteq B. \mathcal{A} is a substructure of \mathcal{B} iff the inclusion of \mathcal{A} into \mathcal{B} is an embedding.

An example of a substructure would be the naturals as a substructure of the integers, with the language consisting of the < relation and 0; clearly the inclusion map is an embedding.

Lemma (Substructures preserve quantifier-free formulas). If \mathcal{A} \subseteq \mathcal{B}, \varphi(x) is a quantifier-free formula and a \in A, then \mathcal{A} \vDash \varphi(a) iff \mathcal{B} \vDash \varphi(a).

Taking the same example, we can see why a stronger notion than substructure is required in order to preserve formulas with quantifiers. For example \exists x[ x < 0 ] clearly has a model in the integers but not in the naturals.

Definition (Elementary substructure/extension). \mathcal{A} is an elementary substructure of \mathcal{B} (and \mathcal{B} is an elementary extension of \mathcal{A}), written \mathcal{A} \preccurlyeq \mathcal{B}, iff A \subseteq B and, for each formula \varphi(x) and each a \in A^n, \mathcal{A} \vDash \varphi(a) iff \mathcal{B} \vDash \varphi(a).

The Tarski-Vaught test is a test for whether a substructure is elementary. The essence of the test is to check whether we can always find an element in the potential substructure that could replace the equivalent element in the superstructure in a formula containing just one variable ranging over the superstructure domain:

Lemma (Tarski-Vaught test). Let \mathcal{A} \subseteq \mathcal{B} be an L-substructure and suppose that for every L-formula \varphi(x,y), every a \in A^n and every b \in B such that \mathcal{B} \vDash \varphi(a,b), there is d \in A such that \mathcal{B} \vDash \varphi(a,d). Then \mathcal{A} \preccurlyeq \mathcal{B}.

The Downward Löwenheim-Skolem theorem guarantees the existence of ‘small’ elementary substructures built from the base of an arbitrary subset of the domain of the elementary extension:

Theorem (Downward Löwenheim-Skolem theorem). Let \mathcal{B} be an L-structure and S \subseteq B. Then there exists an elementary substructure \mathcal{A} \preccurlyeq \mathcal{B} such that S \subseteq A and |A| \leq \max(|S|,|L|).

While the Upward Löwenheim-Skolem theorem gives us a tower of models elementarily extending any infinite model:

Theorem (Upward Löwenheim-Skolem theorem). For any infinite L-structure \mathcal{A} and any cardinal \kappa \geq \max(|L|,|A|), there exists an L-structure \mathcal{B} of cardinality equal to \kappa such that \mathcal{A} \preccurlyeq \mathcal{B}.

Definable Sets

Definition (Definable set). In an L-structure \mathcal{A}, a subset S of A^n is said to be a definable set if there is an L-formula \varphi(x_1,\ldots,x_n) s.t. S = \{ a \in A^n \; | \; \mathcal{A} \vDash \varphi(a) \}.

One way to show that a set is undefinable is to find an automorphism not preserving the set:

Theorem (Preservation of definability). If S \subseteq A^n is a definable set and \pi is an automorphism of \mathcal{A} then \pi(S) = S.

Notation (\text{Def}). We will write \text{Def}_n(\mathcal{A}) for the set of all definable subsets of A^n.

Quantifier Elimination

Quantifier elimination is a useful way to get a grip on which sets are definable.

Definition (Quantifier elimination for structures). An L-structure \mathcal{A} has quantifier elimination iff for every n \in \mathbb{N}, every definable subset of A^n is defined by a quantifier-free L-formula.

Definition (Quantifier elimination for theories). An L-theory T has quantifier elimination iff for every n \in \mathbb{N}, for every L-formula \varphi(x_1, \ldots, x_n), there is a quantifier-free L-formula \theta(x_1, \ldots, x_n) such that T \vdash \forall x [\varphi(x) \leftrightarrow \theta(x)].

These semantic and syntactic views line up:

Lemma (Structures and their theories are equivalent w.r.t. quantifier elimination). Let \mathcal{A} be an L-structure. Then \text{Th}(\mathcal{A}) has quantifier elimination iff \mathcal{A} has quantifier elimination.

Substructure Completeness

One way to show that a theory has quantifier elimination is via substructure completeness, a weaker notion than completeness.

Notation (\mathcal{A}^+). Given an L-structure \mathcal{A}, define a new language L_A by adding a new distinct constant symbol c_a for every a \in A. \mathcal{A}^+ is the L_A-structure obtained by interpreting each constant symbol c_a as a, with all other interpretations remaining as in \mathcal{A}.

Definition (Diagram). The diagram \text{Diag}(\mathcal{A}) of an L-structure \mathcal{A} is the set of all atomic L_A-sentences and negations of atomic L_A-sentences which are true in \mathcal{A}^+.

Definition (Substructure complete). An L-theory T is substructure complete if, whenever \mathcal{A} is a substructure of a model of T, the theory T \cup \text{Diag}(\mathcal{A}) is a complete L_A-theory.

Proposition (Substructure completeness is equivalent to quantifier elimination). Let T be an L-theory. Then T is substructure complete iff T has quantifier elimination.

Principal Formulas

Notation. Given an L-structure \mathcal{A} and an L-formula \varphi with n free variables, we will write \varphi(\mathcal{A}) for \{ a \in A^n | \mathcal{A} \vDash \varphi(a) \}.

Definition (Principal formula). Let T be a complete L-theory. A principal formula w.r.t. T is a formula \psi(x) such that: (i) T \vdash \exists x [\psi(x)], (ii) for every L-formula \varphi(x) with the same tuple of free variables, either T \vdash \forall x[ \psi(x) \to \varphi(x) ] or T \vdash \forall x[ \psi(x) \to \neg \varphi(x)].

Another way to get a good handle on the definable sets of an L-structure is via their principal formulas, especially if there are only finitely many.

Proposition (Definable sets from principal formulas). Let \psi_1(x), \ldots, \psi_m(x) be principal formulas in n variables for the (complete) theory \text{Th}(\mathcal{A}), defining distinct subsets of A^n. Also let these formulas cover A^n, i.e. \cup_{i=1}^m{\psi_i(\mathcal{A})} = A^n. Then every definable set in \text{Def}_n(\mathcal{A}) is a union of some of the sets \psi_i(\mathcal{A}), and |\text{Def}_n(\mathcal{A})| = 2^m.

Categoricity

Bearing in mind the upward and downward Löwenheim-Skolem theorems, what’s the appropriate definition of a model that’s ‘effectively defined’ by a theory?

Definition (Categorical). A theory T is \kappa-categorical iff there is a model of T of cardinality \kappa and any two models of cardinality \kappa are isomorphic.

It turns out that categoricity can give us completeness:

Lemma (Łos-Vaught test). If an L-theory T is \kappa-categorical for some \kappa \geq |L| and T has no finite models, then T is complete.

How do we know whether a theory is categorical? We have a special case for countably categorical (i.e. \aleph_0-categorical) theories. It turns out that countably categorical complete theories are exactly those whose models have only finitely many definable subsets for each fixed dimension:

Theorem (Ryll-Nardzewski theorem). Let T be a complete L-theory with an infinite model \mathcal{A} and with L countable. Then T is countably categorical iff for all n \in \mathbb{N}^+, \text{Def}_n(\mathcal{A}) is finite.

(Note to reader: the version of this theorem presented in Kirby instead links countable categoricity to the finiteness of Lindenbaum algebras, which I have avoided discussing in this blog post. But Lindenbaum algebras are isomorphic to algebras of definable sets for complete theories (see Kirby Lemma 20.3), so I have stated as above here. I have also eliminated elements of the full theorem presented, to make the presentation here self-contained.)


So how about the book? Would I recommend it to others?

From my perspective as an “informed outsider” to model theory, Kirby’s book was accessible. It’s brief – as the title says, it’s an invitation to model theory – and I think it does that job well. If you’re comfortable (or indeed enjoy) seeing material presented in a variety of ways, sometimes through example that generalises later in the book, sometimes through application, sometimes in its general form first, then you will probably enjoy the presentation – as I did. I felt the book falls a little short of a self-study guide, though, primarily because it doesn’t include worked answers to any of the exercises; this would be a welcome addition. This point is particularly relevant to Chapter 16, which (unlike other chapters in the book) takes the form of a ‘guided exercise’ to applying the ideas of the preceding chapters to the study of a particular structure.

Overall, I would definitely recommend it to other outsiders wanting to catch a glimpse of what Model Theory is all about.

Schools Bill 2022

On the 11th May 2022 the Government published its Schools Bill, providing the legislation required to implement several of the ideas explained in their earlier White Paper, which I analysed in a previous blog post. On the 12th May the Bill was followed up with a raft of explanatory policy statements. I have been going through all these documents, and I summarise here the points that are of key importance or that stood out for me.

It is worth noting at the outset that the policy statements go well beyond what’s in the Bill, as they set out plans for intended standards to be set in the future, with the Bill simply enforcing these standards as they arise. As such, the Bill provides for quite sweeping powers for the Secretary of State to amend these standards and duties over time without the requirement for additional legislation.

1. Academies

1.1. Intervention in Existing Academies

Part 1 of the Bill deals with changes to the landscape of Academies in England. I would characterise the scope here as two-fold: firstly, the aim is to move many existing powers from contractual funding agreements into primary legislation, and secondly to be able to apply these powers at the Trust level (‘proprietor’ in the Bill) rather than only at individual Academy level.

Currently, the DfE’s powers of intervention depend on the version of the Funding Agreement and Articles of Association used by a particular academy, which depends on when they converted to Academy status – a patchwork which is both inequitable and very hard to manage. The Bill would place all Academies under the same legislative framework, which is to be welcomed.

Notices to Improve, an existing mechanism (formerly known as ‘Financial Notices to Improve’) seem to be envisaged as the main approach for ensuring change at a Trust or Academy without using the ‘nuclear’ Termination option. It’s proposed that these notices can be issued under wider circumstances than currently, e.g. “[if the SoS is satisfied] that there are significant weaknesses in the proprietor’s governance procedures” Part 1 6(1)(b). I think on balance this is a good idea: it is much better for the DfE to be able to use appropriate levers to effect change, especially in cases of inadequate governance, rather than have to fall back on Termination as the only route.

Along the same lines, Part 1 Section 7 includes powers to appoint or require appointment of directors to an Academy either due to failure to act on a Notice to Improve, a serious breakdown in governance procedures, or for safeguarding reasons. This happens by directing the Trust to appoint a named person or, if the DfE prefers, a person with named skills and experience. The accompanying briefing paper discusses using this power to create Interim Trust Boards, not unlike existing Interim Executive Boards for maintained schools causing concern. It’s interesting that similar provisions existed at academy level in early versions of the model Articles of Association of academies, see e.g. Article 62 allowing the appointment of ‘Additional Governors’ for very similar reasons, and the consequent Article 64 requiring existing governors to stand down in these circumstances. These no longer exist in the current model Articles, and I have always thought this odd. I am pleased to see these intervention powers returning, and perhaps more importantly all schools placed on an equal footing in this regard.

1.2. Academisation

The main new power over LA-maintained schools is the power for LAs to apply for an Academy Order via a new Section 3A of the Academies Act 2010, which appears as Part 1, 29(2) of the Bill. This seems to be the way the Government is implementing its stated aim of allowing LAs to establish MATs (see my analysis of the White Paper for more detail). In areas where the LA is aligned with Government plans, I expect we will see the beginning of the end of maintained schools over the 2023/24 school year. However, there is no obligation for LAs to apply for Academy Orders, so I believe we will still see a mixed economy of MATs, maintained schools, and isolated SATs well into the future, though perhaps with more regional variation than at present.

1.3. Grammar Schools

Grammar schools and schools with religious character get significant emphasis in the Bill. At least for grammar schools, I read this as primarily a desire to reassure them that the only way they will lose their ability to select is through a parental ballot, not through one of the sweeping powers introduced in the rest of the Bill: essentially the Secretary of State appears to want to explicitly limit his own powers in this regard without bringing in further primary legislation. This makes sense in the context of the White Paper, as a way to bring more grammar schools into MATs.

1.4. Not in the Bill

There are a few things I noted previously as mentioned in the White Paper that the Bill gives no real insight into, such as:

  • the requirement to (re)broker schools into strong MATs after two consecutive Ofsted reports below Good
  • the setting up of an “arms length” curriculum body
  • mechanisms for local governance (though this is mentioned in the explanatory briefing)
  • MAT inspection
  • collaboration standards for MATs

There is also talk in the accompanying briefings of improvements to complaint handling by academies, another issue I have commented on before and where I agree that significant improvements need to be made.

I suspect the broad powers introduced by the Bill are likely to be used as an “Enabling Act” to introduce these various points via the school standards provisions outlined in Part 1 1(2) of the Bill. We are told that the Academy Trust Standards will come to parliament separately (see explanatory briefing), and I look forward to reading these.

2. The Rest of the Bill

The rest of the Bill deals with school funding, school attendance, independent educational institutions, and various miscellaneous items.

I was involved in determining the school funding formula for Essex LA for many years, and I don’t read this portion of the Bill as particularly radical. Long-term readers of my blog will know that I’m not very happy with the approach taken to the National Funding Formula (see, for example, my 2017 consultation response and the links to earlier posts going back some years therein). But the new content in this Bill essentially amounts to making the default funding position the “direct” (previously known as “hard”) National Funding Formula, with the Secretary of State authorising exceptions (presumably because they recognise that one size really doesn’t fit all!). I’m pleased to see a continued role for Schools Forums in the proposed legislation, including around de-delegation for maintained schools, in Part 2, (3)(a).

The school attendance provisions introduce a duty on LAs to maintain a register of children not in school and to exchange information between LAs when children on the register move, as well as various provisions to help clamp down on illegal schools.

Conclusion

From my perspective, the meat of the Bill is Part 1, on Academies, standards, and interventions. I am cautiously optimistic about the measures proposed for intervening in existing academies. In particular, I welcome the move away from high-stakes “hands off” versus “terminate” interventions. Whether a particular intervention is best made at Trust or Academy level is worthy of considerable debate, and I think practice is likely to evolve here over time, but it appears to me that if the Government really wants MATs to take active responsibility for their Academies, as discussed in the White Paper, the balance will move towards Trust-level and softer forms of intervention.

I am less comfortable with the proposed amendments to the Academies Act 2010 allowing LAs to issue Academy Orders. There appears to be no mechanism for LAs to decide which MATs academised schools will end up in following one of these orders. While there is no requirement for LAs to issue Academy Orders to all their maintained schools, I do wonder what will happen in those areas that retain a significant number of maintained schools: how will the Government achieve its stated aim of full academisation there? Through future legislation, or by increasingly constraining LAs’ ability to adequately maintain their schools via funding or other levers?

Finally, in order to make the intervention measures work well in practice, good resourcing of Regional Director (currently Regional Schools Commissioner) offices will be absolutely essential. It is therefore somewhat unfortunate that these papers arrived in the same week as a drive to cut civil service jobs.

Keeping the Pipelines Full

On the 16th May, my PhD student Jianyi Cheng (jointly advised with John Wickerson) will present his most recent paper “Dynamic C-Slow Pipelining for HLS” at FCCM 2022 in New York, the FPGA community’s first in-person conference since the pandemic hit.

Readers of this blog may remember that Jianyi has been working on high-level synthesis techniques that combine the best of dynamic scheduling with the best of static scheduling [FPGA 2020, FCCM 2021]. The general principle underlying his work is to make the most of what information we have at compile time to develop highly efficient custom architectures, while leaving what we don’t know at compile time to influence execution at run-time.

A very common design pattern in hardware acceleration is the idea of C-slow pipelining. Pipelining tends to be taught early in undergraduate programmes, but C-slow pipelining rarely gets a mention. The idea arises in circuits with feedback loops. The basic approach to pipelining doesn’t really work in this setting: we can still insert extra registers into the circuit, potentially improving clock frequency at the cost of latency just as in the feed-forward case, but we can’t then overlap computations to improve throughput, because the data dependency around the feedback loop prevents it.
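
To make the feedback problem concrete, here is a minimal C sketch (my own illustration, not code from the paper): each loop iteration reads the accumulator written by the previous one, so however many pipeline registers we insert into the loop body, iteration i+1 cannot start until iteration i has drained through them.

    /* A simple recurrence: acc depends on its own previous value, so
     * successive iterations of the loop cannot be overlapped in a pipeline. */
    float recurrence(const float *x, int n) {
        float acc = 1.0f;
        for (int i = 0; i < n; i++) {
            acc = acc * x[i] + 1.0f;  /* feedback: the next iteration needs this result */
        }
        return acc;
    }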

C-slow pipelining essentially says “OK, but you can use the spare capacity induced by the pipeline registers to overlap computation of independent streams of data, if you happen to have them available.”
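
Continuing the toy example above (again my own sketch, not the paper’s code), C-slowing corresponds to interleaving C independent streams round-robin through the same loop body: consecutive iterations now update different accumulators, so the true dependence is C iterations apart and a pipelined body can keep its stages busy.

    /* C-slowed version of the recurrence above, with C = 2 independent
     * input streams interleaved round-robin.  Consecutive iterations of
     * the inner loop touch different accumulators, so they can overlap. */
    #define C 2

    void recurrence_cslow(const float *x[C], int n, float out[C]) {
        float acc[C];
        for (int s = 0; s < C; s++) acc[s] = 1.0f;

        for (int i = 0; i < n; i++) {
            for (int s = 0; s < C; s++) {          /* round-robin over streams */
                acc[s] = acc[s] * x[s][i] + 1.0f;  /* dependence is now C iterations apart */
            }
        }
        for (int s = 0; s < C; s++) out[s] = acc[s];
    }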

Our new paper introduces a dynamic HLS flow for C-slow pipelining. This is particularly valuable in a globally dynamic environment where certain components nevertheless exhibit static control flow and can be efficiently pipelined: for example, some deep but predictable computation that must be repeated many times, but whose arrival times and data sources may change dynamically at runtime – a perfect fit for our prior work.
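
As a hedged illustration of the kind of workload meant here (the structure and names below are mine, not taken from the paper): the outer control flow is data-dependent – which requests arrive, and when, is only known at runtime – but each request invokes the same deep, statically predictable inner recurrence, which is exactly the part worth C-slow pipelining.

    /* A deep, predictable recurrence with a fixed trip count: ideal for
     * static pipelining, but fed by a dynamic, data-dependent outer loop. */
    float deep_kernel(float seed, int iters) {
        float v = seed;
        for (int i = 0; i < iters; i++) {
            v = v * v * 0.5f + 0.25f;  /* long feedback chain */
        }
        return v;
    }

    void process(const float *requests, const int *ready, int n, float *results) {
        for (int i = 0; i < n; i++) {
            if (ready[i]) {  /* arrival pattern unknown at compile time */
                results[i] = deep_kernel(requests[i], 64);
            }
        }
    }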

Jianyi presents a way to leverage the Boogie language and tool flow from Microsoft Research to automatically prove sufficient conditions for C-slowing to be correct. He is then able to introduce a new hardware component within the Dynamatic HLS tool that allows the schedule to “run ahead” to implement certain bounded out-of-order executions corresponding to C-slowing at the circuit level.

At the cost of a small area overhead in the region of 10%, this combined software analysis and hardware transformation is able to reduce wall-clock execution time by more than half compared to the vanilla dynamic scheduling approach.

If you’ll be in NYC in mid-May, go along and hear Jianyi’s talk!