# HLS and Power: Some FPL Contributions

This week sees the IEEE International Conference on Field-Programmable Logic and Applications, in Ghent, Belgium.  Two of my team are attending to present their research papers on High Level Synthesis and on Run-time Power Estimation. In this post, I briefly summarise the key contributions of these papers.

High-Level Synthesis (HLS) is an important technology, which aims to automatically generate hardware designs from high-level (typically software) descriptions of their behaviour. In a previous blog post, I described some work from my PhD student Junyi Liu (joint with Sam Bayliss) on extending a common paradigm for analysis memory dependences – the polyhedral model – to a parametric version, for efficient pipelining in HLS. This week, Junyi presents an alternative use for the same parametric polyhedral HLS framework: automatic loop tiling (joint work with John Wickerson). Loop tiling is a very common compiler transformation – for example it is often used in matrix-matrix multiplication. The key advantage is to make sure that you only have a small set of data you’re working with at any given moment in time (traditionally for cache, in the FPGA context for embedded scratch-pad memories). The size of this working set can be traded off against the amount of off-chip memory traffic by selection of tile sizes. In a multi-dimensional loop, there are many possible options, and navigating this space is non-trivial. Junyi’s work provides a way to produce an explicit formula for both the memory requirement and the amount of off-chip data traffic required for any given tile size. He can then use nonlinear optimisation techniques to explicitly optimise the traffic subject to any given constraint on buffer size. This work is available as an open-source tool at https://github.com/Junyi-Liu/PolyTSS.

Back in 2016, some work I did with Eddie Hung, James Davis, Josh Levine, Ed Stott and Peter Cheung won the best paper prize at FCCM 2016. We showed that it is possible to use an online (recursive least squares) algorithm to learn the instantaneous power consumption of individual components in an FPGA design, with a view to some kind of run-time manager using this information. The solution worked by monitoring certain signal activity at run-time, but the missing part of the puzzle was which signals to monitor. James’s latest paper, STRIPE, with the same co-authors, answers this question. It turns out that the answer to this problem – as with so many in engineering (and life?) – lies in linear algebra. Golub and Van Loan describe in their classic textbook how QR factorisation can be used to heuristically select a subset of “nearly linearly independent” vectors from a larger set, and it’s this approach that tends to win out when given enough data to work with.

# Floating-Point Workshop

This week I attended a very enjoyable workshop at Dagstuhl on Analysis and Synthesis of Floating-Point Programs, organised by Eva Darulova, Ally Donaldson, Zvonimir Rakamarić, and Cindy Rubio González. As I noted in my talk on Algorithm-Architecture Codesign, this is one of the very few meetings I’ve been to at which delegates spanned some of – what I consider to be – the most exciting research areas at the moment: computer architecture, high-level synthesis, computer arithmetic, numerical analysis, programming languages, and formal methods. Hats off to the organisers for getting all these people in the room at once!

Below, I just choose a few of the many great talks I heard as some personal highlights of the workshop for me. Presentations and – more importantly! – debates during and after presentations were of uniformly excellent quality.

Rubio González, Hollingsworth, and Rakamarić all presented work on precision tuning. This is a topic I did some of the early work on back in 2001, in the context of fixed-point arithmetic for DSP algorithms in hardware, and have maintained an interest in ever since [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16], over the years slowly migrating from the very special class of LTI algorithms implemented in fixed-point arithmetic to much broader classes of algorithm, including proving termination of loops under finite precision floating-point and making forays into real algebraic geometry. The topic is currently exhibiting a resurgence of interest, especially for floating-point software. Of the tools presented, I only personally have some experience with FPTuner.

Damouche discussed a tool for automatically rewriting floating-point code for accuracy improvement (joint work with Martel). I’ve been aware of the interesting work of Martel for a while, and it inspired our own SOAP tool and the associated papers [17,18,19] which extend the capability to hardware where one is concerned with performance, area, and numerical error. This theme was also picked up by Panchekha, who has developed some very interesting tools for diagnosing numerical instability and correcting it.

Another common theme of the workshop was reproducibility. Many researchers (and developers!) are unhappy about the non-reproducible nature of floating-point code: change compiler or platform, and you might get a different result – more insidiously, run again on the same platform and you still might get a different result. My colleague Miriam Leeser, Michaela Taufer, Ganesh Gopalakrishnan and Thomas Wahl all spoke eloquently on this topic. Wahl’s work considered the idea of stabilising programs against platform uncertainty. The work auto-inserts pragmas in order to only determinise certain “key” properties, like ensuring the same control flow path is taken each time the program is run.

Donaldson spoke about detecting compiler bugs by inserting (precise) semantics-preserving transformations, and highlighted several such bugs his group has found. A similar theme was picked up by Nagarakatte, who has found bugs in LLVM floating-point optimisations and is proposing a DSL to specify such optimisations precisely.

Jim Demmel gave an interesting summary of proposed changes being discussed by the IEEE-754 floating-point standards body (a new rounded addition useful as a component within reproducible summation), the BLAS standards body, and progress made since his outstanding paper with Dumitriu and Holtz. This paper, when first written, inspired me to pursue the implications of this research for hardware design with my former PhD student, Theo Drane, now at Cadence. For Theo’s thesis, we used Demmel’s work to develop a design flow for hardware implementation of polynomial evaluation, given desired relative error bounds.

Titolo discussed an abstract interpretation approach to proving numerical properties in floating-point, which is also the conceptual framework utilised by our SOAP tool. Aiming at a similar goal, my former postdoctoral researcher Victor Magron presented the approach we derived together (jointly with Donaldson) for bounding error in floating-point computation, closely aligned with the approach I initially kick-started with my former PhD student David Boland back in 2010. I’ve blogged informally about this approach before – see here. Rakamarić discussed the tool FPTaylor which also targets this problem, within the context of the SMACK toolflow developed at Utah. While she didn’t give a talk at the seminar, Eva Darulova, one of the organisers, has developed an excellent paper and tool in the same area, and it was a pleasure discussing her work with her.

There was much discussion at the workshop on the topic of tool inter-operability. Tatlock and Panchekha presented a format for numerical benchmarking, and are urging the research community to cohere around this – it could be very interesting.

There was industrial representation from both Imagination Technologies and Cadence. In Drane’s talk, he made – I believe – an important observation that the research community should take note of “in my experience, if a customer has the time to do in depth verification of their numerical hardware, they also have the time to customise their hardware.”

I had a very enjoyable few days at Dagstuhl, and I hope that we find a way to keep this community together and talking to each other.

# ARITH24: Some Highlights

Imperial College had the privilege of hosting the 24th IEEE Symposium on Computer Arithmetic (ARITH24) this week, thanks to the local arrangements work of my colleague Dr David Thomas. ARITH is the main conference dedicated solely to questions of computer arithmetic. It’s a small but active and dedicated community with – I would argue – increasing significance in the world of custom and FPGA-based accelerator computation, and I’ve enjoyed being on the Technical Programme Committee of the conference in recent years.

In this post, I describe my personal highlights from the conference.

Neil Burgess (ARM), opening the conference

Nick Higham from the University of Manchester gave the first keynote talk on the rise of mixed precision algorithms. This was an exciting tour de force. Nick highlighted the various floating-point precisions available in modern machines, as well as the move to low-precision computation in areas such as machine learning and ultra-high precision requirements in other areas. The key problem studied in the talk was how to solve systems of linear equations while taking advantage of the availability of mixed precision. Nick traced the idea of solution via iterative refinement of an initial approximate solution back to Wilkinson, and traced its development since then. Nick’s own recent work on this problem, in collaboration with Erin Carson, has been to introduce a method involving three different floating-point precisions. He went on to show how such an approach can produce numerical results equivalent to high precision while the bulk of the work is done in low precision. The approach uses approximate LU factorisation, followed by GMRES iterations. I found the whole approach – especially the analysis – fascinating. Moreover, it was an outstanding example of the link between algorithm development and data type selection, a link which was the topic of my own invited talk at ARITH, which I summarise below.

Gonzalez-Navarro and Hormigo presented an interesting floating-point arithmetic where normalisation is limited for efficiency reasons, and presented some empirical results from applying this to DSP tasks.

Oscar Gustafsson presented a very interesting fixed-point implementation of complex rotations, avoiding roundoff errors for low-complexity computation.

In my talk (extended abstract available here), I acted as devil’s advocate in a session organised by Martin Langhammer from Intel. Martin had invited me to give a talk putting a different perspective than “faster / better floating point.” I decided to do this by (semi-) formalising the joint problem of algorithm / data type design for numerical computation, in order to draw out the main differences between design for general purpose processors and for custom or FPGA implementations. After highlighting these from an abstract perspective, I gave a concrete example due to my PhD student Juan Jerez from a few years ago, before discussing work that is trying to automate the kind of design problems faced in this context. Such work includes bounding numerical errors, refactoring code, and in general synthesising numerical code.

Rocca, Dang, and Magron had a very interesting paper on Certified Roundoff Error Bounds using Bernstein Expansions and Sparse Krivine-Stengle Representations. This is the latest incarnation of work Magron, Donaldson and myself started together when Magron was a postdoc in my group, based in turn on the early work I did with Boland bringing automated roundoff error analysis and real algebraic geometry together. It is exciting to see this work branching off in new ways.

Pasca (Intel) and Istoan (INSA) presented a very nice approach to fixed-point function generation for FPGAs. Istoan will be joining my research group as a postdoctoral researcher in September, and I’m excited to be welcoming his expertise.

I organised a special session on Arithmetic in Digital Signal Processing on the second day of the conference. This featured interesting papers from Serre and Püschel (ETHZ) on optimal streamed linear permutations – I have been a fan of Püschel’s work since the first days of SPIRAL – as well as from Imagination Technologies (Rovers and Elliott), Linköping (Gustafsson, et al.), Intel (both Langhammer and Pasca from Intel PSG and Krishnamurthy.)

Unfortunately, I had to miss more than half of the presentations at ARITH due to an unscheduled hospital trip. So there are probably a huge number of exciting talks I have missed out on discussing here – my apologies to the authors. I will definitely be sure to prioritise ARITH attendance in future years.

# Arithmetic Workshop at Imperial

This week my research group had the pleasure of hosting Prof Miloš Ercegovac from UCLA, one of the giants of the field of computer arithmetic. Prof Ercegovac had come at my invitation to deliver a short three-day course on computer arithmetic to approx 40 delegates from Imperial College and elsewhere, including a significant delegation from UK industry.

I first encountered Miloš’s work when I was a PhD student reading around my topic, but I didn’t have the opportunity to make use of his work directly until 2014, when I published a paper with my former PhD student Kan Shi (now with Intel), demonstrating that a digit parallel form of Online Arithmetic can be used in the context of approximate computing, to provide an unexplored trade-off between clock frequency and arithmetic error. I also later developed an architecture based on online arithmetic for arbitrary precision with my BEng student Aaron Zhao (now at Cambridge).

The title of Miloš’s lectures was “An Enduring Pillar of Computer Arithmetic: Redundancy in Representation and its Uses in Algorithms and Implementations”.  I hope that in this brief blog post, I can provide you with an overview of the topics that were discussed at our workshop, aimed at a general technical reader. The best reference for most of the material, should you wish to dig further, is the book by Ercegovac and Lang.

Day 1 kicked off with a discussion of redundancy in representation (the idea that you can have more than one representation of a given number), its realisation in positional radix number systems, and the implications for addition. As a simple example, imagine working in decimal, but allowing your digits to range from -9 to 9 instead of 0 to 9 as usual. This is clearly redundant because, for example, $18 = 2\bar{2}$ where the bar over a digit indicates a negative digit. But it also has the remarkable property that addition can be performed without inducing carries, leading to the potential for very significant parallelism. I particularly enjoyed that Miloš was able to trace this idea back to the 1700s, in a paper by John Colson, published in the Philosophical Transactions of the Royal Society, entitled “A Short Account of Negativo-affirmative Arithmetick”.

Day 2 covered Miloš’s Online Arithmetic, referred to earlier. As I tell my students: when adding or multiplying two numbers, if the two numbers are big, you expect a big result, right? So why not use that information – why wait until you’ve added all the partial products from least-significant digits before producing the most significant digit. Usually, we must wait because a carry from the least significant digits can throw out the most significant. Not so with online arithmetic – taking advantage of redundancy allows one to produce multiplication and addition (and other) results most-significant-digit-first.

Day 3, the final day of our workshop, ended with a discussion of high-radix division, explaining some interesting ways that performance can be squeezed out of division by performing the division in high radices.

I would recommend anyone interested in arithmetic to get hold of Ercegovac and Lang and explore these ideas in more depth. With the rise of custom and accelerator architectures, there hasn’t been a better time to begin to re-explore some of the assumptions underlying our arithmetic hardware. Inspired readers, feel free to contact me about PhD or postdoc opportunities in this area!

# Review: Verified Functional Programming in Agda

Agda is a dependently typed programming language. I’ve read about dependent types on and off for the past few years in magazine articles as well as discussed them a little with my collaborator Dan Ghica and others, but never really had the time to properly experiment until this Easter holiday. This book – by Aaron Stump – is interesting because it’s basic, practical, and jumps right into coding. I thought it would be good for someone like me who wants to play with the practice of dependent types and only delve into theoretical concerns as necessary or motivated to do so.

The book comes with its own Agda library, the Iowa Agda Library (IAL), a basic library of Booleans, lists, etc., simply written for the beginner to understand. It has a lot of exercises and code snippets based on that library which I found useful and relatively easy to follow.

The book assumes no functional programming expertise, and since I’ve done a (very) little bit of functional programming in Haskell in the distant past, this made the introductory material easy to follow and a useful revision aid.

## Introductory Agda

The early parts of the book – Chapters 1 to 4 – are based on various data structures: Booleans, natural numbers, lists. Here the functional programming adds little to what I’ve seen before. The neat thing is the ability to use Agda as a theorem prover. This is what is described in the book as “external verification” – you write code and then you prove – separately – theorems about the code.

Firstly, let’s take a look at some example Agda code, taken from Chapter 4:


take : ∀{ℓ}{A : Set ℓ} → (n : ℕ) → 𝕃 A → 𝕃 A
take 0       _         = []
take (suc n) []        = []
take (suc n) (x :: xs) = x :: take n xs



There’s a few interesting things here already. Firstly, unicode! It feels surprisingly liberating to write unicode characters in code. Secondly the type declaration declares (via ∀), the function to be polymorphic – it operates on lists (𝕃 from IAL) of any type A (of any level ℓ within a hierarchy of type levels).

The real excitement comes when we start proving theorems. For example, given the function nthTail from IAL which returns all elements after the first n elements in a list (and the empty list if the list has fewer than n elements), we can write


ex46 : ∀{ℓ}{A : Set ℓ} (n : ℕ)(l : 𝕃 A) → (take n l) ++ (nthTail n l) ≡ l
ex46 0        _                           = refl
ex46 (suc n) []                           = refl
ex46 (suc n)  (x :: xs) rewrite ex46 n xs = refl



Here ex46 is just the name of my proof (it was Exercise 4.6 in the book). The type declaration for the function ex46 can be read as “I’ll produce a proof that concatenating (take n l) with (nthTail n l) just produces l for all natural numbers n and for all lists of elements of any type A. Here ≡, taken from IAL, is propositional equality, and the only property of it we’re using is reflexivity; in fact the only inhabitant of this type is refl. The first two lines of the proof work because the definitions of take and nthTail are such that when the first argument is zero or the second argument is empty, the left and right hand side normalize to the same result, namely .l ≡ .l in the first case (.l being the name assigned to the second unnamed argument _) and [] ≡ [] in the second case. The third case explicitly uses induction: rewrite is a directive that applies the induction hypothesis ex46 n xs to the proof of ex46 (suc n) (x :: xs). Very pleasing!

## Dependent Types Revealed

Chapter 5 is where we really start to see the power of dependent types. Up until this point, we’ve written functional programs and then proved properties about those programs. We have definitely used dependent types in the process, but they were somehow “invisible” to us as a user, and – at least for the exercises in the book – we could ignore the dependent-type based implementation without really affecting the usability of the approach. Chapter 5 shows how the proofs can be part of the programs themselves, using the type system to express the properties we wish to enforce. The simplest and clearest example given is a declaration of a vector datatype:


data 𝕍 {ℓ} (A : Set ℓ) : ℕ → Set ℓ where
[] : 𝕍 A 0
_::_ : {n : ℕ} (x : A) (xs : 𝕍 A n) → 𝕍 A (suc n)



Note here that a vector, unlike a list, has a specific length built into its type. As a result, you get various properties for free, for example Agda’s type system verifies that the length of the concatenation of two vectors is the sum of the individual lengths from a straight-forward definition of a concatenation operation; the code below type checks, and that’s all that necessary.


_++𝕍_ ∀{ℓ}{A : Set ℓ}{n m : ℕ} → 𝕍 A n → 𝕍 A m → 𝕍 A (n + m)
[] ++𝕍 ys = ys
(x :: xs) ++𝕍 ys = x :: (xs ++𝕍 ys)



A more complex example of Braun trees is described in detail in the book. These are binary trees where the data at a node has a value less than or equal to that of its children while the size of the node left and right subtrees are either equal or the left subtree is exactly one node larger than the right subtree. The author chooses to encode this latter property within the declaration of the Braun tree type, and then shows how to write code for tree insertion, deletion, etc.

I enjoyed this chapter, though it had me puzzling for quite some time over sigma types, a subtle and powerful idea that I think deserves a bit more discussion than the space given in the book; I ended up reading around this topic a bit before I felt comfortable. An example given considers a function that turns lists into vectors. Immediately we have a problem – what is the return type of this function? It should be (𝕍 A n) but n is not fixed! The solution presented is to return a type written in Agda as Σ ℕ (λ n → 𝕍 A n); an inhabitant of this type should be interpreted as “a pair consisting of a natural number and a function which, when applied to that natural number, returns the vector type we’re actually interested in.” This takes a little getting used to!

## Type-Level Computation

Chapter 6, introduces the idea of proof by reflection, amongst other things. This looks super-powerful, and I’m motivated to dig further into this topic when I have the time. The author illustrates the concept by creating a list simplifier, which simplifies expressions on lists. By reflecting the operations on lists as corresponding constructors for list expressions, and then proving the soundness of the simplifier, he is able to use this soundness proof to, inter alia, prove properties about the operations operating on lists. The example is really interesting, though unlike in previous chapters of the book, it’s not really discussed in detail in Chapter 6, and you have to look carefully at the code accompanying the book to see how it all fits together.

The other primary example of Chapter 6 is a statically typed version of printf, the archetypal type-evasive function of my youth

int printf( const char *restrict format, ... );

Here, the author presents an example credited to Augustsson, showing that dependent types can come to the rescue essentially by providing a type for C’s “…” which depends on the format string.

The integer example from this chapter is the closest to my own interests, showing how integers can be defined using naturals in an interesting way: an integer is a natural together with a sign which is unit (trivial) if the natural is zero but has Boolean type otherwise, i.e. the type of the sign is dependent on the value of the natural, leading to the following type declaration and definition of ℤ [taken from here].


ℤ-pos-t : ℕ → Set
ℤ-pos-t 0 = ⊤
ℤ-pos-t (suc _) = 𝔹

data ℤ : Set where
mkℤ : (n : ℕ) → ℤ-pos-t n → ℤ



I had followed through the discussion in the book easily, but came totally unstuck when trying to implement a seemingly obvious exercise I set myself: prove that integer multiplication commutes using a pre-existing proof of the commutativity of multiplication over the naturals.

I had defined integer multiplication as:


_*ℤ_ : ℤ → ℤ → ℤ
_*ℤ_ (mkℤ 0 _)        _                = mkℤ 0 triv
_*ℤ_ (mkℤ (suc n) _)  (mkℤ 0 _)        = mkℤ 0 triv
_*ℤ_ (mkℤ (suc n) sn) (mkℤ (suc m) sm) = mkℤ ((suc n) * (suc m)) (sn xor sm)



where triv is the single inhabitant of the ⊤ type.  I had therefore naively assumed that one could simply write the proof as:

*ℤcommutes (mkℤ 0 _)        (mkℤ 0 _) = refl
*ℤcommutes (mkℤ 0 _) (mkℤ (suc bN) _) = refl
*ℤcommutes (mkℤ (suc aN) _) (mkℤ 0 _) = refl
*ℤcommutes (mkℤ (suc a) sa) (mkℤ (suc b) sb) rewrite xor-comm sa sb | (*comm (suc a) (suc b)) = refl


However, this fails – for the final pattern – for a very interesting reason that was, however, totally incomprehensible to me at first. Agda reports:

𝔹 !=< ℤ-pos-t w of type Set
when checking that the type
(b a w : ℕ) →
w ≡ suc (a + b * suc a) →
(sa sb : 𝔹) →
mkℤ w (sb xor sa) ≡ mkℤ (suc (a + b * suc a)) (sb xor sa)
of the generated with function is well-formed


“Huh? ‘Generated with function?'”

Thus began a small odyssey of Easter holiday evenings spent learning how rewrites are implemented as withs, and how withs in turn are implemented using generated auxiliary functions. I found Norrell and Chapman’s tutorial, together with the good quality documentation provided me with the necessary additional background required to teach myself what was going on here and to produce a final, working proof:


*ℤcommutes : ∀ (a b : ℤ) → a *ℤ b ≡ b *ℤ a
*ℤcommutes (mkℤ 0 _)        (mkℤ 0 _)        = refl
*ℤcommutes (mkℤ 0 _)        (mkℤ (suc m) sm) = refl
*ℤcommutes (mkℤ (suc n) _)  (mkℤ 0 _)        = refl
*ℤcommutes (mkℤ (suc n) sn) (mkℤ (suc m) sm)
with m + n * suc m | *comm (suc n) (suc m)
... | .(n + m * suc n) | refl rewrite xor-comm sn sm = refl



I have mixed feelings about this experience. On the one hand, I’ve learnt a lot about Agda specifically (and dependent types in general) by being forced to go through this process to understand what’s going on under the hood. On the other hand, this seems like a lot of hoops to jump through for this simple proof, and the book under review didn’t have all the answers I was looking for to solve the exercise. I don’t see the latter point as a big deal, but if you’re going to do your explorations on holiday, make sure you have a wifi connection available so you can look beyond the book for help!

## Selected Topics

The remainder of the book is eclectic but interesting.

Chapters 7 and 8 combine to produce a larger example piece of code – a Huffman coder / decoder. It’s good to see the ideas put into practice in a larger case study, of course, though these chapters were less interesting to me personally.

Chapter 9 really recaptured my attention, though! The author uses Agda’s types to reason about termination. He explicitly sets up a general datatype representing what it means for a relation DAG to not have any infinite chain of successor nodes starting from a given named node, something he calls a terminating relation. He then proves in Agda that > over ℕ is terminating in this sense. The chapter then goes on to apply this to producing a termination proof for a recursively defined division function. I’m aware of this kind of reasoning, and have even played a bit with tools making use of these ideas like Byron Cook et al.‘s Terminator. So it’s interesting to see this in the context of the type system of Agda where in fact it becomes rather important since all functions must terminate in Agda – otherwise none of the proofs discussed above make sense!

The final chapter of the book, Chapter 10, formalises some ideas in a particular logic within Agda, using Agda to prove soundness and completeness of the logic.

## Conclusion

The book ends rather abruptly with Chapter 10 on intuitionistic logic and Kripke semantics. There is no rounding off, summarising lessons learnt or directions for future research or indeed practical programming advice.

Nevertheless, I learnt a lot from reading this book – and a lot more from doing its exercises! It’s given me an initial understanding of one of the most recently talked-about developments in programming. I’m left keen to try to apply this to some real-world problems in my research! Well recommended holiday reading.

# Primary Assessment

Readers of this blog will know that I have been critical of the Government’s assessment system for the new National Curriculum in England [1,2,3]. I therefore greet the Secretary of State’s recently launched consultation over the future of primary assessment with a cautious welcome, especially since it seems to follow well from the NAHT’s report on the topic.

What is Statutory Assessment for?

The consultation document states the aim of statutory assessment as follows:

Statutory assessment at primary school is about measuring school performance, holding schools to account for the work they do with their pupils and identifying where pupils require more support, so that this can be provided. Primary assessment should not be about putting pressure on children.

Firstly, let me lay my cards on the table: I do think that school “performance” deserves to be measured. My experiences with various schools suggests strongly that there are under-performing schools, which are in need of additional support to develop their educational practice. There is a subtle but telling difference between these perspectives, my own emphasising support while the Government’s emphasising accountability. While some notions of accountability in schools appear uncontroversial, the term has recently  become associated with high-stakes educational disruption rather than with improving outcomes for our children. We can definitely agree that primary assessment should not be about putting pressure on children; unfortunately, I don’t believe that the consultation proposals seriously address this question.

Consultation Questions

In this section, I focus on the questions in the Government’s consultation on which I have a strong opinion; these are by no means the only important questions.

Q2. The EYFSP currently provides an assessment as to whether a child is ‘emerging, expecting [sic] or exceeding’ the level of development in each ELG. Is this categorisation the right approach? Is it the right approach for children with SEND?

Clearly the answer here primarily depends on the use of these data. If the aim is to answer questions like “how well-aligned – on average – are children with the age-related expectations of the early-years curriculum at this school?” then this assessment scheme is perfectly reasonable. Nor does it need to be tuned for children with SEND who may have unusual profiles, because it’s not about individual pupils, nor indeed for high attaining children who may be accessing later years of the national curriculum during their reception years. But if it’s about understanding an individual learning profile, for example in order to judge pupil progress made later in the school, then any emerging / expected / exceeding judgement seems far too coarse. It groups together children who are “nearly expected” with those well below, and children who are “just above expected” with those working in line with the national curriculum objectives for half way up the primary school – or beyond.

Q3. What steps could we take to reduce the workload and time burden on those involved in administering the EYFSP?

Teacher workload is clearly a key issue. But if we are talking seriously about how to control the additional workload placed on teachers by statutory assessment, then this is an indication that our education system is in the wrong place: there should always be next to no additional workload! Assessment should be about driving learning – if it’s not doing that, it shouldn’t be happening; if it is doing that, then it should be happening anyway! So the key question we should be answering is: why has the statutory framework drifted so far from the need to support pupils’ learning, and how can we fix this?

Q5. Any form of progress measure requires a starting point. Do you agree that it is best to move to a baseline assessment in reception to cover the time a child is in primary school (reception to key stage 2)? If you agree, then please tell us what you think the key characteristics of a baseline assessment in reception should be. If you do not agree, then please explain why.

[… but earlier …]

For the data to be considered robust as a baseline for a progress measure, the assessment needs to be a reliable indicator of pupils’ attainment and strongly correlate with their attainment in statutory key stage 2 assessments in English reading, writing and mathematics.

I agree wholeheartedly with the statement regarding the requirements for a solid baseline progress measure. And yet we are being offered up the possibility of baselines based on the start of EYFS. There is no existing data on whether any such assessment strongly correlates with KS2 results (and there are good reasons to doubt it). If the government intends to move the progress baseline from KS1 down the school, then a good starting point for analysis would be the end of EYFS – we should already have data on this, although from the previous (points-based) EYFS profile. So how good is the correlation of end-of-EYFS and KS2? Because any shift earlier is likely to be worse, so at least this would provide us with a bound on the quality of any such metric. Why have these data not been presented?

It would, in my view, be unacceptable to even propose to shift the baseline assessment point earlier without having collected the data for long enough to understand how on-entry assessment correlates with KS2 results, i.e. no change should be proposed for another 6 years or so, even if statutory baseline assessments are introduced now. Otherwise we run the risk of meaningless progress metrics, with confidence intervals so wide that no rigorous statistical interpretation is possible.

Q9. If a baseline assessment is introduced in reception, in the longer term, would you favour removing the statutory requirement for all-through primary schools to administer assessments at the end of key stage 1?

The language is telling here: “to administer assessments.” If this were phrased as “to administer tests,” then I would be very happy to say “yes!” But teachers should be assessing – not examining – pupils all the time, in one form or another, because assessment is a fundamental part of learning. So really the question is the form of these assessments, and how often they should be passed up beyond the school for national comparison. Here the issue is more about the framework of support in which a school finds itself. If a school is “left to its own devices” with no local authority or other support for years (a common predicament at the moment with the abolition of the Education Services Grant by the Government!) then it way well be too long to wait six-and-a-half years before finding out that a school is seriously under-performing. Yet if the school exists within a network of supportive professionals from other schools and local authorities who have the time and resource to dig deeply into the school’s internal assessment schemes during the intervening years, these disasters should never happen. A prerequisite for a good education system is to resource it appropriately!

Q11. Do you think that the department should remove the statutory obligation to carry out teacher assessment in English reading and mathematics at key stage 2, when only test data is used in performance measures?

I think this is the wrong way round. Schools should only be required to report teacher assessment (and it should be “best fit”, not “secure fit”); tests at Key Stage 2 should be abolished. This would be fully consistent with high quality professional-led, moderated assessment, and address the very real stress placed on both children and teachers by high-stakes testing schemes. Remember the consultation document itself states “Primary assessment should not be about putting pressure on children.”

Q14. How can we ensure that the multiplication tables check is implemented in a way that balances burdens on schools with benefit to pupils?

By not having one. This is yet another situation where a tiny sliver of a curriculum (in this case tedious rote learning of multiplication tables) is picked out and elevated above other equally important elements of the curriculum. Boaler has plenty to say on this topic.

Q15. Are there additional ways, in the context of the proposed statutory assessments, that the administration of statutory assessments in primary schools could be improved to reduce burdens?

The best way to reduce the burden on schools seems to be to more closely align formative and summative assessment processes. However, schools have been explicitly encouraged to “do their own thing” when it comes to formative assessment processes. The best way the Government could help here is by commissioning an expert panel to help learn from the best of these experiments, combining what has been learnt with the best international educational research on the topic, and re-introducing a harmonised form of national in-school assessment in the primary sector.

Best Fit or Secure Fit?

The consultation appears to repeat the Government’s support for the “secure fit” approach to assessment. The document states:

The interim teacher assessment frameworks were designed to assess whether pupils have a firm grounding in the national curriculum by requiring teachers to demonstrate that pupils can meet every ‘pupil can’ statement. This approach aims to achieve greater consistency in the judgements made by teachers and to avoid pupils moving on in their education with significant and limiting gaps in their knowledge and skills, a problem identified under the previous system of national curriculum levels.

The key word here is every. This approach has been one of the key differentiators from the previous national curriculum assessment approach. I have argued before against this approach, and I stand by that argument; moreover, there are good statistical arguments that the claim to greater consistency is questionable. We are currently in the profoundly odd situation where teacher assessments are made by this “secure fit” approach, while tests are more attuned with a “best fit” approach, referred to as “compensatory” in previous DfE missives on this topic.

However, the consultation then goes on to actually suggest a move back to “best fit” for writing assessments. By removing the requirement for teacher assessments except in English, and relying on testing in KS2 for maths and reading, I expect this to be a “victory for both sides” fudge – secure fit remains in theory, but is not used in any assessment used within the school “accountability framework”.

High Learning Potential

The consultation notes that plans for the assessment of children working below expectation in the national curriculum are considered separately, following the result of the Rochford Review. It is sad, though not unexpected, that once again no particular mention is given to the assessment of children working well above the expectation of the national curriculum. This group of high attaining children has become invisible to statutory assessment, which bodes ill for English education. In my view, any statutory assessment scheme must find ways to avoid capping attainment metrics. This discussion is completely absent from the consultation document.

Arithmetic or Mathematics?

Finally, it is remarkable that the consultation document – perhaps flippantly – describes the national curriculum as having been reformed “to give every child the best chance to master reading, writing and arithmetic,” reinforcing the over-emphasis of arithmetic over other important topics still hanging on in the mathematics primary curriculum. It is worth flagging that these changes of emphasis are distressing to those of us who genuinely love mathematics.

Conclusion

I am pleased that the Government appears to be back-tracking over some of the more harmful changes introduced to primary assessment in the last few years. However, certain key requirements remain outstanding:

1. No cap on attainment
2. Baselines for progress measures to be based on good predictors for KS2 attainment
3. Replace high-stress testing on a particular day with teacher assessment
4. Alignment of summative and formative assessment and a national framework for assessment
5. Well-resourced local networks of support between schools for support and moderation

# DATE 2017: Some Personal Highlights

This week I was away at the 2017 Design, Automation and Test in Europe Conference (DATE) in Lausanne, Switzerland, where my collaborator and former staff member Kevin Murray was presenting our joint work with Andrea Suardi and Vaughn Betz.

In this blog post I note a couple of personal highlights from the conference.

Apart from Kevin’s paper, which I describe in accessible form in a previous blog post, Jie Han and Marc Riedel‘s tutorial on stochastic computation was thought provoking. It’s an old idea, which I’ve seen a few times, but it was interesting to catch up with the recent thinking in this area: results on synthesis using Bernstein polynomials and a de-randomisation of the logic.

The theme of approximate computing pervaded the conference. Newcastle had an interesting paper on approximate multiplication where multiple partial products are compressed by replacing (implicit) summation with Boolean or operations. This complements more traditional approaches like I’ve worked on, where we just throw away partial products. It seems like there could be a few interesting generalisations of this approach.

It was also, of course, great to see the many colleagues I often meet at these conferences, and to spend some time in the laboratory of Paolo Ienne and meeting his PhD student Lana Josipovic, who is doing some very interesting work on high-level synthesis.

EPFL also has possibly the best underpass in the world:

DATE remains one of the best conferences to catch up with the various diverse directions in which EDA researchers have gone in recent years.

# Overclocking For Fun and Profit

This week at the Design, Automation and Test in Europe (DATE) conference, Kevin Murray is presenting some exciting work I’ve done in collaboration with Kevin, his supervisor Vaughn Betz at the University of Toronto, and Andrea Suardi at Imperial College.

I’ve been working for a while on the idea that one form of approximate computing derives from circuit overclocking. The idea is that if you overclock a circuit then this may induce some error. However the error may be small or rare, despite a very significant performance enhancement. We’ve shown, for example, that such tradeoffs make sense for image processing hardware and – excitingly – that the tradeoffs themselves can be improved by adopting “overclocking-friendly” number representations.

In the work I’ve done on this topic to date, the intuition that a given circuit is “overclocking friendly” for a certain set of input data has been a human one. In this latest paper we move to an automated approach.

Once we accept the possibility of overclocking, our circuit timing analysis has to totally change – we can’t any longer be content with bounding the worst-case delay of a circuit, because we’re aiming to violate this worst case by design. What we’re really after is a histogram of timing critical paths – allowing us to answer questions like “what’s the chance that we’ll see a critical path longer than this in any given clock period?” Different input values and different switching activities give rise to the sensitization of different paths, leading to different timing on each clock cycle.

This paper’s fundamental contribution is to show that the #SAT probem can be efficiently used to quantify these probabilities, giving rise to the first method for determining at synthesis time the affinity of a given circuit to approximation-by-overclocking.

# Playing with L-Systems

For today’s session of the math circle I jointly run for 5-7 year-olds, we got the kids to play with Lindenmayer Systems (L-Systems for short). L-Systems can be used as compact representations of complex geometric shapes, including fractals. The aim of the session was for children to understand that simple formulae can describe complex geometric objects, building on the intuition that properties of shapes can be described algebraically that we got through a previous session on symmetry and algebra.

I stumbled across this excellent L-System generator on the web, which was perfect for our needs as we didn’t need to install any software on the school laptops. After illustrating how the Koch Snowflake could be generated, we simply let them loose to experiment, suggesting that each time they set the number of iterations to 1 before exploring a greater depth of iteration. They seemed to really enjoy it. On a one-to-one basis, we discussed the reason that various formulae generated their corresponding shapes, trying to embed the link between the equations and the graphical representation, but the main emphasis was generating visually pleasing images.

Here are some of the curves they produced. In each case, the caption is of the form: number of iterations, angle, axiom, production rule.

I would have liked to have the time to discuss in more depth why the curve that appeared to fill the triangle had no white space visible.

Once we had finished, we finally drew together where I presented a simple L-System for the Sierpinski Triangle, an object they’d seen before in a previous session. There were several exclamations of awe, which are always great to hear!

# Concurrent Programming in High-Level Synthesis

This week, my student Nadesh Ramanathan presents our FPGA 2017 paper “Hardware Synthesis of Weakly Consistent C Concurrency”, a piece of work jointly done with John Wickerson and Shane Fleming.

High-Level Synthesis, the automatic mapping of programs – typically C programs – into hardware, has had a lot of recent success. The basic idea is straightforward to understand, but difficult to do: automatically convert a vanilla C program into hardware, extracting parallelism, making memory decisions, etc., as you go. As these tools gain industry adoption, people will begin using them not only for code originally specified as sequential C, but for code specified as concurrent C.

There are a few tricky issues to deal with when mapping concurrent C programs into hardware. One approach, which seems modular and therefore scalable, has been adopted by LegUp: schedule threads independently and then build a multithreaded piece of hardware out of multiple hardware threads. This all works fine, indeed there is an existing pthreads library for LegUp. The challenge comes when there’s complex interactions between these threads. What if they talk to each other? Do we need to use locks to ensure synchronisation?

In the software world, this problem has been well studied. The approach proposed by Lamport was to provide the programmer with a view of memory known as “sequentially consistent” (SC). This is basically the intuitive way you would expect programs to execute. Consider the two threads below, one on the left and one on the right, each synthesised by an HLS tool. The shared variables x and y are both initialised to zero. The assertion is not an unreasonable expectation from a programmer: if r0 = 0, it follows that Line 2.3 has been executed (as otherwise r0 = -1). We can therefore conclude that Line 1.2 executed before Line 2.2. It’s reasonable for the programmer to assume, therefore that Line 1.1 also executed before Line 2.3, but then x = 1 when it is read on Line 2.3, not zero! Within a single thread, dependence analysis implemented as standard in high-level synthesis would be enough to ensure consistency with the sequential order of the original code, by enforcing appropriate dependences. But not so in the multi-threaded case! Indeed, putting this code into an HLS tool does indeed result in cases where the assertion fails.

My PhD student’s paper shows that we can fix this issue neatly and simply within the modular framework of scheduling threads independently, by judicious additional dependences before scheduling. He also shows that you can improve the performance considerably by supporting the modern (C11) standard for atomic memory operations, which come in a variety of flavours from full sequential consistency to the relaxed approach natively supported by LegUp pthreads already. In particular, he shows for the first time that on an example piece of code chaining circular buffers together that you can get essentially near-zero performance overhead by using the so-called acquire / release atomics defined in the C11 standard as part of a HLS flow, opening the door to efficient synthesis of lock-free concurrent algorithms on FPGAs.

As FPGAs come of age in computing, it’s important to be able to synthesise a broad range range of software, including those making use of standard concurrent programming idioms. We hope this paper is a step in that direction.