National Funding Formula

This post contains my personal response to the DfE’s second round consultation on the national funding formula for schools. Individuals, organisations, schools, etc., should feel free to reuse my response. My response is based on my earlier – and lengthier – analysis of the DfE’s proposals, which goes into much more depth. Those interested beyond the specific questions being asked by the DfE should consult the earlier post too.

Please note that consultation responses need to be submitted by 22nd March 2017 and can be done online. Please submit a response, even a partial one. This issue is too important to be ignored!

Q1. In designing our national funding formula, we have taken careful steps to balance the principles of fairness and stability. Do you think we have struck the right balance?

No.

The introduction of a funding floor is a fundamentally unfair approach. The stated purpose of the national funding formula was to provide fairness and end the “postcode lottery”. A funding floor, as envisaged by the consultation document, entrenches the postcode lottery in the current funding environment rather than eliminating it.

Q2. Do you support our proposal to set the primary to secondary ratio in line with the current national average?

Yes.

In the absence of further evidence, this seems like a sensible approach. However, I am disappointed that further evidence has not been sought, e.g. a study of relative primary/secondary performance across local authorities with different ratios in place. Once the NFF is introduced, such studies will no longer be possible, so the Government is missing an opportunity to base these decisions on hard evidence.

Q3. Do you support our proposal to maximise pupil-led funding?

No – you should increase school-led funding compared to the current national average

This approach fundamentally ignores the different shape and makeup of schools. It would only be a reasonable approach if costs were all equally pupil-led. Instead, I support the principle of equalising disposable income per pupil. Once fixed costs are taken into account, this principle is opposed to that of maximising pupil-led funding.

Q4. Within the total pupil-led funding, do you support our proposal to increase the proportion allocated to the additional needs factors?

No – allocate a lower proportion to additional needs.

I would have said “don’t know” if given that option here, because the DfE has not presented any evidence that increasing this proportion is an appropriate decision. Equally, I do not have evidence that reducing it would be a good decision. Given local authorities’ central role in understanding local additional educational needs, and in the absence of further evidence, I believe LA averages should be used, which corresponds to a lower proportion than proposed.

Q5. Do you agree with the proposed weightings for each of the additional needs factors?

Q5a. Deprivation – Pupil Based

No – allocate a higher proportion.

A number of studies, e.g. [1], have shown a stronger link between pupil-based deprivation indices and educational disadvantage than between area-based deprivation indices and educational disadvantage. Once again, a boundary discontinuity study would have been helpful, and I am disappointed that this work has not been undertaken.

[1] C. Crawford and E. Greaves, “A comparison of commonly used socio-economic indicators: their relationship to educational disadvantage and relevance to Teach First,” IFS Report R79, 2013.

Q5b. Deprivation – Area Based

No – allocate a lower proportion

See answer to “Deprivation – Pupil based”

Q5c. Low Prior Attainment

No – Allocate a lower proportion

There is simply no justification given in the consultation document for nearly doubling the funding going through this factor. It is frankly shocking that such radical changes can be proposed with no evidence presented in their favour.

No matter what proportion is decided upon for the initial implementation of the NFF, the Government should commit that, should prior attainment metrics rise nationally in the future, any funding lost to schools through the low prior attainment factor will be recycled into the AWPU factor.

Q5d. English as an Additional Language

No – allocate a lower proportion

LAs have chosen to allocate an average of 0.9%. The proposal is to increase this to 1.2%. No evidence is presented to support this change, therefore I cannot support it.

On a related matter, for a fixed pot of EAL funding, there is an argument to be had over whether children would benefit more from considerable funding in year 1, for intensive English tuition to allow them to access the curriculum, than from the same funding “smeared out” over three years at a lower level per year. Once again, it would be useful to see the research suggesting which approach actually reaps the greater benefit before mandating EAL3.

Q6. Do you have any suggestions about potential indicators and data sources we could use to allocate mobility funding in 2019-20 and beyond?

None

Q7. Do you agree with the proposed lump sum amount of £110,000 for all schools?

7a. Primary

No – Allocate a higher amount

In order to uphold the principle of equal disposable income per pupil, the purpose of the lump sum should not be to “contribute to the costs that do not vary with pupil numbers” but rather to “be a genuine reflection of the costs that do not vary with pupil numbers”.

7b. Secondary

No – Allocate a higher amount

See answer to “Primary”

Q8. Do you agree with the proposed amounts for sparsity funding of up to £25,000 for primary and up to £65,000 for secondary, middle and all-through schools?

Q8a. Primary

No – Allocate a lower amount

With a properly funded lump sum that reflects the costs that do not vary with pupil numbers, there is no need for a sparsity factor. The need for a sparsity factor is an admission that there are pupils in “non-sparse” areas who are being deprived of appropriate disposable funding.

Q8b. Secondary

No – Allocate a lower amount

See answer to Primary

Q9. Do you agree that lagged pupil growth data would provide an effective basis for the growth factor in the longer term?

No comment

Q10. Do you agree with the principle of a funding floor?

No

The introduction of a funding floor defeats the stated purpose of the NFF. It reintroduces and, worse, codifies a postcode lottery based on historical funding rates.

Q11. Do you support our proposal to set the funding floor at minus 3%?

No

I do not support the funding floor.

Q12. Do you agree that for new or growing schools (i.e. schools that are still filling up and do not have pupils in all year groups yet) the funding floor should be applied to the per-pupil funding they would have received if they were at full capacity?

No

I do not support the funding floor.

Q13. Do you support our proposal to continue the minimum funding guarantee at minus 1.5%?

No – the minimum funding guarantee should be higher (i.e. restrict losses to less than 1.5% per pupil in any year)

I strongly support the existence of an MFG as a way of phasing in funding changes. However, in the present financial climate an MFG of -1.5% is problematic, coming on top of other pressures such as the radical reduction in the Education Services Grant to local authorities and the apprenticeship levy.

Q14. Are there further considerations we should be taking into account about the proposed schools national funding formula?

As a general principle, deviations from a move to average values across LA funding formulae should be justified and evidence-based. I have raised many more questions for consideration in a blog post at https://constantinides.net/2016/12/16/national-funding-formula-for-schools-a-critique/

Q15. Do you agree that we should allocate 10% of funding through a deprivation factor in the central school services block?

No Answer

I am disappointed the options in this questionnaire do not allow me to state “not sure”, because – once again – no evidence has been provided to justify the figure of 10%. What I can say confidently is that the overall level of central school services block is far too low. In effect schools are seeing a significant cut to their funding because they will need to fund services previously provided through the Education Services Grant.

Q16. Do you support our proposal to limit reductions on local authorities’ central school services block funding to 2.5% per pupil in 2018-19 and in 2019-20?

No – limit reductions to less than 2.5% per pupil per year

In line with my school-level comments on MFG, it is important to have transitional arrangements in place. Given the radical cut to the Education Services Grant, it is not clear why the proposed limit of 2.5% is so much greater than the school MFG rate of -1.5%.

Q17. Are there further considerations we should be taking into account about the proposed central school services block formula?

No Answer


Algebra and symmetry for 5-7 year-olds

At the encouragement of one of the teachers, I have resumed the math circle I was running a few years ago, jointly with her. Previous posts can be found here. This time we are targeting some of the youngest primary school children. So far, I’ve attended two sessions. In the first, we played with Polydron and I mainly used the session to observe the children and see what they’re capable of. In the second – described here – we experimented with symmetry.

Before the session, we created a few cardboard templates of three well-defined shapes, numbered 1, 2, and 3 in the picture below.

[Photo: the three cardboard shape templates, numbered 1, 2 and 3 (top), and some of the children’s constructions (bottom)]

We told the children that we would be exploring a way of describing shapes as codes, and first had them cut out several copies of the shapes from coloured paper. This was not as easy as I had hoped – the children varied considerably in speed and accuracy of cutting.

We then told them that there were two operations that could be written down in our code: “next to”, written “;”, meaning “the thing on the left of the semicolon is placed immediately to the left of the thing on the right”, and “F”, meaning “flip the shape in a vertical axis”. They seemed to understand this quite easily, and had no problem coming up and seeking clarification on any points. I hadn’t considered that they might not have encountered the semicolon before, but this was not a problem, even if it was rendered more like a “j” in most of their written formulae.

We asked them to construct a couple of shapes, such as 2;F2 and F1;2, and then experiment with their own. They seemed to have a lot of fun, and the lower part of the photograph shows some that they were keen to show us when we reconvened in a circle on the carpet.

The children noticed that some shapes such as 2;F2 and F2;2 were symmetric, whereas some were not. I pointed out that nobody had tried to construct a shape such as FF1, and asked what that might look like. Several children correctly identified that FF1 would be indistinguishable from 1, so I wrote “FF1 = 1”, and I think I saw a flicker of an “ah ha!” moment in at least one child who seemed excited by the use of equations here.

At this point we ran out of time in our 1hr session. I would like to take this simple algebra further. In particular, I would like to get them to explore:

  • the generalisation from the observed FF1 = 1 to ∀x.FFx=x.
  • the observation that symmetry in a vertical mirror line corresponds to the statement Fx = x
  • the observation that all shapes of the form x;Fx are symmetric, and relate this to the algebraic definition of symmetry
  • the property F(x;y) = Fy;Fx

For those of a mathematical bent, these are the properties of a semigroup with involution.
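
For anyone wanting to play with this structure, here is a minimal Python sketch (entirely my own illustration, not something we used with the children) modelling a shape as a left-to-right sequence of pieces, each flipped or not, and checking the properties above:

```python
# A toy model of the shape algebra: a shape is a left-to-right tuple of
# (template_id, flipped) pieces. This is an illustrative sketch only.

from itertools import product

def F(shape):
    """Flip in a vertical axis: reverse the order of pieces and flip each one."""
    return tuple((piece, not flipped) for piece, flipped in reversed(shape))

def juxtapose(x, y):
    """The ';' operation: x is placed immediately to the left of y."""
    return x + y

# Single-piece shapes built from templates 1, 2 and 3, flipped or not.
pieces = [((t, f),) for t in (1, 2, 3) for f in (False, True)]
shapes = pieces + [juxtapose(a, b) for a, b in product(pieces, repeat=2)]

for x in shapes:
    assert F(F(x)) == x                                      # FFx = x
    assert F(juxtapose(x, F(x))) == juxtapose(x, F(x))       # x;Fx is symmetric
    for y in shapes:
        assert F(juxtapose(x, y)) == juxtapose(F(y), F(x))   # F(x;y) = Fy;Fx

print("All properties hold on", len(shapes), "test shapes.")
```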

I enjoyed myself, and I think the children did too!

POPL and VMCAI 2017: Some Highlights

Last week I was in Paris attending POPL 2017 and co-located events such as VMCAI. This was my first time at POPL, prompted by the acceptance of joint work with John Wickerson. Here are some highlights and observations from my own, biased, position as an outsider to the programming languages community.

Polyhedral analysis

Due to my prior work with linear programming and polyhedral analysis, the polyhedral talks were particularly interesting to me.

Maréchal presented an approach to condense polyhedral representations through ray tracing, which seems to be of general interest.

While polyhedral methods in general represent a set as \{x | Ax \leq b\} with both A and b as parameters, there are various restricted approaches which, for efficiency, fix A and allow only b to vary. Seladji presented a technique to discover good “templates” (choices of A) via Principal Component Analysis. I suspect that this is equivalent, in some sense, to an ellipsoidal approximation of the polyhedron.
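
To make the idea concrete, here is a rough numpy sketch of what template discovery of this flavour might look like (my own reconstruction of the general idea, not the algorithm from the paper): use the principal directions of some sample points as the fixed rows of A, then take the tightest offsets b that still enclose the samples.

```python
# Illustrative sketch only: derive template directions A from PCA of sample
# points, then form the template polyhedron {x | Ax <= b} enclosing them.

import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(size=(200, 2)) @ np.array([[3.0, 1.0], [0.0, 0.5]])  # a skewed cloud

centred = samples - samples.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)   # rows of vt = principal directions

A = np.vstack([vt, -vt])            # fixed template: +/- each principal direction
b = (samples @ A.T).max(axis=0)     # tightest offsets containing every sample

assert np.all(samples @ A.T <= b + 1e-9)
print("template rows:\n", A.round(3), "\noffsets:", b.round(3))
```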

Singh, Püschel, and Vechev presented an interesting method to speed up polyhedral abstract domain computations by partitioning, resulting in a much faster tool.

SMT Solvers

An interesting paper from Jovanović presented a nonlinear integer arithmetic solver which looks genuinely very useful for a broad class of applications.

 

Heroics with Theorem Provers

Lots of work was presented making use of Coq, some of which seemed fairly heroic to me. Of particular interest, given my work on non-negative polynomials for bounding roundoff error, was a Coq tactic from Martin-Dorel and Roux that allows the use of (necessarily approximate) floating-point computation in ways I’ve not seen before, essentially writing a sum-of-squares polynomial p \approx z^T Q z as p = z^T Q z + z^T E z and showing that Q+E is positive semidefinite for some small E.
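
The flavour of the trick, as I understand it, can be illustrated with a small pure-Python sketch of my own (not the Coq tactic itself, and not formally verified): take a Gram matrix Q produced by a floating-point solver, fold the exact coefficient residual of p minus z^T Q z into a small matrix E, and then check in exact rational arithmetic that Q + E is positive semidefinite, certifying that p is a sum of squares.

```python
# Toy illustration: certify p(x) = x^4 + 2x^2 + 1 >= 0 from an *approximate*
# floating-point Gram matrix Q by folding the exact residual into E and
# checking Q + E is PSD in exact rational arithmetic.

from fractions import Fraction as Fr

p = {0: Fr(1), 2: Fr(2), 4: Fr(1)}        # coefficients of p by degree; z = [1, x, x^2]

# Pretend this came from a numerical SDP solver (slightly perturbed Gram matrix).
Qf = [[0.9999999, 0.0,       0.0000002],
      [0.0,       1.9999997, 0.0      ],
      [0.0000002, 0.0,       1.0000001]]
Q = [[Fr(v) for v in row] for row in Qf]  # exact rationals for whatever the floats are

# Coefficients of z^T Q z, gathered by monomial degree i + j.
zQz = {}
for i in range(3):
    for j in range(3):
        zQz[i + j] = zQz.get(i + j, Fr(0)) + Q[i][j]

# Fold the exact residual p - z^T Q z into a symmetric correction E.
r = {d: p.get(d, Fr(0)) - zQz.get(d, Fr(0)) for d in range(5)}
E = [[Fr(0)] * 3 for _ in range(3)]
E[0][0], E[1][1], E[2][2] = r[0], r[2], r[4]
E[0][1] = E[1][0] = r[1] / 2
E[1][2] = E[2][1] = r[3] / 2

M = [[Q[i][j] + E[i][j] for j in range(3)] for i in range(3)]  # p = z^T M z exactly

def is_psd(M):
    """Exact PSD check by symmetric Gaussian elimination with rational pivots."""
    M, n = [row[:] for row in M], len(M)
    for k in range(n):
        if M[k][k] < 0:
            return False
        if M[k][k] == 0:
            if any(M[k][j] != 0 for j in range(k, n)):
                return False
            continue
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n):
                M[i][j] -= f * M[k][j]
    return True

print("Q + E is positive semidefinite:", is_psd(M))  # True => p is a sum of squares
```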

Weak Memory

John presented our work on automatically comparing memory models, which seemed to be well received.


 

Numerics

Chiang et al. presented a paper on rigorous floating-point precision tuning. The idea is simple, but elegant: perform error analysis with individual precisions as symbolic variables and then use a global optimization solver to look for a high-performance implementation. Back in 2001, I did a very similar thing, but for a very limited class of algorithm (LTI systems), in fixed-point rather than floating-point, and using noise power rather than worst-case instantaneous error as the metric of accuracy. However, the general ingredients (individual precisions, an explicit optimization) are there, and it is great to see this kind of work now appearing in a very different context.
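
As a back-of-the-envelope illustration of the idea (my own sketch, nothing to do with the authors' tool), here is the kind of search involved for a single expression, treating each operation's unit roundoff as a variable in the error bound and looking for the cheapest precision assignment that meets an error budget:

```python
# Illustrative sketch: choose per-operation precisions for y = (a + b) * c,
# with |a|, |b|, |c| <= 1, so the worst-case rounding error meets a budget
# while minimising total mantissa bits. A real tool would use a global
# optimiser and a hardware cost model; brute force suffices for two operations.

from itertools import product

RANGE_BOUND = 2.0        # |(a + b) * c| <= 2 for the assumed input ranges
ERROR_BUDGET = 1e-6

def worst_case_error(p_add, p_mul):
    """Error bound for ((a+b)(1+d1))*c*(1+d2) with |d1| <= 2^-p_add, |d2| <= 2^-p_mul."""
    u1, u2 = 2.0 ** -p_add, 2.0 ** -p_mul
    return RANGE_BOUND * ((1 + u1) * (1 + u2) - 1)

feasible = [(p_add + p_mul, p_add, p_mul)
            for p_add, p_mul in product(range(4, 40), repeat=2)
            if worst_case_error(p_add, p_mul) <= ERROR_BUDGET]

print("cheapest (total bits, p_add, p_mul):", min(feasible))
```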

 

Cost Analysis

There were a couple of papers [Madhavan et al., Hoffmann et al.] I found particularly interesting on automated cost analysis.

 

Community differences

As an outsider, I had the opportunity to ponder some interesting cultural differences from which the various research communities I’m involved with could potentially learn.

Mentoring: One of the co-located workshops was PLMW, the Programming Languages Mentoring Workshop. This had talks ranging from “Time management, family, and quality of life” to “Machine learning and programming languages”, mixing introductory material with explicitly nontechnical content valuable to early career researchers. Co-locating with the major conference of the field makes it, I would imagine, relatively easy to pull in very high quality speakers.

Reproducibility: POPL, amongst other CS conferences, encourages the submission of artifacts. I am a fan of this process. While it doesn’t quite hit the holy grail of research reproducibility, it takes a definite step in this direction.

National Funding Formula for Schools: A Critique

England’s Department for Education released its long-awaited Phase 2 of the consultation on a national funding formula on 14th December 2016. I have been heavily involved in determining the funding formula for schools in one of the most diverse English local authorities, and so have some detailed thoughts on this process. As a prelude to my own response to the funding formula consultation, I thought it might be helpful to others to lay out my comments against the paragraphs of the Government’s consultation document as a “guide to reading”. I have focused on the areas I know best, which relate to funding arriving at schools, rather than funding that it is proposed will still be distributed to LAs, such as funding for growth, central school services, etc.

The DfE seems to be considering two quite distinct drivers for the decisions being proposed. Many decisions use LA formulae and averages between LAs to drive appropriate funding formulae. Elsewhere, clearly politically-driven approaches come through, such as the drive to increase the proportion of funding going into pupil-led factors. These have been presented in a jumbled fashion that makes it hard to understand the relative impact of these considerations. It would be a relatively straightforward mathematical task to set up and solve an optimization problem to minimize school funding turbulence when moving to a funding formula using these formula elements. It is disappointing that the DfE has not done this, to at least provide an element of transparency in the proposals, as deviation from any such minimal-turbulence formula should indicate the presence of evidence being used to drive a decision. Put plainly: changes to school funding should be either to even up funding between LAs or to achieve a certain outcome.
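
For concreteness, the kind of optimization I have in mind can be sketched in a few lines (illustrative only, with entirely made-up numbers and a crude least-squares notion of turbulence): choose non-negative national factor rates so that the formula budgets are as close as possible to schools' current budgets.

```python
# Illustrative sketch with made-up numbers: pick national factor rates r >= 0
# minimising the (least-squares) turbulence between formula budgets X @ r and
# schools' current budgets y. A real exercise would use per-pupil relative
# turbulence and the full set of formula factors.

import numpy as np
from scipy.optimize import nnls

# Columns: pupil count, FSM-eligible pupils, EAL pupils, lump-sum indicator.
X = np.array([
    [100.,  30., 10., 1.],   # small primary
    [420.,  80., 60., 1.],   # large primary
    [950., 150., 40., 1.],   # secondary
    [200.,  90., 70., 1.],   # small, deprived primary
])
y = np.array([520_000., 1_750_000., 4_100_000., 980_000.])   # current budgets (toy)

rates, _ = nnls(X, y)
print("per-pupil, FSM, EAL, lump-sum rates:", rates.round(0))
print("turbulence per school:", (X @ rates - y).round(0))
```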

I have chosen to blog here about the nuts and bolts, and to save a formal consultation response, or any overall conclusions, for a future post. I hope my fellow consultation readers and I can have a conversation about these points in the meantime.

As a result of this decision, the remainder of this post is fairly lengthy, and will only make sense if you read it alongside the DfE’s paper. Happy reading!

The Gory Details

1.12 and again in 2.20. This is flawed reasoning. The DfE is correct that if pupils tend to share deprivation (or any other) characteristics, then allocation of funding through these factors achieves the same end result as allocation through basic per-pupil funding. But this is true both in areas of uniformly high deprivation and in areas of uniformly low deprivation. As a result, the appropriate methodology for using LA formulae to determine the desirable size of the deprivation factor would be to look specifically at the formulae of LAs with wide variation in deprivation from school to school, giving a low weighting to the formulae of LAs with less varying deprivation, not to simply assume that deprivation funding needs to increase. (Which, incidentally, I am not against; I just want to see evidence before making decisions. Typically such evidence comes from boundary discontinuity studies between schools near the borders of LAs. We therefore have a once-in-a-generation opportunity to grasp the nettle and do this properly, before a national funding formula arrives and discontinuities – and hence evidence – disappear.)

1.16. The lump sum is a critically important factor in school funding, especially in areas with schools of widely varying size. The DfE claim that they “cannot see any clear patterns in the specific lump sum values.” Yet it is unclear what analysis has been conducted to discern a pattern. I would not expect any pattern to emerge from the analysis published, because no correlation is looked for between lump sum and school size variability. Nor can this be extracted from the published LA pro-forma summaries. The DfE does note a pattern in this paragraph that a majority of LAs set the same lump sum for secondaries as for primaries, but this could well be only because it was a requirement for the first year of the recent reforms to funding formulae!

2.7 – 2.9 and 2.51-2.56. It is very clear that the DfE has set the maximisation of funding allocated through pupil-led factors as an objective, as evidenced by the title of this section and the explicit references to the aim within the paragraphs. The claim in Paragraph 2.8 is that this is to ensure that “funding is matched transparently to need”. I do not believe this maximisation of funding through pupil-led factors is consistent with matching funding to need. If the Government truly wishes to be fair in its distribution of funding, then with similar school population characteristics, every school should receive the same disposable per pupil funding. Unless lump sums are set to reflect the genuine fixed costs of running a school then in practice the Government will be creating significant inequality of access to education by ensuring that pupils attending large schools attract a significantly greater disposable per pupil funding.

2.13. While I recognise the potential need for an increase in funding when moving from KS1/2 to KS3 and KS4, reception classes are also generally more expensive to run than KS1/2 classes due to the nature of the curriculum in R. By setting a single rate across the primary sector, the funding formula will differentially impact negatively on infant schools, where reception classes make up a greater proportion of the children.

2.16. The consultation document claims that “reception uplift” has “a very small impact on schools’ budgets.” I would like to see what evidence has been used to come to this conclusion. No doubt it has a very small impact on overall school budgets nationally, but I expect that for small schools it could have a considerable impact. Maintained schools have to wait for about 7 months before their census data results in funding changes; academies for nearly a year. In a school with 100 pupils, having 5 more pupils than expected should rightly result in a significant “reception uplift.”

2.21. No justification is given for the figure of 18% given for additional needs factors. The text implies that this goes beyond LA averages and is a result of a conscious Government decision to increase AEN funding – such a decision should be evidence based.

2.26. Some “magic numbers” appear here also: 5.4% for pupil-level deprivation (FSM/FSM6) versus 3.9% for area level (IDACI). These numbers appear to have been plucked out of the air. Presumably there is some statistical evidence to support these figures – it would have been useful to have this sent out with the consultation.

2.28. This is confused. The claim seems to be that Ever6 FSM rate should be higher at secondary schools than primary schools because (i) the overall primary:secondary ratio is less than 1 (so what?) and (ii) the Pupil Premium is the other way round. But the DfE also sets the pupil premium rate (and why are these two not combined anyway since they’re both Ever6 based?) It seems that those setting the Pupil Premium rate want to tug the ratio one way and those setting the funding formula want to pull it back the other way. Most odd.

2.33. The IDACI index is being used in a questionable way here. An IDACI index is a probability that a given child, chosen at random from a geographic area, lives in an income-deprived household. It is not a measure of the severity of deprivation. Thus I can see no justification for funding being allocated by IDACI score in anything other than a purely proportional way, e.g. a child living in an area with IDACI score 0.8 should surely attract twice the IDACI funding of a child living in an area with IDACI score 0.4. Yet looking at Figure 5, we can see that children in Band C (IDACI 0.35 to 0.4) attract the same funding as those in Band D (IDACI 0.3 to 0.35). This makes no sense to me. As an aside, the banding also makes very little sense – why classify pupils into bands if you already know the IDACI score of that pupil’s address: just use it directly, avoiding cliff edges of over/under-funding around the band’s boundaries.

2.34. In line with my comments on 2.21 and 2.26, the “magic number” here is even more alarming. The DfE have looked at how much LAs allocate to low prior attainment (4.3%) and decided to nearly double this to 7.5%. The only justification given for this radical shift is that KS2 attainment is a good predictor of attainment at secondary school. There are several holes in this argument. Firstly, what is “prior attainment”? For primary schools, this used to be EYFS points scores. Then it became whether a child achieved a Good Level of Development in EYFS. Now it is likely to be based on a totally different on-entry baseline assessment in Reception. None of these are comparable, and the baseline Reception assessments are highly questionable and under review at the moment. Secondly, for secondary schools prior attainment means KS2 results: the same KS2 results that changed so radically in 2016 that we do not know whether they are likely to be good predictors of secondary school performance. Thirdly, even if we ignore these serious methodological concerns, the correlation between low prior attainment and additional need (actually, SEN would be the better measure) is cause for a factor greater than zero; it is not, by itself, justification for doubling that factor. Perhaps it should be doubled, perhaps it shouldn’t. Why?

2.42. The move to use EAL3, i.e. funding attracted by children with English as an Additional Language for the first three years of their education, is an interesting one. Currently LA practice varies here. For a fixed pot of EAL funding, there is an argument to be had over whether children would benefit more from considerable funding in year 1, for intensive English tuition to allow them to access the curriculum, than from the same funding “smeared out” over three years at a lower level per year. Once again, it would be useful to see the research suggesting which approach actually reaps the greater benefit before mandating EAL3.

2.43. More magic numbers here: uplift from 0.9% to 1.2%. Why? Evidence?

2.52. This paragraph makes it clear that the proposal is explicitly to starve small schools of funding, by purposely under-funding the lump sum, in order to ensure that they “grow, form partnerships and find efficiencies.” Rather than starving schools of funds, it might be better to fund the lump sum properly while providing time-limited financial enticements for schools to merge where that is possible, as is currently the case.

2.53. There is a methodological error in this paragraph. They state that they looked for a correlation between average school size and lump sum size and found none. Nor should they expect to find one. Imagine LA1, with schools each of 100 pupils, and LA2, with schools each of 1000 pupils. There will be no difference in the allocation of funding between schools within these LAs no matter what lump sum value is used. However, if we now imagine LA3, where half the schools have 100 pupils and half have 1000 pupils, the impact of lump sum changes will be dramatic. So the correlation should be with the variation in school size, not with the average school size.
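
A quick toy calculation (my own figures, purely illustrative) makes the point: within an LA distributing a fixed pot, varying the lump sum changes nothing when all schools are the same size, and redistributes sharply when sizes vary.

```python
# Toy illustration: with a fixed LA pot split as lump sum + per-pupil rate,
# the lump sum only redistributes funding between schools if sizes vary.

import numpy as np

def per_pupil_funding(pupil_counts, lump_sum, pot):
    pupils = np.array(pupil_counts, dtype=float)
    appu = (pot - lump_sum * len(pupils)) / pupils.sum()   # rate that balances the pot
    budgets = lump_sum + appu * pupils
    return budgets / pupils

POT = 10_000_000.0
las = [("LA1 (all 100 pupils)",  [100] * 10),
       ("LA2 (all 1000 pupils)", [1000] * 10),
       ("LA3 (mixed sizes)",     [100] * 5 + [1000] * 5)]

for name, sizes in las:
    for lump in (0.0, 110_000.0):
        pp = per_pupil_funding(sizes, lump, POT)
        print(f"{name:22s} lump £{lump:>9,.0f}: per-pupil spread £{pp.max() - pp.min():,.0f}")
```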

2.57. A sparsity factor is only a sensible option given the choice to under-fund fixed costs in a lump sum. If these were properly funded, a sparsity factor would be unnecessary.

2.59. The detailed calculations for the functioning of the sparsity factor are omitted from the consultation document – instead a link is provided to another document. The functioning leaves a lot to be desired. For example, primary schools are eligible if they have an average of fewer than 21.4 children per year group and the average distance from its pupils to their next-nearest school is at least two miles. The first of these criteria is essentially an admission that schools with less than one form of entry are underfunded under the national funding formula. The second is more complex but equally serious, especially for small village schools sitting on the edges of towns. Imagine two schools separated by a little more than two miles. It may well be that between the two schools is an area of dense population, while following the line connecting the two schools out into the countryside leads to very sparsely populated areas. The distance for the children at the countryside end might be much more than two miles, yet the average will be less than two, and the school will not attract funding. If distance thresholds must be used, why base them on the average distance rather than on the number of pupils for whom the distance exceeds the threshold? Finally, these thresholds necessarily lead to unfairness across the two sides of the threshold. If the lump sum were set to a value reflecting the fixed costs of running a school, none of this tinkering would be necessary.

2.60. The steep tapering proposed for the primary sparsity factor is grossly unfair to schools with average year group sizes around 25 – they get none of the benefit compared to their colleagues with smaller classes, yet they see the full impact of an under-funded lump sum which can be safely ignored by large primaries.

2.61. Even if we accepted the sparsity factor, the maximum value of £25k for primaries on top of the £110k lump sum still under-represents the fixed costs of running a school. Meanwhile, the use of a greater sparsity value of £65k for secondaries seems inconsistent with the proposed simplification of using a single lump sum across all phases.

2.77 – 2.79. This part of the consultation, on the area cost adjustment, refers to a technical note that does not yet appear to have been published on the consultation website. I reserve judgement on this issue, noting that the devil is likely to be in the detail, and that any methodology for taking into account labour market costs needs to avoid cliff edges where schools on one side of an artificial geographical boundary benefit significantly compared to those on the other, an issue the national funding formula was supposed to address.

2.81-2.82. It is of course welcome that any reduction in school budgets is phased in over time so that schools are able to cope with “the pace […] of those reductions.” However, it is not very clear what this means in practice. What does it mean for a school to “cope” with its reduction in funding – does it mean a reduction in expenditure with negligible loss in educational outcomes, or with “acceptable” loss in educational outcomes? If the latter, what is acceptable? If the former, what evidence do we have that the current MFG of -1.5% per annum has negligible impact on educational outcomes?

2.83-2.85. It is far less clear that any kind of “floor” is an equitable way of smoothing change, indeed it goes against the very aim of an equal funding formula for all. Some schools will receive more funding simply because they historically did, and others will therefore receive less as a result, from any fixed education funding pot. If a floor is required in order not to damage school performance in the long run, this suggests that funding reductions implied by the national funding formula are simply unsustainable in those schools. Therefore instead of clamping maximum loss to 3%, the DfE should be asking why some schools lose more than 3% and whether this is justifiable for those schools. If not, the formula is at fault and should be changed for all schools, not just those below -3%.

2.86. By maintaining the floor as a per pupil funding guarantee, the Government could potentially introduce severe differentials between schools. In particular in areas of high existing lump sum where there are some small schools that grow to be of comparable size to large schools, the formerly small school would be very significantly over-funded compared to its neighbour, for no good reason.

3.11. The consultation states here that “we have considered carefully the potential impact on pupil attainment in schools likely to face reductions as a result of these reforms,” yet this analysis is not presented. Instead we are simply told that “we see excellent schools at all points of the funding spectrum,” which is no doubt true but fairly meaningless when it comes to assessing the educational impact. A good starting point would be to look at what correlation exists between disposable income per pupil, i.e. per pupil funding once realistic fixed costs are subtracted, and progress measures at the school.

Time is Precision

Most modern FPGA arithmetic designs use bit-parallel binary arithmetic – typically two’s complement for signed computation. This generally makes for fast arithmetic, but has the distinct disadvantage that silicon area scales with precision of computation. Occasionally – much more often in the distant past when FPGA area was a precious resource – people compute with bit serial binary arithmetic. In this case, time scales with precision.

One problem with digit-serial binary arithmetic for multiplication and division is that you need to know, in advance, the location of your least significant digit: although the computation unfolds digit by digit over time, the precision must be fixed a priori.

Milos Ercegovac’s online arithmetic forms an interesting counterpoint to this: in this arithmetic, digits are produced most-significant-digit-first. This suggests that the longer we compute, the more precise our answer will be – and we can terminate whenever we’ve got a good enough answer. Or run out of time. The problem with this approach is that the hardware arithmetic units generally implemented for digit-serial multiplication or division still scale with precision requirements, placing an unwanted a priori bound on computational precision.
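
The principle itself is easy to illustrate, even without the redundant digit sets that make Ercegovac's schemes efficient in hardware. The toy sketch below (my own, purely to show the most-significant-digit-first idea, not online arithmetic proper) emits decimal digits of a square root from the most significant end, so the computation can be stopped at any point with a valid truncated answer:

```python
# Toy most-significant-digit-first square root: each iteration emits one more
# decimal digit, so running longer simply yields more precision.

from fractions import Fraction

def msd_first_sqrt_digits(x, n_digits):
    """Yield decimal digits of sqrt(x), 0 <= x < 100, most significant first."""
    approx, scale = Fraction(0), Fraction(1)
    for _ in range(n_digits):
        # Largest digit d keeping (approx + d*scale)^2 <= x.
        d = max(k for k in range(10) if (approx + k * scale) ** 2 <= x)
        approx += d * scale
        scale /= 10
        yield d

print(list(msd_first_sqrt_digits(2, 8)))   # [1, 4, 1, 4, 2, 1, 3, 5]
```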

Enter my former BEng project student Aaron Zhao, who presented our paper “An Efficient Implementation of Online Arithmetic” at the IEEE International Conference on Field-Programmable Technology in Xi’an, China. Aaron’s contribution – for his BEng final year project – was to design a library of online arithmetic units whose logic area is constant with precision (of course, they still need more RAM for more precision.) This opens up a lot of practical possibilities for our research.

Precision (and energy) can scale elastically with time in FPGA-based compute.

Memory Consistency Models, and how to compare them automatically

John Wickerson explains our POPL 2017 paper on memory consistency


My POPL 2017 paper with Mark Batty, Tyler Sorensen, and George Constantinides is all about memory consistency models, and how we can use an automatic constraint solver to compare them. In this blog post, I will discuss:

  • What are memory consistency models, and why do we want to compare them?
  • Why is comparing memory consistency models difficult, and how do we get around those difficulties in our work?
  • What can we achieve, now that we can compare memory consistency models automatically?

What is a memory consistency model?

Modern computers achieve blistering performance by having their computational tasks divided between many ‘mini-computers’ that run simultaneously. These mini-computers all have access to a collection of shared memory locations, through which they can communicate with one another.

It is tempting to assume that when one of the mini-computers stores some data into one of these shared memory locations, that data will immediately become visible to any other mini-computer that subsequently loads from that location…


Rounding Error and Algebraic Geometry

We’re often faced with numerical computation expressed as arithmetic over the reals but implemented as finite precision arithmetic, for example over the floats. A natural question arises: how accurate is my calculated answer?

For many years, I’ve studied the closely related problem “how do I design a machine to implement a numerical computation most efficiently to a given level of accuracy?” for various definitions of “numerical computation”, “efficiently” and “accuracy”.

The latest incarnation of this work is particularly exciting, in my opinion. ACM Transactions on Mathematical Software has recently accepted a manuscript reporting joint work of my former post-doc, Victor Magron, Alastair Donaldson and myself.

The basic setting is borrowed from earlier work I did with my former PhD student David Boland. Imagine that you’re computing a + b in floating-point arithmetic. Under some technical assumptions, the answer you get will not actually be a + b, but rather (a + b)(1 + \delta) where |\delta| << 1 is bounded by a constant determined only by the precision of the arithmetic involved. The same goes for any other fundamental operator *, /, etc. Now imagine chaining a whole load of these operations together to get a computation. For straight-line code consisting only of addition and multiplication operators, the result will be polynomial in all your input variables as well as in these error variables \delta.
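
A small symbolic example (my own sketch in sympy, not the machinery from the paper) shows the shape of the thing for the expression (a + b) * c:

```python
# Each operation contributes its own relative error delta; the computed result
# is then a polynomial in the inputs and the deltas. Illustrative sketch only.

import sympy as sp

a, b, c = sp.symbols('a b c', real=True)
d1, d2 = sp.symbols('delta1 delta2', real=True)   # one rounding error per operation

exact = (a + b) * c
computed = ((a + b) * (1 + d1)) * c * (1 + d2)    # fl(fl(a + b) * c)

err = sp.expand(computed - exact)                 # polynomial in a, b, c, d1, d2
print(sp.factor(err))                             # c*(a + b)*(delta1*delta2 + delta1 + delta2)
```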

Once you know this, the problem of determining worst-case accuracy can be viewed as bounding a polynomial subject to constraints on its variables. There are various ways of bounding polynomials, but David and I were particularly keen to explore options that – while difficult to calculate – might lead to easily verifiable certificates. We ended up using Handelman representations, a particular result from real algebraic geometry.
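
As a toy example of the kind of certificate involved (far simpler than anything in the paper, and purely my own illustration): to certify that x(1-x) \leq 1 for x \in [0,1], write the constraints as g_1 = x \geq 0 and g_2 = 1 - x \geq 0 and note that 1 - x(1-x) = g_1^2 + g_1 g_2 + g_2^2, a combination of products of the constraint polynomials with non-negative coefficients. Checking such a certificate only requires expanding the right-hand side and comparing coefficients, which is exactly what makes this style of representation attractive when you want easily verifiable bounds.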

The manuscript with Magron and Donaldson extends the approach to a more general semialgebraic setting, using recently developed ideas based on hierarchies of semidefinite programming problems that converge to equally certifiable solutions. Specialisations are introduced to deal with some of the complexity and numerical problems that arise, through a particular formulation that exploits sparsity and the numerical properties of the problem at hand. This has produced an open source tool as well as the paper.

This work excites me because it blends some amazing results in real algebraic geometry, some hard problems in program analysis, and some very practical implications for efficient hardware design – especially for FPGA datapath. Apart from the practicalities, algebraic sets are beautiful objects.

Book Review: Complexity and Real Computation

Over several holiday and train journeys, I’ve been slowly reading and digesting this book by Blum, Cucker, Shub, and Smale. It’s one of a number of books I’ve read that are definitively not in my research field – hence holiday reading – and yet touches it tangentially enough to get me excited. I’ve been interested in computation defined in terms of real numbers ever since my own PhD thesis back in 1998-2001, which looked at ways to take a specific class of computation defined on the reals and synthesise hardware circuits operating in very efficient fixed-point arithmetic. While my own work has often involved specifying computation on the reals, including recently exploiting the differences between real and finite precision computation for circuits, this book takes a fundamental look at computation over the reals themselves. In particular, it looks at what happens to notions of computability and complexity if you consider real addition, multiplication, division, and subtraction as basic operators in a computation model: what is computable, what resulting complexity bounds can be proved? Ever since my first exposure to algebraic geometry, especially real algebraic geometry, I see it everywhere – and have even made some small forays [1,2] into harnessing its power for hardware design. It’s a delight to see computability and algebraic geometry coming together so neatly in this text.

Part I: Basic Development

The authors define a computational model that is very natural for analysis: a computation consists of a control-flow graph consisting of input nodes, computation nodes (polynomial or rational maps), branch nodes (checking for equality or inequality, depending on whether the ring / field being computed over is ordered), and output nodes. For the remainder of this review, I will consider only computation over the reals, though some of the results are more general. Theorems naturally flow from this setting. My favourites: the halting set of a machine is a countable union of semialgebraic sets; there are “small” systems of polynomial inequalities, of low degree, describing behaviour of a machine up to T time steps. As a by product, the first practical uses of this theory are illustrated, e.g. a proof that the Mandelbrot set is not decidable over {\mathbb R}. The interplay between the uncountable (program state) and the countable (program counter state) leads to some intricate and enjoyable analysis.

Chapter 3 generalises classical results for universal Turing machines, typically expressed in terms of unbounded tapes with binary storage at each location, to the case where an element of an arbitrary ring is stored at each location (imagine a real number stored at each location). Additional shift operations are introduced to deal with moving along the tape. It is shown that operation over unbounded sequences allows for uniformity of algorithm across dimensionality but does not add additional computational power for fixed dimensionality. In finite time we can only explore a finite portion of the tape, so we still retain the semi-algebraic (or countable union of semi-algebraic) sets for the halting sets derived in Chapter 2.

Chapter 4 extends the usual complexity-theoretic classes P and EXP to computation over a ring. Computation over {\mathbb Z}_2 corresponds to the classical case. There are no great surprises here for a reader who has read a basic text in computational complexity together with Chapters 1-3 of this book. The authors round off the chapter by using this new machinery to cast results from Goedel and Tarski in an interesting new light. For example, Goedel’s result becomes “there exist definable undecidable sets over {\mathbb Z}”, while Tarski’s becomes “every definable set over {\mathbb R} is decidable over {\mathbb R}”. Questions of definability are then put off until the very end of this book – and this review.

Chapter 5 shows how the classical NP, NP-complete and NP-hard definitions carry forward into computation over a general ring. Then, specialising the results to particular rings and particular costs of elementary operations (unit? dependent on the size of numbers?), the NP-completeness of various key problems is shown. In particular, it is shown that the classical 3-SAT NP-completeness result can be obtained by working over the ring {\mathbb Z}_2. The chapter ends with an interesting discussion of the P ?= NP question over various rings and with various costs. It is interesting, for example, that determining whether a degree-4 polynomial has a zero is NP-complete over {\mathbb R} while the same question for degree 3 is in P, analogous to the situation with 3-SAT versus 2-SAT in the classical setting.

Chapter 6 moves on to look specifically at machines over {\mathbb Z} with bit cost, i.e. the bigger the number, the more it costs to operate on it. It is shown that these machines are polynomially equivalent to those operating over {\mathbb Z}_2 with unit cost, i.e. anything you can write in terms of algebraic operations over {\mathbb Z} you can also write under the classical Turing model of computation, and vice versa, with only polynomial blow up in execution steps. This is not too surprising, and ultimately derives from the standard possibility of writing an integer as its binary expansion. The chapter then takes a detour via Farkas’s Lemma to show that linear programming feasibility (LPF) is in NP, a rather intricate result I first saw in Schrijver’s encyclopaedic book when working with my former PhD student Qiang Liu on discrete linear algebra for hardware memory optimization. It is noted that LPF’s status depends strongly on the choice of computation model and ring: in the (equivalent) classic setting of integers with bit cost it is NP-complete, over rationals with bit cost it is in P, while its status over {\mathbb R} is apparently open.

Part II: Some Geometry of Numerical Algorithms

This part of the book turns to look at specific algorithms such as Newton’s method and a homotopy method for nonlinear optimization. In each case, complexity results are given in terms of the number of basic real arithmetic operations required. There is some fun analysis and quite a clear exposition of homotopy methods, which I have worked with before in the context of convex optimization, especially barrier methods for linear and convex quadratic programming, when trying to design FPGA-based computer architectures for this purpose.

The chapter on Bézout’s Theorem assumed a rather more thorough grounding in differential topology than I have, and I wasn’t entirely sure how it sat with the rest of the work, as the constructive / algorithmic content of the theorems was less apparent to me.

The final chapters of this part of the book took me back to more familiar territory – first introduced to me by the outstandingly clear encyclopaedic textbook by Nick Higham – the discussion of condition numbers for linear equations and their relationship to the loss of precision induced by solving such systems (roughly, the relative error in the right hand side of Ax = b gets amplified by the condition number of the matrix A). What was interesting here is the authors’ probabilistic analysis – looking at the expected condition number if the matrix elements are iid Gaussian variables, for example. I have skimmed another book on this very topic, also co-authored by Cucker, and I’m waiting to have the time to read it. Condition for polynomial problems is covered, leading to complexity results for logarithmic barrier interior point methods for linear equations over \mathbb{Q} (spoiler: it’s in \mathbb{P}).
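
That rule of thumb is easy to see numerically; here is a quick sketch of my own (nothing to do with the book's probabilistic analysis):

```python
# Perturb only the right-hand side of Ax = b and compare the relative error in
# x with cond(A) times the relative error in b. Illustrative sketch only.

import numpy as np

n = 8
A = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])   # Hilbert matrix
x_true = np.ones(n)
b = A @ x_true

db = 1e-10 * np.random.default_rng(0).normal(size=n)    # tiny perturbation of b
x_pert = np.linalg.solve(A, b + db)

rel_b = np.linalg.norm(db) / np.linalg.norm(b)
rel_x = np.linalg.norm(x_pert - x_true) / np.linalg.norm(x_true)

print(f"cond(A)             = {np.linalg.cond(A):.2e}")
print(f"relative error in b = {rel_b:.2e}")
print(f"relative error in x = {rel_x:.2e}   (roughly bounded by cond(A) * rel. error in b)")
```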

Part III: Complexity Classes over the Reals

After Part II, which was only interesting to me in parts, I started to get excited again by Part III. Lower bounds on the computational complexity of various problems are derived here, based on some very interesting ideas. The basic scheme is to bound the number of connected components in a (semi-)algebraic set in terms of the degree of the polynomials and the number of variables involved. The number of connected components can then be used to bound the number of steps an (algebraic) machine takes to decide membership of that set.

In Chapter 17, the framework is extended to probabilistic machines, and the standard definition of BPP is extended to computation over an arbitrary ring, giving BPP_\mathbb{R}. The chapter then goes on to show that probabilistic machines over \mathbb{R} can be simulated by deterministic machines over \mathbb{R} without blowing up execution time, i.e. that P_\mathbb{R} = BPP_\mathbb{R}, whereas this is unknown for the classical setting. The essence of the argument makes use of a special real number (shown, non-constructively, to exist) that encodes all the different “coin flips” a probabilistic program makes during its execution, and which is then built into the simulating machine as a machine constant.

Parallelism – close to my heart – is covered in Chapter 18, through the introduction of complexity classes PL^k_\mathbb{R}, parallel polylogarithmic time, for sets that can be decided in time O(\log^k n) using a polynomial number of processors, and PAR_\mathbb{R}, parallel polynomial time, for sets that can be decided in polynomial time using an exponential number of processors (definitions following a SPMD computational model). Algorithms are derived for matrix inversion and product and a result is cited for parallel quantifier elimination over the reals.

Chapter 22 looks at the relative power of machines over \mathbb{Z}_2, i.e. the classical model, compared to real machines which take inputs restricted to be drawn from \{0,1\}. It’s fairly obvious that the latter setting can provide a lot of additional power, for example deciding membership of a set in \mathbb{Z}^\infty_2 can be done by encoding the characteristic function of the set as the digits of some real constant in the program. It is then shown that additive real machines (a restricted form of the machines discussed above, containing no multiplication or division) that additionally contain branching only on equality can solve exactly the same problems in polynomial time as in the classical model. This is unlike the case where branching can be over inequality, where some additional computational power is provided by this model.

Chapter 23, on descriptive complexity, is interesting: the descriptive power of (a particular) first order logic extended with fixed point and maximisation operations, and of existential second order logic, are shown to correspond to a particular subclass of P_\mathbb{R} problems and to EXP_\mathbb{R}, respectively. In the latter case, I think this essentially says that known results in the classical case carry forward to computation over the reals despite the move away from a finite ring. I am not familiar enough with the computational complexity literature to comment on how the former result compares with the classical setting, and I didn’t feel that this point was very well described in the book.

Conclusion

Overall, I found this book a little eclectic, especially Part II, reflective of the fact that it has clearly been put together drawing from a variety of different primary sources spanning several decades. This should not be seen as a criticism. The choice of material was wonderful for me – as holiday reading at least! – tangentially touching so many different topics I’ve seen during my career and drawing me in. In places, the development was over an arbitrary ring, in places over \mathbb{C} and in places over \mathbb{R}, and although for some sections it was absolutely clear why, for others it was less clear. Overall, though, I would very much recommend anyone who likes to keep a foot in both the continuous and discrete worlds to take a look at this book.

Book Reviews: Popular Science

A friend recently reintroduced me to the genre of popular science – specifically, popular physics – something I had left behind in my mid teens. I remember voraciously reading the books of John Gribbin as a teenager, sitting in bed in my parents’ house thinking about black holes and quantum physics. Since then I’ve been more focused on – and interested in – my particular field.

However, following prompting, I’ve recently read two popular science books: Seven Brief Lessons on Physics by Carlo Rovelli, and A Beautiful Question by Frank Wilczek. Both are good books, though very different. The former is a brief, almost poetically written waltz through relativity, quantum physics, and more recent unification efforts – it can easily be read in a single afternoon. The second is a more weighty tome, a very personal view from a Nobel Prize winner of beauty and its embodiment in the symmetries present in nature. The chapters of the latter varied in how much I enjoyed them, primarily because I felt some had too much information not to take a mathematical approach, yet this was not forthcoming. Meanwhile the former was a joy to read because its brevity skimmed the surface and left me wanting to dig further, specifically into ideas of loop quantum gravity and into what modern neuroscience may or may not be able to tell us about the emergence of notions of “self”.