GCSEs and A-Levels in 2021

I have collected my initial thoughts after reading the Ofqual consultation, released on 15th January 2021, on the replacements for this year's GCSE and A-Level exams. Alongside many others, I submitted proposals for 2020 which I felt would have avoided some of the worst outcomes we saw last summer. My hope is that, this year, some of the suggestions will be given greater weight.

The basic principle underlying the Ofqual consultation is that teachers will be asked to grade students, that they can use a range of different evidence sources to do so, and that exam boards will be asked to produce mini tests / exams as one such source of evidence. This is not unlike the approach used in Key Stage 1 assessments (“SATs”) in primary schools in recent years. The actual process to be used to come up with a summary grade based on various sources of information is not being consulted over now, and it appears this will come from exam boards in guidance issued to teachers at some undetermined time in the future. This is a significant concern, as the devil really will be in the detail.

Overall, I am concerned that the proposed process is complex and ill-defined. There is scope to produce considerable workload for the education sector while still delivering a lack of comparability between centres / schools. I outline my concerns in more detail below.

Exam Board Papers – What are They For?

Ofqual is proposing that exam boards provide teachers with papers (‘mini exams’) to “support consistency within and between schools and colleges” and that they “could also help with appeals”. However, it is very unclear how these papers will achieve these objectives. Papers might be sat at school or at home (p.18), and they might or might not be supervised. Teachers might be asked to ‘remotely supervise’ these tests (p.18). These choices could vary on a per-pupil basis. The taking of tests may even be optional, and certainly teachers “should have some choice” over which questions are answered by their students. Grades will not be determined by these papers, so at best they will form one piece of evidence. If consistency is challenged, will the grades on these papers (when combined in some, as yet undetermined way) overrule other sources of information? This could be a cause of some confusion and needs significant clarity. If undue weight is placed on these papers, the scope for results to lack comparability between centres is significant, and I am left wondering whether the additional workload for teachers and exam boards required to implement this proposal is really worth it.

If tests are to be taken (there are good reasons to suggest that they may be a bad idea in their currently-envisaged form – see below), then I agree with Ofqual that – in principle – the ideal place to take them is in school (p.18). However, it is absolutely essential that school leaders do not end up feeling pressured to open to all students in an unsafe environment because of the need for these tests. This is a basic principle, and I would resist any move to place further pressure on school leaders to fully open their schools until it is safe to do so.

Quality Assurance by Exam Boards

The main mechanism being proposed to ensure comparability and fairness between two centres / schools is random sampling (p.20-21). The exam board will sample the evidence base of a particular school for a particular subject, and query this with the school if they feel there is inadequate evidence to support the grades (it is not clear in the consultation whether the sampling will be of all pupils or individual pupils at that centre). This is a reasonable methodology for that particular subject / centre / student, but there is a major piece of information missing to enable judgement of whether this is sufficient for quality assurance of the system as a whole: what proportion of student grades will be sampled in this way? My concern is that the resources available to exam boards will be too small for this to be a large enough sample and that therefore the vast majority of grades awarded will be effectively unmoderated. This approach appears to be motivated by avoiding the bungled attempt at algorithmic moderation proposed in 2020, but without adequate resourcing, comparability between centres is not guaranteed to be better than it was under the abandoned 2020 scheme, and may even be worse.

Moreover, the bar for changing school grades appears to be set very high: “where robust investigation indicates that guidance has not been followed, or malpractice is found” (p.21), so I suspect we are heading towards a system of largely unmoderated centre-assessed grades. In 2020, centres were not aware at the point of returning CAGs that these would end up being given in unmoderated form, and therefore many centres appear to have been cautious when awarding high grades. Will this still be the case in 2021?

Curriculum Coverage

It is acknowledged throughout the consultation that centres / schools will have been unable to cover the entire curriculum in many cases. There appear to be two distinct issues to be dealt with here:

A. How to assess a subject with incomplete coverage

There are many ways this could be done. For the sake of argument, consider this question in the simplest setting of an exam. Here, the most direct approach would be simply to assess the entire curriculum, acknowledging that many more students would be unable to answer all questions this year, but re-adjusting grade boundaries to compensate. This may not be the best approach for student wellbeing, however, and in any case the proposal to use non-controlled assessment methods opens up much more flexibility. My concern is that flexibility almost always comes at the cost of comparability.

Ofqual are proposing that teachers have the ability to differentially weight different forms of assessment (e.g. practicals in the sciences). It is unclear in the consultation whether this is on a per-student or on a per-centre basis – either brings challenges to fairness and transparency, and this point needs to be clarified quite urgently. They are also effectively proposing that teachers can give zero weight to some elements of the curriculum by choosing not to set / use assessments based on these elements. It is as yet undecided whether past work and tests can be used, or only work produced from now on, once students are aware it can be used for these purposes. The consultation is opaque about how these various partial assessments are to be combined. One approach I would not like to see is a weighted average of the various pieces of evidence available. A more robust approach, and one which may overcome some objections to using prior work, may be to allow teachers to select a number of the highest-graded pieces of work produced to date – a ‘curated portfolio’ approach. This may mitigate both incomplete curriculum coverage and different student attitudes to summatively-assessed work versus standard class / homework.
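To make the contrast between a weighted average and a ‘curated portfolio’ concrete, here is a small Python sketch. Everything specific in it is my own assumption rather than anything in the consultation: work is scored as a percentage, the portfolio grade is taken as the mean of the best `k` pieces, and the marks are invented for illustration.

```python
def weighted_average(scores, weights):
    """Weighted mean of all available pieces of evidence."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def curated_portfolio(scores, k):
    """Mean of the k highest-scoring pieces of work (assumed combination rule)."""
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of pieces of work")
    return sum(sorted(scores, reverse=True)[:k]) / k


# Hypothetical percentage marks for one student; the 30 is on a topic
# the class never had the chance to cover properly.
scores = [72, 68, 75, 30, 70]
print(weighted_average(scores, [1] * 5))  # 63.0
print(curated_portfolio(scores, k=3))     # about 72.3
```

The single weak piece drags the average down by roughly nine marks, while the portfolio simply leaves it out. That is the sense in which the portfolio approach is less sensitive to patchy curriculum coverage.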

B. How to ensure fairness

The consultation acknowledges that students in different parts of the country may have covered different amounts of the curriculum, due to local COVID restrictions. There is an unavoidable tension, therefore, between ‘assessment as a measure of what you can do’ and ‘assessment as a measure of what you can do, under the circumstances you were in’. This tension will not go away, and the Government needs to pick an option as a political decision. Some forms of assessment may mitigate this problem, to a degree, such as the ‘curated portfolio’ proposal made above, but none will solve it.

Appeals

It is proposed that students are able to appeal to the exam board only ‘on the grounds that the school or college had not acted in line with the exam board’s procedural requirements’ (p.23). I am rather unclear how students are supposed to obtain information about the procedures followed at the school / college, so this sets a very high bar for appeals to the board. Meanwhile, the procedure for appeal to the school (p.23) appears to have a very low bar, and thus could involve a significant extra workload for school staff. There is some suggestion that schools could be allowed to engage staff from other schools to handle marking appeals. If adequately financially resourced, Ofqual may wish to make this mandatory, to avoid conflicts of interest.

It is unclear in the consultation whether students will be able to appeal on the basis of an unfair weighting being applied to different elements of the curriculum (p.14). This could add an additional layer of complexity.

Grade boundaries have always been problematic. Can we really say that a student one mark either side of a Grade A boundary is that different in attainment? Last year, a bungled attempt was made to address this concern by requiring submission of student rankings within grade boundaries. Centre-Assessed Grades (CAGs) last year were optimistic, but this should come as no surprise – given a candidate I believe has a 50/50 chance of either getting an A or a B, why on earth would I choose a B? This issue will persist under the proposals for 2021, and I believe may be amplified by the knowledge that an algorithmic standardisation process will not be used. I suspect we may see even more complaints about ‘grade inflation’ in 2021, with significant knock-on effects for university admissions and funding. The root cause of this problem appears to be the aim to maintain the illusion of comparability between years for GCSE and A-Level results.

There are very significant workload implications for teachers, for school leaders, and for exam boards in these proposals, far more so than in the 2020 arrangements. This workload has explicitly not yet been quantified in the consultation. I believe it needs to be quantified and funded: centres should receive additional funding to support this work, and teachers need to be guaranteed additional non-contact time to undertake the considerable additional work being requested of them.

Private Candidates

Private candidates, such as home-educated students, got a very poor deal last year. This must not be repeated, especially since many of the students who would be taking GCSEs and A-Levels this year are exactly the same home-educated students who decided to postpone for one year as a result of the changes last year. I am concerned to ensure comparability of outcomes between private candidates and centre-based candidates, and I am worried that two of the four proposed mechanisms for private candidates essentially propose a completely different form of qualification for these candidates.

Are 2021 (and 2020) qualifications actually GCSEs and A-Levels?

Labelling the qualifications of 2021 as GCSEs / A-Levels, rather than giving them a different title, makes an implicit statement of comparability between grades awarded in 2021 and those in previous years, which is rather questionable. Others made the point that in 2020 it may have been better to label these qualifications differently, and the same argument applies in 2021. Even Ofqual implicitly make this point (p.27) when arguing that letting overseas candidates take exams as normal “might give rise to comments that there were 2 types of grades awarded”. The reality is that there will be at least three types of grades awarded in recent years: pre-2020, 2020, and 2021. Is it time to face up to this and avoid the pretence of comparability between these different systems?

Equality Considerations

Ofqual seem to believe that if exam boards publish the papers / tests / mini-exams ‘shortly before’ they are taken then that will avoid leaking information but won’t put some students at a disadvantage because ‘students would not know which one(s) they would be required to complete’. I can envisage a situation where some students try to prepare for all published papers the moment they are released online, potentially a much greater number of papers than they will be required to sit, leading to considerable stress and anxiety, with potential equalities implications.

From the consultation, it is not clear how exam board sampling will work, but there is an opportunity to bias the sampling process to help detect and correct for unconscious bias, if equalities information is available to exam boards. This could be considered.

Conclusion

On p.29, Ofqual state that ‘The usual assurances of comparability between years, between individual students, between schools and colleges and between exam boards will not be possible.’ This is not inspiring of confidence, but is honest. The question is how we can mitigate these impacts as far as possible. I hope Ofqual will listen carefully to the suggestions for 2021, and publish the approach taken in plenty of time. Releasing the algorithm used in 2020 on the day of A-level result release was unacceptable, and I hope Ofqual have learnt from this experience.

A-Levels and GCSEs in 2020

This week was A-Level results day. It was also the day that Ofqual published its long-awaited standardisation algorithm. Full details can be found in the 319-page report. In this blog post, I’ve set down my initial thoughts after reading the report.

Prelude

I would like to begin by saying that Ofqual was not given an easy task: produce a system to devise A-level and GCSE grades without exams or coursework. Reading the report, it is clear that they worked hard to do the best they could within the constraints under which they operate, and I respect that work. Nevertheless, I have several concerns to share.

Concerns

1. Accounting for Prior Attainment

The model corrects for differences between the prior attainment of historical cohorts and that of the 2020 cohort in the following way (after first taking into account any learners without prior attainment measures). For any particular grade, the proportion to be awarded is equal to the historical proportion at that grade adjusted by a term referred to in the report as $q_{kj} - p_{kj}$. (See p.92-93 of the report, which incidentally has a typo here: $c_k$ should read $c_{kj}$.) As noted by the Fischer Family Trust, it appears that this adjustment is based solely on national differences in value added, and this could cause a problem. Illustrating this requires an artificial example. Imagine that Centre A has a historical transition matrix in which all of its 200 students have walked away with A*s in this subject in recent years, whether they were in the first or second GCSE decile (and half were in each). Well done Centre A!

Meanwhile, let’s say the national transition matrix is far less generous: 90% of first-decile students and only 10% of second-decile students achieve an A* (the rest of the matrix does not matter for this example).

Let’s now look at 2020 outcomes. Assume that this year, Centre A has an unusual cohort: all students were second decile in prior attainment. It seems natural to expect that it would still get mainly A*s, consistent with its prior performance, but this is not the outcome of the model. Instead, its historical distribution of 100% A*s is adjusted downwards because of the national transition matrix. The proportion of A*s at Centre A will be reduced by 40 percentage points, so now only 60% of its students will get A*s! This happens because the national transition matrix expects a 50/50 split of Decile 1 and Decile 2 students to end up with 50% A* and a Decile 2-only cohort to end up with 10% A*, resulting in a downgrade of 40 percentage points.
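The Centre A arithmetic can be reproduced with a short Python sketch. It is a simplification under my own assumptions, not the report's full formula: only the A* grade is modelled, only two deciles exist, and every student is assumed to have matched prior attainment (as I read p.92-93, the report also weights the adjustment by the matched proportion, which is then 100%). I take $p$ and $q$ to be the A* proportions predicted by the national matrix for the historical and 2020 cohorts respectively. The national A* rates of 90% and 10% follow from the figures above: a 50/50 mix giving 50% and a second-decile-only cohort giving 10%.

```python
# National A* rates by prior-attainment decile, as implied by the example.
NATIONAL_A_STAR_RATE = {"decile1": 0.9, "decile2": 0.1}


def predicted_rate(cohort_mix):
    """National-matrix prediction of the A* proportion for a cohort.

    cohort_mix maps decile -> fraction of the cohort in that decile.
    """
    return sum(share * NATIONAL_A_STAR_RATE[d] for d, share in cohort_mix.items())


def adjusted_proportion(historical_rate, historical_mix, current_mix):
    """Historical proportion c plus the adjustment term q - p (simplified)."""
    p = predicted_rate(historical_mix)  # prediction for the historical cohort
    q = predicted_rate(current_mix)     # prediction for the 2020 cohort
    return historical_rate + (q - p)


centre_a_2020 = adjusted_proportion(
    historical_rate=1.0,                               # 100% A* historically
    historical_mix={"decile1": 0.5, "decile2": 0.5},
    current_mix={"decile2": 1.0},                      # 2020: all second decile
)
print(f"{centre_a_2020:.2f}")  # 0.60
```

Note that Centre A's own decile-by-decile track record never enters `p` or `q`; only the national rates do, which is exactly the concern.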

2. Model accuracy

Amongst the various possible standardisation options, Ofqual evaluated accuracy by trying to predict 2019 exam grades and seeing how well the predictions matched the grades actually awarded. This immediately presents a problem: no rank orders were submitted for 2019 students, so how is this possible? The answer provided is “the actual rank order within the centre based on the marks achieved in 2019 were used as a replacement”, i.e. they derived rank orders from the 2019 marks after the fact. This only provides a reasonable idea of accuracy if we assume that teacher-submitted rank orders in 2020 would exactly correspond to the mark order of their pupils, as noted by Guy Nason. Of course this will not be the case, so the accuracy estimates in the Ofqual report are likely to be significant overestimates. And they’re already not great, even under a perfect-ranking assumption: Ofqual report that only 12 out of 22 GCSE subjects were accurate to within one grade, with some subjects having only 40% accuracy in terms of predicting the attained grade. One is left wondering what the accuracy might actually be for 2020 once rank-order uncertainty is taken into account.

There may also be a systematic variation in the accuracy of the model across different grades, but this is obscured by using the probability of successful classification across any grade as the primary measure of accuracy. Graphs presented in the Ofqual report suggest, for example, that the models are far less accurate at Grade 4 than at Grade 7 in GCSE English.
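To illustrate how one overall accuracy figure can hide large differences between grades, here is a minimal Python sketch. The (predicted, actual) grade pairs are invented for illustration only; they are not Ofqual's data.

```python
from collections import Counter


def overall_accuracy(pairs):
    """Fraction of (predicted, actual) pairs where the prediction is exact."""
    return sum(p == a for p, a in pairs) / len(pairs)


def accuracy_by_grade(pairs):
    """Exact-match accuracy, broken down by the grade actually attained."""
    totals = Counter(a for _, a in pairs)
    hits = Counter(a for p, a in pairs if p == a)
    return {g: hits[g] / totals[g] for g in sorted(totals)}


# Hypothetical GCSE results: ten students who attained a 7, ten who attained a 4.
pairs = [(7, 7)] * 9 + [(6, 7)] + [(4, 4)] * 4 + [(5, 4)] * 3 + [(3, 4)] * 3

print(overall_accuracy(pairs))   # 0.65
print(accuracy_by_grade(pairs))  # {4: 0.4, 7: 0.9}
```

A headline figure of 65% sits between a 90% hit rate at Grade 7 and a 40% hit rate at Grade 4, so reporting only the pooled number obscures where the model is weakest.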

3. When is a large cohort a large cohort?

A large cohort, and therefore one for which teacher-assessed grades are not used at all, is defined in the algorithm to be one with at least 15 students. But how do we count these 15 students? The current cohort, the historic cohort, or something else? The answer is given in Ofqual’s report: the harmonic mean of the two. As an extreme example, a centre cohort can be considered “large” with only 8 pupils this year, so long as it had at least 120 in the recent past. It seems remarkable that a centre could have fewer pupils than there are GCSE grades and still be “large”!
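The harmonic-mean test is easy to check with a Python sketch using exact fractions. I model only the threshold of 15 described above and assume the historic cohort is summarised as a single per-year size; how the report averages historic years is not reproduced here.

```python
from fractions import Fraction

LARGE_COHORT_THRESHOLD = 15  # threshold described above


def effective_cohort_size(current, historic):
    """Harmonic mean of the current and historic cohort sizes."""
    return Fraction(2 * current * historic, current + historic)


def is_large(current, historic):
    return effective_cohort_size(current, historic) >= LARGE_COHORT_THRESHOLD


print(effective_cohort_size(8, 120))  # 15
print(is_large(8, 120))               # True
print(is_large(8, 119))               # False
```

Solving $2 \cdot 8h / (8 + h) \geq 15$ gives $h \geq 120$, which is where the figure of 120 comes from.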

4. Imputed marks fill grade ranges

As the penultimate step in the Ofqual algorithm, “imputed marks” are calculated for each student – a kind of proxy mark equally spaced between grade end-points. So, for example, if Centre B only has one student heading for a Grade C at this stage then – by definition – it’s a mid-C. If they had two Grade C students, they’d be equally spaced across the “C spectrum”. This means that in the next step of the algorithm, cut-score setting, these students are vulnerable to changing grades. For centres which tend to fill the full grade range anyway, this may not be an issue. But I worry that we may see some big changes at the edges of centre distributions as a result of this quirk.
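A Python sketch of the spacing rule shows why lone or paired students end up near the edges of a grade. The exact spacing formula is my assumption (the $i$-th of $n$ students placed a fraction $i/(n+1)$ of the way through the grade), and the mark scale and cut-score are hypothetical; the report's formula may differ in detail.

```python
def imputed_marks(n_students, grade_low, grade_high):
    """Place n students at equally spaced points strictly inside a grade range.

    Assumed spacing: the i-th student sits a fraction i / (n + 1) of the way
    from grade_low to grade_high.
    """
    width = grade_high - grade_low
    return [grade_low + width * i / (n_students + 1) for i in range(1, n_students + 1)]


# Hypothetical grade C band on a hypothetical imputed-mark scale.
C_LOW, C_HIGH = 50.0, 60.0
print(imputed_marks(1, C_LOW, C_HIGH))  # [55.0]: a lone student is a mid-C
two = imputed_marks(2, C_LOW, C_HIGH)   # roughly 53.3 and 56.7

# If the national cut-score for a B then falls at, say, 56.0 (hypothetical),
# the upper student crosses it purely because of the spacing rule.
print([mark >= 56.0 for mark in two])   # [False, True]
```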

5. No uncertainty quantification

What Must Happen Now?

I think everyone can agree that centres need to immediately receive all the intermediate steps in the calculations of their grades. Many examinations officers are currently scratching their heads, after having received only a small part of this information. The basic principle must be that centres are able to recalculate their grades from first principles if they want to. This additional information should include the proportion of pupils in both historical and current cohorts with matched prior attainment data for each subject and which decile each student falls into, the national transition matrices used for each subject, the values of $q_{kj}$ and $p_{kj}$ for each subject / grade combination, the imputed marks for each 2020 student, and the national imputed mark cut-points for each grade boundary in each subject.

At a political level, serious consideration should now be given to awarding teacher-assessed grades (CAGs) this year. While I was initially supportive of a standardisation approach – and I support the principles of Ofqual’s “meso-standardisation” – I fear that problems with the current standardisation algorithm are damaging rather than preserving public perception of A-Level grades. We may have now reached the point that the disadvantages of sticking to the current system are worse than the disadvantages of simply accepting CAGs for A-Levels.

Ofqual states in their report that “A key motivation for the design of the approach to standardisation [was] as far as possible [to] ensure that a grade represents the same standard, irrespective of the school or college they attended.” Unfortunately, my view is that this has not been achieved by the Ofqual algorithm. However, despite my concerns over Ofqual’s algorithm, it is also questionable whether any methodology meeting this objective could be implemented in time under a competitive education system culture driven by high-stakes accountability systems. Something to think about for our post-COVID world.

Some Notes on Metric Spaces

This post contains some summary informal notes of key ideas from my reading of Mícheál Ó Searcóid’s Metric Spaces (Springer, 2007). These notes are here as a reference for me, my students, and any others who may be interested. They are by no means exhaustive, but rather cover topics that seemed interesting to me on first reading. By way of a brief book review, it’s worth noting that Ó Searcóid’s approach is excellent for learning a subject. He has a few useful tricks up his sleeve, in particular:

• Chapters will often start with a theorem proving equivalence of various statements (e.g. Theorem 8.1.1, Criteria for Continuity at a Point). Only then will he choose one of these statements as a definition, and he explains this choice carefully, often via reference to other mathematics.
• The usual definition-theorem-proof style is supplemented with ‘questions’ – these are relatively informally-stated questions and their answers. They have been carefully chosen to highlight some questions the reader might be wondering about at that point in the text and to demonstrate key (and sometimes surprising) answers before the formal theorem statement.
• The writing is pleasant, even playful at times though never lacking formality. This is a neat trick to pull off.
• There are plenty of exercises, and solutions are provided.

These features combine to produce an excellent learning experience.

1. Some Basic Definitions

A metric on a set $X$ is a function $d : X \times X \to {\mathbb R}$ such that, for all $a, b, c \in X$:

• Positivity: $d(a,b) \geq 0$ with equality iff $a = b$
• Symmetry: $d(a,b) = d(b,a)$
• Triangle inequality: $d(a,b) \leq d(a,c) + d(c,b)$

The combination of such a metric and the corresponding set is a metric space.
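The three axioms can be checked mechanically on a finite set. This Python sketch does so by brute force over all pairs and triples; the candidate functions are my own illustrative choices. Squared difference is a useful non-example: it is positive and symmetric but breaks the triangle inequality.

```python
from itertools import product


def is_metric(points, d, tol=1e-12):
    """Check positivity, symmetry and the triangle inequality on a finite set."""
    for a, b in product(points, repeat=2):
        # Positivity: d(a, b) >= 0, with equality exactly when a == b.
        if d(a, b) < 0 or (d(a, b) == 0) != (a == b):
            return False
        # Symmetry.
        if abs(d(a, b) - d(b, a)) > tol:
            return False
    # Triangle inequality over every triple.
    return all(
        d(a, b) <= d(a, c) + d(c, b) + tol
        for a, b, c in product(points, repeat=3)
    )


def absolute(a, b):
    return abs(a - b)


def discrete(a, b):
    return 0 if a == b else 1


def squared(a, b):
    return (a - b) ** 2


points = [0, 1, 2, -3]
print(is_metric(points, absolute), is_metric(points, discrete), is_metric(points, squared))
# True True False  (squared fails: d(0, 2) = 4 > d(0, 1) + d(1, 2) = 2)
```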

Given a metric space $(X,d)$, the point function at $z$ is $\delta_z : x \mapsto d(z,x)$.

A pointlike function $u : X \to {\mathbb R}^\oplus$ is one where $u(a) - u(b) \leq d(a,b) \leq u(a) + u(b)$ for all $a, b \in X$.
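Every point function is pointlike: the left inequality is the reverse triangle inequality and the right one is the triangle inequality through $z$. This Python sketch tests that on random points of the Euclidean plane, with the zero function as a contrasting non-example; the choice of space, $z$ and sample is mine.

```python
import math
import random


def is_pointlike(u, d, sample, tol=1e-12):
    """Test u >= 0 and u(a) - u(b) <= d(a, b) <= u(a) + u(b) on all sample pairs."""
    return all(
        u(a) >= 0
        and u(a) - u(b) <= d(a, b) + tol
        and d(a, b) <= u(a) + u(b) + tol
        for a in sample
        for b in sample
    )


z = (1.0, 2.0)


def delta_z(x):
    """The point function at z for the Euclidean metric."""
    return math.dist(z, x)


def zero(x):
    return 0.0


rng = random.Random(0)
sample = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(50)]
print(is_pointlike(delta_z, math.dist, sample))  # True
print(is_pointlike(zero, math.dist, sample))     # False
```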

For metric spaces $(X,d)$ and $(Y,e)$, $X$ is a metric subspace of $Y$ iff $X \subseteq Y$ and $d$ is a restriction of $e$.

For metric spaces $(X,d)$ and $(Y,e)$, an isometry $\phi : X \to Y$ is a function such that $e(\phi(a),\phi(b)) = d(a,b)$. The metric subspace $(\phi(X),e)$ is an isometric copy of $(X,d)$.

Some standard constructions of metrics on a product $X_1 \times \dots \times X_n$ of metric spaces $(X_i, \tau_i)$:

1. $\mu_1 : (a,b) \mapsto \sum_{i=1}^n \tau_i(a_i,b_i)$
2. $\mu_2 : (a,b) \mapsto \sqrt{\sum_{i=1}^n \left(\tau_i(a_i,b_i)\right)^2}$
3. $\mu_\infty : (a,b) \mapsto \max\left\{\tau_i(a_i,b_i) | i \in \{1, \dots, n\}\right\}$

A conserving metric $e$ on a product space is one where $\mu_\infty(a,b) \leq e(a,b) \leq \mu_1(a,b)$. Ó Searcóid calls these conserving metrics because they conserve an isometric copy of the individual spaces, recoverable by projection (I don’t think this is a commonly used term). This can be seen because fixing elements of all-but-one of the constituent spaces makes the upper and lower bound coincide, resulting in recovery of the original metric.
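Here is a Python sketch of the three product metrics on ${\mathbb R}^3$, assuming each $\tau_i$ is the usual $|x - y|$ on ${\mathbb R}$. It shows the conserving sandwich $\mu_\infty \leq \mu_2 \leq \mu_1$ and how all three collapse to the same value when two points differ in a single coordinate.

```python
import math


def mu_1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def mu_2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mu_inf(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


a, b = (0.0, 0.0, 0.0), (3.0, -4.0, 12.0)
print(mu_inf(a, b), mu_2(a, b), mu_1(a, b))  # 12.0 13.0 19.0

# Points differing in only one coordinate: the bounds coincide, so each
# factor space survives as an isometric copy.
c = (0.0, 5.0, 0.0)
print(mu_inf(a, c), mu_2(a, c), mu_1(a, c))  # 5.0 5.0 5.0
```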

A norm on a linear space $V$ over ${\mathbb R}$ or ${\mathbb C}$ is a real function such that for $x, y \in V$ and $\alpha$ scalar:

• $||x|| \geq 0$ with equality iff $x = 0$
• $||\alpha x|| = |\alpha|\; ||x||$
• $||x + y|| \leq ||x|| + ||y||$

The metric defined by the norm is $d(a,b) = ||a - b||$.

2. Distances

The diameter of a set $A \subseteq X$ of metric space $(X,d)$ is $\text{diam}(A) = \sup\{d(r,s) | r, s \in A\}$.

The distance of a point $x \in X$ from a set $A \subseteq X$ is $\text{dist}(x, A) = \inf\{ d(x,a) | a \in A\}$.

An isolated point $z \in S$ where $S \subseteq X$ is one for which $\text{dist}(z, S \setminus \{z\}) \neq 0$.

An accumulation point or limit point $z \in X$ of $S \subseteq X$ is one for which $\text{dist}(z, S \setminus \{z\}) = 0$. Note that $z$ doesn’t need to be in $S$. A good example is $z = 0$, $X = {\mathbb R}$, $S = \{1/n | n \in {\mathbb N}\}$.

The distance from subset $A$ to subset $B$ of a metric space is defined as $\text{dist}(A,B) = \inf\{ d(a,b) | a \in A, b \in B\}$.
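For finite nonempty sets the suprema and infima above become maxima and minima, so diameter and the two distances can be computed directly. This Python sketch uses the Euclidean plane; the sets are my own examples. The last loop looks at truncations of $S = \{1/n\}$, whose distance from $0$ keeps shrinking; for the infinite set the infimum $0$ is never attained.

```python
import math
from itertools import product


def diam(A, d):
    """Diameter of a finite nonempty set (the sup is a max here)."""
    return max(d(r, s) for r, s in product(A, repeat=2))


def dist(x, A, d):
    """Distance from point x to a finite nonempty set A (the inf is a min here)."""
    return min(d(x, a) for a in A)


def set_dist(A, B, d):
    """Distance between finite nonempty sets A and B."""
    return min(d(a, b) for a in A for b in B)


A = [(0.0, 0.0), (3.0, 4.0)]
B = [(6.0, 8.0), (10.0, 0.0)]
print(diam(A, math.dist), dist((6.0, 8.0), A, math.dist), set_dist(A, B, math.dist))
# 5.0 5.0 5.0

# 0 is an accumulation point of S = {1/n}: finite truncations get ever closer.
for N in (10, 100, 1000):
    S = [1 / n for n in range(1, N + 1)]
    print(N, dist(0.0, S, lambda x, y: abs(x - y)))
```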

A nearest point $s \in S$ of $S$ to $z \in X$ is one for which $d(z,s) = \text{dist}(z,S)$. Note that nearest points don’t need to exist, because $\text{dist}$ is defined via the infimum. If a metric space is empty or admits a nearest point to each point in every metric superspace, it is said to have the nearest-point property.

3. Boundaries

A point $a$ is a boundary point of $S$ in $X$ iff $\text{dist}(a,S) = \text{dist}(a,S^c) = 0$. The collection of these points is the boundary $\partial S$.

Metric spaces with no proper non-trivial subset with empty boundary are connected. An example of a disconnected metric space is $X = (0,1) \cup (7,8)$ as a metric subspace of ${\mathbb R}$, while ${\mathbb R}$ itself is certainly connected.

Closed sets are those that contain their boundary.

The closure of $S$ in $X$ is $\bar{S} \triangleq S \cup \partial S$. The interior is $S \setminus \partial S$. The exterior is $(\bar{S})^c$.

Interior, boundary, and exterior are mutually disjoint and their union is $X$.

4. Sub- and super-spaces

A subset $S \subseteq X$ is dense in $X$ iff $\bar{S} = X$, or equivalently if for every $x \in X$, $\text{dist}(x,S) = 0$. The archetypal example is that $\mathbb{Q}$ is dense in $\mathbb{R}$.

A complete metric space $X$ is one that is closed in every metric superspace of $X$. An example is $\mathbb{R}$.

5. Balls

Let $b(a;r) = \{ x \in X | d(a,x) < r \}$ denote an open ball and similarly $b[a;r] = \{ x \in X | d(a,x) \leq r \}$ denote a closed ball. In the special case of normed linear spaces, $b(a;r) = a + rb(0;1)$ and similarly for closed balls, so the important object is the unit ball: all others have the same shape. In fact a norm on a space $V$ is determined by its open unit ball $U$, and a set $U \subseteq V$ is the open unit ball of some norm iff it has the following three properties:

• Convex
• Balanced (i.e. $x \in U \Rightarrow -x \in U$)
• For each $x \in V \setminus \{0\}$, the set $\{ t \in \mathbb{R}^+ | t x \in U \}$ is nonempty and has a real supremum $s$, with $sx \notin U$
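The third property is what lets the norm be read back off the ball: with $s = \sup\{ t \in \mathbb{R}^+ | tx \in U \}$, we get $||x|| = 1/s$ (the Minkowski functional). This Python sketch finds $s$ by bisection, given only a membership test for $U$. The bisection relies on $\{ t | tx \in U \}$ being an interval starting at $0$, which convexity and balance guarantee. The $\ell^1$ and max-norm balls are my example choices.

```python
def gauge(in_ball, x, t_max=1e6, iters=200):
    """Recover ||x|| from a membership test for the open unit ball U.

    Finds s = sup{t > 0 : t x in U} by bisection and returns 1 / s.
    Assumes x != 0, that U has the three properties above, and 0 < s < t_max.
    """
    lo, hi = 0.0, t_max
    for _ in range(iters):
        mid = (lo + hi) / 2
        if in_ball(tuple(mid * c for c in x)):
            lo = mid  # mid * x is still inside U
        else:
            hi = mid  # mid * x has left U
    return 1 / lo


def in_l1_ball(p):
    return sum(abs(c) for c in p) < 1


def in_max_ball(p):
    return max(abs(c) for c in p) < 1


print(gauge(in_l1_ball, (3.0, 4.0)))   # close to 7, the l1 norm of (3, 4)
print(gauge(in_max_ball, (3.0, 4.0)))  # close to 4, the max norm of (3, 4)
```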

6. Convergence

The $m$th tail of a sequence $x = (x_n)$ is the set $\mbox{tail}_m(x) = \{x_n | n \in {\mathbb N}, n \geq m \}$.

Suppose $X$ is a metric space, $z \in X$ and $x= (x_n)$ is a sequence in $X$. Sequence $x$ converges to $z$ in $X$, denoted $x_n \to z$ iff every open subset of $X$ that contains $z$ includes a tail of $x$. In this situation, $z$ is unique and is called the limit of the sequence, denoted $\mbox{lim }x_n$.

It follows that for $(X,d)$ a metric space, $z \in X$ and $(x_n)$ a sequence in $X$, the sequence $(x_n)$ converges to $z$ in $X$ iff the real sequence $(d(x_n,z))_{n \in \mathbb{N}}$ converges to $0$ in ${\mathbb R}$.

For real sequences, we can define the:

• limit superior, $\mbox{lim sup } x_n = \mbox{inf } \{ \mbox{sup } \mbox{tail}_n(x) | n \in \mathbb{N} \}$ and
• limit inferior, $\mbox{lim inf } x_n = \mbox{sup } \{ \mbox{inf } \mbox{tail}_n(x) | n \in \mathbb{N} \}$.

It can be shown that $x_n \to z$ iff $\mbox{lim sup } x_n = \mbox{lim inf } x_n = z$.
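For a concrete feel, this Python sketch approximates $\mbox{sup } \mbox{tail}_m(x)$ and $\mbox{inf } \mbox{tail}_m(x)$ for $x_n = (-1)^n(1 + 1/n)$, my own example. The infinite tails are replaced by finite ones ending at a truncation point `N`, which is an approximation. The tail suprema decrease towards $1$ and the tail infima increase towards $-1$, so $\mbox{lim sup } x_n = 1 \neq -1 = \mbox{lim inf } x_n$ and the sequence does not converge.

```python
N = 10_000  # truncation point standing in for the infinite tail (an approximation)


def x(n):
    """x_n = (-1)^n (1 + 1/n): oscillates between values near 1 and -1."""
    return (-1) ** n * (1 + 1 / n)


def sup_tail(m):
    return max(x(n) for n in range(m, N + 1))


def inf_tail(m):
    return min(x(n) for n in range(m, N + 1))


for m in (1, 10, 100, 1000):
    print(m, sup_tail(m), inf_tail(m))
```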

Clearly sequences in superspaces converge to the same limit – the same is true in subspaces if the limit point is in the subspace itself. Sequences in finite product spaces equipped with product metrics converge in the product space iff their projections onto the individual spaces converge.

Every subsequence of a convergent sequence converges to the same limit as the parent sequence, but the picture for non-convergent parent sequences is more complicated, as we can still have convergent subsequences. There are various equivalent ways of characterising these limits of subsequences, e.g. centres of balls containing an infinite number of terms of the parent sequence.

A sequence $(x_n)$ is Cauchy iff for every $r \in \mathbb{R}^+$, there is a ball of radius $r$ that includes a tail of $(x_n)$. Every convergent sequence is Cauchy. The converse fails in general, but only because what should be the limit point is missing from the space: adding this point and extending the metric appropriately yields a convergent sequence. It can be shown that a space is complete (see above for definition) iff every Cauchy sequence in it converges in that space.
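The classic illustration is $\mathbb{Q}$, which is not complete. This Python sketch runs Newton's iteration $x_{k+1} = (x_k + 2/x_k)/2$ for $\sqrt{2}$ in exact rational arithmetic: the gaps between successive terms shrink rapidly, yet no term squares to exactly $2$, because the would-be limit $\sqrt{2}$ is missing from $\mathbb{Q}$. Finitely many terms can only illustrate the Cauchy property, not prove it.

```python
from fractions import Fraction


def newton_sqrt2(steps):
    """First `steps` terms of x_{k+1} = (x_k + 2 / x_k) / 2 from x_0 = 1, exactly."""
    xs = [Fraction(1)]
    for _ in range(steps - 1):
        x = xs[-1]
        xs.append((x + 2 / x) / 2)
    return xs


terms = newton_sqrt2(6)
gaps = [abs(b - a) for a, b in zip(terms, terms[1:])]
print([str(t) for t in terms[:4]])  # ['1', '3/2', '17/12', '577/408']
print([float(g) for g in gaps])     # successive gaps shrink rapidly
```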

7. Bounds

A subset $S$ of a metric space $X$ is a bounded subset iff $S = X = \emptyset$ or $S$ is included in some ball of $X$. A metric space $X$ is bounded iff it is a bounded subset of itself. An alternative characterisation of a bounded subset $S$ is that it has finite diameter.

The Hausdorff metric is defined on the set $S(X)$ of all non-empty closed bounded subsets of a set $X$ equipped with metric $d$. It is given by $h(A,B) = \max \{ \sup\{ \mbox{dist}(b, A) | b \in B\}, \sup\{ \mbox{dist}(a, B) | a \in A\} \}$.
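Finite nonempty subsets are closed and bounded, so the Hausdorff metric can be computed on them directly, with the suprema and infima becoming maxima and minima. This Python sketch uses the Euclidean plane as my example space. It shows why both one-sided terms are needed: $A \subseteq B$, so every point of $A$ is at distance $0$ from $B$, but the extra point of $B$ is far from $A$.

```python
import math


def hausdorff(A, B, d):
    """Hausdorff distance between finite nonempty sets A and B under metric d."""
    sup_dist_b_to_a = max(min(d(b, a) for a in A) for b in B)
    sup_dist_a_to_b = max(min(d(a, b) for b in B) for a in A)
    return max(sup_dist_b_to_a, sup_dist_a_to_b)


A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 0.0), (1.0, 0.0), (4.0, 4.0)]
print(hausdorff(A, B, math.dist))  # 5.0, driven entirely by the point (4, 4)
```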

Given a set $X$ and a metric space $Y$, $f : X \to Y$ is a bounded function iff $f(X)$ is a bounded subset of $Y$. The set of bounded functions from $X$ to $Y$ is denoted $B(X,Y)$. There is a standard metric on bounded functions, $s(f,g) = \sup \{ e(f(x),g(x)) | x \in X \}$ where $e$ is the metric on $Y$.

Let $X$ be a nonempty set and $Y$ be a nonempty metric space. Let $(f_n)$ be a sequence of functions from $X$ to $Y$ and $g: X \to Y$. Then:

• $(f_n)$ converges pointwise to $g$ iff $(f_n(z))$ converges to $g(z)$ for all $z \in X$
• $(f_n)$ converges uniformly to $g$ iff $\sup\{ e(f_n(x),g(x)) | x \in X \}$ is real for each $n \in {\mathbb N}$ and the sequence $( \sup\{ e(f_n(x),g(x)) | x \in X \})_{n \in {\mathbb N}}$ converges to zero in ${\mathbb R}$.

It’s interesting to look at these two different notions of convergence because the second is stronger. Every uniformly-convergent sequence of functions converges pointwise, but the converse is not true. An example is the sequence $f_n : \mathbb{R}^+ \to \mathbb{R}$ given by $f_n(x) = 1/(nx)$. This converges pointwise but not uniformly to the zero function.
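A quick numerical look at $f_n(x) = 1/(nx)$ in Python shows both behaviours. At a fixed point, here $x = 0.5$ (my choice), the values go to $0$. But each $f_n$ takes the value $n$ at the moving point $x = 1/n^2$, so $\sup\{ |f_n(x)| : x > 0 \}$ is not even real, and the uniform condition fails from the start.

```python
def f(n, x):
    """f_n(x) = 1 / (n x) on the positive reals."""
    return 1 / (n * x)


ns = (1, 10, 100, 1000)

# Pointwise: at a fixed x > 0 the values shrink towards 0 as n grows.
at_fixed_x = [f(n, 0.5) for n in ns]

# Not uniform: at x = 1 / n**2 the value is (up to rounding) n itself.
at_moving_x = [f(n, 1 / n**2) for n in ns]

print(at_fixed_x)
print(at_moving_x)
```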

A stronger notion than boundedness is total boundedness. A subset $S$ of a metric space $X$ is totally bounded iff for each $r \in {\mathbb R}^+$, there is a finite collection of balls of $X$ of radius $r$ that covers $S$. An example of a bounded but not totally bounded subset is any infinite subset of a space with the discrete metric. Total boundedness carries over to subspaces and finite unions.

Conserving metrics play an important role in bounds, allowing bounds on product spaces to be equivalent to bounds on the projections to the individual spaces. This goes for both boundedness and total boundedness.

8. Continuity

Given metric spaces $X$ and $Y$, a point $z \in X$ and a function $f: X \to Y$, the function is said to be continuous at $z$ iff for each open subset $V \subseteq Y$ with $f(z) \in V$, there exists an open subset $U$ of $X$ with $z \in U$ such that $f(U) \subseteq V$.

Extending from points to the whole domain, the function is said to be continuous on $X$ iff for each open subset $V \subseteq Y$, $f^{-1}(V)$ is open in $X$.

Continuity is not determined by the codomain, in the sense that a continuous function remains continuous when regarded as a function into any metric superspace of its range. It is preserved by function composition and by restriction.

Continuity plays well with product spaces, in the sense that if the product space is endowed with a product metric, a function mapping into the product space is continuous iff its compositions with the natural projections are all continuous.

For $(X,d)$ and $(Y,e)$ metric spaces, $\mathcal{C}(X,Y)$ denotes the metric space of continuous bounded functions from $X$ to $Y$ with the supremum metric $(f,g) \mapsto \sup\{ e(g(x),f(x)) | x \in X \}$. $\mathcal{C}(X,Y)$ is closed in the space of bounded functions from $X$ to $Y$.

Nicely, we can talk about convergence using the language of continuity. In particular, let $X$ be a metric space, and $\tilde{\mathbb{N}} = \mathbb{N} \cup \{ \infty \}$. Endow $\tilde{\mathbb{N}}$ with the inverse metric $(a,b) \mapsto |a^{-1} - b^{-1} |$ for $a,b \in {\mathbb N}$, $(n,\infty) \mapsto n^{-1}$ and $(\infty, \infty) \mapsto 0$. Let $\tilde{x} : \tilde{\mathbb{N}} \to X$ and write $x_n = \tilde{x}(n)$ for $n \in \tilde{\mathbb{N}}$. Then $\tilde{x}$ is continuous iff the sequence $(x_n)_{n \in \mathbb{N}}$ converges in $X$ to $x_{\infty}$. In particular, the function extending each convergent sequence with its limit is an isometry from the space of convergent sequences in $X$ to the metric space of continuous bounded functions from $\tilde{\mathbb{N}}$ to $X$.

9. Uniform Continuity

Here we explore progressively stronger forms of continuity: Lipschitz continuity implies uniform continuity, which in turn implies continuity. Ó Searcóid also adds strong contractions into this hierarchy, as the strongest class studied.

Uniform continuity requires the $\delta$ in the epsilon-delta definition of continuity to extend across a whole set. Consider metric spaces $(X,d)$ and $(Y,e)$, a function $f : X \to Y$, and a metric subspace $S \subseteq X$. The function $f$ is uniformly continuous on $S$ iff for every $\epsilon \in \mathbb{R}^+$ there exists a $\delta \in \mathbb{R}^+$ s.t. for every $x, z \in S$ for which $d(z,x) < \delta$, it holds that $e( f(z), f(x) ) < \epsilon$.

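If $(X,d)$ is a metric space with the nearest-point property and $f : X \to Y$ is continuous, then $f$ is also uniformly continuous on every bounded subset of $X$. A good example is a polynomial on $\mathbb{R}$: it is uniformly continuous on every bounded interval, even though, for instance, $x \mapsto x^2$ is not uniformly continuous on the whole of $\mathbb{R}$.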

Uniformly continuous functions map compact metric spaces to compact metric spaces (as, indeed, do all continuous functions). More distinctively, they preserve total boundedness and Cauchy sequences. This isn’t necessarily true for functions that are merely continuous: for example, $x \mapsto 1/x$ on $(0,1]$ maps the Cauchy sequence $(1/n)$ to the sequence $(n)$, which is not Cauchy.

There is a remarkable relationship between the Cantor Set and uniform continuity. Consider a nonempty metric space $(X,d)$. Then $X$ is totally bounded iff there exists a bijective uniformly continuous function from a subset of the Cantor Set to $X$. As Ó Searcóid notes, this means that totally bounded metric spaces are quite small, in the sense that none can have cardinality greater than that of the reals.

Consider metric spaces $(X,d)$ and $(Y,e)$ and function $f: X \to Y$. The function is called Lipschitz with Lipschitz constant $k \in \mathbb{R}^+$ iff $e( f(a), f(b) ) \leq k d(a,b)$ for all $a, b \in X$.

Note here the difference from uniform continuity: Lipschitz continuity strengthens uniform continuity by fixing a relationship between the $\epsilon$s and $\delta$s (with Lipschitz constant $k$, the choice $\delta = \epsilon / k$ always works), whereas uniform continuity leaves this open. A nice example from Ó Searcóid of a uniformly continuous non-Lipschitz function is $x \mapsto \sqrt{1 - x^2}$ on $[0,1)$.
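Ó Searcóid's example can be explored numerically. Near $1$ the difference quotients $|f(a) - f(b)|/|a - b|$ grow without bound, so no Lipschitz constant exists. On the other hand, using $|\sqrt{u} - \sqrt{v}| \le \sqrt{|u - v|}$ for $u, v \ge 0$ together with $|a^2 - b^2| = |a - b|\,(a + b) \le 2|a - b|$ on $[0,1)$, we get $|f(a) - f(b)| \le \sqrt{2|a - b|}$, so $\delta = \epsilon^2/2$ works for uniform continuity. The sketch below (helper names my own) checks both facts on sample points only, so it is an illustration rather than a proof.

```python
import math


def f(x):
    """Ó Searcóid's example: x ↦ sqrt(1 - x^2) on [0, 1)."""
    return math.sqrt(1.0 - x * x)


def difference_quotient(a, b):
    """|f(a) - f(b)| / |a - b| for distinct a, b."""
    return abs(f(a) - f(b)) / abs(a - b)


# Pairs of points 1 - 4h and 1 - h, approaching 1 from the left.
quotients = [difference_quotient(1 - 4 * h, 1 - h) for h in (1e-2, 1e-4, 1e-6, 1e-8)]
print(quotients)  # grows roughly like 0.47 / sqrt(h): no single Lipschitz constant works


def uniform_bound_holds(a, b, slack=1e-12):
    """Check |f(a) - f(b)| <= sqrt(2 |a - b|), allowing for floating-point rounding."""
    return abs(f(a) - f(b)) <= math.sqrt(2 * abs(a - b)) + slack


grid = [i / 200 for i in range(200)]  # sample points in [0, 1)
print(all(uniform_bound_holds(a, b) for a in grid for b in grid))
```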

Lipschitz functions preserve boundedness, and the Lipschitz property is preserved by function composition.

There is a relationship between Lipschitz functions on the reals and their derivatives. Let $I$ be a non-degenerate interval of $\mathbb{R}$ and $f: I \to \mathbb{R}$ be differentiable. Then $f$ is Lipschitz on $I$ iff $f'$ is bounded on $I$.
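The equivalence is a short consequence of the Mean Value Theorem. A sketch of both directions, with $k$ denoting a bound on $|f'|$ or a Lipschitz constant respectively:

```latex
\begin{align*}
&(\Leftarrow)\ \text{If } |f'(t)| \le k \text{ for all } t \in I \text{, then for distinct } a, b \in I
  \text{ there is } c \text{ strictly between } a \text{ and } b \text{ with}\\
&\qquad |f(a) - f(b)| = |f'(c)|\,|a - b| \le k\,|a - b|.\\
&(\Rightarrow)\ \text{If } |f(a) - f(b)| \le k\,|a - b| \text{ for all } a, b \in I \text{, then for each } t \in I,\\
&\qquad |f'(t)| = \lim_{s \to t,\ s \in I} \frac{|f(s) - f(t)|}{|s - t|} \le k.
\end{align*}
```

Non-degeneracy of $I$ is what guarantees that every $t \in I$ can be approached by other points $s \in I$, so that the limit makes sense.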

A function with Lipschitz constant less than one is called a strong contraction.

Unlike the case for continuity, not every product metric gives rise to uniformly continuous natural projections, but this does hold for conserving metrics.

10. Completeness

Let $(X,d)$ be a metric space and $u : X \to \mathbb{R}$. The function $u$ is called a virtual point iff:

• $u(a) - u(b) \leq d(a,b) \leq u(a) + u(b)$ for all $a,b \in X$
• $\inf u(X) = 0$
• $0 \notin u(X)$

We saw earlier that a metric space $X$ is complete iff it is closed in every metric superspace of $X$. There are a number of equivalent characterisations, including that every Cauchy sequence in $X$ converges in $X$.

Consider a metric space $(X,d)$. A subset $S \subseteq X$ is a complete subset of $X$ iff $(S,d)$ is a complete metric space.

If $X$ is a complete metric space and $S \subseteq X$, then $S$ is complete iff $S$ is closed in $X$.

Conserving metrics ensure that finite products of complete metric spaces are complete.

A non-empty metric space $(X,d)$ is complete iff $(\mathcal{S}(X),h)$ is complete, where $\mathcal{S}(X)$ denotes the collection of all non-empty closed bounded subsets of $X$ and $h$ denotes the Hausdorff metric.
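For intuition about $h$: one standard formula is $h(A,B) = \max\{\sup_{a \in A} d(a,B),\ \sup_{b \in B} d(b,A)\}$, where $d(x,S) = \inf_{s \in S} d(x,s)$. Below is a minimal sketch for finite, non-empty subsets of $\mathbb{R}$ with the usual metric; such sets are closed and bounded, so they lie in $\mathcal{S}(\mathbb{R})$. The function names are my own.

```python
def dist_to_set(x, s):
    """d(x, S) = inf over y in S of |x - y|; a min suffices since S is finite."""
    return min(abs(x - y) for y in s)


def hausdorff(a, b):
    """Hausdorff distance between non-empty finite subsets of R (usual metric)."""
    return max(
        max(dist_to_set(x, b) for x in a),  # how far A strays from B
        max(dist_to_set(y, a) for y in b),  # how far B strays from A
    )


print(hausdorff([0, 1], [0, 0.5, 3]))  # 2: the point 3 is far from {0, 1}
```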

For $X$ a non-empty set and $(Y,e)$ a metric space, the metric space $B(X,Y)$ of bounded functions from $X$ to $Y$ with the supremum metric is complete iff $Y$ is complete. For example, taking $X = \mathbb{N}$, the space of bounded sequences in $\mathbb{R}$ is complete because $\mathbb{R}$ is complete.

A uniformly continuous function from a dense subset into a complete space extends uniquely to a uniformly continuous function on the whole space. Precisely: consider metric spaces $(X,d)$ and $(Y,e)$ with the latter being complete. Let $S \subseteq X$ be a dense subset of $X$ and $f : S \to Y$ be a uniformly continuous function. Then there exists a uniformly continuous function $\tilde{f} : X \to Y$ such that $\tilde{f}|_S = f$. There are no other continuous extensions of $f$ to $X$.

(Banach’s Fixed-Point Theorem). Let $(X,d)$ be a non-empty complete metric space and $f : X \to X$ be a strong contraction on $X$ with Lipschitz constant $k \in (0,1)$. Then $f$ has a unique fixed point in $X$ and, for each $w \in X$, the sequence $(f^n(w))$ converges to the fixed point. Beautiful examples of this abound, of course. Ó Searcóid discusses IFS fractals – computer scientists will be familiar with applications in the semantics of programming languages.
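A minimal sketch of the iteration in the theorem, under assumptions of my own: $X = [0,1]$ (complete, being closed in $\mathbb{R}$) and $f = \cos$, which maps $[0,1]$ into $[\cos 1, 1] \subseteq [0,1]$ and, by the Mean Value Theorem, has Lipschitz constant $k = \sin 1 \approx 0.84$ there. The stopping rule uses the standard a posteriori bound $d(x_{n+1}, x^*) \le \frac{k}{1-k}\, d(x_n, x_{n+1})$, which follows from the geometric decay of successive steps. The function name `banach_iterate` is hypothetical.

```python
import math


def banach_iterate(f, w, k, tol=1e-12, max_iter=10_000):
    """Iterate w, f(w), f(f(w)), ... for a strong contraction f with constant k in (0, 1).

    Stops once k / (1 - k) * |x_{n+1} - x_n|, an upper bound on the distance
    from x_{n+1} to the fixed point, falls below tol.
    """
    x = w
    for _ in range(max_iter):
        x_next = f(x)
        if k / (1 - k) * abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("no convergence within max_iter steps")


k = math.sin(1.0)  # Lipschitz constant of cos on [0, 1]
fixed = banach_iterate(math.cos, 0.0, k)
print(fixed, math.cos(fixed) - fixed)
```

Starting from any $w \in [0,1]$ gives the same limit, approximately $0.739085$, as the uniqueness part of the theorem predicts.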

A metric space $(Y,e)$ is called a completion of metric space $(X,d)$ iff $(Y,e)$ is complete and $(X,d)$ is isometric to a dense subspace of $(Y,e)$.

We can complete any metric space. Let $(X,d)$ be a metric space. Define $\tilde{X} = \delta(X) \cup \text{vp}(X)$, where $\delta(X)$ denotes the set of all point functions in $X$ (the functions $x \mapsto d(z,x)$ for $z \in X$) and $\text{vp}(X)$ denotes the set of all virtual points in $X$. We can endow $\tilde{X}$ with the metric $s$ given by $(u,v) \mapsto \sup\{ |u(x) - v(x)| | x \in X \}$. Then $\tilde{X}$ is a completion of $X$.

Here the subspace $(\delta(X),s)$ of $(\tilde{X},s)$ forms the subspace isometric to $(X,d)$.
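To see how a virtual point stands in for a missing point, here is a numerical sketch on a hypothetical example of my own: $X = (0,1]$ with the usual metric, where the "missing" point is $0$. The candidate virtual point is $u(x) = x$, and the point function of $z \in X$ is $x \mapsto d(z,x) = |z - x|$. Only the first condition of the definition of a virtual point is checked numerically; $\inf u(X) = 0$ and $0 \notin u(X)$ hold because $(0,1]$ contains arbitrarily small positive numbers but not $0$. Suprema are approximated on a finite grid, so this illustrates rather than proves the claims.

```python
def point_function(z):
    """Point function of z in X = (0, 1]: x ↦ d(z, x) = |z - x|."""
    return lambda x: abs(z - x)


def u(x):
    """Hypothetical virtual point on X = (0, 1], playing the role of the missing point 0."""
    return x


def sup_distance(f, g, sample):
    """Approximate s(f, g) = sup |f(x) - g(x)| over a finite sample of X."""
    return max(abs(f(x) - g(x)) for x in sample)


N = 1000
sample = [k / N for k in range(1, N + 1)]  # finite grid in (0, 1]
coarse = sample[::50]

# First condition of a virtual point: u(a) - u(b) <= d(a, b) <= u(a) + u(b).
print(all(u(a) - u(b) <= abs(a - b) <= u(a) + u(b) for a in coarse for b in coarse))

# u sits at distance z from the point function of z, just as 0 would ...
print(sup_distance(point_function(0.3), u, sample))  # approximately 0.3
# ... and point functions are as far apart as the points they represent.
print(sup_distance(point_function(0.3), point_function(0.7), sample))  # approximately 0.4
```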

11. Connectedness

A metric space $X$ is a connected metric space iff $X$ cannot be expressed as the union of two disjoint nonempty open subsets of itself. An example is $\mathbb{R}$ with its usual metric. As usual, Ó Searcóid gives a number of equivalent criteria:

• Every proper nonempty subset of $X$ has nonempty boundary in $X$
• No proper nonempty subset of $X$ is both open and closed in $X$
• $X$ is not the union of two disjoint nonempty closed subsets of itself
• Either $X = \emptyset$ or the only continuous functions from $X$ to the discrete space $\{0,1\}$ are the two constant functions

Connectedness is intrinsic: it does not depend on any metric superspace. In particular, if $X$ is a metric space, $Z$ is a metric subspace of $X$ and $S \subseteq Z$, then the subspace $S$ of $Z$ is a connected metric space iff the subspace $S$ of $X$ is a connected metric space. Moreover, if $S$ is a connected subspace of $X$ and $S \subseteq A \subseteq \bar{S}$, then the subspace $A$ is connected. In particular, $\bar{S}$ itself is connected.

Every continuous image of a connected metric space is connected. Moreover, a nonempty subset $S \subseteq \mathbb{R}$ is connected iff $S$ is an interval. Together these generalise the Intermediate Value Theorem: for a nonempty connected metric space $X$ and a continuous function $f : X \to \mathbb{R}$, the image $f(X)$ is an interval, so $f$ takes every value between any two of its values.
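The Intermediate Value Theorem underlies the bisection method for locating a zero: if $f$ is continuous on $[a,b]$ and $f(a)$, $f(b)$ have opposite signs, then, since $[a,b]$ is connected, $f$ has a zero in $[a,b]$, and halving the interval while keeping a sign change at its endpoints preserves this. A minimal Python sketch; the name `bisect_root` and the tolerance are my own choices.

```python
def bisect_root(f, a, b, tol=1e-12):
    """Locate a zero of a continuous f on [a, b], given f(a), f(b) of opposite signs.

    Each step keeps a half-interval whose endpoint values still have opposite
    signs; by the Intermediate Value Theorem that half contains a zero of f.
    """
    fa, fb = f(a), f(b)
    if fa == 0:
        return a
    if fb == 0:
        return b
    if (fa > 0) == (fb > 0):
        raise ValueError("f(a) and f(b) must have opposite signs")
    while b - a > tol:
        m = (a + b) / 2
        fm = f(m)
        if fm == 0:
            return m
        if (fa > 0) == (fm > 0):
            a, fa = m, fm  # sign change lies in [m, b]
        else:
            b = m          # sign change lies in [a, m]
    return (a + b) / 2


root = bisect_root(lambda x: x ** 3 - 2, 1.0, 2.0)
print(root)  # close to the cube root of 2
```

Comparing signs rather than multiplying $f(a) f(m)$ avoids overflow or underflow when the function values are very large or very small.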

Finite products of connected subsets endowed with a product metric are connected. Unions of chained collections (i.e. sequences of subsets whose sequence neighbours are non-disjoint) of connected subsets are themselves connected.

A connected component $U$ of a metric space $X$ is a subset that is connected and which has no proper superset that is also connected – a kind of maximal connected subset. It turns out that the connected components of a metric space $X$ are mutually disjoint, all closed in $X$, and $X$ is the union of its connected components.
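For subsets of $\mathbb{R}$, where the connected sets are exactly the intervals, components are easy to compute. A minimal sketch, assuming the space is a finite union of closed intervals $[a_i, b_i]$: each interval merged in meets the connected piece built so far, so the union stays connected (the chained-collection result above), while a gap between pieces separates them. The resulting pieces are mutually disjoint, closed, and cover the union, as the results above predict. The function name `components` is my own.

```python
def components(intervals):
    """Connected components of a finite union of closed intervals [a, b] in R.

    intervals: iterable of (a, b) pairs with a <= b.
    Returns the components as (start, end) pairs in increasing order.
    """
    merged = []
    for a, b in sorted(intervals):
        if merged and a <= merged[-1][1]:
            # [a, b] meets the current connected piece, so their union is connected.
            merged[-1][1] = max(merged[-1][1], b)
        else:
            # A gap separates [a, b] from everything before it: start a new component.
            merged.append([a, b])
    return [tuple(piece) for piece in merged]


print(components([(3, 4), (0, 1), (1, 2), (3.5, 5)]))  # [(0, 2), (3, 5)]
```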

A path in a metric space $X$ is a continuous function $f : [0, 1] \to X$. (These functions turn out to be uniformly continuous, since $[0,1]$ is compact.) This definition allows us to consider a stronger notion of connectedness: a metric space $X$ is pathwise connected iff for each $a, b \in X$ there is a path in $X$ with endpoints $a$ and $b$. An example given by Ó Searcóid of a space that is connected but not pathwise connected is the closure in $\mathbb{R}^2$ of $\Gamma = \{ (x, \sin (1/x)) | x \in \mathbb{R}^+ \}$. By one of the results above, $\bar{\Gamma}$ is connected because $\Gamma$ is connected. But there is no path from, say, $(0,0)$ (which nevertheless is in $\bar{\Gamma}$) to any point in $\Gamma$.

Every continuous image of a pathwise connected metric space is itself pathwise connected.

For a linear space, an even stronger notion of connectedness is polygonal connectedness. For a linear space $X$ with subset $S$ and $a, b \in S$, a polygonal connection from $a$ to $b$ in $S$ is an $n$-tuple of points $(c_1, \ldots, c_n)$ s.t. $c_1 = a$, $c_n = b$ and, for each $i \in \{1, 2, \ldots, n-1\}$, the line segment $\{(1 - t)c_i + t c_{i+1} | t \in [0,1] \}$ is contained in $S$. We then say that $S$ is polygonally connected iff there exists a polygonal connection in $S$ between every two of its points. Ó Searcóid gives the example of $\{ z \in \mathbb{C} \mid |z| = 1 \}$ as a pathwise connected but not polygonally connected subset of $\mathbb{C}$.

Although in general these three notions of connectedness are distinct, they coincide for open subsets of normed linear spaces: such a subset is connected iff it is pathwise connected iff it is polygonally connected.

12. Compactness

Ó Searcóid gives a number of equivalent characterisations of compact non-empty metric spaces $X$; those I found most interesting and useful for the following material include:

• Every open cover for $X$ has a finite subcover
• $X$ is complete and totally bounded
• $X$ is a continuous image of the Cantor set
• Every real continuous function defined on $X$ is bounded and attains its bounds

Closed bounded intervals of $\mathbb{R}$ are given as archetypal compact sets. An interesting observation is that ‘most’ metric spaces cannot be extended to compact metric spaces, simply because compact metric spaces are small: as noted above in the discussion of total boundedness, none has cardinality greater than $|\mathbb{R}|$, given that (when nonempty) they are all continuous images of the Cantor set.

If $X$ is a compact metric space and $S \subseteq X$ then $S$ is compact iff $S$ is closed in $X$. This follows because $S$ inherits total boundedness from $X$, and completeness follows also if $S$ is closed.

The Inverse Function Theorem states that for $X$ and $Y$ metric spaces with $X$ compact, and for $f : X \to Y$ injective and continuous, $f^{-1}: f(X) \to X$ is uniformly continuous.

Compactness plays well with intersections, finite unions, and finite products endowed with a product metric. The latter is interesting, given that we noted above that, for non-conserving product metrics, total boundedness doesn’t necessarily carry forward.

Things get trickier when dealing with infinite-dimensional spaces. The following statement of the Arzelà-Ascoli Theorem is given, which characterises the compactness of a closed bounded subset $S$ of $\mathcal{C}(X,Y)$ for compact metric spaces $X$ and $Y$:

For each $x \in X$, define $\hat{x}: S \to Y$ by $\hat{x}(f) = f(x)$ for each $f \in S$. Let $\hat{X} = \{\hat{x} | x \in X \}$. Then:

• $\hat{X} \subseteq B(S,Y)$ and
• $S$ is compact iff $x \mapsto \hat{x}$ from $X$ to $B(S,Y)$ is continuous

13. Equivalence

Consider a set $X$ and the various metrics we can equip it with. We can define a preorder $\succeq$ on these metrics as follows: $d$ is topologically stronger than $e$, written $d \succeq e$, iff every open subset of $(X,e)$ is open in $(X,d)$. This relation is reflexive and transitive but not antisymmetric (for example, $d$ and $2d$ admit the same open subsets), and it induces a notion of topological equivalence of two metrics: $d$ and $e$ are topologically equivalent when $d \succeq e$ and $e \succeq d$.

As well as obviously admitting the same open subsets, topologically equivalent metrics admit the same closed subsets, dense subsets, compact subsets, connected subsets, convergent sequences, limits, and continuous functions to/from that set.

It turns out that two metrics are topologically equivalent iff the identity functions from $(X,d)$ to $(X,e)$ and vice versa are both continuous. Following the discussion above relating to continuity, this hints at potentially stronger notions of comparability – and hence of equivalence – of metrics, which indeed exist. In particular, $d$ is uniformly stronger than $e$ iff the identity function from $(X,d)$ to $(X,e)$ is uniformly continuous. Also, $d$ is Lipschitz stronger than $e$ iff the identity function from $(X,d)$ to $(X,e)$ is Lipschitz.

The stronger notion of a uniformly equivalent metric is important because these metrics additionally admit the same Cauchy sequences, totally bounded subsets and complete subsets.

Lipschitz equivalence is even stronger, additionally providing the same bounded subsets and subsets with the nearest-point property.

The various notions of equivalence discussed here collapse to a single one when dealing with norms. For a linear space $X$, two norms on $X$ are topologically equivalent iff they are Lipschitz equivalent, so we can just refer to norms as being equivalent. All norms on finite-dimensional linear spaces are equivalent.
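In $\mathbb{R}^n$, for instance, the Euclidean norm and the max norm satisfy $\|x\|_\infty \le \|x\|_2 \le \sqrt{n}\,\|x\|_\infty$, the second inequality because $\sum_i x_i^2 \le n \max_i x_i^2$. So the identity map is Lipschitz with constant $1$ in one direction and $\sqrt{n}$ in the other, which is exactly Lipschitz equivalence of the induced metrics. A quick numerical spot-check on random vectors (an illustration only; the helper names are my own):

```python
import math
import random


def norm_2(x):
    """Euclidean norm."""
    return math.sqrt(sum(t * t for t in x))


def norm_inf(x):
    """Max norm."""
    return max(abs(t) for t in x)


def within_equivalence_bounds(x, rel_tol=1e-12):
    """Check ||x||_inf <= ||x||_2 <= sqrt(n) ||x||_inf, allowing for rounding."""
    n = len(x)
    m, e = norm_inf(x), norm_2(x)
    return m <= e * (1 + rel_tol) and e <= math.sqrt(n) * m * (1 + rel_tol)


random.seed(0)
vectors = [[random.uniform(-10, 10) for _ in range(5)] for _ in range(1000)]
print(all(within_equivalence_bounds(v) for v in vectors))  # True
```

Both bounds are attained: $(1, 0, \ldots, 0)$ gives equality on the left and $(1, 1, \ldots, 1)$ on the right, so the constants cannot be improved.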

Finally, some notes on the more general idea of equivalent metric spaces (rather than equivalent metrics). Again, these are provided in three flavours:

• topologically equivalent metric spaces $(X,d)$ and $(Y,e)$ are those for which there exists a continuous bijection with continuous inverse (a homeomorphism) from $X$ to $Y$.
• for uniformly equivalent metric spaces, we strengthen the requirement to uniform continuity
• for Lipschitz equivalent metric spaces, we strengthen the requirement to Lipschitz continuity
• strongest of all, isometries are discussed above

Note that, given the definitions above, the metric space $(X,d)$ is equivalent to the metric space $(X,e)$ if $d$ and $e$ are equivalent, but the converse is not necessarily true: for equivalent metric spaces we only require the existence of a suitable function, whereas for equivalent metrics that function must be the identity.

ResearchED on Curriculum

A colleague recently pointed me to the ResearchED Guide to the Curriculum, a volume of essays edited by Clare Sealy. Following the guidance of Lemov and Badillo’s essay in this volume that ‘reading a book should be a writing-intensive experience’, I’ve written down some of my thoughts after reading this book. They come from the perspective of someone who teaches (albeit in higher education rather than in a school) and is a researcher (but not in education). Of course, my background undoubtedly skews my perspective and limits my awareness of much educational theory.

The context here is that schools in England have been very busy over the last couple of years rethinking their curriculum, not least because the new Ofsted school inspection framework places it centre-stage. So now is a good moment to engage with schools over some of the more tricky questions involved.

I found this collection of essays very thought-provoking, and would recommend engaging with it, whether or not you consider yourself a fan of the “curriculum revolution” underway in English schools.

Knowledge and Teaching

Many of the contributions relate to a knowledge-based curriculum, but none give a working definition of knowledge. I think it’s useful for educators to reflect on, and leaders to engage with, epistemology at some level. When ideas like a “knowledge-based curriculum” start to be prioritised, we need to understand the various meanings they might have, and precisely which other ideas these terms are being set against. Of course, this becomes even more important when politics enters the picture: I find it hard to envisage a definition of a knowledge-based curriculum that is broad enough to encompass both Michael Gove’s approach to knowledge in history and Michael F.D. Young’s principles espoused in this book. A central problem in the theory of knowledge, of course, is the theory of truth; it’s interesting that Ashbee’s essay in the same volume riffs on this theory when asking whether it is right that students should learn something false (the presence of electron shells in an atom) in order to facilitate understanding later on. Again, I think this could do with a more sophisticated analysis of truth and falsity: it is by no means universally accepted that the presence of electron shells can be said to be ‘false’, and I do think the philosophical standpoint on such questions has implications for curriculum design, especially in the sciences.

The same holds for teaching. The role of teaching in imparting knowledge needs to be fully explored. Even if we accept the premise that, to quote Young, ‘schools in a democracy should all be working towards access to powerful knowledge for all their pupils’ (and also leave to one side the definition of a democracy), this leaves open the question of the role of the teacher in providing that access. At one extreme seems to lie the Gradgrindian approach, best summarised by Dickens’s description in Hard Times of students as ‘little vessels then and there arranged in order, ready to have imperial gallons of facts poured into them until they were full to the brim’; at the other lies an unschooling approach. Yet both can legitimately claim to be pursuing this aim. Between these extremes, the teacher’s role in setting up experiences, and in developing understanding, for example through Adey and Shayer’s concept of ‘cognitive conflict’, could be explored more deeply.

It’s interesting that in his essay in this book, Young – described by Sealy as one of the ‘godfathers’ of the knowledge-based curriculum – has plenty to say about problematic ways this concept has been interpreted, in particular that “a school adopting a knowledge-led curriculum can spend too much time on testing whether students have memorised the knowledge of previous years”, and that “a focus on memorisation does not necessarily encourage students to develop a ‘relationship to knowledge’ that leads to new questions.” These concerns echo my own fears, and I see the latter also arise in higher education as students make the leap between undergraduate and postgraduate work.

My own teaching as an academic has spanned the full range from largely chalk-and-talk unidirectional presentations to undergraduate students to fairly laissez-faire mentoring of PhD students through their own discovery of the background material required for their research. It’s interesting to reflect on the different levels of resourcing required to follow these models, a topic Young and Aurora both pick up in their essays: the need for a curriculum model that incorporates how teachers might engage with external constraints (resource, externally imposed exam syllabuses, etc.) in the short term, even as we work towards a better long-term future for our students.

Memorisation is mentioned by several authors, and of course can be important, but – as Young says – it’s also important that students come to view it as “a step to acquiring new knowledge”, not as acquiring new knowledge. So my question to schools is this: how is that desirable student perception reflected in your curriculum? How does your curriculum help students develop that view?

My concerns over some of the ‘knowledge-based’ or ‘knowledge-led’ work in schools in recent years are broadly in line with Young’s view in this volume that teaching viewed as the transmission of knowledge excludes the process by which students develop a relationship with knowledge (my emphasis). I was also pleased by Young’s assertion that schools should treat subjects not just as bodies of knowledge but as communities of teachers and researchers, with pupils as neophyte members of such communities. To me, this is wholly consistent with the exciting ideas behind Claxton’s Building Learning Power framework I reviewed some years ago here.

What Do We Want Students to Be Able to Do?

In addition to more traditional answers, Ashbee suggests some that should make schools pause for thought. For example, she suggests that while others may have chosen the curriculum content, students should be equipped to critically evaluate the inclusion of the knowledge in the curriculum they have studied and to ask what else could have been included. I like this idea: a key question for schools, though, is where their curricula equip students for this task.

One aspect largely absent from this volume is a critical discussion of assessment, including testing, and its role in the curriculum: both in the obvious sense of shaping the curriculum in the long and short term, and in the perhaps less obvious sense that the forms of assessment themselves drive students’ behaviour and their understanding of the nature of learning.

Planning Lessons

In an essay on Curriculum Coherence, Neil Almond discusses the role of sequencing in a subject curriculum. Almond uses the analogy of a box set, contrasting The Simpsons (minimal ordering required) with Game of Thrones (significant ordering required).

Three aspects of the timing of lessons are, I think, missing in this discussion and deserve more explicit consideration.

Firstly, any necessary order of discussion of topics is rarely a total (linear) order, to put it mathematically. It’s maybe not even a partial order. Explicit dependencies between topic areas have been explored in depth by Cambridge Mathematics, who have built a directed graph representation of a curriculum. The lack of totality of the order provides significant freedom in practice; it is less clear what best practice might be, for a department or teacher, in taking advantage of this freedom. It’s also less clear when to take advantage of this freedom: should this be a department-level once-and-for-all decision, one delegated to teachers to determine on the fly, or something in between? And why?
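To make the order-theoretic point concrete: choosing a teaching sequence consistent with a prerequisite graph is a topological sort, and the freedom described above is the number of distinct valid sequences (the linear extensions of the dependency order). A minimal Python sketch using the standard library's `graphlib` module (Python 3.9+); the topics and dependencies are invented for illustration and are not taken from Cambridge Mathematics' graph, and the brute-force count is only feasible for tiny graphs.

```python
from graphlib import TopologicalSorter
from itertools import permutations

# Hypothetical prerequisite graph: each topic maps to the topics it depends on.
prerequisites = {
    "counting": set(),
    "addition": {"counting"},
    "multiplication": {"addition"},
    "fractions": {"multiplication"},
    "shapes": {"counting"},
}


def respects_prerequisites(order, prereqs):
    """True if every topic appears after all of its prerequisites."""
    position = {topic: i for i, topic in enumerate(order)}
    return all(position[p] < position[t] for t, ps in prereqs.items() for p in ps)


def count_valid_orders(prereqs):
    """Brute-force count of teaching orders (linear extensions) consistent with prereqs."""
    return sum(respects_prerequisites(order, prereqs) for order in permutations(prereqs))


# graphlib returns one valid order; others may be equally valid.
order = list(TopologicalSorter(prerequisites).static_order())
print(order)
print(count_valid_orders(prerequisites))  # 4: "shapes" can slot in at any of 4 points
```

If the dependencies contain a cycle (one way the relation can fail to be a partial order), `static_order()` raises `graphlib.CycleError` rather than returning a sequence.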

Secondly, even once this freedom has been exploited by mapping the dependence structure of concepts into a total order, there still remains the question of mapping from that order into time. Again, there is flexibility: should this unit take one week, two weeks, or a whole term? Importantly, curricula need to consider who should exercise this flexibility, when and why. Within mathematics, this has come under a lot of scrutiny in recent years, through discussions around the various definitions of mastery.

Finally, one aspect that the box set analogy obscures is the extent to which the sequencing of lessons is to be co-created with the students. Simpsons and Game of Thrones writers don’t have the option to co-create sequencing on the fly with their audience – schools and universities do. To what extent should this freedom be utilised, by whom, when, and to what end?

Linking Universities with Schools and Secondaries with Primaries

Ashbee discusses the very interesting question of how school curricula can engage with the mechanisms for knowledge generation in the broader discipline. For example, experiments, peer review and art exhibitions all help reflect the norms of the discipline’s cutting edge back into the school curriculum. This is why I was sad, recently, to see Ofqual consulting on the removal of student-conducted experiments from 2021 GCSE science examinations, to give teachers time to cram in more facts: ‘science’ without experiment is not science.

Since one of the key venues for knowledge generation is the academy, increasing interaction between schools and universities should be very productive at the moment, given schools’ increased thinking about curriculum fundamentals. I am pleased to have played a very small part in the engagement of my university, both through my own outreach and through discussions leading up to our recent announcement of the opening of a new Imperial Maths School. More of all this, please, universities!

The theme of linking phases of education also appears in Andrew Percival’s case study of primary curriculum development, where he emphasises the benefit his primary school obtained through teachers joining subject associations, e.g. in Design and Technology, making links with secondary specialists, and introducing self-directed study time for primary teaching staff to develop their subject knowledge. Those of us in all sectors should seek out links through joint professional associations.

Christine Counsell’s essay tackles the topic of how school senior leadership teams (SLTs) should engage with developing and monitoring their departments under a knowledge-based curriculum. The main issue here is in the secondary sector, as SLTs will not include all subject specialisms. My colleague probably had this essay in mind when pointing out this edited volume, as many of the lessons and ideas here apply equally well to school governors engaging with subject leaders. I would agree with this. In fact, I would go further and say that many of Counsell’s suggestions for SLTs echo previous best practice in governance from before the new national curriculum. In meetings between subject leaders and school governors, governors have always played the role of knowledgeable outsider, whose aim is to guide a subject leader in conversation to reflect on their role in the development of the teaching of their subject and its norms. It’s quite interesting to see this convergence. I was also struck by Counsell’s insistence on the importance of discussing curriculum content with middle leaders rather than relying on proxies such as attainment results alone, which can actually act to conceal the curriculum; in my day job this mirrors the importance we try to give in staff appraisal to discussing the research discoveries of academic staff, rather than focusing on the number or venue of publications. I think many of the arguments made are transferable between schools and universities. Counsell also identifies positive and negative roles played by SLTs in developing a positive culture in middle leadership, and hence provides some useful material around which governance questions can be posed to probe SLTs themselves.

Award of GCSEs and A-Levels in 2020

Readers of my blog based in England may know that due to COVID-19, GCSEs (typically taken at age 16) and A-Levels (age 18) are not going ahead as exams this year. Yesterday, the Office for Qualifications and Examinations Regulation (Ofqual) published a consultation on the methods to be used to ensure fairness in the award of these important qualifications. I intend to respond to this consultation, which is only open for two weeks, and have produced a draft response below. Before I submit it, I would welcome any feedback. Equally, others should feel free to borrow from my response if it helps them.

To what extent do you agree or disagree that we should incorporate the requirement for exam boards to collect information from centres on centre assessment grades and their student rank order, in line with our published information document, into our exceptional regulatory requirements for this year?

Agree

To what extent do you agree or disagree that exam boards should only accept centre assessment grades and student rank orders from a centre when the Head of Centre or their nominated deputy has made a declaration as to their accuracy and integrity?

Strongly Agree

To what extent do you agree or disagree that Heads of Centre should not need to make a specific declaration in relation to Equalities Law?

Disagree

To what extent do you agree or disagree that students in year 10 and below who had been entered to complete exams this summer should be issued results on the same basis as students in year 11 and above?

Strongly Agree

To what extent do you agree or disagree that inappropriate disclosure of centre assessment judgements or rank order information should be investigated by exam boards as potential malpractice?

Neither Agree nor Disagree

1. While a separate Equalities Law declaration is not necessary, the Head of Centre should be able to declare that they have taken equality law into consideration as part of their declaration.
2. Ofqual should liaise with the National Governance Association and with teaching unions to provide guidance to governing bodies and staff on appropriate challenge and support to schools in order to ensure processes underlying Head of Centre declaration are appropriately evidenced.
3. While I understand and support the motivation for labelling inappropriate disclosure of centre assessments as malpractice, care must be taken and guidance given to centres over what is deemed “inappropriate”. I would not want to be in the situation where a teacher is unable to calm a student in a way they normally would, for example by telling them that “I can’t see any way you won’t get a Grade 7”. There may be an equalities implication for those students suffering from extreme anxiety, and this should be considered when drawing up guidance for centres.
4. While I accept that there is little time to provide detailed guidance for centres to follow when drawing up rank-order lists, the publication of examples of good practice may help centres, and I would recommend this is considered.

Issuing Results

To what extent do you agree or disagree that we should incorporate into the regulatory framework a requirement for all exam boards to issue results in the same way this summer, in accordance with the approach we will finalise after this consultation, and not by any other means?

Strongly Agree

Do you have any comments about our proposal for the issuing of results?

None

Impact on Students

To what extent do you agree or disagree that we should only allow exam boards to issue results for private candidates for whom a Head of Centre considers that centre assessment grades and a place in a rank order can properly be submitted?

Agree

To what extent do you agree or disagree that the arrangements we put in place to secure the issue of results this summer should extend to students in the rest of the UK?

Strongly agree

To what extent do you agree or disagree that the arrangements we put in place to secure the issue of results this summer should extend to all students, wherever they are taking the qualifications?

Neither agree nor disagree

Do you have any comments about the impact of our proposals on any particular groups of students?

1. Unfortunately, I see no other option than that proposed for private candidates. However, I am concerned that the definition of “properly” in the criterion given should be made much more explicit, and in objective terms, to heads of centres.
2. Before proceeding with treating centres outside the UK the same as those within it, I suggest legal advice is sought over the enforceability of arrangements within those centres, in particular over the implications of a breach of a head of centre’s declaration.
3. I am concerned over the impact of the proposed arrangements for some groups of students who may be differentially affected by the change in routine due to lockdown, e.g. those with Autistic Spectrum Conditions (ASC). In order to be as fair as possible to these students, I suggest that explicit guidance be given to centres emphasising that centres are free to disregard any dip in attainment since lockdown when coming up with their rank-order list, and again emphasising their duties under equalities legislation.

Statistical standardisation of centre assessment grades

To what extent do you agree or disagree with the aims outlined above?

Agree

To what extent do you agree or disagree that using an approach to statistical standardisation which emphasises historical evidence of centre performance given the prior attainment of students is likely to be fairest for all students?

Agree

To what extent do you agree or disagree that the trajectory of centres’ results should NOT be included in the statistical standardisation process?

Agree

To what extent do you agree or disagree that the individual rank orders provided by centres should NOT be modified to account for bias regarding different students according to their particular protected characteristics or their socio-economic backgrounds?

Agree

To what extent do you agree or disagree that we should incorporate the standardisation approach into our regulatory framework?

Agree

1. I am unclear from the consultation on whether standardisation is to occur on an exam-board basis or across exam boards. If it is on an exam-board basis, it is not clear what will happen when centres have changed exam board over the time window used to judge prior grade distribution at the school, especially if the change is for the first time this year.
2. I have several statistical concerns over the proposed methodology, given the level of detail discussed so far. In particular,
(i) there is a recognition that small centres or small cohorts will be difficult to deal with – this is a significant issue, and may be exacerbated depending on the definition of cohort (see #3, below), leading to significant statistical uncertainty;
(ii) it is hugely important to avoid 2020 results being affected by outlier results in previous years. One possibility is to use median results from the previous three years – I would avoid using mean results or a single year’s results.
Given these concerns, my view is that it would be more appropriate to award a “grade range” to students (e.g. “9-7”, which may of course include degenerate ranges such as just “7”). This would allow the statistical uncertainty arising from the various measures integrated into the standardisation algorithm to be explicitly quantified, and would provide a transparent per-pupil result. It would also allow universities and sixth forms to decide for themselves whether to admit a pupil optimistically, pessimistically or on the basis of the interval midpoint.
3. It is unclear from the consultation whether the past grade distributions used will be on a per-subject basis. If not, this is likely to violate proposed Aim 1 of the standardisation process. However, if so, this is likely to result in some very small cohorts for optional subjects at particular centres, so extreme statistical care must be taken in using these cohorts as the basis for grading in 2020. A possible solution is to produce grade ranges, as above.
4. From a statistical perspective, estimating grade distributions at a per-centre level (rather than, for example, estimating a mean grade) is fraught with danger and highly sensitive to cohort size. It is very important that you do not treat the empirical frequency distribution of grades in a centre over the last one, two or three years as the underlying probability distribution. Instead, treat it as a sample from that distribution, and use an appropriate statistical method to estimate the distribution from the sample. Such methods would also allow the incorporation of notions of variance, which could be factored into the “grade ranges” for students explained in #2. As an extreme example: if a centre had no Grade 6s last year, only 5s and 7s, we surely should not bias our model towards no Grade 6s this year.
5. There is an additional option for standardisation, not considered in the consultation document, which is less subject to the statistical problems of distribution estimation. You could extract just one or two parameters from your model (e.g. desired mean, desired standard deviation) and use these to normalise the distribution from each centre, rather than fit the complete distributions. Such aggregate statistics will be less susceptible to variation, especially for smaller cohorts.
6. I am unclear how it is possible to award grades to students at centres that have no historical outcomes and either no prior attainment data or prior attainment data covering only a statistically insignificant portion of the cohort. For these centres, some form of moderation, or reliance on Autumn term exam results, may be required.
7. I am concerned by the statement in the consultation that “we will evaluate the optimal span of historical centre outcomes (one, 2 or 3 years). We will select the approach that is likely to be the most accurate in standardising students’ grades.” There is no discussion of how “most accurate” can be judged; there is no data upon which to make this decision, so I would urge caution and an outlier-rejection strategy (see #2 above).
8. While I broadly agree that there is insufficient data upon which to base rank-order modification based on protected characteristics or socio-economic backgrounds, of the three approaches discussed in the consultation document, the “second approach” is currently very vague and needs further refinement before I can offer an opinion on it. I am happy to be contacted for further comment on this in the future.
9. I am concerned by the absence of a mechanism to flag unusual rank-order differences between subjects within a centre. It should be possible to identify pupils whose rankings differ unusually between two subjects, for further investigation by the exam boards. For example, a pupil might be ranked very high in Subject A and very low in Subject B, compared with the typical centile differences in rankings between these subjects. The sensitivity of such a test could be set at a level appropriate to the amount of staff time available for investigation.
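As an illustration of point 4, the sketch below treats a centre’s past grades as a sample. It estimates the underlying distribution by additive smoothing towards a prior, which is the posterior mean under a Dirichlet prior. This is only one possible method, chosen here for simplicity. The prior proportions, the prior strength `alpha` and the function names are all hypothetical. The posterior variance is one possible input to the “grade ranges” of point 2.

```python
from collections import Counter

GRADES = range(1, 10)  # GCSE grades 1 to 9


def smoothed_distribution(past_grades, prior, alpha):
    """Posterior-mean estimate of a centre's grade distribution.

    past_grades: grades awarded at the centre in previous years (a sample).
    prior: hypothetical dict of grade -> prior proportion, summing to 1.
    alpha: prior strength, measured in "pseudo-students" (a tuning assumption).
    """
    counts = Counter(past_grades)  # missing grades count as 0
    n = len(past_grades)
    return {g: (counts[g] + alpha * prior[g]) / (n + alpha) for g in GRADES}


def posterior_variance(past_grades, prior, alpha):
    """Variance of each grade's probability under the Dirichlet posterior."""
    counts = Counter(past_grades)
    a = {g: counts[g] + alpha * prior[g] for g in GRADES}
    a0 = sum(a.values())
    return {g: a[g] * (a0 - a[g]) / (a0 ** 2 * (a0 + 1)) for g in GRADES}


uniform = {g: 1 / 9 for g in GRADES}
est = smoothed_distribution([5, 5, 7, 7], uniform, alpha=9)
print(round(est[6], 3))  # 0.077: Grade 6 is not ruled out
```

With a uniform prior and `alpha = 9`, consider a centre whose four past students received two 5s and two 7s. It is still assigned a non-zero probability (1/13) of a Grade 6. The variance for each grade also shrinks as the cohort grows, which is the sensitivity to cohort size that point 4 warns about.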
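Point 5 can be sketched as follows. The sketch assumes, hypothetically, that each centre supplies a numerical score per student. It also assumes that a target mean and standard deviation for the centre have already been estimated from its history. The scores are rescaled linearly, which preserves the centre’s rank order, then rounded and clipped to the 1–9 scale. The function name and inputs are illustrative only.

```python
import statistics


def normalise_centre(scores, target_mean, target_sd, lo=1, hi=9):
    """Rescale a centre's provisional scores to a target mean and standard
    deviation, then round and clip to the grade scale [lo, hi].

    The linear rescaling preserves rank order; rounding may create ties.
    Python's round() sends exact .5 ties to the nearest even integer.
    """
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    if sd == 0:
        # All scores identical: place everyone at the target mean.
        return [min(hi, max(lo, round(target_mean)))] * len(scores)
    grades = []
    for s in scores:
        z = (s - mean) / sd  # standardised position within the centre
        grades.append(min(hi, max(lo, round(target_mean + z * target_sd))))
    return grades


print(normalise_centre([50, 60, 70], target_mean=5, target_sd=1))  # [4, 5, 6]
```

Only two numbers per centre are estimated here, rather than a full distribution. This is why such an approach should be less susceptible to small-cohort variation.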
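The check suggested in point 9 could look something like the following sketch. It compares each pupil’s centile position in two subjects’ rank orders, and flags pupils whose difference is unusually far from the centre’s typical difference. The z-score `threshold` plays the role of the sensitivity setting. The choice of a z-score test and all names here are illustrative assumptions, not a proposed specification.

```python
import statistics


def centiles(rank_order):
    """Map each pupil (listed best first) to a centile position; lower is better."""
    n = len(rank_order)
    return {pupil: 100 * (i + 1) / n for i, pupil in enumerate(rank_order)}


def flag_unusual(rank_a, rank_b, threshold=2.0):
    """Flag pupils whose centile gap between two subjects is far from typical.

    threshold: z-score sensitivity (hypothetical); lower values flag more pupils
    and so need more staff time to investigate.
    """
    ca, cb = centiles(rank_a), centiles(rank_b)
    both = [p for p in ca if p in cb]  # pupils taking both subjects
    gaps = {p: ca[p] - cb[p] for p in both}
    values = list(gaps.values())
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []  # every pupil has the same gap: nothing unusual
    return [p for p in both if abs(gaps[p] - mu) / sd > threshold]


pupils = [f"p{i}" for i in range(10)]
subject_a = pupils
subject_b = ["p9"] + pupils[1:9] + ["p0"]  # p0 and p9 swap places
print(flag_unusual(subject_a, subject_b))  # ['p0', 'p9']
```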

Appealing the results

To what extent do you agree or disagree that we should not provide for a review or appeals process premised on scrutiny of the professional judgements on which a centre’s assessment grades are determined?

Agree

To what extent do you agree or disagree that we should not provide for a student to challenge their position in a centre’s rank order?

Agree

To what extent do you agree or disagree that we should not provide for an appeal in respect of the process or procedure used by a centre?

Strongly disagree

To what extent do you agree or disagree that we should provide for a centre to appeal to an exam board on the grounds that the exam board used the wrong data when calculating a grade, and/or incorrectly allocated or communicated the grades calculated?

Strongly agree

To what extent do you agree or disagree that for results issued this summer, exam boards should only consider appeals submitted by centres and not those submitted by individual students?

Strongly disagree

To what extent do you agree or disagree that we should not require an exam board to ensure consent has been obtained from all students who might be affected by the outcome of an appeal before that appeal is considered?

Agree

To what extent do you agree or disagree that exam boards should not put down grades of other students as a result of an appeal submitted on behalf of another student?

Strongly agree

To what extent do you agree or disagree that exam boards should be permitted to ask persons who were involved in the calculation of results to be involved in the evaluation of appeals in relation to those results?

Disagree

To what extent do you agree or disagree that exam boards should be able to run a simplified appeals process?

Neither agree nor disagree

To what extent do you agree or disagree that we should not provide for appeals in respect of the operation or outcome of the statistical standardisation model?

Strongly agree

To what extent do you agree or disagree with our proposal to make the Exam Procedures Review Service (EPRS) available to centres for results issued this summer?

Strongly agree

1. I disagree with the absence of an appeal procedure against centre procedure. While recognising the difficulties faced by centres and the exceptional circumstances, there is an element of natural justice that must be maintained. Without such an appeal process, there is no safeguard against centres using completely inappropriate mechanisms to derive grades and rank orders, beyond the signed statement from the head of centre. The consultation suggests that detailed guidance will not be sent to centres on the procedures they should follow. Even so, it is reasonable to expect a centre, if challenged by a sufficient number of candidates, to explain the procedure it did follow, and for an appeal body to find this reasonable or unreasonable in the circumstances. The outcome of any successful appeal may have to be the cancelling of all grades in a certain subject at a certain centre, requiring a fall-back to the Autumn 2020 exams. However, the mere existence of such a mechanism may help focus centres on ensuring that justifiable procedures are in place.
2. The consultation document leaves open the question of what role staff of exam boards who were involved in the calculation of results would have in appeals. It appears proper for them to be involved in providing evidence to an independent appeals committee, but not to form such a committee.

An Autumn exam series

To what extent do you agree or disagree that entries to the autumn series should be limited to those who were entered for the summer series, or those who the exam board believes have made a compelling case about their intention to have entered for the summer series (as well as to students who would normally be permitted to take GCSEs in English language and mathematics in November)?

Agree

To which qualifications the emergency regulations will apply

To what extent do you agree or disagree that we should apply the same provisions as GCSE, AS and A level qualifications to all Extended Project Qualifications and to the Advanced Extension Award qualification?

Strongly agree

Do you have any comments about the qualifications to which the exceptional regulatory measures will apply?

None

Building the arrangements into our regulatory framework

To what extent do you agree or disagree that we should confirm that exam boards will not be permitted to offer opportunities for students to take exams in May and June 2020?

Disagree

To what extent do you agree or disagree with our proposals that exam boards will not be permitted to offer exams for the AEA qualification or to moderate Extended Project Qualifications this summer?

Disagree

Do you have any comments about our proposals for building our arrangements into our regulatory framework?

I have sympathy with the proposals in this section, but they need to be balanced against two things. The first is the harm done to those candidates who will be unable to use centre-based assessments. The second is Ofqual’s duties under equalities legislation, given that this may disproportionately affect disabled students (see pp. 51–52 of the consultation document). On balance, it may be better to leave it to exam boards and exam centres, as an operational decision, whether to run exams in May and June, where possible, for these students only.

Equality impact assessment

Are there other potential equality impacts that we have not explored? What are they?

As previously noted, I am concerned over the impact of the proposed arrangements for some groups of students who may be differentially affected by the change in routine due to lockdown, e.g. those with Autistic Spectrum Conditions (ASC). In order to be as fair as possible to these students, I suggest that explicit guidance be given to centres emphasising that centres are free to disregard any dip in attainment since lockdown when coming up with their rank-order list, and again emphasising their duties under equalities legislation.

We would welcome your views on how any potential negative impacts on particular groups of students could be mitigated:

If Ofqual were to adopt the “grade range” approach outlined above, then the quoted research into the reliability of predicted GCSE, AS and A-level grades prior to 2015 could be used to inform the degree of uncertainty in the range, mitigating the impact on particular groups of students.

The Growth Mindset

Over the last 5-10 years, the Growth Mindset has become a very popular feature of many schools across England. I have seen it implemented in a couple of schools, and I’m also aware that its initiator, Carol Dweck, gave an interview a couple of years ago where she criticised some implementations as “false growth mindset”.

In order to learn a bit more about the original research conducted by Dweck, I decided over the holiday to read her early book, ‘Self-theories: Their role in motivation, personality, and development’, Psychology Press, 1999. I have no background in psychology and a very limited background in educational theory, but I still want to know how much I can get from this as a parent, as an educator, and as a member of a school board.

As notes to myself, and for others who may be interested, I’m reporting the main take-away messages I got from the book in this post. I do not question the validity of any claims – I am not knowledgeable enough to do so – and I’m also very conscious that I have not had time to follow up the references to read the primary research literature. Instead, I cite below the chapters of the book in which the references can be found, should blog readers be interested in following up more deeply.

Two Theories of Intelligence

Dweck defines the seeking of challenge, the value of effort, and persistence in the face of obstacles as ‘mastery-oriented approaches’. She aims to knock down several ‘commonly held’ beliefs about what fosters such approaches: they are not more common in students with high ability, they are not necessarily improved by success in tasks, they are not improved by praise of students’ intelligence, and they are not even typically associated with students who have a high confidence in their intelligence. So what are the best approaches to fostering such qualities?

Dweck contrasts two theories of intelligence, which I’ve heard referred to in schools as “the fixed mindset” and “the growth mindset”. In the original research in this book, she refers to these as “The Theory of Fixed Intelligence” / “The Entity Theory” and “The Theory of Malleable Intelligence” / “The Incremental Theory”. In an experimental setting, failure is reported to motivate some students and demotivate others, in an apparently fairly bimodal distribution (Chapter 2).

To my mind, what’s missing from this discussion is a shared understanding of what intelligence actually is (Dweck picks this up much later, in Chapter 9, on IQ tests). Intelligence, to me, describes the ability to learn and think, which seems to be a qualitative rather than a quantitative property. We could, of course, talk about speed or depth or some other quantification. I’m aware that there’s a huge volume of work on this topic, about which I know little (any pointers to good books on this?). A principled definition of intelligence seems relevant. I think nobody would say that a person’s knowledge is fixed, but there is clearly a difference of opinion over the ability to gain such knowledge and skills. Do people differ solely in the rate of development of knowledge / skills, or in the maximum level of knowledge / skills, or in something else? And if there are such limits on the rate of change today for Person X, will those limits be different in the future for the same person? If the rate of change can change, can the rate of change of the rate of change change? And so on, ad infinitum. And should we even care? Chapter 9 discusses pupils’ own views. Dweck suggests that entity theorists associate intelligence with inherent capacity or potential, while incremental theorists associate intelligence with knowledge, skills and effort. This actually surprised me: the perspective of the incremental theorists seems to make the very concept of intelligence (as distinct from knowledge, skills and effort) superfluous. But it also seems somewhat inconsistent, because in Chapter 11 we learn that incremental theorists tend not to judge their classmates’ intelligence based on their performance in school. Perhaps the incremental theorists just have a hazier conception of intelligence in the first place?

What’s clear is that Dweck has no truck with those claiming that Growth Mindset means that “everyone can be an Einstein if you put in the effort” – it’s just that she strongly argues that potential cannot be readily measured based on current attainment – that there may well be undiscovered Einsteins in bottom set classes. These are not the same thing at all.

The Impact of Theories of Intelligence

Dweck then goes on to show that students’ theories of intelligence affect their choice of goals. Given the option, students holding the entity theory are more likely to choose performance goals. She shows this link to be causal, using experiments designed to alter students’ theories of intelligence temporarily.

Dweck shows that the goals given to students affect whether they react with a “helpless” or a “mastery” response, even for the same task. Students given a “performance goal” are much more likely to produce a helpless response than those given a “learning goal”. Performance goals are fairly ubiquitous in the English education system, in the form of individual target grades shared with pupils. I wonder whether her observation carries over to this setting.

Dweck argues that pupils holding an entity model can sabotage their own attainment by withholding effort (Chapter 6). If they do poorly, they can blame their own lack of effort, whereas if they do well, they feel validated in their innate intelligence.

In Chapter 12, Dweck discusses pupils’ belief in their potential to change and improve, and the impact of intelligence models on this belief. The results are unsurprising. I’m more interested in similar beliefs held by teaching staff, and whether and how they affect their practice (does anyone know of any studies on this topic?)

One area where I found the book less precise is whether students can simultaneously be “more of an entity-theorist” in some subjects and “more of an incremental-theorist” in others. Often this was dealt with as if these were universal theories, but my limited experience suggests that students may, for example, hold largely incremental theories in sport while largely entity theories in maths. (Again, anyone know of studies on this topic?)

Changing Theories of Intelligence

So how do we change mindsets? One method Dweck refers to throughout is to actually teach pupils about theories of intelligence. Another is to focus on the type of praise given. To emphasise an incremental model, praise the successful strategies pupils have used on tasks they’ve clearly found challenging. Quick correct answers should be met with apologies for wasting their time, and with more appropriate and challenging problems. This is subtly different advice from “praising only effort”, an approach I’ve seen some schools adopt when trying to apply the growth mindset. The best approach seems to be to ensure that the level of challenge is appropriate for each pupil, aligning effort with outcome. Unfortunately, many primary schools in England are currently moving in exactly the opposite direction (see my blog post here). I do wonder what impact this is likely to have on the mindset of high-attaining pupils in the English education system.

In Chapter 15, Dweck looks at the kind of criticism and praise that reinforces these differing views. Criticism suggesting alternatives, e.g. “You’ve not quite done that completely. Maybe you should think of another way,” caused a reinforcement of incremental theories, whereas criticisms of the individual, e.g. “I’m disappointed in you”, tended to emphasise entity theories. More strikingly, Dweck argues strongly that positive praise targeted at inherent traits, e.g. “you’re smart!”, “you’re very good at this” or “I’m proud of you” can reinforce the entity theory, whereas praise such as “you’ve found a great way to do that – can you think of any other ways?” reinforces the incremental theory. While the former type of praise is definitely well received, and gives a temporary boost, Dweck argues that it sets pupils up for failure when they encounter difficulties and draw the inverse conclusion – “if I’ve not been successful, then I’m not smart, and you’re not proud of me”.

Finally, the question of changing mindsets only arises once mindsets are embedded. Dweck spends some space (Chapter 14) arguing that the helpless/mastery dichotomy in responses is already present in 3.5-year-olds. In their case, she associates it with a ‘theory of badness’ held by the children, rather than a ‘theory of intelligence’. Mindset therefore seems to be an issue for all phases of education.

Conclusions

Praise and Criticism. Students receive criticism and praise throughout their learning journey, and training staff to change their verbal feedback is one thing to look at. However, it strikes me that one formalised arena for feedback, shared across parents, children and teachers, is the written “report home”. I suspect it would be relatively easy to survey these reports for the type of language used, and compare this against the evidence Dweck presents on forms of criticism and praise. I’d be very interested to hear from any schools that have tried to survey or manage report language to align it with growth mindset principles. This also extends to grades. Following Dweck’s results in Chapter 16 on “process praise”, it would seem far better to send home a report saying “worked on some great methods for X” rather than “Grade B”. Similarly, “could try alternative strategies for staying focussed” would seem better than “Grade C”.

Elective Remedial (Catch-up) Classes. Another interesting implication for schools and universities alike concerns elective remedial classes. Several of Dweck’s studies seem to show that, among pupils who hold an entity theory of intelligence, it is precisely those who don’t need remedial classes who are happy to attend them. Institutions should think about how to get around this problem.

School Transitions. There are implications for managing the transition from primary to secondary school, revealed by Dweck’s study of grade school to junior-high transition in the US; perhaps secondaries – jointly with primaries, even – could explicitly teach about theories of intelligence as part of the induction process, like the study at UC Berkeley reported in Chapter 5. I wonder whether any secondaries have tried this?

Mental Health. Mental health in educational settings is a hot topic at the moment. Given Dweck’s theories about self-esteem and its link to mindset, can recent work of schools and universities on mental health be improved by engaging with these ideas? For example, can mental health issues be avoided by trying to foster a growth mindset, and has any significant evidence been collected in this regard?

Grouping by attainment. I have seen many discussions of Growth Mindset that have suggested that grouping pupils by attainment runs counter to the principles outlined here. But interestingly, this is not what Dweck says (Chapter 17). She says that within the entity framework, this might be true, but attainment grouping within the incremental framework is not inherently problematic – it’s just an acknowledgement of fact. I would note that such groups are often referred to in education as “ability groups” rather than “attainment groups” – perhaps reflective of the entity theory. This issue potentially becomes even more acute when considering streaming and/or selective entry testing.

Gifted and Talented Programmes. There appear to be several implications for gifted and talented programmes (G&T) in schools (Dweck deals explicitly with this in Chapter 16, but does not draw out all the conclusions). Firstly, and essentially, we need to ensure all students are challenged, or they will not experience difficulty and effort; at the high-attaining end, this may or may not come from a G&T programme, depending on the pupil and the school approach to differentiation, but it cannot be absent. Secondly, perhaps the name G&T is problematic – Dweck herself says that “the term ‘gifted’ conjures up an entity theory,” and it’s not hard to imagine children in G&T programmes worrying more about losing G&T status than improving their knowledge and skills.

Teacher Mindsets. Although it would seem natural for teachers to have an incremental theory / growth mindset, my observations suggest this is not always the case. I wonder whether any schools have undertaken studies of their own teaching staff in this regard – this could be very interesting.

Beyond Intelligence

Chapter 10 shows that very similar observations apply to personal and social relationships, and Chapter 13 argues that theories of intelligence are also closely associated with the formation of stereotypes. Chapter 17 describes a link with self-esteem. It suggests that parents and teachers alike can model feeling good about effortful tasks, as a route to self-esteem within the incremental model. Dweck also reports that entity models are correlated with depression and anxiety (Chapter 7).

Overall, this book has given me plenty to think about as a parent, and a fair bit to think about as an educator too. I’d be really interested in hearing people’s suggestions for more reading on the topics above, especially if any of the studies I suggest above have already been done in the psychology or education literature.

Readers who enjoyed this post might be interested in my other educational posts.

Teaching Curriculum Beyond Year Group

I have lost track of the number of times that I’ve been told by parents of primary-age children in England that schools are claiming that they are “not allowed” to teach content beyond that set out for the child’s year group in the English National Curriculum, ever since the curriculum reforms in 2014.

This myth seems to be so embedded that I have heard it myself from numerous headteachers and teaching staff.

Rather than explaining the actual situation afresh each time I am asked, I have set it out in this brief explanatory blog post. I hope people find it helpful.

Firstly, different schools will have different policies. It may be school policy to do / not to do something with the curriculum, but this is determined by the school alone, acting in line with the statutory framework. For academies, the statutory framework is typically minimal. Maintained schools must follow the statutory National Curriculum, and – in practice – every academy I’ve come across also abides by these regulations.

Presumably, the myth started because the National Curriculum Programmes of Study for Maths and English are set out as expectations by year group. However, the programmes very clearly state:

“Within each key stage, schools therefore have the flexibility to introduce content earlier or later than set out in the programme of study.”

“schools can introduce key stage content during an earlier key stage, if appropriate.”

(see Section “School Curriculum” in either the Maths or the English Programme of Study.)

This must be read in the context of the broader thrust of the programmes, which state:

The expectation is that the majority of pupils will move through the programmes of study at broadly the same pace. However, decisions about when to progress should always be based on the security of pupils’ understanding and their readiness to progress to the next stage. Pupils who grasp concepts rapidly should be challenged through being offered rich and sophisticated problems before any acceleration through new content. Those who are not sufficiently fluent with earlier material should consolidate their understanding, including through additional practice, before moving on.

So, put simply, schools can certainly teach children content above their year group. But only if they’re ready for it. Common sense, really.

If you really want to know more about my views on education, then please click on the “Education” link on this blog post to find related posts.

Structures in Arithmetic Teaching Tools

Readers of this blog will know that beyond my “day job”, I am interested in early mathematics education. Partly due to my outreach work with primary schools, I became aware of several tools that are used by primary (elementary) school teachers to help children grasp the structures present in arithmetic. The first of these, Cuisenaire Rods, have a long history and have recently come back in vogue in education. They consist of coloured plastic or wooden rods that can be physically manipulated by children. The second, usually known in this country as the Singapore Bar Model, is a form of drawing used to illustrate and guide the solution to “word problems”, including basic algebra. Through many discussions with my colleague, Charlotte Neale, I have come to appreciate the role these tools – and many other physical pieces of equipment, known in the education world as manipulatives – can play in helping children get to grips with arithmetic.

Cuisenaire and Bar Models have intrigued me, and I spent a considerable portion of my Easter holiday trying to nail down exactly what arithmetic formulae correspond to the juxtaposition of these concrete and pictorial representations. After many discussions with Charlotte, I’m pleased to say that we will be presenting our findings at the BSRLM Summer Conference on the 9th June in Swansea. Presenting at an education conference is a first for me, so I’m rather excited, and very much looking forward to finding out how the work is received.

In this post, I’ll give a brief overview of the main features of the approach we’ve taken, from my (non-educationalist!) perspective.

Firstly, to enable a formal study of these structures, we needed to formally define how such rods and diagrams are composed.

Cuisenaire Rods

These rods come in all multiples up to 10 of a single unit length, and are colour coded. To keep things simple, we’ve focused only on horizontal composition of rods (interpreted as addition) to form terms, as shown in an example below.

In early primary school, the main relationships being explored relating to horizontal composition are equality and inequality. For example, the figure below shows that black > red + purple, because of the overhanging top-right edge.

With this in mind, we can interpret any such sentence in Cuisenaire rods as an equivalent sentence in (first order) arithmetic. After having done so, we can easily prove mathematically that all such sentences are true. Expressibility and truth coincide for this Cuisenaire syntax! Note that this is very different to the usual abstract syntax for expressing number facts: although 4 = 2 + 1 is false, we can still write it down. This is one reason – we believe – they are so heavily used in early years education: truths are built through play. We only need to know syntactic rules for composition and we can make many interesting number sentences.
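Here is a minimal sketch of this interpretation. It assumes the conventional colour-to-length assignment; sets differ in naming, and 4 is sometimes called pink. Each row of rods is read as a sum. The relation symbol is not chosen by the child but read off from where the right-hand edges fall. Because the symbol comes from the physical arrangement, every sentence produced this way is true, which is the coincidence of expressibility and truth described above. The function names are hypothetical.

```python
# Conventional Cuisenaire colours (naming varies between sets).
ROD_LENGTH = {
    "white": 1, "red": 2, "light green": 3, "purple": 4, "yellow": 5,
    "dark green": 6, "black": 7, "brown": 8, "blue": 9, "orange": 10,
}


def row_length(row):
    """Horizontal composition of rods, interpreted as addition."""
    return sum(ROD_LENGTH[colour] for colour in row)


def read_off(top, bottom):
    """Place two rows with left edges aligned and read off the relation
    shown by their right-hand edges, as an arithmetic sentence."""
    a, b = row_length(top), row_length(bottom)
    symbol = "=" if a == b else (">" if a > b else "<")
    return f"{' + '.join(top)} {symbol} {' + '.join(bottom)}"


print(read_off(["black"], ["red", "purple"]))  # black > red + purple
```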

From an abstract algebraic perspective, closure and associativity of composition naturally arise, and so long as children are comfortable with conservation of length under translation, commutativity is also apparent. Additive inverses and identity are not so naturally expressed, resulting in an Abelian semigroup structure, which also carries over to our next tool, the bar model.

Bar Models

Our investigations suggest that bar models (an example for $20 = x + 2$ is pictured below) are rarely precisely defined in the literature, so one of our tasks was to come up with a precise definition of bar model syntax.

We have observed that there seem to be a variety of practices here. The most obvious one, for small numbers drawn on squared paper, is to retain the proportionality of Cuisenaire. These ‘proportional bar models’ (our term) of course inherit the same expressibility / truth relationship as Cuisenaire structures, but now numerals can exceed 10, at the cost of decimal numeration being a prerequisite for their use. However, proportionality precludes the presence of ‘unknowns’ (variables). Yet problems involving unknowns are precisely where bar models are most heavily used, in the later stages of primary school and in some secondary schools.

At the other extreme, we could remove the semantic content of bar length, leaving only abutment and the alignment of the right-hand edges as denoting meaning. We refer to this type of bar model as a ‘topological bar model’. These are very expressive: they correspond to Presburger arithmetic without induction. It now becomes possible to express false statements (e.g. the trivial one below, stating that 1 = 2).
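To illustrate the shift, here is a sketch in which a topological bar model is reduced to its two rows of labels. Each label is a numeral or a variable, and aligned right-hand edges are read as equality of sums. Variables are read as implicitly existentially quantified, so a model holds if some assignment of positive integers makes the two sums equal. Restricting bars to positive lengths is an assumption made here. The search is capped by an assumed `bound`, so this is an illustration rather than a decision procedure for Presburger arithmetic. The names are hypothetical.

```python
from itertools import product


def holds(top, bottom, bound=50):
    """Does some positive-integer assignment to the variables make the rows equal?

    top, bottom: lists of bar labels, each either an int (numeral) or a str
    (variable). Aligned right-hand edges are read as sum(top) == sum(bottom);
    variables are existentially quantified and searched only up to `bound`.
    """
    variables = sorted({b for b in top + bottom if isinstance(b, str)})

    def total(row, env):
        return sum(env[b] if isinstance(b, str) else b for b in row)

    # product(..., repeat=0) yields a single empty tuple, so variable-free
    # models are evaluated exactly once.
    for values in product(range(1, bound + 1), repeat=len(variables)):
        env = dict(zip(variables, values))
        if total(top, env) == total(bottom, env):
            return True
    return False


print(holds([20], ["x", 2]))  # True (x = 18)
print(holds([1], [2]))        # False: expressible, but not true
```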

As a result, we must be mathematically precise about valid rules of inference and axiom schemata for this type of model, for example the rule of inference below. Note that due to the inexpressibility of implication in the bar model, many more rules of inference are required than in a standard first-order treatment of arithmetic.

The topological bar model also opens the door to many different mistakes, arising when children apply geometric insight to a topological structure.

In practice, it seems that teachers in the classroom informally use a kind of mid-way point between these two syntaxes, which we call an ‘order-preserving’ bar model: the aim is for relative sizes of values to be represented, so that larger bars are interpreted as larger numbers. However, this approach is not compositional. The resulting issues can be seen when trying to model, for example, $x + y = 3$. The positive integral solutions are either $x = 2, y = 1$, giving $x > y$, or $x = 1, y = 2$, giving $y > x$. Any drawing that commits to the relative lengths of the $x$ and $y$ bars therefore excludes one of the two solutions.
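The $x + y = 3$ example can be checked exhaustively in a few lines. This Python sketch lists the positive integral solutions and the relative order that an order-preserving drawing would have to commit to for each. The helper name `relative_order` is ours.

```python
from itertools import product

# All positive integral solutions of x + y = 3.
solutions = [(x, y) for x, y in product(range(1, 4), repeat=2) if x + y == 3]
print(solutions)  # [(1, 2), (2, 1)]


def relative_order(x, y):
    """The relation an order-preserving drawing of bars x and y would commit to."""
    return "x < y" if x < y else "x > y" if x > y else "x = y"


# The two solutions demand incompatible drawings.
print(sorted({relative_order(x, y) for x, y in solutions}))  # ['x < y', 'x > y']
```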

Other Graphical Tools and Manipulatives

As part of our work, we identify certain elements of first-order arithmetic that are missing from the tools studied to date. It would be great if further work could consider drawings and manipulatives that could help plug these gaps. They include:

• Multiplication in bar models. While we can understand $3x$, for example, as a shorthand for $x + x + x$, there is no way to express $x^2$.
• Disjunction and negation. While placing two bar models side-by-side seems like a natural way of expressing conjunction, there is no natural way of expressing disjunction / negation. Perhaps a variation on Peirce’s notation could be of interest?
• We can consider variables in a bar model as implicitly existentially quantified. There is no way of expressing universal quantification.
• As noted above, these tools capture an Abelian semigroup structure. We’re aware of some manipulatives, such as Algebra Tiles, which aim to also capture additive inverses, though we’ve not explored these in any depth.
• We have only discussed one use of Cuisenaire rods. There are many others, as the recent ATM book by Ollerton, Williams and Gregg makes clear, and we feel that many of these could also benefit from analysis using our approach.
• There are also many manipulatives other than Cuisenaire rods, as Griffiths, Back and Gifford describe in detail in their book, and it would be of great interest to compare and contrast these from a formal perspective.
• At this stage, we have avoided introducing a monus into our algebra of bar models, but this is a natural next step when considering the algebraic structure of so-called comparative bar models.
• My colleague Dan Ghica alerted me to the computer game DragonBox Algebra 5+, which we can consider as a sophisticated form of virtual manipulative incorporating rules of inference. It would be very interesting to study similar virtual manipulatives in a classroom setting.
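For readers unfamiliar with the term, a monus is truncated subtraction on the natural numbers, $a \mathbin{\dot{-}} b = \max(a - b, 0)$. It keeps results inside the naturals without introducing additive inverses. The minimal Python sketch below (function name ours) illustrates the operation and one of its characteristic laws. How it would best be drawn in a comparative bar model remains the open question raised above.

```python
def monus(a, b):
    """Truncated subtraction on the naturals: a - b when a >= b, and 0 otherwise."""
    if a < 0 or b < 0:
        raise ValueError("monus is defined on natural numbers")
    return max(a - b, 0)


print(monus(7, 3), monus(3, 7))  # 4 0

# A characteristic law: (a monus b) + b is the larger of a and b.
print(all(monus(a, b) + b == max(a, b) for a in range(20) for b in range(20)))  # True
```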

An Exciting Starting Point

Charlotte and I hope that attendees at the BSRLM conference – and readers of this blog – are as excited as we are about the potential of using the tools of mathematical logic and abstract algebra to understand more about the early learning of arithmetic. We hope our work will encourage others to work with us to develop and broaden this research further.

Acknowledgement

I would like to acknowledge Dan Ghica for reading this blog post from a semanticist’s perspective before it went up, for reminding me about DragonBox, and for pointing out food for further thought. Any errors remain mine.

Primary Assessment

Readers of this blog will know that I have been critical of the Government’s assessment system for the new National Curriculum in England [1,2,3]. I therefore cautiously welcome the Secretary of State’s recently launched consultation on the future of primary assessment, especially since it seems to follow well from the NAHT’s report on the topic.

What is Statutory Assessment for?

The consultation document states the aim of statutory assessment as follows:

Statutory assessment at primary school is about measuring school performance, holding schools to account for the work they do with their pupils and identifying where pupils require more support, so that this can be provided. Primary assessment should not be about putting pressure on children.

Firstly, let me lay my cards on the table: I do think that school “performance” deserves to be measured. My experiences with various schools strongly suggest that there are under-performing schools in need of additional support to develop their educational practice. There is a subtle but telling difference between these perspectives: my own emphasises support, while the Government’s emphasises accountability. While some notions of accountability in schools appear uncontroversial, the term has recently become associated with high-stakes educational disruption rather than with improving outcomes for our children. We can certainly agree that primary assessment should not be about putting pressure on children; unfortunately, I don’t believe that the consultation proposals seriously address this question.

Consultation Questions

In this section, I focus on the questions in the Government’s consultation on which I have a strong opinion; these are by no means the only important questions.

Q2. The EYFSP currently provides an assessment as to whether a child is ‘emerging, expecting [sic] or exceeding’ the level of development in each ELG. Is this categorisation the right approach? Is it the right approach for children with SEND?

Clearly the answer here depends primarily on the use of these data. If the aim is to answer questions like “how well-aligned – on average – are children with the age-related expectations of the early-years curriculum at this school?” then this assessment scheme is perfectly reasonable. Because it is not about individual pupils, it does not need to be tuned for children with SEND, who may have unusual profiles, nor indeed for high-attaining children who may be accessing later years of the national curriculum during their reception year. But if it is about understanding an individual learning profile, for example in order to judge the progress a pupil makes later in the school, then any emerging / expected / exceeding judgement seems far too coarse. It groups together children who are “nearly expected” with those well below, and children who are “just above expected” with those working in line with the national curriculum objectives for halfway up the primary school, or beyond.

Q3. What steps could we take to reduce the workload and time burden on those involved in administering the EYFSP?

Teacher workload is clearly a key issue. But if we are talking seriously about how to control the additional workload placed on teachers by statutory assessment, then this is an indication that our education system is in the wrong place: there should always be next to no additional workload! Assessment should be about driving learning – if it’s not doing that, it shouldn’t be happening; if it is doing that, then it should be happening anyway! So the key question we should be answering is: why has the statutory framework drifted so far from the need to support pupils’ learning, and how can we fix this?

Q5. Any form of progress measure requires a starting point. Do you agree that it is best to move to a baseline assessment in reception to cover the time a child is in primary school (reception to key stage 2)? If you agree, then please tell us what you think the key characteristics of a baseline assessment in reception should be. If you do not agree, then please explain why.

[… but earlier …]

For the data to be considered robust as a baseline for a progress measure, the assessment needs to be a reliable indicator of pupils’ attainment and strongly correlate with their attainment in statutory key stage 2 assessments in English reading, writing and mathematics.

I agree wholeheartedly with this statement of the requirements for a solid baseline for a progress measure. And yet we are being offered the possibility of baselines based on the start of EYFS. There are no existing data on whether any such assessment strongly correlates with KS2 results (and there are good reasons to doubt it). If the Government intends to move the progress baseline from KS1 down the school, then a good starting point for analysis would be the end of EYFS: we should already have data on this, albeit from the previous (points-based) EYFS profile. So how well do end-of-EYFS outcomes correlate with KS2 results? Any earlier baseline is likely to correlate less well, so this would at least provide us with a bound on the quality of any such metric. Why have these data not been presented?

It would, in my view, be unacceptable to even propose to shift the baseline assessment point earlier without having collected the data for long enough to understand how on-entry assessment correlates with KS2 results, i.e. no change should be proposed for another 6 years or so, even if statutory baseline assessments are introduced now. Otherwise we run the risk of meaningless progress metrics, with confidence intervals so wide that no rigorous statistical interpretation is possible.
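A rough back-of-envelope sketch shows why a weak correlation matters. It assumes (our simplification, not the DfE’s published methodology) that a school’s progress score is the mean, over its $n$ pupils, of independent residuals from a linear regression of KS2 attainment on the baseline. Write $r$ for the correlation between the baseline and KS2 scores and $\sigma$ for the standard deviation of KS2 scores. The residual standard deviation is then $\sigma\sqrt{1 - r^2}$, and an approximate 95% confidence interval for the school’s score has half-width $1.96\,\sigma\sqrt{1 - r^2}/\sqrt{n}$. The function name and the input values below are hypothetical.

```python
import math


def progress_ci_half_width(r, n_pupils, sd_ks2=1.0, z=1.96):
    """Approximate 95% CI half-width of a school's mean-residual progress score.

    r: correlation between baseline and KS2 scores (a hypothetical input here)
    n_pupils: cohort size
    sd_ks2: standard deviation of KS2 scores (1.0 expresses the result in SDs)
    """
    if not -1.0 <= r <= 1.0 or n_pupils < 1:
        raise ValueError("need -1 <= r <= 1 and at least one pupil")
    residual_sd = sd_ks2 * math.sqrt(1.0 - r * r)
    return z * residual_sd / math.sqrt(n_pupils)


# Hypothetical inputs only: a one-form-entry cohort of 30 pupils.
for r in (0.3, 0.5, 0.7):
    print(r, round(progress_ci_half_width(r, 30), 2))
```

Under these assumptions the interval shrinks only in proportion to $\sqrt{1 - r^2}$ and $1/\sqrt{n}$. The real value of $r$ for any proposed baseline is exactly the data that needs to be collected before a change is made.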

Q9. If a baseline assessment is introduced in reception, in the longer term, would you favour removing the statutory requirement for all-through primary schools to administer assessments at the end of key stage 1?

The language is telling here: “to administer assessments.” If this were phrased as “to administer tests,” then I would be very happy to say “yes!” But teachers should be assessing – not examining – pupils all the time, in one form or another, because assessment is a fundamental part of learning. So the real question is the form these assessments should take, and how often they should be passed up beyond the school for national comparison. Here the issue is more about the framework of support in which a school finds itself. Suppose a school is “left to its own devices” with no local authority or other support for years (a common predicament at the moment, with the Government’s abolition of the Education Services Grant!). Then it may well be too long to wait six-and-a-half years before finding out that the school is seriously under-performing. Yet if the school exists within a network of supportive professionals from other schools and local authorities who have the time and resources to dig deeply into the school’s internal assessment schemes during the intervening years, these disasters should never happen. A prerequisite for a good education system is to resource it appropriately!

Q11. Do you think that the department should remove the statutory obligation to carry out teacher assessment in English reading and mathematics at key stage 2, when only test data is used in performance measures?

I think this is the wrong way round. Schools should only be required to report teacher assessment (and it should be “best fit”, not “secure fit”); tests at Key Stage 2 should be abolished. This would be fully consistent with high-quality, professional-led, moderated assessment, and would address the very real stress placed on both children and teachers by high-stakes testing schemes. Remember that the consultation document itself states “Primary assessment should not be about putting pressure on children.”

Q14. How can we ensure that the multiplication tables check is implemented in a way that balances burdens on schools with benefit to pupils?

By not having one. This is yet another situation where a tiny sliver of a curriculum (in this case tedious rote learning of multiplication tables) is picked out and elevated above other equally important elements of the curriculum. Boaler has plenty to say on this topic.

Q15. Are there additional ways, in the context of the proposed statutory assessments, that the administration of statutory assessments in primary schools could be improved to reduce burdens?

The best way to reduce the burden on schools seems to be to more closely align formative and summative assessment processes. However, schools have been explicitly encouraged to “do their own thing” when it comes to formative assessment processes. The best way the Government could help here is by commissioning an expert panel to help learn from the best of these experiments, combining what has been learnt with the best international educational research on the topic, and re-introducing a harmonised form of national in-school assessment in the primary sector.

Best Fit or Secure Fit?

The consultation appears to repeat the Government’s support for the “secure fit” approach to assessment. The document states:

The interim teacher assessment frameworks were designed to assess whether pupils have a firm grounding in the national curriculum by requiring teachers to demonstrate that pupils can meet every ‘pupil can’ statement. This approach aims to achieve greater consistency in the judgements made by teachers and to avoid pupils moving on in their education with significant and limiting gaps in their knowledge and skills, a problem identified under the previous system of national curriculum levels.

The key word here is every. This approach has been one of the key differentiators from the previous national curriculum assessment approach. I have argued before against this approach, and I stand by that argument; moreover, there are good statistical arguments that the claim to greater consistency is questionable. We are currently in the profoundly odd situation where teacher assessments are made by this “secure fit” approach, while tests are more attuned to a “best fit” approach, referred to as “compensatory” in previous DfE missives on this topic.

However, the consultation then goes on to suggest a move back to “best fit” for writing assessments. Teacher assessment would no longer be required except in English writing, and KS2 maths and reading would rely on testing alone. I therefore expect this to be a “victory for both sides” fudge: secure fit remains in theory, but is not used in any assessment that feeds into the school “accountability framework”.
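The contrast between the two approaches can be sketched in a few lines of Python. “Secure fit” requires every ‘pupil can’ statement to be met. Real “best fit” is a holistic professional judgement, so the fixed proportion threshold used here is only a hypothetical stand-in for the compensatory idea that strengths may offset gaps. Function names and the 80% figure are ours. One consequence is that, under secure fit, a single borderline judgement about one statement can flip the overall outcome.

```python
def secure_fit(statements_met):
    """'Secure fit': the standard is awarded only if every 'pupil can' statement is met."""
    return all(statements_met)


def best_fit(statements_met, threshold=0.8):
    """A crude proxy for 'best fit': strengths may compensate for gaps.

    The 80% threshold is a hypothetical stand-in for professional judgement.
    """
    return sum(statements_met) / len(statements_met) >= threshold


# A hypothetical pupil meeting 9 of 10 statements.
pupil = [True] * 9 + [False]
print(secure_fit(pupil), best_fit(pupil))  # False True
```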

High Learning Potential

The consultation notes that plans for the assessment of children working below expectation in the national curriculum are considered separately, following the result of the Rochford Review. It is sad, though not unexpected, that once again no particular mention is given to the assessment of children working well above the expectation of the national curriculum. This group of high attaining children has become invisible to statutory assessment, which bodes ill for English education. In my view, any statutory assessment scheme must find ways to avoid capping attainment metrics. This discussion is completely absent from the consultation document.

Arithmetic or Mathematics?

Finally, it is remarkable that the consultation document – perhaps flippantly – describes the national curriculum as having been reformed “to give every child the best chance to master reading, writing and arithmetic,” reinforcing the over-emphasis on arithmetic at the expense of other important topics still hanging on in the primary mathematics curriculum. It is worth flagging that these changes of emphasis are distressing to those of us who genuinely love mathematics.

Conclusion

I am pleased that the Government appears to be back-tracking over some of the more harmful changes introduced to primary assessment in the last few years. However, certain key requirements remain outstanding:

1. No cap on attainment
2. Baselines for progress measures to be based on good predictors for KS2 attainment
3. Replace high-stress testing on a particular day with teacher assessment
4. Alignment of summative and formative assessment and a national framework for assessment
5. Well-resourced local networks between schools for support and moderation

Playing with L-Systems

For today’s session of the math circle I jointly run for 5- to 7-year-olds, we got the kids to play with Lindenmayer Systems (L-Systems for short). L-Systems can be used as compact representations of complex geometric shapes, including fractals. The aim of the session was for children to understand that simple formulae can describe complex geometric objects. This built on the intuition, developed in a previous session on symmetry and algebra, that properties of shapes can be described algebraically.

I stumbled across this excellent L-System generator on the web, which was perfect for our needs as we didn’t need to install any software on the school laptops. After illustrating how the Koch Snowflake could be generated, we simply let them loose to experiment, suggesting that each time they set the number of iterations to 1 before exploring greater depths of iteration. They seemed to really enjoy it. On a one-to-one basis, we discussed why various formulae generated their corresponding shapes, trying to embed the link between the equations and the graphical representation, but the main emphasis was on generating visually pleasing images.
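Readers who would like to experiment offline could start from this minimal Python sketch of the two ingredients of an L-System. The first is parallel rewriting of a string by production rules. The second is a ‘turtle’ reading, in which `F` means step one unit forward and `+` / `-` mean turn left / right by a fixed angle. One common formulation of the Koch Snowflake uses axiom `F++F++F`, production rule `F → F-F++F-F` and angle 60°. The function names are ours, and the sketch computes the vertices rather than drawing them.

```python
import math


def expand(axiom, rules, iterations):
    """Rewrite every symbol simultaneously, `iterations` times; symbols without a rule are kept."""
    s = axiom
    for _ in range(iterations):
        s = "".join(rules.get(symbol, symbol) for symbol in s)
    return s


def turtle_points(commands, angle_deg, draw_symbols="F"):
    """Vertices visited by a turtle that steps one unit for each draw symbol."""
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    for symbol in commands:
        if symbol in draw_symbols:
            x += math.cos(math.radians(heading))
            y += math.sin(math.radians(heading))
            points.append((x, y))
        elif symbol == "+":
            heading += angle_deg
        elif symbol == "-":
            heading -= angle_deg
    return points


snowflake = expand("F++F++F", {"F": "F-F++F-F"}, 1)
print(snowflake)  # F-F++F-F++F-F++F-F++F-F++F-F

# Each iteration replaces every segment by four, so there are 3 * 4**n segments,
# and the outline always closes up where it started.
for n in range(4):
    points = turtle_points(expand("F++F++F", {"F": "F-F++F-F"}, n), 60)
    print(n, len(points) - 1)
```

Setting the number of iterations to 1 first, as suggested to the children, corresponds to inspecting the single rewrite printed above before increasing `n`.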

Here are some of the curves they produced. In each case, the caption is of the form: number of iterations, angle, axiom, production rule.

I would have liked to have had time to discuss in more depth why the curve that appeared to fill the triangle left no visible white space.

Once they had finished, we drew together as a group and I presented a simple L-System for the Sierpinski Triangle, an object they had seen in a previous session. There were several exclamations of awe, which are always great to hear!
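The post does not record which L-System was presented. One standard formulation of the Sierpinski Triangle, not necessarily the one used in the session, has axiom `F-G-G`, production rules `F → F-G+F+G-F` and `G → GG`, and angle 120°, with both `F` and `G` meaning ‘step forward’. This self-contained Python sketch repeats the rewriting and turtle helpers (names ours) and computes the outline at each depth.

```python
import math


def expand(axiom, rules, iterations):
    """Rewrite every symbol simultaneously, `iterations` times; symbols without a rule are kept."""
    s = axiom
    for _ in range(iterations):
        s = "".join(rules.get(symbol, symbol) for symbol in s)
    return s


def turtle_points(commands, angle_deg, draw_symbols):
    """Vertices visited by a turtle that steps one unit for each draw symbol."""
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    for symbol in commands:
        if symbol in draw_symbols:
            x += math.cos(math.radians(heading))
            y += math.sin(math.radians(heading))
            points.append((x, y))
        elif symbol == "+":
            heading += angle_deg
        elif symbol == "-":
            heading -= angle_deg
    return points


rules = {"F": "F-G+F+G-F", "G": "GG"}
print(expand("F-G-G", rules, 1))  # F-G+F+G-F-GG-GG

# Depth n has 3 ** (n + 1) unit segments, and the path returns to its start.
for n in range(4):
    points = turtle_points(expand("F-G-G", rules, n), 120, "FG")
    print(n, len(points) - 1)
```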