
Michael’s comments have made me reflect a little further on questions of ordering of attainment, i.e. when can we say that one pupil has attained more than another? I’d like to explain this issue from an abstract mathematical perspective.

One option is not to compare attainment at all, but generally this is exactly the kind of summary information that is of great value to leaders, so let’s assume for the moment that we do want a way to talk about better or worse attainment. Let’s further assume that individual curriculum statements are indivisible: a pupil either “gets / can do” or “doesn’t get / can’t do” a particular concept / activity. To make things really simple, let’s also only consider ordering of attainment within a year group. Each year’s programme of study contains a set of statements, S.

Each child’s attainment is then a subset A ⊆ S – the statements achieved. Clearly a property we would like any ordering ≤ of attainments to have is that A ⊆ B implies A ≤ B, that is, if child b can do everything in the curriculum that child a can do, then child b’s attainment is at least as high as child a’s. Let’s call orderings with this property “meaningful orders”.

We could actually define ≤ to be ⊆ itself and be done. Yet this leads to the many incomparabilities which Michael alludes to. Is the child that can do statements 1 and 2 higher or lower attaining than the child that can do statements 3, 4 and 5? One option is to live with this: these children are genuinely incomparable in attainment, and that’s all there is to it. I think the intuitive adoption of such an ordering by set inclusion (https://en.wikipedia.org/wiki/Partially_ordered_set) is probably partly behind the rejection of the idea that an average February attainment is distinguishable from an average October attainment.

But there are other meaningful orders that could be defined, such as ordering by cardinality: A ≤ B iff |A| ≤ |B|, which is effectively what I’ve argued for above and is in wide use in primary education. There are many, many other choices, of course. The advantage of this one is that it is a total order (https://en.wikipedia.org/wiki/Total_order), with all the nice mathematical properties that brings – properties heavily used throughout the primary education system. The disadvantage, as Michael implies, is that it is somewhat artificial.
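The contrast between the two orders can be sketched in a few lines of Python (the statement names are invented purely for illustration; set operators stand in for ⊆):

```python
# Attainment modelled as the set of curriculum statements achieved.
# The statement names are invented purely for illustration.
a = {"place_value", "roman_numerals"}
b = {"place_value", "roman_numerals", "fractions"}
c = {"fractions", "decimals", "percentages"}

def inclusion_compare(x, y):
    """Meaningful partial order by set inclusion: may be incomparable."""
    if x == y:
        return "equal"
    if x < y:
        return "lower"
    if x > y:
        return "higher"
    return "incomparable"

def cardinality_compare(x, y):
    """Total order by number of statements met: always comparable."""
    if len(x) < len(y):
        return "lower"
    if len(x) > len(y):
        return "higher"
    return "equal"

print(inclusion_compare(a, b))    # a is a strict subset of b -> "lower"
print(inclusion_compare(a, c))    # neither contains the other -> "incomparable"
print(cardinality_compare(a, c))  # |a| = 2 < |c| = 3 -> "lower"
```

Note that the cardinality order is itself “meaningful” in the sense above: if A ⊆ B then certainly |A| ≤ |B| – it just forces a verdict on pairs that inclusion leaves incomparable.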

Take your pick: summary but artificial, or the inability to summarise but totally real. In practice, schools often choose the former for SLT and governors and the latter for classroom teachers. Perhaps with good reason!


One issue you highlight is that these data are often interpreted by people without the statistical skills to understand the uncertainty and its implications. I agree. But I think there is no way to avoid uncertainty – it is inherent, even with testing. So really we should be upskilling our school leaders and our governors, in my view.

The other issue worth picking up relates to the statement “I don’t believe that any teacher could reasonably define the difference between an October average writer and a February average writer”. If we use the coverage metric, wouldn’t an October average writer be one with a small number of statements under his / her belt while the February writer would be one with about half the statements under his / her belt? This goes back to my initial point – I agree these positions are likely to be indistinguishable with respect to individual statements, but surely not with respect to the proportion of such statements achieved? (I will post separately below on notions of mathematical ordering of curriculum coverage, which touches on this point but is probably a little off on a tangent!)

I of course agree with you about a governing body hypothetically drawing conclusions about quality of teaching. But I would suggest that a good set of governors would not be using a lower proportion in one class than another to draw conclusions over quality of teaching but rather as a basis for discussion with the subject coordinator, as a prompt to ask “why is the proportion lower in Class A?” To which, I’m sure, any subject coordinator worth their salt will begin by answering in the way you have!

To answer the specific question you ask: “if we measured heights of two classes of children every 6 weeks, and one seemed to fall behind, at what point would we feel that there was enough statistical significance to warrant drawing any conclusions?” The standard statistical answer would be: when the probability of any such difference happening purely due to chance (due to a random allocation of children to classes combined with the inherent variation in children’s growth) is smaller than some significance level – typically 5% or 1%. I don’t know what that height difference is, but it can be calculated, because we know the distributions. So rigorous statistics can be applied. I don’t see the educational setting as any different, except – as you point out – the variations will be higher (including the measurement error).
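As a sketch of that calculation (the within-age-group SD of height and the class size below are assumed round numbers, not measured figures), the smallest class-mean gap reaching 5% significance follows from the standard error of a difference of two means:

```python
import math

# How big must the gap in mean height between two classes be before it
# is "significant"? sigma and n are assumed values for illustration.
sigma = 6.0   # cm, assumed SD of height within an age group
n = 30        # pupils per class
z = 1.96      # two-sided 5% critical value of the normal distribution

# Standard error of the difference between two independent class means.
se_diff = sigma * math.sqrt(2.0 / n)

# Smallest mean difference that would reach 5% significance.
threshold = z * se_diff
print(f"detectable gap ≈ {threshold:.1f} cm")  # ≈ 3.0 cm under these assumptions
```

So under these assumed figures a class would need to fall roughly 3 cm behind the other on average before chance alone became an implausible explanation – exactly the kind of number that can be worked out once the distributions are known.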


1. I think the issue about accuracy is key. If you fancy running a scorable test every term, then that seems fine to me. However, even with something so precise, as you say, at the individual level there are risks. At cohort level it then might become useful, although most primary cohorts are small – some very small – so such data is again hazardous (and bear in mind that we’re talking about largely untrained statistical eyes here!). More to the point, in fact most such judgements are a broad teacher assessment judgement (or worse, a calculated total of a number of teacher assessment judgements). I think it unwise to be trying to draw such regular inferences from such fuzzy information.

Perhaps more regular data on a specific task is useful, e.g. knowledge/recall of number bonds. Part of the problem here is that we’re often talking about broad judgements across a fairly large domain (e.g. Writing) and then trying to imagine a category system that shows a step of progress every 6 weeks. I don’t believe that any teacher could reasonably define the difference between an October average writer and a February average writer, let alone a December one.

That’s the bit I have trouble with. I collect near daily data on tables tests, but I don’t imagine that the child who scored 56 yesterday and 54 today has got worse, any more than the reverse would suggest improvement. Perhaps my biggest concern is the sense of precision which doesn’t really exist. Which links to point 2.

2. With height we can very accurately measure children, comfortably to the nearest 1/2cm, perhaps the nearest millimetre – but as you say, we would only be gravely concerned by a 15cm difference. Medical practitioners might choose to monitor someone who was 10cm shorter than expected more regularly. Even that height difference represents well over a year of typical growth.

We come back again to the cohort issue. Perhaps over large groups we might iron out some of the issues, but if we measured heights of two classes of children every 6 weeks, and one seemed to fall behind, at what point would we feel that there was enough statistical significance to warrant drawing any conclusions? We certainly wouldn’t base it on a single half-term’s data, I’m sure you’d agree?

3. Yes, this is the key. And perhaps my main point here is that by attempting to draw conclusions so regularly, we add to the likelihood of error. It would concern me if a governing body were drawing conclusions about the quality of teaching in any given class because in one class 45% of children had reached step 18, but in a parallel class only 40% had. That’s quite possibly 2 children, each lacking one concept. I would consider that to be well within the likely margin of error for such judgements. The level of precision we are able to attain at such small scale doesn’t merit the effort it might take, while the risks of collecting such data and sharing it with those who are not statistically-minded (such as the vast majority of those in education!) are far greater. Not only is it probably not useful, I’d argue that it could be positively damaging.

1. Frequency of measurement. Let’s consider what drives this. Firstly, even with perfect information, it’s not reasonable to measure more frequently than we can make use of the information (which is probably the initial motivation for half-termly?) Secondly, if we spend all our time collating summary metrics and not teaching, that does nobody any benefit. But the third point is the key one in this discussion: how does uncertainty impact on frequency? The less average movement within a given time period, the less frequently we should measure, for a fixed variability of the data. On this, I am happy to accept your view that variation for an individual child is so large that half-termly measurement is (relatively) meaningless. Perhaps the same is even true termly – though if the purpose is early detection of potential issues rather than “performance management” or some other summative use of data, that is probably arguable. However, if we’re averaging over a cohort to get cohort metrics, the variation decreases. So half-termly for a class or a year group might well be meaningful, even if termly for an individual pupil is only borderline.

2. Your first point about objective proportions is one relating to granularity of thresholds. I agree with this – if we collect data at the level of individual objectives, why then round that data to a very coarse level before assessing progress? Use the full precision available to us – 16% versus 17% in this case; rounding first is like rounding a child’s height to the nearest 10cm before looking it up in the centile table. I take your point about a child 2cm “too short”, but if an 8-year-old child is 15cm shorter than average then a referral actually is recommended (at least according to the BSPED https://www.bsped.org.uk/)!

3. Your final question “what does it add to our understanding” is key. I think primarily what it adds is the ability of school leaders and governors to quickly grasp which classes and groups of pupils might need extra investigation. As always, data is the start of a discussion about “why”, never the end point.
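The averaging argument in point 1 can be made concrete: the spread of a cohort average falls with the square root of the cohort size, so class-level measures are steadier than individual ones. A minimal sketch (the individual-level SD is an assumed figure, not a measured one):

```python
import math

# The spread of a cohort average shrinks with the square root of the
# cohort size. The individual-level SD below is an assumed figure.
sigma_individual = 10.0  # assumed SD of an individual pupil's measure

# Standard error of the mean for one pupil, a group, a class, a year group.
spread = {n: sigma_individual / math.sqrt(n) for n in (1, 10, 30, 60)}
for n, se in spread.items():
    print(f"n={n:>2}: SD of the cohort average = {se:.2f}")
```

On these assumed numbers, a class of 30 has an average roughly five to six times steadier than an individual pupil’s score – which is why half-termly cohort metrics can carry signal where half-termly individual ones are mostly noise.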

Thanks again.


1. I don’t think we disagree on the importance of measuring. I’m in favour of decent testing as a way of getting an indication of attainment/progress. My point in the article is that we shouldn’t be attempting to interpret measures taken so frequently. If we insist on making a judgement every 6 weeks (which is what is implied by the 6-step model), then we are likely to be confusing noise for meaningful information. Just as we wouldn’t panic if a child hadn’t grown the expected 0.4cm in two months (or whatever it might be), so it would be useless to draw any great inference from a single step’s movement up/down (or not at all) over such a short period.

2. Yes, as I say, identifying the age to the nearest year could give a reasonable estimate, but to within a couple of months would be crazy – yet that is what the 6-step model implies.

As for the impact of curriculum, while it’s true that children have different prior attainments, they are nevertheless in classes of 30+. In my Y6 class I have children who might be writing at a level you might expect of a Y3, yet they may also have had a go at using dashes because it was covered in a lesson along with their peers. That might be the clue that gives away their age, quite apart from their apparent ability otherwise.

3. You’re mistaken about my interpretation of 6 steps. I’m not referring to “can do X a tiny bit”, although I do see systems like that (with more than 6 steps sometimes!). It’s more about the interpretations being drawn from systems like the one you suggest based on percentages of objectives met. If a child has met 17% of objectives then they are “on track” for October, but if they’ve met 16% of them, then they are “one step” behind. Consider further if the child on 17% can use Roman numerals, but not place value. They appear to be on-track yet have a significant gap, while the other child who is behind may have a secure understanding of this key concept. This is of course nonsense, and is the result of artificial thresholds. It would be the equivalent of saying that an 8-year-old child who is 2cm “too short” is a cause for concern.

You seem quite happy with the 3/6 steps model (or perhaps only with the 3-step idea based on your final comment), but what does it add to our understanding? I don’t see that it’s useful, and it certainly isn’t a measure – it’s a category. Categories are sometimes useful for comparing whole cohorts, but every 6 weeks?

1. Michael makes the very important point that attainment (and therefore, indirectly, progress) is susceptible to inherent variations across pupils. He compares it to height, and points out that centiles are used there. Quite right – more should be done to utilise the distribution of children’s attainment when coming to judgements on the progress made both by a child (very susceptible to large variations) and the average for a cohort (less susceptible to large variations). To be fair, this information is present in reports like RAISEOnline, though school leaders, LAs and some inspectors don’t always look at the detail. But just because something is susceptible to variation doesn’t devalue its measurement. Consider the final statement Michael makes at the end of the article: “Six points of progress each year – or three, or 10 for that matter – is no more useful an indicator of children’s success than expecting to grow exactly 5cm between birthdays. We’d do well to stop pretending otherwise.” Just because we don’t expect all children to grow the same amount doesn’t mean we shouldn’t measure their height. Because if they stop growing, or grow in a very unusual way, they are usually referred to a paediatrician. Not because there is something wrong necessarily, but because we need to ask **why** and to understand their unusual growth. The same is true with progress.

2. Michael suggests that it would be foolish to try to predict a child’s age from an example of their work. He states that “you might estimate what year group they were in, absolutely – probably based as much on your knowledge of the curriculum as anything, but nobody would sensibly try to estimate their age down to the nearest couple of months, surely?” There are two quite distinct points to make here. Firstly, yes it is entirely possible to predict a child’s age from their work – I’m not a primary school teacher yet I can tell you whether a given book is more likely to be from a Year 1 or a Year 6 – though of course there is a good level of uncertainty associated with such an estimate, and I’m very happy to accept the assertion that the level of uncertainty is much greater than a couple of months. This is uncontroversial. However, the statement “based as much on your knowledge of the curriculum as anything” rings alarm bells for me. It suggests that *despite* all the variation in attainment quite correctly pointed to by Michael, walking into a school today we are likely to see children taught material based on their age, not their capability. This is sad, in my view, especially since the National Curriculum programmes of study quite clearly allow schools to take the variation into account: “Within each key stage, schools therefore have the flexibility to introduce content earlier or later than set out in the programme of study. In addition, schools can introduce key stage content during an earlier key stage, if appropriate.”

3. Michael states that “Somehow, it has become acceptable to consider progress through the academic year as a series of six steps, as though every two months will see an identical change in attainment for everyone.” Again there are a few quite distinct issues here. The first one is whether it is reasonable to measure progress as “a series of steps.” The second is whether all children will exhibit the same number of steps of progress over a given period of time. The final point is whether six is indeed “the magic number” – or three – or any other number. The second point is easiest to deal with – of course not all children progress at the same rate or in a linear fashion – any assessment scheme expecting this is as ridiculous as Michael suggests. Yet that doesn’t stop us drawing a best fit line through rates of progress and asking questions about why particular children are making more or less progress than their peers. Maybe there’s no reason, maybe there is some magic happening we could all learn from. You don’t know until you look. The first and third points depend on what a step means. At one extreme – and I think this is what Michael is complaining about – stepped progress means, for example, “can’t compare and order fractions”, “can compare and order fractions a tiny bit”, “can compare and order fractions a bit”, “can compare and order fractions reasonably”, “can compare and order fractions quite well”, “can compare and order fractions very well”, “a master at comparing and ordering fractions” (though he makes the point with writing). Of course this is ridiculous. Yet it’s not the way I’ve seen schools successfully use a three or six-point progress summary – schools I’ve talked to seem to use them to record *the proportion* of different statements in the national curriculum that have been successfully mastered by pupils rather than how well they’ve mastered an individual statement. With this in mind it certainly seems reasonable to assess a pupil as “having met about a third of the statements for Year 5”.
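A minimal sketch of such a proportion-based step summary (the statement count and the six equal bands are illustrative assumptions, not any particular published scheme):

```python
# The "step" is just the proportion of curriculum statements met,
# bucketed into six equal bands. The statement count and band
# boundaries are illustrative assumptions, not any real scheme.
def step(met_count, total, steps=6):
    """Bucket the proportion of statements met into bands 1..steps.
    Integer arithmetic avoids floating-point trouble at band edges."""
    return min(steps, 1 + (met_count * steps) // total)

year5_statements = 30  # assumed size of the Year 5 statement list
print(step(10, year5_statements))  # met about a third -> step 3
print(step(30, year5_statements))  # met everything    -> step 6
```

Of course, as the discussion above notes, the band boundaries are where the artificiality creeps in: one more or fewer statement met near an edge moves a pupil a whole “step”.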

So, in summary, a thought provoking article from Michael Tidd, as always. But I see no reason to abandon stepped progress measures. Indeed I don’t see any suggestion for an alternative that is anything other than a stepped progress measure in disguise. Is there?


I am not interested in understanding SEN policy except where it concerns gifted children (I have other things to do, there are others more qualified, and it doesn’t apply to my child). But I hate the cap on attainment, and hate the way assessments are graded (which does not give you as a parent enough information to guide and assist your child). I also think that more needs to be done for HLP children in the way of support and stimulation, allowing them to progress at the speed they are capable of if they choose and are supported emotionally, instead of forcing them in with age-group-based “peers”.

Basically take off the brakes on these kids as the current system is like making them drive with the handbrake on.

I have given a more detailed response to the consultation itself, including the recommendation that all children be independently screened for HLP prior to entry into the school system.

