Data Days: Using data to solve the SQA exam results challenge
By Holly Shippam, 1 October 2020
You’ll have heard about the disappointment thousands of students across the country faced on exam results day back in August. The issue was caused, of course, by the impact of Covid-19. Looking back at this process several months later gives us an opportunity to reflect more generally on data quality and making decisions with imperfect information, a challenge our clients often face.
Students weren’t able to sit exams as usual, and so teachers were asked to give predicted grades instead. But in many cases, it seemed that these were disregarded, and students were left with exam results that didn’t match up with their year-long performance or their teachers’ predicted grades, leading to widespread disappointment.
Extra efforts were taken to improve the accuracy of this year’s predictions. This included a review of previous predictions compared to attainment, as well as the production of training materials for teachers focused on avoiding bias and inaccuracy in their estimates. Yet many predicted grades were still adjusted by the SQA. Overall, of 511,070 predictions across National 5, Higher and Advanced Higher, 73.8% were accepted and 26.2% were adjusted, with the majority of adjustments (93.1%) being made downwards. What went wrong? And how could the use of data have helped to solve this problem?
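The scale of those adjustments is easier to grasp in absolute terms. A quick back-of-the-envelope calculation from the percentages above (approximate, since the published percentages are rounded):

```python
# Rough absolute figures derived from the percentages quoted above:
# 511,070 predictions, 26.2% adjusted, 93.1% of adjustments downward.
total_predictions = 511_070

adjusted = round(total_predictions * 0.262)   # predictions changed by the SQA
accepted = total_predictions - adjusted       # predictions left as submitted
downgraded = round(adjusted * 0.931)          # adjustments that lowered a grade

print(f"Accepted:   {accepted:,}")     # roughly 377,000
print(f"Adjusted:   {adjusted:,}")     # roughly 134,000
print(f"Downgraded: {downgraded:,}")   # roughly 125,000
```

In other words, around one prediction in four was changed, and almost all of those changes were downgrades.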
Use of modelling to explore scenarios
SQA explored five different technical options for awarding grades, but all of these were rejected.
The first of these was simply to award students with their predicted grades. When reviewing the use of centre estimates, overall attainment was close to the predictions for previous years. However, this was due to an averaging out of estimates that were too high and those that were too low, as many estimates weren’t accurate. For this reason, this approach was rejected for 2020.
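That averaging effect is easy to illustrate with invented numbers: a set of estimates that are individually well off the mark can still produce an aggregate that looks right. (The marks below are made up purely for illustration.)

```python
# Invented (teacher estimate, actual mark) pairs for six students.
pairs = [(75, 60), (50, 66), (82, 70), (58, 71), (66, 55), (49, 58)]

# Signed errors largely cancel, so the cohort-level average looks accurate...
mean_error = sum(est - act for est, act in pairs) / len(pairs)

# ...but the typical individual estimate is still a long way out.
mean_abs_error = sum(abs(est - act) for est, act in pairs) / len(pairs)

print(f"Average signed error:   {mean_error:+.1f} marks")     # +0.0 here
print(f"Average absolute error: {mean_abs_error:.1f} marks")  # 12.7 here
```

Checking the aggregate alone would pass these estimates as accurate, even though every single one is at least nine marks wrong.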
Another option the SQA considered was multiple linear regression. Here, predicted grades were combined with coursework marks and prior attainment to predict the final outcome, with data from previous years used to trial the results.
In trials, this method produced reliable predictions. Unfortunately it wasn’t usable in practice, as it relied on complete information being available for every student, and the data that actually existed was limited and inconsistent.
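As a rough illustration of what such a regression might look like, here is a minimal ordinary least squares sketch with invented per-student features (prior attainment, coursework mark, teacher estimate) and made-up marks; it is not the SQA's actual model or data:

```python
import numpy as np

# Invented training data: one row per student from previous years.
# Columns: prior attainment, coursework mark, teacher's estimate (0-100 scale).
X = np.array([
    [62, 58, 65],
    [71, 75, 70],
    [55, 50, 60],
    [80, 82, 85],
    [68, 64, 70],
], dtype=float)
y = np.array([64, 72, 54, 84, 67], dtype=float)  # actual final marks

# Fit ordinary least squares: add an intercept column, solve for coefficients.
A = np.column_stack([np.ones(len(X)), X])
coefs, *_ = np.linalg.lstsq(A, y, rcond=None)

# Predict for a new student, who must have all three features available.
new_student = np.array([1, 70, 68, 72], dtype=float)
predicted_mark = new_student @ coefs
print(f"Predicted final mark: {predicted_mark:.1f}")
```

The sketch also makes the failure mode visible: the prediction step needs all three inputs for every student, which is exactly the completeness requirement that couldn’t be met in 2020.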
The SQA also considered moderating grades either only at national level, or only at centre level. However, this was ultimately rejected, as was using centre-supplied rank orders, which weren’t comparable between centres. There’s a common theme here: none of the options trialled so far were viable, because there wasn’t enough historic data available, or the data that was available didn’t cover a broad enough range. It’s always good practice to collect a broad range of data that can be quickly surfaced. This allows you to generate insights and a working prototype as quickly as possible, which can then be tested, the results analysed, and the process refined.
The chosen approach
The approach that was finally chosen compared this year’s predictions to attainment in previous years. It was decided that performance had improved too much for the predicted grades to be accurate. Frequency distributions for the split of grades in each course in previous years were then determined, and a range within which these boundaries could move was established. Centres were ranked using these criteria, and grade adjustments were made where necessary.
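A heavily simplified sketch of that kind of moderation, with an invented historical grade distribution, tolerance, and rank order (the SQA’s actual boundaries and procedure were more involved):

```python
# Invented historical share of each grade for one course, in percent.
historical_share = {"A": 20, "B": 30, "C": 35, "D": 15}
tolerance = 5  # percentage points a grade's share may drift upwards

# Teacher estimates in centre rank order (strongest student first).
estimates = ["A", "A", "A", "A", "B", "B", "B", "C", "C", "D"]

grades = list(historical_share)  # ["A", "B", "C", "D"]
n = len(estimates)
moderated = estimates.copy()

# Walk the grades from the top; where a grade exceeds its cap,
# the lowest-ranked excess students move down one grade.
for i, grade in enumerate(grades[:-1]):
    cap = (historical_share[grade] + tolerance) * n // 100
    holders = [j for j, g in enumerate(moderated) if g == grade]
    for j in holders[cap:]:
        moderated[j] = grades[i + 1]

print(moderated)  # ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'D']
```

Note what happens to the four students estimated an A: the cap allows only two, so the two lowest in the rank order are moved down regardless of how confident their teacher was, which is the mechanism behind so much of the disappointment.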
The ranges for these boundaries varied depending on class sizes, and were much larger for small-uptake classes, as these subjects tend to experience greater variability in results. This meant that subjects like Gaelic would be most affected, as they tend to have fewer students studying them than subjects like English and Maths.
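The statistical intuition here is that the average mark of a class fluctuates more from year to year when the class is small, since the standard error of a mean shrinks with the square root of the group size. A quick illustration (the individual-mark standard deviation of 15 is an assumed figure, not an SQA one):

```python
import math

sigma = 15.0  # assumed standard deviation of individual marks

for class_size in (5, 30, 120):
    standard_error = sigma / math.sqrt(class_size)
    print(f"class of {class_size:3d}: class average typically swings "
          f"by about +/-{standard_error:.1f} marks")
```

On these assumed numbers, a class of five can swing by nearly seven marks in either direction purely by chance, so historical results constrain its grades far less tightly than they do for a large cohort.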
It also meant that there would be much greater variability in schools with the capacity for small class sizes across a broader range of subjects. Because of the broader boundaries, there was less interference from the SQA in these instances. As a result, many students in smaller classes who were predicted high grades by their teachers were awarded those same grades, while their peers in larger classes were at greater risk of having their grades adjusted downwards. This was an issue because smaller class sizes are more prevalent in private schools, where students already experience an advantage over their publicly educated peers.
Lessons to learn
The SQA couldn’t have predicted the impact that Covid-19 would have on coursework and exams in 2020. The pandemic severely limited the amount of coursework that could be safely submitted and marked, meaning that there wasn’t a full data set to work with when testing prediction models.
The issues that resulted from the SQA’s intervention to adjust grades could have been avoided by simply having the right data to work with. If the SQA had been able to use a technical prediction, it would have been clearer that discrepancies in results didn’t necessarily require grade adjustments, saving teachers and students a lot of heartache and hassle. For the most part, the more data available to you, whatever the situation, the easier and smoother your decision-making process will be.
As we reflect on the summer 2020 exam process and apply its lessons to data quality and the use of imperfect information more generally, we know that businesses often face challenging decisions without complete information, or need to decide at pace, before all options can be fully explored. When decisions must be made urgently, the advice we generally give clients is to do two things: make the best of what you have, and establish steps to improve your data processes, from acquisition through modelling to analysis, so you’re better prepared next time. If you have any questions for our data science team, we’re happy to help.