AI, personalised tutoring and the 2 sigma problem

Martin Hall

“We’re at the cusp of using AI for probably the biggest positive transformation that education has ever seen.” Sal Khan

Sal Khan was speaking in Vancouver in April this year and, since then, Khan Academy has been partnering with public schools in New Jersey to provide AI in classrooms. Studies in South Africa estimate that, in the 20% of schools that serve a majority of the population, teachers have classes of around 80 students, making any form of personalised guidance impossible. A study by the Institute of Education, University College London, found that secondary school teachers in England work an average of more than 52 hours a week, of which more than 32 hours – 62% – are spent on non-teaching tasks. Here, as in healthcare, AI will take over a wide range of administrative tasks, freeing teachers to teach.

The Holy Grail of generative AI in schooling is personalised tutoring; Benjamin Bloom’s “2 sigma problem”, formulated in 1984 and long unresolved.

In their compelling mix of computer science and science fiction, Kai-Fu Lee and Chen Qiufan imagine 2041 and a school in which every student has an automated personal tutor:

What’s the chance of humans being able to form relationships with sophisticated AI companions within twenty years? For children, there is no doubt it can happen. Children already have a universal tendency to anthropomorphize toys, pets, and even imaginary friends. This is a phenomenal opportunity to design AI companions that can help children learn in a personalized way, and practice creativity, communications, and compassion—critical skills for the era of AI. AI companions that can speak, hear, and understand like humans could make a dramatic difference in a child’s development.

What is the case for making personalised tutoring available for all, whatever their circumstances?

Bloom, Professor of Education at the University of Chicago, had developed the principles of “mastery learning”: the requirement that a student must demonstrate a comprehensive understanding of a concept before moving on. In the early 1980s, with his team of graduate students, Bloom set up controlled studies which enabled statistical comparisons between three modes of teaching: a conventionally taught class of about 30, in which the teacher presented and the students asked questions; a class of 30 in which the teacher followed the principles of mastery learning; and small-group and individual tutoring, also following the principles of mastery learning.

The differences in learning outcomes for the three teaching approaches were striking. Using conventional classroom teaching as the control, Bloom noted that the average student taught in a class of 30 using the techniques of mastery learning was about one standard deviation above the control average, while students who were individually tutored using the principles of mastery learning achieved learning outcomes about two standard deviations – two sigmas – above the average outcomes for a conventionally taught class. This meant that about 90% of the individually tutored students reached a level of attainment achieved by only 20% of the students taught in the conventional classroom. But at the same time, rolling out individualised tutoring would be highly demanding of teachers’ time, and prohibitively expensive:

The tutoring process demonstrates that most of the students do have the potential to reach this high level of learning. I believe that an important task of research and instruction is to seek ways of accomplishing this under more practical and realistic conditions than one-to-one tutoring, which is too costly for most societies to bear on a large scale. This is the ‘2 sigma’ problem.
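Under the common assumption that attainment in the control class is approximately normally distributed, the size of a two-sigma shift can be made concrete with the standard normal cumulative distribution function: the average individually tutored student scores above roughly 98% of the conventionally taught class. A minimal sketch, illustrative only, using Python’s standard library:

```python
from math import erf, sqrt

def normal_cdf(x: float) -> float:
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1 + erf(x / sqrt(2)))

# Share of a conventionally taught class scoring below a student
# who is two standard deviations above the control mean:
print(f"{normal_cdf(2.0):.1%}")  # -> 97.7%
```

The same calculation for one sigma (mastery learning in a class of 30) gives about 84%, which is why even the group-teaching variant of mastery learning represented a substantial gain in Bloom’s studies.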

Bloom’s point about cost and scalability is crucial. The difference between a conventional class of 30 and individual tuition can be mapped as a spectrum along which average attainment increases as class size diminishes, and there is a large body of literature describing this. A 2011 study by the Brookings Institution found that the average student:teacher ratio in public schools across all US states was 15.3 (the ratio has stayed much the same since then). Reducing this ratio by just one student would have cost about $12 billion a year in additional teachers’ salaries, as well as extensive costs in new classrooms and infrastructure. Calculations of student:teacher ratios include special needs provision, and so class sizes in public schools are on average close to 30, as in Bloom’s study. This is the major differentiator from private schooling in the USA, where both class sizes and student:teacher ratios are significantly smaller. Because education in US public schools is free while private schooling costs upwards of $10,000 a year, access to smaller class sizes correlates with the ability to pay.

For South Africa, the policy benchmark for class sizes in public secondary schools is 37 for Grades 8 and 9. For the higher grades, leading up to the National Senior Certificate at the end of Grade 12, norms vary by subject, and are 35 and 37 for languages and mathematics respectively. But because public schools are allowed to charge additional fees, there is a wide range of variation when compared with official norms and standards, with Quintile 5 schools serving the wealthiest 20% offering class sizes that conform with policy expectations, and Quintile 1 to 3 schools having up to 80 students in a class.

Tim Köhler, University of Cape Town’s Development Policy Research Unit, has studied the relationship between class size and learning outcomes across this range of public schooling in South Africa. In his paper published in 2022, Köhler analysed a substantial dataset to see if there is a relationship between academic attainment levels and a school’s socioeconomic status. Taking into account pass rates for the National Senior Certificate at the end of Grade 12, class sizes, and student-to-teacher ratios, he found that:

It is clear that wealthier schools on average have smaller class sizes and higher NSC pass rates. … Quintile 1–3 schools do not differ significantly from one another in these aspects, while there are 20 more learners in the average class in the poorest 60% of schools relative to the wealthiest 20%. This coincides with a more than 30 percentage point difference in inter-quintile NSC pass rates: just 56% of learners in Quintile 1–3 schools pass Grade 12 with an NSC, in contrast to 87% of learners in Quintile 5 schools.

This is shown in the graph below. For the Quintile 4 and 5 schools in Köhler’s dataset, it is clear that pass rates in the examinations at the end of Grade 12 go up as class sizes come down. But for the poorer schools in Quintiles 1–3, the measures of class size and attainment are flat, and there is no evident advantage in being in a slightly better-off Quintile 3 school rather than in a no-fee Quintile 1 school.


The indication that class size and levels of student attainment are not correlated in Quintiles 1 to 3, but are correlated in the wealthier Quintiles 4 and 5, shows us that the causal relationship between class size and attainment is complex. Köhler found that “in schools with a mean class size in the top 20% of the class size distribution, the average teacher teaches a class of just under 80 learners and is less likely to (i) have a postgraduate degree, (ii) have taken mathematics in Grade 12, (iii) be very confident in teaching their subject or phase, and (iv) have received training on supporting learners with learning difficulties”. This suggests that, rather than there being a simple causal relationship between class size and academic outcomes, as is often claimed by high-fee schools, class size is, in stats-speak, an “endogenous variable” which changes according to its relationship with the other variables that are at play.

Tim Köhler’s findings were anticipated by Bloom in his original formulation of the 2 sigma problem. Bloom’s point was not that smaller class size, in itself, resulted in better learning outcomes. It was rather that personalised tutoring provided the space for implementing the precise and time-consuming protocols of formative assessment and feedback – mastery learning – which would not be possible in a classroom with a single teacher and 80 students. Köhler puts it this way:

Importantly, the conclusion of this paper is not that class size does not matter. Rather, it is that changes in class sizes may not be effective in improving learner outcomes unless other factors change. In other words, the severity of these variables seems to merely be indicative of other important school factors that influence learner outcomes in the South African context.

1984 – the year in which Bloom presented the challenge of the 2 sigma problem – was also the year in which Apple launched the very first Mac with a now-famous 60-second commercial by Ridley Scott, in which a hammer-wielding woman frees the masses from George Orwell’s Big Brother, an allusion to the stranglehold of mainstream computing. But although the conceptual foundations for Artificial Intelligence had long been in place, there was widespread scepticism that it could be rolled out at scale or that it could lead to viable commercial solutions. Consequently, like the earlier “Turing Test”, Bloom’s 2 sigma problem remained a hypothetical challenge. The hammer blow was to come 38 years later, with the launch of ChatGPT in November 2022. Twelve months on, there is now a range of applications, in service or in beta testing, designed to provide automated feedback on assessment and other forms of personalised tutoring for different levels of education.

In fields such as healthcare, developing generative AI applications is complex. For example, the Mayo Clinic is leading the way in implementing AI and innovations include Mayo’s “hospital at home”, which will automate the diversion of up to 30% of acute care emergencies away from hospital admissions, as well as an application that will provide patients with an interactive facility to obtain detailed and reliable responses to their symptoms. Both of these require extensive and wide-ranging AI training data as well as complex prompt engineering.

In contrast, the data sets required for training AI to respond to questions in secondary-level education are far more constrained. Curricula such as South Africa’s National Senior Certificate, or Britain’s A-Levels, are tightly defined and fully described. Banks of past examination papers, along with the memoranda that are used by examiners for marking, are readily available.

When it comes to prompt engineering, Bloom’s original formulation of mastery learning serves as a ready-made template for designing sets of instructions for automating personal tutoring. Thomas Guskey has provided us with a neat diagram that shows how mastery learning should be implemented, also serving as a storyboard for an AI development project.

Mastery learning works best when the curriculum is broken down into units, each of which covers a closely defined set of content. As shown in the diagram above, early in each unit students take an initial formative assessment based on well defined learning objectives, and receive feedback on their responses, which identifies the areas on which they need to focus, and for which they receive “correctives”.

Following this, students take what Bloom called a “parallel assessment”, which covers the same concepts and skills as the first, but includes slightly different problems or questions. This second assessment serves both to establish that the “correctives” have served their purpose, while also motivating students by showing them that they have moved forward in their learning.

Because students will move at different paces through a unit of learning, based on their levels of prior knowledge and their abilities, Bloom provided for enrichment activities allowing students who show a high level of competence in the initial assessment exercise to dive more deeply into the subject matter.

A personalised AI tutor working within this framework would first provide each student with feedback on the initial assessment – “Formative Assessment A” in the diagram. It would then direct the student towards the specific parts of the curriculum on which they need to concentrate in order to correct errors and bridge gaps in existing knowledge. Finally, the AI tutor would select appropriate items from a question bank for “Formative Assessment B”, serving to establish how far the student has moved on, and would also provide customised feedback to students as they move on to the next unit in the sequence.
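The mastery-learning loop just described can be sketched in outline. Everything below is hypothetical: the names, the 80% mastery threshold, and the data structures are illustrative assumptions for the sake of the storyboard, not part of any real tutoring system.

```python
from dataclasses import dataclass

MASTERY_THRESHOLD = 0.8  # assumed pass mark for a unit


@dataclass
class Unit:
    name: str
    objectives: list  # learning objectives covered by the unit


def score(responses: dict) -> float:
    """Fraction of objectives answered correctly in an assessment."""
    return sum(responses.values()) / len(responses)


def tutor_unit(unit: Unit, responses_a: dict, responses_b: dict) -> str:
    """One pass through Formative Assessment A -> correctives/enrichment -> B."""
    if score(responses_a) >= MASTERY_THRESHOLD:
        path = "enrichment"  # high initial competence: dive deeper into the subject
    else:
        # Direct the learner to the objectives they got wrong
        weak = [obj for obj, ok in responses_a.items() if not ok]
        path = f"correctives on {', '.join(weak)}"
    # Parallel assessment B: same concepts, different items
    outcome = "advance" if score(responses_b) >= MASTERY_THRESHOLD else "repeat"
    return f"{path}; then {outcome}"


print(tutor_unit(Unit("Fractions", ["add", "simplify"]),
                 {"add": True, "simplify": False},
                 {"add": True, "simplify": True}))
# -> correctives on simplify; then advance
```

The design choice reflected here is Bloom’s own: the second assessment is “parallel” rather than identical, so progress is demonstrated on fresh items, and high initial performance routes to enrichment rather than repetition.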

Today, the potential of generative AI promises to provide affordable personalised tutoring at scale, freeing teachers to teach and, with appropriate organisational changes, providing equity of access to learning across unequal school systems.

This would finally solve Benjamin Bloom’s 2 sigma problem, an outcome which he foresaw as “an educational contribution of the greatest magnitude”.


Allen, R., A. Benhenda, J. Jerrim and S. Sims (2019). New evidence on teachers’ working hours in England. An empirical analysis of four datasets. London, UCL Institute for Education.

Bloom, B. S. (1984). “The 2 Sigma Problem: the search for methods of group instruction as effective as one-to-one tutoring.” Educational Researcher 13(6): 4-16.

Guskey, T. (2007). “Closing Achievement Gaps: Revisiting Benjamin S. Bloom’s “Learning for Mastery”.” Journal of Advanced Academics 19(1): 8-31.

Hagemeijer, T. (2023). “Many talk about AI, Mayo Clinic is implementing it.” LinkedIn.

Köhler, T. (2022). “Class size and learner outcomes in South African schools: The role of school socioeconomic status.” Development Southern Africa 39(2): 126-150.

Lee, K.-F. and C. Qiufan (2021). AI 2041. Ten Visions for Our Future, Penguin.

Lohr, S. (2023). A.I. May Someday Work Medical Miracles. For Now, It Helps Do Paperwork. New York Times.

Whitehurst, G. J. and M. Chingos (2011). Class Size: What Research Says and  What it Means for State Policy, Brown Center on Education Policy, Brookings Institution.

AI in Schools – Surviving or Thriving?

Martin Hall

Tech columnist Kevin Roose, writing for the New York Times at the beginning of the new school year, advised teachers to look beyond the negatives of AI: “survive and thrive”. First, assume that all the students in your class are using ChatGPT. Second, forget about banning the use of AI – you cannot win. Third: “teachers should focus less on warning students about the shortcomings of generative A.I. than on figuring out what the technology does well.”

After ChatGPT took the world by storm at the end of last year, the use of generative AI applications was banned in New York’s schools. But in May this year the head of the city’s public school system reversed this decision. On 18 May, Chancellor David C Banks wrote:

New York City Public Schools will encourage and support our educators and students as they learn about and explore this game-changing technology while also creating a repository and community to share their findings across our schools. Furthermore, we are providing educators with resources and real-life examples of successful AI implementation in schools to improve administrative tasks, communication, and teaching. We will also offer a toolkit of resources for educators to use as they initiate discussions and lessons about AI in their classrooms. We’ll continue to gather information from experts in our schools and the field of AI to further assist all our schools in using AI tools effectively.

Newark’s public schools are taking the same direction, and  have announced a partnership with Khan Academy, using Khanmigo as an AI tutor. Khanmigo is an AI-powered chatbot that mimics a writing coach by giving prompts and suggestions to move students forward as they write, debate, and collaborate.

A subsequent New York Times profile of this initiative has caused some scepticism, with one commentator describing the use of Khanmigo as “something like using a pneumatic jackhammer to fill a cavity”.  

In the New York Times profile, a teacher of a third grade class at Newark’s First Avenue Elementary School had her students ask Khanmigo two simple, unstructured questions: “What are consonants?”, and “What fraction of the letters in the word MATHEMATICIAN are consonants?”. This kind of use has prompted teachers’ concerns:

“That’s our biggest concern, that too much of the thinking work is going through Khanmigo,” said Alan Usherenko, the district’s special assistant for schools, including First Avenue, in Newark’s North Ward. The district did not want the bot to lead students through a problem step by step, he said, adding, “We want them to know how to tackle the problem themselves, to use their critical thinking skills.”

Training AI to work further up the hierarchy of learning gain is a challenge for the new discipline of prompt engineering and, already, the gains are impressive.  The potential of AI tutoring to assist students to solve problems themselves is evident in a second Khan Academy partnership, this time with the College Board, which is responsible for the SAT assessments written by High School students in the US and widely used for College admissions. In helping students to prepare for writing the SATs, the College Board issues a set of test papers each year, and the aim of the partnership is to use AI tutoring to help students learn from these test papers.

The Khan Academy resources build on the College Board’s practice tests by taking a learner through a series of authentic mock questions, augmented with a set of hints for each question, which explain the underlying conceptual logic of the question. As the learner works through the unit, they take a series of quizzes – SAT-style questions, but without the option of hints – for which they accumulate Mastery Points. The accumulation of Mastery Points matches the College Board’s official scoring system.

This application’s design has been based on the principles of scaffolding and on neurological principles for improving learners’ comprehension and powers of recall. Rather than being given the correct answer to the question, students are nudged towards coming up with the right answer for themselves.

Going back to Newark’s First Avenue Elementary School, and following the example of the prompt engineering used for the Khan Academy’s generative AI tutor for SAT practice tests, Khanmigo could have been prompt engineered to respond to the question “What are consonants?” with “What do you think consonants are?”, followed by constructive observations on the quality of the learner’s answer. There are already many examples of generative AI applications working at this, and higher, levels of sophistication.
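One common way of implementing this kind of behaviour is to encode it in a system prompt that forbids direct answers. The sketch below uses the widely used chat-message format; the wording of the prompt is entirely hypothetical, not taken from Khanmigo or any specific vendor’s product.

```python
# Hypothetical Socratic system prompt; wording and structure are illustrative.
SOCRATIC_TUTOR_PROMPT = (
    "You are a tutor for a Grade 3 class. Never give the answer directly. "
    "When a learner asks a factual question, first ask what they already "
    "think, then respond to their attempt with one constructive observation "
    "and one follow-up question that nudges them towards the right answer."
)

messages = [
    {"role": "system", "content": SOCRATIC_TUTOR_PROMPT},
    {"role": "user", "content": "What are consonants?"},
]

# With a prompt like this, the model's first turn would be a counter-question
# such as "What do you think consonants are?" rather than a definition.
print(messages[0]["role"], "->", messages[1]["content"])
```

The point is that the pedagogical behaviour lives in the prompt, not the model: the same underlying model can deliver a bare definition or a Socratic exchange depending on these few sentences of instruction.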

Jovan Kurbalija, Executive Director of DiploFoundation and Head of the Geneva Internet Platform, puts it like this:

The parallels between the Socratic method and AI prompting remind us of the timeless value of critical thinking, dialogue, and self-discovery. By infusing our AI interactions with the essence of Socratic questioning, we can encourage a deeper understanding of ourselves and society, fostering insightful discussions and more effective use of AI platforms.

Banks, D. C. (2023). Chalkbeat. New York.

Klein, A. (2023). New York City Does About-Face on ChatGPT in Schools. Education Week.

Kurbalija, J. (2023). “What can Socrates teach us about AI and prompting? The art of asking questions in the AI era.”

Meyer, D. (2023). “It’s Pretty Clear That These Math Students Aren’t Interested in Learning From an AI Chatbot Tutor.” Mathworlds

Roose, K. (2023). How Schools Can Survive (and Maybe Even Thrive) With A.I. This Fall. New York Times.

Singer, N. (2023). In Classrooms, Teachers Put A.I. Tutoring Bots to the Test. New York Times.

Learning from Errors

Martin Hall

In an article that was published a few years ago in the Annual Review of Psychology, Professor Janet Metcalfe argued that learning from errors has specific and significant benefits in education:

Considerable research now indicates that engagement with errors fosters the secondary benefits of deep discussion of thought processes and exploratory active learning and that the view that the commission of errors hurts learning of the correct response is incorrect. Indeed, many tightly controlled experimental investigations have now shown that, in comparison with error-free study, the generation of errors, as long as it is followed by corrective feedback, results in better memory for the correct response.

This is supported by comparisons between teaching methods in Japan and the United States. In the US tradition of teaching Mathematics in schools, set procedures are followed for teaching specific categories of problems, with an emphasis on avoiding mistakes.  In contrast, rather than starting with an account of the correct approach, teachers in Japan first require their students to attempt the problem on their own.  Inevitably, learners get into difficulties and their errors become the focus of the lesson:

The time spent struggling on their own to work out a solution is considered a crucial part of the learning process, as is the discussion with the class when it reconvenes to share the methods, to describe the difficulties and pitfalls as well as the insights, and to provide feedback on the principles at stake as well as the solutions.

This is significant because, year-on-year, schools in Japan achieve significantly better outcomes in Mathematics than schools in the United States.

More recently, Aarifah Gardee and Karin Brodie, of the Wits University School of Education, have researched the benefits of learning from errors in the context of teaching Mathematics in South African schools. Their perspective comes from theories of knowledge construction and identity development, augmenting the conclusions that Metcalfe derives from experimental psychology.

Gardee and Brodie start from the position that making errors is normal; what matters is the ways in which teachers work with learners’ mistakes. For some teachers, avoiding errors is taken as an indicator of ability and intelligence, while making errors is a source of shame, resulting in anxiety and self-doubt. Other teachers see opportunities for learning from errors, introducing the possibility of building learners’ self-confidence – the approach in Japan. This direction of research has particular importance in the context of South Africa’s marked educational inequities and the achievement gap in educational outcomes.

For this research, Aarifah Gardee spent two years following teachers in a South African school who taught Mathematics to Grade 9 and 10 classes. She filmed lessons and interviewed both students and teachers, resulting in a granular and detailed analysis that complements the general patterns emerging from transnational enquiries such as the TIMSS reports.

One of the specific lines of enquiry in this research was the ways in which teachers’ approaches to errors influenced learners’ “mathematical identities”: the embodied and reflexive sense of self that is shaped by interests, actions and a person’s social context. The hypothesis was that an individual learner’s mathematical identity would determine the extent to which the learner would embrace the teacher’s objectives, formally set out in the curriculum. Mathematical identities ranged from a strong affiliation with the “classroom community” (“affiliating learners”) through to a self-perception of being on the margins of conformity.

Affiliating learners exercise agency by participating as full members of the community and by utilizing the tools and resources of the mathematics classroom community to develop their social identities as full participants. These learners tend to be emotionally invested in learning mathematics for their lives and futures, as indicative of their personal identities. Concomitantly, learners who develop their identities in marginalization, when offered social identities of marginalization by teachers, may not be emotionally invested in or motivated to learn mathematics for their lives and futures, as indicative of their personal identities. These learners exercise agency by engaging with the tools and resources of the mathematics classroom community in limited ways, or not at all, to develop their social identities as marginal members.

The two teachers who took part in this study took different approaches in their classes.

One offered lengthy explanations and multiple examples. For him “errors had negative connotations, being able to ‘grow’ and ‘breed’.  He organized his lessons by limiting learner opportunities to explore and devise their own methods and solutions to problems to avoid the occurrence of errors, and usually corrected errors when they occurred using lengthy explanations”. 

The other teacher preferred group discussions in which learners devised their own solutions: 

Errors were discussed positively as lessons, and he used errors as a means of encouraging learners to learn from their mistakes. He structured his lessons to support learner reasoning and he often probed learner errors to access their reasoning before correcting errors.

As is so often the case in this kind of close qualitative research, the patterns that emerged were far from straightforward. Some learners liked the first teacher’s approach, identifying with the implication that making errors is a sign of weakness. Some learners in the second teacher’s class felt uncomfortable with his approach, preferring direct instruction. But whatever their attitude, all learners in the study formed a discernible “mathematical identity” over the two years of the study.

The outcomes of this research open up tantalising questions for future studies. As Aarifah Gardee and Karin Brodie ask in their conclusions, what is the interplay between socialisation through the authority of the teacher and learners’ agency in determining their own identities? What will be the long-term outcome of the tension between affiliation and marginalisation on each learner’s ongoing journey through the education system, and on each of their successes at the assessment gateways ahead of them? This study was carried out at a school that is described as “well resourced”; what would be the advantages of focussing on errors to teach mathematics at a poorly resourced Quintile 1 school in South Africa?

Developing a “mathematical identity” as a marginalised learner in Grade 9 and 10 of High School will have a profound effect on a person’s lifetime prospects. In South Africa, they are unlikely to take Mathematics as a subject in their National Senior Certificate examinations at the end of Grade 12. In turn, further study at college or university will either be closed to them, or their qualification options will be limited. With South Africa’s persistently high levels of unemployment, there is a high probability that a marginalised mathematical identity, however it is formed, will evolve into economic marginalisation.

This is why theoretically informed and painstakingly detailed studies such as these, which have the potential to improve teaching and learning, are so important.

Gardee, A. (2019). “Social Relationships between Teachers and Learners, Learners’ Mathematical Identities and Equity.” African Journal of Research in Mathematics, Science and Technology Education 23(2): 233-243.

Gardee, A. and K. Brodie (2022). “Relationships Between Teachers’ Interactions with Learner Errors and Learners’ Mathematical Identities.” International Journal of Science and Mathematics Education 20(1): 193-214.

Metcalfe, J. (2017). “Learning from errors.” Annual Review of Psychology 68: 465-489.

Mullis, I. V. S., M. O. Martin, P. Foy, D. L. Kelly and B. Fishbein (2020). TIMSS 2019 International Results in Mathematics and Science. Boston, TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College.

How your brain lights up when you fail

Martin Hall

One of the key mechanisms determining equality of access to education is assessment. This is because key assessment events serve as a gateway to the next level of opportunity. There is a turnstile at the end of every year of High School, determining who is promoted to the next grade and who has to repeat a year. This generates the achievement gaps that are characteristic of schooling in many countries. South Africa is at the extreme of this global trend. The Research on Socioeconomic Policy team at Stellenbosch University has shown that by Grade 10, when students begin to prepare for the National Senior Certificate, there is an achievement gap that is equivalent to four years of learning. 

Spaull and Kotze, “Starting behind and staying behind in South Africa”

Similarly, examinations at the end of High School shape who will gain admission to which kind of college or university. Analysis of this year’s British A-level results shows widening inequalities. 22.0% of those writing A-levels at comprehensive schools in England were awarded grade A or above, compared with 47.4% at independent schools, and the share achieving an A or above was 8.3 percentage points higher in the affluent south-east than in the poorer north-east. In both the South African and the British examples, differing levels of attainment correlate with household income.

Because these assessment gateways are so important in determining a person’s life trajectory, the details of how tests and examinations are structured really matters. For example, it’s long been shown that context can play an important role in advantaging one examination candidate over another; if you ask a student to give a factual account of a snow storm based on their own observations, then a student living in Chicago will have an obvious advantage over a student living in Nairobi. This is why some assessment exercises are “scaffolded”, with essential information provided as part of the assessment exercise – for example, in South Africa’s National Benchmark Tests, which are used alongside the National Senior Certificate to assess applications for university admission.

It follows from these discrepancies that good assessment design is crucial in ensuring genuine equality of opportunity. A range of research fields come into play in understanding what this should look like. One fascinating example of the potential of transdisciplinary thinking comes from Cognitive Neuroscience. 

In a series of laboratory experiments at Columbia University’s Department of Psychology, a simple assessment exercise was simulated to see how students responded to failing a question. Volunteers were divided into two categories – those who were confident in their ability to answer correctly, and those who were less sure of themselves. After an interval, everyone was given a second chance. The results showed that the more confident students were much more likely to pass the test the second time round than those who lacked confidence. This is a result of the “hypercorrection effect”, formally defined as “enhanced attention and encoding that results from a metacognitive mismatch between the person’s confidence in their responses and the true answer”; in simpler language, the shock and amazement that a confident person feels on discovering that they are wrong. The Columbia researchers captured this in MRI brain scans of their volunteers, with high confidence errors showing as red.

Moving across the span of the disciplines, these test results from Cognitive Neuroscience prompt a thought experiment, following from the Stellenbosch study of the achievement gap in Mathematics across schools in South Africa.

Imagine the level of confidence of two sets of learners in different High Schools as they enter Grade 10. One group is in a Quintile 5 school – a state school that is allowed to charge additional fees and which enrols students predominantly from affluent families. More than 90% of learners from this school will successfully complete their National Senior Certificate exams in three years’ time. The other group is from a Quintile 1, no-fee school, at which only 20% of those starting Grade 10 will go on to sit the NSC exams at the end of Grade 12.

Extrapolating from the Columbia experiments, it’s clear that in-class assessments that don’t take account of the differing confidence levels in Quintile 1 and Quintile 5 schools are likely to invoke the hypercorrection effect, further widening the achievement gap as these Grade 10 learners continue to move through the Maths curriculum. This is because the Maths classes in the Quintile 1 schools will already contain a comparatively high proportion of learners who have been required to repeat one or more grades, damaging their confidence in themselves. In contrast, the large majority of learners in the Quintile 5 schools will have successfully completed all the assessments in each preceding school year, and will start Grade 10 Mathematics confident of their success. If they get a Maths problem wrong, they will experience a “metacognitive mismatch”; their brains will glow red as they hypercorrect, getting it right the next time around.

Cliff, A. (2015). “The National Benchmark Test in Academic Literacy: How might it be used to support teaching in higher education?” Language Matters 46(1): 3-21.

Forrest, A. and J. Stone (2023). A-level results: Biggest drop in top grades on record as Tories accused of ‘exacerbating’ class divide. Independent. London.

Metcalfe, J., B. Butterfield, C. Habeck and Y. Stern (2012). “Neural Correlates of People’s Hypercorrection of Their False Beliefs.” Journal of Cognitive Neuroscience 24(7): 1571-1583.

Spaull, N. and J. Kotze (2015). “Starting behind and staying behind in South Africa. The case of insurmountable learning deficits in mathematics.” International Journal of Educational Development 41: 13-24.

First generation learners: a cultural deficit problem?

Martin Hall

UNESCO’s new report, “Technology in education: a tool on whose terms?”, takes a global view on three issues: the social and human dimensions of education and the role of the teacher; the risk that the use of technology in teaching can widen inequalities; and the tension between the principle that education should be a common good and the rise of commercial interests in education. The report has been launched in tandem with a campaign, #TechOnOurTerms, calling for greater regulation of the Edtech industry, particularly for the harvesting and use of personal data of learners.

There is a lot here. One detail that caught my attention was the under-recognition of the challenges faced by students who are ahead of their parents at their current level of formal education; high school students whose parents did not have the opportunity to complete their schooling; university students who are set to be the first in their families to graduate.  I’ve recently completed a project for the University of Cape Town, analysing long term rates of student progression and graduation. Because only 6% of South Africans have a university degree, a correspondingly high proportion of students are the first in their families to attend university. Consequently, first generation students form a large proportion of students at all South African universities, and a majority at some.

UNESCO finds that there has been insufficient attention to this category of students: 

one group that is not mentioned at all in the SDG 4 framework is first-generation learners, i.e. learners who are the first in their family to attend a particular level of schooling.

UNESCO’s review of the literature identifies some of the factors that shape educational outcomes for these students. They are more likely to doubt their skills and experience a fear of being exposed, a feeling exacerbated in courses which tend to be more competition-oriented, such as science, technology, engineering and mathematics (STEM) courses. In some cases, research shows that children from less educated households are not as likely to receive a good grade, even when their performance is identical to learners from more privileged families. UNESCO concludes that completing a level of education that one’s parents never attended is a formidable challenge, whether for children of illiterate parents in low-income countries or first-in-their-family university students in high-income countries.

Yes, but….

This argument starts from a deficit assumption, without considering the counter-intuitive. What if first generation students have a compelling vision of their future selves and a determination to succeed? What if it is an advantage not to be burdened by the assumptions of one’s parents? The UNESCO report concludes that “first-generation students are more likely to have norms, such as a belief in collaboration, that are at odds with the more individualistic environment of higher education”. But what if this is an advantage, promising a new generation of graduates with a commitment to community, rather than just to themselves?

UNESCO’s general finding is referenced to research published in the Journal of Personality and Social Psychology in 2020 and based on a sample of college students in the United States. This is what the authors conclude:

United States higher education prioritizes independence as the cultural ideal. As a result, first-generation students (neither parent has a four-year degree) often confront an initial cultural mismatch early on in college settings: they endorse relatively interdependent cultural norms that diverge from the independent cultural ideal. This initial cultural mismatch can lead first-generation students to perform less well academically compared with continuing-generation students (one or more parents have a four-year degree) early in college … providing access is not sufficient to reduce social class inequity; colleges need to create more inclusive environments to ensure that students from diverse backgrounds can reap similar rewards.  

But to understand “culture” in this way is to assume both that cultures are discrete sets of traits, and that one set of cultural traits is superior to the other, in this case validating independence and individualism as preferred graduate traits. Here, it’s useful to interpret the US-based study against a comparable analysis of first generation students at universities in South Africa, published by a team from the University of Johannesburg.

This paper concluded that those who succeed are motivated by support from parents and peers and make use of resources in their communities:

both quantitative and qualitative findings indicated that these students take their parents’ and family’s support seriously … Involvement in their communities enabled the students to integrate their university culture into their home environment … In this study, it was discovered that peers would make great efforts to support each other, providing accommodation, encouragement, food and other forms of assistance.

These traits are closely aligned with mutual trust and respect, teamwork and the pursuit of collective goals – attributes that are widely valued in the workplace and which employers often see as underemphasised by universities as graduate attributes. So rather than making recommendations based on the assumed deficits of first generation students – “cultural mismatch” in the US study – could it be more productive to take a critical look at the curriculum and the personal attributes that it favours and fosters?

Educational challenges such as these are difficult to resolve, and attributing causality to learning outcomes with any reasonable level of confidence will depend on assembling wide-ranging and reliable data sets. Appropriately, UNESCO’s #TechOnOurTerms campaign centres on informed and ethical data collection and use. Focusing on first generation students, with their consent, and developing analytics based on the kinds of evidence assembled by the University of Johannesburg team, could have significant benefits for developing a curriculum that recognises the distinctive benefits that first-generation students can bring to the worlds of learning.

Khuluvhe, M. and E. M. Ganyaupfu (2022). Fact sheet: highest level of educational attainment in South Africa. Pretoria, Department of Higher Education and Training.

Motsabi, S., B. Diale and A. van Zyl (2020). “The Academic Persistence of First‑Year First‑Generation African Students (FYFGAS): A Framework for Higher Education in South Africa.” Journal of Student Affairs in Africa 8(2): 73-85.

Phillips, L. T., N. M. Stephens, S. S. M. Townsend and S. Goudeau (2020). “Access is not enough: cultural mismatch persists to limit first-generation students’ opportunities for achievement throughout college.” Journal of Personality and Social Psychology 119(5): 1112-1131.

UNESCO (2023). Technology in education: a tool on whose terms? Paris, UNESCO.

Can AI Kill the Achievement Gap in Education?

Martin Hall

The “achievement gap” is a persistent and pernicious feature of inequality of access to education across all levels of learning, from early schooling to Higher Education. Many studies, across differing education systems, have shown that students’ learning outcomes are affected by levels of economic privilege. Learning outcomes are also mediated by specific legacies of identity and cultural status; ethnic minorities and race in the United States; immigrant communities across the European Union; the long shadow of apartheid in South Africa. The complex ways in which these factors interact have been known and studied since the publication of The Black-White Test Score Gap 25 years ago. But effective solutions have been elusive, and haunted by deeply-rooted biases in curriculum content and assessment systems.

The challenges of the achievement gap came to mind during Professor David Lefevre’s presentation in a recent webinar hosted by Harvard Business Publishing – “Will AI Replace the Educator”. David Lefevre is Professor of Practice at Imperial College Business School, London, where he founded the Edtech Lab in 2004, developing the uses of new digital technologies in education. He summarises his overall objective as building “precision education“, deploying technology to provide students with personalised learning journeys that target their specific needs with maximum efficiency. This would result in analytics that identify with certainty the detail of course content that best addresses a given learner’s needs as well as optimal format for delivering this content – exactly the kind of precision that is needed to take interventions to address the achievement gap to the next level.

One of the reasons that finding solutions for achievement gaps is so elusive is the spaghetti-like complexity of interactions between students’ socialisation and circumstances, educators’ backgrounds and assumptions, and the legacies and conventions of education systems; classic case studies of complex systems. One way into this is by focussing on the core work of assessment and feedback, so well conceptualised by Dylan Wiliam:

The teacher’s job is not to transmit knowledge, nor to facilitate learning. It is to engineer effective learning environments for the students. The key features of effective learning environments are that they create student engagement and allow teachers, learners, and their peers to ensure that the learning is proceeding in the intended direction. The only way we can do this is through assessment. That is why assessment is, indeed, the bridge between teaching and learning.

An appropriately designed form of assessment can provide detailed evidence about individual student achievement at a precise point in their curriculum and “can be used by teachers, learners, or their peers to make decisions about the next steps in instruction that are likely to be better, or better founded, than the decisions they would have made in the absence of that evidence.”

This is where generative AI enters the picture.

However carefully designed and moderated, the human-mediated assessments that form the core of education systems have an inherently subjective element, in which the experience and socialisation of the assessor come into play along with institutional traditions and customs. These factors are accentuated at scale, because good human assessment requires time and consideration – factors which are at odds with high workloads and tight deadlines. In contrast, careful prompt engineering that focuses on the key competences being tested has the potential to identify and mitigate the cultural filters that contribute to the achievement gap, and to do so rapidly, and at scale. In this, generative AI would be functioning as both a “guide on the side” and a “dynamic assessor”, two of the ten potential use cases identified by UNESCO in their recent guide to AI in education.
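To make the idea of competence-focused prompt engineering concrete, here is a minimal sketch of how such a grading prompt might be assembled. Everything in it – the function name, the rubric items, the instruction wording – is an illustrative assumption for this post, not a prescribed or tested prompt.

```python
# Illustrative sketch: assembling a grading prompt that directs an assessor
# model to grade only against named competences, rather than surface features
# (spelling, dialect, register) that are common sources of assessor bias.
# All names and wording here are hypothetical.

def build_grading_prompt(question: str, answer: str, competences: list[str]) -> str:
    """Compose a prompt that confines the assessment to the listed competences."""
    rubric = "\n".join(f"- {c}" for c in competences)
    return (
        "You are grading an anonymised student answer.\n"
        "Assess ONLY the competences listed below; ignore spelling, dialect, "
        "and stylistic register.\n\n"
        f"Competences:\n{rubric}\n\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n\n"
        "For each competence, state whether it is demonstrated, citing evidence "
        "from the answer. Then give a mark out of 10."
    )

prompt = build_grading_prompt(
    "Explain why the sample mean is an unbiased estimator.",
    "Because the expected value of the mean equals the population mean...",
    ["states the definition of unbiasedness", "applies linearity of expectation"],
)
```

The point of the sketch is the design choice: by naming the competences explicitly and instructing the model to ignore surface features, the prompt narrows the space in which cultural filters can operate, in a way that can be reviewed and audited.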

How can we be sure that an AI assessor is at least as good as a fully qualified human? Best practice in conventional assessment is for two assessors to grade anonymised student work independently, with third party moderation if the two independent grades fall outside a defined range of variance. Given this, the minimum requirement for automated assessment is that, for a representative sample of students, the AI assessments consistently fall within the variance that is allowed for two human assessors.
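This minimum requirement can be expressed as a simple check. The sketch below assumes marks out of 100 and an illustrative tolerance of 5 marks; both the tolerance and the function names are invented for this example, not drawn from any assessment standard.

```python
# Hypothetical sketch: does an AI assessor's grade fall within the variance
# tolerance already permitted between two independent human assessors?
# Tolerance value (5 marks out of 100) is an illustrative assumption.

def within_tolerance(grade_a: float, grade_b: float, tolerance: float = 5.0) -> bool:
    """True if two grades agree within the permitted range of variance."""
    return abs(grade_a - grade_b) <= tolerance

def ai_meets_standard(human_1, human_2, ai, tolerance: float = 5.0) -> bool:
    """For every anonymised script, require the AI grade to agree with each
    human grade at least as closely as the two humans are required to agree
    with each other."""
    return all(
        within_tolerance(h1, a, tolerance) and within_tolerance(h2, a, tolerance)
        for h1, h2, a in zip(human_1, human_2, ai)
    )

# Worked example on three scripts (marks out of 100):
human_1 = [62, 74, 55]
human_2 = [65, 71, 58]
ai      = [63, 76, 54]
print(ai_meets_standard(human_1, human_2, ai))  # True: AI within 5 marks of both humans
```

In practice the comparison would run over a representative sample of scripts, with any script that fails the check routed to the same third party moderation used when two human grades disagree.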

Significant progress towards this standard has already been made in a number of reported studies. For example, Google’s PaLM 2 has been designed for grading both multiple choice and long form examinations in Medicine, and has been rigorously tested through comparisons with conventional grading by physicians, and by human moderation of AI outcomes. While more work is needed to meet the standards required in Medicine, it is clear that this will soon be achieved, allowing this use of AI to be fully integrated into medical education.

Deploying these new and emerging technologies will not, in itself, resolve the complex issues that cause unequal access to all levels of education. But their use will shift the dial in identifying and correcting entrenched attitudes, practices and policies that prevent so many people, across all levels of education, from achieving their potential.

Jencks, C. and M. Phillips, Eds. (1998). The Black-White Test Score Gap, Brookings Institute Press.

Singhal, K., T. Tu, J. Gottweis, et al. (2023). Towards Expert-Level Medical Question Answering with Large Language Models. Ithaca, Cornell University.

UNESCO (2023). ChatGPT and Artificial Intelligence in Education: Quick Start Guide. Paris, UNESCO.

Wiliam, D. (2018). Embedded Formative Assessment (the new art and science of teaching). Bloomington, Solution Tree Press.