9 December 2024 · Aroha Mitchell

Literacy in the Early Years: Where Machine Learning Earns Its Keep

Reading acquisition is one of the most consequential and measurable outcomes in primary education. It is also one of the few areas where adaptive assessment software has a peer-reviewed evidence base for its efficacy.

Few outcomes in educational psychology have been researched as thoroughly. The simple view of reading, Gough and Tunmer's model of reading comprehension as the product of decoding and language comprehension, has been empirically supported across decades and provides a clear map for intervention. We know that phonemic awareness, phonics knowledge, vocabulary, and fluency each contribute measurably to reading comprehension. We know that early identification of difficulties in any of these components, followed by targeted intervention, substantially improves long-term outcomes. And we know that by Year 3 or 4, reading difficulties that haven't been addressed begin to compound through a Matthew effect: children who read well read more, which builds the vocabulary and background knowledge that make them better readers, while children who struggle read less and fall further behind. The window for effective early intervention is real, and it is narrow.
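The simple view of reading is usually written as RC = D × LC. The Python sketch below, which assumes each component has already been scaled to a score between 0 and 1, illustrates why the model gives such a clear map for intervention: because the terms multiply, strong language comprehension cannot offset near-zero decoding, and the weak-component cases point to different instructional targets. The `ReaderProfile` class and the 0.5 cut-off are hypothetical; the model itself prescribes no threshold.

```python
from dataclasses import dataclass

# Hypothetical cut-off for illustration only: the simple view of reading
# does not prescribe one, and real screening relies on normed assessments.
THRESHOLD = 0.5


@dataclass
class ReaderProfile:
    decoding: float                # D, assumed already scaled to 0..1
    language_comprehension: float  # LC, assumed already scaled to 0..1

    def reading_comprehension(self) -> float:
        # Simple view of reading: RC = D x LC. Because the terms multiply,
        # a zero in either component yields zero comprehension.
        return self.decoding * self.language_comprehension

    def quadrant(self) -> str:
        # The four profiles implied by the model, each pointing to a
        # different intervention target.
        weak_decoding = self.decoding < THRESHOLD
        weak_language = self.language_comprehension < THRESHOLD
        if weak_decoding and weak_language:
            return "mixed difficulties"
        if weak_decoding:
            return "decoding difficulty"
        if weak_language:
            return "language comprehension difficulty"
        return "typical reader"


# A child with strong oral language but weak decoding.
child = ReaderProfile(decoding=0.2, language_comprehension=0.9)
print(round(child.reading_comprehension(), 2))  # 0.18
print(child.quadrant())                          # decoding difficulty
```

The multiplicative form is the point of the model: a child in the "decoding difficulty" quadrant needs phonics work, not more vocabulary instruction, even though the single comprehension score alone would not say so.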

What machine learning adds to this well-established evidence base is scale and precision of assessment. The standard tool for assessing early reading in New Zealand primary classrooms is the running record, a protocol developed by Marie Clay, founder of Reading Recovery, in which a teacher listens to a child read aloud from a levelled text and marks each word as correct, substituted, omitted, inserted, or self-corrected. A skilled teacher can extract a great deal from a running record: reading level, accuracy rate, the cueing systems the child is drawing on, and whether self-correction behaviour shows that the child is actively monitoring meaning. But running records are time-intensive. A teacher with a class of 25 students can realistically take a detailed running record for each student only once every few weeks. Yet the children most at risk, whose reading level is the hardest to assess quickly, need more frequent monitoring, not less.
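The quantitative part of a running record is simple arithmetic. The sketch below follows the conventions of Clay's running record as commonly taught: accuracy is the share of running words read correctly, error rate and self-correction rate are expressed as 1:N ratios, and self-corrected words are not counted as errors. The `RunningRecord` class is hypothetical, and the 95% and 90% text-difficulty bands are a widely used convention rather than a universal standard; assessment guides vary in their exact cut-offs.

```python
from dataclasses import dataclass


@dataclass
class RunningRecord:
    running_words: int     # words in the levelled passage
    errors: int            # substitutions, omissions, insertions, told words
    self_corrections: int  # errors the child fixed unprompted (not counted in errors)

    def accuracy_percent(self) -> float:
        return 100 * (self.running_words - self.errors) / self.running_words

    def error_rate(self) -> str:
        # 1:N means one error every N running words.
        if self.errors == 0:
            return "no errors"
        return f"1:{round(self.running_words / self.errors)}"

    def self_correction_rate(self) -> str:
        # 1:N means one self-correction for every N of (errors + self-corrections).
        if self.self_corrections == 0:
            return "no self-corrections"
        total = self.errors + self.self_corrections
        return f"1:{round(total / self.self_corrections)}"

    def text_difficulty(self) -> str:
        # Assumed bands: 95%+ easy, 90-94% instructional, below 90% hard.
        # Integer comparison avoids float rounding at the band edges.
        correct = self.running_words - self.errors
        if 100 * correct >= 95 * self.running_words:
            return "easy"
        if 100 * correct >= 90 * self.running_words:
            return "instructional"
        return "hard"


record = RunningRecord(running_words=150, errors=10, self_corrections=5)
print(f"{record.accuracy_percent():.1f}%")  # 93.3%
print(record.error_rate())                   # 1:15
print(record.self_correction_rate())         # 1:3
print(record.text_difficulty())              # instructional
```

None of this arithmetic is what takes a teacher's time; the expensive part is listening and marking every word, which is the step the next paragraph describes automating.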

Machine learning approaches to reading assessment work by extracting acoustic features from speech that correlate with the reading sub-skills running records are designed to measure. Phoneme-level analysis of oral reading can identify specific decoding errors: the child who consistently substitutes vowel phonemes is showing a different profile from the child who omits word endings, and both differ from the child who reads accurately but disfluently. These patterns can be identified from audio recordings precisely enough to serve as useful clinical signals, provided the models are trained on sufficiently large and diverse corpora of learner speech and validated against assessments by human experts. The hardest technical problems lie in low-resource languages and underrepresented dialects. A reading assessment model trained primarily on North American English speech will not perform well on New Zealand students whose English has the phonological characteristics of the NZ dialect, including a front-vowel shift that differs systematically from other Englishes: the KIT vowel is centralised and the DRESS and TRAP vowels are raised, so an uncalibrated model can mistake a child's ordinary NZ pronunciation for a vowel substitution.
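A minimal sketch of the phoneme-level step, assuming a hypothetical upstream acoustic model (not shown) has already turned a child's attempt at one word into a phoneme sequence. It aligns that sequence against the dictionary pronunciation with a standard edit-distance alignment and tallies the error types the paragraph describes. The ARPAbet-style symbols, the `align` and `error_profile` functions, and the category names are illustrative choices, not any product's method. The second example also shows the dialect problem: if a recognizer that has not been calibrated on NZ speech transcribes a child's raised DRESS vowel in "bed" as the KIT vowel, the tally reports a vowel substitution that a NZ teacher would not count as an error.

```python
from collections import Counter

# ARPAbet-style vowel symbols (an illustrative US-English phone set).
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
          "IH", "IY", "OW", "OY", "UH", "UW"}


def align(expected, heard):
    """Edit-distance alignment of two phone lists.

    Returns (expected_phone, heard_phone) pairs; None marks a gap.
    """
    n, m = len(expected), len(heard)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        cost[i][0] = i
    for j in range(m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = int(expected[i - 1] != heard[j - 1])
            cost[i][j] = min(cost[i - 1][j - 1] + mismatch,  # match or substitution
                             cost[i - 1][j] + 1,             # expected phone omitted
                             cost[i][j - 1] + 1)             # extra phone inserted
    pairs, i, j = [], n, m
    while i > 0 or j > 0:  # trace back one optimal path
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + int(expected[i - 1] != heard[j - 1]):
            pairs.append((expected[i - 1], heard[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((expected[i - 1], None))
            i -= 1
        else:
            pairs.append((None, heard[j - 1]))
            j -= 1
    return pairs[::-1]


def error_profile(expected, heard):
    """Tally error types for one word attempt."""
    pairs = align(expected, heard)
    # Expected phones dropped after the last phone the child produced
    # count as an omitted word ending.
    last_heard = max((k for k, (_, h) in enumerate(pairs) if h is not None), default=-1)
    counts = Counter()
    for k, (e, h) in enumerate(pairs):
        if e == h:
            continue
        if e is None:
            counts["insertion"] += 1
        elif h is None:
            counts["ending_omission" if k > last_heard else "medial_omission"] += 1
        elif e in VOWELS and h in VOWELS:
            counts["vowel_substitution"] += 1
        else:
            counts["consonant_or_mixed_substitution"] += 1
    return counts


# Hypothetical recognizer output compared with lexicon pronunciations.
print(error_profile(["JH", "AH", "M", "P", "T"], ["JH", "AH", "M", "P"]))  # "jumped" read as "jump"
print(error_profile(["B", "EH", "D"], ["B", "IH", "D"]))                   # "bed" heard with KIT vowel
```

Summed over a whole passage, these per-word tallies are what distinguish the consistent vowel-substituter from the child who drops endings.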

Amira Learning, which we backed at pre-seed in 2023, is building specifically for this constraint. Their literacy assessment tool collects oral reading samples from primary students in real classroom conditions, not in quiet recording booths but in the actual acoustic environment of a NZ primary classroom, and processes them through models calibrated against NZ-specific student speech data. That calibration work is painstaking and unglamorous, but it is the reason their accuracy on NZ student populations is substantially better than that of off-the-shelf solutions. The output is a running-record-style report that a teacher can review and augment with their professional judgment, not a black-box reading-level score. The design philosophy, machine learning as an amplifier of teacher expertise rather than a replacement for it, is the right one for primary school contexts, where teachers' professional trust in a tool is a prerequisite for adoption.
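The article does not say how calibration or accuracy is measured. One common way to check an assessment tool against human experts is chance-corrected agreement between the model's word-level marks and a teacher's marks on the same recordings, reported separately for each student population so that a gap between groups becomes visible. The sketch below uses Cohen's kappa for this; the metric, the `agreement_by_group` helper, and the toy marks are assumptions for illustration, not Amira Learning's method or data.

```python
from collections import Counter


def cohens_kappa(model_marks, teacher_marks):
    """Chance-corrected agreement between two raters marking the same words."""
    if len(model_marks) != len(teacher_marks) or not model_marks:
        raise ValueError("need two non-empty mark lists of equal length")
    n = len(model_marks)
    observed = sum(m == t for m, t in zip(model_marks, teacher_marks)) / n
    # Agreement expected by chance, given each rater's label frequencies.
    model_freq, teacher_freq = Counter(model_marks), Counter(teacher_marks)
    expected = sum(model_freq[lab] * teacher_freq[lab] for lab in model_freq) / (n * n)
    if expected == 1:
        # Both raters used one identical label throughout; kappa is
        # undefined here, and this sketch reports 1.0.
        return 1.0
    return (observed - expected) / (1 - expected)


def agreement_by_group(records):
    """records: iterable of (group, model_mark, teacher_mark) tuples."""
    grouped = {}
    for group, model_mark, teacher_mark in records:
        models, teachers = grouped.setdefault(group, ([], []))
        models.append(model_mark)
        teachers.append(teacher_mark)
    return {g: cohens_kappa(m, t) for g, (m, t) in grouped.items()}


# Toy marks for six words; not real data.
model = ["correct", "correct", "substitution", "omission", "correct", "substitution"]
teacher = ["correct", "correct", "substitution", "correct", "correct", "substitution"]
print(round(cohens_kappa(model, teacher), 2))  # 0.7
```

Kappa is preferred over raw percent agreement here because most words in a levelled text are read correctly, so a model that marks everything "correct" would score high on raw agreement while catching no errors.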

The broader argument for machine learning in early literacy is not that it will replace skilled reading teachers. Reading Recovery teachers, specialist literacy teachers, and well-trained classroom teachers who understand the simple view of reading are not going to be made redundant by software that analyses oral reading samples. The argument is that the assessment bottleneck — the constraint on how frequently and precisely a teacher can assess each student's progress — is currently limiting the effectiveness of the interventions that follow. More frequent, more precise assessment doesn't help students by itself; it helps students by allowing teachers to make better-targeted instructional decisions earlier. The software is doing what software does well: reducing the administrative overhead of a process that a human expert needs to drive. That's a useful role, and it's the right framing for machine learning in this context.