2 October 2023 · James Kopu

Formative Assessment as an AI Signal Layer

Formative assessment data — the small, frequent checks teachers run throughout a lesson — is the richest signal available to adaptive learning systems. Most EdTech products ignore it entirely.

There's a category of data in education that has been sitting in classrooms for decades, being generated constantly, and being almost entirely wasted by the software systems that schools use. That category is formative assessment data — the small, frequent checks that teachers run throughout a lesson to gauge whether students are with them. An exit ticket at the end of a period. A show-of-hands response to a worked example. A three-question diagnostic at the start of class. A running record taken during a guided reading session. These signals are pedagogically precious and technically almost useless inside most EdTech products, because most EdTech products were built to store and display summative assessment data — the grades, the NCEA credits, the end-of-unit scores that have to flow into student management systems and parent reports.

The distinction matters for anyone building an adaptive learning system. Summative data tells you what a student knew at the end of a unit. Formative data tells you where comprehension broke down during instruction, which is a different and more actionable signal. In a well-designed adaptive system built on Item Response Theory (IRT), the formative signal drives the next instructional decision: whether to re-expose the student to the same concept with a different representation, branch to a prerequisite skill, or advance to the next node in the curriculum graph. The problem is that IRT models need calibrated item banks: items whose parameters, such as difficulty, have been estimated across a sufficiently large student population. Most formative assessment instruments used by NZ and Australian teachers were not built with IRT calibration in mind; they were built for human-readable interpretability, which is a different design constraint. The AI opportunity here is in building the bridge: instrumentation that is interpretable for teachers and machine-readable at the item level.
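
To make the decision loop concrete, here is a minimal sketch assuming the simplest IRT variant, the Rasch (one-parameter) model, and an item bank whose difficulties are already calibrated. The function names, the standard-normal prior, and the 0.8 / 0.4 thresholds are our own illustrative assumptions, not a description of any particular product.

```python
import math

def p_correct(theta: float, b: float) -> float:
    """Rasch model: probability that a student with ability theta
    answers an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def update_ability(theta, responses, prior_mean=0.0, prior_var=1.0,
                   lr=0.5, steps=25):
    """Re-estimate ability from formative responses.

    responses: list of (item_difficulty, was_correct) pairs.
    Runs gradient ascent on the Rasch log-likelihood plus a normal prior
    (assumed here so that all-correct or all-wrong sets stay finite).
    """
    for _ in range(steps):
        grad = sum((1.0 if ok else 0.0) - p_correct(theta, b)
                   for b, ok in responses)
        grad -= (theta - prior_mean) / prior_var
        theta += lr * grad / (len(responses) + 1)
    return theta

def next_step(theta, concept_difficulty, advance_at=0.8, remediate_below=0.4):
    """Map the ability estimate to one of the three instructional moves.
    Thresholds are hypothetical and would need tuning with teachers."""
    p = p_correct(theta, concept_difficulty)
    if p >= advance_at:
        return "advance"
    if p < remediate_below:
        return "branch_to_prerequisite"
    return "re_expose_new_representation"

# Example: a three-question diagnostic, all answered incorrectly.
theta = update_ability(0.0, [(-0.5, False), (0.0, False), (0.5, False)])
decision = next_step(theta, concept_difficulty=0.0)
```

The point of the sketch is the dependency it exposes: `update_ability` only works if each response arrives tagged with a calibrated item difficulty, which is exactly what most classroom instruments lack.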

Consider what this looks like in practice. A Year 4 teacher in a Christchurch primary runs a guided reading group of six students using a running record protocol — noting every miscue, substitution, and self-correction as students read aloud from a levelled text. A skilled teacher can interpret a running record in real time, make instructional decisions within seconds, and update her mental model of each student's phonological awareness and reading fluency. But that interpretation lives in her head. It doesn't flow into the LMS. It doesn't update the adaptive path on the student's literacy app. It doesn't inform the classroom aide who works with the same students in the afternoon. The formative signal was generated, processed by an expert, and then lost to the system. Tools like Amira Learning are starting to address this — using speech-to-text and phoneme-level analysis to make the running record machine-readable without replacing the teacher's interpretive role. That's the right design pattern: instrument the signal, preserve the teacher's authority over its interpretation.
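
One way to picture a machine-readable running record is as a list of word-level events plus a slot for the teacher's own reading of them. The sketch below is hypothetical: the class names, fields, and the two summary figures are our assumptions for illustration, and it does not reflect Amira Learning's or any other vendor's data model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional

class Outcome(Enum):
    CORRECT = "correct"
    ERROR = "error"                      # miscue left uncorrected
    SELF_CORRECTION = "self_correction"  # miscue the student fixed unaided

@dataclass(frozen=True)
class WordEvent:
    position: int      # index of the word in the levelled text
    text_word: str     # what was printed
    spoken: str        # what the student said first
    outcome: Outcome

@dataclass
class RunningRecord:
    student_id: str
    text_level: str
    events: List[WordEvent]
    # The teacher's interpretation stays a first-class field; nothing
    # computed below overwrites it.
    teacher_interpretation: Optional[str] = None

    def accuracy(self) -> float:
        """Share of words read correctly, counting self-corrections as correct."""
        if not self.events:
            raise ValueError("running record has no events")
        errors = sum(e.outcome is Outcome.ERROR for e in self.events)
        return (len(self.events) - errors) / len(self.events)

    def self_correction_rate(self) -> Optional[float]:
        """Share of miscues the student corrected; None if there were no miscues."""
        sc = sum(e.outcome is Outcome.SELF_CORRECTION for e in self.events)
        err = sum(e.outcome is Outcome.ERROR for e in self.events)
        return sc / (sc + err) if (sc + err) else None
```

Keeping the item-level events alongside the teacher's note is the design choice that matters: downstream systems can consume the events, while the interpretation remains hers.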

We're not suggesting that every formative assessment moment needs to be digitised. The overhead of instrumenting low-stakes classroom checks would destroy their pedagogical value; the power of an exit ticket is precisely that students and teachers treat it as low stakes. What we are arguing is that the class of formative signals that already produce some structured data output (running records, diagnostic assessments, spaced-repetition recall checks) is chronically under-exploited as an input to adaptive systems. A student who fails a phoneme-segmentation diagnostic on Monday and then shows a self-correction pattern in Tuesday's running record is exhibiting a specific and identifiable pattern in the reading acquisition curve. That pattern should update the instructional model. At present, in most NZ primary classrooms using any combination of the available tools, it doesn't. The adaptive system and the formative assessment tool operate in separate silos, with a teacher's memory and professional judgment as the only integration layer.
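
The Monday/Tuesday example can be sketched as a rule over a shared stream of signals from two tools. Everything here is assumed for illustration: the `Signal` shape, the `kind` labels, and the thresholds and three-day window are hypothetical, and a real rule would be set with teachers rather than hard-coded.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, List

@dataclass(frozen=True)
class Signal:
    student_id: str
    day: date
    source: str   # which tool emitted it, e.g. "diagnostic", "running_record"
    kind: str     # e.g. "phoneme_segmentation", "self_correction"
    value: float  # score or rate in [0, 1]

def find_pattern(signals: Iterable[Signal], window_days: int = 3,
                 fail_below: float = 0.5, sc_at_least: float = 0.3) -> List[str]:
    """Return ids of students who failed a phoneme-segmentation diagnostic
    and then showed self-correction in a running record within the window."""
    signals = list(signals)
    flagged = set()
    fails = [s for s in signals
             if s.kind == "phoneme_segmentation" and s.value < fail_below]
    for f in fails:
        for s in signals:
            gap = s.day - f.day
            if (s.student_id == f.student_id
                    and s.kind == "self_correction"
                    and s.value >= sc_at_least
                    and timedelta(0) < gap <= timedelta(days=window_days)):
                flagged.add(f.student_id)
    return sorted(flagged)

# Example: two students, two tools, one shared stream.
stream = [
    Signal("a", date(2023, 10, 2), "diagnostic", "phoneme_segmentation", 0.2),
    Signal("a", date(2023, 10, 3), "running_record", "self_correction", 0.5),
    Signal("b", date(2023, 10, 2), "diagnostic", "phoneme_segmentation", 0.9),
    Signal("b", date(2023, 10, 3), "running_record", "self_correction", 0.5),
]
to_review = find_pattern(stream)
```

The rule itself is trivial; what is missing in practice is the shared stream, since the two signals usually live in tools that never exchange data.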

The infrastructure investment thesis that flows from this is straightforward: the most valuable position in the EdTech stack is not the adaptive learning engine itself, but the assessment instrumentation layer that feeds it. An engine without signal is just a recommendation algorithm with no training data. The companies that figure out how to make the formative signal machine-readable — without increasing teacher burden — will have a defensible infrastructure position that every downstream adaptive product will need to integrate with. That's the assessment-as-API play, and it's the reason we think the assessment layer is the highest-value infrastructure bet in education technology right now.