We are at an early stage of rebuilding something fundamental about how institutions store, transmit, and certify knowledge, and the companies that get the infrastructure right in this window will define the operating layer that every knowledge-intensive organisation depends on for the next twenty years. This is not a prediction about AI capabilities. It's an observation about where the infrastructure gaps are: the knowledge layer of most institutions is as underbuilt as the customer relationship layer was before CRM software normalised structured data about customers. The category exists; adequate software to support it doesn't yet.
Knowledge infrastructure, as a category, is more specific than it sounds. It doesn't mean all software that handles information. It means the systems that do three specific things: they make implicit institutional knowledge explicit and structured; they maintain the provenance and credibility of knowledge claims over time; and they make knowledge findable and usable by people who didn't create it, without requiring those people to understand how the knowledge is organised. Most existing systems do one of these things partially. Traditional enterprise document management platforms make knowledge nominally structured but fail catastrophically on findability and provenance. Email threads make knowledge explicit in the moment but destroy both structure and provenance immediately. Wikis maintain structure but have profound provenance and trust problems. The design challenge is building systems that handle all three requirements without requiring so much maintenance overhead that the knowledge fails to stay current.
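To make the three requirements concrete, here is a minimal sketch in Python of what a record satisfying all of them might look like. Every name in it (`Provenance`, `KnowledgeItem`, `KnowledgeStore`, the field names, the one-year review window) is a hypothetical chosen for illustration, not a description of any existing product, and the keyword search is a deliberately naive stand-in for whatever retrieval a real system would use.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Set


@dataclass(frozen=True)
class Provenance:
    """Who stated the knowledge, where it came from, and when it was last checked."""
    author: str
    source: str
    recorded_on: date
    last_reviewed_on: date


@dataclass
class KnowledgeItem:
    """Requirement 1: implicit knowledge made explicit and structured."""
    title: str
    body: str
    topics: Set[str]
    provenance: Provenance  # Requirement 2: provenance travels with the claim.

    def is_stale(self, today: date, max_age_days: int = 365) -> bool:
        # Assumption: an item not reviewed within max_age_days should be flagged.
        return (today - self.provenance.last_reviewed_on).days > max_age_days


class KnowledgeStore:
    """Requirement 3: findable without knowing how the store is organised."""

    def __init__(self) -> None:
        self._items: List[KnowledgeItem] = []

    def add(self, item: KnowledgeItem) -> None:
        self._items.append(item)

    def search(self, query: str) -> List[KnowledgeItem]:
        # Naive match: every query word must appear somewhere in the item's
        # title, body, or topics (case-insensitive substring match).
        words = query.lower().split()
        hits = []
        for item in self._items:
            haystack = " ".join([item.title, item.body, *sorted(item.topics)]).lower()
            if all(word in haystack for word in words):
                hits.append(item)
        return hits


# Illustrative data only.
store = KnowledgeStore()
store.add(KnowledgeItem(
    title="Fractions sequence",
    body="Teach equivalence before addition of fractions.",
    topics={"curriculum", "mathematics"},
    provenance=Provenance(
        author="Head of Maths",
        source="department handbook",
        recorded_on=date(2021, 9, 1),
        last_reviewed_on=date(2022, 9, 1),
    ),
))
hits = store.search("fractions equivalence")
```

The point of the sketch is the maintenance trade-off: the staleness check is cheap to compute precisely because the review date is stored alongside the claim rather than reconstructed later.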
In education specifically, the knowledge infrastructure problem takes three forms that we think about constantly. The first is curriculum knowledge: the structured representation of what is to be taught, in what sequence, at what depth, and aligned to which standards, together with how that curriculum evolves over time as standards change and teaching practice improves. This knowledge currently lives in documents, in teachers' heads, and in the professional memory of long-serving heads of department. When those people leave, the knowledge leaves with them. The second is student learning knowledge: the accumulated record of what each student has learned, what they've struggled with, and what pedagogical approaches have worked for them. Within a school, this knowledge is scattered across the databases of five or six different tools that don't talk to each other, and it is lost entirely when a student moves schools. The third is professional teaching knowledge: the practitioner wisdom that experienced teachers develop about how to teach specific concepts effectively, which misconceptions are common, and which explanations work for which types of learners. This is almost entirely tacit and almost entirely lost when experienced teachers retire.
The reason we describe this decade as the knowledge infrastructure decade is not that the technology has just become capable: vector databases, semantic search, and retrieval-augmented generation have been commercially available for two or three years. The reason is that the institutional appetite for investing in knowledge infrastructure is now reaching the threshold at which enterprise buyers will actually budget for it. The COVID-period exposure of knowledge infrastructure failures (the scramble to manage distributed learning, the discovery that curriculum documentation existed only in physical binders that couldn't be accessed remotely, the realisation that student progress data was scattered across tools with no common data model) accelerated an institutional awareness that was already growing. The buyer is more ready than it was five years ago.
We want to be honest about what we don't know here. The timing of major infrastructure investment cycles is hard to predict precisely. The companies that will build the dominant knowledge infrastructure platforms may not be buildable from seed stage: the sales cycles and deployment complexity may require more institutional capital than a seed-stage fund can support through to scale. We hold this thesis partly as a direct investment lens and partly as a framing device for evaluating whether seed-stage companies are building with the right architectural assumptions to be relevant in the infrastructure landscape we believe is coming. A company that is building with proprietary data formats and closed APIs is making a different bet about the knowledge infrastructure future than one that is building with open standards, interoperable data models, and a design philosophy that assumes its product will be one component of a larger institutional knowledge system. We favour the latter, for reasons that will either be vindicated by where the market goes or educate us significantly if we're wrong.