AI · Early Mathematics · GenAI · Learning Science · EdTech

The AI Mirror: How GenAI Reflects and Amplifies Gaps in Early Math Expertise

After over a thousand hours working with large language models on early math content, a clear pattern emerged: AI doesn't invent mathematical misconceptions — it faithfully reflects and amplifies the ones already embedded in our educational ecosystem.

Anastasia Betts, Ph.D.
June 10, 2025

As a learning scientist and expert in early mathematics, I've spent the past year in an unexpected partnership — teaching AI systems about how young children learn math. My work centers on developing content for PAL.guide, a digital tool that helps parents and teachers support children's early math development through hands-on activities.

In working with large language models (LLMs) for PAL, I discovered they often present poor math understanding with the polish of expertise. These weren't one-off errors — they were persistent patterns across platforms (Claude, ChatGPT, Llama, Mistral, etc.) and math domains. From numeracy to measurement to geometry and beyond, LLMs routinely crossed developmental boundaries, confused concepts, and misaligned activities — often even after being repeatedly corrected.

Why does this matter? Because these AI systems are fast becoming the go-to resource for the very educators and parents who need the most support in early math, and those same parents and teachers often lack the specialized knowledge to detect subtle but critical errors. When a preschool teacher struggling with math confidence turns to ChatGPT for help, they risk receiving flawed content delivered with the very authority and polish that make these tools so appealing.

A Case Study About Early Measurement

While the full set of examples from over a thousand hours of work is too extensive to share, one revealing case study highlights the systemic nature of these challenges.

The Knowledge Unit (KU) in Focus: Shows awareness of size, and uses appropriate language to describe such as big, small, giant, tiny, etc.

In our knowledge model, this KU is among the earliest in the measurement progression for 4–5-year-olds. Yet as I worked with the AI to develop aligned activities, its responses repeatedly targeted more advanced concepts, despite clear guidance on the intended skill.

The Initial Misalignment

When asked to create activities for this basic awareness skill, ChatGPT immediately jumped several developmental steps ahead:

Me: Let's try the next KU, which is "Shows awareness of size, and uses appropriate language to describe such as big, small, giant, tiny, etc."

ChatGPT: Size Sort — Gather household objects of different sizes (e.g., a spoon, a book, a pillow). Help [the learner] sort them into groups by size: big, medium, and small.

This initial suggestion reflects a fundamental misunderstanding: sorting requires a more advanced skill than simply recognizing size. Notably, the GenAI had already reviewed the full KU progression and discussed where this foundational KU fits, well before sorting, classifying, or ordering appear later in the sequence.

Recognizing the Error — But Not Correcting It

Even after I pointed out the mismatch, and even after ChatGPT acknowledged the error, it continued to produce the same developmental misalignments. This revealed a key insight: recognizing a mistake doesn't guarantee a change in a GenAI's output.

The pattern continued with concept conflation. ChatGPT suggested telling parents: "The tree is tall and wide, but the flower is very short and small."

The problem here is that "short" is an aspect of length, while "small" is an aspect of size — describing different mathematical attributes that children must learn to differentiate. When these words are treated as interchangeable, several problems arise:

  • Measurement Confusion: Size may refer to overall dimensions or volume, while length specifically refers to one-dimensional measurement
  • Classification Errors: When children confuse size and length, they struggle with categorization — essential for all mathematical reasoning
  • Problem-Solving Mistakes: A child may assume a taller container holds more water, even if it's narrower
  • Compounding Consequences: These distinctions form the foundation for area vs. perimeter, volume vs. surface area, and multi-dimensional measurement later in schooling

The Triple Expertise Gap

As I've reflected on my work with LLMs, I've come to recognize what I call the "triple expertise gap" in early mathematics — a cascading problem that begins with misconceptions about the complexity of early math.

The Practitioner Gap: Most early childhood educators lack consistent, domain-specific knowledge in early mathematics. This isn't a failing of individual educators — it's a systemic failing that spans teacher preparation, professional development, and educational priorities. Recent studies show that preschool teachers identify age-appropriate numeracy skills for 4-year-olds with only 66% accuracy, and that teacher ratings agree with direct assessments of children's math abilities only 66% of the time.

The Perception Gap: The widespread belief that early math is "simple" leads to chronic underinvestment in curriculum and teacher preparation. When adults see young children naming shapes or counting to 10, they often think, "How hard can this be?" But these foundational skills — like one-to-one correspondence or spatial reasoning — form the cognitive architecture for all later mathematics. Misunderstandings at this level don't vanish; they compound over time.

The Infrastructure Gap: True expertise in early math is rare. Unlike early literacy, which benefits from decades of research and training infrastructure, early math remains a niche specialization. This scarcity extends beyond classrooms, into the AI systems now being trained on our fragmented and imprecise knowledge base.

AI Isn't Inventing the Problems — We Are

Throughout this project, I encountered a wide range of persistent mathematical misunderstandings across GenAI platforms. In number sense alone, GenAIs frequently confused verbal count sequence with one-to-one correspondence, suggested backward counting when asked for forward skip counting, and offered numeral recognition activities when the KU targeted oral counting. In geometry, they often confused different types of symmetry. They consistently blurred conceptual lines between subdomains of measurement.

These issues weren't isolated to a single KU, activity, AI system, or mathematical domain. Across multiple GenAIs, the same misunderstandings appeared repeatedly — mirroring the kinds of confusion that already exist in early childhood education.

When GenAI suggests an activity that conflates size and length, it's not just making a technical mistake. It's echoing the same conceptual confusions that arise when humans lack foundational expertise. Because AI is trained on flawed human knowledge, it doesn't just reflect these gaps — it reinforces and scales them.

The Challenge and the Opportunity

As generative AI tools become more embedded in everyday educational practice, the stakes grow. Those least confident in early math are the most likely to turn to GenAI for help — and the least likely to spot its mistakes.

But this is also an opportunity. The AI mirror doesn't just reveal gaps in machine understanding; it exposes deep cracks in our human knowledge infrastructure. Through the field of learning engineering, we're uniquely positioned to address this. We can build high-quality, research-grounded datasets and knowledge models that reflect what we actually know about how children learn math. We can design tools that validate AI-generated content against research-grounded developmental progressions.
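To make the validation idea concrete, here is a minimal sketch of how a tool might check AI-generated activity text against a knowledge-model progression. Everything in it is hypothetical: the class names, the KU identifiers, the keyword lists, and the keyword-matching approach are illustrative assumptions, not PAL.guide's actual model or method.

```python
# Hypothetical sketch: flag activity wording that suggests a later KU in the
# progression. A real validator would use richer developmental metadata than
# keyword matching; this only illustrates the structure of such a check.
from dataclasses import dataclass, field


@dataclass
class KnowledgeUnit:
    ku_id: str
    description: str
    level: int  # position in the developmental progression
    # Verbs associated with later KUs; their presence suggests misalignment.
    later_skill_terms: set = field(default_factory=set)


# A toy slice of a measurement progression for 4-5-year-olds (invented IDs).
PROGRESSION = {
    "MEAS-01": KnowledgeUnit(
        "MEAS-01",
        "Shows awareness of size and uses words like big, small, giant, tiny",
        level=1,
        later_skill_terms={"sort", "order", "compare", "measure"},
    ),
    "MEAS-04": KnowledgeUnit(
        "MEAS-04",
        "Sorts objects into groups by size",
        level=4,
        later_skill_terms={"measure", "unit"},
    ),
}


def validate_activity(ku_id: str, activity_text: str) -> list[str]:
    """Return terms in the activity that belong to later KUs (empty = aligned)."""
    ku = PROGRESSION[ku_id]
    words = set(activity_text.lower().split())
    return sorted(ku.later_skill_terms & words)


# The "Size Sort" suggestion from the case study would be flagged:
print(validate_activity("MEAS-01", "Help the learner sort them into groups by size"))
# A size-awareness activity passes:
print(validate_activity("MEAS-01", "Point to the big ball and the tiny ball"))
```

Even this crude check catches the misalignment in the case study above: the "Size Sort" activity uses sorting language that belongs several steps later in the progression, while a pure awareness activity passes cleanly.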

If we want AI to be part of the solution in early math education, we must first ensure that it is learning from the best data we can provide. That means investing in the infrastructure — human and technical — that will guide machines and educators toward deeper, more accurate understanding.

This is not just a call to fix AI. It's a call to fix what AI has revealed: the urgent need for better shared knowledge about how young children build their earliest mathematical foundations.


Originally published in The Cutting Ed by The Learning Agency.