Everyone in artificial intelligence is lying to themselves about the same thing.

It isn't a malicious lie. It is not even, for the most part, a conscious one. It is the sort of comforting, collective delusion that can only arise when billions of dollars in capital, thousands of academic careers, and the entire public imagination depend on believing something that isn't quite true.

The lie is that we are building intelligence. That by scaling our models, by feeding them the entirety of the internet, by piling on more layers and parameters than there are stars in the galaxy, we will eventually cross some magical threshold into genuine machine cognition.

We won't. Not on this path.

What we are building are not fledgling minds. They are sophisticated parrots. We have poured the world's knowledge into systems that can only ever mimic the patterns within it. We have created a miracle of mimicry, a marvel of statistical reflection, and we have become so mesmerised by the fidelity of the reflection that we have mistaken it for the real thing. We have built the most sophisticated parrot in history and are trying to convince ourselves it is a hawk.

The parrot can repeat anything you say, often more eloquently than you said it. The hawk, however, understands the world. It adapts, it learns from every gust of wind, it changes its strategy based on the rustle of leaves in the undergrowth. One is a static marvel of repetition. The other is a dynamic engine of learning.

This distinction is not academic pedantry. It is the most important problem in our field. And our collective failure to grasp it is leading us down a path toward a strange and powerful new kind of stupidity-a dumb superintelligence.

The Gospel of Scale - A Fair Hearing for the Orthodoxy

Before I dismantle the prevailing view, I must give it a fair hearing. To do otherwise would be intellectually dishonest. The argument for scale as the primary driver of intelligence-the orthodoxy that rules from Silicon Valley to Shenzhen-is both powerful and seductive. Let me steel-man it for you.

First, emergent abilities are undeniably real. A model like GPT-2 could barely string a coherent sentence together. GPT-4 can write code, pass the bar exam, and explain complex scientific concepts. Somewhere between those two points, new capabilities appeared that were not explicitly programmed. They "emerged" from the sheer scale of the model and its training data. Proponents of scale argue that this process is not finished. What other, more profound abilities lie dormant, waiting for us to build a model large enough to awaken them?

Second, history seems to be on their side. The computer scientist Rich Sutton articulated this in his famous essay, "The Bitter Lesson." He observed that for 70 years, AI researchers who tried to build explicit knowledge and clever reasoning algorithms into their systems were consistently outperformed by those who simply leveraged more computational power. General methods that scale with compute, like search and learning, have always won. The lesson? Stop trying to be clever and just build bigger computers and bigger models.

Finally, the practical results are staggering. These "parrots" are already transforming industries. They are co-pilots for programmers, creative partners for artists, and powerful analytical engines for scientists. When a tool is this useful, it is easy to forgive its fundamental limitations. It is tempting to believe that whatever is powering this utility must be a form of genuine intelligence.

This is a strong position. It is backed by billions in investment and tangible, world-changing results. It is also, I believe, fundamentally wrong. It mistakes performance for competence, and mimicry for understanding.

The Cracks Appear - The Parrot in the Machine

The illusion of learning persists because, for most tasks, it looks identical to the real thing. But under pressure, at the edges, the cracks in the facade begin to show.

The most damning piece of evidence is a condition I call architectural anterograde amnesia. In neuroscience, this is a devastating condition where a person can no longer form new long-term memories. Their past is intact, but their future is a perpetually resetting present. This is the precise condition of every Large Language Model in existence.

After its gargantuan training run is complete, the model is fundamentally frozen. Its weights-the very substrate of its "knowledge"-are static. You can show it new information in a prompt, and it can use that information masterfully within that single conversation. This is what we call "in-context learning." But the moment that context window slides, that new information is gone forever. It has not been integrated. No learning has occurred. The model has not updated its worldview. It is a brilliant conversationalist with the memory of a goldfish.
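To make the point concrete, here is a minimal sketch in PyTorch. The model below is a toy stand-in, not any production LLM; the point is only that however much "new information" you feed a frozen model at inference time, not a single parameter moves.

```python
import torch
import torch.nn as nn

# A toy stand-in for an LLM. Inference reads the weights; it never writes them.
class TinyLM(nn.Module):
    def __init__(self, vocab=256, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.block = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.head = nn.Linear(dim, vocab)

    def forward(self, ids):
        mask = nn.Transformer.generate_square_subsequent_mask(ids.size(1))
        return self.head(self.block(self.embed(ids), src_mask=mask))

model = TinyLM().eval()
before = {k: v.clone() for k, v in model.state_dict().items()}

prompt = torch.randint(0, 256, (1, 32))            # stands in for "new information"
with torch.no_grad():                              # inference: no gradients, no updates
    for _ in range(16):                            # "in-context learning" happens here
        next_id = model(prompt)[:, -1].argmax(-1, keepdim=True)
        prompt = torch.cat([prompt, next_id], dim=1)

after = model.state_dict()
assert all(torch.equal(before[k], after[k]) for k in before)   # nothing has changed
```

The assertion at the end always passes. Whatever happened inside that conversation, the model that finishes it is bit-for-bit the model that started it.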

This isn't a bug we can fix with a clever patch. It is a fundamental feature of the architecture. Transformer-based models are interpolation engines. They are designed to find the most statistically probable path through the vast, high-dimensional space of their training data. They take a prompt and find a plausible-sounding continuation based on the trillions of patterns they have already seen. They are not designed to encounter a genuinely novel piece of information and say, "Aha, this changes everything." The very concept of "changing everything" is alien to their structure.
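Strip away the scale and the whole generative act reduces to this: a probability distribution over candidate next tokens, and a draw from it. The numbers below are invented for illustration; what matters is what is absent, namely any branch in which a surprising observation rewrites the distribution itself.

```python
import torch

# Hypothetical scores for four candidate continuations of a prompt.
logits = torch.tensor([3.0, 1.5, 0.2, -1.0])
probs = torch.softmax(logits, dim=-1)                  # roughly [0.77, 0.17, 0.05, 0.01]
next_token = torch.multinomial(probs, num_samples=1)   # pick a plausible continuation
```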

This leads us to the uncomfortable truth. We haven't built a machine that learns. We have built a machine that has already learned everything it ever will. It is a crystallised intelligence, a snapshot of the internet frozen in time.

And this is why I call it a "dumb superintelligence." It possesses more factual knowledge than any human in history, yet it lacks the most basic component of true cognition: the ability to grow. It is a library that contains every book ever written, one that can remix their sentences into new paragraphs but can never, ever read a new book and add it to its shelves.

The Deeper Truth - From Static Patterns to Dynamic Growth

If scale is not the answer, what is? If larger static models only create more sophisticated parrots, how do we build the hawk?

The answer requires a paradigm shift. We must stop thinking about intelligence as something to be built and start thinking of it as a process to be cultivated. The goal is not to create a finished artefact of intelligence, but to design a system capable of growth. We need to move from static architectures to dynamic ones. We need to build what I call neuro-symbiotic frameworks.

These are systems designed not just to process information, but to change themselves in response to it. They are built around growth loops-feedback mechanisms that allow the model to learn from its experiences, update its internal representations, and become genuinely more capable over time.

This may sound abstract, but a recent, quietly revolutionary paper from Google Research gives us a concrete glimpse of what this future looks like. In "Nested Learning: The Illusion of Deep Learning Architectures," Ali Behrouz and his colleagues offer a powerful critique of the current paradigm and a compelling alternative.

Their core claim is that the "depth" in deep learning is an illusion. Stacking more layers, they argue, does not necessarily create a more computationally powerful model. True depth comes from creating a system of nested optimization problems that operate at different speeds, on different time scales-much like the different frequencies of waves in the human brain.

The Nested Learning (NL) framework reimagines a model as an ecosystem of learning processes. Critically, it reframes the optimizer-the algorithm, such as Adam or SGD, that trains the model-not as an external tool, but as an integral, fast-learning part of the model itself. The optimizer becomes an "associative memory" for gradients, an inner loop that is constantly learning how to learn more effectively.

This is no longer a static block of parameters. This is a system with multi-speed learning. There is a "fast" learning process happening at the inner levels, responding immediately to new data, and a "slow" learning process at the outer levels, consolidating this new experience into the model's core knowledge.
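A heavily simplified sketch can make the flavour of this concrete. What follows is not the paper's architecture or its mathematics; it is the generic fast-and-slow pattern, reduced to two linear layers: an inner learner that chases every new batch, and an outer set of weights that slowly consolidates what the inner one finds.

```python
import torch
import torch.nn as nn

# Illustrative multi-speed learning: fast weights respond immediately,
# slow weights consolidate. Not the Nested Learning paper's actual method.
slow = nn.Linear(16, 1)                                # outer, slow-changing level
fast = nn.Linear(16, 1)                                # inner, fast-changing level
fast.load_state_dict(slow.state_dict())

fast_opt = torch.optim.SGD(fast.parameters(), lr=1e-1)  # high learning rate
loss_fn = nn.MSELoss()
tau = 0.01                                             # slow consolidation rate

for step in range(1000):
    x = torch.randn(4, 16)
    y = x @ torch.ones(16, 1)                          # stand-in task

    # Inner loop: the fast level adapts to the newest experience right away.
    loss = loss_fn(fast(x), y)
    fast_opt.zero_grad()
    loss.backward()
    fast_opt.step()

    # Outer loop: the slow level drifts toward the fast level (consolidation).
    with torch.no_grad():
        for p_slow, p_fast in zip(slow.parameters(), fast.parameters()):
            p_slow.mul_(1 - tau).add_(tau * p_fast)

    # Periodically the fast level restarts from the consolidated slow level,
    # so the two time scales genuinely constrain each other.
    if step % 100 == 99:
        fast.load_state_dict(slow.state_dict())
```

Crude as it is, the sketch shows the shape of the idea: learning is no longer one monolithic event, but an ongoing negotiation between levels that change at different speeds.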

This is not just a theory. Behrouz's team built a proof-of-concept model called HOPE, based on these principles. It demonstrates superior performance on tasks requiring continual learning and long-context reasoning-precisely the areas where today's "parrots" fail most spectacularly. The Nested Learning framework provides the mathematical and architectural blueprint for the growth loops we so desperately need.

Implications - How to Build a Hawk

The implications of this shift are profound for anyone building or working with AI today.

First, we must temper our obsession with scale as the sole metric of progress. Chasing ever-larger parameter counts is a game of diminishing returns if the underlying architecture remains static. The future belongs not to bigger models, but to smarter, more dynamic architectures.

Second, we must begin designing for adaptation. The key question for an AI architect should no longer be "How much knowledge can I bake into this model?" but "What mechanisms have I provided for this model to acquire new knowledge and skills after deployment?" This means exploring self-modifying architectures, dynamic memory systems, and online learning protocols.
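As one illustrative pattern, and only that: freeze the pretrained core, attach a small residual adapter, and let only the adapter learn from the post-deployment stream. The module names and sizes below are invented for the sketch; the point is architectural, not a claim about how any existing system works.

```python
import torch
import torch.nn as nn

base = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 32))
for p in base.parameters():
    p.requires_grad_(False)                    # pretrained knowledge stays fixed

adapter = nn.Linear(32, 32)                    # small module that may keep learning
nn.init.zeros_(adapter.weight)                 # starts as a no-op correction
nn.init.zeros_(adapter.bias)
head = nn.Linear(32, 1)

opt = torch.optim.Adam(list(adapter.parameters()) + list(head.parameters()), lr=1e-3)

def predict(x):
    h = base(x)
    return head(h + adapter(h))                # residual correction on top of the base

# Online learning protocol: every deployed interaction can move the adapter,
# while the frozen base preserves what was learned before deployment.
for x, y in [(torch.randn(8, 32), torch.randn(8, 1)) for _ in range(100)]:
    loss = nn.functional.mse_loss(predict(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```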

Third, we need to rethink what memory is. The current "context window" is a crude and temporary solution. The Continuum Memory System proposed in the Nested Learning paper points towards a more biological approach-a spectrum of memory systems operating at different speeds, allowing for both the fleeting recall needed for conversation and the deep consolidation required for genuine learning.
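Purely as an illustration, and not as a description of the paper's Continuum Memory System: the simplest possible spectrum of memories is a set of stores that all receive the same stream of experience but decay at different rates, fast ones for fleeting recall, slow ones for durable consolidation.

```python
import torch

# Illustrative multi-timescale memory: the same signal is written into several
# stores that forget at different speeds. Not the paper's actual mechanism.
decays = [0.50, 0.95, 0.999]                   # fast, medium, slow time scales
stores = [torch.zeros(16) for _ in decays]

def write(observation: torch.Tensor):
    for i, d in enumerate(decays):
        stores[i] = d * stores[i] + (1 - d) * observation

def read() -> torch.Tensor:
    # A crude read-out: blend all time scales equally.
    return torch.stack(stores).mean(dim=0)

for _ in range(100):
    write(torch.randn(16))                     # a stream of experience
recalled = read()
```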

The ultimate goal is a change in mindset. We must move from being architects of static cathedrals of knowledge to being gardeners of evolving intellectual ecosystems. Our job is not to build the finished product, but to seed the initial conditions for growth and then nurture the system as it learns.

Conclusion - The Honest Path to Cognition

We have built an incredible parrot. Its ability to mimic human language and reasoning is a landmark achievement in engineering. We should be immensely proud of this creation.

But we must be honest about what it is. A mimic. A statistical shadow of the vast repository of text it was trained on.

Mistaking this parrot for a hawk-mistaking mimicry for true learning-is a dangerous delusion. It is a path that leads to a plateau, a dead end where we are surrounded by ever-larger, ever-more-convincing, but still fundamentally static artefacts.

The real work, the honest path to genuine machine cognition, begins now. It requires us to abandon the comforting simplicity of the gospel of scale and embrace the messier, more complex challenge of building systems that grow. It means trading the illusion of learning for the real thing. It means putting our parrot back in its cage, turning our attention to the hawk circling in the sky above, and asking just what it would take to build one of those instead.