Prelude: The Gladiator's Armor

Three years ago, I read the Steve Jobs biography. The part about the iPhone obsessed me. Not the design. The philosophy. "People don't know what they want until you show it to them," he said. He was right about phones. But somewhere along the way, the AI industry got it backwards. Everyone thinks they know exactly what they want. And they're usually asking for the wrong thing entirely.

Last week, Anthropic published their engineering blog post on "Code Execution with MCP." The post celebrated a 98.7% token reduction by having agents write code instead of making direct tool calls. The AI community erupted. "This is brilliant!" "This solves the context problem!" "This is the future of agents!"

But here's the thing: this isn't innovation. It's damage control. And the fact that we're celebrating it reveals something uncomfortable about how the AI ecosystem approaches problems.

Let me explain.

The Broken Design We Didn't Want to Admit

When the Model Context Protocol launched, it made sense. The AI industry needed a standard for how to expose tools to LLMs. Direct function definitions seemed logical as a first approach. Load the tool definitions into context, let the model pick which one to call, observe the results, iterate.
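
To picture what that first approach looks like in practice, here's a minimal sketch; the tool names and schemas are invented for illustration, not taken from any real server.

```typescript
// Era 1 pattern (sketch): every tool definition is serialized into the
// prompt before the model sees a single user request.
interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>; // JSON Schema for the arguments
}

// Invented examples; a real deployment might carry dozens of these.
const tools: ToolDefinition[] = [
  {
    name: "crm_update_record",
    description: "Update a CRM record by ID.",
    inputSchema: { type: "object", properties: { id: { type: "string" }, fields: { type: "object" } } },
  },
  {
    name: "drive_get_document",
    description: "Fetch the full text of a document from cloud storage.",
    inputSchema: { type: "object", properties: { documentId: { type: "string" } } },
  },
  // ...and so on, each definition paying a token cost on every request
];

// Everything goes into context up front, whether or not it's needed.
const systemPrompt = [
  "You can call the following tools:",
  ...tools.map((t) => JSON.stringify(t, null, 2)),
].join("\n");

console.log(systemPrompt.length, "characters of tool definitions before any work happens");
```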

Except it was broken from the start. Not in implementation. In design philosophy.

Here's why: LLMs aren't traditional software systems where more data equals better decisions. They're pattern-matching machines. Every token in context adds noise. And at a certain point, a point we're rapidly approaching, that noise overwhelms the signal.

Think of it like this. Imagine a librarian trying to answer your question. If you give her one shelf of relevant books, she finds your answer quickly. Give her two shelves, still good. But give her the entire library and ask her to answer your question in 30 seconds? She'll get confused. She'll miss the good information buried in mediocrity. She'll pattern-match on frequency instead of relevance.

That's what we did to LLMs with MCP.

We walked into the library and said: "Here are 47 tool definitions. Here's the documentation for each one. Here are examples. Here are edge cases. Now make a decision about which tool to use."

And the model did. But not well. Because it was drowning in context.

The Ecosystem's Honest Mistake

The fascinating part of this story is timing. For the first 18 months of MCP adoption, the industry tried to solve the problem by stuffing more context in. Bigger context windows, better tool definitions, more examples, smarter retrieval.

This was always the wrong approach. But we didn't know it yet.

What happened is what always happens in technology: real-world constraints force clarity. When context windows hit economic limits (when you're paying per token, when your latency is measured in seconds, when your system costs scale linearly with model calls), the obvious suddenly becomes unavoidable.

Context efficiency isn't a nice-to-have optimization. It's the entire game.

And if you look at actual MCP implementations in production, you see this clearly. The servers that work are the exceptions. They have extremely limited, well-curated toolsets. Maybe 3-5 tools, max. Anything more and you've instantly created a context nightmare.

Most MCP servers? They're architectural mistakes.

I can say this confidently because the data backs it up: Anthropic saves 98.7% of tokens by having agents write code instead of calling tools directly. And do you know what else is 98%? The percentage of MCP servers in the wild that violate basic architectural principles.

That's not a coincidence. That's a data point telling us something important.

What 98% of MCP Servers Got Wrong

The ecosystem is drowning in worthless MCP servers. Not because the people building them are incompetent. Because they started with the wrong assumption: that MCP is a user-facing tool integration standard.

It's not. It should have been backend infrastructure from day one.

But early on, MCP felt accessible. Developers could write a TypeScript file, define some tools, wire up an API, and call it an MCP server. It worked in demos. It worked in tutorials. It even worked with a single agent and a handful of tools.

Then you tried to scale it. You added a second MCP server. Then a third. Suddenly you're loading 50+ tool definitions into context before processing a single request. Your system is slow. Your tokens are burning. Your costs explode.

The problem isn't MCP itself. The problem is that 98% of MCP servers were built as if context was infinite.

Here's what a non-worthless MCP server actually requires:

OAuth and proper authentication. Not toy API keys. Real, production-grade OAuth 2.1 with PKCE, token introspection, audit logs. Most MCP servers? They skip this entirely.

Permission management with enforcement. Fine-grained access control. Agents should only be able to invoke tools they're explicitly permitted to use. Most implementations have no permission layer whatsoever.

State management. A database backing the state, not ephemeral memory. If your MCP server can't survive a restart without losing critical information, it's not production-ready. It's a prototype.

Observability. Monitoring, logging, traces. How are your tools being used? What's failing? What's slow? Most MCP servers have zero visibility.

Do you know how many MCP servers have all of these? A handful. Maybe 2% of the ecosystem. Probably less.

The rest are expensive reminders of what happens when you follow tutorials without questioning assumptions.
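
To make just one of those requirements concrete, here's a minimal sketch of a permission layer with enforcement and audit logging. Everything in it is illustrative: the types and helpers are invented for this post, not part of any MCP SDK.

```typescript
// Illustrative permission layer sitting in front of tool execution.
interface AgentIdentity {
  agentId: string;
  scopes: Set<string>; // e.g. "crm:read", "crm:write", granted via OAuth
}

interface ToolCall {
  tool: string;
  requiredScope: string;
  args: Record<string, unknown>;
}

// Stand-in sinks so the sketch is self-contained.
function auditLog(entry: object): void {
  console.log(JSON.stringify(entry));
}
async function executeTool(call: ToolCall): Promise<unknown> {
  return { ok: true, tool: call.tool }; // a real handler would hit a database-backed service
}

async function invokeTool(agent: AgentIdentity, call: ToolCall): Promise<unknown> {
  const allowed = agent.scopes.has(call.requiredScope);
  // Every decision is audited: who asked for what, and when.
  auditLog({ agentId: agent.agentId, tool: call.tool, allowed, at: new Date().toISOString() });
  if (!allowed) {
    throw new Error(`Agent ${agent.agentId} lacks scope ${call.requiredScope}`);
  }
  return executeTool(call);
}
```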

The Signal-to-Noise Ratio Nobody Talks About

Here's what the industry gets wrong about context windows: they think bigger is better.

"Claude 3.5 has 200k tokens!" "GPT-4o has a million tokens!" "If we just make context windows infinitely large, we can solve everything!"

This is backwards thinking. Context windows aren't a solution. They're a constraint you have to respect.

And the constraint isn't about token count. It's about signal-to-noise.

LLMs are pattern recognition systems. They find patterns by comparing signals across a dataset. More tokens means more data points. But it also means more noise. And at a certain ratio, noise overwhelms signal.

Think about forensics. If you give a detective one piece of evidence, it's compelling. Ten pieces? Still compelling. But give them a million pieces of evidence, 99% of which are irrelevant? They'll miss the actual crime.

That's what we've been doing. We've been handing LLMs massive amounts of context, most of it irrelevant noise, and wondering why they're underperforming.

The real optimization isn't bigger context windows. It's ruthless context curation. Progressive, on-demand context loading. Only the information that matters. Nothing else.
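
What does progressive loading look like in a code-execution world? A sketch, assuming tools are exposed as small modules on disk that the agent can discover and import one at a time; the directory layout and helpers are illustrative, not Anthropic's actual implementation.

```typescript
import { readdir } from "node:fs/promises";
import * as path from "node:path";

// Illustrative layout: each tool lives at ./tools/<server>/<tool>.ts.
// Nothing enters the model's context until the agent asks for it.

// Step 1: listing what exists costs a handful of tokens, not thousands.
async function listTools(root = "./tools"): Promise<string[]> {
  const servers = await readdir(root);
  const names: string[] = [];
  for (const server of servers) {
    const files = await readdir(path.join(root, server));
    names.push(...files.map((f) => `${server}/${f}`));
  }
  return names;
}

// Step 2: only the one definition the agent decides it needs gets loaded,
// so a single tool's interface enters context instead of fifty.
async function loadTool(toolPath: string): Promise<unknown> {
  return import(path.join(process.cwd(), "tools", toolPath));
}
```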

This isn't a temporary problem. It's permanent. Unless there's a fundamental paradigm shift in how LLMs work (and there's no evidence one is coming), we'll always be fighting against the signal-to-noise ratio.

Which means context efficiency doesn't stop being important. It becomes more important.

The Real Architecture: MCP as Backend, A2A as Frontend

Here's what's actually happening in the ecosystem, whether people realise it or not:

A2A (Agent-to-Agent protocol) is becoming the user-facing standard. It's where you define agent capabilities in a way humans can understand. "This agent can read emails, draft responses, and send them."

MCP is becoming what it should have been all along: a backend concern. Agents discover and invoke tools through code execution, but MCP operates behind the scenes, handling the infrastructure.

These aren't competing protocols. They're complementary. A2A provides the interface. MCP provides the plumbing.
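
As a rough illustration of that split, here's what the two layers might describe, assuming a simplified capability card in the spirit of A2A's agent cards; the field names are condensed for the sketch, not the full A2A schema.

```typescript
// The user-facing layer: a simplified, A2A-style capability description.
// This is all a caller ever needs to see.
const agentCard = {
  name: "email-assistant",
  description: "Reads incoming email, drafts responses, and sends them on approval.",
  skills: [
    { id: "read-email", description: "Summarise unread messages" },
    { id: "draft-reply", description: "Draft a response in the user's tone" },
    { id: "send-email", description: "Send an approved draft" },
  ],
};

// The plumbing: MCP servers the agent's runtime talks to behind the scenes.
// Callers never see this, and they shouldn't have to.
const backend = {
  mcpServers: ["mail-server", "contacts-server"],
  execution: "code", // tools are discovered and invoked via code execution
};

console.log(JSON.stringify({ agentCard, backend }, null, 2));
```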

Anthropic is rediscovering what A2A's designers already understood: context efficiency is fundamental. It's not an optimization. It's the requirement.

The protocol evolution looks like this:

Era 1 (2024): MCP as User-Facing Standard

  • Direct tool definitions in context
  • Upfront, bloated context
  • Doesn't scale beyond a few tools

Era 2 (2025): Code Execution with MCP

  • Agents write code instead of calling tools
  • On-demand tool discovery
  • Massive token efficiency gains
  • Still not perfect, but much better

Era 3 (2026-2027): MCP + A2A Maturity

  • A2A handles agent coordination and interfaces
  • MCP handles tool execution and state
  • Clear separation of concerns
  • Each protocol evolves within its domain

We're transitioning between Era 1 and Era 2 right now. The ecosystem is confused because the protocols are in flux.

What Gets Lost in the Noise

Here's what I think matters but nobody's discussing:

The 98% efficiency gain is impressive. But it's also incomplete. Code execution solves context efficiency. It doesn't solve security, permission management, state coordination, or observability.

Those battles are coming.

When you have agents writing arbitrary code to access tools, you've created a new attack surface. How do you prevent an agent from escalating privileges? How do you ensure it only accesses data it's authorised for? How do you audit what code it wrote and why?

These are hard problems. They're infrastructure problems. They require serious engineering.
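
To show the shape of the problem rather than a solution, here's a sketch of the policy decisions that have to wrap agent-written code. The sandbox itself (a container, a V8 isolate, a WASM runtime) is out of scope; every name here is hypothetical.

```typescript
// Hypothetical guardrails around agent-written code.
interface ExecutionRequest {
  agentId: string;
  code: string;             // the exact code the agent wrote
  requestedTools: string[]; // tools the code intends to touch
}

// Least privilege: only explicitly allowlisted tools are reachable at all.
const ALLOWED_TOOLS = new Set(["crm/readRecord", "mail/draftMessage"]);

function reviewExecution(req: ExecutionRequest): { allowed: boolean; reason?: string } {
  const denied = req.requestedTools.filter((t) => !ALLOWED_TOOLS.has(t));
  const allowed = denied.length === 0;

  // Audit trail: keep the code and the decision, so "what did the agent
  // write, and why was it allowed to run?" is always answerable.
  console.log(JSON.stringify({
    agentId: req.agentId,
    decision: allowed ? "allowed" : "denied",
    deniedTools: denied,
    codeLength: req.code.length,
    at: new Date().toISOString(),
  }));

  return allowed ? { allowed } : { allowed, reason: `tools not permitted: ${denied.join(", ")}` };
}
```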

And here's the uncomfortable truth: most teams won't build this correctly. Code execution with MCP requires proper infrastructure thinking, security discipline, and systems programming skill. The majority of teams adopting this will create vulnerabilities. Permission failures. Operational nightmares.

Because it's hard. And it will always be hard.

This is the moment the industry realises that proper agent systems require proper engineering. The era of toy tools, toy servers, and toy protocols is ending. What comes next is harder, more powerful, and only accessible to serious builders.

The Contrarian Takes

Let me say the things nobody wants to say:

Early MCP adoption was a mistake. The protocol wasn't ready. Too many companies built on MCP thinking it was production-ready, when it was really an early-stage experiment. The ecosystem wasted cycles on worthless servers and poor patterns.

Most developers won't implement this correctly. Code execution with MCP requires discipline. Security thinking. Systems programming knowledge. Most teams have none of these.

A2A already solved this. The fact that Anthropic is discovering code execution as a superior pattern suggests they weren't paying enough attention to other protocol design efforts. A2A's designers understood context efficiency from day one.

The 98% stat cuts both ways. Yes, code execution saves tokens. But it also reveals that 98% of MCP servers were architectural mistakes. That's not a small problem to sweep past.

Practical Reality: What This Means for You

If you're building with MCP right now, here's what I'd do:

Audit your MCP dependencies immediately. Count how many MCP servers you're using. If you're at 5+, you have a context problem. If you're at 10+, your implementation is almost certainly broken.

Don't build toy MCP servers. If you're thinking of building an MCP server for internal use, start by assuming it needs OAuth, permissions, and a database. If that overhead seems excessive for your use case, you don't need MCP. You need direct API calls.

Migrate toward code execution patterns now. If you're building agents, start having them write code instead of calling tools directly. This is the future. Starting now means you won't need to rewrite everything in six months.
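
The payoff of that migration is that intermediate data stays in the execution environment instead of flowing through the model. A self-contained sketch (the tool and data are invented) of what an agent-written script might look like:

```typescript
// Agent-written script (sketch): the large intermediate result never enters
// the model's context; only a small summary goes back.
interface Transaction {
  id: string;
  amount: number;
}

// Stand-in for an MCP-backed tool the agent discovered; in a real setup this
// would be an import from a tool module, not an inline stub.
async function listTransactions(month: string): Promise<Transaction[]> {
  return Array.from({ length: 5000 }, (_, i) => ({ id: `${month}-${i}`, amount: Math.random() * 20_000 }));
}

async function main(): Promise<void> {
  const all = await listTransactions("2025-10");        // thousands of rows, never shown to the model
  const flagged = all.filter((t) => t.amount > 10_000); // filtered in code, not in context

  // Only this summary re-enters context.
  console.log(JSON.stringify({ total: all.length, flagged: flagged.length, sample: flagged.slice(0, 3) }));
}

main();
```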

Plan for the A2A transition. Start thinking about A2A as your interface layer and MCP as your backend execution layer. Build abstraction layers so you can swap implementations without rewriting your entire system.

Monitor token efficiency obsessively. Start measuring context window usage per request. Track what percentage of context is being used and why. This is your primary metric for system health.
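
A minimal sketch of that measurement, assuming a rough characters-to-tokens heuristic; swap in your model provider's tokenizer for real numbers, and treat the field names as illustrative.

```typescript
// Rough per-request context accounting. chars/4 is only an approximation.
interface ContextBreakdown {
  systemPrompt: string;
  toolDefinitions: string;
  conversation: string;
}

function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function reportContextUsage(requestId: string, ctx: ContextBreakdown): void {
  const parts = {
    system: estimateTokens(ctx.systemPrompt),
    tools: estimateTokens(ctx.toolDefinitions),
    conversation: estimateTokens(ctx.conversation),
  };
  const total = parts.system + parts.tools + parts.conversation;

  // If tool definitions dominate the budget, that's the Era 1 smell this
  // post has been describing.
  console.log(JSON.stringify({ requestId, ...parts, total, toolShare: +(parts.tools / total).toFixed(2) }));
}
```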

The Bigger Picture

The MCP story is actually the story of how the AI industry approaches problem-solving.

We saw a problem (tools don't integrate with models well) and we picked a solution quickly (expose tool definitions as function calls). It made sense at the time. It even worked, for a while.

Then reality hit. The solution didn't scale. The industry tried to patch it with better tools, better definitions, bigger context windows.

Finally, we realised: the problem wasn't the tools. The problem was the architecture.

And when you fix the architecture, when you shift from "load all tool definitions upfront" to "load tools on demand via code execution", you solve it cleanly.

This is what expertise looks like. Not intelligence. Not raw capability. But the hard-earned pattern recognition that comes from shipping broken systems and fixing them.

The MCP ecosystem is finally learning this lesson. Not by design. By necessity.

Conclusion: Back to the Beginning

Remember where we started? Steve Jobs saying people don't know what they want until you show them.

The AI industry didn't want a context-efficient protocol. It wanted a simple tool integration standard. But cold, unforgiving production reality showed us what we actually needed.

Context efficiency. On-demand tool discovery. Separation of concerns.

MCP will evolve to support this. A2A will mature alongside it. The ecosystem will consolidate. Many servers will be deprecated.

And 98% of the MCP servers built in Era 1? They'll be quietly archived, reminders of what happens when you optimize for the wrong metric.

But here's what gives me hope: the industry is learning. We're being honest about what didn't work. We're building better architecture. We're respecting constraints instead of ignoring them.

That's not failure. That's expertise.

And expertise, unlike intelligence, can't be automated.


Links & Resources