The Amplifier and the Evangelist
A grounded study on AI coding tells one story. The conference keynotes tell another.
Dave Farley recently discussed a study he helped design—one of the more rigorous experiments on AI coding to date. Professional developers (not students) wrote code with and without AI assistance. Then different developers maintained that code without knowing which was which.
The headline findings: AI-assisted code was no harder to maintain than human-written code. No worse in quality. No better either. From a downstream perspective, AI didn't break anything.
AI users were about 30% faster; habitual AI users, closer to 55%.
The most salient finding: when experienced developers used AI habitually, their code showed measurable improvement in maintainability. When less experienced developers used AI, their code showed no such improvement.
This suggests AI acts as an amplifier. If you're already doing the right things, AI amplifies that.
What does “doing the right things” mean? Working in small batches, solving one problem at a time, iterating with continuous testing, designing modular systems that contain the impact of changes.
Farley also flags what he calls “cognitive debt”:
“If developers stop thinking, really thinking about the code that they create, then over time understanding erodes, skills atrophy, and innovation slows. This is exactly the kind of long-term risk that doesn’t show up in a sprint metric.”
His conclusion is worth quoting at length:
“AI assistants clearly improve short-term productivity. And contrary to a lot of popular opinion, they do not appear to damage the maintainability of the systems that they produce. And they might even slightly improve it when they’re used well, but they don’t remove the need for good engineering.
They don’t remove the need for good design and the kind of broad understanding and experience that allows us to produce good design. And they certainly don’t remove the need for thinking hard about the problems that we face, how to decompose them into small pieces that allow our AI assistants to do a good job and how to guide them towards solutions that we are happy with.
This compartmentalization through decomposition is the central fundamental aspect of building software. This is the real technical skill at the heart of what it is that we do. And it is this rather than the speed of typing code that differentiates good software development from slop whether AI generated or not. As always tools matter but how we use them matters more.”
None of this is surprising. All of it is grounded. The study measures what actually happens downstream, not just “did the developer type faster.”
This matches my experience - though from a different angle. I’ve been building software with AI for nearly two years—Thunderbird extensions, workflow engines, integrations. I’m not a developer. AI lets me translate intent into working software. But I’ve known for a while that there’s a gap between “I tested it and it works” and “I understand what it actually does.” When I have an engineer look at code I’ve built, they sometimes find the implementation doesn’t match my mental model. It’s mostly right. The happy path works. But the underlying logic has assumptions I hadn’t specified.
For a developer, cognitive debt is something that might accumulate over time. For someone like me, the gap is just part of the landscape. I bring domain knowledge, not code knowledge.
The verification gap is real. Skill still matters. AI is a tool, not a replacement for judgment.
Now contrast this with Dario Amodei.
The Kitchen Conversation
Amodei was on stage recently with Demis Hassabis. He said: “I have engineers within Anthropic who say I don’t write any code anymore. I just let the model write the code. I edit it. I do the things around it.”
This has the texture of something a CEO hears in passing and repeats at conferences. An engineer says “I don’t write code anymore” meaning “AI writes the first draft and I spend my time reviewing, fixing, and designing.” Dario hears “AI does it all” and carries that into a keynote about how we’re 6-12 months from AI doing most of what software engineers do end to end.
It feels like evangelist talk. High-level pattern matching. The kind of statement that sounds good in a conference hall but dissolves under scrutiny.
And then you look at the outcomes.
The Claude Code Problem
ThePrimeagen has a video critiquing Claude Code’s terminal interface. I can’t evaluate the technical claims myself, but his argument goes like this: Anthropic built a TUI (text user interface - the kind of application that runs in a terminal window) that apparently requires 60 frames per second of rendering and struggles with performance—for what is essentially a static display receiving text updates a few times per second.
According to Primeagen, the architecture choices don’t make sense. Claude Code is a text interface - you type, it thinks, it responds with text. Nothing is moving. Nothing is animated. It’s not a video game. But Anthropic built it like a video game, rendering the screen 60 times per second even when nothing is changing.
Primeagen’s point: when you’re waiting for Claude Code to respond, when nothing is happening on screen, the system should be idle. No work to do means no work being done. Instead, Claude Code is constantly redrawing, constantly diffing, constantly burning cycles on a static display.
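The distinction Primeagen is drawing can be sketched in a few lines. This is an illustrative comparison, not Claude Code's actual implementation: an event-driven interface blocks until new output arrives and redraws once per actual change, whereas a frame-loop interface redraws on a timer regardless of whether anything changed.

```python
import queue

STOP = object()  # sentinel marking the end of the output stream

def event_driven_renders(events: queue.Queue, stop) -> int:
    """Redraw only when an update arrives; stay idle otherwise."""
    renders = 0
    while True:
        item = events.get()  # blocks here: zero CPU work while waiting
        if item is stop:
            return renders
        renders += 1         # exactly one redraw per actual change

# Simulate a session: three text updates arrive, then the stream ends.
q = queue.Queue()
for chunk in ["Thinking...", "Here is", " the answer."]:
    q.put(chunk)
q.put(STOP)

print(event_driven_renders(q, STOP))  # 3 renders for 3 updates
```

A fixed-rate loop, by contrast, would have performed roughly 60 redraws for every second spent waiting, whether or not any of those frames differed from the last. The point is not that a frame loop can never be justified, only that for a display that changes a few times per second, idle should mean idle.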
The result is a tool that struggles to hit performance targets for displaying text. Text.
I can’t verify the technical critique. And maybe there’s a reason for Claude Code’s architecture that Primeagen doesn’t see - some constraint or requirement that makes the choices make sense.
But there’s another possibility. If Dario’s evangelism has caught on internally - if the culture is to give over a little too much to AI code generation - then this is what you’d expect to see. Not catastrophic failure, just... odd choices. Architecture that works but doesn’t quite make sense. The kind of thing that happens when humans aren’t fully in the loop.
The Farley study says AI amplifies existing competence. Good developers get better. The rest dig holes faster.
It's hard to connect the dots directly. But Primeagen's critique raises a question worth asking.
Two Conversations
There are two conversations happening about AI and coding.
One is grounded. Measure what actually happens. Look at maintainability, not just speed. Acknowledge that skill matters, that AI is a tool, that the hard problems are still hard.
The other is evangelical. “My engineers don’t write code anymore.” “Six to twelve months from AI doing everything.” The confident projection that dissolves when you look at actual outcomes.
The grounded conversation matches my experience. I can build things I couldn’t build before. The productivity gains are real. But I also know there’s a gap between what I think I built and what the code actually does. When the system is simple, the gap is small. When it’s complex, the gap grows. An engineer can read the code and understand actual behavior. I can only test apparent behavior.
That’s not a reason to stop building. It’s a constraint worth naming.
The evangelical conversation worries me more. Not because AI isn’t powerful—it is. But because the confident claims don’t seem to survive contact with reality. Dario says his engineers don’t write code. Meanwhile Claude Code ships with architecture that (it seems) any experienced engineer would question.
Maybe the lesson is simpler than it seems: currently, AI amplifies. It doesn’t replace judgment. Even at Anthropic. That could change - things are moving fast - but this is how it looks now.


