The Markdown Fallacy and AI
Why the smartest people in AI are puzzled by the gap between test scores and reality.
I was listening to Ilya Sutskever’s conversation with Dwarkesh Patel recently, and I kept hearing the same note of confusion.
The models we have today are strong. They write code, pass exams, and hold together long chains of reasoning. In a narrow sense, they contain more accessible knowledge than any single human ever could.
And yet Sutskever—one of the people who built this entire field—kept circling a question he couldn’t answer:
If the models are this capable on paper, why aren’t they doing the kinds of things a very smart human with this much information would do?
I think the models seem smarter than their economic impact would imply. Yeah. This is one of the very confusing things about the models right now. ... How to reconcile the fact that they are doing so well on evals? ... But the economic impact seems to be dramatically behind.
Ilya Sutskever
He compares them to a student who has studied 10,000 hours for one exam. The student knows every trick and every answer. But they don’t have the “broad intuition” of a practitioner.
The machinery is impressive.
The outcomes are underwhelming.
From a “more data = more intelligence” mindset, this looks like a bug. But if you view it as a systems problem, it starts to look like a category error.
The Markdown Fallacy
Years ago, I worked with a developer who adored Markdown. Absolutely loved it.
To him, Markdown was the perfect abstraction: simple, elegant, composable.
So naturally, he concluded that everyone should use it, for everything.
He wasn’t satisfied with “Markdown is great for developers.” He wanted knowledge bases, content tools, UI layers—whole systems—built around it. The fact that most people don’t think in plaintext symbols was, in his view, a user error.
His logic went like this:
This abstraction works for me.
It is logically sound.
Therefore the world should be built around it.
I see that same pattern in a lot of AI thinking.
The brain is a computing device.
Intelligence is computation over data.
Therefore the mind should be built around it.
In both of these worldviews, emotions, irrationality, and human messiness are bugs.
But what if they’re the operating system—the thing that decides what matters enough to process in the first place?
Infinite Abstraction, Zero Risk
If you start from the assumption that intelligence is computation over data, then the obvious response to any shortcoming is simple:
add more computation.
So we keep stacking the software.
Agents on top of agents on top of agents:
AI to design the system
AI to write the code
AI to check the code
AI to explain whether the whole thing even makes sense
It’s abstraction all the way down.
And the quiet hope is that somewhere in that depth, “judgment” will emerge.
But nothing in that stack needs to judge anything.
You can build a tower of logic-processing agents, but none of them care if the project fails.
They don’t lose their reputation.
They don’t lose their job.
They don’t go hungry.
They can produce the sound of concern, but they don’t carry the consequences that force a human to care.
And this is where the conversation often drifts toward “intuition,” “judgment,” “taste,” or whatever word we’re using that day for the missing piece. But those aren’t magical traits. They’re what you get when a system actually has something at stake.
The Missing “Itch”
Sutskever himself seems to be looking for this missing biological layer:
...the value function of humans is modulated by emotions in some important way that’s hardcoded by evolution. And maybe that is important for people to be effective in the world.
Ilya Sutskever
In open source, people talk about the “itch to scratch” model. Software gets written because someone has a problem they can’t ignore—an annoyance, a friction point, a cost that finally becomes too irritating not to address.
But for a human mind, that itch isn’t just about fixing a bug. The itch is the operating system. It is the itch of ego, shame, survival.
That itch is not inspiration. It’s pressure. It’s consequence. And that pressure is the engine of the work.
When Sutskever looks for a mathematical equivalent of intuition—a model that “wants” to solve the problem—he’s really looking for that itch.
But you can’t graft wanting onto a system that never experiences loss. You don’t get an itch from a machine that doesn’t acutely feel the problem.
If you treat intelligence as pattern-learning, AGI looks like a scaling problem. If you treat intelligence as a survival strategy, the target shifts.
Then you’re not building a better learner. You’re building a system that lives with the consequences of its behavior.
Sutskever and Dwarkesh are right to be puzzled by the gap. The reason benchmark capability hasn’t turned into economic impact might simply be that they’re different categories.
My colleague was wrong about Markdown because he forgot that users aren’t text parsers. We might be making the mirror-image mistake: assuming that text parsers are human.


