Let me break this down: imagine you're told a magical tool will make you 55% more productive. You install it, use it for months, and swear it's helping. But when someone measures your actual performance with a stopwatch, it turns out you're 19% slower than before you started using it.
That's exactly what METR's study published in July 2025 discovered, and the data is so uncomfortable for Silicon Valley that almost no one wants to talk about it.
The study no one wanted to see
METR (Model Evaluation and Threat Research) is a nonprofit AI safety organization based in Berkeley, California. It was founded by Beth Barnes, a former alignment researcher at OpenAI. They don't have stock in Cursor or Anthropic. They don't sell coding tools. They just wanted to measure the truth.
And the truth is uncomfortable.
The numbers don't lie
The study gathered 16 expert developers with one special characteristic: they all worked on their own repositories. These weren't made-up lab tasks. These were real bugs, real features, real refactors in projects they themselves maintained.
| Metric | Value |
|---|---|
| Developers participating | 16 |
| Tasks completed | 246 |
| Average experience in repos | 5 years |
| Average repository size | 1+ million lines of code |
| Average GitHub stars | 22,000+ |
| AI tools used | Cursor Pro + Claude 3.5/3.7 Sonnet |
Each task was randomly assigned to one of two conditions: AI tools allowed or AI tools forbidden. Developers recorded their screens and reported their times. A textbook randomized controlled trial (RCT).
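To make that design concrete, here's a minimal sketch, in Python with made-up numbers, of how you could estimate a speedup or slowdown from this kind of per-task timing data. This is not METR's actual analysis code, just an illustration of the comparison the trial enables.

```python
# Toy illustration (invented numbers, not METR's data or analysis code):
# each record is one task, randomly assigned to "ai" or "no_ai",
# with its measured completion time in minutes.
from statistics import geometric_mean

tasks = [
    {"condition": "ai", "minutes": 95},
    {"condition": "no_ai", "minutes": 80},
    {"condition": "ai", "minutes": 130},
    {"condition": "no_ai", "minutes": 110},
    # ...one entry per completed task (246 in the study)
]

ai_times = [t["minutes"] for t in tasks if t["condition"] == "ai"]
baseline_times = [t["minutes"] for t in tasks if t["condition"] == "no_ai"]

# Geometric means are less distorted by a few unusually long tasks.
ratio = geometric_mean(ai_times) / geometric_mean(baseline_times)
print(f"AI-allowed tasks took {ratio:.2f}x as long ({(ratio - 1) * 100:+.0f}%)")
```

A ratio above 1.0 means the AI-allowed tasks took longer; METR's headline 19% slowdown is this kind of ratio, estimated with more careful statistics.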
The 40-point paradox: belief vs. reality
Here's what most guides won't tell you: the developers weren't just slower—they were convinced they were faster.
- Before starting: predicted they'd be 24% faster with AI
- After finishing: believed they'd been 20% faster
- Actual measured result: they were 19% SLOWER
The gap between perception and reality is nearly 40 percentage points. It's like believing you ran a marathon in 3 hours when you actually took 4 and a half.
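Spelled out, the arithmetic behind that gap looks like this (figures straight from the study):

```python
# Perception vs. reality, using the study's headline numbers.
predicted = 0.24    # before starting: expected to be 24% faster
perceived = 0.20    # after finishing: believed they had been 20% faster
measured = -0.19    # stopwatch reality: 19% slower

gap = (perceived - measured) * 100
print(f"Gap between belief and measurement: {gap:.0f} percentage points")  # 39
```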
Why do we believe AI helps when it doesn't?
One participant explained it perfectly:
"I thought AI was speeding me up. But we don't focus on all the time we actually spend—just on how pleasant it was to go back and forth with the AI."
The trick is that coding with AI feels easier, even if it's not faster. It requires less cognitive effort. It's like preferring to drive even though it takes longer than the subway: the journey feels less taxing.
In fact, 69% of participants kept using Cursor after the study, despite knowing it slowed them down.
Why does AI slow down experts?
The researchers identified five main factors:
1. Over-optimism
We've all seen demos where AI writes a complete app in minutes. The problem is those demos use new, small projects—not million-line codebases with years of implicit architectural decisions.
2. The developers knew too much
When you've spent 5 years working on a repository, you know every corner. You know why that if statement is there, why that function has that weird name, which dependencies not to touch. AI knows none of this.
One developer compared it to "a new contributor who doesn't understand the codebase yet." Imagine having to explain all the context to a junior for every task. Sometimes it's faster to just do it yourself.
3. Repositories too large and complex
AI tools shine with small, well-documented projects. But when you have over a million lines of code, AI simply can't process all the necessary context.
4. Low acceptance rate
Developers accepted less than 44% of AI suggestions. That means more than half of the generated code was rejected or required cleanup.
5. Hidden time costs
Look at how time was distributed when using AI:
- 9% reviewing and cleaning AI outputs
- 4% waiting for AI to generate responses
- Less time actively programming
- Less time searching for information
That 13% of time "lost" to AI interaction adds up task after task.
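As a rough back-of-the-envelope sketch (the 9% and 4% shares come from the study; the working hours are hypothetical), that overhead compounds quickly:

```python
# Rough estimate of monthly overhead from AI interaction.
review_share = 0.09    # time reviewing and cleaning AI output (study figure)
waiting_share = 0.04   # time waiting for generations (study figure)
overhead = review_share + waiting_share   # 13% of AI-assisted working time

coding_hours_per_week = 30   # hypothetical hands-on coding hours
weeks_per_month = 4
lost = coding_hours_per_week * weeks_per_month * overhead
print(f"~{lost:.0f} hours per month spent interacting with the AI")  # ~16
```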
The exception that proves the rule
There was one developer who was actually faster with AI: approximately 20% more productive. The difference? They had over 50 hours of intensive experience with Cursor.
This suggests something important: the learning curve for these tools may run to dozens of hours of deliberate practice, possibly more, before you see real benefits. And most studies (including those from the companies themselves) don't measure this.
The elephant in the room: industry promises
Now let's compare these results with what companies are selling us:
| Source | Claim | Context |
|---|---|---|
| GitHub/Microsoft (2023) | 55% faster | Simple task (HTTP server), more benefit for juniors |
| Google DORA (2025) | Higher throughput | But stability concerns |
| METR (2025) | 19% slower | Seniors, mature repos, real tasks |
The GitHub study everyone cites ("55% faster") had a problem: developers completed a simple, artificial task. It's like measuring a new car's speed only on an empty race track.
Other independent studies aren't optimistic either
Uplevel Data Labs measured 800 developers with objective metrics and found no productivity gain and 41% more bugs. Bain reported that time savings in real enterprise adoption were "not notable."
The reactions: from denial to recognition
Critics of the study
Emmett Shear, former interim CEO of OpenAI and co-founder of Twitch, was direct:
"METR's analysis is tremendously misleading. The results indicate that people who essentially NEVER used AI tools are less productive while learning to use them, and say nothing about experienced AI users."
Shear has a valid point: only 1 of 16 developers had more than a week of experience with Cursor specifically. But that also reveals something: most companies adopt these tools without giving adequate learning time.
The developer community
On Hacker News and Reddit, reactions were mixed. One backend developer summarized many people's frustration:
"I hate fixing AI-written code. It solves the task, yes. But it has no vision. AI code lacks a sense of architecture, intent, or care."
Stack Overflow data confirms the skepticism
The Stack Overflow 2025 survey showed:
- Trust in AI dropped from 43% to 33%
- Positive sentiment dropped from 70% to 60%
- But adoption rose to 84%
In other words: more and more people use AI for coding, but fewer and fewer trust it. An interesting paradox.
What does this mean for you?
If you're a programmer, I'm not telling you to uninstall Cursor or Claude Code. But I am telling you to measure your real productivity, not your feeling.
Where AI actually helps
- MVPs and prototypes: When code quality matters less than speed
- Boilerplate: Repetitive code anyone could write
- Unit tests: Generating basic test cases
- Documentation: Explaining existing code
- Unfamiliar codebases: When you're the newbie, not the AI
Where AI probably slows you down
- Your own mature codebase: Where you're already the expert
- High-standard code: Where every detail matters
- Complex architecture: Where implicit context is key
- Deep debugging: Where you need to understand the "why"
For companies: beware of phantom ROI
If you're a tech lead or CTO, this study should make you rethink how you measure the impact of AI tools on your team.
Common mistakes
- Measuring by perceptions: "Do you feel more productive?" isn't a valid metric
- Adoption without training: Installing Copilot isn't the same as integrating it correctly
- Ignoring the learning curve: 50+ hours of intensive practice doesn't happen in a week
- Scaling before validating: Using AI where it doesn't make sense
What actually works
- Adopt a portfolio mindset: Use AI where it augments cognition (docs, boilerplate), not where human expertise dominates
- Measure objectively: Real time per task, not satisfaction surveys (see the sketch after this list)
- Identify real use cases: Not all tasks benefit equally
- Give learning time: If you expect results in the first week, you'll be disappointed
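One low-tech way to do that objective measurement, a hypothetical sketch rather than an existing tool, is to log real wall-clock time per task and compare AI-assisted and non-assisted work after a few weeks:

```python
# Hypothetical per-task stopwatch: appends real durations to a CSV so you can
# compare AI-assisted and non-assisted tasks later. Adapt the fields as needed.
import csv
import time
from datetime import datetime

def log_task(description: str, used_ai: bool, path: str = "task_times.csv") -> None:
    start = time.monotonic()
    input(f"Working on {description!r} -- press Enter when the task is done ")
    minutes = (time.monotonic() - start) / 60
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([
            datetime.now().isoformat(timespec="minutes"),
            description,
            used_ai,
            round(minutes, 1),
        ])
    print(f"Logged {minutes:.1f} min (AI used: {used_ai})")

# Example: log_task("fix flaky auth test", used_ai=True)
```

A month of entries like this gives you a small, noisy, but honest dataset; the point is to compare measured minutes, not remembered impressions.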
Market context: valuations vs. reality
Meanwhile, AI coding company valuations keep climbing:
| Company | Valuation | Date |
|---|---|---|
| Cursor | $29.3 billion | November 2025 |
| Cognition (Devin) | $10.2 billion | September 2025 |
Cursor reported over $1 billion in annualized revenue in November 2025. The market is paying for adoption, not proven productivity.
This doesn't mean these companies have no value. It means the value the market assigns them is based on promises, not rigorous evidence.
The uncomfortable question
If AI really made developers 55% faster, why haven't software companies shrunk their engineering teams to match? A 55% speedup would mean shipping roughly the same roadmap with about two-thirds of the engineers, yet companies are still hiring at the same rate.
The likely answer: because CTOs who use these tools daily know that marketing numbers don't translate to real productivity. They know that a senior developer with experience in the codebase is still irreplaceable.
My takeaway
After analyzing this study and dozens of sources, here's what I think:
AI coding tools have real value, just not the value they're selling us. They're useful for specific tasks, for certain developer profiles, in certain contexts. They're not a magic wand that multiplies everyone's productivity.
The problem isn't the technology. It's the narrative. We've been sold that AI is the future of software development, when in reality it's one more tool in the programmer's arsenal. A tool that, like all tools, has its place and its limitations.
If you use Cursor or Claude Code, keep doing so. But measure your real productivity. Don't be fooled by the feeling that everything is easier. Easier doesn't always mean faster.
And if someone tells you AI will make you 55% more productive, ask them: in what context? With what prior experience? On what types of tasks?
The METR data suggests the honest answer is more complicated than the industry wants to admit.



