Before everyone comes at me: smoking cigarettes increases your risk of lung cancer by 15-30x. Effect size matters. So does margin of error: what is the margin of error here? This "increase" could easily be within noise.
PR throughput is also not a metric I would ever use to determine developer productivity for a paradigm shifting technology. I would only ever use it to compare like-to-like to find trailheads: is a team or person suddenly way more or less productive? The primary endpoint for software production is serving your customer or your mission, and PR throughput can't tell you whether any of that got better. It also cannot tell you the cost of your prior work: the increase in PR throughput could be more PRs to fix issues introduced by LLM-assisted work.
1. You might be speeding up something that is inherently not productive (the "faster horses" trope). I see companies using AI to generate performance reviews, and the same companies using AI to summarize all the new performance material they're receiving. All that's happening is amplified busywork (there is real work in there, but it's questionable whether it's improved).
2. Some things are zero sum. If you're not using AI for marketing you might fall behind. So you adopt these tools, but attention/etc are limited. There is no net gain, just competition.
3. You might speed one part up (typing code), but then other parts of your pipeline quickly become constraints. It might be a long time before we're able to adapt the end-to-end process. This is amplified by coding tools being three strides ahead.
4. Then there are actual productivity improvements. One of these PRs could have been "translate this to German". That could be one PR but a whole step-change for the business.
So much of what is happening falls in buckets 1+2+3. I don't think we've really got into the meat of 4 yet.
If I spent twice the time with these tools, most of the additional time invested would be "profit". So maybe there's something to these arguments that "it will only get better".
OTOH, we also see this with business investments of all types. "We're spending all our revenue on growth, if we wanted profits, we could slow investment at any time, and be immediately profitable!"
The biggest risk in software development is building the wrong thing. Digging yourself into a hole 10% faster is _worse_. You now have more backtracking to do!
Note that I am bullish on AI coding in general, just trying to contextualize your statement.
I’m not even a programmer — but the step change since late fall 2025 is incredible.
I have a young relative who manages in-house product for a financial services company. Programming team of 150-ish. That will be 15-ish by June, and they are iterating much more quickly now.
So much cope in this thread. AI is in fact the grim reaper for the median coder. The emerging middle class in India tech hubs is about to get vaporized
Is it? You don't think the median coder can use these AI tools? In my experience they're incredibly simple to use.
Seems likely that process is holding things back. Planning has always been a "best-guess". There's lots you can't account for until you start a task.
Code review mostly exists because the cost of doing something wrong was high (because human coding is slow). If you can code faster, you can replace bad code faster. I.e., LLMs have cheapened the cost of deployment.
We can't honestly assess the new way of doing things when we bring along the baggage of the old way of doing things.
If the agent comes back in a few minutes with a tiny fix, it is probably a small task.
If the agent produces a large, convoluted solution that would need careful review, it is at least a medium task.
And if the agent gets stuck, runs into architectural constraints, etc. then it is definitely a hard task.
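The triage heuristic above could be sketched as a small function. All the names and thresholds here are hypothetical, chosen just to make the idea concrete; the signal is the agent's observed behavior, not the plan written beforehand:

```python
def triage_task(minutes_elapsed: float, diff_lines: int, agent_stuck: bool) -> str:
    """Rough post-hoc triage of task difficulty from how the agent behaved.

    Thresholds are illustrative only.
    """
    if agent_stuck:
        # Ran into architectural constraints, loops, etc.
        return "hard"
    if minutes_elapsed < 5 and diff_lines < 20:
        # Came back quickly with a tiny fix.
        return "small"
    # Large, convoluted solution that would need careful review.
    return "medium"

print(triage_task(3, 10, False))    # small
print(triage_task(45, 800, False))  # medium
print(triage_task(120, 0, True))    # hard
```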
Writing code has become much faster. Writing correct and reliable code has become somewhat faster, but not nearly as much. Understanding what code to write has barely become faster.
The more novel the code you're writing, the smaller the gains from AI writing it.
So even just learning how to use it better (or side-eyeing the AI boosters less) will lead to improvements. I'm not sure how much of a difference incremental improvements to the models or harnesses will make.
Of course this is also skewed by what type of work you do - textbook stuff that's well inside the distribution will have more tasks that can be accomplished with AI tools than stuff that is less well represented in the training data. It's not a mystery why the majority of my AI usage is with HTML and CSS.
We've definitely culled some low hanging fruit, but I think there's still a lot of room for improvements that could lead to step changes in capabilities. I think we're only scratching the surface of looped language models, thinking in latent space, and multimodality.
And even if the per-model differences are narrowing, even single-digit improvements in performance metrics could yield outsized effects in applicability and productivity. Consider services that guarantee one 9 of reliability vs. five 9s. In absolute terms that change is a trivial difference, but the increased reliability allows use in way, way more domains.
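To put numbers on the nines comparison, here is the standard availability arithmetic (not from the thread) converting a count of nines into allowed downtime per year:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000

def downtime_per_year(nines: int) -> float:
    """Seconds of allowed downtime per year at a given number of nines."""
    availability = 1 - 10 ** (-nines)  # one 9 -> 0.9, five 9s -> 0.99999
    return SECONDS_PER_YEAR * (1 - availability)

print(downtime_per_year(1) / 86400)  # one 9: ~36.5 days of downtime per year
print(downtime_per_year(5) / 60)     # five 9s: ~5.3 minutes of downtime per year
```

Going from one 9 to five 9s is under ten percentage points of availability, yet it turns a month of annual downtime into a few minutes, which is what opens up whole new domains of use.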
Productivity only improves if the change increases revenue or reduces costs. And that rarely happens unless you improve the actual bottleneck of the organization.
To understand why, I recommend the book The Goal: A Process of Ongoing Improvement by Eliyahu M. Goldratt and Jeff Cox.
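The book's core idea (the theory of constraints) can be shown with a toy model: end-to-end throughput is capped by the slowest stage, so speeding up a non-bottleneck stage changes nothing. The stage names and rates below are hypothetical:

```python
def pipeline_throughput(stage_rates: dict) -> float:
    """Throughput of a serial pipeline is the rate of its slowest stage."""
    return min(stage_rates.values())

# Hypothetical rates, in tasks per week.
stages = {"spec": 4.0, "code": 2.0, "review": 1.0, "deploy": 5.0}
print(pipeline_throughput(stages))  # 1.0 -- review is the bottleneck

stages["code"] = 10.0  # AI makes coding 5x faster...
print(pipeline_throughput(stages))  # still 1.0 -- nothing improved
```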
Yeah, listen... I'm glad these types of studies are being conducted. I'll say this though: the difference between pre- and post-Opus 4.5 has been night and day for me.
From August 2025 through November 2025 I led a complex project at work where I used Sonnet 4.5 heavily. It was very helpful, but my total productivity gains were around 10-15%, which is pretty much what the study found. Once Opus came out in November though, it was like someone flipped a switch. It was much more capable at autonomous work and required way less hand-holding, intervention or course-correction. 4.6 has been even better.
So I'm much more interested in reading studies like this over the next two years where the start period coincides with Opus 4.5's release.
I’m interested in whether these studies, and the history of AI more broadly, will eventually recognize that as the point when things changed, because for us devs, that was the moment.
How did you track these gains?
But communication will massively improve. More artifacts documenting progress and needs will be generated, and AI can link related things across an organization rapidly and accurately. Workflows will massively improve. A living graph of an entire organization will come to life.
I think more productivity gains will come from this automation than anything. People will look back at all the drudgery workers did.
I can understand why a lot of companies are cutting junior roles. What AI does is it automates most of the stuff that juniors are good at (coding fast) but not much of the stuff that the seniors are good at.
That said, I've worked with some juniors who managed to navigate; they do this by focusing on higher order thinking and developing a sense of what's important by interacting with senior engineers. Unfortunately, it raises the talent bar for juniors; they have to become more intelligent; not in a puzzle-solving way, but in a more architectural big-picture sort of way; almost like entrepreneurial thinking but more detailed/complex.
LLMs don't have a worldview; this means that they miss a lot of inconsistencies and logical contradictions. Also, most critically, LLMs don't know what's important (at least not accurately enough) so they can't prioritize effectively and they make a lot of bad decisions.
It's kind of interesting for me because a lot of the areas where I had a contrarian opinion in the field of software development, I now see LLMs getting trapped into those and getting bad results. It's like all my contrarian opinions became much more valuable.