Before everyone comes at me: smoking cigarettes increases your risk of lung cancer by 15-30x. Effect size matters. So does margin of error: what is the margin of error here? This "increase" could easily be within noise.
PR throughput is also not a metric I would ever use to determine developer productivity for a paradigm shifting technology. I would only ever use it to compare like-to-like to find trailheads: is a team or person suddenly way more or less productive? The primary endpoint for software production is serving your customer or your mission, and PR throughput can't tell you whether any of that got better. It also cannot tell you the cost of your prior work: the increase in PR throughput could be more PRs to fix issues introduced by LLM-assisted work.
1. You might be speeding up something that is inherently not productive (the "faster horses" trope). I see companies using AI to generate performance reviews, and the same companies using AI to summarize all the new performance material they're receiving. All that's happening is amplified busywork (there is real work in there, but it's questionable whether it's improved).
2. Some things are zero sum. If you're not using AI for marketing you might fall behind. So you adopt these tools, but attention/etc are limited. There is no net gain, just competition.
3. You might speed one part up (typing code), but then other parts of your pipeline quickly become constraints. It might be a long time before we're able to adapt the end-to-end process. This is amplified by coding tools being three strides ahead.
4. Then there are actual productivity improvements. One of these PRs could have been "translate this to German". That could be one PR but a whole step-change for the business.
So much of what is happening falls in buckets 1+2+3. I don't think we've really got into the meat of 4 yet.
If I spent twice the time with these tools, most of the additional time invested would be "profit". So maybe there's something to these arguments that "it will only get better".
OTOH, we also see this with business investments of all types. "We're spending all our revenue on growth, if we wanted profits, we could slow investment at any time, and be immediately profitable!"
The biggest risk in software development is building the wrong thing. Digging yourself into a hole 10% faster is _worse_. You now have more backtracking to do!
Note that I am bullish on AI coding in general, just trying to contextualize your statement.
I’m not even a programmer — but the step change since late fall 2025 is incredible.
I have a young relative who manages in-house product for a financial services company. Programming team of 150-ish. That will be 15-ish by June, and they are iterating much more quickly now.
So much cope in this thread. AI is in fact the grim reaper for the median coder. The emerging middle class in India tech hubs is about to get vaporized
Is it? You don't think the median coder can use these AI tools? In my experience they're incredibly simple to use.
Seems likely that process is holding things back. Planning has always been a "best-guess". There's lots you can't account for until you start a task.
Code review mostly exists because the cost of doing something wrong was high (because human coding is slow). If you can code faster, you can replace bad code faster. I.e., LLMs have cheapened the cost of deployment.
We can't honestly assess the new way of doing things when we bring along the baggage of the old way of doing things.
If the agent comes back in a few minutes with a tiny fix, it is probably a small task.
If the agent produces a large, convoluted solution that would need careful review, it is at least a medium task.
And if the agent gets stuck, runs into architectural constraints, etc. then it is definitely a hard task.
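The triage heuristic above could be sketched as a small function. All the names and thresholds here are hypothetical, chosen just to make the idea concrete; the signal is the agent's observed behavior, not the plan written beforehand:

```python
def triage_task(minutes_elapsed: float, diff_lines: int, agent_stuck: bool) -> str:
    """Rough post-hoc triage of task difficulty from how the agent behaved.

    Thresholds are illustrative only.
    """
    if agent_stuck:
        # Ran into architectural constraints, loops, etc.
        return "hard"
    if minutes_elapsed < 5 and diff_lines < 20:
        # Came back quickly with a tiny fix.
        return "small"
    # Large, convoluted solution that would need careful review.
    return "medium"

print(triage_task(3, 10, False))    # small
print(triage_task(45, 800, False))  # medium
print(triage_task(120, 0, True))    # hard
```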
Writing code has become much faster. Writing correct and reliable code has become somewhat faster, but not nearly as much. Understanding what code to write has barely become faster.
The more novel the code you're writing, the smaller the gains from AI writing it.
So even just learning how to use it better (or side-eyeing the AI boosters less) will lead to improvements. I'm not sure how much of a difference incremental improvements to the models or harnesses will make.
Of course this is also skewed by what type of work you do - textbook stuff that's well inside the distribution will have more tasks that can be accomplished with AI tools than stuff that is less well represented in the training data. It's not a mystery why the majority of my AI usage is with HTML and CSS.
We've definitely culled some low hanging fruit, but I think there's still a lot of room for improvements that could lead to step changes in capabilities. I think we're only scratching the surface of looped language models, thinking in latent space, and multimodality.
And even if the per-model differences are narrowing, even single-digit improvements in performance metrics could yield outsized effects in applicability and productivity. Consider services that guarantee one 9 of reliability vs. five 9s. In absolute terms that change is a trivial difference, but the increased reliability allows use in way, way more domains.
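To put numbers on the nines comparison, here is the standard availability arithmetic (not from the thread) converting a count of nines into allowed downtime per year:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000

def downtime_per_year(nines: int) -> float:
    """Seconds of allowed downtime per year at a given number of nines."""
    availability = 1 - 10 ** (-nines)  # one 9 -> 0.9, five 9s -> 0.99999
    return SECONDS_PER_YEAR * (1 - availability)

print(downtime_per_year(1) / 86400)  # one 9: ~36.5 days of downtime per year
print(downtime_per_year(5) / 60)     # five 9s: ~5.3 minutes of downtime per year
```

Going from one 9 to five 9s is under ten percentage points of availability, yet it turns a month of annual downtime into a few minutes, which is what opens up whole new domains of use.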
Productivity only improves if the change increases revenue or reduces costs. And that rarely happens unless you improve the actual bottleneck of the organization.
To understand why, I recommend the book The Goal: A Process of Ongoing Improvement by Eliyahu M. Goldratt and Jeff Cox.
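The book's core idea (the theory of constraints) can be shown with a toy model: end-to-end throughput is capped by the slowest stage, so speeding up a non-bottleneck stage changes nothing. The stage names and rates below are hypothetical:

```python
def pipeline_throughput(stage_rates: dict) -> float:
    """Throughput of a serial pipeline is the rate of its slowest stage."""
    return min(stage_rates.values())

# Hypothetical rates, in tasks per week.
stages = {"spec": 4.0, "code": 2.0, "review": 1.0, "deploy": 5.0}
print(pipeline_throughput(stages))  # 1.0 -- review is the bottleneck

stages["code"] = 10.0  # AI makes coding 5x faster...
print(pipeline_throughput(stages))  # still 1.0 -- nothing improved
```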
Yeah, listen... I'm glad these types of studies are being conducted. I'll say this though: the difference between pre- and post-Opus 4.5 has been night and day for me.
From August 2025 through November 2025 I led a complex project at work where I used Sonnet 4.5 heavily. It was very helpful, but my total productivity gains were around 10-15%, which is pretty much what the study found. Once Opus came out in November though, it was like someone flipped a switch. It was much more capable at autonomous work and required way less hand-holding, intervention or course-correction. 4.6 has been even better.
So I'm much more interested in reading studies like this over the next two years where the start period coincides with Opus 4.5's release.
I’m interested in whether these studies, and the history of AI more broadly, will eventually recognize that as the point when things changed, because for us devs, that was the moment.
How did you track these gains?
But communication will massively improve. More artifacts documenting progress and needs will be generated, and AI can link related things across an organization rapidly and accurately. Workflows will massively improve. A living graph of an entire organization will come to life.
I think more productivity gains will come from this automation than anything. People will look back at all the drudgery workers did.
I can understand why a lot of companies are cutting junior roles. What AI does is it automates most of the stuff that juniors are good at (coding fast) but not much of the stuff that the seniors are good at.
That said, I've worked with some juniors who managed to navigate; they do this by focusing on higher order thinking and developing a sense of what's important by interacting with senior engineers. Unfortunately, it raises the talent bar for juniors; they have to become more intelligent; not in a puzzle-solving way, but in a more architectural big-picture sort of way; almost like entrepreneurial thinking but more detailed/complex.
LLMs don't have a worldview; this means that they miss a lot of inconsistencies and logical contradictions. Also, most critically, LLMs don't know what's important (at least not accurately enough) so they can't prioritize effectively and they make a lot of bad decisions.
It's kind of interesting for me because a lot of the areas where I had a contrarian opinion in the field of software development, I now see LLMs getting trapped into those and getting bad results. It's like all my contrarian opinions became much more valuable.