Productivity gains from AI coding assistants haven’t budged past 10% – survey (shiftmag.dev)

61 points by taubek 3 hours ago | 20 comments

jdlshore 2 hours ago
This is self-reported productivity, in that devs are saying AI saves them about 4 hours per week. But let’s not forget the METR study that found a 20% increase in self-reported productivity but a 19% decrease in actual measured productivity.
(It used a clever and rigorous technique for measuring productivity differences, BTW, for anyone as skeptical of productivity measures as I am.)
- keeda an hour ago
  Let's also not forget the multiple other studies that found significant boosts to productivity using rigorous methods like RCTs.
  However, because these threads always go the same way whenever I post this, I'll link to a previous thread in hopes of preempting the same comments and advancing the discussion! https://news.ycombinator.com/item?id=46559254
  Also, DX (whose CTO was giving the presentation) actually collects telemetry-based metrics (PRs etc.) as well: https://getdx.com/uploads/ai-measurement-framework.pdf
  It's not clear from TFA if these savings are self-reported or from DX metrics.
- samuelknight 2 hours ago
  https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...
  That info is from mid 2025, talking about models released in Oct 2024 and Feb 2025. It predates tools like Claude Code and Codex, Lovable was 1/3 current ARR, etc.
  This might still be true but we desperately need new data.
  - lunar_mycroft 2 hours ago
    None of those changes address the issue jdlshore is pointing out: developers' self-assessed productivity increases from LLMs are not a reliable indication of actual productivity increases. It's true that modern LLMs might have less of a negative impact on productivity or increase it, but you won't be able to tell by asking developers if they feel more productive.
    (Also, Anthropic released Claude Code in February of 2025, which was near the start of the period the study ran).
  - monkaiju 2 hours ago
    Yeah new data would be great, but I feel like these tools are not substantively better and this is becoming the new "it's different this time!"
- williamcotton 2 hours ago
  Has the METR study been replicated?
  - lunar_mycroft 2 hours ago
    Not a scientific study, but someone did replicate the experiment on themselves [0] and found that in their case, any effect from LLM use wasn't detectable in their sample. Notably they almost certainly had more experience with LLMs than most of the METR participants did.
    [0] https://mikelovesrobots.substack.com/p/wheres-the-shovelware...
  - jdlshore 2 hours ago
    I haven’t heard about any similar studies, no. I’m planning to conduct one at my workplace but we’re still deciding exactly which uses of AI to test.
overgard 3 hours ago
You're only as fast as your biggest bottleneck. Adding AI to an existing organization is just going to show you where your bottlenecks are, it's not going to magically make them go away. For most companies, the speed of writing code probably wasn't the bottleneck in the first place.
- blibble 3 hours ago
  the amount of people that work in technology and have never heard of amdahl's law always shocks me
  https://en.wikipedia.org/wiki/Amdahl's_law
  a 100% increase in coding speed means I then get to spend an extra 30 minutes a week in meetings
  while now hating my job, because the only fun bit has been removed
  "progress"
  - hnuser847 2 hours ago
    So if I'm understanding you correctly, prior to AI tools you spent 1 hour per week coding? And now you spend 30 minutes per week?
  - zht 2 hours ago
    the number of people who have heard of Amdahl's law but don't know when to use "amount of X" vs "number of Y" always shocks me as well
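For anyone who wants the Amdahl's law arithmetic from this subthread spelled out, here is a minimal Python sketch. The 40-hour week and the coding shares are illustrative assumptions, not figures from the survey; the 2x factor is the "100% increase in coding speed" mentioned above, and the 2.5% share corresponds to the one-hour-of-coding-per-week reading. The function names are made up for this example.

```python
def overall_speedup(coding_fraction: float, coding_speedup: float) -> float:
    """Amdahl's law: whole-job speedup when only the coding share gets faster."""
    if not 0.0 <= coding_fraction <= 1.0:
        raise ValueError("coding_fraction must be between 0 and 1")
    if coding_speedup <= 0:
        raise ValueError("coding_speedup must be positive")
    return 1.0 / ((1.0 - coding_fraction) + coding_fraction / coding_speedup)


def hours_saved(week_hours: float, coding_fraction: float, coding_speedup: float) -> float:
    """Hours freed per week when only the coding hours shrink."""
    coding_hours = week_hours * coding_fraction
    return coding_hours - coding_hours / coding_speedup


# Assumed 40-hour week; coding_speedup=2.0 models a 100% increase in coding speed.
for fraction in (0.025, 0.10, 0.30, 0.60):
    print(
        f"coding share {fraction:.1%}: "
        f"overall speedup {overall_speedup(fraction, 2.0):.3f}x, "
        f"{hours_saved(40, fraction, 2.0):.1f} h/week freed"
    )
```

Under these assumptions the overall gain is capped by the share of the week that is actually coding: the non-coding hours stay the same no matter how fast the code gets written.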
- qudat 2 hours ago
  Agreed. The bottleneck is QA/Code review and that is never going away from most corps. I've never worked at a job in tech that didn't require code review and no, asking a code agent to review a PR is never going to be "good enough".
  And here we are, the central argument for why code agents are not these job killing hype beasts that are so regularly claimed.
  Has anyone seen what multi-agent code workflows produce? Take a look at openclaw, the code base is an absolute disaster. 500k LoC for something that can be accomplished in 10k.
  - lbreakjai 2 hours ago
    My head of engineering spent half a day creating a complex setup of agents in opencode, to refactor a data model across multiple repositories. After a day running agents and switching between providers to work around the token limits, it dumped a -20k +30k change set we'll need to review.
    If we're very lucky, we'll break even time wise compared to just running a single agent on a tight leash.
    voidfunc 12 minutes ago
    YOLO. Just ship it.
    jimbokun 37 minutes ago
    While reading your comment the Benny Hill theme Yackety Sax started playing in my head.
  - ben_w 2 hours ago
    > I've never worked at a job in tech that didn't require code review
    I have. Sometimes the resulting code was much worse than what you get from an LLM, and yet the project itself was still a success despite this.
    I've also worked in places with code review, where the project's own code quality architecture-and-process caused it to be so late to the market it was an automatic failure.
    What matters to a business is ideally identical to the business metrics, which are usually not (but sometimes are) the code metrics.
  - piva00 2 hours ago
    The bottleneck at larger orgs is mostly always decision-making.
    Getting code written and reviewed is the trivial part of the job in most cases, discovering the product needs, considering/uncovering edge-cases, defining business logic that is extensible or easily modifiable when conditions change, etc. are the parts that consume 80% of my time.
    We in the engineering org at the company I work for have raised this flag many times during adoption of AI-assisting tools. Now that the rollout is well under way, with most developers using the tools and changing workflows, it has become the sore thumb sticking out: yes, we can deliver more code if it's needed, but what exactly do you need it for?
    So far I haven't seen a speed up in decision-making, the same chain of approvals, prioritisation, definitions chugs along as it was and it is clearly the bottleneck.
  - 8note 2 hours ago
    i don't think that's actually the bottleneck?
    the bottleneck is aligning people on what the right thing to do is, and fitting the change into everyone's mental models. it gets worse the more people are involved
  - oblio 2 hours ago
    > Take a look at openclaw, the code base is an absolute disaster. 500k LoC for something that can be accomplished in 10k.
    Mission accomplished: acquihire worth probably millions and millions.
    I agree with you, by the way.
    MYEUHD 2 hours ago
    It was a hire not an acquihire. There was no acquisition.
    oblio an hour ago
    There was a big payoff on signing so to-may-to, to-mah-to.
  - co_king_5 2 hours ago
    I'm sorry but consider how many more edge cases and alternatives can be handled in 500k LoC as compared to that tiny 10k.
    In the days of AGI, higher LoC is better. It just means the code is more robust, more adaptable, better suited to real world conditions.
    Yodel0914 2 hours ago
    That’s… not how software works, no matter how it is produced. Complexity is the enemy; always.
- menaerus 2 hours ago
  In high-performance teams it is. In bike-shedding environments of course it is not.
  - overgard an hour ago
    I'm not sure I'd call it bike shedding so much as that a lot of time and effort tends to go into hard to answer questions: what to build, why to build it, figuring out the target customer, etc. A lot of times going a thousand miles per hour with an LLM just means you figure out pretty quickly you're building the wrong thing. There's a lot of value to that (although we used to just call this "prototyping"), but, that doesn't remove the work of actually figuring out what your product is.
    The least productive teams I've been on, it wasn't usually engineering talent that was the problem, it was extremely vague or confused requirements.
    menaerus 19 minutes ago
    I think you meant to say incompetent leadership.
- outside1234 2 hours ago
  This. The key bottleneck in many organizations is the "socialize and align" on what to build. Or just "socialize and align" in general. :)
- vorticalbox 3 hours ago
  one thing that always slowed me down was writing jsdocs and testing.
  Now I can write one passing example and then get Codex to read the code and write tests for all the branches in that section. It saves time, since it can type a lot faster than I can, and it's mostly copying the example I already have but changing the input to hit all the branches.
  - otabdeveloper4 3 hours ago
    > let's have LLMs check our code for correctness
    Lmao. Rofl even.
    (Testing is the one thing you would never outsource to AI.)
    idle_zealot 2 hours ago
    Outsourcing testing to AI makes perfect sense if you assume that tests exist out of an obligation to meet some code coverage requirements, rather than to ensure correctness. Often I'll write a module and a few tests that cover its functionality, only for CI to complain that line coverage has decreased and reject my merge! AI to the rescue! A perfect job for a bullshit generator.
    8note 2 hours ago
    outsourcing testing to the AI also connects its code to deterministic results, and lets the agent interact with the code to speculate about expectations and check them against the actual code.
    it could still speculate wrong things, but it won't speculate that the code is supposed to crash on the first line of code
    sshine 2 hours ago
    > Testing is the one thing you would never outsource to AI
    That's not really true.
    Making the AI write the code, the test, and the review of itself within the same session is YOLO.
    There's a ton of scaffolding in testing that can be easily automated.
    When I ask the AI to test, I typically provide a lot of equivalence classes.
    And the AI still surprises me with finding more.
    On the other hand, it's equally excellent at saying "it tested", and when you look at the tests, they can be extremely shallow. Or they can be fairly many unit tests of certain parts of the code, but when you run the whole program, it just breaks.
    The most valuable tests when programming with AI (generated by AI, or otherwise) are near-realistic integration tests. That's true for human programmers too, but we take for granted that casual use of the program we make as we develop it constitutes a poor man's test. When people who generally don't write tests start using AI, there's just nothing but fingers crossed.
    I'd rather say: If there's one thing you would never outsource to AI, it's final QA.
    ben_w 2 hours ago
    > (Testing is the one thing you would never outsource to AI.)
    I would rephrase that as "all LLMs, no matter how many you use, are only as good as one single pair of eyes".
    If you're a one-person team and have no capital to spend on a proper test team, set the AI at it. If you're a megacorp with 10k full time QA testers, the AI probably isn't going to catch anything novel that the rest of them didn't, but it's cheap enough you can have it work through everything to make sure you have, actually, worked through everything.
    LoganDark 2 hours ago
    You don't use the LLM to check your code for correctness; you use the LLM to generate tests to exercise code paths, and verify that they do exercise those code paths.
    onion2k 2 hours ago
    And that test will check the code paths are run.
    That doesn't tell you that the code is correct. It tells you that the branching code can reach all the branches. That isn't very useful.
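A small Python sketch of the coverage-versus-correctness distinction being argued here. `shipping_fee` and its pricing rule are invented for illustration, with a deliberate bug; the point is only that a test can reach every branch without ever checking a result.

```python
def shipping_fee(order_total: float, is_member: bool) -> float:
    """Hypothetical rule: members ship free, orders of 50 or more ship free."""
    if is_member:
        return 5.0  # deliberate bug: the rule above says this should be 0.0
    if order_total >= 50:
        return 0.0
    return 5.0


def coverage_only_test() -> None:
    """Executes every branch but asserts nothing about the results."""
    for total, member in [(10, True), (60, False), (10, False)]:
        shipping_fee(total, member)


def correctness_test() -> None:
    """Same inputs, each paired with the value the rule says to expect."""
    cases = [((10, True), 0.0), ((60, False), 0.0), ((10, False), 5.0)]
    for args, expected in cases:
        actual = shipping_fee(*args)
        assert actual == expected, f"shipping_fee{args} returned {actual}, expected {expected}"


coverage_only_test()  # passes, and a coverage tool would report every branch hit
try:
    correctness_test()
except AssertionError as err:
    print("caught:", err)
```

Both tests reach the same branches; only the second one fails on the bug, because only the second one encodes what the code is supposed to return. Whoever writes the expected values, human or LLM, still has to know the intended behavior.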
nasretdinov 3 hours ago
I think that over time people will start looking at AI-assisted coding the same way we now look at loosely typed code, or at (heavy) frameworks: it saves time in the short term, but may cause significant problems down the line. Whether or not this tradeoff makes sense in a specific situation is a matter of debate, and there's usually no obviously right or wrong answer.
- doomslayer999 3 hours ago
  Once the free money runs out, the AI cos may shift to making heavily verified code snippets with more direct language control. This will heavily simplify a lot of boilerplate instead of fairytales of some AGI coding wiz.
  - co_king_5 3 hours ago
    Isn't the boilerplate that "AI" is capable of generating becoming more and more dated with each passing day?
    Are the AI firms capable of retraining their models to understand new features in the technologies we work with? Or are LLMs going to be stuck generating circa-2022 boilerplate forever?
    jimbokun 35 minutes ago
    I mean if people continue checking open source code into GitHub using those new features then they should be able to learn them just fine.
    danaris a minute ago
    This is only true if there continues to be tremendous amounts of money/hardware/power available to perform the training, in perpetuity.
    doomslayer999 3 hours ago
    No to the first question, and maybe with a lot of money for the second question.
    mattmanser 2 hours ago
    In the 20 years I've been in the industry, boilerplate has dropped dramatically in the backend.
    Right now, front end has tons of boilerplate. It's one of the reasons AI has such a wow factor for FE: trivial tasks require a lot of code.
    But even that is much better than it was 10 years ago.
    That was a long way of saying I disagree with your no.
    skydhash 2 hours ago
    FE has a lot of boilerplate only if you’re starting from scratch every single time. That’s why we had template systems and why we invented view libraries. Once you’ve defined your libraries, you just copy-paste stuff.
    matthewbauer 3 hours ago
    It seems like they should be able to “overweight” newer training data. But the risk is the newer training data is going to skew more towards AI slop than older training data.
    otabdeveloper4 3 hours ago
    There won't ever be newer training data.
    The OG data came from sites like Stack Overflow. These sites will stop existing once LLMs become better and easier to use. Game over.
    esclerofilo 2 hours ago
    Every time Claude Code runs tests or builds after a change, it's collecting training data.
    co_king_5 2 hours ago
    Has Anthropic been able to leverage this training data successfully?
    esclerofilo 2 hours ago
    I can't pretend to know how things work internally, but I would expect it to be involved in model updates.
    otabdeveloper4 2 hours ago
    You need human language programming-related questions to train on too, not just the code.
    8note 2 hours ago
    that's what the related chats are for?
- moffkalast an hour ago
  It really depends on the situation. I think there's an argument for generating in a lower-level strongly typed language, where most of the work of writing the pointlessly verbose parts is eliminated, any errors are found by the compiler immediately, but it still leaves the option for handwritten optimizations when needed. Sort of how one can drop down to C in Python for the parts that need more performance.
ptx 2 hours ago
Apparently "AI is speeding up the onboarding process", they say. But isn't that because the onboarding process is about learning, and by having an AI regurgitate the answers you can complete the process without learning anything, which might speed it up but completely defeats the purpose?
- raphman 2 hours ago
  Yes, that's how I'd interpret it, too.
  According to the article, onboarding speed is measured as “time to the 10th Pull Request (PR).”
  As we have seen on public GitHub projects, LLMs have made it really easy to submit a large number of low-effort pull requests without having any understanding of a project.
  Obviously, such a kind of higher onboarding speed is not necessarily good for an organization.
  - jimbokun 32 minutes ago
    Yeah it should only count ACCEPTED pull requests.
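To make the difference between the two versions of the metric concrete, here is a rough Python sketch. The `PullRequest` record, the `time_to_nth_pr` helper, and the sample data are all hypothetical; the article only says the metric is "time to the 10th Pull Request (PR)" and does not describe how DX computes it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class PullRequest:
    opened_at: datetime
    merged: bool  # True only if the PR was accepted and merged


def time_to_nth_pr(start: datetime, prs: list[PullRequest], n: int = 10,
                   merged_only: bool = False) -> Optional[timedelta]:
    """Time from the start date to the n-th counted PR, or None if fewer than n."""
    counted = sorted(
        (pr for pr in prs if pr.merged or not merged_only),
        key=lambda pr: pr.opened_at,
    )
    if len(counted) < n:
        return None
    return counted[n - 1].opened_at - start


# Made-up new hire: one PR per day for 20 days, only every second one accepted.
start = datetime(2025, 1, 6)
prs = [PullRequest(start + timedelta(days=d), merged=(d % 2 == 0)) for d in range(1, 21)]

print("any PR:     ", time_to_nth_pr(start, prs))
print("merged only:", time_to_nth_pr(start, prs, merged_only=True))
```

With this made-up data the unfiltered metric looks twice as good as the merged-only one, which is the gap the comments above are worried about. A stricter variant could measure to the merge date rather than the open date.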
- mjfisher 2 hours ago
  I think there's definite scope for that being true; not because you can start doing stuff before you understand it (you can), but because you can ask questions of a codebase you're unfamiliar with to learn about it faster.
- 8note 2 hours ago
  i'd guess the time until first being able to make useful changes has dropped to near zero, but the time to get mastery of the code base has gone towards infinity.
  is that mastery still useful as time goes on though? it's always felt a bit like it's unhealthy for code to have people with mastery of it. it's a sign of a bad bus factor. every effort i've ever seen around code quality and documentation improvement has been to make that code mastery and full understanding irrelevant.
- OptionOfT 2 hours ago
  Correct. Reading code is important. The details are in the minutiae, and the way code works is that the minutiae are important.
  Summarizing this with AI makes you lose that context.
  - snsjzhhz 2 hours ago
    This has been my experience as a dev, and it always confuses me when people say they prefer to work at a “higher level”. The minutiae are often just as important as some of the higher level decisions. Not everything, but not an insignificant portion either. This applies to basic things like correctness, performance, and security - craft, style, and taste are not involved.
    co_king_5 2 hours ago
    > This has been my experience as a dev, and it always confuses me when people say they prefer to work at a “higher level”.
    > The minutiae are often just as important as some of the higher level decisions.
    Frankly, a failure to understand this is a tell that someone is not equipped to evaluate code quality.
xeiotos 3 hours ago
Unsurprising for multiple reasons. Most organizations have other bottlenecks and limiting factors than “how fast can you develop”.
Regardless, if you’re a dev who is now 2x as productive in terms of work completed per day, and quality remains stable, why should this translate to 2x the output? Most people are paid by the hour and not for outcomes.
And yes, I am suggesting that if you complete in 4 hours that which took you 8 hours in 2019, that you should consider calling it a day.
bluejekyll 3 hours ago
I found the title for this post misleading. To clarify it a bit, AI has only improved productivity by 10% even though 93% of devs are using it.
- dandanua 3 hours ago
  Yeah, the title may suggest that productivity is still 10% out of 100% after CEOs fired half of the developers, believing that the rest would do the whole job with the help of AI.
ilovetux 3 hours ago
I think some AI companies are just now starting to feel the pressure to profit.
Soon, I predict we will see a pretty significant jump in price that will make a 10% productivity gain seem tiny compared to the associated bills.
For now, these companies are trying to reach critical mass so their users are so dependent on their tech that they have to keep paying at least in the short term.
keeda 2 hours ago
The real takeaway here -- also corroborated by the DORA 2025 report https://dora.dev/research/2025/ -- is that more than anything, AI amplifies your current development culture. Organizations with strong quality control discipline enjoy more velocity, those with weak practices suffer more outages.
Expecting AI to magically overcome your development culture is like expecting consultants to magically fix your business culture.
Furthermore, by various estimates, engineers only spend 10 - 60% of their time on actual code. So, given that currently AI is largely used only for coding activities, 10% is actually considerable savings.
Also this is the result of retro-fitting AI into existing workflows; actual "AI-native" workflows would probably look very different, likely having refactored in other parts of software engineering. Spotify's "Honk" workflow is probably just a starting point.
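The "10% is actually considerable" point can be checked by inverting the Amdahl relationship: if AI only speeds up coding, how much faster does coding have to be to save 10% of total time? A rough Python sketch, assuming the 10% figure means time saved across the whole job; `implied_coding_speedup` is a hypothetical helper, and the coding shares are picked from the 10 - 60% range quoted above.

```python
import math


def implied_coding_speedup(coding_fraction: float, overall_time_saved: float) -> float:
    """Coding speedup s needed so that coding_fraction * (1 - 1/s) equals the saving.

    Returns math.inf when no coding speedup, however large, could free that much time.
    """
    if not 0.0 < coding_fraction <= 1.0:
        raise ValueError("coding_fraction must be in (0, 1]")
    if not 0.0 <= overall_time_saved < 1.0:
        raise ValueError("overall_time_saved must be in [0, 1)")
    if overall_time_saved >= coding_fraction:
        return math.inf
    return coding_fraction / (coding_fraction - overall_time_saved)


# Assumed: 10% of total working time saved, with coding shares from the quoted range.
for fraction in (0.10, 0.20, 0.30, 0.60):
    speedup = implied_coding_speedup(fraction, 0.10)
    print(f"coding share {fraction:.0%}: coding must be {speedup:.2f}x faster")
```

Under these assumptions, a team that spends 60% of its time coding only needs a 1.2x coding speedup to save 10% overall, a team at 20% needs 2x, and a team at 10% cannot get there from coding alone.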
- orwin an hour ago
  I'm pretty sure it has to do with the individual as well as the culture. Juniors/new hires use AI to double their wrong/unsafe output, and seniors then have to spend more time correcting it.
  I'll be honest: I write piss-poor code; each time I come back to an old project I see where I could have done better. New hires are worse, but before AI (and especially Opus) they didn't produce that much code before spending like 6 months learning (I'm on a netsec tooling team). Now, they start producing code after two weeks or less, and every line has to be checked because they don't understand what they are doing.
  I think my personal output was increased by 15% on average (maybe 5 on difficult projects), but our team output decreased overall.
  - keeda an hour ago
    Yes, we as a society urgently have to figure out how to learn and educate with AI. There are even studies showing that students who use AI to do their work do not learn the necessary skills.
    And I'm also hearing grumblings about entry level talent that is absolutely clueless without AI, which does not help the junior hiring scene at all.
    At this point it seems clear that people wishing to learn a discipline should restrict their usage of AI until they have "built the muscles", but none of our educational, testing, recruitment and upskilling practices are conducive to that.
kiernanmcgowan 2 hours ago
My biggest roadblocks as an engineer have almost never been the authorship of code but everything else around it.
* Getting code reviewed
* Making sure it's actually solving the problem
* Communicating to the rest of the team what's happening
* Getting tests to pass
* Getting it deployed
* Verifying that the fix is implemented in production
* Starting it all over when there is a misunderstanding
Slinging more code faster is great and getting unit testing more-or-less for free is awesome but the separation between a good and great engineer is one of communication and management.
AI is causing us to regress to thinking that code velocity is a good metric to use when comparing engineers.
chvid 2 hours ago
As far as I can tell from my workplace the total impact on productivity is neutral to negative.
agentifysh 2 hours ago
I read this article as the CTO being the bottleneck if he's only seeing 10% productivity boost at his organization.
I don't think this is purely an AI problem; it's more about the legacy cost of coordinating many minds, which can't be solved just by giving people AI tools, at least until AI comes for the CTO role (but not the CEO or revenue-generating roles) and for whichever manager is the bottleneck.
I imagine a future where we have Nasdaq listed companies run by just a dozen people with AI agents running and talking to each other so fast that text becomes a bottleneck and they need another medium that can only be understood by an AI that will hold the humans' hands.
This shift would also be reflected by new hardware shifts...perhaps photonic chips or anything that lets AI scale up crazy without the energy cost....
Exciting times are ahead for AI, but it's also accelerating digital UBI....could be good and bad.
- Nezteb 2 hours ago
  > it's also accelerating digital UBI
  Do you have sources for this claim?
onion2k 2 hours ago
A 10% uplift in productivity for the cost of probably 0.001% of the salary budget is an incredible success.
- arctic-true 2 hours ago
  This is exactly right. And assuming organizations use the gains to cut headcount rather than boost total productivity, a 10% reduction in white collar employment would still be an era-defining systemic shock to the economy.
  - emp17344 2 hours ago
    Productivity improvements from automation actually result in an increase in jobs, not fewer jobs. Basic economics.
wewewedxfgdf 2 hours ago
How are CTOs so out of touch and yet loud and proud about it?
rcfox 2 hours ago
The title is misleading. Productivity isn't at 10%, it's at 110%.
aaroninsf 35 minutes ago
Ximm's Law applies to the "plateau" of 10%
In other words: notionally, if not literally, by the time trailing numbers are collected they are out of date.
This is of course axiomatic, but, that staleness is a serious matter in this particular moment.
It's a cliché that six months can be a lifetime on the bleeding edge of tech.
This is the first time in my career that is more or less literally true.
Humans reason poorly with non-linear change.
This entire article is a demonstration of that.
mytailorisrich 3 hours ago
Blunt opinion: Most devs are not that good and really only execute what they are told to do.
The threat of AI for devs, and the way to drastically improve productivity is there: keep the better devs who can think systemically, who can design solutions, who can solve issues themselves and give them all the AI help available, cut the rest.
- tyleo 2 hours ago
  That’s how I feel too. When I was an architect at a ~300-person company, a big chunk of my job shifted to reviews, technical design docs, and guidance. I’m getting great results by feeding context like that into Claude Code, then reviewing and steering what it produces.
  It really does feel like a multiplier on me and I understand things enough to get my hands dirty where Claude struggles.
  Lately I’ve been wondering if that role evolves into a more hierarchical review system: senior engineers own independent modules end-to-end, and architects focus on integration, interfaces, and overall coherence. Honestly, the best parts of our product already worked like that even before AI.
gedy 3 hours ago
I can see where productivity could be higher if all I did was type in programs to some spec, or bootstrap new apps all day, but that's not the reality of "programming", at least for me over the past 25 years. Sorting through what to even make and interpreting "requirements" is what takes the most time.
moralestapia 3 hours ago
AI adoption has reduced productivity at my workplace, and by a noticeable amount!
- randomtoast 2 hours ago
  This will lead to natural selection. As AI becomes increasingly integrated into all areas, companies that manage it less effectively than others will face greater selection pressure.
  - emp17344 2 hours ago
    Or, AI will turn out to just not be that useful.
    moralestapia an hour ago
    It's such a weird effect.
    At a personal level, AI has made non-trivial improvements to my life. I can clearly see the value in there.
    At an organizational level, it tends to get in the way much more than helping out. I do not yet see the value in there.
- otabdeveloper4 2 hours ago
  That's expected for any new "low-code" solution du jour.
downrightmike 3 hours ago
Yeah, industry has told them that devs aren't valuable and AI can do their job. Who TF has motivation after that?
- havefunbesafe 2 hours ago
  People getting paid >$400k TC
- co_king_5 3 hours ago
  No motivation? I'm sorry buddy but your ass is getting replaced by Claude Code in the next 3-6 weeks.