I've spent a lot of time trying to think about how we arrived here. Where I work there are a lot of Senior Directors and SVPs who used to write code 10+ years ago, and who, if you asked them to build a little hack project, would have no idea where to start. AI has given them back something they'd lost, because they can build something simple super quickly. But they fail to see that just because it accelerates their hack project, it won't accelerate someone who's an expert. I.e. AI might help a hobbyist plant a garden, but it wouldn't help a farmer squeeze out more yield.
I would say that this is the wrong distinction. I'm an expert who's still in the code every day, and AI still accelerates my hack projects that I do in my spare time, but only to a point. When I hit 10k lines of code then code generation with chat models becomes substantially less useful (though autocomplete/Cursor-style advanced autocomplete retains its value).
I think the distinction that matters is the type of project being worked on. Greenfield stuff—whether a hobby project or a business project—can see real benefits from AI. But eventually the process of working on the code becomes far more about understanding the complex interactions between the dozens to hundreds of components that are already written than it is about getting a fresh chunk of code onto the screen. And AI models—even embedded in fancy tools like Cursor—are still objectively terrible at understanding the kinds of complex interactions between systems and subsystems that professional developers deal with day in and day out.
That's better than my past experience with hobby projects, but also nowhere near as big as the kinds of software systems I'm talking about professionally. The smallest code base I have ever worked on was >1M lines; the one I'm maintaining now is >5M.
I don't doubt that you can scale the models beyond 10K with strategies like this, but I haven't had any luck so far at the professional scales I have to deal with.
You have to give it the right context and direction — like you would a new junior dev — but then it can be very good. E.g.:
> Implement a new API in `example/apis/new_api.rs` to do XYZ which interfaces with the system at `foo/bar/baz.proto` and use the similar APIs in `example/apis/*` as reference. Once you're done, build it by running `build new_api` and fix any type errors.
Without that context (eg. the example APIs) it would flail, but so would most human engineers.
In general though, it's been a lot of learning about how to make LLMs work for me, and I do wonder if people simply dismiss them too quickly because they subconsciously don't want them to work. Also, "LLM" is too generic. Copilot with 4o sucks, but Claude in Cursor and Windsurf does not suck.
It’s just like having a junior dev. Without enough guidance and guardrails, they spin their wheels and do dumb things.
Instead of focusing on lines of code, I’m now focusing on describing overall tasks, breaking them down, and guiding an LLM to a solution.
It’s easy to see this effect in any new project you start with AI: the first few pieces of functionality are easy to implement, and boilerplate gets written effortlessly. Then the AI can’t reason about the code and makes dumb mistakes.
In my experience, AI is helpful for that first 90% — when the codebase is pretty simple, and all of the weird business logic edge cases haven’t crept in. In the last 10% (as well as in most “legacy” codebases), it seems to have a lot of trouble understanding enough to generate helpful output at more than a basic level.
Furthermore, if you’re not deliberate with your AI usage, it really gets you into “this code is too complicated for the AI to be much help with” territory a lot faster.
I’d imagine this is part of why we’re not seeing an explosion of software productivity.
But I've found that there are a lot of places where it kind of falls over. I recently had Cursor do a large "refactoring" for me, and I was impressed with the process it went through, but at the end of the day I still had to review it all. It missed a couple of places, and worse, it undid a bug fix that I had put in (the bug had originally been created when I had AI write a short function for me).
The other thing that makes me really worried is that AI makes it easy to be lazy and add tons of boilerplate code, where in the old world, if I had to do it all manually, I would definitely have DRY-ed stuff up. So it makes my life immediately easier, but the next guy is now going to have a shit ton more code to look at when they try to understand the project in the first place. AI definitely can help with that understanding/summarization, but a lot of the time I feel like code maintenance is mostly finding that "needle in a haystack", and AI makes it easy to add a shit ton of hay without a second thought.
10x, 20x, etc. productivity boosts really should be easy to see. My favorite example of this is the idea of porting popular things like MediaWiki/WordPress to popular things like Django/Rails. A charitable challenge, right, since there’s lots of history / examples, and it’s more translation than invention. What about porting large, well-known code bases from C to Rust, etc.? Clearly people are interested in such things.
There would be a really really obvious uptick in interesting examples like this if impossible dreams were now suddenly weekend projects.
If you don’t have an example like this... well, another vibe coding anecdote about another CRUD app or a bash script with tricky awk is just not really what TFA is asking about. That is just evidence that LLMs have finally fixed search, which is great, but not the subject that we’re all the most curious about.
For me it's been an ever-so-slight net positive.
In terms of in-IDE productivity it has improved a little bit. Stuff that is mostly repetitive can be autocompleted by the LLM. It can, in some cases, provide function names from other files that traditional IntelliCode can't, because of codebase size.
However it also hallucinates plausible shit, which significantly undermines the productivity gains above.
I suspect that if I asked it directly to create a function to do X, it might work better, rather than expecting it to work like autocomplete (even though I comment my code much more than my peers do).
Overall rating: for our code base, it's not as good as C# IntelliCode/VS Code.
Where it is good is asking how I do some basic thing in $language that I have forgotten. Anything harder and it starts going into bullshit land.
I think if you have more comprehensive tests it works better.
I have not had much success with agentic workflow, mainly because I've not been using the larger models. (Our internal agentic workflow is limited access)
And it's really good for basic stuff in things you don't want to have to look up. E.g. "write me JavaScript to delete all DOM nodes with class 'foo'".
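For what it's worth, the answer to that kind of prompt is basically a one-liner like this (my sketch of the typical output, not any particular model's verbatim answer):

```ts
// Remove every element with class "foo" from the document
document.querySelectorAll('.foo').forEach((el) => el.remove());
```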
I reckon you're underestimating how much time that saves though. The auto-complete saves me a few seconds many times a day. Maybe 2-3 minutes a day. A whole day in one year.
The "write me some shitty script" stuff saves much more time. Sometimes maybe an hour. That's rarer but even one a month that's still a whole day in a year.
Maybe 2 days in a year doesn't seem significant but given the low cost of these AI tools I think any company is self-harming if they don't pay for them for everyone.
(Also I think the hallucination isn't that bad once you get used to it.)
Now let's count the hours of "useless meetings" and how much time could be saved there. And that would cost us nothing.
I'm not sure saving two days of work _a year_ is worth the hassle of talking to an LLM like it's a 5-year-old - IMHO.
You don't though.
This is how LLMs are most effective, and also the reason why I don't believe non programmers will be crushing code. You do actually need to know how to program.
I've been following him for a while; it's interesting what a polarizing figure he is.
A recent comment on his X feed resonated with quite a few people:
"I’ve never seen an account that feels inspirational and makes me want to kill myself simultaneously"
He is definitely an entrepreneur/businessman first and a developer second. He has taught himself the skills to minimally get the job done functionally, has a decent eye for design, and knows how to market... He makes it blatantly obvious, more so than usual, that sales and marketing are way more important to making money than doing things technically "the right way". At least on these smaller-scale things, because he only knows how to work by himself.
People hate that he's figured out how to make tens of thousands of dollars from a game that is arguably worse than hundreds of games that make nothing... I see that as just another skill that can be learned. And it is cool how transparent he is about his operations.
But yes, even with his blueprints it is hard to replicate his successes; my attempt at "one-shotting" a video game was not very impressive.
That's why it's mind blowing to me when devs say AI writes bad code. For the most part, who cares?
Yeah, he was/is a big name in digital nomad circles. I used to quote his figures, like $70K/mo; I bet it's more now.
I think Remote OK is written in PHP and jQuery (an example of something that makes money even though the tech is not fancy).
He made tens of thousands from advertising, right? With most of what he does, the lesson feels like "if you follow the steps I take, you can be an entrepreneur too," which is mostly true. With the game it feels more like "if you develop a following of millions and then create something that leaps to the front of the hype train, people may give you money for a while." Which doesn’t feel like what most people are hoping for from “vibe coding a game can make you 60k/month.”
But no shade to him.
I hate the term vibe coding, but as I understand it, it means just blabbering to Cursor (or the like) with working code as a result. He didn't do that, as Cursor (or Replit or whatever is current) cannot do that for more than a hello world; if you do not read the generated code and specifically correct it when the AI gets stuck, it simply won't work. Levels can read and write code; people who want to 'vibe code' often cannot.
And agreed, in the case of this game the success is all because of his following. Not so with his other products, though; this one is just a joke to him.
And his game is technically better than anything I can put together.
An example from today was using XAudio2 on Windows to output sound, where that sound was already being fetched as interleaved data from a network source. I could have read the docs, found some example code, and bashed it together in a few hours; but I asked one of the LLMs and it gave me some example code tuned to my request, giving me a head start on that.
I had to already know a lot of context to be able to ask it the right questions, I suspect, and then to tune it with a few follow-up questions.
With some LLM help I was done before lunch. After lunch I wrote some additional unit tests and improved on the solution - again with LLM help (the object type changes in unit tests vs integration tests, one is the actual type, one is a JsonDocument).
I could've definitely done all that by myself, but when the LLM wrote the boilerplate crap that someone had definitely written before (but in a way I couldn't find with a search engine) I could focus on testing and optimising the solution instead of figuring out C# DynamicObject quirks.
… was curious what GPT-4.5 would do with an absolute paucity of context :)
At first I was all in with Copilot and various similar plugins for neovim. It helped me get going but did produce the worst code in the application. Also I found (personal preference) that the autocomplete function actually slowed me down; it made me pause or even prevented me from seeing what I was doing rather than just typing out what I needed to. I stopped using any codegen for about four months at the end of 2024; I felt it was not making me more productive.
This year it’s back on the table with avante[0] and Cursor (the latter back off the table due to the huge memory requirements). Then recently Claude Code dropped and I am currently feeling like I have productivity superpowers. I’ve set it up in a pair-programming style (old XP coder) where I write careful specs (prompts) and tests (which I code); it writes code; I review, run the tests, and commit. I work with it. I do not just let it run, as I have found I waste more time unwinding its output than watching each step.
From being pretty disillusioned six months ago I can now see it as a powerful tool.
Can it replace devs? In my opinion, some. Like all things it’s garbage in garbage out. So the idea a non-technical product manager can produce quality outputs seems unlikely to me.
Sometimes they give me maybe a 5–10% improvement (i.e. nice but not world changing). Usually that’s when they’re working as an alternative to docs, solving the odd bug, helping write tests or occasional glue code, etc. for a bigger or more complex/important solution.
In other cases I’ve literally built a small functioning app/tool in 6–12 hours of elapsed time, where most of that is spent waiting (all but unattended, so I guess this counts as “vibe coding”) while the LLM does its thing. It’s probably required less than an hour of my time in those cases and would easily have taken at least 1–2 days, if not more for me. So I’d say it’s at least sometimes comfortably 10x.
More to the point, in those cases I simply wouldn’t have tried to create the tool, knowing how long it’d take. It’s unclear what the cumulative incremental value of all these new tools and possibilities will be, but that’s also non-zero.
And when you name your test cases in a common pattern such as "MethodName_ExpectedBehavior_StateUnderTest" the LLM is able to figure it out about 80% of the time.
Then the other 20% of the time I'll make a couple of corrections, but it's definitely sped me up by a low double digit percentage ... when writing tests.
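To make that naming pattern concrete, here's roughly what I mean (a hypothetical TypeScript/Jest example; the function and tests are made up for illustration, not from my actual codebase):

```ts
// Hypothetical function under test
function parseAmount(input: string): number | null {
  const value = Number(input);
  return Number.isFinite(value) ? value : null;
}

// With the names following the same pattern, the LLM usually drafts the
// next test body correctly from the name alone.
test('parseAmount_ReturnsNumber_WhenInputIsNumeric', () => {
  expect(parseAmount('42')).toBe(42);
});

test('parseAmount_ReturnsNull_WhenInputIsNotNumeric', () => {
  expect(parseAmount('forty-two')).toBeNull();
});
```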
When writing code, it seems to get in the way more often than not, so I mostly don't use it - but then again, a lot of what I'm doing isn't boilerplate CRUD code.
This has bad assumptions about what higher productivity looks like.
Other alternatives include:
1. Companies require fewer engineers, so there are layoffs. Software products are cheaper than before because the cost to build and maintain them is reduced.
2. Companies require fewer engineers so they lay them off and retain the spend, using it as stock buybacks or exec comp.
And certainly it feels like we've seen #2 out in the wild.
Assuming that the number of people working on software you use remains constant is not a good assumption.
(Personally this has been my finding. I'm able to get a bit more done in my day by e.g. writing a quick script to do something tedious. But not 5x more.)
Writing a new view used to take 5-10 minutes but now I can do it in 30 seconds. Since it's the most basic PHP/MySQL imaginable, it works very well; none of those frameworks to confuse the LLM or suck up the context window.
The point is, I guess, that I can do it the old-fashioned way because I know how, but I don't have to; I can tell ChatGPT exactly what I want, and how I want it.
Copilot is very good at breaking my flow, and all of the agent-based systems I have tried have been disappointing at following incredibly simple instructions.
Coding is much easier and faster than writing instructions in English, so it is hard to justify anything I have seen so far as a time saver.
For example, a piece of code with a foreach loop that uses the collection name inside the loop instead of the item name.
Or a very nice-looking piece of code, but with a method call that does not exist in the library being used.
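A contrived TypeScript sketch of that first kind of slip, for anyone who hasn't hit it:

```ts
const orders = [{ total: 10 }, { total: 25 }];
let sum = 0;

for (const order of orders) {
  // The hallucinated version references the collection instead of the loop
  // variable; in plain JS it silently yields NaN, in TS it's a type error:
  // sum += orders.total;
  sum += order.total; // what was actually intended
}
```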
I think the weakness of AI/LLMs is that they output probabilities. If the code you request is very common, then it will probably generate good code. But that's about it. It cannot reason about code (it can maybe 'reason' about the probability of the generated answer).
The moment I realized LLMs are better was when I needed to do something with screen coordinates of point clouds in three.js and my searches led nowhere. Doing it myself would have taken me 1 or 2 hours; the LLM got correct, working code on the first try.
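For context, the working approach boils down to something like the following sketch (my reconstruction, not the model's actual output; it assumes a THREE.Points cloud and up-to-date world/camera matrices):

```ts
import * as THREE from 'three';

// Project one vertex of a point cloud into 2D screen (pixel) coordinates.
function pointToScreen(
  points: THREE.Points,
  index: number,
  camera: THREE.Camera,
  canvas: HTMLCanvasElement
): { x: number; y: number } {
  const position = points.geometry.getAttribute('position');
  const v = new THREE.Vector3().fromBufferAttribute(position, index);

  v.applyMatrix4(points.matrixWorld); // local -> world space
  v.project(camera);                  // world -> normalized device coords in [-1, 1]

  return {
    x: ((v.x + 1) / 2) * canvas.clientWidth,
    y: ((1 - v.y) / 2) * canvas.clientHeight,
  };
}
```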
The nice thing about traction is that you can see it. When you shovel away the snow in your driveway you can move your car; that's nice. When you update your hot water kettle and it boils in 30 seconds, that's traction. Traction is a freshly installed dishwasher in your kitchen.
I sincerely ask – not because I am skeptical but because I am curious – where is the traction with LLMs in software?
1) Auto-complete on steroids. My LLM often auto-completes six or seven lines of code at a time, and can often infer intent from context. Wrong about as often as it is right, but if I had written the code myself, I would have been wrong on the first cut about as often as my coding assistant is. Somewhere between 25% and 60% of all code I write these days is written using TAB auto-complete of LLM suggestions. (Sorry I can't be more precise, but I use TAB auto-complete a LOT).
2) An assistant for writing chunks of code in technologies with which I do not have deep expertise. Provides easy entry into APIs that I'm not familiar with. My LLM will sketch out the framework code required to get some chunk of technology up and working. ("Initialize an ALSA MIDI port." Ctrl+I. "And make it initialize the port in RAW mode." "And use camel case for the method names." &c.) Particularly useful with ancient crusty Linux APIs that are short on documentation and examples. My use of Stack Exchange has gone to zero (and my use of Reddit).
3) Writing code for technologies that I have practically zero experience with. "Write a bash script that takes the output from this program (... format specified...) and use gnuplot to generate a graph."; "And display the results in a QT window"; "And set the title to 'cumulative downloads'"; "And rotate the x-axis labels by 45 degrees."; "And set all the fonts to 'Roboto'", &c. (I am for all practical purposes bash-script illiterate, and I've never used gnuplot before.) I often do programming tasks these days that I would not have done before because the effort would not have been worthwhile without a coding assistant. Making my audio plugins read and write audio files in a variety of file formats would have taken a week, but with a coding assistant it took me about half a day. Without a coding assistant, I just wouldn't have done it.
4) Difficult debugging problems! "What's wrong with this code?" (That seriously works!) "Why doesn't this code properly recover from ALSA buffer underruns?" Doesn't necessarily get the right answer; but it will often provide a list of extremely useful suggestions. I've encountered several occasions where my coding assistant will help me solve difficult bugs that I've been working for hours on, one of which I had been working on for three or four days.
Of these, I think (2) provides the most productivity boost for me. I've often thought that the vast majority of time spent programming is spent reading documentation for the APIs you're using. Programming is mostly transforming documentation into usefulness. I will often write a one-liner saying what I want to do, and type ^I. My LLM is spookily good at stuff like that. No need to spend 40 minutes reading ffmpeg documentation. No need to google for the least toxically awful code sample on the internet. Just "// Write the audio buffer to an MP3 file using ffmpeg", Ctrl+I.
Let's be frank; nobody is going to be converting natural language to A-list games. But LLMs are a tool. And tools are good. And LLM coding assistants are extremely good tools. The answer to all of this is: try it, and learn to use the tool, and take away what works for you, and get good at recognizing when your coding assistant will and will not work.
I get that you're asking because you haven't tried it yet. But I guarantee that if you do, you will recognize the traction pretty much instantly. It's immediately obvious that the tool is useful. And as you use it more and more, and adapt to the quirks, it becomes more and more useful.
I'm a professional senior programmer with 40 years of experience. I'm pretty sure it roughly doubles my productivity.
But we've now lived it so much that it sounds ridiculous to try to argue that the internet doesn't really make _that_ much of a difference.
Back in the late 1990s I had to build solutions to almost every problem from scratch.
"It's difficult to quantify this, but my guess for a while has been that I've had a giant productivity boost in the portion of my job which is typing code at a computer. And I would estimate I am two to three times more productive, faster at turning thoughts into working code, than I was before. But that's only 10% of my job."
But this was 5 months ago, pre-Claude 3.7.
But I can easily see a not so distant future where you don't even have to look at the code anymore and just let AI do its thing. Similar to us not checking the assembly instructions of compiled code.
I'm not saying that what you claim is entirely impossible, but it would require some major inventions and a very different approach to how ML is implemented and used compared to what's happening today. And I'm not convinced that the economics for that are there.
At least for mid- to high-complexity projects.
Vibe coding might be fun but ultimately results in unmaintainable code.
Seems to be the easiest measurement of any effect.
I use these tools to get help here and there with tiny code snippets. So far they haven't suggested anything finely optimised. I guess it's because the greater chunk of what they were trained on isn't optimised for performance.
Does anyone know if any current LLMs can generate super-optimised code (even assembly language)? I don't think so. It doesn't feel like we are going to have machines more intelligent than us in the future if they're full of slop.
Most of the folks I've talked to about it have been trying it, but the majority of the stories are still ultimately failures.
There are exceptions though: there's been some success for porting things between say JUnit4 and JUnit5.
The successes do seem to be coming more frequently, as the models improve, as the tools improve, as people develop the intuition, and as we invest time and attention to building out LLM-tailored documentation (my prediction here is that the task-focused, bite-sized documentation style that seems like a fit for LLMs will ultimately prove to be more useful to developers than a lot of the existing docs!)
On the part of the models improving, I expect it's going to be a bit like the ChatGPT 3.5 to 4 transition: there are certain almost intangible thresholds that when crossed can suddenly make a qualitative difference in usability and success.
I definitely feel like my emotions regarding LLMs are a bit of a roller coaster. I'm turning 50 these days, and some days I feel like I would rather not completely upend my development practices! And the hype -- oh god, the hype -- is absolutely nauseating. Every CTO in existence told their teams to go rub some AI on everything. Every ad is telling you their AI is already working perfectly (it isn't).
But then I compete in our internal CTF and come in third even though the rest of my team bailed, because ChatGPT can write RISC-V assembly payloads and I don't have to spend half an hour learning or re-learning them. Or I get Claude to write a JavaScript/SVG spline editor matching the diagramming-as-code system I'm using, in like 30 or 45 minutes. And it's miraculous. And things like Cursor for just writing your code when you already know what you want… magical.
Here's the thing though. Despite the nauseating hype, we have to keep trying and trying to use AI well. There's a there there. The things are obviously unbelievably powerful. At some point, we're going to figure out how to use them effectively, and they're going to get good enough to do most of our low- to medium-complexity coding (at the very least). We're going to have to re-architect our software around AI. (As an example, instead of kludgy multi-platform solutions like React Native or Kotlin Multiplatform or J2ObjC, etc., why not make all the tests textual, and have an LLM translate changes in your Kotlin Android codebase into Swift automatically?)
We do still need sanity. I'm sort of half tongue-in-cheek trying to promulgate a rule that nobody can invoke AI to discount costs of expensive migrations until they've demonstrated an automated migration of the type in question with a medium-complexity codebase. We have to avoid waving our hands and saying, "Don't worry about that; AI will handle it," when it can't actually handle it yet.
But keep trying!
Some of the juniors I mentor cannot formulate their questions clearly and as a result, get a poor answer. They don’t understand that an LLM will answer the question you ask, which might not be the global best solution, it’s just answering your question - and if you ask the question poorly (or worse - the wrong question) you’re going to get bad results.
I have seen significant jumps in senior programmers capabilities, in some cases 20x, and when I see a junior or intermediate complaining about how useless LLM coding assistants are it always makes me very suspicious about the person, in that I think the problem is almost certainly their poor communication skills causing them to ask the wrong things.
Another thing I've found is that you have to actually engineer a solution - all the basic coding principles come into play: keeping code clean and cohesive, and designing it to make testing easy. The human part of this is essential, as AI has no barometer for when it's gone too far with abstraction or not far enough.
When code is broken up into good abstractions then the AI part of filling in the gaps is where you see a lot of the productivity increases. It's like filling water into an ice tray.
I suspect the metrics you sometimes hear like "x% of new code was written by an LLM" are being oversold because they're reported by people interested in juicing the numbers, so they count boilerplate, lines IDE autocomplete would have figured out, and lines that had to be fixed.
> I expect LLMs have definitely been useful for writing minor features or for getting the people inexperienced with programming/with a specific library/with a specific codebase get started easier and learn faster. They've been useful for me in those capacities. But it's probably like a 10-30% overall boost, plus flat cost reductions for starting in new domains and for some rare one-off projects like "do a trivial refactor".
That "10-30% overall boost" matches your "20% more productive" pretty well.
It boosts productivity in the way that a good IDE boost productivity, but nothing like 5x or even 2x. Maybe 1.2~1.4x.
I have found them pretty helpful for writing SQL. But I don't really know SQL very well, and I'd imagine that somebody who does could write what I need in far less time than it takes me with the LLM. While the LLM helps me finish my SQL task faster, the downside is that I'm not really learning it in the same way I would if I had to actually bang my head against the wall and understand the docs. In the long run, I'd be better off without it.
That stuff is already a mess, so the AI slop that comes out is also messy, and that's fine as long as it looks good, performs well, does what I want, and is really trivial to change.
However, I'm not letting it come near any backend code or actual development.
If you think of it as “not real,” of course you’ll end up with a mess. But that says as much about your choices as it does about front-end development.
That is awesome. It means companies can reduce their mid and senior dev headcount by 75%!
Ever seen a company that didn't have a backlog?