I've spent a lot of time trying to think about how we arrived here. Where I work there are a lot of Senior Directors and SVPs who used to write code 10+ years ago, and who, if you asked them to build a little hack project, would have no idea where to start. AI has given them back something they'd lost, because they can build something simple super quickly. But they fail to see that just because it accelerates their hack project, it won't accelerate someone who's an expert. I.e. AI might help a hobbyist plant a garden, but it wouldn't help a farmer squeeze out more yield.
I would say that this is the wrong distinction. I'm an expert who's still in the code every day, and AI still accelerates my hack projects that I do in my spare time, but only to a point. When I hit 10k lines of code then code generation with chat models becomes substantially less useful (though autocomplete/Cursor-style advanced autocomplete retains its value).
I think the distinction that matters is the type of project being worked on. Greenfield stuff—whether a hobby project or a business project—can see real benefits from AI. But eventually the process of working on the code becomes far more about understanding the complex interactions between the dozens to hundreds of components that are already written than it is about getting a fresh chunk of code onto the screen. And AI models—even embedded in fancy tools like Cursor—are still objectively terrible at understanding the kinds of complex interactions between systems and subsystems that professional developers deal with day in and day out.
That's better than my past experience with hobby projects, but also nowhere near as big as the kinds of software systems I'm talking about professionally. The smallest code base I have ever worked on was >1M lines; the one I'm maintaining now is >5M.
I don't doubt that you can scale the models beyond 10K with strategies like this, but I haven't had any luck so far at the professional scales I have to deal with.
You have to give it the right context and direction — like you would a new junior dev — but then it can be very good. E.g.:
> Implement a new API in `example/apis/new_api.rs` to do XYZ which interfaces with the system at `foo/bar/baz.proto` and use the similar APIs in `example/apis/*` as reference. Once you're done, build it by running `build new_api` and fix any type errors.
Without that context (eg. the example APIs) it would flail, but so would most human engineers.
In general though, it's been a lot of learning about how to make LLMs work for me, and I do wonder if people simply dismiss them too quickly because they subconsciously don't want them to work. Also, "LLM" is too generic. Copilot with 4o sucks, but Claude in Cursor and Windsurf does not suck.
It’s just like having a junior dev. Without enough guidance and guardrails, they spin their wheels and do dumb things.
Instead of focusing on lines of code, I’m now focusing on describing overall tasks, breaking them down, and guiding an LLM to a solution.
It’s easy to see this effect in any new project you start with AI: the first few pieces of functionality are easy to implement, and boilerplate gets written effortlessly. Then the AI can’t reason about the code and makes dumb mistakes.
In my experience, AI is helpful for that first 90% — when the codebase is pretty simple, and all of the weird business logic edge cases haven’t crept in. In the last 10% (as well as in most “legacy” codebases), it seems to have a lot of trouble understanding enough to generate helpful output at more than a basic level.
Furthermore, if you’re not deliberate with your AI usage, it really gets you into “this code is too complicated for the AI to be much help with” territory a lot faster.
I’d imagine this is part of why we’re not seeing an explosion of software productivity.
But I've found that there are a lot of places where it kind of falls over. I recently had Cursor do a large "refactoring" for me, and I was impressed with the process it went through, but at the end of the day I still had to review it all. It missed a couple of places, and worse, it undid a bug fix that I had put in (the bug had originally been created when I had AI write a short function for me).
The other thing that makes me really worried is that AI makes it easy to be lazy and add tons of boilerplate code, where in the old world, if I had to do it all manually, I would definitely have DRY-ed stuff up. So it makes my life immediately easier, but the next guy is now going to have a shit ton more code to look at when they try to understand the project in the first place. AI definitely can help with that understanding/summarization, but a lot of the time I feel like code maintenance is mostly finding that "needle in a haystack", and AI makes it easy to add a shit ton of hay without a second thought.
10x, 20x, etc. productivity boosts really should be easy to see. My favorite example of this is the idea of porting popular things like MediaWiki/WordPress to popular things like Django/Rails. A charitable challenge, right, since there’s lots of history / examples, and it’s more translation than invention. What about porting large, well-known code bases from C to Rust, etc.? Clearly people are interested in such things.
There would be a really really obvious uptick in interesting examples like this if impossible dreams were now suddenly weekend projects.
If you don’t have an example like this... well, another vibe coding anecdote about another CRUD app or a bash script with tricky awk is just not really what TFA is asking about. That is just evidence that LLMs have finally fixed search, which is great, but not the subject that we’re all the most curious about.
For me it's been an ever-so-slight net positive.
In terms of in-IDE productivity it has improved a little bit. Stuff that is mostly repetitive can be autocompleted by the LLM. It can, in some cases, provide function names from other files that traditional IntelliCode can't, because of codebase size.
However it also hallucinates plausible shit, which significantly undermines the productivity gains above.
I suspect that if I asked it directly to create a function to do X, it might work better, rather than expecting it to work like autocomplete (even though I comment my code much more than my peers do).
Overall rating: for our code base, it's not as good as C# IntelliCode/VS Code.
Where it is good is asking how I do some basic thing in $language that I have forgotten. Anything harder and it starts going into bullshit land.
I think if you have more comprehensive tests it works better.
I have not had much success with agentic workflow, mainly because I've not been using the larger models. (Our internal agentic workflow is limited access)
And it's really good for basic stuff in things you don't want to have to look up. E.g. "write me JavaScript to delete all DOM nodes with class 'foo'".
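For what it's worth, the answer to that kind of prompt is basically a one-liner like this (my sketch of the typical output, not any particular model's verbatim answer):

```ts
// Remove every element with class "foo" from the document
document.querySelectorAll('.foo').forEach((el) => el.remove());
```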
I reckon you're underestimating how much time that saves though. The auto-complete saves me a few seconds many times a day. Maybe 2-3 minutes a day. A whole day in one year.
The "write me some shitty script" stuff saves much more time. Sometimes maybe an hour. That's rarer but even one a month that's still a whole day in a year.
Maybe 2 days in a year doesn't seem significant but given the low cost of these AI tools I think any company is self-harming if they don't pay for them for everyone.
(Also I think the hallucination isn't that bad once you get used to it.)
Now let's count the hours of "useless meetings" and how much time could be saved there. And that would cost us nothing.
I'm not sure saving two days of work _a year_ is worth the hassle of talking to an LLM like it's a 5-year-old - IMHO.
You don't though.
This is how LLMs are most effective, and also the reason why I don't believe non programmers will be crushing code. You do actually need to know how to program.
I've been following him for a while; it's interesting what a polarizing figure he is.
A recent comment on his X feed resonated with quite a few people:
"I’ve never seen an account that feels inspirational and makes me want to kill myself simultaneously"
He is definitely an entrepreneur/businessman first and a developer second. He has taught himself the skills to minimally get the job done functionally, has a decent eye for design, and knows how to market... He makes it blatantly obvious, more so than usual, that sales and marketing are way more important to making money than doing things technically "the right way". At least on these smaller-scale things, because he only knows how to work by himself.
People hate that he's figured out how to make tens of thousands of dollars from a game that is arguably worse than hundreds of games that make nothing... I see that as just another skill that can be learned. And it is cool how transparent he is about his operations.
But yes, even with his blueprints it is hard to replicate his successes; my attempt at "one-shotting" a video game was not very impressive.
That's why it's mind blowing to me when devs say AI writes bad code. For the most part, who cares?
Yeah, he was/is a big name in digital nomad circles. I used to quote his figures, like $70K/mo; I bet it's more now.
I think Remote OK is written in PHP and jQuery (an example of something that makes money even though the tech is not fancy).
He made tens of thousands from advertising, right? With most of what he does, the lesson feels like "if you follow the steps I take, you can be an entrepreneur too," which is mostly true. With the game it feels more like "if you develop a following of millions and then create something that leaps to the front of the hype train, people may give you money for a while." Which doesn’t feel like what most people are hoping for from “vibe coding a game can make you 60k/month.”
But no shade to him.
I hate the term vibe coding, but as I understand it, it means just blabbering to Cursor (or the like) with working code as a result. He didn't do that, as Cursor (or Replit or whatever is current) cannot do that for more than a hello world; if you do not read the generated code and specifically correct it when the AI gets stuck, it simply won't work. Levels can read and write code; people who want to 'vibe code' often cannot.
And agreed, in the case of this game the success is all because of his following. Not so with his other products, though; this one is just a joke to him.
And his game is technically better than anything I can put together.
An example from today was using XAudio2 on Windows to output sound, where that sound was already being fetched as interleaved data from a network source. I could have read the docs, found some example code, and bashed it together in a few hours; but I asked one of the LLMs and it gave me some example code tuned to my request, giving me a head start on that.
I had to already know a lot of context to be able to ask it the right questions, I suspect, and then to tune it with a few follow-up questions.
With some LLM help I was done before lunch. After lunch I wrote some additional unit tests and improved on the solution - again with LLM help (the object type changes in unit tests vs integration tests, one is the actual type, one is a JsonDocument).
I could've definitely done all that by myself, but when the LLM wrote the boilerplate crap that someone had definitely written before (but in a way I couldn't find with a search engine) I could focus on testing and optimising the solution instead of figuring out C# DynamicObject quirks.
… was curious what GPT-4.5 would do with an absolute paucity of context :)
At first I was all in with Copilot and various similar plugins for neovim. It helped me get going but did produce the worst code in the application. Also I found (personal preference) that the autocomplete function actually slowed me down; it made me pause or even prevented me from seeing what I was doing rather than just typing out what I needed to. I stopped using any codegen for about four months at the end of 2024; I felt it was not making me more productive.
This year it’s back on the table with avante[0] and Cursor (the latter back off the table due to the huge memory requirements). Then recently Claude Code dropped and I am currently feeling like I have productivity superpowers. I’ve set it up in a pair-programming style (old XP coder) where I write careful specs (prompts) and tests (which I code); it writes code; I review, run the tests, and commit. I work with it. I do not just let it run, as I have found I waste more time unwinding its output than watching each step.
From being pretty disillusioned six months ago I can now see it as a powerful tool.
Can it replace devs? In my opinion, some. Like all things it’s garbage in garbage out. So the idea a non-technical product manager can produce quality outputs seems unlikely to me.
Sometimes they give me maybe a 5–10% improvement (i.e. nice but not world changing). Usually that’s when they’re working as an alternative to docs, solving the odd bug, helping write tests or occasional glue code, etc. for a bigger or more complex/important solution.
In other cases I’ve literally built a small functioning app/tool in 6–12 hours of elapsed time, where most of that is spent waiting (all but unattended, so I guess this counts as “vibe coding”) while the LLM does its thing. It’s probably required less than an hour of my time in those cases and would easily have taken at least 1–2 days, if not more for me. So I’d say it’s at least sometimes comfortably 10x.
More to the point, in those cases I simply wouldn’t have tried to create the tool, knowing how long it’d take. It’s unclear what the cumulative incremental value of all these new tools and possibilities will be, but that’s also non-zero.
And when you name your test cases in a common pattern such as "MethodName_ExpectedBehavior_StateUnderTest" the LLM is able to figure it out about 80% of the time.
Then the other 20% of the time I'll make a couple of corrections, but it's definitely sped me up by a low double digit percentage ... when writing tests.
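To make that naming pattern concrete, here's roughly what I mean (a hypothetical TypeScript/Jest example; the function and tests are made up for illustration, not from my actual codebase):

```ts
// Hypothetical function under test
function parseAmount(input: string): number | null {
  const value = Number(input);
  return Number.isFinite(value) ? value : null;
}

// With the names following the same pattern, the LLM usually drafts the
// next test body correctly from the name alone.
test('parseAmount_ReturnsNumber_WhenInputIsNumeric', () => {
  expect(parseAmount('42')).toBe(42);
});

test('parseAmount_ReturnsNull_WhenInputIsNotNumeric', () => {
  expect(parseAmount('forty-two')).toBeNull();
});
```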
When writing code, it seems to get in the way more often than not, so I mostly don't use it - but then again, a lot of what I'm doing isn't boilerplate CRUD code.
This has bad assumptions about what higher productivity looks like.
Other alternatives include:
1. Companies require fewer engineers, so there are layoffs. Software products are cheaper than before because the cost to build and maintain them is reduced.
2. Companies require fewer engineers so they lay them off and retain the spend, using it as stock buybacks or exec comp.
And certainly it feels like we've seen #2 out in the wild.
Assuming that the number of people working on software you use remains constant is not a good assumption.
(Personally this has been my finding. I'm able to get a bit more done in my day by e.g. writing a quick script to do something tedious. But not 5x more.)
Writing a new view used to take 5-10 minutes but now I can do it in 30 seconds. Since it's the most basic PHP/MySQL imaginable, it works very well; none of those frameworks to confuse the LLM or suck up the context window.
The point is, I guess, that I can do it the old-fashioned way because I know how, but I don't have to; I can tell ChatGPT exactly what I want, and how I want it.
Copilot is very good at breaking my flow, and all of the agent-based systems I have tried have been disappointing at following incredibly simple instructions.
Coding is much easier and faster than writing instructions in English, so it is hard to justify anything I have seen so far as a time saver.
For example, a piece of code with a foreach loop that uses the collection name inside the loop instead of the item name.
Or a very nice-looking piece of code, but with a method call that does not exist in the library being used.
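A contrived TypeScript sketch of that first kind of slip, for anyone who hasn't hit it:

```ts
const orders = [{ total: 10 }, { total: 25 }];
let sum = 0;

for (const order of orders) {
  // The hallucinated version references the collection instead of the loop
  // variable; in plain JS it silently yields NaN, in TS it's a type error:
  // sum += orders.total;
  sum += order.total; // what was actually intended
}
```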
I think the weakness of AI/LLMs is that they output probabilities. If the code you request is very common, then it will probably generate good code. But that's about it. It cannot reason about code (it can maybe 'reason' about the probability of the generated answer).
The moment I realized LLMs are better was when I needed to do something with screen coordinates of point clouds in three.js and my searches led nowhere. Doing it myself would have taken me 1 or 2 hours; the LLM got correct, working code on the first try.
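For context, the working approach boils down to something like the following sketch (my reconstruction, not the model's actual output; it assumes a THREE.Points cloud and up-to-date world/camera matrices):

```ts
import * as THREE from 'three';

// Project one vertex of a point cloud into 2D screen (pixel) coordinates.
function pointToScreen(
  points: THREE.Points,
  index: number,
  camera: THREE.Camera,
  canvas: HTMLCanvasElement
): { x: number; y: number } {
  const position = points.geometry.getAttribute('position');
  const v = new THREE.Vector3().fromBufferAttribute(position, index);

  v.applyMatrix4(points.matrixWorld); // local -> world space
  v.project(camera);                  // world -> normalized device coords in [-1, 1]

  return {
    x: ((v.x + 1) / 2) * canvas.clientWidth,
    y: ((1 - v.y) / 2) * canvas.clientHeight,
  };
}
```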
The nice thing about traction is that you can see it. When you shovel away the snow in your driveway you can move your car; that's nice. When you update your hot water kettle and it boils in 30 seconds, that's traction. Traction is a freshly installed dishwasher in your kitchen.
I sincerely ask – not because I am skeptical but because I am curious – where is the traction with LLMs in software?
1) Auto-complete on steroids. My LLM often auto-completes six or seven lines of code at a time, and can often infer intent from context. Wrong about as often as it is right, but if I had written the code myself, I would have been wrong on the first cut about as often as my coding assistant is. Somewhere between 25% and 60% of all code I write these days is written using TAB auto-complete of LLM suggestions. (Sorry I can't be more precise, but I use TAB auto-complete a LOT).
2) An assistant for writing chunks of code in technologies with which I do not have deep expertise. Provides easy entry into APIs that I'm not familiar with. My LLM will sketch out the framework code required to get some chunk of technology up and working. ("Initialize an ALSA MIDI port." Ctrl+I. "And make it initialize the port in RAW mode." "And use camel case for the method names." &c.) Particularly useful with ancient crusty Linux APIs that are short on documentation and examples. My use of Stack Exchange has gone to zero (and my use of Reddit).
3) Writing code for technologies that I have practically zero experience with. "Write a bash script that takes the output from this program (... format specified...) and use gnuplot to generate a graph."; "And display the results in a QT window"; "And set the title to 'cumulative downloads'"; "And rotate the x-axis labels by 45 degrees."; "And set all the fonts to 'Roboto'", &c. (I am for all practical purposes bash-script illiterate, and I've never used gnuplot before.) I often do programming tasks these days that I would not have done before because the effort would not have been worthwhile without a coding assistant. Making my audio plugins read and write audio files in a variety of file formats would have taken a week, but with a coding assistant it took me about half a day. Without a coding assistant, I just wouldn't have done it.
4) Difficult debugging problems! "What's wrong with this code?" (That seriously works!) "Why doesn't this code properly recover from ALSA buffer underruns?" Doesn't necessarily get the right answer; but it will often provide a list of extremely useful suggestions. I've encountered several occasions where my coding assistant will help me solve difficult bugs that I've been working for hours on, one of which I had been working on for three or four days.
Of these, I think (2) provides the most productivity boost for me. I've often thought that the vast majority of time spent programming is spent reading documentation for the APIs you're using. Programming is mostly transforming documentation into usefulness. I will often write a one-liner saying what I want to do, and type ^I. My LLM is spookily good at stuff like that. No need to spend 40 minutes reading ffmpeg documentation. No need to google for the least toxically awful code sample on the internet. Just "// Write the audio buffer to an MP3 file using ffmpeg", Ctrl+I.
Let's be frank; nobody is going to be converting natural language to A-list games. But LLMs are a tool. And tools are good. And LLM coding assistants are extremely good tools. The answer to all of this is: try it, and learn to use the tool, and take away what works for you, and get good at recognizing when your coding assistant will and will not work.
I get that you're asking because you haven't tried it yet. But I guarantee that if you do, you will recognize the traction pretty much instantly. It's immediately obvious that the tool is useful. And as you use it more and more, and adapt to the quirks, it becomes more and more useful.
I'm a professional senior programmer with 40 years of experience. I'm pretty sure it roughly doubles my productivity.
But we've now lived it so much that it sounds ridiculous to try to argue that the internet doesn't really make _that_ much of a difference.
Back in the late 1990s I had to build solutions to almost every problem from scratch.
"It's difficult to quantify this, but my guess for a while has been that I've had a giant productivity boost in the portion of my job which is typing code at a computer. And I would estimate I am two to three times more productive, faster at turning thoughts into working code, than I was before. But that's only 10% of my job."
But this was 5 months ago, pre-Claude 3.7.
But I can easily see a not so distant future where you don't even have to look at the code anymore and just let AI do its thing. Similar to us not checking the assembly instructions of compiled code.
I'm not saying that what you claim is entirely impossible, but it would require some major inventions and a very different approach to how ML is implemented and used compared to what's happening today. And I'm not convinced that the economics for that are there.
At least for mid- to high-complexity projects.
Vibe coding might be fun but ultimately results in unmaintainable code.
Seems to be the easiest measurement of any effect.
I use these tools to get help here and there with tiny code snippets. So far they haven't suggested anything finely optimised. I guess it's because the greater chunk of what they were trained on isn't optimised for performance.
Does anyone know if any current LLMs can generate super-optimised code (even assembly language)? I don't think so. It doesn't feel like we are going to have machines more intelligent than us in the future if they're full of slop.
Most of the folks I've talked to about it have been trying it, but the majority of the stories are still ultimately failures.
There are exceptions though: there's been some success for porting things between say JUnit4 and JUnit5.
The successes do seem to be coming more frequently, as the models improve, as the tools improve, as people develop the intuition, and as we invest time and attention to building out LLM-tailored documentation (my prediction here is that the task-focused, bite-sized documentation style that seems like a fit for LLMs will ultimately prove to be more useful to developers than a lot of the existing docs!)
On the part of the models improving, I expect it's going to be a bit like the ChatGPT 3.5 to 4 transition: there are certain almost intangible thresholds that when crossed can suddenly make a qualitative difference in usability and success.
I definitely feel like my emotions regarding LLMs are a bit of a roller coaster. I'm turning 50 these days, and some days I feel like I would rather not completely upend my development practices! And the hype -- oh god, the hype -- is absolutely nauseating. Every CTO in existence told their teams to go rub some AI on everything. Every ad is telling you their AI is already working perfectly (it isn't).
But then I compete in our internal CTF and come in third even though the rest of my team bailed, because ChatGPT can write RISC-V assembly payloads and I don't have to spend half an hour learning or re-learning them. Or I get Claude to write a JavaScript/SVG spline editor matching the diagramming-as-code system I'm using, in like 30 or 45 minutes. And it's miraculous. And things like Cursor for just writing your code when you already know what you want… magical.
Here's the thing though. Despite the nauseating hype, we have to keep trying and trying to use AI well. There's a there there. The things are obviously unbelievably powerful. At some point, we're going to figure out how to use them effectively, and they're going to get good enough to do most of our low- to medium-complexity coding (at the very least). We're going to have to re-architect our software around AI. (As an example, instead of kludgy multi-platform solutions like React Native or Kotlin Multiplatform or J2ObjC, etc., why not make all the tests textual, and have an LLM translate changes in your Kotlin Android codebase into Swift automatically?)
We do still need sanity. I'm sort of half tongue-in-cheek trying to promulgate a rule that nobody can invoke AI to discount costs of expensive migrations until they've demonstrated an automated migration of the type in question with a medium-complexity codebase. We have to avoid waving our hands and saying, "Don't worry about that; AI will handle it," when it can't actually handle it yet.
But keep trying!
Some of the juniors I mentor cannot formulate their questions clearly and as a result, get a poor answer. They don’t understand that an LLM will answer the question you ask, which might not be the global best solution, it’s just answering your question - and if you ask the question poorly (or worse - the wrong question) you’re going to get bad results.
I have seen significant jumps in senior programmers capabilities, in some cases 20x, and when I see a junior or intermediate complaining about how useless LLM coding assistants are it always makes me very suspicious about the person, in that I think the problem is almost certainly their poor communication skills causing them to ask the wrong things.
Another thing I've found is that you have to actually engineer a solution - all the basic coding principles come into play: keeping code clean and cohesive, and designing it to make testing easy. The human part of this is essential, as AI has no barometer for when it's gone too far with abstraction or not far enough.
When code is broken up into good abstractions then the AI part of filling in the gaps is where you see a lot of the productivity increases. It's like filling water into an ice tray.
I suspect the metrics you sometimes hear like "x% of new code was written by an LLM" are being oversold because they're reported by people interested in juicing the numbers, so they count boilerplate, lines IDE autocomplete would have figured out, and lines that had to be fixed.
> I expect LLMs have definitely been useful for writing minor features or for getting the people inexperienced with programming/with a specific library/with a specific codebase get started easier and learn faster. They've been useful for me in those capacities. But it's probably like a 10-30% overall boost, plus flat cost reductions for starting in new domains and for some rare one-off projects like "do a trivial refactor".
That "10-30% overall boost" matches your "20% more productive" pretty well.
It boosts productivity in the way that a good IDE boost productivity, but nothing like 5x or even 2x. Maybe 1.2~1.4x.
I have found them pretty helpful for writing SQL. But I don't really know SQL very well, and I'd imagine that somebody who does could write what I need in far less time than it takes me with the LLM. While the LLM helps me finish my SQL task faster, the downside is that I'm not really learning it in the same way I would if I had to actually bang my head against the wall and understand the docs. In the long run, I'd be better off without it.
That stuff is already a mess, so the AI slop that comes out is also messy, and that's fine as long as it looks good, performs well, does what I want, and is really trivial to change.
However, I'm not letting it come near any backend code or actual development.
If you think of it as “not real,” of course you’ll end up with a mess. But that says as much about your choices as it does about front-end development.
That is awesome. It means companies can reduce their mid and senior dev headcount by 75%!
Ever seen a company that didn't have a backlog?