This is a very academic approach to the subject: read what other people have written about it without ever doing it yourself. Study what someone said 50 years ago, before LLMs were even invented, to decide what you think about coding with them.
I would strongly suggest that the author just give it a go and see what they think, without the preconceptions of other people's opinions.
My experience has been remarkable, and, like others, I'm finding real joy in being able to move past the code to actually design and play with whole systems and architectures.
It gets to the essence of code, which is not the code itself but the system that code implements. Writing code in 3 minutes instead of 30 doesn't bog us down in review (the LLM is perfectly capable of reviewing code too). It frees us to explore systems and architectures without worrying about the sunk cost of the existing code, or the effort of changing it.
At best we will end up owning nothing, not even the programming skills, as everyone will be at the mercy of AI companies for their coding.
We are still in the honeymoon phase of AI coding. I have a very pessimistic view of the future.
> (although I’m personally skeptical of the “10x programmer” concept, the software industry overall does seem to accept it as true)
To be fair, this statement from Brooks doesn't entirely match the "10x programmer" we talk about today. My take is that when someone says "10x programmer" now, they mean 10x more productive than the average, not 10x more productive than the worst; Brooks' statement is about the latter. If he'd looked at the difference between average and best, I'd assume you'd get something more like a 2x or 4x programmer.
10x relative to what exactly? It's not a statement grounded in any kind of reality.
I'm a 10x programmer at building Django apps compared to a developer who has never worked with Django before.
Someone who develops against WordPress on a daily basis will easily 10x my own attempts at building things on that platform.
Developers should write their own code and use LLMs to design and verify. Better, faster architecture and planning, pre-cleaned PRs and no skill atrophy or loss of understanding on the part of the developer.
I come in knowing what I need to build and at least one idea or more of how it should be done. I present the problem, constraints, potential solutions, and ask for criticisms and alternatives. I can keep it as broad as possible or I can get more granular like struct layouts, api endpoints, etc. I go back and forth until there's an approach I prefer and then I code that approach.
| it can code pretty well given a very tight and limited scope.
It's wildly better at tight and limited scope than at large-scale changes, but even then I would rather code it myself.
One thing I would like to see is the use of LLMs for smarter semi-manual editing.
While programming I often need to make very similar changes in several places. If the instances are similar enough I can get away with recording a one-off keyboard macro to repeat, but if there are differences that are too difficult to handle this way I end up needing to do a lot of manual editing.
It would be nice to see LLMs tightly integrated into the editor so I can do a simple "place the cursor at things like this" based on an example or two. I'm sure more ideas are possible for using LLMs to quickly perform the semantic changes you intend, instead of just prompting for a big diff. I feel there's a lot more innovation possible in this direction, where you're still "coding it yourself", just faster.
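For example, here's a minimal sketch of what that integration could look like. Everything here is hypothetical: ask_llm stands in for whatever completion API the editor wires up, and the prompt/response format is made up.

    import json

    # Show the model one example edit and ask it to locate every analogous
    # site, so the editor can drop a cursor at each of them.
    PROMPT = """Here is one edit I made:
    before: {before}
    after:  {after}

    Return a JSON list [{{"line": N, "col": M}}, ...] of every other location
    in the file below where the analogous edit applies:

    {source}
    """

    def find_edit_sites(ask_llm, before, after, source):
        reply = ask_llm(PROMPT.format(before=before, after=after, source=source))
        return json.loads(reply)  # the editor places a cursor at each site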
What I did was make one commit by hand (involving multiple files), and then told Codex (last year's Codex!) to make the equivalent changes to other instances in the code base.
Never understood that argument, because there are two steps in design: finding a good solution (discussing prior art, tradeoffs, …) and then nailing the technical side of that solution (data structures, formulas, …). Is it the former, the latter, or both?
Generally, the whole point of the "Power to the people?" (and to some extent the "On being left behind") section(s) is to underscore the two antithetical claims made by many LLM marketers: 1. LLMs are so powerful and so natural and easy that someone with no experience can create amazing software, and 2. LLM usage is a core skill, one that if you don't begin training now you'll be left behind.
Obviously, both of these can't be simultaneously 100% true--either it's easy enough for the non-programming layperson to successfully generate software for an intentional purpose, or, LLM assisted programming is a skill you need to train to avoid professional obsolescence in modern society. So, the article disagrees with the majority of both claims, and accepts a weakened/minor portion of each: 1. LLM output is easy to generate but accurate prompting matters, and 2. when used for software development professionally, some amount of skilled human intervention does indeed seem necessary. And now these two claims do align.
However, if professional software engineers who work with and read code constantly, armed with the best software practices to aid LLMs we can determine, cannot use modern AI tools without shooting their feet off at relatively frequent rates, certainly you'd expect the layperson who must put an even greater amount of undue faith in the validity of the results to be at extremely high-risk of foot-shooting. It's not "gatekeeping" to forewarn people against unwarranted trust in LLM output, nor is it "gatekeeping" to suggest that modern tech communicators/marketers describing an overly flowery LLM tooling landscape might be doing people a disservice.
I've had this conversation with managers in multiple organizations this year: "Yes, you could totally vibe code that instead of paying for a SaaS. But you have strict contractual and professional obligations about data security. Do you want to be deposed and asked, 'So, did you really just vibe code the system that led to the data leak? Did the vibe coders have any professional qualifications? Did they even look at the code?'"
Similarly, a backend server that handles 8 million users a day is expected to stay up.
Now, there are 10,000 things that have less demanding requirements. I'm actually really delighted that people are able to vibe code their own tools with minimal knowledge of software engineering! We have been chronically underproducing niche software all along.
But if your software already has on-call shifts (and SLAs, etc) like the GP, then I think you want to be smart about how you combine human expertise with LLMs.
It feels like a dunk to write that. But I genuinely do think there's so much motivated reasoning on both sides of this issue, and one signal of that is when people tip their hands like this.
Would you have the same reaction to requiring an approval for a production deployment? That’s driving the development process.
---
Also jfc I need to cool it with the buzzwords, sorry I just got home from “talk like this all day” $job
A paradigm shift is an earth-shattering, very important change - a complete change in thinking, etc. LLMs are not that. They are simply some pretty new tools. Nice tools, but they will whip off your metaphorical thumb just as quickly as a misused table saw.
You'll note that you mention "engineers are offloading": that's not a paradigm shift. That's a bunch of engineers discovering a better slide rule.
I'm old enough to remember moving on from slide rules (I still have mine) through calculators (ditto) to using fag packets and napkins for their real intended purpose.
The drill-driver also took engineering by storm but no-one ever used the term paradigm shift (to be fair, I don't think it was invented at the time and I can't be arsed to look it up).
If this sounds melodramatic it’s likely that it hasn’t fully taken root where you are yet.
I see opinions split between "it's just a dirty, untrustworthy tool that is making our lives and the world a living hell" and "this is the second coming of Christ". The reality is that right now we lie on that first part of the spectrum, but I am looking over the hill and seeing 4 horses, and they are stampeding this way.
Engage as a person, please.
It does? You mean "it tests itself faster", which is not really a test now, is it?
Funny, I thought that the major hurdle is improving accuracy and reliability, as it's always been. Engineering is necessary and useful, but it's a much simpler problem, which is why everyone is jumping on it.
I'm sure it was very difficult to program in machine code, but if now (or soon) anyone can just write software using an LLM without any sort of learning, it changes everything. LLMs can plan and create something usable from simple instructions or ideas, and they will only get better.
I think LLMs will be (and already are) useful for many more things than programming anyway.
Did you read the "Power to the People?" section? In it, the author dismantles your thesis with powerful, highly plausible arguments.
1. You don't have to be an LLM expert to get good, consistent results with LLMs.
My best vibe-code process after years of using LLMs is to have Claude Code create a plan file and then cycle it through Codex until Codex finds nothing more to review, then have an agent implement it. This process is trivial yet produces amazing results. (A rough sketch of the loop follows the list.)
It's solved by better and better harnesses.
2. You don't have to write technical specs. The LLM does that for you. You just tell it "I want the next-tab button to wrap back to the first one" and it generates a technical plan. Natural language is fine.
3. Software that seems to work only to fail down the line in production is already how software works today. With LLMs you can paste the stacktrace or user bug email and it will fix it.
This is why vibe-coding works. Instead of simulating how an app will run in your head by looking at its code, you run the app and tell the LLM what isn't working correctly. The app spec is derived iteratively through a UX feedback loop.
4. I don't understand TFA's goalposts, but letting people who are only interested in the LLM process (rather than the software craftsmanship) create software would be a huge democratization of software.
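Here's a minimal sketch of the plan/review loop from point 1, driven via subprocess. The exact commands and flags (claude -p, codex exec) are assumptions; substitute whatever non-interactive mode your tools actually expose.

    import subprocess

    def run(cmd):
        # Run a CLI tool and capture its stdout.
        return subprocess.run(cmd, capture_output=True, text=True).stdout

    # Hypothetical invocations; adjust for your actual setup.
    run(["claude", "-p", "Write an implementation plan for the feature to plan.md"])

    while True:
        review = run(["codex", "exec",
                      "Review plan.md; reply APPROVED if nothing more to fix."])
        if "APPROVED" in review:
            break
        run(["claude", "-p", "Revise plan.md to address this review:\n" + review])

    run(["claude", "-p", "Implement plan.md"])  # or hand off to another agent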
Would there even be a debate in the tech community if such unassailable arguments existed? The author is entirely entitled to his opinion, just as I am allowed to disagree with him (not sure why I am also downvoted). The good thing is, if I'm right, we will see it in less than 10 years.
I don't buy that's true. The "only" part, anyway. Look at how UX with software has evolved. This is gonna be an old-man-yells-at-clouds take, but before smartphones, there were hotkeys. And man, you could fly with those things. The computers running things weren't as fast as they are today, but you could mash in a whole sequence through muscle memory and just wait for it to complete. Now you have to poke at your phone, wait for it to respond, poke at it some more. It's really not great for getting fast at it. AI advancement is going to be like that. Directionally it will generally be better, but there's going to be some niche where, y'know what, ChatGPT-4o really had it in a way that 5.5 does not. (Rose-colored glasses not included.)
Then came the new Claude update, which many people say is worse. Even Anthropic says it got worse.[1] HN discussion back on April 15th: [2]
Some of this is a pricing issue. Turning "default reasoning effort" down from "high" to "medium" was a form of shrinkflation. Maybe this technology is hitting a price/performance wall.
[1] https://www.anthropic.com/engineering/april-23-postmortem
Just one more harness bro. Just one more agentic swarm. Please bro, just one more Claude Max subscription. Please bro.
All I would need from an LLM doubter is evidence that LLMs are not improving at tractable software engineering tasks. The strongest argument against the increasing general capabilities of LLMs is the ARC-AGI tasks, yet even their creators admit that each generation of LLMs exceeds their expectations, and that AGI will be achieved within the decade.
That being said, I don't even think that arguing about this from a mathematical perspective is a worthwhile use of time. Calling something an asymptote in the first place requires defining a quantifiable "X" and "Y", which we don't even have. What we have are a bunch of synthetic benchmarks. Even ignoring the fact that the answers to the questions are known to regularly leak into the training data (in other words, it's possible for scores to increase while capabilities remain the same), there's also the fundamental fact that performance on benchmarks is not the same thing as performance in the real world. And being able to answer some arbitrary set of arbitrary questions on a benchmark which the previous model couldn't, does not have a quantifiable correlation to some specific amount of real-world improvement.
The OP article focuses on research papers which assess real-world impact of LLMs within software organizations, which I think are more representative.
I wouldn't call myself an "AI doubter" - I use LLMs every day. When you say "doubter" you're not referring to "AI" in general, or the fact that AI is helpful or boosts productivity (which I believe it does). You're rather referring to the very specific, very extraordinary claim, that LLMs will surpass humans in coding. If that's the case then yeah I'm a doubter, at least on any foreseeable timescale.
You’re definitely right that people adopt agentic workflows and are disappointed or worse, but the point is the disappointment has already reduced substantially and will continue to do so. We know this because we know the scaling laws, and also because learning theory has been around for many decades.
I'll give you the coding harnesses themselves are better because that was a new product category with a lot of low-hanging fruit, but have the models actually improved in a way that isn't just benchmaxxing? I'd argue the models seem to be regressing. Even the most AI-pilled people at my company have all complained that Opus 4.7 is a dud. Anecdotally, GPT 5.5 seems decent, but it's rumored to be a 10T parameter model, isn't noticeably better than 5.4 or 5.3, is insanely expensive to use, and seems to be experiencing model collapse since the system prompt has to beg the thing to not talk about goblins and raccoons.
- AI coding is a disappointing fad ("fever dream?")
- that has not made meaningful progress in… 6 months?
- coding harness is improving
- model improvements are lies: it's just businesses "benchmaxxing" and misleading people. Real performance has not meaningfully improved
- "opus 4.7 is a dud"
- 5.5 suffering from "system collapse" (I've never heard this term before)
Since you asked and I assume you are rational and really are interested to know:
- we have many measures of performance and have studied how one particularly important but unintuitive measure (pretraining perplexity) scales with data, compute, and model size. These laws continue to hold and have satisfying theoretical origins. (The commonly cited form is sketched after this list.)
- whatever the scale of 5.5, consider we have far more room to go on the scaling front. Probably another 2-3 orders of magnitude before we hit limiting bottlenecks.
- that’s also fine because scaling is only part of the puzzle. RL on verifiable rewards is virtually guaranteed to get you optimal performance and that’s the entirety of the excitement around coding agents
- while you are right that benchmarks and measurement science have a ton of weaknesses, they are not at all garbage. There are probably around 40,000 benchmarks in the literature (this is not a made-up number, by the way; it really is around that many). Epoch made a great composite measure using good stats (IRT) called their Epoch Capability Index, and METR has done and redone their time-horizon measure, and it holds up beautifully. There is a ton of signal in many benchmarks, and they all tell a pretty compelling story.
- additionally, this is not some unknowable thing. It strikes me as odd that the prior on HN is so often "it's all dumb rich people putting way too much dumb money into this". Sorry, but the world is not that dumb. Trillions in CapEx is usually allocated pretty rationally. And it is!
- why? Because we already know what happens when you do what we're doing: when you have a verifiable reward system, a certain amount of compute available, and seed data to get you to where you can do RL, you are almost guaranteed to get superhuman performance.
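For what it's worth, the commonly cited form of those scaling laws (Hoffmann et al. 2022, the "Chinchilla" paper) models pretraining loss as a function of parameter count N and training tokens D:

    L(N, D) = E + A / N^α + B / D^β

where E is the irreducible loss and A, B, α, β are empirically fitted constants: loss keeps falling smoothly (if slowly) as either N or D grows.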
We're almost 6 months into all this AI-code madness and I've yet to see that "rapid improvement" you mention. As in software products that are genuinely better compared to 6 months ago, or new software products (and good ones at that) which would not have existed had this AI craze not happened.
It's the "YOLO" of business strategies.
(Sorry for bikeshedding, but you can't discuss an article if you can't read it.)
It's not terribly hard to check, either. You can do some spot checks with cost dashboards in AWS, Datadog, etc. and see if the numbers line up.
You can also tell Claude "go right-size the environment, pull p95 usage metrics for the last 3 months" and a couple of hours later, a bunch of money is saved. Much easier than manually pulling trend data, and also easier than installing/configuring/managing tools that do it for you. (A sketch of the metrics-pulling step is below.)
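For instance, the "pull p95 usage metrics" step is a few lines of boto3 against CloudWatch. This is a rough sketch, not the exact query Claude would run; the region, instance ID, and metric choice are placeholders.

    import boto3
    from datetime import datetime, timedelta, timezone

    cw = boto3.client("cloudwatch", region_name="us-east-1")
    end = datetime.now(timezone.utc)
    resp = cw.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
        StartTime=end - timedelta(days=90),   # last ~3 months
        EndTime=end,
        Period=86400,                         # one datapoint per day
        ExtendedStatistics=["p95"],
    )
    for dp in sorted(resp["Datapoints"], key=lambda d: d["Timestamp"]):
        print(dp["Timestamp"].date(), dp["ExtendedStatistics"]["p95"])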
The time savings of progressively better tooling add up quickly over time.
Use the damn thing or don't.
It's that simple.
You could fetch some unfinished GitHub repos or download free templates. That's actually faster than LLMs, yet nobody would do it.
I don’t start my project with the ecommerce nextjs starter repo. I build it from scratch, because it’s faster...
The author didn't seem to read the Brooks essay for comprehension. There is an entire section about expert systems that foreshadows agents. While there is no singular silver bullet, Brooks explores the most promising techniques to reduce essential complexity that were anticipated in 1986.
> The most powerful contribution of expert systems will surely be to put at the service of the inexperienced programmer the experience and accumulated wisdom of the best programmers. This is no small contribution.
Furthermore, his objection to automatic programming was simply an argument from incredulity, which was an understandable opinion at the time, yet quite vacuous in hindsight.
I think the biggest benefit language models have provided me is in the auxiliary aspects to programming: search, debugging, rubber ducking, planning, refactoring. The actual code generation has been mixed.
I had an LLM try to implement a fairly involved feature the other day, providing it with API spec details, examples from other open source libraries, and plenty of specifications. It's also something readily available in training data, but still fairly involved.
On first glance it looked great, and had I not spent the time to investigate deeper I would have missed some glaring deficiencies and omissions that render its implementation worthless. I am now going back and writing it by hand, but with language models providing assistance along the way, and it's going much better.
I think people are being unrealistic by thinking that the usage of language models in their side projects represents something broader. Side projects are almost the perfect situation for language models: small, greenfield code bases, no review, no responsibility, and no users. The code goes up on GitHub with a pretty readme, and then it's off to social media to post about how developers are "cooked". It's just not a very realistic test.
In the end we will probably see large productivity increases by integrating language models, but they won't be replacing developers but rather augmenting them.
Design patterns in an older (programming) language become core language features in a newer one. As we internalize and abstract away the best patterns for something, its complexity becomes accidental, but that's only obvious in retrospect.
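A concrete example: the Iterator pattern is a class and two methods of boilerplate in older languages (and in pre-generator Python), and a single keyword once the language absorbs it.

    # The hand-rolled Iterator pattern:
    class Countdown:
        def __init__(self, n):
            self.n = n
        def __iter__(self):
            return self
        def __next__(self):
            if self.n <= 0:
                raise StopIteration
            self.n -= 1
            return self.n + 1

    # The same thing after the pattern became a language feature:
    def countdown(n):
        while n > 0:
            yield n
            n -= 1

Both produce 3, 2, 1 for n=3; the boilerplate didn't disappear because it was essential, it disappeared because it was accidental.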
The article quotes Brooks (quoting Parnas) about just that (later, in context of LLMs):
> automatic programming always has been a euphemism for programming with a higher-level language than was presently available to the programmer. [...] Once those accidents have been removed, the remaining ones are smaller, and the payoff from their removal will surely be less.
Considering this was written when C was the hot new stuff, let's compare the ability to code a CRUD web app in Python/Django vs C. What Brooks and Parnas are saying is that Python/Django cannot bring big improvements in building a CRUD web app compared to C, because they can only make it easier to program by reducing accidental complexity. But we've since redefined "accidental", and I would argue that you can write a CRUD web app in Python/Django at least 100x faster than in C (and probably at least 100x more securely), although it may take 1000x more CPU and RAM while running.
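To make the comparison concrete, here's roughly the application side of a CRUD feature in Django (names are illustrative; this is a sketch, not a full project):

    from django.contrib import admin
    from django.db import models
    from django.http import JsonResponse

    class Article(models.Model):
        title = models.CharField(max_length=200)
        body = models.TextField()
        created = models.DateTimeField(auto_now_add=True)

    admin.site.register(Article)  # a full create/update/delete UI, for free

    def article_list(request):
        return JsonResponse({"articles": list(Article.objects.values("id", "title"))})

In C you'd be hand-writing the HTTP parsing, SQL escaping, and memory management before you wrote a single line of the actual feature.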
So "we removed most of the accidental difficulties and the most that remains is essential" is a kind of "end of history" argument.
> I’d be surprised if there’s even a doubling of productivity still available from a complete elimination of remaining accidental difficulty.
It's good that this statement has a conditional subjective guard, because that's just punditry.
> LLM coding does not represent a silver bullet
Here I agree with the author completely, but probably not for the same reasons. The definition of "silver bullet" the article uses (quoting Brooks):
> There is no single development, in either technology or management technique, which by itself promises even a single order-of-magnitude improvement within a decade in productivity, in reliability, in simplicity.
AI-assisted development is not a single technique, the same way "devops" or "testing" or "agile" is not a single technique. But more importantly, I agree it will take time to find best practices, for the technology change to slow down, and for the best approaches to diffuse across the industry.
The article's conclusion:
> You should be adopting and perfecting solid foundational software development practices like version control, comprehensive test suites, continuous integration, meaningful documentation, fast feedback cycles, iterative development, focus on users, small batches of work… things that have been known and proven for decades, but are still far too rare in actual real-world software shops.
These are great, and I'm gonna let him/her finish, but it's curious that actual coding isn't mentioned anywhere. The author doesn't suggest "polish your understanding of C pointer semantics" or "the Rust ownership model" or "the Django ORM", or to really, deeply understand B-trees. Looks like pedestrian details like those are left as an exercise for the reader... or the reader's LLM.
I'm reminded of this scene from The Matrix: https://www.youtube.com/watch?v=cD4nhYR-VRA where the older wise man discusses society's reliance on AI.
"Nobody cares how it works, as long as it works"
We're done. I, for one, welcome our new AI overlords, or more accurately, still welcome the tech-bro billionaires who are pulling the strings.
There are, IMHO, fewer reasons to believe they will be able to do that rather than not, though.
The current state of the art is irrelevant. Only the first couple of time derivatives matter.
I would say I got better at both of those over the last 12-18 months. Are your skills static?
Lmao why does it seem outlandish to other people? Perhaps they never thought too deeply in the first place to recognise it.
Really? That's like someone during an economic boom saying "The economy is the worst it'll ever be. There is no reason to expect things to not continue to improve".
Until recently. dramatic pause
And then AI happened.
I honestly couldn't force myself to finish yet another blog post about how "we're not yet sure what impact LLMs will have on society", or whatever belabored point the author was attempting to make.
"Some random person's take on LLMs" was maybe interesting in 2024. Today it is not even remotely interesting.
There are a gazillion more interesting things happening today that ought to be of interest to the median HN reader. Can we talk about those instead?
It sounds like you actually do want to talk about how much you don't want other people to talk about LLMs.
i was doing an ML Sec phd a year or two before all this hype took off. i took one of the OG transformer papers along to present at our official little phd reading group when the paper was only a few months old (the details of this might be a bit sketchy here, was years ago now).
now i want nothing to do with the field in any way shape or form. i’m just done.
edit -- i got incredibly angry after writing this comment. pure hatred and spite for all the charlatans and accompanying bullshit.
The article goes on to assume there’s no 10x gain to be had but misses one big truth.
Needing to type the code is an enormous source of accidental difficulty (typing speed, typos, whether you can be arsed to put your hands on the keyboard today…), and it is gone thanks to coding agents.