You sure? That’s basically all that’s being discussed.
There’s nothing in this article I haven’t heard 100 times before. Open any mainstream news article or HN/Reddit thread and you’ll find all of OP’s talking points about water, electricity, job loss, the intrinsic value of art, etc.
> It's common, if not inevitable, for people who feel strongly about $topic to conclude that the system (or the community, or the mods, etc.) are biased against their side. One is far more likely to notice whatever data points that one dislikes because they go against one's view and overweight those relative to others. This is probably the single most reliable phenomenon on this site. Keep in mind that the people with the opposite view to yours are just as convinced that there's bias, but they're sure that it's against their side and in favor of yours. [1]
Simplified comparisons like these rarely show the full picture [0]. They focus on electricity use only, not on heating, transport, or meat production, and certainly not on the CO2 emissions associated with New York’s airports. As a rough, back-of-the-envelope estimate, one seat on a flight from Los Angeles to New York comes to on the order of 1,000,000 small chat queries’ worth of CO2e.
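The arithmetic behind that estimate, as a minimal sketch (both inputs are assumed round numbers I'm using for illustration, not figures taken from the linked cheat sheet):

    # Back-of-the-envelope only; both inputs are assumed round numbers.
    flight_co2e_kg_per_seat = 500   # assumed: one-way LAX -> JFK economy seat, ~0.5 t CO2e
    query_co2e_g = 0.5              # assumed: one small chat query, ~0.5 g CO2e

    queries_per_seat = flight_co2e_kg_per_seat * 1000 / query_co2e_g
    print(f"{queries_per_seat:,.0f} small queries per seat")  # -> 1,000,000

Nudge either assumption by a factor of a few and you still land in the same order of magnitude.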
Of course we should care about AI’s electricity consumption, especially when we run 100 agents in parallel simply because we can. But it’s important to keep it in perspective.
[0] https://andymasley.substack.com/p/a-cheat-sheet-for-conversa...
It’s not just being discussed, it’s dominating the conversation on sites like Hacker News. It’s actually hard to find the useful and informative LLM content because it doesn’t get as many upvotes as the constant flow of anti-LLM thought pieces like this.
There was even the strange period when the rationalist movement was in the spotlight and getting mainstream media coverage for their AI safety warnings. They overplayed their hand with the whole call to drop bombs on data centers and the AI 2027 project with Scott Alexander that predicted the arrival of AGI and disastrous consequences in 2027. There was so much extreme doom and cynicism for a while that a lot of people just got tired of it and tuned out.
Mention anything about the water and electricity wastage and embrace the downvotes.
The more interesting questions are about psychology, productivity, intelligence, AGI risk, etc. Resource constraints can be solved, but we’re wrestling with societal constraints. Industrialization created modernism; we could see a similar movement in reaction to AI.
Well, that's just it. Those existential risks aren't even proven yet.
Meanwhile, threats to resources are already being felt today. "Just overhaul our infrastructure" isn't an actionable solution that will magically fix things today or next week. Even if these don't end up being big problems in the grand scheme of things, that doesn't mean they aren't problems now.
It's almost like humans evolved from prehistoric times eating meat. Wild discovery, right?
> And we have cheap ways of producing electricity, we just need to overhaul our infrastructure and regulations.
Why don't AI bros lobby for those changes? Much harder than selling snake oil and getting rid of programmers, I guess.
Wait until other countries jump on the bandwagon and see energy prices jump.
Currently it is mostly the US and China.
And RAM and GPU prices are already through the roof with SSDs to follow.
That is for now, when it's mostly ONLY the US market.
And those “net benefits” you talk about have very questionable data behind them.
It’s a net loss if you ask me.
RAM and GPU prices going up, sure, ok, but again: if you're claiming there is no net benefit from AI, what is your evidence for that? These contracts are going through legally, so what basis do you have to prevent them from happening? Again I say it's site-specific: there are plenty of instances where people have successfully prevented data centers in their area, and lots of problems do come up (especially because companies are secretive about details, so people may not be able to make informed judgements).
What are the massive problems besides RAM + GPU prices? And again, what is the societal impact of those?
I spent some time thinking of a better word than "evil" before typing this comment. I can't think of one. Doing something bad that harms more than it helps for the purposes of enrichment and power is simply put: evil.
Exactly. Strongly agree with that. This closed-world assumption never holds. We would only do less work if nothing else changed. But of course everything changes when you lower the price of creating software. It gets a lot cheaper. So, now you get a lot of companies suddenly considering doing things that previously would have been too expensive. This still takes skills and expertise they don't have. So, they get people involved doing that work. Maybe they'll employ some of those people, but the trend is actually to only employ for work that is core to your company.
And that's just software creation. Everything else is going to change as well. A lot of software we use is optimized for humans. Including all of our development tools. Replacing all that with tools more suitable for automatic driving by AI is an enormous amount of work.
And we have decades worth of actively used software that requires human operators currently. If you rent a car, some car rental companies still interface with stuff written before I was born. And I'm > 0.5 century old. Same with banks, airline companies, insurers, etc. There's a reason this stuff was never upgraded: doing so is super expensive. Now that just got a bit cheaper to do. Maybe we'll get around to doing some of that. Along with all the stuff for which the ambition level just went up by 10x. And all the rest.
I don’t think typing code was ever the problem here. I’m doubtful this got any cheaper.
It’s always this tired argument. “But it’s so much better than six months ago, if you aren’t using it today you are just missing out.”
I’m tired of the hype, boss.
I'm sure people were saying similar things about, say, aviation all through the first decades of the 20th century, "wow, those planes are getting better every few years"... "Until recently planes were just gimmicks, but now they can fly across the English channel!"... "I wouldn't have got in one of those death traps 5 years ago, but now I might consider it!" And different people were saying things like that at different times, because they had different views of the technology, different definitions of usefulness, different appetites for risk. It's just a wide range of voices talking in similar-sounding terms about a rapidly-developing technology over a span of time.
This is just how people are going to talk about rapidly-improving technologies for which different people have different levels of adoption at different times. It's not a terribly interesting point. You have to engage with the specifics, I'm afraid.
For what it is worth, I have also gone from a "this looks interesting" to "this is a regular part of my daily workflow" in the same 6 month time period.
Even I can see there has been a clear advancement in performance in the past six months. There will probably be another incremental step 6 months from now.
I use LLMs in a project that helps give suggestions for a previously manual data entry job. Six months ago the LLM suggestions were hit or miss. Using a recent model, it’s over 90% accurate. Everything is still manually reviewed by humans, but having a recent model handle the grunt work has been game changing.
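For anyone curious what that kind of suggest-then-review loop looks like, here is a minimal sketch assuming the OpenAI Python client; the model name, prompt, and review step are my own illustrative assumptions, not details from the project described above:

    # Sketch of an LLM-suggests / human-reviews data-entry loop.
    # Model name and prompt are illustrative assumptions only.
    from openai import OpenAI

    client = OpenAI()

    def suggest_fields(raw_text: str) -> str:
        # Ask the model to propose structured values for one record.
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # assumed model name
            messages=[
                {"role": "system", "content": "Extract name, date, and amount as JSON."},
                {"role": "user", "content": raw_text},
            ],
        )
        return response.choices[0].message.content

    def review(suggestion: str) -> bool:
        # Every suggestion is still shown to a human before it is accepted.
        print(suggestion)
        return input("Accept? [y/n] ").strip().lower() == "y"

The point is the shape of the workflow: the model does the grunt work, the human stays the gate.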
If people are drinking a firehose of LinkedIn-style influencer hype posts, I could see why it’s tiresome. I ignore those and I think everyone else should too. There is real progress being made though.
The model providers should really start having LTS (at least 2 years) offerings that deliver consistent results regardless of load, IMO. Folks are tired of the treadmill and just want some stability here, and if the providers aren't going to offer it, llama.cpp will...
Yeah, exactly.
I don't deny that there's been huge improvements in LLMs over the last 6-12 months at all. I'm skeptical that the last 6 months have suddenly presented a 'category shift' in terms of the problems LLMs can solve (I'm happy to be proved wrong!).
It seems to me like LLMs are better at solving the same problems that they could solve 6 months ago, and the same could be said comparing 6 months to 12 months ago.
The argument I'd dismiss isn't the improvement, it's that there's a whole load of sudden economic factors, or use cases, that have been unlocked in the last 6 months because of the improvements in LLMs.
That's kind of a fuzzier point, and a hard one to know until we all have hindsight. But I think OP is right that people have been claiming "LLMs are fundamentally in a different category to where they were 6 months ago" for the last 2 years - and as yet, none of those big improvements have unlocked a whole new category of use cases for LLMs.
To be honest, it's a very tricky thing to weigh in on, because the claims being made around LLMs vary wildly, from "we're 2 months away from all disease being solved" to "LLMs are basically just a bit better than old-school Markov chains". I'd argue that clearly neither of those is true, but it's hard to get oriented when both of those positions are being claimed at the same time.
"Problem solving" (which definitely has improved, but maybe has a spikey domain improvement profile) might not be the best metric, because you could probably hand hold the models of 12 months ago to the same "solution" as current models, but you would spend a lot of time hand holding.
Yes, I agree in principle here, in some cases: I think there are certainly problems that LLMs are now better at but that don't reach the critical reliability threshold to say "it can do this". E.g. hallucinations, handling long context well (still best practice to reset context window frequently), long-running tasks etc.
> That's kind of a fuzzier point, and a hard one to know until we all have hindsight. But I think OP is right that people have been claiming "LLMs are fundamentally in a different category to where they were 6 months ago" for the last 2 years - and as yet, none of those big improvements have unlocked a whole new category of use cases for LLMs.
This is where I disagree (but again you are absolutely right for certain classes of capabilities and problems).
- Claude Code did not exist until 2025
- We have gone from e.g. people using coding agents for like ~10% of their workflow to like 90-100% pretty typically. Like code completion --> a reasonably good SWE (with caveats and pain points I know all too well). This is a big step change in what you can actually do; it's not like we're still doing only code completion and it's marginally better.
- Long-horizon task success rate has now gotten good enough that it basically also enables the above (good SWE work) for things like refactors, complicated debugging with competing hypotheses, etc., looping attempts until success
- We have nascent UI agents now, they are fragile but will see a similar path as coding which opens up yet another universe of things you can only do with a UI
- Enterprise voice agents (for like frontline support) now have a low enough bounce rate that you can actually deploy them
So we've gone from "this looks promising" to production deployment and very serious usage. This may kind of be, like you say, "same capabilities but just getting gradually better", but at some point that becomes a step change. Above a certain failure rate (which may be hard to pin down explicitly) it's not tolerable to deploy, but as evidenced by adoption alone we've crossed that threshold, especially for coding agents. Even Sonnet 4 -> Opus 4.5 has for me personally (beyond just benchmark numbers) made full project loops possible in a way that Sonnet 4 would have convinced you it could handle and then wasted like two whole days of your time banging your head against the wall. The same is true for Opus 4.5, but only for much larger tasks.
> To be honest, it's a very tricky thing to weigh in on, because the claims being made around LLMs vary wildly, from "we're 2 months away from all disease being solved" to "LLMs are basically just a bit better than old-school Markov chains". I'd argue that clearly neither of those is true, but it's hard to get oriented when both of those positions are being claimed at the same time.
Precisely. Lots and lots of hyperbole, some with varying degrees of underlying truth. But I would say: the true underlying reality here is somewhat easy to follow along with hard numbers if you look hard enough. Epoch.ai is one of my favorite sources for industry analysis, and e.g. Dwarkesh Patel is a true gift to the industry. Benchmarks are really quite terrible and shaky, so I don't necessarily fault people "checking the vibes", e.g. like Simon Willison's pelican task is exactly the sort of thing that's both fun and also important!
I don't doubt that the leading labs are lighting money on fire. Undoubtedly, it costs crazy amounts of cash to train these models. But hardware development takes time and it's only been a few years at this point. Even TODAY, one can run Kimi K2.5, a 1T-param open-source model, on two Mac Studios. It runs at 24 tokens/sec. Yes, it'll cost you $20k for the specs needed, but that's hobbyist and small business territory... we're not talking mainframe computer costs here. And certainly this price will come down? And it's hard to imagine that the hardware won't get faster/better?
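Rough numbers on why two machines is enough, as a sketch (the quantization level and 512 GB of unified memory per machine are my assumptions, not specs from the linked post):

    # Why ~1T parameters can fit on two high-memory desktops (assumed figures).
    params = 1e12            # 1T parameters
    bytes_per_param = 0.5    # assumed ~4-bit quantization
    model_gb = params * bytes_per_param / 1e9
    print(f"~{model_gb:.0f} GB of weights")                        # ~500 GB
    print(f"fits in 2 x 512 GB unified memory: {model_gb < 1024}") # True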
Yes... training the models can really only be done with NVIDIA and costs insane amounts of money. But it seems like even if we see just moderate improvement going forward, this is still a monumental shift for coding if you compare where we are at to 2022 (or even 2024).
[1] https://x.com/alexocheema/status/2016487974876164562?s=20
Memory costs are skyrocketing right now as everyone pivots to using HBM paired with moderate processing power. This is the perfect combination for inference. The current memory situation is obviously temporary. Factories will be built and scaled, and memory is not particularly power hungry; there's a reason you don't really need much cooling for it. As training becomes less of a focus and inference more of a focus, we will at some point be moving from the highest-end Nvidia cards to boxes of essentially power-efficient HBM memory attached to smaller, more efficient compute.
I see a lot of commentary like "AI companies are so stupid buying up all the memory" around the place atm. That memory is what's needed to run the inference cheaply. It's currently done on Nvidia cards and Apple M-series chips because those two are the first to utilise High Bandwidth Memory, but the raw compute of the Nvidia cards is really only useful for training; they are just using them for inference right now because there's not much on the market that has similar memory bandwidth. But this will be changing very soon. Everyone in the industry is coming along with their own dedicated compute using HBM memory.
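To make the bandwidth point concrete, a back-of-the-envelope sketch (every figure here is an assumption: a MoE model with ~32B active parameters per token at ~4-bit, and ~800 GB/s of usable memory bandwidth):

    # Decode speed is roughly bandwidth-bound: each token streams the active
    # weights out of memory. All numbers below are assumptions.
    active_params = 32e9     # assumed active params per token (MoE)
    bytes_per_param = 0.5    # assumed ~4-bit quantization
    bandwidth_gb_s = 800     # assumed usable memory bandwidth

    gb_per_token = active_params * bytes_per_param / 1e9   # ~16 GB per token
    print(f"decode ceiling ~{bandwidth_gb_s / gb_per_token:.0f} tokens/sec")  # ~50

Real deployments land below that ceiling once interconnect and overhead are counted, but it shows why bandwidth, not raw compute, sets the inference speed.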
And those are the EASY things for AI to "put out of work".
HARD systems like government legacy systems are not something you can slap 200 unit tests on and say "agent, re-write this in Rust". They're hard because of the complex interconnects, myriad patches, workarounds, and institutional data codified both in the codebase and outside of it. Bugs that stay bugs because the next 20 versions used that bug in some weird way. I started my career in that realm. I call bullshit on AI taking any jobs here if it can't even accelerate the pace of _quality_ OS and video game releases.
I'm not saying this won't happen eventually but that eventually is doing a heavy lift. I am skeptical of the 6-12 month timelines for broad job displacement that I see mentioned.
AIs (LLMs) are useful in this subtle way like "Google Search" and _not_ like "the internet". It's a very specific set of text heavy information domains that are potentially augmented. That's great but it's not the end of all work or even the end of all lucrative technical work.
It's a stellar tool for smart engineers to do _even_ more, and yet, the smart ones know you have to babysit and double-check -- so it's not remotely a replacement.
This is without even opening the can of worms that is AI Agency.
This has been the big one for me. Of all the best devs I know, only one is really big on AI, but he’s also the one that spent years in a relational database without knowing what isolation levels are. He was absolutely a “greenfield” kinda dev - which he was great at, tons of ideas but not big on the small stuff.
The rest use them judiciously and see a lot of value, but reading and understanding code is still the bottleneck. And reviewing massive PRs is not efficient.
"Software Engineers" isn't a single group, here. Some software engineers (making millions of dollars at top AI companies) are automating other software engineers' workflows and making those software engineers replaceable. The people who are going to be mostly negatively affected by these changes are not the people setting out to build them.
Do you really think we should be like ancient egyptian scribes and gatekeep the knowledge and skills purely for our sake?
I don't think there has been another field that is this open with trade secrets. I know a dentist who openly claims they should be paid for training dental interns (idk what the US terminology is) despite the interns being productive, simply because they will earn a lot of money in the future. I did not want and do not want software engineering to become that.
My strongly held belief is that anyone who think that way, also think that software engineering is reading tickets, searching for code snippets on stack overflow and copy-pasting code.
Good specifications are always written after a lot of prototypes, experiments and sample implementations (which may be production level). Natural language specifications exist after the concept has been formalized. Before that process, you only have dreams and hopes.
In the few apps I've built, progress is initially amazing. And then you get to a certain point and things slow down. I've built a list of things that are "not quite right" and then, as I work through each one all the strange architectural decisions the AI initially made start to become obvious.
Much like any software development, you have to stop adding features and start refactoring. That's the point at which not being a good software developer will really start to bite you, because it's only experience that will point you in the right direction.
It's completely incredible what the models can do. Both in speed of building (especially credible front ends), and as sounding boards for architecture. It's definitely a productivity boost. But I think we're still a long way off non-technical people being able to develop applications.
A while ago I worked on a non-trivial no-code application. I realised then that even though there's "no code", you still needed to apply careful thought about data structures and UI and all the other things that make an application great. Otherwise it turned into a huge mess. This feels similar.
I'm surprised I haven't seen anyone do a case study, having truly non-technical people build apps with these tools. Take a few moderately tech-savvy (can use MS office up to doing basic stuff in excel, understands a filesystem) people who work white collar jobs. Give them a one or two-day crash course on how Claude Code works. See what is the most complicated app which they can develop that is reasonably stable and secure.
LLMs will trivialize some subfields, be nearly useless in others, but will probably help to some degree in most of them. The range of opinions online about how useful LLMs are probably correlates with the subfields people work in.
I’ve done CRUD and the biggest proportion of the time was custom business rules and UI tweaking (updating the design system). And they were commits with small diffs. The huge commits were done by copy pasting, code generators and heavy use of the IDE refactoring tools.
I often find when I come up with the solution, these little autocompletes pretend they knew that all along. Or I make an observation they say something like "yes that's the core insight into this".
They're great at boilerplate. They can immediately spot a bug in a 1000 lines of code. I just wish they'd stop being so pretentious. It's us that are driving these things, it's our intelligence, intuition and experience that's creating solutions.
As long as code continues to need to be reviewed to (mostly) maintain a chain of liability, I don’t see SWE going anywhere like both hypebros and doomers seem to be intent on posting about.
Code review will simply become the bottleneck for SWE. In other words, reading and understanding code so that when SHTF the shareholders know who to hold accountable.
1. You know exactly what the code should look like ahead of time. Agents are handy here, but this is rare and takes no time at all as is
2. You don’t know exactly what it should be. Building the mental model by hand is faster than working backwards from finished work. Agents help with speeding up your knowledge here, but having them build the model is not good
Code review is and always was the bottleneck. You are doing code review when you write code.
Shareholders don’t want a “release” button that when they press it there’s a XX% chance that the entire software stack collapses because an increasingly complex piece of software is understood by no one but a black box called “AI” that can’t be fired or otherwise held accountable.
Like I'm sorry but if you couldn't see that this tech would be enormously useful for millions if not billions of people you really shouldn't be putting yourself out there opining on anything at all. Same vibes as the guys saying horseless carriages were useless and couldn't possibly do anything better than a horse which after all has its own mind. Just incredibly short sighted and lacking curiosity or creativity.
How to avoid being a Duryea, a Knox, a Marsh, a Maxwell-Briscoe, or a Pope-Toledo seems to be the real question.
Same thing pretty much happened in the early days of radio, with the addition of vicious patent wars. Which I'm sure we'll eventually see in the AI field, once the infinite money hose starts to dry up.
IDK this sounds a whole lot like paying for snippets
A snippet for a for loop in your editor, yeah, go ahead. A “snippet” for “make this spec into half-assed code” is a wholly different story.
Overpaid bubble snippets at best.