E.g. I've been a software architect and developer for many years, so I already know how to build software, but I'm not familiar with every language or framework. AI enabled me to write other kinds of software I never learned or had time for. E.g. I recently re-implemented an Android widget that had not been updated for a decade by its original author. Or I fixed a bug in a Linux scanner driver. None of these could I have done properly (within an acceptable time frame) without AI. But also none of these could I have done properly without my knowledge and experience, even with AI.
Same for daily tasks at work. AI makes me faster here, but it also lets me do more. Implement tests for all edge cases? Sure, always: I've already saved the time elsewhere. More code reviews. More documentation. Better quality in the same (always limited) time.
I think LLM producers can improve their models by quite a margin if customers train the LLM for free, meaning: if people correct the LLM, the companies can use the session context + feedback as training data. This enables more convincing responses for finer nuances of context, but it still does not work on logical principles.
LLM interaction with customers might become the real learning phase. This doesn't bode well for players late in the game.
Hence the feedback these models get could theoretically funnel them toward unnecessarily complicated solutions.
No clue whether any research has been done into this; just a thought OTTOMH.
I see it as empowering: it lets me build custom tooling that need not be a high-quality, maintained project.
So I'm not sure a study from 2024, or the impact on code produced during 2024-2025, can be used to judge current AI coding capabilities.
I've found that giving the LLMs the input and output interfaces really helps keep them on rails, while still keeping me involved in the overall process rather than just blindly "vibe coding."
Having the AI also help with unit tests around business logic has been super helpful, in addition to manual testing like normal. It feels like our overall velocity and code quality have been going up regardless of what some of these articles are saying.
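To make that concrete, here's a minimal sketch of the pattern (the RawInvoice/InvoiceSummary types, summarizeInvoices, and the Jest tests are all hypothetical, not from any real codebase): the hand-written interfaces and signature are the rails the model gets, and hand-chosen test cases around the business logic are what keeps the generated body honest.

```typescript
// Hypothetical example: the interfaces and signature are written by hand and
// handed to the model as guardrails; only the function body is delegated.
import { describe, expect, test } from "@jest/globals";

interface RawInvoice {
  id: string;
  issuedAt: string; // ISO 8601 date string
  lineItems: { description: string; amountCents: number }[];
}

interface InvoiceSummary {
  id: string;
  totalCents: number;
  issuedAt: Date;
}

export function summarizeInvoices(invoices: RawInvoice[]): InvoiceSummary[] {
  return invoices.map((inv) => ({
    id: inv.id,
    totalCents: inv.lineItems.reduce((sum, item) => sum + item.amountCents, 0),
    issuedAt: new Date(inv.issuedAt),
  }));
}

// Hand-picked unit tests around the business logic, in addition to manual testing.
describe("summarizeInvoices", () => {
  test("sums line items and parses the issue date", () => {
    const [summary] = summarizeInvoices([
      {
        id: "inv-1",
        issuedAt: "2024-03-01",
        lineItems: [
          { description: "widget", amountCents: 1250 },
          { description: "shipping", amountCents: 499 },
        ],
      },
    ]);
    expect(summary.totalCents).toBe(1749);
    expect(summary.issuedAt.getUTCFullYear()).toBe(2024);
  });

  test("handles an invoice with no line items", () => {
    const [summary] = summarizeInvoices([
      { id: "inv-2", issuedAt: "2024-03-02", lineItems: [] },
    ]);
    expect(summary.totalCents).toBe(0);
  });
});
```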
Even that could use some nuance. I'm generating presentations in interactive JS. If they work, they work - that's the result, and I extremely don't care about the details for this use case. Nobody needs to maintain them, nobody cares about the source. There's no need for "properly" in this case.
The datasets are big, and having the scripts written in a performant language to process them saves non-trivial amounts of time, like waiting just 10 minutes versus an hour.
Initial code style in the scripts was rather ugly, with a lot of repeated code. But with enough prompting to reuse code, the generated output became sufficiently readable and reasonable that I could quickly check it was indeed doing what was required, and it could be altered manually.
But prompting it to make non-trivial changes to an existing code base was a time sink. It took too much time to explain and correct the output. And critically, the prompts cannot be reused.
Where AI fails us is when we build new software to improve the business related to solar energy production and sale. It fails us because the tasks are never really well defined. Or even if they are, sometimes developers or engineers come up with a better way to do the business process than what was planned for. AI can write the code, but it doesn't refuse to write it until someone explains why it wouldn't be a better idea to do X first. If we only did code reviews, we would miss that step.
In a perfect organisation your BPM people would do this. In the world I live in there are virtually no BPM people, and those who know the processes are too busy to really deal with improving them. Hell... sometimes their processes are changed and they don't realize it until their results are measurably better than they used to be. So I think it depends a lot on the situation. If you've got people breaking up processes, improving them, and then describing each little bit in decent detail, then I think AI will work fine; otherwise it's probably not the best place to go full vibe.
LLMs combine two dangerous traits simultaneously: they are non-critical about suboptimal approaches and they assist unquestioningly. In practice that means doing dumb things a lazy human would refuse because they know better, and then following those rabbit holes until they run out of imaginary dirt.
My estimation is that that combination undermines their productivity potential without very structured application. Considering the excess and escalating costs of dealing with issues as they arise further from the developer's workstation (by factors of approximately 20x, 50x, and 200x+ as you get out through QA and into customer environments, IIRC), you don't need many screw-ups to make the effort net negative.
Then don't ask it to write code? If you ask any recent high quality model to discuss options, tradeoffs, design constraints, refine specs it will do it for you until you're sick and tired of it finding real edge cases and alternatives. Ask for just code and you'll get just code.
This deserves a blog post all on its own. OP you should write one and submit it. It's a good counterweight to all the AI optimistic/pessimistic extremism.
This is the kind of argument that seems true on the surface, but isn't really. An LLM will do what you ask it to do! If you tell it to ask questions and poke holes in your requirements and not jump to code, it will do exactly that, and usually better than a human.
If you then ask it to refactor some code, identify redundancies, or put this or that functionality into a reusable library, it will also do that.
Those critiques of coding assistants are really critiques of "pure vibe coders" who don't know anything and just try to output yet another useless PDF parsing library before they move on to other things.
Even seasoned coders using plan mode are funneled towards "get the code out" when experience shows that the final code is a tiny part of the overall picture.
The entire experience should be reorganized so that the code is almost an afterthought, and the requirements, specs, edge cases, tests, etc. are the primary part.
Most coding assistant tools are flexible enough to apply these kinds of workflows, and these sorts of workflows are even brought up in Anthropic's own examples of how to use Claude Code. Any experienced dev knows that the act of specifically writing code is a small part of creating a working program.
Either you (a) don't review the code, (b) invest more resources in review, or (c) hope that AI assistance in the review process increases efficiency there enough to keep up with code production.
But if none of those work, all AI assistance does is bottleneck the process at review.
Are they something worth using up vast amounts of power and restructuring all of civilisation around? No
Are they worth giving more power to megacorps over? No
It's like tech doesn't understand consent, and then it's partly the classic case of "disrupting X": thinking that because you know how to solve something in maths, CS, or physics, you can suddenly solve things in a completely different field.
LLMs are over-indexed.
LLMs have been hollowing out the mid and lower end of engineering, but have not eroded the highest end. Otherwise all the LLM companies wouldn't pay for talent; they'd just use their own LLM.
I'm going to give an example involving software with multiple processes.
Humans can imagine scenarios where a process can break. Claude can also do it, but only when the breakage originates inside the process, and only if you specify it. It cannot identify future issues coming from a separate process unless you specifically describe that external process, the fact that it could interact with our original process, and the ways in which it can interact.
Identifying these is the skill of a developer. You could say you can document all these cases and let the agent do the coding. But here's the kicker: you only get to know these issues once you've started coding them by hand. You go through the variables and function calls and suddenly remember that a process elsewhere changes or depends on these values.
Unit tests could catch them in a decently architected system, but those tests need to be defined by the one coding it. And if the architect himself is using AI, because why not, it's doomed from the start.
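For what it's worth, here's a minimal sketch of the kind of cross-process coupling I mean (the nightly job, the API handler, and the shared status file are all made up for illustration): an agent asked to touch only one of these functions has no reason to suspect the other exists unless you describe it, whereas a developer stepping through the reads and writes by hand is far more likely to remember the other process.

```typescript
// Hypothetical illustration: two separate processes that silently depend on the
// same status file. Neither function is wrong in isolation; the bug only exists
// in their interaction.
import { readFileSync, writeFileSync } from "node:fs";

const STATUS_FILE = "/tmp/order-status.json";

// Process A: nightly reconciliation job that rewrites the whole file from the
// upstream system of record.
export function nightlyReconcile(upstreamStatuses: Record<string, string>): void {
  // Overwrites (and so silently drops) any status Process B wrote since the
  // upstream snapshot was taken.
  writeFileSync(STATUS_FILE, JSON.stringify(upstreamStatuses));
}

// Process B: API handler that marks a single order as shipped.
export function markShipped(orderId: string): void {
  const statuses: Record<string, string> = JSON.parse(
    readFileSync(STATUS_FILE, "utf8"),
  );
  statuses[orderId] = "shipped";
  // Read-modify-write with no locking: racing with Process A loses updates.
  writeFileSync(STATUS_FILE, JSON.stringify(statuses));
}
```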
I think that it's mistaken to think that reasoning while writing the code is at all a good way to truly understand what your code is doing. (Without implying that you shouldn't write it by hand or reason about it.) You need to debug and test it thoroughly either way, and basically be as sceptical of your own output as you'd be of any other person's output.
Thinking that writing the code makes you understand it better can cause more issues than thinking that even if you write the code, you don't really know what it's doing. You are merely typing out the code based on what you think it should be doing, and reasoning against that hypothesis. Of course, you can be better or worse at constructing the correct mental model from the get go, and keep updating it in the right direction while writing the code. But it's a slippery slope, because it can also go the other way around.
A lot of bugs that take unreasonably long for junior-to-mid-level engineers to find seem to happen because they trust their own mental model of the code too much without verifying it, form a hypothesis for the bug in their head without verifying it thoroughly, then get lost trying to reason about a made-up version of whatever is causing the bug, only to conclude that their original hypothesis was completely wrong.
Elegant code isn't just for looks. It's code that can still adapt weeks, months, or years after it has shipped and created "business value".
This trade-off predates LLMs by decades. I've been fortunate to have a good and fruitful career being the person companies hire when they're running out of road down which to kick the can, so my opinion there may not be universal, mind you.
That's a rather short-sighted opinion. Ask yourself how "inelegant code" finds its way into a codebase, even with working code review processes.
The answer more often than not is what's typically referred to as tech debt driven development. Meaning, sometimes a hacky solution with glaring failure modes left unaddressed is all it takes to deliver a major feature in a short development cycle. Once the feature is out, it becomes less pressing to pay off that tech debt because the risk was already assumed and the business value was already created.
Later you stumble upon a weird bug in your hacky solution. Is that bug negative business value?
Look at e.g. Facebook. That site has not shipped a feature in years, and every time they do ship something it takes years to make it stable again. A year or so ago Facebook recognized that decades of fighting abuse led them nowhere, and instead of fixing the technical side they just modified policies to openly allow fake accounts :D Facebook is 99% moltbook bot-to-bot traffic at this point and they cannot do anything about it. Ironically, this is a good argument against code quality: if you manage to become large enough to be a monopoly, you can afford to fix tech debt later. In reality, there is one such unicorn for every ten thousand startups that crumbled under their own technical debt.
Because the entire codebase is crap, each user encounters a different bug. So now all your customers are mad, but they're all mad for different reasons, and support is powerless to do anything about it. The problems pile up, but they can't be solved without a competent rewrite. This is a bad place to be.
And at some level of sloppiness you can get load bearing bugs, where there’s an unknown amount of behavior that’s dependent on core logic being dead wrong. Yes, I’ve encountered that one…
In my experience the limiting factor is making the right choices. I've got a customer with the usual backlog of features. There are some very important issues in the backlog that stay in the backlog and are never picked for a sprint. We're doing small bug fixes, but not the big ones. We're doing new features that are in part useless because of the outstanding bugs that prevent customers from fully using them. AI can make us code faster, but nobody is using it to sort issues by importance.
True, and I'd add the reminder that AI doesn't care. When it makes mistakes it pretends to be sorry.
Simulated emotion is dangerous IMHO, it can lead to undeserved trust. I always tell AI to never say my name, and never use exclamation points or simulated emotion. "Be the cold imperfect calculator that you are."
When it was giving me compliments for noticing things it had failed to, I had to put a stop to that. Very dangerous. When business decisions or important technical decisions are made by an entity that is literally incapable of caring, but instead pretends to care like a sociopath, that's when trouble brews.
> LLMs have been hollowing out the mid and lower end of engineering, but have not eroded the highest end. Otherwise all the LLM companies wouldn't pay for talent; they'd just use their own LLM.
The talent isn't used for writing code anymore though. They're used for directing, which an LLM isn't very good at since it has limited real-world experience, interaction with other humans, and goals. OpenAI has said they're slowing down hiring drastically because their models are making them that much more productive. Codex itself is being built by Codex. Same with Claude Code.
Remember a few years ago when Sam Altman said we had to pause AI development for 6 months because otherwise we would have the singularity and it would end the world? Yeah, about that...
tl;dr content marketing
There is this super interesting post in new about agent swarms and how the field is evolving towards formal verification like airlines, or at least how there are ideas we can draw on. Anyway, IMO it should be on the front page over this piece:
"Why AI Swarms Cannot Build Architecture"
An analysis of the structural limitations preventing AI agent swarms from producing coherent software architecture
That's fine. I found the leading stats interesting. If coding assistants slowed down experienced developers while creating a false sense of development speed, then that should be thought-provoking. Also, nearly half of the code churned out by coding assistants has security issues. That's tough.
Perhaps it's just me, but that's in line with my personal experience, and I rarely see those points being raised.
> There is this super interesting post in new about agent swarms and how (...)
That's fine. Feel free to submit the link. I find it far more interesting to discuss the post-rose-tinted-glasses view of coding agents. I don't think it makes any sense at all to laud promises of formal verification when the same technology right now is unable to avoid introducing security vulnerabilities.
They are from before the current generation of models and agent tools; they are almost certainly out of date, things are different now, and they will continue to evolve.
We're still learning to crawl, haven't gotten to walking yet
I did, or someone else did, it's the link in the post you replied to
Context gathering and refinement is the biggest issue I have with product development at the moment.