I'm very much pro AI for coding. There are clearly significant capabilities there, but I'm still getting my head around how best to utilise it.
Posts like these make it sound like ruthlessly optimizing your workflow, letting no possible efficiency go, every single day, is the only way to work now. This has always been possible, and it has generally not been a good idea to focus on it exclusively. There have always been processes to optimise and automate, and always a balance as to which to pursue.
Personally I am incorporating AI into my daily work but not getting too bogged down by it. I read about some of the latest ideas and techniques and choose carefully which I employ. Sometimes I'll try an AI workflow and then abandon it. I recently connected Claude up to draw.io with an MCP. It had some good capabilities, but for the specific task I wanted it wasn't really getting it, so doing it manually was the better choice to achieve what I wanted in good time.
The models themselves and the coding harnesses are also evolving quickly, so complex workflows people put together can quickly become pointless.
More haste, less speed as they say!
I’ve had a lot of success dogfooding my own product, the Mermaid Studio plugin for JetBrains IDEs (https://mermaidstudio.dev).
It combines the deep semantic code intelligence of an IDE with a suite of integrated MCP tools that your preferred agent can plug into for static analysis, up-to-date syntax, etc.
I basically tell Claude Code to run the generated diagram through the analysis tool, fix the issues it detects, and repeat until fixed. Then generate a PNG or SVG for a visual inspection before finalizing the diagram.
Now all of my planning and architecture docs are filled with illustrative flowcharts, sequence diagrams, and occasionally block diagrams for workshopping proposed UI layouts.
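For anyone who wants to see the shape of that analyze, fix, repeat loop, here is a minimal sketch. It is not the plugin's API: `refine_until_clean`, `toy_analyze` and `toy_fix` are hypothetical names, and the two toy checks only stand in for whatever a real analysis tool and agent would do. The one design choice worth copying is the round budget, so the loop cannot spin forever.

```python
from typing import Callable, List, Tuple


def refine_until_clean(
    diagram: str,
    analyze: Callable[[str], List[str]],
    fix: Callable[[str, List[str]], str],
    max_rounds: int = 5,
) -> Tuple[str, List[str]]:
    """Alternate analyze/fix until no issues remain or the round budget is spent."""
    issues = analyze(diagram)
    for _ in range(max_rounds):
        if not issues:
            break
        diagram = fix(diagram, issues)
        issues = analyze(diagram)
    # Any issues still listed here are what a human should look at.
    return diagram, issues


# Hypothetical stand-ins: a real setup would call an analysis tool and an agent.
def toy_analyze(src: str) -> List[str]:
    issues = []
    if not src.lstrip().startswith("flowchart"):
        issues.append("missing diagram header")
    if "->" in src.replace("-->", ""):
        issues.append("single-dash arrow")
    return issues


def toy_fix(src: str, issues: List[str]) -> str:
    if "missing diagram header" in issues:
        src = "flowchart TD\n" + src
    if "single-dash arrow" in issues:
        # Protect existing "-->" arrows, upgrade "->", then restore.
        src = src.replace("-->", "\0").replace("->", "-->").replace("\0", "-->")
    return src


if __name__ == "__main__":
    fixed, remaining = refine_until_clean("A -> B", toy_analyze, toy_fix)
    print(fixed)
    print("unresolved:", remaining)
```

If `remaining` is non-empty after the budget runs out, that is the point to stop and inspect the rendered image yourself rather than let the agent keep going.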
From my perspective, all this energy spent on AI prompting is actually just planning meetings and whiteboarding in disguise, but since all that has the bad reputation of luring devs into power struggles and yak shaving, this is the new way.
It's likely where most of their improved productivity is coming from. The people doing the meta-work just need to be mature about it to avoid procrastinating.
If I read stuff like that, I wonder what the F they are doing. Agents work overnight? On what? Stuck in some loop, trying to figure out how to solve a bug by trial and error because the agent isn't capable of finding the right solution? Nothing good will come out of that. When the agent clearly isn't capable of solving an issue in a reasonable amount of time, it needs help. Quite often, a hint is enough. That, of course, requires the developer to still understand what the agent is doing. Otherwise, most likely, it will sooner or later do something stupid to "solve" the issue. And later, you need to clean up that mess.
If your prompt is good and the agent is capable of implementing it correctly, it will be done in 10 minutes or less. If not, you still need to step in.
I wonder how our comments will age in a few years.
Edit: to add
> Review the output, not the code. Don't read every line an agent writes
This can't be a serious project. It must be a greenfield startup that's just starting.
I don't think there will be a future where agents need to work on a limited piece of code for hours. Either they are smart enough to do it in a limited amount of time, or someone smarter needs to get involved.
> This can't be a serious project. It must be a greenfield startup that's just starting.
I rarely review UI code. Doesn't mean that I don't need to step in from time to time, but generally, I don't care enough about the UI code to review it line-by-line.
Badly. While I wouldn't assign a task to an LLM that requires such a long running time right now (for many reasons: control, cost etc) I am fully aware that it might eventually be something I do. Especially considering how fast I went from tab completion to whole functions to having LLMs write most of the code.
My competition right now is probably the grifters and hustlers already doing this, and not the software engineers that "know better". Laughing at the inevitable security disasters and other vibe coded fiascos while back-patting each other is funny but missing the forest for the trees.
To be clear, this is not a hypothetical situation. I wrote long specs like that and had large chunks of services successfully implemented up to around 2h real-time. And that was limited by the complexity of what I needed, not by what the agent could handle.
I can see overnight for a prototype of a completely new project with a detailed SPEC.md and a project requirements file that it eats up as it goes.
This approach breaks the moment you need to provide any form of feedback, of course.
Humans are not the only thing initiating prompts either. Exceptions and crashes coming in from production trigger agentic workflows to work on fixes. These can happen autonomously overnight, 24/7.
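A rough sketch of what "exceptions trigger agentic workflows" could look like on the intake side, assuming nothing about any particular error tracker or agent. `FixQueue` and its fields are hypothetical names. The only idea shown is deduplication: one crash site produces one task, however many times it fires during the night.

```python
import traceback
from collections import deque
from typing import Deque, Dict, Set


class FixQueue:
    """Collects one agent task per distinct crash site (hypothetical design)."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()
        self.tasks: Deque[Dict[str, str]] = deque()

    def report(self, exc: BaseException) -> bool:
        """Queue a task for exc; return False if this crash site is already queued."""
        frames = traceback.extract_tb(exc.__traceback__)
        if frames:
            last = frames[-1]  # innermost frame, where the exception was raised
            location = f"{last.filename}:{last.name}:{last.lineno}"
        else:
            location = "unknown"
        fingerprint = f"{type(exc).__name__}@{location}"
        if fingerprint in self._seen:
            return False
        self._seen.add(fingerprint)
        trace = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
        self.tasks.append({
            "fingerprint": fingerprint,
            "prompt": "Investigate this production exception and propose a fix:\n" + trace,
        })
        return True


def parse_port(value: str) -> int:
    return int(value)  # raises ValueError on bad input


if __name__ == "__main__":
    queue = FixQueue()
    for raw in ["80a", "x443"]:
        try:
            parse_port(raw)
        except ValueError as err:
            print(raw, "queued" if queue.report(err) else "duplicate")
    print(len(queue.tasks), "task(s) waiting for an agent")
```

What consumes `tasks` is deliberately left out; whether the result is an autonomous fix or just a draft for a human to review is exactly the point being argued in this thread.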
Admittedly, I have never tried to run it that long. If 10 minutes are not enough, I check what it is doing and tell it what to do differently, or what to look at, or offer to run it with debug logs. Recently, I also had a case where Opus was working on an issue forever, fixing one issue and thereby introducing another, then fixing that, only for the original issue to reappear. Then I tried out Codex, and it fixed it at first sight. So changing models can certainly help.
But do you really get a good solution after running it for hours? To me, that sounds like it doesn't understand the issue completely.
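The "check in after 10 minutes" habit can be made mechanical. A minimal sketch, assuming the agent can be started as an ordinary command; `run_with_timebox` is a hypothetical helper and the command below is a stand-in, not any real agent CLI.

```python
import subprocess
import sys
from typing import List, Tuple


def run_with_timebox(cmd: List[str], seconds: float) -> Tuple[str, str]:
    """Run cmd with a time limit.

    Returns ("done", stdout), ("failed", stderr), or
    ("needs_human", partial stdout) when the limit is hit.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=seconds)
    except subprocess.TimeoutExpired as exc:
        partial = exc.stdout or ""
        if isinstance(partial, bytes):  # partial output can arrive undecoded
            partial = partial.decode(errors="replace")
        return "needs_human", partial
    if proc.returncode != 0:
        return "failed", proc.stderr
    return "done", proc.stdout


if __name__ == "__main__":
    # Stand-in command; replace with however you launch your agent.
    status, output = run_with_timebox([sys.executable, "-c", "print('patched')"], 600)
    print(status, output.strip())
```

On `needs_human` the useful next step is the one described above: read what it was doing, give a hint, or switch models, rather than raising the limit.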
> or offer to run it with debug logs.
Enabling it to add its own debug logs and use a debugger can allow it to do these loops itself and understand where it's going wrong with its current approach.
For this, which summarises vibe coding and hence the rest of the article, the models aren't good enough yet for novel applications.
With current models and assuming your engineers are of a reasonable level of experience, for now it seems to result in either greatly reduced velocity and higher costs, or worse outcomes.
One course correction in terms of planned process, because the model missed an obvious implication or statement, can save days of churning.
The math only really has a chance to work if you reduce your spend on in-house talent to compensate, and your product sits on a well-trodden path.
In terms of capability we're still at "could you easily outsource this particular project, low touch, to your typical software farm?"
They can one-shot basic changes and refactors, or even many full prototypes, but for pretty much everything else they're going to start making mistakes at some point. Usually very quickly. It's just where the technology is right now.
The thing that frustrates me is that this is really easy to demonstrate. Articles like this are essentially hallucinations that many people, mystifyingly, take seriously.
I assume the reason they get any traction is that a lot of people don't have enough experience with LLM agents yet to be confident that their personal experience generalizes. So they think maybe there are magical context tricks to get the current generation of agents to not make the kinds of mistakes they're seeing.
There aren't. It doesn't matter if it's Opus 4.6 in Claude Code or Codex 5.3 xhigh, they still hallucinate, fail to comprehend context and otherwise drift.
Anyone who can read code can fire up an instance and see this for themselves. Or you can prove it for free by looking at the code of any app that the author says was vibecoded without human review. You won't have to look very hard.
Agents can accomplish impressive things but also, often enough, they make incomprehensibly bad decisions or make things up. It's baked into the technology. We might figure out how to solve that problem eventually, but we haven't yet.
You can iterate, add more context to AGENTS.md or CLAUDE.md, add skills, set up hooks, and no matter how many times you do it the agents will still make mistakes. You can make specialized code review agents and run them in parallel, you can have competing models do audits, you can do dozens of passes and spend all the tokens you want. If it's a non-trivial amount of code, doing non-trivial things, and there's no human in the loop, there will still be critical mistakes.
No one has demonstrated different behavior. Articles and posts claiming otherwise never attempt to prove that what they claim is actually possible. Because it isn't.
Just to be clear, I think coding agents are incredibly useful tools and I use them extensively. But you can't currently use them to write production code without a human in the loop. If you're not reading and understanding the code, you're going to be shipping vulnerabilities and tech debt.
Articles like this are just hype. But as long as they keep making front pages they'll keep distorting the conversation. And it's an otherwise interesting conversation! We're living through an unprecedented paradigm shift, the field of possibilities is vast and there's a lot to figure out. The idea of autonomous coding agents is just a distraction from that, at least for now.
Literally every single point in the article was good engineering practice way before AI. So it's either amnesia or simple ignorance.
In particular, "No coding before 10am" is worded a bit awkwardly, as it simply means "think before you write code", which... does that really need an article to say it?
Not for nothing but The Art of War includes really insightful quotes like "If you do not feed your soldiers, they will die."
This seems entirely backwards. Why spend money to optimize something that _isn't_ the bottleneck?
For my whole life in technology, there was this thing called the Mythical Man Month: nine women cannot have a baby in a month. If you're Google, you can't just put a thousand software engineers on a product and wipe out a startup because you can only... build that product with seven or eight people. Once they've figured it out, they've got that lead.
That's not true with AI. If you have data and you have enough GPUs, you can solve almost any problem. It is magic. You can throw money at the problem. We've never had that in tech.
So that only goes some distance and then you face new limitations.
If we take that to its logical conclusion, I think we can answer that question.
Getting rid of humans, unfortunately, also takes away their earnings and therefore their ability to purchase whatever product you are developing. The ultra rich can only purchase your product so often - hence better make it a subscription model.
So there is pressure on purchasing power versus earnings. Interesting to see what happens and why.
Genuinely seeking answers on the following - if you’re working that way, what are you “understanding” about what’s being produced? Are you monitoring for signal that points out gaps in your spec which you update; code base is updated, bugs are fixed and the show goes on? What insights can you bring to how the code base works in reality?
Not a sceptic, but thinking this stuff through ain’t easy!
Also, if your spec is taking too long for the agent to execute, odds are high that it's ambiguous, unsound, unreviewed, underspecified, unmaintainable, or the model is just optimized to waste tokens so as to bill you maximally.
Plan before you code. Now your plan is just in a prompt.
I don't *yet* subscribe to the idea of "code is context for AI, not an interface for a human", but I have to admit that the idea sounds feasible. I have many examples of small-to-mid size apps (local use only) where I pretty much didn't even look at the code beyond checking that it doesn't do anything finicky. There, the code doesn't matter because I know that I can always regenerate it from my specs, POCs, etc. I agree that the paradigm changes completely if you look at code as something temporary that can be thrown away and re-created when the specification changes. I don't know where this leads or whether it is good for our industry, but the fact is - it is feasible.
I would never use this paradigm for anything related to production, though. Nope. Never. Not in the foreseeable future anyway.
> Everyone uses their own IDE, prompting style, and workflow.
In my experience with recent models this is still not a good idea: it quickly leads to messy code where neither AI nor human can do anything anymore. Consistency is key. (And abstractions/layers/isolation everywhere, as usual).
IDE - of course. But, at the very least, I would suggest using the same foundation model across the code base, .agent/ dirs with plenty of project documents, reusable prompts, etc.
--
P.S. Still not sure what the 10AM rule brings, though...
I don't see how that would change if you accept the premise that code is now a commodity.
In these cases, I just read the main point as "create a way for devs to share context when working with AI".
Some recent techniques claim to solve this problem, but none has reached a release yet.
Working with what we have now, this is a recipe for disaster. Agents often lie about their outputs. The less context space they have left to manage, and the more data is already in context, the more prone they are to lie and deceive.
It works OK for small changes on top of human code. That's what we know works now. The rest is yet to be reached.
I would prefer it if 2028 models were concise and generated perfect refactors.
My team is incredibly clueless and complacent. I can't even get them to use TypeScript or to migrate from Yarn v1.