If reviewing is the expensive part now, optimize for reviewability.
Sitting in your cubicle with your perfect set of test suites, code-verification rules, SOPs, and code reviews, you won't want to hear this, but other companies will be gunning for your market, writing software almost identical to yours from a series of prompts that generates the code they want: fast, cheap, functionally identical, and quite possibly untested.
As AIs get more proficient and are given more autonomy (OpenClaw++), they will also generate directly executable binaries, completely replacing the compiler and making the result unreadable to a normal human; they may even do this without prompts.
The scenario is terrifying to professional software developers, but other people will do this regardless of what you think, and run it in production. I expect we are months, or at most a few years, away from this.
Source code of the future will be the complete series of prompts used to generate the software, another AI to verify it, and an extensive test suite.
1. AI is meant to make us go faster, reviews are slow, the AI is smart, let it go.
2. There are plenty of AI maximizers who think we should only be writing design docs and letting the AI go to town on them.
This might be a great time to start a company. Maximize the benefits of AI while you can, without someone who has never written a line of code telling you that your job is going to disappear in 12 months.
All the incentives are against someone who wants to use AI in a reasonable way, right now.
Or you can be a grifter, make some AI wrapper yourself, and cash out with some VC investment. So it's a good time for a new company either way.
It's like watching someone else solve a puzzle, or watching someone else play a game versus playing it yourself (and that's at best half as interesting as playing it through).
Then don’t even bother looking at C work or below.
This also works for planning before any coding session. Gemini + Opus + GPT-xhigh gets a lot of questions answered before coding starts.
Can you blame them? All the AI companies are saying “this does a better job than you ever could”, every discussion topic on AI includes at least one (totally organic, I’m sure) comment along the lines of “I’ve been developing software for over twenty years and these tools are going to replace me in six months. I’m learning how to be a plumber before I’m permanently unemployed.” So when Claude spits out something that seems to work with a short smoke test, how can you blame developers for thinking “damn the hype is real. LGTM”?
I've been using LLMs for 14+ months now and they've exceeded my expectations.
I'm thinking HVAC or painting lines in parking lots: HVAC because I can program smart systems, and parking-lot lines because I can use Google Maps and algorithms to propose more efficient parking-lot designs to existing business owners.
There is that paradox where, if something becomes cheaper, demand for it grows (Jevons paradox), so we'll see what happens.
Finally, I'm a mediocre dev who can only handle 2-3 agents at a time, so I probably won't be good enough.
The harder problem is discovery: how do you build something entirely new, something that has no existing test suite to validate against?
Verification works because someone has already defined what "correct" looks like. Perhaps there is a spec, or a reference implementation, or a set of expected behaviours. The system just has to match them.
But truly novel creation has no ground truth to compare against and no predefined finish line. You are not just solving a problem. You are figuring out what the problem even is.
Software has, since at least the adoption of "agile", created an industry culture of not just refusing to build to specs, but of insisting that specs are impossible to get from a customer.
I've been saying "the last job to be automated will be QA" and it feels more true every day. It's one thing to be a product engineer in this era. It's another to be working at the level the author is, where code needs to be verifiable. However, once people stop vibing apps and start vibing kernels, it really does fundamentally change the game.
I also have another saying: "any sufficiently advanced agent is indistinguishable from a DSL." I hadn't considered Lean in this equation, but I put these two ideas together and I feel like we're approaching some world where Lean eats the entire agentic framework stack and the entire operating system disappears.
If you're thinking about building something today that will still be relevant in 10 years, this is insightful.
If a piece of code is produced by an agent loop (prompt -> tool calls -> edits -> tests), the real artifact isn’t just the final code but the trace/pipeline that produced it.
In that sense verification might look closer to: checking constraints on the generator (tests/specs/contracts), verifying the toolchain used by the agent, and replaying generation under controlled inputs.
That feels closer to build reproducibility or supply-chain verification than traditional program proofs.
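As a minimal sketch of what that could look like in practice (all names and the trace format here are my own illustration, not any real tool's API), assuming the agent's run has been recorded as a JSON-serializable trace:

```
import hashlib
import json
import subprocess

def fingerprint(trace: dict) -> str:
    """Hash the prompt, model/toolchain versions, and tool calls so the
    generation pipeline itself becomes a checkable artifact."""
    canonical = json.dumps(trace, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

def verify(trace: dict, approved_digest: str) -> bool:
    # Supply-chain-style check: the recorded pipeline must match the one
    # that was reviewed and signed off on.
    if fingerprint(trace) != approved_digest:
        return False
    # Constraints on the generator's output: here, just that the project's
    # test suite still passes (assumes pytest is available).
    result = subprocess.run(["pytest", "-q"], capture_output=True)
    return result.returncode == 0
```

Replaying generation under controlled inputs would then mean re-running the agent against the same pinned trace and diffing the output, which is exactly the reproducible-build posture.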
The Dafny code formed a security kernel at the core of a service, enforcing invariants such as requiring that the audit log always be written before a mutating operation is performed. Of course I still had bugs, usually from specification problems (poor spec/design) or from Claude not taking the proof far enough (proving it for only one of a number of related types, which could also have been a specification problem on my part).
In the end I realized I'm writing a bunch of I/O-bound glue code, and plain ol' test-driven development was fine enough for my threat model. I can review Python code more quickly and accurately than Dafny (or the Go code it eventually had to link to), so I'm back to optimizing for humans again...
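For what it's worth, the audit-log invariant translates naturally into that plain-TDD world. A rough Python sketch of the kind of check I mean (the names are illustrative, not the actual service):

```
import functools

class AuditError(RuntimeError):
    pass

def mutating(fn):
    """Enforce the kernel's invariant in plain Python: an audit entry is
    written *before* the mutation runs, and the mutation is refused if
    logging fails."""
    @functools.wraps(fn)
    def wrapper(self, *args, **kwargs):
        try:
            self.audit_log.append((fn.__name__, args, kwargs))
        except Exception as exc:
            raise AuditError("refusing to mutate without an audit entry") from exc
        return fn(self, *args, **kwargs)
    return wrapper

class Accounts:
    def __init__(self, audit_log):
        self.audit_log = audit_log  # any append-only sink

    @mutating
    def close(self, account_id):
        ...  # the actual mutation goes here

def test_no_mutation_without_audit():
    class BrokenLog(list):
        def append(self, item):
            raise IOError("log unavailable")

    try:
        Accounts(BrokenLog()).close("acct-1")
        assert False, "mutation should have been refused"
    except AuditError:
        pass
```

It's not a proof, just a test against one failure mode, but it's reviewable in seconds, which was the whole point.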
Like an engineer overseeing the construction of a bridge, you are not there to lay bricks; the job is to ensure the structure does not collapse.
The marginal cost of code is collapsing. That single fact changes everything.
Someone needs to be held accountable when things go wrong. Someone needs to be able to explain to the CEO why this or that is impossible.
If you want to have AI generate all the code for your business critical software, fine, but you better make sure you understand it well. Sometimes the fastest path to deep understanding is just coding things out yourself - so be it.
This is why the truly critical software doesn’t get developed much faster when AI tools are introduced. The bottleneck isn’t how fast the code can be created, it’s how fast humans can construct their understanding before they put their careers on the line by deploying it.
Of course, this doesn't apply to prototypes, hackathons, POCs, etc. For those "low stakes" projects, vibe code away, if you wish.
In fact it will probably need to happen a few times PER org for the dust to settle. It will take several years.
I recall a time, maybe around 2013-2017, when people were talking about 4 or 5 nines. But sometime around then the goalposts shifted, and instead of trying to make things as reliable as possible, it became more about seeing how unreliable they can get before anyone notices or cares. It turns out people will suffer through a lot if there's some marginal benefit; remember what personal computers were like in the 1990s before memory protection? Vibe coding is just another chapter in that user-hostile epic. Convenient reliability, like this author describes (if it can be achieved), might actually make things better? But my money isn't on that.
It's pretty awesome but still does a lot of basic idiotic stuff. I was implementing a feature that required a global keyboard shortcut and asked Opus to define it, taking into account that it shouldn't clash with common shortcuts. It built a field where only one modifier key was required. After I pointed out that this was unsafe, since users could define Ctrl+C as the shortcut, and that we needed the extra safeguard of requiring at least two modifier keys, I got the usual "you're absolutely right" and it proceeded to require two modifier keys. But then it also created a huge blacklist of common shortcuts: copy, cut, paste, print, select all, etc., basically a bunch of single-modifier shortcuts. Once I mentioned that this was useless since we were already forcing two modifier keys, it said I was right again and fixed it.
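The rule it eventually converged on is simple enough to state; here's a rough sketch in Python (the app is .NET, so everything here is illustrative):

```
MODIFIERS = {"ctrl", "alt", "shift", "win"}

def is_safe_shortcut(keys: set[str]) -> bool:
    """Require at least two modifiers plus exactly one regular key, which
    by construction rules out single-modifier combos like Ctrl+C."""
    modifiers = keys & MODIFIERS
    regular = keys - MODIFIERS
    return len(modifiers) >= 2 and len(regular) == 1

assert not is_safe_shortcut({"ctrl", "c"})       # would clash with copy
assert is_safe_shortcut({"ctrl", "shift", "k"})  # fine
# No blacklist needed: every common single-modifier shortcut already
# fails the two-modifier requirement.
```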
The counterpoint to this idiocy is that, overall, it's very good at a lot of what is (in my mind) much more complicated stuff. It's a .NET app, and for things like creating models, viewmodels, and usercontrols, or setting up the entire hosting DI with pretty much all .NET best practices, it does an awesome job.
tl;dr: training wheels are still mandatory, imho.
It's such an intoxicating copyright-abuse slot machine that a buddy who is building an OCaml+htmx tree editor told me, "I always get stuck and end up going to the LLM to generate code. Usually when I get to the HTML part." I asked if he had tried a debugger before that; he said, "that's a good idea".
If boilerplate were such a big issue, we should have worked on improving code generation. In fact, many tools and frameworks already do this:
- Rails has fantastic code generation for CRUD use cases
- IntelliJ IDEs have long been able to perform many kinds of refactoring and class generation that cover some of the boilerplate
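The underlying idea in those tools is that boilerplate generation can be deterministic and template-driven rather than probabilistic. A toy Python sketch of what such a generator does (illustrative only, not any real tool's API):

```
def generate_model(name: str, fields: dict[str, str]) -> str:
    """Emit a plain dataclass from a field spec, the way a scaffold
    generator stamps out predictable boilerplate."""
    header = f"from dataclasses import dataclass\n\n@dataclass\nclass {name}:\n"
    body = "\n".join(f"    {field}: {ftype}" for field, ftype in fields.items())
    return header + body + "\n"

print(generate_model("Post", {"title": "str", "body": "str"}))
```

Same input, same output, every time, which is exactly the property LLM generation gives up.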
I haven't reached a conclusion on this train of thought yet, though.