The LLM secret predictability angle is something I’m still digging into and will be a separate article. There’s a lot more to it than I could cover here.
Genuinely curious: for anyone shipping vibe-coded projects, are you actually running any kind of security check before it goes live? Prompting the AI for a review, using a scanner, doing it manually, or just crossing your fingers? And if you are using an agent workflow for it, what does that look like? Any specific agent skills or tools you’ve found useful versus just adding noise?
This is how I go about keeping chaos to a minimum (your mileage may vary with project size and characteristics):

- Plan your project manually; don't outsource thinking to the LLM. That includes being intentional about architecture, tech stack, dependencies, etc.
- I have planning, orchestrating, coding, and reviewing agents. These should be self-explanatory, but there's a catch: the workflow is automated. OpenCode lets you define "subagents" that can be called by "primary" agents. I write a detailed GitLab issue that my planning agent can fetch and read. It creates a detailed resolution plan that I point the orchestration agent at. The orchestrator then delegates implementation to one or more coding agents in parallel, and their results are in turn handed to reviewer agents. If the reviewer agents don't complain, the results are ready for human review in an MR.
- Changes that pass all review are documented in the project spec. E.g., if new modules are added that require an auth guard pattern already documented in the spec, they get listed as relevant sites for that pattern.
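For anyone curious what a reviewer subagent definition might look like: here's a hedged sketch of a markdown agent file (the path `.opencode/agent/reviewer.md` and the frontmatter field names are my best recollection of OpenCode's agent config and may have drifted from the current schema, so treat all of it as an assumption and check the docs):

```
---
description: Reviews diffs from coding agents for spec conformance and security logic
mode: subagent
tools:
  write: false
  edit: false
---
You are a code reviewer. Compare the proposed changes against spec.md.
Flag any auth-check, input-validation, or concurrency logic that
deviates from the documented patterns, even if the code looks clean.
```

The read-only tool restriction is the point: a reviewer that can't edit can only complain, which keeps the review step honest.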
I feel like the LLM agents have been more thorough and consistent than I could have been without them. The same goes for refactors: since the entire project is essentially mapped out in the spec.md file(s), it's hard for an agent to miss a relevant site in the code. Human review is still key, though. Don't merge code you don't understand.
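Since the spec file is doing the heavy lifting here, a hypothetical excerpt (structure and names entirely invented, not from any real project) of how a documented pattern plus its relevant sites might look:

```
## Pattern: auth guard
Every request handler must call `require_auth()` before touching data.

Relevant sites:
- api/users.py
- api/billing.py
- api/exports.py
```

An agent doing a refactor of `require_auth()` then has an explicit checklist of call sites rather than having to rediscover them by grepping.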
The bit I'd push on: do your reviewer agents actually catch logic errors, things like a double-negative auth check or a race condition in a payment flow? Those usually pass review because the code looks intentional and clean. Are your reviewers prompted specifically for security logic, or more for spec conformance?
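To make that failure mode concrete, here's a hypothetical snippet (all names invented) of the double-negative auth check kind of bug: it reads like a deny, but actually grants access, and a reviewer pattern-matching on "looks like an auth check" will wave it through:

```python
def is_not_unauthorized(user: dict) -> bool:
    # Returns True for authorized users. The double negative in the
    # name is already a smell, but it reads as deliberate.
    return not user.get("unauthorized", False)

def can_delete(user: dict) -> bool:
    # Bug: this branch fires when the user IS unauthorized
    # (not True-for-authorized == flagged user), so flagged users
    # get delete access. The stacked negations scan like a deny.
    if not is_not_unauthorized(user):
        return True
    return user.get("role") == "admin"
```

A spec-conformance check passes ("there is an auth check before delete"); only a reviewer prompted to trace the actual boolean logic catches that the branch is inverted.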
"Don't merge code you don't understand" is the right closer. Most setups don't force that discipline because people don't have the knowledge :)