Here is the smallest set of constraints I have seen that changes outcomes materially: Roles, not vibes. Define narrow agent roles with hard boundaries.
Example of a specialized "CodeFactory" configuration:
* Architect: Defines ADRs (Architecture Decision Records) and contracts. Prohibited from writing implementation code.
* API Generator: Implements backend endpoints strictly from OpenAPI specs. No business logic in the API layer; validation is generated from the contract.
* Worker Generator: Implements asynchronous processing based on message contracts. Focus: idempotency and statelessness.
* Frontend Generator: Creates UI components based on UX specs and API contracts. Mandatory handling of three states: loading, error, and empty.
* DS (Data Scientist): Constructs ML pipelines and trains models. Abstracting complexity into dedicated modules/clients.
* DevOps: Manages infrastructure, deployment scripts, and rollback playbooks. Focus: idempotency of operations and post-deploy health checks.
* Critic: The automated Quality Gate. Prohibited from "fixing" code. Only issues verdicts (Pass/Fail) based on linters, tests, and contract parity.
Contracts-first
Before code, freeze interfaces: OpenAPI for HTTP, message contracts for async events, DB schema and invariants. This prevents "LLM-driven interface drift" where the agent silently changes request shapes because it feels nice.
* Read-evidence, not trust - Ban "I assume this file contains…" behavior. Require file reads before edits and require the agent to cite what it saw: which functions exist, which types exist, where the integration points are. It is about forcing contact with reality.\
* Determinism over cleverness - Prefer minimal diffs, explicit types, explicit invariants, explicit error paths, idempotent workers, no "magic" implicit behavior. LLMs love cleverness because cleverness sounds correct. Determinism survives maintenance.
* Hard gates at the boundary - A "critic" role runs linters, unit tests, integration tests, contract validation, migration checks, deploy health checks. If it fails, the pipeline stops. Not "warns". Stops.
* Explicit handoffs - Every step ends with: what changed, what assumptions were made, what is now the source of truth, who owns the next step.