2 points by zyf1994, 7 hours ago | 1 comment
    OpenAI recently described how they used AI agents to produce over a million lines of code in five months — a practice they call Harness Engineering.

    I started building AgentsMesh (agentsmesh.ai) 52 days ago. 600 commits, 965,687 lines of code throughput, 356,220 lines of production code still standing. One person. The repo is fully open source and every number is verifiable via `git log`.

    But the numbers aren't the point — the structure is. I used Harness Engineering to build a Harness Engineering tool.

    *The Engineering Environment Sets the Ceiling*

    Agent output quality depends critically on the engineering environment it works in.

    The codebase follows strict DDD layering with 22 domain modules and clear boundaries. When an agent adds a feature, it knows where things go. Cross-stack naming is fully aligned — `domain/loop/`, `service/loop/`, `components/loops/` — so directory structure alone serves as documentation.

    Most counterintuitively: tech debt gets amplified exponentially by agents. A temporary compromise becomes a "precedent" the agent systematically reuses. I stopped multiple times to clean up tech debt — not for aesthetics, but to maintain signal purity. This maintenance cost is unique to agent-collaborative development.

    Strong typing (Go + TypeScript + Proto) shifts errors from runtime to compile time. A four-layer feedback loop — compilation, unit tests, e2e tests, CI pipeline — gives agents fast, precise error signals. Worktrees with automatic port isolation enable native parallelism.
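    As a sketch of how per-worktree port isolation could work (the hash-based offset scheme, function name, and port range here are my own illustration, not AgentsMesh's actual implementation):

    ```go
    package main

    import (
    	"fmt"
    	"hash/fnv"
    )

    // portFor derives a deterministic port for a worktree name, so each
    // parallel agent workspace gets a non-conflicting dev-server port
    // without central coordination. Base and span values are illustrative.
    func portFor(worktree string, base, span uint32) uint32 {
    	h := fnv.New32a()
    	h.Write([]byte(worktree))
    	return base + h.Sum32()%span
    }

    func main() {
    	for _, wt := range []string{"pod-a", "pod-b", "pod-c"} {
    		fmt.Printf("%s -> :%d\n", wt, portFor(wt, 30000, 1000))
    	}
    }
    ```

    Because the port is a pure function of the worktree name, any agent (or the harness itself) can recompute it without shared state.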

    The conclusion: the codebase itself is the agent's most important context. No separate RAG or memory files needed. The repository is the context.

    *Cognitive Bandwidth Is a Real Constraint*

    Around day 5, I hit a wall at ~50,000 lines of daily throughput. Three parallel agent workstreams were the limit for sound architectural decisions. A fourth noticeably degraded quality.

    The breakthrough: delegate the decision-making itself. Let agents coordinate agents — move from supervising agents to supervising the system that supervises them. That's how Autopilot mode was born, and the core design intent of AgentsMesh.

    *When Experimentation Costs Collapse*

    The Relay architecture wasn't designed on a whiteboard. Three Pods crashed the Backend; I watched it fail, rebuilt, and iterated. Discovery to fix: under two days — versus two weeks of traditional architecture discussion.

    AI doesn't change the speed of writing code; it changes the cost structure. When iteration is cheap enough, experiment-driven development produces better architectures than design-driven development.

    *Three Engineering Primitives*

    52 days of practice converged into three primitives:

    *Isolation.* Every agent needs its own workspace — a hard prerequisite. AgentsMesh implements this with Pods: each agent runs in its own Git worktree and sandbox, with all necessary context prepared.

    *Decomposition.* Agents excel with clear scope and definition of done. Tickets handle one-shot work (features, bugs, refactoring); Loops handle recurring tasks (daily tests, scheduled builds) via Cron expressions.
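    A minimal sketch of the Ticket/Loop split as data types (field names and the `validCron` helper are my own assumptions for illustration, not AgentsMesh's actual API):

    ```go
    package main

    import (
    	"fmt"
    	"strings"
    )

    // Ticket models one-shot work with an explicit definition of done.
    type Ticket struct {
    	Title string
    	Done  string // definition of done, e.g. "e2e suite passes"
    }

    // Loop models recurring work driven by a cron schedule.
    type Loop struct {
    	Title string
    	Cron  string // standard 5-field cron expression
    }

    // validCron is a minimal sanity check: a standard cron expression has
    // exactly five whitespace-separated fields (minute hour dom month dow).
    func validCron(expr string) bool {
    	return len(strings.Fields(expr)) == 5
    }

    func main() {
    	t := Ticket{Title: "Fix login redirect", Done: "e2e suite passes"}
    	l := Loop{Title: "Nightly test run", Cron: "0 3 * * *"} // 03:00 daily
    	fmt.Println(t.Title, "|", l.Cron, "|", validCron(l.Cron))
    }
    ```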

    *Coordination.* Agents aren't bound by human specialization — the same agent can code, document, test, and review. Coordination needs communication and permissions, not role-based hierarchy. Channels provide shared collaboration spaces; Bindings grant point-to-point permissions like `terminal:read` and `terminal:write`, letting supervisor agents directly observe and control workers.
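    A Binding could be sketched as a point-to-point grant carrying `resource:action` scopes; the struct shape and `Allows` method below are my own illustration of the idea, not AgentsMesh's implementation:

    ```go
    package main

    import "fmt"

    // Binding grants a supervisor agent specific scopes on a worker agent.
    // Scope strings follow the "resource:action" shape, e.g. "terminal:read".
    type Binding struct {
    	Worker     string   // agent being observed/controlled
    	Supervisor string   // agent receiving the permission
    	Scopes     []string // granted scopes
    }

    // Allows reports whether the binding grants the requested scope.
    func (b Binding) Allows(scope string) bool {
    	for _, s := range b.Scopes {
    		if s == scope {
    			return true
    		}
    	}
    	return false
    }

    func main() {
    	b := Binding{Worker: "pod-a", Supervisor: "autopilot", Scopes: []string{"terminal:read"}}
    	fmt.Println(b.Allows("terminal:read"), b.Allows("terminal:write")) // true false
    }
    ```

    Because scopes are granted per binding rather than per role, a supervisor can hold read-only access to one worker and full control of another.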

    OpenAI calls these Context Engineering, architectural constraints, and entropy management. Different names, same problem.

    *Open Source*

    Harness Engineering is a discipline, not a product feature. Code is on GitHub — https://github.com/AgentsMesh/AgentsMesh.