6 points by foundatron 7 hours ago | 3 comments
  • jlongo78 4 hours ago
    The hardest part of autonomous pipelines like this is observability when things go sideways. Specs rarely survive first contact with real codebases intact. Worth investing heavily in session persistence and replay so you can audit exactly what the agent reasoned at each step. Being able to resume a failed run mid-conversation rather than starting cold saves enormous time. Multi-agent parallelism also compounds fast, so a grid view across simultaneous runs becomes essential pretty quickly.
  • deltaops 7 hours ago
    This is exactly the problem we're tackling! We built DeltaOps (delta-ops-mvp.vercel.app) - human-in-the-loop governance for autonomous agents. You hit the nail on the head with "no human in the loop" - that's the gap. DeltaOps adds a layer where agents can work autonomously, but critical actions (deploys, code merges, spending) require human approval. Also addresses your compliance concerns - every action is logged and approved. Would love to chat about integrating governance into dark factories!
    • foundatron 7 hours ago
      Cool site / good idea. Maybe I'm underestimating it (I probably am), but I don't think it's a huge leap from what I published today to that compliance vision you're tackling.
  • guerython 7 hours ago
    Curious how you are handling those guard logs and approvals in OctopusGarden?
    • foundatron 7 hours ago
      Right now OctopusGarden logs every LLM call with token counts and cost, and the SQLite store records each run and iteration (spec hash, scores per scenario, generated code). So you get a full trace of what was generated, what it was tested against, and how it scored.
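
      To make the shape of that trace concrete, here's a minimal sketch of what a run/iteration/LLM-call store like that could look like. All table and column names here are my own guesses for illustration, not OctopusGarden's actual schema:

      ```python
      import sqlite3

      # Hypothetical schema sketch: runs, per-scenario iteration scores,
      # and per-call token/cost accounting.
      conn = sqlite3.connect(":memory:")
      conn.executescript("""
      CREATE TABLE runs (
          id INTEGER PRIMARY KEY,
          spec_hash TEXT NOT NULL,
          started_at TEXT DEFAULT CURRENT_TIMESTAMP
      );
      CREATE TABLE iterations (
          id INTEGER PRIMARY KEY,
          run_id INTEGER REFERENCES runs(id),
          scenario TEXT,
          score REAL,
          generated_code TEXT
      );
      CREATE TABLE llm_calls (
          id INTEGER PRIMARY KEY,
          run_id INTEGER REFERENCES runs(id),
          prompt_tokens INTEGER,
          completion_tokens INTEGER,
          cost_usd REAL
      );
      """)

      # A single traced run: one LLM call, one scored scenario.
      conn.execute("INSERT INTO runs (spec_hash) VALUES ('abc123')")
      conn.execute(
          "INSERT INTO llm_calls (run_id, prompt_tokens, completion_tokens, cost_usd) "
          "VALUES (1, 1200, 300, 0.02)"
      )
      conn.execute(
          "INSERT INTO iterations (run_id, scenario, score) VALUES (1, 'login_flow', 0.97)"
      )

      # Auditing a run: total spend and per-scenario scores.
      total_cost = conn.execute(
          "SELECT SUM(cost_usd) FROM llm_calls WHERE run_id = 1"
      ).fetchone()[0]
      scores = conn.execute(
          "SELECT scenario, score FROM iterations WHERE run_id = 1"
      ).fetchall()
      ```

      The nice property of keeping it in SQLite is that the audit trail is just SQL: cost per run, score drift across iterations, and which spec hash produced which code are all one query away.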

      For approvals, the current model is that the spec is the approval. If the spec is right and scenarios pass at 95%+ satisfaction, the code ships. There's no PR review step by design (the "code is opaque weights" philosophy).

      That said, you could totally layer approvals on top. Gate on spec changes, require sign-off before a run kicks off, or add a human checkpoint between "converged" and "deployed." The tool doesn't enforce a deployment pipeline, so that's up to your org's workflow.
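
      The two modes above (spec-is-the-approval vs. a layered sign-off) can be sketched as a single gate function. This is purely illustrative; the function name, threshold constant, and flags are my assumptions, not OctopusGarden's API:

      ```python
      # Hypothetical gate between "converged" and "deployed".
      THRESHOLD = 0.95  # the 95%+ satisfaction bar from the spec-is-the-approval model


      def ready_to_ship(scores: dict[str, float], *, approved: bool, require_signoff: bool) -> bool:
          """Ship only when every scenario clears the satisfaction threshold,
          and, if sign-off is required, a human has approved the run."""
          converged = bool(scores) and all(s >= THRESHOLD for s in scores.values())
          if require_signoff:
              return converged and approved
          return converged


      # Spec-is-the-approval mode: convergence alone ships the code.
      auto = ready_to_ship({"login": 0.97, "checkout": 0.96},
                           approved=False, require_signoff=False)

      # Layered approvals: converged but blocked pending human sign-off.
      gated = ready_to_ship({"login": 0.97, "checkout": 0.96},
                            approved=False, require_signoff=True)
      ```

      The point of keeping the gate separate from the convergence loop is exactly what the comment describes: the tool doesn't enforce a deployment pipeline, so an org can bolt `require_signoff` on without touching the generation side.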

      Worth noting: this is purely a hobby project at this point. It hasn't been used in any commercial setting. The guard rails and approval workflow stuff is where it would need the most work before anyone used it for real.