1 point by alexli1807 2 hours ago | 1 comment
  • alexli1807 2 hours ago
    Hi HN, I built Binex because debugging multi-agent pipelines was driving me crazy.

      The problem: you chain 5 agents together, something in the middle breaks or produces a weird
      output, and you have no idea what happened. Logs are scattered, there's no replay, and swapping
      a model means rewriting code.
    
      Binex lets you define agent pipelines in YAML and records everything per node. After a run you can:
    
      - `binex trace` — full execution timeline with latency per node
      - `binex debug` — post-mortem with inputs, outputs, and errors
      - `binex replay --agent node=llm://other-model` — re-run swapping one model
      - `binex diff run_a run_b` — compare two runs side-by-side
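
      To make the idea concrete, here's a rough sketch of what a pipeline definition could look
      like -- the node names, keys, and `llm://` URIs here are my illustration of the concept, not
      Binex's actual schema (check the docs for the real format):

      ```yaml
      # Hypothetical pipeline sketch -- field names are illustrative,
      # not Binex's actual schema.
      pipeline: summarize-and-review
      nodes:
        - name: extract
          agent: llm://ollama/llama3
          prompt: "Extract the key claims from: {{input}}"
        - name: summarize
          agent: llm://openai/gpt-4o-mini
          prompt: "Summarize these claims: {{extract.output}}"
        - name: review
          agent: human   # human-in-the-loop approval gate
      ```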
    
      It uses LiteLLM under the hood, so it works with Ollama, OpenAI, Anthropic, and 6 more
      providers. It also supports local Python agents, remote agents via the A2A protocol, and
      human-in-the-loop approval gates.
    
      Everything is stored locally (SQLite + filesystem), no cloud dependency.
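
      One nice side effect of local SQLite storage is that you can poke at run data with plain
      `sqlite3` if you ever want to go around the CLI. The table and column names below are my
      guess at what such a store might look like, not Binex's real schema -- shown in-memory so
      the snippet is self-contained:

      ```python
      import sqlite3

      # Hypothetical schema -- Binex's actual tables may differ.
      # In-memory DB with sample rows so the snippet runs standalone.
      db = sqlite3.connect(":memory:")
      db.execute("""CREATE TABLE runs (
          run_id TEXT, node TEXT, latency_ms REAL, error TEXT)""")
      db.executemany(
          "INSERT INTO runs VALUES (?, ?, ?, ?)",
          [("run_a", "extract", 120.5, None),
           ("run_a", "summarize", 950.0, None),
           ("run_a", "review", 30.2, "timeout")],
      )

      # Which node failed, and which was slowest, for a given run?
      failed = db.execute(
          "SELECT node, error FROM runs WHERE run_id = ? AND error IS NOT NULL",
          ("run_a",),
      ).fetchall()
      slowest = db.execute(
          "SELECT node FROM runs WHERE run_id = ? ORDER BY latency_ms DESC LIMIT 1",
          ("run_a",),
      ).fetchone()
      print(failed, slowest)
      ```

      The same kind of query is presumably what `binex trace` and `binex debug` run for you, with
      nicer output.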
    
      pip install binex
    
      Happy to answer any questions about the architecture or design decisions.