3 points by gauri1902 5 hours ago | 2 comments
  • gauri1902 5 hours ago
    Hey all, we just released our work on self-improving AI systems at NeoSigma. We demo our automatic agent-harness improvement system on Tau3 benchmark tasks, where the agent's score improves from 0.56 to 0.78 (a ~40% relative jump) while the system mines failures and automatically maintains live evals. A lot of people asked to try the self-improving loop on their own agents, so we open-sourced our setup as auto-harness: an open-source library for self-improving agentic systems with auto-evals. Connect your agent and let it cook over the weekend. Watch it go brrrr!! Link to the article: https://x.com/gauri__gupta/status/2040251170099524025
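For readers wondering what a "mine failures, grow the evals, keep what helps" loop might look like, here is a rough toy sketch. It is not the auto-harness API: every name in it (`run_agent`, `mine_failures`, `improve`), the idea of a harness as a set of handled categories, and the accept-only-if-no-regression gate are assumptions made up for illustration.

```python
from typing import FrozenSet, List, Tuple

Task = Tuple[str, str]  # hypothetical shape: (task_id, category)


def run_agent(harness: FrozenSet[str], task: Task) -> bool:
    """Toy stand-in for an agent run: passes if the harness handles the category."""
    return task[1] in harness


def evaluate(harness: FrozenSet[str], tasks: List[Task]) -> float:
    """Fraction of tasks passed; an empty task list counts as fully passing."""
    if not tasks:
        return 1.0
    return sum(run_agent(harness, t) for t in tasks) / len(tasks)


def mine_failures(harness: FrozenSet[str], tasks: List[Task]) -> List[Task]:
    """Collect the tasks the current harness fails."""
    return [t for t in tasks if not run_agent(harness, t)]


def improve(harness: FrozenSet[str], tasks: List[Task], max_rounds: int = 10):
    evals: List[Task] = []  # live eval set, grown from mined failures
    history = [evaluate(harness, tasks)]
    for _ in range(max_rounds):
        failures = mine_failures(harness, tasks)
        if not failures:
            break
        # Every newly seen failure becomes a regression eval.
        for f in failures:
            if f not in evals:
                evals.append(f)
        # Toy "patch": teach the harness the first failing category.
        candidate = harness | {failures[0][1]}
        # Assumed gate: accept only if neither the benchmark nor the evals regress.
        no_benchmark_regression = evaluate(candidate, tasks) >= history[-1]
        no_eval_regression = evaluate(candidate, evals) >= evaluate(harness, evals)
        if no_benchmark_regression and no_eval_regression:
            harness = candidate
        history.append(evaluate(harness, tasks))
    return harness, evals, history


if __name__ == "__main__":
    tasks = [("t1", "a"), ("t2", "b"), ("t3", "a"), ("t4", "c")]
    final, evals, history = improve(frozenset({"a"}), tasks)
    print(sorted(final), evals, history)
```

In a real setup the "patch" step would be something that edits prompts, tools, or harness code, and `run_agent` would call the actual agent; the toy only shows the control flow of scoring, failure mining, eval growth, and gating.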
  • deadinator 4 hours ago
    Point it at your agent. Leave it running. Come back to a better agent with evals!!