1 pointby rajvanshyr2 hours ago1 comment
  • rajvanshyr2 hours ago
    Built this after spending too many hours manually debugging why our AI agent kept failing on real world examples.

    evalfix runs your evals, identifies what's failing, and uses a multi-agent loop to diagnose and fix your prompt — then verifies the fix doesn't break anything that was passing.

    It also ships with an SDK to capture production failures and automatically pull them into your eval suite.