10 points by riyajoshi 6 hours ago | 4 comments
  • naveenprasanthv 6 hours ago
    Kind of wild that we're finally getting benchmarks for AI-generated API testing. Feels like the equivalent of SWE-bench, but for finding actual bugs instead of writing code.
  • saikia_ 6 hours ago
    Cool launch - let me try this with our in-house setup!
  • calderon_1903 6 hours ago
    interesting stuff
    • riyajoshi 6 hours ago
      Thank you. Would really appreciate feedback on methodology or the evaluation framework!
  • paol_taja 5 hours ago
    [dead]