10 points by riyajoshi 6 hours ago | 4 comments

naveenprasanthv 6 hours ago
Kind of wild that we're finally getting benchmarks for AI-generated API testing. Feels like the equivalent of SWE-bench, but for finding actual bugs instead of writing code.
saikia_ 6 hours ago
Cool launch - let me try this with our in-house setup!
calderon_1903 6 hours ago
interesting stuff
- riyajoshi 6 hours ago
  Thank you. Would really appreciate feedback on methodology or the evaluation framework!