SWE-Bench Failures: When Coding Agents Spiral into 693 Lines of Hallucinations (www.surgehq.ai)

22 points by landonxi 5 months ago | 1 comment

egillie 5 months ago
Is this because GPT-5 hallucinates less in general?