Hacker News
SWE-Bench Failures: When Coding Agents Spiral into 693 Lines of Hallucinations (www.surgehq.ai)
20 points by landonxi 12 hours ago | 1 comment
egillie 12 hours ago
Is this because GPT-5 hallucinates less in general?