Hacker News
SlopCodeBench: Benchmarking How Coding Agents Degrade over Long-Horizon Tasks (arxiv.org)
1 point by FiberBundle 9 hours ago | 1 comment
cestivan 8 hours ago [dead]