Hacker News
DeepSWE: Measuring frontier coding agents
(deepswe.datacurve.ai)
2 points by e2e4 4 hours ago | 1 comment
e2e4 4 hours ago
gpt-5.5xhigh leading the benchmark matches my recent experience. I've been an opus 4.7 user, but it burns through tokens so quickly that I gave gpt-5.5xhigh (via codex) a try. Quality was similar (if not better), and the tokens lasted a lot longer.