1 point by grigio 3 hours ago | 1 comment
  • throwawayffffas 3 hours ago
    > Note: "Benchmarks are less important than real-world tests for production adoption"

    > Significantly better SWE-Bench (+56 pts), MCP tool use (2x), and agent workflows.

    What? Make up your mind: do the benchmarks matter or not?