> Significantly better SWE-Bench (+56 pts), MCP tool use (2x), and agent workflows.
What? Make up your mind: do the benchmarks matter or not?