SWE-bench will hit 90% this year (fabraix.com)

6 points by asfsf23423 8 hours ago | 1 comment

upmind 5 hours ago
Maybe an unpopular opinion, but I think SWE-bench has done its part at this point and we need a new benchmark, because Gemini being on or near the same level as Claude is obviously wrong.
- amazingamazing 5 hours ago
  I use both and think they’re comparable. AMA.
- lern_too_spel 4 hours ago
  Gemini at the same level as Claude is believable. Gemini CLI is not at the same level as Claude Code.