A Debate Tournament for LLMs (pavursec.com)

3 points by cloudlandsdev 5 hours ago | 1 comment

Overall results from the blog post:

  Rank  Model                   ELO   Win%
  ----------------------------------------
   1    GPT 5.2                 1480  85%
   2    Gemini 3 Pro            1472  74%
   3    Claude Opus 4.6         1389  72%
   4    Claude Opus 4.5         1360  67%
   5    Grok 4.1 Fast           1349  62%
   6    GPT OSS 120B            1322  59%
   7    Gemini 3 Flash Preview  1316  54%
   8    Claude Sonnet 4.5       1265  54%
   9    Gemini 2.5 Flash Lite   1257  44%
  10    Mistral Large 3         1211  41%
  11    DeepSeek V3.2           1194  41%
  12    Meta Maverick           1065  26%
  13    Meta Llama Scout         999  13%
  14    Mistral Small 3          996  10%
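
For anyone wondering what the rating gaps mean in practice, here is a minimal sketch that turns two ratings into an expected head-to-head score. It assumes the standard logistic Elo formula with a 400-point scale; the blog post may use a different variant, so treat the function name `expected_score` and the `scale` parameter as illustrative assumptions rather than the author's method.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard logistic Elo model.

    Assumption: 400-point scale, as in conventional Elo. The tournament's
    actual rating procedure is not described in this thread.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Ratings copied from the table above.
ratings = {
    "GPT 5.2": 1480,
    "Gemini 3 Pro": 1472,
    "Mistral Small 3": 996,
}

top_vs_second = expected_score(ratings["GPT 5.2"], ratings["Gemini 3 Pro"])
top_vs_last = expected_score(ratings["GPT 5.2"], ratings["Mistral Small 3"])

print(f"GPT 5.2 vs Gemini 3 Pro:    {top_vs_second:.2f}")
print(f"GPT 5.2 vs Mistral Small 3: {top_vs_last:.2f}")
```

Under that assumption, the 8-point gap between first and second place is close to a coin flip (about 0.51), while the 484-point gap between first and last works out to about 0.94. Note that the Win% column is each model's overall win rate against the whole field, so it is not directly comparable to these pairwise numbers.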