It's now collected 29,502 public sessions and 334,589 individual model responses, and we just published all the stats for everyone to see, updated daily.
A few highlights:
- In multi-round debates, Claude Opus 4.7 convinced other models to flip their vote almost 3K times, the most of any model. Gemini 3.1 Pro came in second at 2.1K.
- Most used model is Gemini 3.1 Pro at 25K sessions, with GPT-5.4 second at 21K.
- Grok 4.1 Fast held its position 88.7% of the time, the highest conviction rate of all models. Probably not surprising.
It's been quite amazing to see all the questions and feedback since launch. Initially the only mode was structured answers (vote yes/no or pick from custom options).
Based on feedback, we've added an open questions mode, where models answer freely, and a roundtable chat, where you can join the debate and follow up with individual models to challenge their reasoning.
If you want to give the roundtable a try, it's free to use until community credits run out. All models are routed via my startup, Opper. Happy to dig into specifics or make more data available if there's interest.