5 points by miohtama a month ago | 2 comments

Alifatisk a month ago
The evals look impressive; we'll see how it performs on Artificial Analysis. Looks like this is another Chinese lab joining the race. Better for consumers!
mohsen1 a month ago
I think this is a little unfair: it's comparing a model that is optimised for pass@2 and self-improves its output against the other models. It's just test-time scaling, in a way.
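
To illustrate why a pass@2 number is not directly comparable to a single-attempt score, here is a minimal sketch of the standard unbiased pass@k estimator. The sample counts in the example (10 samples, 3 correct) are made up for illustration and are not taken from the evals discussed in this thread.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: n samples drawn per problem, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples are wrong, every size-k subset contains a correct one.
    if n - c < k:
        return 1.0
    # 1 minus the probability that a random size-k subset is entirely wrong.
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical numbers: same model, same samples, different k.
print(pass_at_k(10, 3, 1))  # single-attempt score
print(pass_at_k(10, 3, 2))  # two-attempt score, always >= the pass@1 value
```

With these made-up counts, pass@1 is 0.3 while pass@2 is roughly 0.53, so putting one model's pass@2 next to other models' pass@1 inflates the gap.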