5 points by miohtama 14 hours ago | 2 comments
  • Alifatisk 11 hours ago
    The evals look impressive; we'll see how it performs on Artificial Analysis. Looks like this is another Chinese lab joining the race. Better for consumers!
  • mohsen1 8 hours ago
    I think this is a little unfair: it's comparing a model that is optimised for pass@2 and self-improves its output against models that aren't. It's just test-time scaling, in a way.