10 points by rajveerb 2 hours ago | 2 comments
  • rajveerb an hour ago
    I read through this blog post, and it's timely given how close the models are to maxing out the benchmarks/evals.

    One thing that wasn't addressed, but would be interesting to discuss, is benchmarks/evals that conflict with each other.

    Are there desirable emergent behaviors that might not get optimized because the evals penalize them?
