2 pointsby royalfig20 hours ago1 comment
  • royalfig20 hours ago
    Goodhart's Law states, "When a measure becomes a target, it ceases to be a good measure."

    This applies to our favorite LLM models, too, meaning that as they optimize for scoring high on benchmarks, how do we know that's also good for real-world performance like accuracy, latency, or user engagement?

    A/B testing AI models helps give you real feedback on how your LLM and its configuration is performing.