21 points by PranoyP 3 months ago 14 comments
  • mlop99 3 months ago
    Curious whether the behaviour-driven testing could be done by another LLM agent (or a group of agents), with one LLM agent testing another. Could that lead to a self-improving loop?
  • shailendra145 3 months ago
    A powerful move beyond benchmarks — this paper redefines LLM evaluation through realistic, behavior-driven testing.
  • jlukecarlson 3 months ago
    I appreciate the details shared in this paper, but it'd be great if they open-sourced their implementation!
  • papz2k 3 months ago
    Very interesting work.
  • ajay_shastry 3 months ago
    Interesting work
  • raj_maddipati 3 months ago
    Excellent work
  • harshv_03 3 months ago
    Interesting
  • ankush9812 3 months ago
    Nice Work
  • ashyash518 3 months ago
    Nice work
  • saurabh_xen 3 months ago
    Great work
  • quanta9 3 months ago
    interesting
  • cs_exps 3 months ago
    [dead]