21 points by PranoyP 2 days ago | 14 comments
  • mlop99 2 days ago
    Curious if the behaviour-driven testing can be done by another LLM agent (or a group of agents) - one LLM agent testing another. Could that lead to a self-improving loop?
  • jlukecarlson a day ago
    I appreciate the details shared in this paper but it'd be great if they open sourced their implementation!
  • shailendra145 2 days ago
    A powerful move beyond benchmarks — this paper redefines LLM evaluation through realistic, behavior-driven testing.
  • ajay_shastry a day ago
    Interesting work
  • papz2k 2 days ago
    Very interesting work.
  • raj_maddipati 2 days ago
    Excellent work
  • harshv_03 2 days ago
    Interesting
  • ankush9812 2 days ago
    Nice Work
  • ashyash518 2 days ago
    Nice work
  • saurabh_xen 2 days ago
    Great work
  • quanta9 2 days ago
    interesting
  • cs_exps 2 days ago
    [dead]