Hacker News
Testing LLM Agents Like Software – Behaviour Driven Evals of AI Systems (aclanthology.org)
21 points by PranoyP 2 days ago | 14 comments
mlop99
2 days ago
Curious whether behaviour-driven testing could be done by another LLM agent (or a group of agents), with one LLM agent testing another. Could that lead to a self-improving loop?
jlukecarlson
a day ago
I appreciate the details shared in this paper but it'd be great if they open sourced their implementation!
shailendra145
2 days ago
A powerful move beyond benchmarks: this paper redefines LLM evaluation through realistic, behavior-driven testing.
ajay_shastry
a day ago
Interesting work
papz2k
2 days ago
Very interesting work.
raj_maddipati
2 days ago
Excellent work
harshv_03
2 days ago
Interesting
ankush9812
2 days ago
Nice Work
ashyash518
2 days ago
Nice work
saurabh_xen
2 days ago
Great work
quanta9
2 days ago
interesting
cs_exps
2 days ago
[dead]