Show HN: Nyx – multi-turn, adaptive, offensive testing harness for AI agents(fabraix.com)

17 pointsby zachdotai3 hours ago7 comments

ibrahim-fab3 hours ago
Nice. Definitely true that evaluating agents behavior is by far the toughest part of building them. Also most eval cases are added without thought and not maintained when agent behaviour updates. Interesting approach.
- zachdotai2 hours ago
  We wrote some thoughts on static vs. dynamic evals and how it relates to understanding the security posture of an AI system. Static security evals no longer carry the signal they used to. A one-shot pass/fail tells you almost nothing about real-world risk.
  Would love your thoughts on this: https://fabraix.com/blog/adversarial-cost-to-exploit
AmineAfia2 hours ago
Can I integrate this in my CI/CD pipeline?
- zachdotai2 hours ago
  Yes! The docs can be found here: https://docs.fabraix.com
azhassan1an hour ago
Where do you draw the line between this and coverage-guided fuzzing? A lot of what you describe (parallel, adaptive, finds edge cases in unbounded input spaces) maps cleanly onto the fuzzing playbook, which has decades of theory behind it - corpus management, mutation scheduling, minimization of found crashes.
Are you borrowing from that literature or treating agent testing as a distinct problem? Feels like there's real transfer available if you're not already pulling from it.
- zachdotai43 minutes ago
  [dead]
aacudad3 hours ago
I am not sure this will work, seems like added complexity to something simple
an hour ago
undefined
ljhasdr3 hours ago
i need to try this before mythos comes to attack our service. thanks!
adam_rida3 hours ago
Very cool!