Show HN: ToolGuard – Pytest for AI agent tool calls

1 pointby Heer_J4 hours ago1 comment

calebjang4 hours ago
This is interesting because a lot of agent failures happen before any “reasoning” issue shows up.
If the tool boundary itself is unstable — wrong field names, wrong types, missing required values — the rest of the stack doesn’t really matter.
Deterministic fuzzing for tool calls seems especially useful because it gives you a way to test execution reliability without depending on another model in the loop.
- Heer_J2 hours ago
  Totally spot on, Caleb! That was exactly the frustrating realization that led me to build this.
  I've spend so much time obsessing over which model is smarter or tweaking our system prompts, but if a simple None value causes a backend TypeError that crashes the whole runtime, none of that reasoning truly matters. The agent is basically dead in the water.
  By fuzzing the execution environment deterministically first, we can guarantee the "hands" of the agent work perfectly before we even bother testing the "brain". Plus, it runs in milliseconds during CI/CD instead of waiting for expensive API calls.