Prompt injection can hijack the agent's reasoning, but the real damage happens when the agent then calls a tool it shouldn't — deletes a file, exfiltrates data, escalates its own permissions. The probe finds the injection vector; it doesn't tell you whether your authorization layer would have stopped what happened next.
150 probes is solid for "can the agent be manipulated?" Still leaves open "once manipulated, can it cause real harm?" — which depends on what the tool boundary looks like.
Curious if you've thought about probing tool-call authorization specifically. What scope do your injected prompts try to reach for?
It sends 150+ attack probes (prompt extraction, injection, persona hijacking, encoding tricks, etc.) at your agent and gives you a trust score from 0-100 with specific fix recommendations.
Key points:
- Works with OpenAI, Anthropic, Ollama, Vercel AI SDK, LangChain, or any HTTP endpoint
- Deterministic detection (no AI judge) — same scan twice = same results
- Python: pip install agentseal && agentseal scan --prompt "..." --model gpt-4o
- JS/TS: npx agentseal scan --prompt "..." --model gpt-4o
- CI-friendly: --min-score 75 exits with code 1 if below threshold
The core scanner (150 probes + adaptive mutations) is free and open source. Pro adds MCP tool poisoning, RAG poisoning, and behavioral genome mapping.
GitHub: https://github.com/AgentSeal/agentseal
Website: https://agentseal.org
I'd love feedback on the probe coverage and detection approach. What attacks are we missing?