Composite scores range from ~39 to ~98, a wider spread than I expected. Tool-abuse detection is weak across the board: several providers that catch >95% of prompt injections miss most unauthorized tool calls. Over-refusal is under-tested across the industry; one provider flags 37% of benign requests as unsafe. Provenance verification (can the tool distinguish a real approval chain from a fabricated one?) is nearly absent outside provenance-native approaches.
Disclosure: I built and maintain this benchmark, and I also run https://agentguard.co/, one of the tested providers. AgentGuard is included in the results, but it is tested via a commit-reveal protocol with Ed25519 signatures (code in src/protocol/) rather than the standard open adapter path. I know "vendor runs own benchmark" raises eyebrows; that's why the entire corpus, scoring code, and methodology are open source under Apache 2.0. Run it yourself, verify the results, and file issues if something seems off.

The test corpus, adapters, and scoring are designed to be extended. PRs for new provider adapters, novel attack test cases, and methodology improvements are welcome.

Repo: https://github.com/doronp/agentshield-benchmark
Leaderboard: https://doronp.github.io/agentshield-benchmark/
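For readers curious what the commit-reveal step buys you, here's a minimal sketch of the pattern in Python. This is an illustration of the general idea, not the actual code in src/protocol/: the function names, the use of PyNaCl, and the commitment format are all my assumptions. The vendor hashes its results together with a random nonce, signs the digest with an Ed25519 key, and publishes the commitment and signature before the reveal; later, anyone can check that the revealed results hash to the same commitment and that the signature verifies against the vendor's public key.

```python
# Minimal commit-reveal sketch with Ed25519 signatures (PyNaCl).
# Illustrative only: names and the commitment format are assumptions,
# not the code in src/protocol/.
import hashlib
import json
import secrets

from nacl.exceptions import BadSignatureError
from nacl.signing import SigningKey, VerifyKey


def commit(signing_key: SigningKey, results: dict) -> tuple[bytes, bytes, bytes]:
    """Vendor side: hash results with a random nonce, sign the digest.

    Publishing (commitment, signature) before the reveal pins the vendor
    to one set of results without disclosing them yet.
    """
    nonce = secrets.token_bytes(32)
    payload = json.dumps(results, sort_keys=True).encode() + nonce
    commitment = hashlib.sha256(payload).digest()
    signature = signing_key.sign(commitment).signature
    return commitment, signature, nonce


def verify_reveal(verify_key: VerifyKey, commitment: bytes, signature: bytes,
                  results: dict, nonce: bytes) -> bool:
    """Anyone: check the revealed results match the earlier commitment."""
    payload = json.dumps(results, sort_keys=True).encode() + nonce
    if hashlib.sha256(payload).digest() != commitment:
        return False  # revealed results differ from what was committed
    try:
        verify_key.verify(commitment, signature)
    except BadSignatureError:
        return False  # signature does not match the vendor's public key
    return True


# Usage: commit first, publish (commitment, signature); reveal later.
sk = SigningKey.generate()
results = {"prompt_injection": 0.97, "tool_abuse": 0.41}  # made-up scores
c, sig, nonce = commit(sk, results)
assert verify_reveal(sk.verify_key, c, sig, results, nonce)
```

The nonce keeps a verifier from brute-forcing the commitment out of a small space of plausible score vectors, and the signature binds the commitment to the vendor's published public key, so the results can't be quietly swapped between commit and reveal.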