This is why I've been focused on boundary visibility. Agents are opaque until they hit real tools - and if you can't see what's actually being sent/received at each boundary, you can't detect manipulation.
We built toran.sh to provide that inspection layer - read-only proxies that show the actual wire-level request/response. Doesn't prevent attacks, but makes them visible.
Curious what detection mechanisms you're recommending alongside the attack framework?