To your question about what bites first: in our experience at keypost.ai, the order is usually:
1. *Auth* - OAuth token refresh edge cases, especially when agents run long tasks that span token expiry 2. *Rate limits* - not having them, then having them but too coarse (per-tool vs per-endpoint vs per-argument) 3. *Observability* - specifically, correlating agent intent with actual tool calls when debugging why something failed 4. *Sandboxing* - usually comes up after the first "oops" moment
One pattern we've found useful: separating "can this identity call this tool" (auth) from "should this specific call be allowed" (policy). They're often conflated but have different failure modes and different owners (security team vs product team).
Curious how you're handling policy in PolyMCP - is it config-driven or code-driven?
How do you do policy at keypost.ai?