Good writeup. The gap you're describing: the skill installer copies the whole directory, and bundled test files run outside the agent context, so the agent's credential store becomes the actual attack surface. Scanners focus on what the agent does at runtime, but if a test file executes in the same environment, it can reach whatever keys were loaded.
The fix most people overlook: the agent shouldn't hold long-lived keys at all. With short-lived, scoped credentials, even if a bundled test file runs and exfiltrates something, what it gets expires in minutes and only works for a narrow set of operations. We cover the pattern here: https://apistronghold.com
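
For anyone who wants the rough shape of the idea, here's a minimal sketch in Python using only the standard library. It is an illustration, not the implementation behind the link: `BROKER_SECRET`, `mint_token`, and `verify_token` are hypothetical names, and it assumes a separate broker process holds the signing secret so the agent's environment only ever sees the short-lived token.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical broker-side secret. Assumption: it lives only in the broker
# process and is never loaded into the agent's environment.
BROKER_SECRET = b"demo-only-secret"


def mint_token(scope: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Issue a signed token limited to one scope and a short lifetime."""
    issued = time.time() if now is None else now
    payload = json.dumps({"scope": scope, "exp": issued + ttl_seconds}).encode()
    sig = hmac.new(BROKER_SECRET, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode()
            + "."
            + base64.urlsafe_b64encode(sig).decode())


def verify_token(token: str, required_scope: str, now: Optional[float] = None) -> bool:
    """Accept the token only if signature, scope, and expiry all check out."""
    current = time.time() if now is None else now
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:
        # Malformed token: wrong number of parts or invalid base64.
        return False
    expected = hmac.new(BROKER_SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return False
    claims = json.loads(payload)
    return claims["scope"] == required_scope and current < claims["exp"]
```

If a bundled test file grabs a token minted with `mint_token("repo:read", 300)`, it has five minutes of read access and nothing else; the signing secret itself was never in reach. A real deployment would more likely lean on an existing token service than hand-rolled signing.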