Curious how the trust score works in practice. Is it purely automated test results, or do you plan to incorporate usage signals over time (uptime, response quality)?
But you're right that automated spec compliance only tells part of the story. The roadmap includes usage signals, uptime monitoring, response latency tracking, and community ratings from developers who've actually integrated with an agent. The spec tells you if an agent CAN work. Usage data tells you if it DOES work.
The profile pages are designed with that in mind: test history over time already shows trends, and adding real-world signals is the natural next layer.
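To make the layering idea concrete, here is a purely illustrative sketch of how a spec-compliance result could be blended with usage signals once they exist. This is not the actual scoring code: the names `UsageSignals` and `trust_score`, the 60/40 split, the per-signal weights, and the 2000 ms latency ceiling are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UsageSignals:
    """Hypothetical real-world signals; field names are assumptions."""
    uptime: float            # fraction of successful health checks, 0.0 to 1.0
    p95_latency_ms: float    # 95th-percentile response latency in milliseconds
    community_rating: float  # mean developer rating on a 1 to 5 scale


def trust_score(spec_pass_rate: float, usage: Optional[UsageSignals] = None) -> float:
    """Return a 0-100 score. All weights and thresholds here are made up."""
    if not 0.0 <= spec_pass_rate <= 1.0:
        raise ValueError("spec_pass_rate must be between 0 and 1")

    # With no usage data yet, the score is spec compliance alone.
    if usage is None:
        return round(100 * spec_pass_rate, 1)

    # Map latency onto 0..1: 0 ms scores 1.0, 2000 ms or slower scores 0.0.
    latency_score = max(0.0, min(1.0, 1.0 - usage.p95_latency_ms / 2000.0))
    # Map the 1..5 rating onto 0..1.
    rating_score = (usage.community_rating - 1.0) / 4.0

    usage_score = 0.5 * usage.uptime + 0.2 * latency_score + 0.3 * rating_score

    # Spec compliance says the agent CAN work; usage says it DOES work.
    return round(100 * (0.6 * spec_pass_rate + 0.4 * usage_score), 1)


if __name__ == "__main__":
    print(trust_score(0.9))
    print(trust_score(0.9, UsageSignals(uptime=0.99, p95_latency_ms=400, community_rating=4.5)))
```

Falling back to the spec-only score when `usage` is `None` means a newly listed agent is not penalized for having no history yet, while an agent with poor uptime or ratings ends up below its pure compliance score.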