Scoring is deterministic, built on measurable signals from the session, not an LLM vibing on your transcript. Same session, same score.
We just opened a public demo and need real sessions thrown at it. Nothing to install, nothing runs on your machine, just upload a session file from your agent's logs (or paste one terminal command) and you get scored in seconds.
Where it's going: a product for engineering orgs — session-level feedback that upskills engineers at agent-driven development, and gives managers a skills signal (coaching, not surveillance).
The ask: run one real session through it this week and tell us where the score feels wrong. Brutal > polite. Demo video: https://youtu.be/r9vAdAMv6js