1 point by sjmaplesec 16 hours ago | 2 comments

sjmaplesec 16 hours ago
There's so much more we can do around activation and skills creation. Looking at the eval results, there are even cases where the context makes the agent worse.
Scenario 5, test 1: 72% -> 22%
https://tessl.io/eval-runs/019cc02f-bb26-76e0-a7c9-598a7337e...
sjmaplesec 16 hours ago
All the review scans are linked here; scores are mostly in the 50-70% range: https://tessl.io/registry/skills/github/googleworkspace/cli