There's so much more we can do around activation and skills creation. Looking at the eval results, there are even cases where the context makes the agent worse.
Scenario 5, test 1: 72% -> 22% (a 50-point drop once the context is added)
https://tessl.io/eval-runs/019cc02f-bb26-76e0-a7c9-598a7337e...