The thing that prompted this: I kept having conversations with teams where the first question was "how do we write better skill files?" and I kept finding that the issues were upstream, in our error messages, CLI surface, and so on. I increasingly feel that skill files are your "last resort."
The Workbench experiment in the post is admittedly simplified to unpack the mental model; in practice you would probably set up evals and so on to actually test it, which we have started doing at Sanity as well.
Happy to answer questions and channel more of our experience with agents (heh) and designing for them.