Most agent workflows I’ve seen don’t have any real evaluation layer. People test manually or rely on prompt tweaks.
I wanted something closer to how we treat backend systems, where you can run tests before shipping.
Eval Studio:
* scans your repo and detects likely agents
* generates eval datasets based on your agent
* runs tests locally against your implementation
* surfaces failures and behavioral gaps
It doesn’t require deploying anything; it runs directly on your local setup.
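To make the idea concrete, here's a rough sketch of what a local eval loop looks like in general. This is not Eval Studio's actual API or output format; `EvalCase`, `toy_agent`, `run_evals`, and the substring check are all made-up names and assumptions for illustration, and the "agent" is a stub that never calls an LLM.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    """One hypothetical eval example: a prompt plus a simple expectation."""
    name: str
    prompt: str
    must_contain: str  # substring the reply is expected to include


def toy_agent(prompt: str) -> str:
    """Hypothetical stand-in for a real agent; no LLM call is made."""
    if "refund" in prompt.lower():
        return "I can help you start a refund."
    return "Sorry, I can't help with that."


def run_evals(agent: Callable[[str], str], cases: list[EvalCase]) -> list[str]:
    """Run every case against the agent and return the names of failing cases."""
    failures = []
    for case in cases:
        reply = agent(case.prompt)
        # Case-insensitive substring check; real evals would use richer scoring.
        if case.must_contain.lower() not in reply.lower():
            failures.append(case.name)
    return failures


CASES = [
    EvalCase("refund_request", "I want a refund for my order", "refund"),
    EvalCase("order_status", "Where is my order?", "order"),
]

if __name__ == "__main__":
    failed = run_evals(toy_agent, CASES)
    print(f"{len(CASES) - len(failed)}/{len(CASES)} passed; failures: {failed}")
```

The second case fails because the stub has no order-status handling, which is the kind of behavioral gap you'd want to catch before shipping rather than after.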
Get your API key and try it: dutchmanlabs.com
Would really appreciate feedback, especially from people building LLM apps or agent workflows.