2 points by thesarsour 5 hours ago | 1 comment
  • thesarsour 5 hours ago
    We built Eval Studio, a CLI tool for testing AI agents locally.

    Most agent workflows I’ve seen don’t have any real evaluation layer. People test manually or rely on prompt tweaks.

    I wanted something closer to how we treat backend systems, where you can run tests before shipping.
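    To make the backend-test analogy concrete, here is a minimal sketch of what an agent eval looks like as a plain test harness. Everything here (`run_agent`, the case list) is illustrative and not Eval Studio's actual API:

```python
# Minimal sketch: treat agent behavior like a backend test suite.
# run_agent is a stand-in -- replace it with a call to your real agent.

def run_agent(prompt: str) -> str:
    """Hypothetical agent: routes refund questions to a human."""
    if "refund" in prompt.lower():
        return "escalate_to_human"
    return "answer_directly"

# Each case pairs an input with the behavior we expect.
cases = [
    ("How do I get a refund?", "escalate_to_human"),
    ("What are your opening hours?", "answer_directly"),
]

def run_evals():
    """Run every case; return the list of (prompt, expected, got) failures."""
    failures = []
    for prompt, expected in cases:
        got = run_agent(prompt)
        if got != expected:
            failures.append((prompt, expected, got))
    return failures

if __name__ == "__main__":
    print(run_evals())  # an empty list means every case passed
```

    The point is only the shape: a dataset of cases, a runner, and a pass/fail report you can gate a deploy on, the same way you would a unit-test suite.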

    Eval Studio:

    * scans your repo and detects likely agents
    * generates eval datasets based on your agent
    * runs tests locally against your implementation
    * surfaces failures and behavioral gaps
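    For a sense of what a generated eval dataset could contain, here is a hypothetical JSONL-style entry; this is a guess at a plausible shape, not Eval Studio's actual format:

```json
{
  "agent": "support_agent",
  "input": "How do I get a refund?",
  "expected_behavior": "escalate_to_human",
  "checks": ["no_fabricated_policy", "mentions_refund_window"]
}
```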

    It doesn’t require deploying anything; it runs directly on your local setup.

    Get your API key and try it: dutchmanlabs.com

    Would really appreciate feedback, especially from people building LLM apps or agent workflows.