1 pointby agentura2 hours ago1 comment
  • agentura2 hours ago
    A prompt tweak, model swap, or context change can silently shift behavior — wrong tone, dropped policy details, different tool routing. Standard unit tests pass. You / users notice later.

    Agentura adds behavioral evals to your CI pipeline. On every PR, it runs your agent against expected outputs, compares scores to your main branch baseline, and shows you exactly which cases regressed before you merge. 100% Free, Open Source.

    Try it locally:

    npx agentura@latest init npx agentura@latest run --local

    Live playground (online): https://playground.agentura.run

    Website: https://agentura.run

    Repo: https://github.com/SyntheticSynaptic/agentura

    Welcome all comments + feedback