2 points by ashmawy, 2 hours ago | 1 comment
    Hi HN — I built Trajectly, a tool for deterministic regression testing of AI agents.

    Problem: agent “evals” are often flaky (network, time, tool nondeterminism, model drift), so it’s hard to tell if a change actually broke behavior.

    What Trajectly does:

    - records an agent run once (inputs, tool calls, outputs)
    - replays it deterministically offline as a test fixture (so CI is stable)
    - checks a TRT "contract" (allowed tools/sequence, budgets, invariants, etc.)
    - when something breaks, pinpoints the earliest violating step and can shrink the run to a minimal counterexample
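To make the contract idea concrete, here is a minimal sketch of the kind of check a TRT contract might express (allowed tools plus a spending budget) and how the "earliest violating step" falls out of it. The names `Step` and `check_contract` are illustrative only, not Trajectly's actual API:

```python
from dataclasses import dataclass

# Illustrative sketch only -- NOT Trajectly's API. Shows how an
# "allowed tools + budget" contract yields the earliest violating step.

@dataclass
class Step:
    tool: str    # tool the agent called at this step
    cost: float  # e.g. dollars or tokens spent on the call

def check_contract(trace, allowed_tools, max_budget):
    """Return the index of the earliest violating step, or None if the
    whole trace satisfies the contract."""
    spent = 0.0
    for i, step in enumerate(trace):
        spent += step.cost
        if step.tool not in allowed_tools or spent > max_budget:
            return i  # first step that breaks the contract
    return None

trace = [Step("search", 0.01), Step("send_email", 0.02), Step("delete_db", 0.0)]
violation = check_contract(trace, allowed_tools={"search", "send_email"}, max_budget=1.0)
# violation == 2: "delete_db" is not an allowed tool
```

Because the check walks the trace in order, the earliest violation is reported even if later steps also misbehave.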

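The "shrink to a minimal counterexample" step can be sketched as a greedy reduction over the recorded trace (a simplified cousin of delta debugging). Again, this is an assumed sketch, not Trajectly's implementation; `fails` stands in for "replay this trace and check the contract":

```python
def shrink(trace, fails):
    """Greedily drop steps while the trace still fails.
    `fails(trace)` is a predicate: True if replaying `trace` still
    violates the contract. Simplified one-step-at-a-time reduction;
    real shrinkers typically use delta debugging for speed."""
    assert fails(trace), "can only shrink a failing trace"
    i = 0
    while i < len(trace):
        candidate = trace[:i] + trace[i + 1:]
        if candidate and fails(candidate):
            trace = candidate  # step i was irrelevant to the failure
        else:
            i += 1  # step i is needed to reproduce the failure
    return trace

# Toy failure: "bad" occurs after "setup" somewhere in the trace.
def fails(t):
    return "setup" in t and "bad" in t and t.index("setup") < t.index("bad")

minimal = shrink(["setup", "x", "y", "bad", "z"], fails)
# minimal == ["setup", "bad"]: the unrelated steps were removed
```

Deterministic replay is what makes this practical: each candidate trace can be re-checked offline without re-running the real tools.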
    You can try it locally (no signup):

    - pip install trajectly
    - run one of the standalone demos:
      - procurement approval agent demo
      - support escalation agent demo
    - or clone the main repo and run the GitHub Actions example
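Under the hood, record-once/replay-offline can be sketched as a wrapper that logs each tool call's output on the first (live) run and serves the logged result on replay. The `ReplayableTool` class below is a hypothetical illustration of the pattern, not Trajectly's actual API:

```python
import json

class ReplayableTool:
    """Illustrative sketch only -- NOT Trajectly's API.
    In "record" mode, runs the real tool and logs results keyed by input;
    in "replay" mode, serves logged results, so tests are deterministic
    (no network, no clock, no model nondeterminism)."""

    def __init__(self, fn, mode="record", log=None):
        self.fn = fn
        self.mode = mode
        self.log = log if log is not None else {}

    def __call__(self, *args):
        key = json.dumps(args)  # stable key for this call's inputs
        if self.mode == "replay":
            return self.log[key]   # read from the recording, never call fn
        result = self.fn(*args)
        self.log[key] = result     # record for later replay
        return result

# Record once against the live tool...
search = ReplayableTool(lambda q: f"results for {q}")
recorded = search("prices")

# ...then replay offline from the same log, e.g. in CI.
replayed_tool = ReplayableTool(None, mode="replay", log=search.log)
replayed = replayed_tool("prices")
# replayed == recorded, with no live tool call made
```

Keying the log by the serialized inputs is what makes replay stable: the same trace always produces the same fixture outputs in CI.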

    Repo: https://github.com/trajectly/trajectly

    I’m around to answer questions. I’d love feedback on:

    - what contract checks would be most useful in real agent deployments?
    - which integrations you’d want first (LangGraph / LangChain / custom tool runners)?
    - whether the “shrink to minimal failing trace” output is understandable