6 points by jflynt76 7 hours ago | 1 comment
  • evil-olive 6 hours ago
    > GroundEval is built around that question...

    > This is the same distinction GroundEval makes for question answering agents.

    > GroundEval treats agent behavior as something that can be tested against a state contract.

    > That is the class of failure GroundEval is designed to catch.

    this is an ad shaped like a blog post

    • jflynt76 6 hours ago
      For what it's worth, I didn't describe it as anything; just posted the link. It's a paper with open code, no product behind it.