2 points by ryan4rtmx 4 hours ago | 1 comment
  • ryan4rtmx 3 hours ago
    As agents take on more complex projects for humans, it's important to measure how well they fulfill human intent.

    How are others benchmarking the agentic fulfillment of intent?

    I've started to explore this space with intent-bench: https://intent-bench.github.io/intent-bench