Thanks for open sourcing and sharing, excited to try this out!!
We think we stand out from our competitors in the space because we built first for the enterprise case, with consideration for things like data governance, acceptable use, and data privacy and information security that can be deployed in managed easily and reliably in customer-managed environments.
A lot of the products today have similar evaluations and metrics, but they either offer a SAAS solution or require some onerous integration into your application stack.
Because we started w/ the enterprise first, our goal was to get to value as quickly and as easily as possible (to avoid shoulder-surfing over zoom calls because we don't have access to the service), and think this plays out well with our product.
We based our hallucination detection on "groundedness" on a claim-by-claim basis, which evaluates whether the LLM response can be cited in provided context (eg: message history, tool calls, retrieved context from a vector DB, etc.)
We split the response into multiple claims, determine if a claim needs to be evaluated (eg: and isn't just some boilerplate) and then check to see if the claim is referenced in the context.