We built Lucy, a trace-debugging tool inside vLLora (our open-source LLM observability stack).
The problem: Debugging AI agents is painful. A single failure often hides inside a trace with 200+ spans—tool calls, retries, partial outputs, and silent failures. Even with tracing, finding the actual root cause (e.g. a tool schema mismatch or a contradictory system prompt) usually means manually scanning logs for 20+ minutes.
What Lucy does: Lucy lets you query your traces using natural language. You can ask things like “Why did this agent loop?” or “Which step was slow?”, and it inspects the span tree to surface likely causes. For example, it can flag (one such check is sketched below the list):
- Tool schema mismatches (e.g. hallucinated or invalid arguments)
- Prompt contradictions (e.g. system instruction conflicts with user intent)
- Silent failures (e.g. context truncation or max-token exits)
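To make the first bullet concrete, here’s a minimal sketch of the kind of check involved: validating a tool call’s recorded arguments against the tool’s declared JSON schema. The `Span` shape and the `tool.schema` / `tool.arguments` attribute names are hypothetical stand-ins for illustration, not Lucy’s actual data model.

```python
# Illustrative only: a naive tool-schema-mismatch check over a flat list of spans.
import json
from dataclasses import dataclass, field

@dataclass
class Span:
    span_id: str
    name: str
    attributes: dict = field(default_factory=dict)

def flag_schema_mismatches(spans):
    """Return (span, reason) pairs where tool-call arguments don't match the declared schema."""
    findings = []
    for span in spans:
        schema = span.attributes.get("tool.schema")       # tool's declared JSON schema
        raw_args = span.attributes.get("tool.arguments")  # model-produced arguments (JSON string)
        if schema is None or raw_args is None:
            continue
        try:
            args = json.loads(raw_args)
        except json.JSONDecodeError:
            findings.append((span, "arguments are not valid JSON"))
            continue
        allowed = set(schema.get("properties", {}))
        required = set(schema.get("required", []))
        unknown = set(args) - allowed
        missing = required - set(args)
        if unknown:
            findings.append((span, f"hallucinated arguments: {sorted(unknown)}"))
        if missing:
            findings.append((span, f"missing required arguments: {sorted(missing)}"))
    return findings

# Example trace fragment: the model invented a "units" argument and forgot the required "city".
spans = [
    Span("s1", "tool_call:get_weather", {
        "tool.schema": {"properties": {"city": {}, "date": {}}, "required": ["city"]},
        "tool.arguments": '{"date": "2024-06-01", "units": "metric"}',
    }),
]

for span, reason in flag_schema_mismatches(spans):
    print(f"{span.name} ({span.span_id}): {reason}")
```

The real thing operates over the full span tree and is driven by natural-language questions rather than direct function calls.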
It’s still early and in beta.
We’d love to hear how you’re debugging agent failures today—logs, replay, evals, custom scripts, or something else.
Repo: https://github.com/vllora/vllora