They don’t crash like normal systems. Instead they generate silent failures i.e. loop on the same tool, oscillate between 2 tools, context bloat etc.
And most of the tools are debugging tools once something is broke.
Dunetrace detects these behavioural failure patterns in real time and sends slack alerts.
It’s open-source, fully self-hosted and doesn’t store raw prompts or outputs.
Would love feedback: a) Is this a real problem in your setups? b) What failures have you seen in production?
Thanks!