It’s not claiming scientific precision; it’s closer to an aggregated risk indicator.
Improving the explainability of that score is high on the list. I hope it helps :)
It’s more of an engineering diagnostic heuristic than a psychological metric.
The goal is practical usefulness, not theoretical rigor.
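For anyone curious what "aggregated risk indicator" means in practice, here's a minimal sketch of the general shape: a weighted combination of normalized sub-signals. All signal names, weights, and the function itself are hypothetical illustrations, not the tool's actual internals:

```python
# Hypothetical sketch: combine normalized sub-signals (each in [0, 1])
# into a single risk score via a weighted average, clamped to [0, 1].
# Signal names and weights below are made up for illustration.

def continuity_risk(signals: dict[str, float],
                    weights: dict[str, float]) -> float:
    """Weighted average of sub-signals; missing signals count as 0."""
    total_weight = sum(weights.values())
    score = sum(weights[name] * signals.get(name, 0.0)
                for name in weights) / total_weight
    return max(0.0, min(1.0, score))

# Example with two invented signals a continuity checker might track.
score = continuity_risk(
    signals={"reexplained_topics": 0.6, "dropped_constraints": 0.3},
    weights={"reexplained_topics": 0.7, "dropped_constraints": 0.3},
)
print(round(score, 2))
```

The point of the weighted-average shape is that it stays interpretable: each sub-signal's contribution to the final score can be reported separately, which is where the explainability work comes in.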
Curious to hear:
• How are people currently debugging agent continuity issues?
• Are you seeing users re-explain things often?
Also happy to analyze sample transcripts if anyone wants to try it.