3 points by intheleantime 4 hours ago | 1 comment
  • itay-maman 3 hours ago
    Interesting writeup. The tiered retrieval approach and privacy model for group chats are well thought out.

    One thing I'd love to see: what does "actually works" mean in measurable terms? The engineering is sophisticated, but I'm curious about user-facing impact - did memory injection improve task completion or satisfaction? How often do users invoke /memory forget? What's the false positive rate on extraction?

    These systems are hard to evaluate because failure modes are subtle - the AI "knows" something but uses it awkwardly, or surfaces context that feels intrusive. Would be great to hear what metrics you're tracking to validate the complexity is paying off.
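    One cheap starting point for the false-positive question: hand-label a small set of prompts with the memories that *should* have been injected, then score retrieval against it. A minimal sketch (all names hypothetical, not from the writeup):

    ```python
    # Score memory retrieval against a small hand-labeled set.
    # Each labeled example pairs the memory IDs actually injected
    # with the IDs a human says should have been injected.

    def retrieval_scores(labeled):
        """labeled: list of (retrieved_ids, expected_ids) set pairs."""
        tp = fp = fn = 0
        for retrieved, expected in labeled:
            tp += len(retrieved & expected)
            fp += len(retrieved - expected)  # injected but unwanted: the "intrusive" failures
            fn += len(expected - retrieved)  # wanted but missed
        precision = tp / (tp + fp) if tp + fp else 1.0
        recall = tp / (tp + fn) if tp + fn else 1.0
        return precision, recall

    # Two labeled prompts: one false positive (m2), one miss (m4)
    data = [
        ({"m1", "m2"}, {"m1"}),
        ({"m3"}, {"m3", "m4"}),
    ]
    print(retrieval_scores(data))
    ```

    It won't catch the "knows it but uses it awkwardly" failures, but it puts a number on the intrusiveness side.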

  • intheleantime 3 hours ago
      Thank you, and great question. Right now, feedback is qualitative only (surveys, feedback buttons, controlled user tests). We are trying to build AI evaluators, but they suffer from the same problem when judging whether the "right" memory was pulled.

      Still trying to find a good solution here.