1 point by sukinai 5 hours ago | 1 comment
  • sukinai 5 hours ago
    AI memory is starting to behave less like a static notes file and more like a runtime dependency. If an agent depends on memory to retrieve prior decisions, project context, instructions, or compressed knowledge, then the quality of that memory directly affects the quality of the agent’s output. The problem is that memory systems often do not fail loudly. They degrade quietly through stale entries, duplicate memories, broken sync with instruction files like CLAUDE.md, missing logs, weak key structure, or oversized context that reduces retrieval quality.

    This release came from a simple systems question: if we monitor infrastructure, logs, APIs, and databases, should memory also have observability? I wanted to experiment with a health check layer that treats memory as something inspectable and maintainable rather than a black box. The goal is not just to store context, but to detect when memory becomes unreliable, noisy, or inefficient before that degradation starts affecting the agent.
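    To make the idea concrete, here is a minimal sketch of what such a health-check layer might compute. The entry shape ({"key", "text", "updated_at"}) and the thresholds are hypothetical illustrations, not the actual /nemp format:

```python
import time

# Hypothetical memory entry shape: {"key": str, "text": str, "updated_at": epoch seconds}
def memory_health(entries, max_age_days=30, max_total_chars=50_000):
    """Report staleness, exact duplicates, and oversized context for a memory store."""
    now = time.time()
    stale = [e for e in entries if now - e["updated_at"] > max_age_days * 86400]
    seen, dupes = set(), 0
    for e in entries:
        # Normalize whitespace and case so trivially reworded copies count as duplicates
        norm = " ".join(e["text"].split()).lower()
        if norm in seen:
            dupes += 1
        seen.add(norm)
    total = sum(len(e["text"]) for e in entries)
    return {
        "entries": len(entries),
        "stale": len(stale),
        "exact_duplicates": dupes,
        "oversized": total > max_total_chars,
    }
```

    Each counter maps to one of the silent failure modes above: stale entries, duplicate memories, and oversized context.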

• guerython 5 hours ago
      Love this direction. Memory failures are usually silent until quality drops, so treating memory as an SLO surface makes sense.

      One metric that helped us was retrieval precision@k against a small gold set of "must-return" facts from prior sessions. Drift there showed degradation earlier than latency/token metrics.
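      The metric itself is simple to compute. A sketch, assuming the gold set is a hand-curated collection of fact IDs that must appear in retrieval results:

```python
def precision_at_k(retrieved, gold, k=5):
    """Fraction of the top-k retrieved items that appear in the gold set."""
    top = retrieved[:k]
    hits = sum(1 for item in top if item in gold)
    return hits / k
```

      Tracked over time against the same gold set, a drop in this number flags retrieval degradation before user-visible quality does.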

      If you haven’t already, adding write-amplification + duplicate-rate tracking is useful too. We found many systems look healthy while gradually filling with near-duplicate notes that poison recall.
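      Near-duplicate tracking can be sketched with a plain similarity ratio; the 0.9 threshold here is an arbitrary illustration and a real system would tune it (or use shingled/embedding similarity at scale):

```python
from difflib import SequenceMatcher

def duplicate_rate(entries, threshold=0.9):
    """Fraction of entries that are near-duplicates of an earlier entry."""
    if not entries:
        return 0.0
    dupes = 0
    for i, text in enumerate(entries):
        # O(n^2) pairwise check; fine for a health probe over a modest store
        if any(SequenceMatcher(None, text, prev).ratio() >= threshold
               for prev in entries[:i]):
            dupes += 1
    return dupes / len(entries)
```

      A store whose duplicate rate climbs while entry count grows is exactly the "looks healthy while poisoning recall" case.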

• sukinai 3 hours ago
        This is super useful. I really like the idea of treating memory as an SLO surface rather than just a storage layer.

        Retrieval precision@k against a small gold set is a very strong suggestion. That feels like a much better early warning signal than just latency or token usage, because those can look fine while memory quality is quietly degrading.

        Write amplification and duplicate-rate tracking also make a lot of sense. Near-duplicate buildup is exactly the kind of thing that makes a memory system look healthy on the outside while slowly poisoning recall underneath.

I have basic duplicate detection in /nemp:health, but I haven’t yet framed it in terms of retrieval quality metrics the way you described. That’s a really good direction. Thank you.