Wanted to share a project I've been working on called Memoryport, aimed at helping solve the context problem. It works across LLM providers/apps, stores conversations locally, is fully OSS, and lets anyone keep track of their memories over time. I tested mine with 500M tokens of context space (note: not the same as a context window), and it added only ~300ms of latency to the inference session. I also built an open spec called AMP that standardizes how memory systems should communicate with LLMs (see repo).
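To give a feel for what a standardized memory protocol enables, here's a rough sketch of the kind of exchange it could support. To be clear, this is not the actual AMP spec — the message types, field names, and `LocalMemoryStore` class below are all hypothetical, purely for illustration (see the repo for the real protocol):

```python
import json
import time

# Hypothetical AMP-style messages; field names are illustrative, not from the spec.

def make_store_request(conversation_id, role, text):
    """Client -> memory store: persist one turn of a conversation."""
    return {
        "type": "memory.store",
        "conversation_id": conversation_id,
        "turn": {"role": role, "text": text},
        "timestamp": time.time(),
    }

def make_recall_request(conversation_id, query, limit=5):
    """Client -> memory store: fetch past turns relevant to a new prompt."""
    return {
        "type": "memory.recall",
        "conversation_id": conversation_id,
        "query": query,
        "limit": limit,
    }

class LocalMemoryStore:
    """Toy in-process store standing in for a local, provider-agnostic backend."""

    def __init__(self):
        self.turns = []

    def handle(self, message):
        if message["type"] == "memory.store":
            self.turns.append(message)
            return {"type": "memory.ack", "stored": len(self.turns)}
        if message["type"] == "memory.recall":
            # Naive substring match stands in for real retrieval.
            hits = [m["turn"] for m in self.turns
                    if message["query"].lower() in m["turn"]["text"].lower()]
            return {"type": "memory.results", "turns": hits[: message["limit"]]}
        return {"type": "memory.error", "reason": "unknown message type"}

store = LocalMemoryStore()
store.handle(make_store_request("conv-1", "user", "My dog is named Rex"))
reply = store.handle(make_recall_request("conv-1", "dog"))
print(json.dumps(reply))
```

The point of a spec like this is that any client and any memory backend agreeing on the message shapes can interoperate, regardless of which LLM provider sits behind them.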
More than happy to answer any questions and hope you all find this as useful as I have.