We don't try to keep everything in context. Instead, we maintain a lightweight "memory index" (summaries) that's always present, and use LLM-as-a-judge to decide when to reload full content. This mirrors how humans work: we remember the gist of a document and go back to re-read it when we need details.
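A minimal sketch of this pattern is below. The names (`MemoryIndex`, `needs_recall`, `build_context`) and the `call_llm` text-in/text-out wrapper are assumptions for illustration, not the actual implementation: the point is that summaries stay in the prompt at all times, while full documents are pulled back in only when a judge call asks for them.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryEntry:
    doc_id: str
    summary: str    # always kept in context
    full_text: str  # reloaded only on demand


@dataclass
class MemoryIndex:
    entries: dict[str, MemoryEntry] = field(default_factory=dict)

    def add(self, doc_id: str, summary: str, full_text: str) -> None:
        self.entries[doc_id] = MemoryEntry(doc_id, summary, full_text)

    def index_block(self) -> str:
        """The lightweight index that is always present in the prompt."""
        return "\n".join(f"[{e.doc_id}] {e.summary}" for e in self.entries.values())


def needs_recall(query: str, index: MemoryIndex, call_llm) -> list[str]:
    """LLM-as-a-judge: ask which documents (if any) should be reloaded in full.

    `call_llm` is a hypothetical prompt-in/text-out wrapper around whatever
    model client the host application already uses.
    """
    prompt = (
        "Decide whether any full documents must be reloaded to answer the query.\n"
        f"Memory index (summaries only):\n{index.index_block()}\n\n"
        f"Query: {query}\n"
        "Reply with the doc ids to reload, comma-separated, or NONE."
    )
    reply = call_llm(prompt).strip()
    if reply.upper() == "NONE":
        return []
    return [d.strip() for d in reply.split(",") if d.strip() in index.entries]


def build_context(query: str, index: MemoryIndex, call_llm) -> str:
    """Summaries always go in; full text only when the judge asks for it."""
    parts = [index.index_block()]
    for doc_id in needs_recall(query, index, call_llm):  # the one extra LLM call
        parts.append(index.entries[doc_id].full_text)
    return "\n\n".join(parts)
```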
This approach trades one extra LLM call (for recall detection) for significant context-window savings, while preserving the ability to access full details when needed.