Solid problem to be working on. Caching for RAG/agents is one of those areas where everyone hits the same walls and there isn't really a clean answer yet. Well done!
Seems interesting. How does it integrate into a RAG architecture code-wise? Does it have its own SDK, or do you just use the OpenAI module and change some parameters?