I built this because I noticed that many AI applications repeatedly send very similar prompts to LLM APIs, which means developers end up paying for the same reasoning multiple times.
Nexus Gateway tackles this with semantic caching. Instead of checking only for exact prompt matches, it detects semantically similar prompts and can serve cached responses when appropriate.
Current features include:
• Multi-model support (OpenAI, Gemini, Anthropic, Llama)
• BYOK (Bring Your Own Key)
• Semantic caching to reduce repeated API calls
• Model routing
I'm also working on:
• PII protection layers
• Sovereign AI support for regulated industries like banks and hospitals
My goal is to build an infrastructure layer that helps teams reduce LLM costs and improve latency without changing much of their existing code.
I’d love feedback from the community, especially around:
• semantic caching strategies
• similarity thresholds
• enterprise security requirements
Happy to answer any technical questions.