2 points by ly07 2 hours ago | 1 comment
  • ly07 2 hours ago
    We’ve open-sourced a RAG system we built for internal use at IncidentFox and decided to share it in case it’s useful to others.

    There’s no novel research here. The system is a careful engineering composition of existing ideas: RAPTOR-style hierarchical retrieval, knowledge graphs, hybrid BM25 + dense search, HyDE query expansion, and a reranker (Cohere), which ended up being one of the bigger quality wins.
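For readers unfamiliar with the hybrid BM25 + dense step, here is a minimal sketch of one common way to merge the two result lists: reciprocal rank fusion (RRF). This is an assumption for illustration; the post does not say which fusion method OpenRag uses. The function name, the document IDs, and the constant `k=60` are hypothetical.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in.
    Illustrative only: not taken from the OpenRag codebase.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical top-3 results from a BM25 index and a dense vector index.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
dense_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print(fused)
```

Documents that rank well in both lists rise to the top. In a pipeline like the one described, a fused list of this kind would then be passed to the reranker.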

    On benchmarks, it slightly outperforms RAPTOR on multi-hop retrieval (72.89% on MultiHop-RAG) and gets ~99% retrieval accuracy on SQuAD.

    The main effort here was turning those ideas into a well-documented, installable system that you can run and modify without pulling together a dozen repos.

    We use this internally to store and retrieve company and team knowledge. Since retrieval isn’t our competitive moat, we decided to open-source it.

    Repo: https://github.com/incidentfox/OpenRag

    Write-up with more details: https://www.incidentfox.ai/blog/how-we-beat-raptor-rag.html

    Happy to answer questions or discuss design tradeoffs.