1 point by metawake 6 hours ago | 1 comment
  • reena_signalhq 6 hours ago
    Love this concept! Debugging RAG retrieval is such a pain point right now.

    Quick question: Does this work with any vector database (Pinecone, Weaviate, etc.) or is it specific to certain backends?

    Also curious how you're measuring relevance - are you using LLM-as-judge or some other scoring method?

    This could be really useful for optimizing chunk size and overlap settings!

    • metawake 6 hours ago
      Thanks! To answer your questions:

      *Backends:* Currently supports Qdrant, pgvector, Weaviate, Chroma, and Pinecone. Adding more is straightforward since it's just a matter of implementing a Store interface. Let me know if I missed a good one!
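
      To give the idea, here's a toy sketch of the shape a backend adapter takes (names and signatures are illustrative, not the exact interface):

      ```python
      # Toy sketch of a backend adapter -- names/signatures are illustrative,
      # not the exact RagTune Store interface.
      from dataclasses import dataclass


      @dataclass
      class Hit:
          doc_id: str
          score: float  # similarity score as returned by the backend


      class Store:
          """What a backend adapter needs to provide."""

          def upsert(self, doc_id: str, embedding: list[float]) -> None:
              raise NotImplementedError

          def query(self, embedding: list[float], top_k: int = 10) -> list[Hit]:
              raise NotImplementedError


      class InMemoryStore(Store):
          """Brute-force cosine search; handy for tests."""

          def __init__(self) -> None:
              self._docs: dict[str, list[float]] = {}

          def upsert(self, doc_id: str, embedding: list[float]) -> None:
              self._docs[doc_id] = embedding

          def query(self, embedding: list[float], top_k: int = 10) -> list[Hit]:
              def cosine(a: list[float], b: list[float]) -> float:
                  dot = sum(x * y for x, y in zip(a, b))
                  na = sum(x * x for x in a) ** 0.5
                  nb = sum(y * y for y in b) ** 0.5
                  return dot / (na * nb) if na and nb else 0.0

              hits = [Hit(d, cosine(embedding, e)) for d, e in self._docs.items()]
              return sorted(hits, key=lambda h: h.score, reverse=True)[:top_k]
      ```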

      *Relevance scoring:* No LLM-as-judge — that's intentional. RagTune focuses on retrieval-layer metrics only:

      - Vector similarity scores (what the DB returns)
      - Recall@K, MRR against your golden set
      - Score distribution diagnostics
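
      For concreteness, here's roughly what Recall@K and MRR mean against a golden set (toy code, simplified from the actual scoring):

      ```python
      # Toy versions of Recall@K and MRR against a golden set of
      # relevant doc ids per query (simplified, not the exact scoring code).
      def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
          if not relevant:
              return 0.0
          return len(set(retrieved[:k]) & relevant) / len(relevant)


      def mrr(retrieved: list[str], relevant: set[str]) -> float:
          for rank, doc_id in enumerate(retrieved, start=1):
              if doc_id in relevant:
                  return 1.0 / rank
          return 0.0


      # Golden set says docs {"a", "c"} are relevant for this query:
      print(recall_at_k(["b", "a", "d"], {"a", "c"}, k=2))  # 0.5
      print(mrr(["b", "a", "d"], {"a", "c"}))               # 0.5
      ```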

      The philosophy is: debug retrieval separately from generation. If your retrieval is broken, no amount of prompt engineering will fix it.

      For chunk size/overlap optimization — exactly the use case! `ragtune compare --chunk-sizes 256,512,1024` lets you see the impact directly.
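
      If it helps to picture what the sweep varies: a sliding-window chunker is basically just size and step (toy illustration, the real chunker may differ):

      ```python
      # Toy sliding-window chunker, just to show what size/overlap change.
      def chunk(text: str, size: int, overlap: int) -> list[str]:
          step = max(size - overlap, 1)
          return [text[i:i + size] for i in range(0, len(text), step)]


      doc = "x" * 2000  # stand-in for a real document
      for size in (256, 512, 1024):
          chunks = chunk(doc, size, overlap=64)
          print(f"size={size:4d} overlap=64 -> {len(chunks)} chunks")
      ```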

      Happy to hear feedback if you try it!