4 points by yuvalbelfer 7 hours ago | 1 comment
  • graphitout 7 hours ago
    "Optimal chunk size is strongly query-dependent" - very true.

    Faced similar issues. Ended up adding an agentic tool-call layer on top to retrieve the nearby chunks, to handle cases where a relevant answer was only partially available in a chunk (like a 7-step instruction of which only 4 steps landed in one chunk). It worked OK; a rough sketch of the tool shape is below.
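
    A minimal sketch of what such a tool definition could look like, assuming an OpenAI-style function-calling setup (the name get_neighbor_chunks and the chunk_id/window parameters are illustrative, not the original code):

        # Hypothetical tool spec exposing "fetch nearby chunks" to the model.
        neighbor_tool = {
            "type": "function",
            "function": {
                "name": "get_neighbor_chunks",
                "description": "Fetch the chunks immediately before/after a "
                               "given chunk when an answer looks cut off.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "chunk_id": {"type": "integer"},
                        "window": {"type": "integer", "default": 1},
                    },
                    "required": ["chunk_id"],
                },
            },
        }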

    • Djamba 6 hours ago
      Interesting. Can you elaborate a bit more, please?
      • graphitout 3 hours ago
        The RAG was set up on a bunch of documents, most of them manuals containing steps for measuring, troubleshooting, and replacing components of industrial machines.

        The issue was that most of these procedures were long (above 512 tokens), so the typical chunk window wouldn't capture the full set of steps. We added a tool-calling capability by which the LLM can request the chunks adjacent to a given chunk. This worked well in practice, but burned more $$, since each neighbor fetch means an extra tool-call round trip and more context tokens. A rough sketch of the retrieval side follows.
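
        On the retrieval side, something like this is enough when chunks are kept in document order (a sketch; get_neighbor_chunks and the list-based store are assumptions, not the original implementation):

            from typing import List

            # Chunks kept in document order, so "nearby" is just an index
            # window around the partially relevant hit.
            def get_neighbor_chunks(chunks: List[str], chunk_id: int,
                                    window: int = 1) -> List[str]:
                """Return the chunk at chunk_id plus up to `window` on each side."""
                lo = max(0, chunk_id - window)
                hi = min(len(chunks), chunk_id + window + 1)
                return chunks[lo:hi]

            # e.g. a procedure whose later steps spilled into the next chunk:
            chunks = ["Steps 1-4 ...", "Steps 5-7 ...", "Appendix ..."]
            print(get_neighbor_chunks(chunks, 0))  # -> ['Steps 1-4 ...', 'Steps 5-7 ...']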