Faced similar issues. Ended up adding an agentic tool-call layer on top to retrieve nearby chunks, to handle cases where a relevant answer was only partially available in a single chunk (like a 7-step instruction where only 4 of the steps landed in one chunk). It worked ok.
The issue was that most of these steps were long (above 512 tokens), so the typical chunk window wouldn't capture all of them. We added a tool-calling capability by which the LLM can request the chunks adjacent to a given chunk (rough sketch below). This worked well in practice, but burned more $$.
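Roughly what the tool looked like, as an illustrative sketch (not our exact code) assuming chunks are stored with sequential integer IDs per document:

```python
def get_neighbor_chunks(store: dict[int, str], chunk_id: int, window: int = 2) -> list[str]:
    """Return the chunk plus up to `window` chunks on each side, in order."""
    ids = range(chunk_id - window, chunk_id + window + 1)
    return [store[i] for i in ids if i in store]

# Exposed to the model as a tool (OpenAI-style function schema):
NEIGHBOR_TOOL = {
    "type": "function",
    "function": {
        "name": "get_neighbor_chunks",
        "description": "Fetch chunks adjacent to a retrieved chunk when the "
                       "answer looks truncated (e.g. a numbered list cuts off).",
        "parameters": {
            "type": "object",
            "properties": {
                "chunk_id": {"type": "integer"},
                "window": {"type": "integer", "default": 2},
            },
            "required": ["chunk_id"],
        },
    },
}
```

Each tool call is an extra model round-trip with the expanded context stuffed back in, which is where the extra $$ comes from.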