It sounds like you've hit a common challenge with semantic search systems, especially with long-form content and inherently ambiguous user queries. We've seen this exact scenario before: the initial focus on model quality and infrastructure gives way to the architectural and product-level complexities of real-world user interaction. It usually comes down to the tension between precision and recall, and to handling inferred constraints gracefully without alienating users. Happy to sanity-check your approach or share insights from similar retrieval systems we've helped build that had to survive real users.