2 points by aggeeinn 13 days ago | 1 comment

aggeeinn 13 days ago
OP here.
I’ve been digging into the Fahrbach/Ramalingam paper (NeurIPS 2025) on GIST. The core finding suggests Google is moving away from pure ranking toward 'max-min diversity' sampling for AI Overviews, primarily to reduce the compute cost of processing redundant tokens in RAG.
We ran a Python simulation (code in the post) to test the exclusion radius. It seems that if a draft has high semantic overlap (cosine similarity > ~0.85) with a seed node (like Wikipedia), it gets mathematically filtered out of the selection set regardless of domain authority.
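For anyone who wants to poke at the idea without opening the post, here is a minimal sketch of an exclusion-radius filter. It is not the simulation from the post and not the paper's algorithm. It assumes a simple greedy pass in score order, a hypothetical `select_diverse` helper, toy 2-D embeddings, and the ~0.85 cosine threshold mentioned above.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def select_diverse(candidates, k, max_similarity=0.85):
    """Greedily pick up to k items in descending score order.

    candidates: list of (name, score, embedding) tuples.
    A candidate is skipped if it is too similar to anything already
    picked, no matter how high its score is. The 0.85 default is an
    assumption taken from the comment, not a documented constant.
    """
    selected = []
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    for name, score, emb in ranked:
        if len(selected) == k:
            break
        inside_radius = any(
            cosine_similarity(emb, chosen_emb) > max_similarity
            for _, _, chosen_emb in selected
        )
        if not inside_radius:
            selected.append((name, score, emb))
    return [name for name, _, _ in selected]


# Toy data: scores and embeddings are made up for illustration.
candidates = [
    ("seed", 0.95, [1.0, 0.0]),
    ("near_duplicate", 0.90, [0.9, 0.1]),
    ("distinct", 0.40, [0.0, 1.0]),
]
picked = select_diverse(candidates, k=2)
print(picked)
```

In this toy run the near-duplicate is dropped despite its high score, and the low-scoring but distinct item takes the second slot, which is the behavior described above.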
Curious to hear if anyone working in search/retrieval is seeing this hard filtering in production yet, or if it's still just research-side.