2 points by aggeeinn 2 hours ago | 1 comment
  • aggeeinn 2 hours ago
    OP here.

    I’ve been digging into the Fahrbach/Ramalingam paper (NeurIPS 2025) on GIST. The core finding suggests Google is moving away from pure ranking toward 'max-min diversity' sampling for AI Overviews, primarily to reduce the compute cost of processing redundant tokens in RAG.

    We ran a Python simulation (code in the post) to test the exclusion radius. It seems that if a draft has high semantic overlap (cosine similarity > ~0.85) with a seed node (like Wikipedia), it gets mathematically filtered out of the selection set regardless of domain authority.
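
    For anyone who wants to poke at the idea without the full post, here is a minimal toy sketch of the threshold filtering described above. It is not the paper's algorithm and not the simulation from the post. It assumes a simple greedy pass: start from the seed, visit candidates in descending score order, and drop any candidate whose cosine similarity to an already selected item exceeds the threshold. The function names, the 2-D embeddings, and the scores (standing in for domain authority) are all made up for illustration.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_with_exclusion(embeddings, scores, seed_idx, threshold=0.85):
    """Greedy threshold filter (hypothetical helper, an assumption for this sketch).

    Starts with the seed item, then visits the remaining items from highest
    to lowest score. An item is kept only if its similarity to every item
    already selected is at or below the threshold.
    """
    selected = [seed_idx]
    order = sorted(
        (i for i in range(len(scores)) if i != seed_idx),
        key=lambda i: -scores[i],
    )
    for i in order:
        if all(cosine_sim(embeddings[i], embeddings[j]) <= threshold
               for j in selected):
            selected.append(i)
    return selected

# Made-up data: index 0 is the seed, index 1 is a near-duplicate of the seed
# with a very high score, indices 2 and 3 are semantically distinct.
embeddings = np.array([
    [1.0, 0.0],    # seed
    [0.95, 0.1],   # near-duplicate of the seed
    [0.0, 1.0],    # distinct
    [0.6, 0.8],    # partially overlapping with both
])
scores = [0.5, 0.99, 0.3, 0.2]

result = select_with_exclusion(embeddings, scores, seed_idx=0)
print(result)
```

    In this toy data the near-duplicate is dropped even though it has the highest score, which is the "regardless of domain authority" behavior in miniature. Whether a production system applies anything like this hard cutoff is exactly the open question.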

    Curious to hear if anyone working in search/retrieval is seeing this hard filtering in production yet, or if it's still just research-side.