5 points by amallia 7 hours ago | 1 comment
  • amallia 7 hours ago
    More on latency and search:

    We benchmarked against 10 other search APIs on fresh news queries. The median latency was around 1.2s; we came in around 166ms and scored highest on answer accuracy (89% vs 84% for the next cluster).

    Latency matters because agents loop. A 1.2s first call eats the budget for follow-ups — you get one shot at framing the query. At sub-250ms the agent can actually search, read, reformulate, and search again.
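
    To put rough numbers on that, here is a minimal sketch of the arithmetic. The 2-second budget, the helper name, and the zero default for read/reformulate overhead are all assumptions for illustration, not figures from the benchmark; only the 1.2s and 166ms latencies come from above.

```python
def calls_within_budget(budget_ms: int, latency_ms: int, overhead_ms: int = 0) -> int:
    """Count sequential search calls that fit in a time budget.

    overhead_ms is the assumed per-iteration cost of reading results and
    reformulating the query; it defaults to 0, which is optimistic.
    """
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return budget_ms // (latency_ms + overhead_ms)


# Hypothetical search budget for one agent turn (an assumption, not measured).
BUDGET_MS = 2000

for label, latency_ms in [("~1.2s per call", 1200), ("~166ms per call", 166)]:
    calls = calls_within_budget(BUDGET_MS, latency_ms)
    print(f"{label}: {calls} call(s) in {BUDGET_MS}ms")
# ~1.2s per call: 1 call(s) in 2000ms
# ~166ms per call: 12 call(s) in 2000ms
```

    Any nonzero overhead_ms lowers both counts, but the gap between one attempt and several survives for realistic values.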

    Measuring this stuff carefully is something I've been at for a while. My ECIR 2019 paper (linked below) was an exhaustive study of 11 index compression methods across 5 query processing algorithms on standard collections — the codebase became PISA, which a lot of IR folks still use for research. Almost ten years later, the workload has changed completely (agents, not humans), but the benchmarking discipline is the same.

    ECIR 2019 paper: https://www.antoniomallia.it/uploads/ECIR19c.pdf

    PISA engine: https://github.com/pisa-engine/pisa

    Full methodology and charts for Seltz: https://seltz.ai/blog/why-we-built-seltz