We benchmarked Seltz against 10 other search APIs on fresh news queries. The median competitor latency was around 1.2s; we came in around 166ms and scored highest on answer accuracy (89% vs 84% for the next cluster).
Latency matters because agents loop. A 1.2s first call eats the budget for follow-ups; you get one shot at framing the query. At sub-250ms the agent can actually search, read, reformulate, and search again.
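The arithmetic behind that claim can be sketched quickly. The function and the budget numbers below are illustrative assumptions (a hypothetical 3s end-to-end budget and ~300ms of model time per step), not measurements from our benchmark:

```python
def search_rounds(budget_ms: int, search_ms: int, think_ms: int) -> int:
    """How many search -> read -> reformulate rounds fit in the budget.

    Each round costs one search API call plus the model time spent
    reading results and writing the next query.
    """
    return budget_ms // (search_ms + think_ms)

# Hypothetical 3s budget, ~300ms of model time per step:
print(search_rounds(3000, 1200, 300))  # 1.2s search: 2 rounds
print(search_rounds(3000, 166, 300))   # 166ms search: 6 rounds
```

Under those assumptions, the slow API leaves room for two round-trips while the fast one leaves six, which is the difference between one retry and a real search loop.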
I've been measuring this stuff carefully for a while. My ECIR 2019 paper (linked below) was an exhaustive study of 11 index compression methods across 5 query processing algorithms on standard collections; the codebase became PISA, which a lot of IR folks still use for research. Almost ten years later, the workload has changed completely (agents, not humans), but the benchmarking discipline is the same.
ECIR 2019 paper: https://www.antoniomallia.it/uploads/ECIR19c.pdf
PISA engine: https://github.com/pisa-engine/pisa
Full methodology and charts for Seltz: https://seltz.ai/blog/why-we-built-seltz