2 points by YounElh 2 days ago | 1 comment
  • YounElh 2 days ago
    Google launched a Priority inference tier for the Gemini API. It costs 75-100% more and promises "ultra-low latency." I benchmarked it.

    Text — Standard: 553ms avg. Priority: 563ms avg.

    Image — Standard: 1301ms avg. Priority: 1468ms avg.

    Video — Standard: 2572ms avg. Priority: 2556ms avg.

    So you pay up to double for the privilege of identical or worse performance.
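    The averages above come from repeatedly timing requests per tier. A minimal, self-contained sketch of that kind of harness (the no-op stand-in below replaces the actual API call; the real Gemini endpoint, model names, and tier parameter are not shown in this post):

```python
import statistics
import time
from typing import Callable, Dict


def bench(fn: Callable[[], None], runs: int = 20) -> Dict[str, float]:
    """Time `fn` over `runs` calls; return mean and p95 latency in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()  # in a real benchmark this would POST once to the API
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return {
        "mean_ms": statistics.mean(samples),
        "p95_ms": samples[int(0.95 * (len(samples) - 1))],
    }


# Stand-in workload keeps the sketch runnable without network access.
result = bench(lambda: None, runs=50)
print(result)
```

    Running the same `fn` against each tier (with only the tier parameter changed) and comparing the means is all it takes to reproduce a comparison like the one above.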

    Bonus features:

    - Google's own docs show the wrong parameter format
    - Priority gives you 0.3x the rate limit of standard (fewer requests!)
    - If priority capacity is exceeded, you silently get standard tier anyway
    - The response header that confirms your tier is bugged on streaming endpoints
    - No latency SLAs, no capacity thresholds, no metrics
    - Why does it cost 75-100% more? Why the range? Why publish it with no latency numbers?
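    Because a capacity overflow silently falls back to standard, the only client-side defense is checking the echoed tier on every response. A sketch of that check; note `x-served-tier` is a hypothetical placeholder, since the post doesn't name the real header (and reports it broken on streaming):

```python
from typing import Mapping


def detect_downgrade(headers: Mapping[str, str], requested_tier: str) -> bool:
    """Return True if the response headers show a tier other than requested.

    "x-served-tier" is a made-up header name for illustration; substitute
    whatever header the API actually echoes the served tier in.
    """
    served = headers.get("x-served-tier", "").lower()
    return bool(served) and served != requested_tier.lower()


# Example: priority was requested but standard was silently served.
print(detect_downgrade({"x-served-tier": "standard"}, "priority"))  # True
```

    Logging these downgrades per request is the only way to know how much of the 75-100% surcharge actually bought priority capacity.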

    Launched April 1st. I'm choosing to believe the date was intentional.