2 points by YounElh 2 days ago | 1 comment
  • YounElh 2 days ago
    Google launched a Priority inference tier for the Gemini API. It costs 75-100% more and promises "ultra-low latency." I benchmarked it.

    Text — Standard: 553ms avg. Priority: 563ms avg.

    Image — Standard: 1301ms avg. Priority: 1468ms avg.

    Video — Standard: 2572ms avg. Priority: 2556ms avg.

    So you pay up to double for the privilege of identical or worse performance.
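    The averages above come from repeatedly timing requests per tier. A minimal, self-contained sketch of that kind of harness (the no-op stand-in below replaces the actual API call; the real Gemini endpoint, model names, and tier parameter are not shown in this post):

```python
import statistics
import time
from typing import Callable, Dict


def bench(fn: Callable[[], None], runs: int = 20) -> Dict[str, float]:
    """Time `fn` over `runs` calls; return mean and p95 latency in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()  # in a real benchmark this would POST once to the API
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return {
        "mean_ms": statistics.mean(samples),
        "p95_ms": samples[int(0.95 * (len(samples) - 1))],
    }


# Stand-in workload keeps the sketch runnable without network access.
result = bench(lambda: None, runs=50)
print(result)
```

    Running the same `fn` against each tier (with only the tier parameter changed) and comparing the means is all it takes to reproduce a comparison like the one above.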

    Bonus features:

    - Google's own docs show the wrong parameter format
    - Priority gives you 0.3x the rate limit of standard (fewer requests!)
    - If priority capacity is exceeded, you silently get standard tier anyway
    - The response header that confirms your tier is bugged on streaming endpoints
    - No latency SLAs, no capacity thresholds, no metrics
    - Why does it cost 75-100% more? Why the range? Why publish it with no latency numbers?
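    Because a capacity overflow silently falls back to standard, the only client-side defense is checking the echoed tier on every response. A sketch of that check; note `x-served-tier` is a hypothetical placeholder, since the post doesn't name the real header (and reports it broken on streaming):

```python
from typing import Mapping


def detect_downgrade(headers: Mapping[str, str], requested_tier: str) -> bool:
    """Return True if the response headers show a tier other than requested.

    "x-served-tier" is a made-up header name for illustration; substitute
    whatever header the API actually echoes the served tier in.
    """
    served = headers.get("x-served-tier", "").lower()
    return bool(served) and served != requested_tier.lower()


# Example: priority was requested but standard was silently served.
print(detect_downgrade({"x-served-tier": "standard"}, "priority"))  # True
```

    Logging these downgrades per request is the only way to know how much of the 75-100% surcharge actually bought priority capacity.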

    Launched April 1st. I'm choosing to believe the date was intentional.