My old Threadripper Pro was seeing about 15 tps, which was quite acceptable for the background tasks I was running.
Bullet journaling is neat, but I'm far too wacky with my notes to stick to that kind of structure.
I have various other structures I use, but they're just hodgepodges of things.
>Most tasks don't. This repo helps you figure out which ones.
About a year ago I was testing Gemini 2.5 Pro and Gemini 2.5 Flash for agentic coding. I found they could both do the same task, but Gemini Pro was way slower and more expensive.
This blew my mind because I'd previously been obsessed with the "best/smartest model", and I suddenly realized what I actually wanted was the "fastest/dumbest/cheapest model that can handle my task!"
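If you want to reproduce that kind of comparison on your own tasks, a tiny timing harness goes a long way. A minimal sketch; call_pro / call_flash are hypothetical wrappers around whatever SDK you're using:

    import time

    def avg_latency(call_model, prompt, runs=3):
        """Average wall-clock seconds per call over a few runs."""
        start = time.perf_counter()
        for _ in range(runs):
            call_model(prompt)
        return (time.perf_counter() - start) / runs

    # call_pro / call_flash: hypothetical wrappers around your SDK,
    # each taking a prompt string and returning the completion.
    # print(avg_latency(call_pro, "Refactor this function: ..."))
    # print(avg_latency(call_flash, "Refactor this function: ..."))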
I map them by task type:
- Tiny (<3B): Gemma 3 1B (could try 4B as well), Phi-4-mini (good for classification).
- Small (8B-17B): Qwen 3 8B, Llama 4 Scout (good for RAG/extraction).
- Frontier: GPT-5, Llama 4 Maverick, GLM, Kimi.
Is that what you meant?
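In code, that mapping is basically just a lookup table. A rough sketch; the tier names, task labels, and model IDs are my own shorthand, not anything standard:

    # Rough routing table: pick the cheapest tier that can handle the
    # task. Tier names and task labels are my own shorthand.
    MODEL_TIERS = {
        "tiny": ["gemma-3-1b", "phi-4-mini"],       # classification
        "small": ["qwen3-8b", "llama-4-scout"],     # RAG / extraction
        "frontier": ["gpt-5", "llama-4-maverick"],  # everything hard
    }

    TASK_TO_TIER = {
        "classification": "tiny",
        "extraction": "small",
        "rag": "small",
        "agentic_coding": "frontier",
    }

    def pick_model(task_type: str) -> str:
        """Return the first model in the cheapest adequate tier."""
        tier = TASK_TO_TIER.get(task_type, "frontier")  # unknown -> safe
        return MODEL_TIERS[tier][0]

    print(pick_model("classification"))  # gemma-3-1b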
I haven't tested it extensively, but I found that when I used Claude Code with it, it was reasonably fast (though actual Claude was way faster). But when I tried to use the API directly, it was super slow.
My guess is that they're filtering the traffic and prioritizing certain types. In my own script, I hit a rate limit after just 7 requests!
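If you hit that kind of limit, the standard workaround is exponential backoff with jitter. A minimal sketch; call_api and RateLimitError are placeholders for whatever your client actually exposes:

    import random
    import time

    class RateLimitError(Exception):
        """Placeholder for whatever your client raises on HTTP 429."""

    def call_with_backoff(call_api, prompt, max_retries=5):
        # call_api is a hypothetical wrapper: takes a prompt and
        # raises RateLimitError when the provider returns a 429.
        for attempt in range(max_retries):
            try:
                return call_api(prompt)
            except RateLimitError:
                # Back off 1s, 2s, 4s, ... plus jitter so parallel
                # clients don't all retry in lockstep.
                time.sleep(2 ** attempt + random.random())
        raise RuntimeError(f"still rate-limited after {max_retries} retries")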
Most of the cost savings came from not sending stuff to the LLM that didn't need to go there, plus the batch API is half the price of real-time calls.
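The pattern is just a cheap gate in front of the model. A sketch of the idea; the boilerplate regex and the routing rule are purely illustrative, not from the actual pipeline:

    import re

    # Cheap gate in front of the LLM: trivial inputs never get sent,
    # latency-tolerant ones go to the half-price batch endpoint.
    BOILERPLATE = re.compile(r"^(thanks|ok|lgtm)[.!]*$", re.IGNORECASE)

    def route(text: str, needs_answer_now: bool) -> str:
        if BOILERPLATE.match(text.strip()):
            return "skip"        # no LLM call at all
        if not needs_answer_now:
            return "batch"       # batch API: same models, half price
        return "realtime"        # only pay full price when you must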