1 point by sorenbs 17 hours ago | 1 comment
  • jauntywundrkind 17 hours ago
    I'm mainly using GLM-4.7 these days, because of a subscription that seemed like a pretty good deal (fingers crossed Z.ai / Zhipu survive the year, or this will suck a little bit). It was nice and fast over the holidays, and it's much slower now. This is cool to see, but I'm pretty cost conscious and I don't think I'll reach for this often. Still, I hope it's an option I can reach for!!

    Input tokens are really expensive here, relative to their other models and the market rate for GLM-4.7 input tokens. $2.25/M tokens is ~4x what most providers charge, and ~3x their next most expensive model, Llama-3.3-70b. It's also advertised as half as fast as Llama. Output tokens are a little more expensive than market at $2.75 vs ~$2, but really not bad.
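    To put those rates in perspective, here's a rough cost sketch at the quoted prices. The request size and the ~$0.60/M "market rate" input price are illustrative assumptions, not figures from the comment:

```python
# Rough per-request cost at $/million-token rates.
# Prices for GLM-4.7 here are the ones quoted above;
# the request size and market-rate input price are made up for illustration.

def request_cost(input_tokens, output_tokens, in_price, out_price):
    """Cost in dollars; in_price/out_price are $ per million tokens."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Hypothetical agentic-coding request: big context, modest output.
here = request_cost(100_000, 2_000, 2.25, 2.75)    # quoted rates here
market = request_cost(100_000, 2_000, 0.60, 2.00)  # assumed market rates

print(f"at quoted rates:  ${here:.4f}")    # → $0.2305
print(f"at market rates:  ${market:.4f}")  # → $0.0640
```

    For input-heavy workloads like coding agents, the input price dominates, which is why the ~4x input premium stings more than the modest output premium.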

    Cerebras is getting rid of their Qwen3-235B at the end of the month. There's a very affordable GPT OSS 120B that's incredibly fast and cheap: 3000 t/s, $0.35/$0.75 per M tokens I/O! It'd be great to see something like MiniMax, which is supposedly very cheap to run on GPUs, if it can be ported.