Input tokens are really expensive here, relative both to their other models and to the market rate. GLM-4.7's $2.25/M input is ~4x what most providers charge and ~3x their next most expensive model, Llama-3.3-70B. It's also advertised as half as fast as Llama. Output tokens are a little above market at $2.75/M vs ~$2/M, but really not bad.
Cerebras is getting rid of their Qwen3-235B at the end of the month. There's a very affordable GPT OSS 120B that's incredibly fast and cheap: 3000 t/s at $0.35/M input and $0.75/M output! It'd be great to see something like MiniMax, which is supposedly very cheap to run on GPU, if that can be ported.
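To put the price gap in concrete terms, here's a quick sketch of per-request cost at the listed rates. The 10k-input / 1k-output request shape is a hypothetical workload, not from the post; prices are the $/M-token figures quoted above.

```python
def request_cost(input_tokens, output_tokens, price_in, price_out):
    """Cost in dollars for one request, given $/M-token input and output prices."""
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# Hypothetical request: 10k input tokens, 1k output tokens.
glm = request_cost(10_000, 1_000, 2.25, 2.75)      # GLM-4.7 rates
oss = request_cost(10_000, 1_000, 0.35, 0.75)      # GPT OSS 120B rates

print(f"GLM-4.7:     ${glm:.5f}")   # $0.02525
print(f"GPT OSS 120B: ${oss:.5f}")  # $0.00425
print(f"ratio: {glm / oss:.1f}x")   # ~5.9x
```

At this input-heavy shape the input price dominates, which is why the ~4x input premium matters more than the modest output premium.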