I run Qwen 35B on my local machine daily, and occasionally models over 200B params with flash-moe. In today's world, with all the open models, spending a lot of money makes sense if your needs are bigger than those of a couple of people.
What tokens/s do you get for Qwen and for flash-moe? What system are you using? And are you satisfied with them? Thanks for the reply!!