I know that DeepSeek as a model is easier to run inference on, but I am not sure how much the pre-training has helped.
It's my understanding that GLM 5.1 and, in my personal experience, Kimi K2 are some nice open-source models, so I am interested to hear your thoughts on them and why you picked DeepSeek for the fine-tuning instead.
I picked DeepSeek 3.2 because I was impressed with how they developed R1, and I have continued to be satisfied with their improvements, both in capability and in algorithms. I think they cut their cost in half recently because of efficiency gains, which was a big factor.