Arcee Trinity Mini: US-Trained Moe Model (www.arcee.ai)

70 points by hurrycane 2 months ago | 7 comments

halJordan 2 months ago
Looks like a less good version of Qwen 30B-A3B, which makes sense because it is slightly smaller. If they can keep that efficiency going into the large one, it'll be sick.
> Trinity Large [will be] a 420B parameter model with 13B active parameters.
Just perfect for a large RAM pool @ q4.
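For anyone who wants to sanity-check the "large RAM pool @ q4" remark, here is a rough back-of-the-envelope sketch. It assumes exactly 4 bits per weight and counts weights only; real q4 formats typically carry a bit more than 4 bits per weight for scale metadata, and the KV cache and activations come on top. The function name is made up for illustration.

```python
def weight_memory_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes).

    Assumption: every parameter is stored at the same bit width and
    nothing else (KV cache, activations, quantization scales) is counted.
    """
    return total_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 420e9   # Trinity Large total parameters, from the quote above
ACTIVE_PARAMS = 13e9   # active parameters per token, from the quote above

q4_total_gb = weight_memory_gb(TOTAL_PARAMS, 4)     # all experts resident at 4 bits
fp16_total_gb = weight_memory_gb(TOTAL_PARAMS, 16)  # same model at 16 bits, for contrast
q4_active_gb = weight_memory_gb(ACTIVE_PARAMS, 4)   # weights actually touched per token

print(f"q4 weights:        ~{q4_total_gb:.0f} GB")
print(f"16-bit weights:    ~{fp16_total_gb:.0f} GB")
print(f"q4 active / token: ~{q4_active_gb:.1f} GB")
```

Under those assumptions the full set of weights needs to stay resident, but only a small slice is read per token, which is why a big pool of ordinary RAM can be a reasonable fit for this kind of MoE.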
davidsainez 2 months ago
Excited to put this through its paces. It seems most directly comparable to GPT-OSS-20B. Comparing their numbers on the Together API: Trinity Mini is slightly less expensive ($0.045/$0.15 v $0.05/$0.20) and seems to have better latency and throughput numbers.
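A quick sketch of what that price gap means in practice. It assumes the quoted figures are USD per million input/output tokens (the comment does not state the unit), and the workload size is entirely hypothetical.

```python
# Prices as quoted in the comment above, read as (input, output).
# Assumption: USD per 1M tokens.
PRICES = {
    "Trinity Mini": (0.045, 0.15),
    "GPT-OSS-20B": (0.05, 0.20),
}


def workload_cost(input_price: float, output_price: float,
                  input_mtok: float, output_mtok: float) -> float:
    """Cost in USD for a workload measured in millions of tokens."""
    return input_price * input_mtok + output_price * output_mtok


# Hypothetical workload: 10M input tokens and 2M output tokens.
INPUT_MTOK, OUTPUT_MTOK = 10, 2

costs = {
    name: workload_cost(inp, out, INPUT_MTOK, OUTPUT_MTOK)
    for name, (inp, out) in PRICES.items()
}
trinity_cost = costs["Trinity Mini"]
gpt_oss_cost = costs["GPT-OSS-20B"]
savings_pct = (gpt_oss_cost - trinity_cost) / gpt_oss_cost * 100

for name, cost in costs.items():
    print(f"{name}: ${cost:.2f}")
print(f"Trinity Mini is cheaper by about {savings_pct:.0f}% on this mix")
```

The gap depends on the input/output mix: the output price differs more than the input price, so output-heavy workloads see a larger relative saving.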
htrp 2 months ago
Trinity Nano Preview: 6B parameter MoE (1B active, ~800M non-embedding), 56 layers, 128 experts with 8 active per token
Trinity Mini: 26B parameter MoE (3B active), fully post-trained reasoning model
They did pretraining on their own and are still training the large version on 2048 B300 GPUs
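To make "128 experts with 8 active per token" concrete, here is a minimal sketch of generic top-k routing in plain Python. It is an illustration only: softmax over the selected logits is an assumption, not necessarily how Arcee's router works, and the logits are random stand-ins for what a learned router would produce.

```python
import math
import random

NUM_EXPERTS = 128  # from the Trinity Nano Preview spec above
TOP_K = 8          # experts active per token, from the spec above


def route_token(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest logits.

    Weights are a softmax over the selected logits only, so they sum to 1.
    Assumption: generic top-k gating, not a confirmed Trinity design.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i],
                 reverse=True)[:top_k]
    peak = max(router_logits[i] for i in top)  # subtract max for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


rng = random.Random(0)  # fixed seed so the example is repeatable
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
assignment = route_token(logits)

active_fraction = TOP_K / NUM_EXPERTS
print(f"experts used for this token: {sorted(i for i, _ in assignment)}")
print(f"fraction of experts active: {active_fraction:.4f}")
```

Only the selected experts run for a given token, which is how a model with a large total parameter count ends up with a much smaller active parameter count.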
Balinares 2 months ago
Interesting. Always glad to see more open weight models.
I do appreciate that they openly acknowledge the areas where they followed DeepSeek's research. I wouldn't consider that a given for a US company.
Anyone tried these as a coding model yet?
bitwize 2 months ago
A moe model you say? How kawaii is it? uwu
- ghc 2 months ago
  Capitalization makes a surprising amount of difference here...
- donw 2 months ago
  Meccha at present, but it may reach sugoi levels with fine-tuning.
- noxa 2 months ago
  I hate that I laughed at this. Thanks ;)
ksynwa 2 months ago
> Trinity Large is currently training on 2048 B300 GPUs and will arrive in January 2026.
How long does the training take?
- arthurcolle 2 months ago
  A couple of days or weeks, usually. No one is doing 9-month training runs.
trvz 2 months ago
Moe ≠ MoE
- cachius 2 months ago
  ?
  - azinman2 2 months ago
    The HN title uses incorrect capitalization.
    rbanffy 2 months ago
    I was eagerly waiting for the Larry and Curly models.
  - m4rtink 2 months ago
    ^_-