Covenant-72B is the largest language model pre-trained through fully permissionless, decentralized coordination. 72 billion parameters, approximately 1.1 trillion tokens, trained across 70+ contributors on commodity internet connections. No datacenter, no central cluster, and no whitelisting of participants. Anyone with GPUs could join or leave at any time during the run.
The two hard problems in this setting are bandwidth and trust.
For bandwidth: synchronizing full gradients for a 72B model over residential internet is not feasible. We developed SparseLoCo, which compresses gradient communication by over 146x. Each peer transmits 1.56% of a full gradient per round using top-k sparsification, 2-bit quantization, and error feedback. The result was 94.5% compute utilization and 70-second communication overhead per round (versus 8.3 minutes for INTELLECT-1, a whitelisted 10B run).
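The pipeline above (error feedback, then top-k sparsification, then 2-bit quantization) can be sketched in a few lines of NumPy. This is a minimal illustration of the general technique, not the actual SparseLoCo implementation; the 1.56% keep-fraction matches the number in the post, but everything else (per-message min/max quantization, the function names) is an assumption.

```python
import numpy as np

def compress(grad, residual, k_frac=0.0156, levels=4):
    """One illustrative compression round: error feedback -> top-k -> 2-bit
    quantization. Hyperparameters are assumptions, not the real config."""
    # Error feedback: fold in what previous rounds failed to transmit.
    acc = grad + residual
    # Top-k sparsification: keep the largest-magnitude ~1.56% of entries.
    k = max(1, int(k_frac * acc.size))
    idx = np.argpartition(np.abs(acc), -k)[-k:]
    vals = acc[idx]
    # 2-bit quantization: map surviving values onto 4 uniform levels.
    lo, hi = vals.min(), vals.max()
    scale = (hi - lo) / (levels - 1) or 1.0  # guard against hi == lo
    codes = np.round((vals - lo) / scale).astype(np.uint8)  # codes in {0..3}
    # New residual: everything the compressed message failed to carry.
    decoded = np.zeros_like(acc)
    decoded[idx] = lo + codes * scale
    return idx, codes, (lo, scale), acc - decoded

def decompress(idx, codes, meta, shape):
    """Rebuild the sparse, quantized update on the receiving peer."""
    lo, scale = meta
    out = np.zeros(shape)
    out[idx] = lo + codes * scale
    return out
```

By construction, `decompress(...) + new_residual` exactly reproduces `grad + residual`, which is what lets aggressive compression avoid accumulating bias across rounds.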
For trust: when anyone can participate, anyone can submit garbage updates. Gauntlet is our validation layer. It scores every submission every round by measuring loss improvement on assigned and held-out data, running integrity checks, and applying persistent ranking. Only top-scoring updates touch the model.
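As a rough sketch of how a round of validation like this could be structured: score each submission by held-out loss improvement, run a basic integrity check, fold the score into a persistent per-peer record, and admit only the top scorers. Everything here is hypothetical, including the names, the EMA-based ranking, and the finite-value integrity check; the post only states that Gauntlet scores loss improvement, runs integrity checks, and keeps a persistent ranking.

```python
import math
from dataclasses import dataclass

@dataclass
class PeerRecord:
    """Persistent per-peer state. Using an EMA for the ranking is an
    assumption; the post only says 'persistent ranking'."""
    rank_score: float = 0.0
    rounds: int = 0

def gauntlet_round(submissions, eval_loss, base_loss, records,
                   decay=0.9, top_n=2):
    """Hypothetical single validation round. `submissions` maps
    peer_id -> candidate update (a list of floats here); `eval_loss(update)`
    returns held-out loss after applying the update; `base_loss` is the
    held-out loss without it."""
    scored = []
    for peer, update in submissions.items():
        rec = records.setdefault(peer, PeerRecord())
        # Integrity check (stand-in): reject updates with non-finite values.
        if not all(math.isfinite(v) for v in update):
            score = float("-inf")
        else:
            # Core score: loss improvement on held-out data.
            score = base_loss - eval_loss(update)
        # Persistent ranking: blend this round's score into the record.
        rec.rank_score = decay * rec.rank_score + (1 - decay) * score
        rec.rounds += 1
        scored.append((rec.rank_score, peer, update))
    # Only top-scoring updates touch the model.
    scored.sort(key=lambda t: t[0], reverse=True)
    return [(p, u) for s, p, u in scored[:top_n] if s > float("-inf")]
```

Keeping the ranking persistent rather than per-round means a peer cannot alternate good and garbage updates without its standing decaying, which is the property a mechanism like this needs in a permissionless setting.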
The base model is competitive with LLaMA-2-70B on ARC despite being trained on roughly half the token budget. After fine-tuning, the chat model outperforms both K2-Chat and LLaMA-2-70B-Chat on IFEval and MATH.
Weights are Apache 2.0 on HuggingFace: https://huggingface.co/1Covenant/Covenant-72B
Built by Covenant AI with Mila Quebec. Happy to answer questions about the training protocol, compression methods, or the validation mechanism.