4 points by vincentweisser 9 months ago | 1 comment
  • vincentweisser 9 months ago
    A big step towards truly decentralized inference: it unlocks consumer GPUs and already outperforms traditional approaches, which stall in high-latency settings.

    Unlike other p2p inference engines (e.g., Petals, Exo), our stack leverages vLLM's scheduler for efficient batched decoding, achieving 10–50× higher throughput.
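
    For anyone unfamiliar with why that matters, here is a minimal sketch of the batched-decoding pattern involved, assuming a stock vLLM install (the model name, prompt count, and sampling parameters are illustrative placeholders, not details from our stack):

      # Illustrative sketch: vLLM's continuous batching schedules all
      # requests through shared forward passes instead of decoding them
      # one at a time, which is where the throughput gain comes from.
      from vllm import LLM, SamplingParams

      prompts = [f"Prompt {i}: explain KV caching." for i in range(256)]
      params = SamplingParams(temperature=0.8, max_tokens=128)

      llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model
      outputs = llm.generate(prompts, params)

      for out in outputs:
          print(out.outputs[0].text[:80])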

    Crucial for scaling decentralized RL rollouts and synthetic data generation.
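
    To make the rollout point concrete, a hedged sketch of how many samples per prompt amortize under the same batched engine (n=8 is an arbitrary choice; llm and prompts come from the sketch above):

      # Sketch: several completions per prompt for RL rollouts or
      # synthetic data; n=8 asks vLLM for 8 samples per prompt, all
      # scheduled through the same batched forward passes.
      rollout_params = SamplingParams(n=8, temperature=1.0, max_tokens=256)
      rollouts = llm.generate(prompts, rollout_params)

      # Flatten into (prompt, completion) pairs for a synthetic dataset.
      dataset = [(r.prompt, c.text) for r in rollouts for c in r.outputs]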