A major leap for macOS users – explore Purem here: https://worktif.com/
After a strict recalculation that accounts for GPU-to-CPU architectural differences and memory parallelism, our CPU implementation of softmax (~6500 ops/sec) outperforms GPU-optimized FlashAttention rescaled for standard CPUs (estimated 800–5000 ops/sec).
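For context, softmax itself is a standard operation; below is a minimal, numerically stable NumPy sketch of it. This is a reference definition only, not Purem's kernel, whose internals are not shown in this post.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the per-row max before exponentiating so exp() cannot
    # overflow on large inputs; the result is mathematically unchanged.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=axis, keepdims=True)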
Benchmark Highlights:
• Purem achieves ~6500 ops/sec for large-scale softmax operations on an Apple M2 CPU core, comparable to production-grade benchmarks (a hedged measurement sketch follows this list).
• FlashAttention (Tri Dao et al.): 800–5000 ops/sec on CPU.
• Meta's xFormers (PyTorch 2.0): 1300–1500 ops/sec on CPU.
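The post does not state how an "op" is counted (input shape, dtype, or batch size), so the sketch below is one hedged way to measure softmax throughput. The 1024×1024 float32 shape, iteration count, and one-batch-per-op convention are assumptions, not Purem's published methodology.

```python
import time

import numpy as np
from scipy.special import softmax  # reference implementation to time

def benchmark_softmax(fn, shape=(1024, 1024), iters=1000) -> float:
    """Return throughput in ops/sec, counting one full softmax over a
    single batch of the given shape as one op."""
    x = np.random.randn(*shape).astype(np.float32)
    fn(x)  # warm-up pass (caches, lazy allocations)
    start = time.perf_counter()
    for _ in range(iters):
        fn(x)
    return iters / (time.perf_counter() - start)

print(f"{benchmark_softmax(lambda x: softmax(x, axis=-1)):.0f} ops/sec")
```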
Key Advantages:
• Local Execution: Runs entirely on your MacBook, with no specialized hardware or cloud services required (see the hypothetical usage sketch after this list).
• Cost-Efficient: Delivers performance comparable to billion-dollar infrastructure at a fraction of the cost.
• Open and Transparent: Built with transparency in mind, allowing easy inspection and integration.
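Since Purem's public API is not documented in this post, the following local-execution sketch is purely hypothetical: the `purem` module name and its `softmax` entry point are assumptions, with a SciPy baseline standing in so the example runs.

```python
# Hypothetical local-execution sketch. The `purem` module name and its
# `softmax` entry point are assumptions for illustration; consult
# https://worktif.com/ for the actual API.
import numpy as np
from scipy.special import softmax  # SciPy stand-in so the sketch runs

x = np.random.randn(8, 4096).astype(np.float32)
# y = purem.softmax(x)   # assumed Purem call; API not confirmed
y = softmax(x, axis=-1)  # baseline computed entirely on the local CPU
assert np.allclose(y.sum(axis=-1), 1.0, atol=1e-5)  # rows sum to 1
```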
Comparison References:
• FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness. Tri Dao et al., 2022. https://arxiv.org/abs/2205.14135
• FlexAttention: The Flexibility of PyTorch with the Performance of FlashAttention. PyTorch Blog, 2024. https://pytorch.org/blog/flexattention/
With Purem, experience the power of industrial-grade AI performance directly on your personal device.