3 points by fumi2026 7 hours ago | 1 comment
  • fumi2026 7 hours ago
    Hi HN, author here.

    I built this project because I wanted to understand the low-level mechanics of LLMs and how FFI overhead differs between languages.

    Some key takeaways:

    Architecture: It's a 6.9B-parameter Mixture-of-Experts (MoE) model, implemented from scratch in each of Rust, Go, and Python.
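    To make the MoE part concrete, here is a toy sketch of top-k expert routing — the gating pattern a layer like this uses. All names and shapes are illustrative, not the project's actual code:

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Route one token through the top-k experts of a toy MoE layer.

    x: (d,) token activation; gate_w: (d, n_experts) router weights;
    experts: list of callables, each mapping (d,) -> (d,).
    """
    logits = x @ gate_w                 # router score per expert
    top = np.argsort(logits)[-top_k:]   # indices of the top-k experts
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                # softmax over the selected experts only
    # Weighted sum of the chosen experts' outputs
    return sum(p * experts[i](x) for p, i in zip(probs, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 4
gate_w = rng.standard_normal((d, n_experts))
experts = [lambda x, W=rng.standard_normal((d, d)): W @ x
           for _ in range(n_experts)]
y = moe_forward(rng.standard_normal(d), gate_w, experts)
print(y.shape)  # (8,)
```

    Only the selected experts run per token, which is why a 6.9B MoE is much cheaper per forward pass than a dense model of the same parameter count.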

    Shared CUDA: All three implementations bind to the exact same CUDA kernels through a plain C ABI (no PyTorch/TensorFlow).
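    The "one kernel library, three bindings" pattern boils down to exporting the launchers behind a C ABI and declaring that ABI in each language (extern "C" in Rust, cgo in Go, ctypes in Python). A minimal Python ctypes sketch of the pattern — using libm's cos as a stand-in for the project's (hypothetical) CUDA launcher library:

```python
import ctypes
import ctypes.util

# Load a shared library by its C ABI. libm stands in here for a CUDA
# kernel library (which would be loaded the same way by its .so name).
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")

# Declare the foreign signature explicitly: ctypes assumes int for
# arguments and return values otherwise, which silently corrupts doubles.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

print(libm.cos(0.0))  # 1.0
```

    The same header drives all three bindings, so any kernel bug or speedup shows up identically across languages — which is what makes the FFI comparison fair.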

    Performance: I was surprised by how Go's cgo call overhead stacks up against Rust's FFI in this specific workload.
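    Per-call FFI overhead is easy to measure yourself: time a trivial foreign call in a tight loop and compare against a native call. The author's comparison was Rust FFI vs. cgo; this Python/ctypes sketch just illustrates the measurement idea (ctypes has far higher per-call cost than either):

```python
import ctypes
import ctypes.util
import time

libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")
libm.fabs.argtypes = [ctypes.c_double]
libm.fabs.restype = ctypes.c_double

N = 100_000

t0 = time.perf_counter()
for _ in range(N):
    libm.fabs(-1.0)   # crosses the FFI boundary every iteration
ffi = time.perf_counter() - t0

t0 = time.perf_counter()
for _ in range(N):
    abs(-1.0)         # native builtin, no FFI boundary
native = time.perf_counter() - t0

print(f"per-call FFI overhead ~ {(ffi - native) / N * 1e9:.0f} ns")
```

    For GPU work the overhead mostly matters for small, frequent kernel launches; batching launches amortizes it in any of the three languages.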

    I know it's reinventing the wheel, but it was a great way to learn. Happy to answer any questions about the implementation or the FFI architecture!