3 points by fumi2026 7 hours ago | 1 comment
  • fumi2026 7 hours ago
    Hi HN, author here.

    I built this project because I wanted to understand the low-level mechanics of LLMs and how FFI overhead differs between languages.

    Some key takeaways:

    Architecture: It's a 6.9B-parameter Mixture-of-Experts (MoE) model, implemented from scratch in each of Rust, Go, and Python.
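    To make the MoE part concrete, here is a toy sketch of top-k expert routing — the gating pattern a layer like this uses. All names and shapes are illustrative, not the project's actual code:

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Route one token through the top-k experts of a toy MoE layer.

    x: (d,) token activation; gate_w: (d, n_experts) router weights;
    experts: list of callables, each mapping (d,) -> (d,).
    """
    logits = x @ gate_w                 # router score per expert
    top = np.argsort(logits)[-top_k:]   # indices of the top-k experts
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                # softmax over the selected experts only
    # Weighted sum of the chosen experts' outputs
    return sum(p * experts[i](x) for p, i in zip(probs, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 4
gate_w = rng.standard_normal((d, n_experts))
experts = [lambda x, W=rng.standard_normal((d, d)): W @ x
           for _ in range(n_experts)]
y = moe_forward(rng.standard_normal(d), gate_w, experts)
print(y.shape)  # (8,)
```

    Only the selected experts run per token, which is why a 6.9B MoE is much cheaper per forward pass than a dense model of the same parameter count.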

    Shared CUDA: All three implementations bind to the exact same CUDA kernels through a plain C ABI (no PyTorch/TensorFlow).
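    The "one kernel library, three bindings" pattern boils down to exporting the launchers behind a C ABI and declaring that ABI in each language (extern "C" in Rust, cgo in Go, ctypes in Python). A minimal Python ctypes sketch of the pattern — using libm's cos as a stand-in for the project's (hypothetical) CUDA launcher library:

```python
import ctypes
import ctypes.util

# Load a shared library by its C ABI. libm stands in here for a CUDA
# kernel library (which would be loaded the same way by its .so name).
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")

# Declare the foreign signature explicitly: ctypes assumes int for
# arguments and return values otherwise, which silently corrupts doubles.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

print(libm.cos(0.0))  # 1.0
```

    The same header drives all three bindings, so any kernel bug or speedup shows up identically across languages — which is what makes the FFI comparison fair.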

    Performance: I was surprised by how Go's cgo call overhead stacks up against Rust's FFI in this specific workload.
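    Per-call FFI overhead is easy to measure yourself: time a trivial foreign call in a tight loop and compare against a native call. The author's comparison was Rust FFI vs. cgo; this Python/ctypes sketch just illustrates the measurement idea (ctypes has far higher per-call cost than either):

```python
import ctypes
import ctypes.util
import time

libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")
libm.fabs.argtypes = [ctypes.c_double]
libm.fabs.restype = ctypes.c_double

N = 100_000

t0 = time.perf_counter()
for _ in range(N):
    libm.fabs(-1.0)   # crosses the FFI boundary every iteration
ffi = time.perf_counter() - t0

t0 = time.perf_counter()
for _ in range(N):
    abs(-1.0)         # native builtin, no FFI boundary
native = time.perf_counter() - t0

print(f"per-call FFI overhead ~ {(ffi - native) / N * 1e9:.0f} ns")
```

    For GPU work the overhead mostly matters for small, frequent kernel launches; batching launches amortizes it in any of the three languages.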

    I know it's reinventing the wheel, but it was a great way to learn. Happy to answer any questions about the implementation or the FFI architecture!