1 point by acc_10000 3 hours ago | 1 comment
  • acc_10000 3 hours ago
    I built this after watching 7 of 8 CPU cores sit idle during a Monte Carlo sim. Python's multiprocessing added 189 ms of serialization overhead to a 9 ms computation.

    ironkernel lets you write element-wise expressions with a Python decorator, compiles them to a Rust expression tree at definition time, and executes via rayon on all cores. ~2k lines of Rust, ~500 lines of Python.
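
    To make the "decorator compiles to an expression tree at definition time" idea concrete, here is a toy pure-Python sketch. It is not ironkernel's API: `kernel`, `Node`, `lift`, `absolute` and `evaluate` are hypothetical names made up for illustration, and the tree is walked by a serial Python interpreter instead of being handed to Rust and rayon.

```python
import math
import operator
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One node of the expression tree: an op name plus children (or a constant)."""
    op: str
    args: tuple = ()

    # Operators build new tree nodes instead of computing values.
    def __add__(self, other):
        return Node("add", (self, lift(other)))

    def __gt__(self, other):
        return Node("gt", (self, lift(other)))


def lift(value):
    """Wrap plain numbers as constant nodes so they can sit in the tree."""
    return value if isinstance(value, Node) else Node("const", (float(value),))


def sqrt(n):
    return Node("sqrt", (lift(n),))


def sin(n):
    return Node("sin", (lift(n),))


def absolute(n):
    return Node("abs", (lift(n),))


def where(cond, a, b):
    return Node("where", (cond, lift(a), lift(b)))


OPS = {"add": operator.add, "gt": operator.gt,
       "sqrt": math.sqrt, "sin": math.sin, "abs": abs}


def evaluate(node, x):
    """Evaluate the tree for a single element x."""
    if node.op == "var":
        return x
    if node.op == "const":
        return node.args[0]
    if node.op == "where":
        cond, a, b = node.args
        # Only the selected branch is evaluated; the other one is skipped.
        return evaluate(a if evaluate(cond, x) else b, x)
    return OPS[node.op](*(evaluate(child, x) for child in node.args))


def kernel(fn):
    """Hypothetical decorator: trace fn once, at definition time, into a tree."""
    tree = fn(Node("var"))

    def run(xs):
        # A real implementation would split xs across cores; this loop is serial.
        return [evaluate(tree, x) for x in xs]

    run.tree = tree
    return run


@kernel
def f(x):
    return where(x > 0, sqrt(absolute(x)) + sin(x), 0)
```

    Calling `f([-4.0, 0.0, 4.0])` walks the already-built tree once per element. The Python function body itself only runs once, when the decorator traces it.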

    The win is expression fusion: NumPy evaluates `where(x > 0, sqrt(abs(x)) + sin(x), 0)` as 5 passes with 4 temporaries. ironkernel fuses it into 1 pass with zero temporaries, and skips dead branches (no NaN from sqrt of negatives). That makes it 2.25x faster than NumPy on compound expressions at 10M elements. For BLAS ops like SAXPY, NumPy is faster, since ironkernel doesn't call BLAS.
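
    For anyone unfamiliar with fusion, this is roughly the difference, sketched with plain Python lists standing in for arrays (an assumption for illustration; it is neither NumPy nor ironkernel code). `staged` mimics eager evaluation, where each operation is its own pass and leaves an intermediate list behind. `fused` computes the whole expression per element in one pass. How many passes you count in the staged version depends on whether you include the comparison and the final select.

```python
import math


def staged(xs):
    """Eager style: one full pass and one intermediate list per operation."""
    mask = [v > 0 for v in xs]
    mag = [abs(v) for v in xs]
    root = [math.sqrt(m) for m in mag]
    sine = [math.sin(v) for v in xs]
    total = [r + s for r, s in zip(root, sine)]
    return [t if m else 0.0 for m, t in zip(mask, total)]


def fused(xs):
    """Fused style: a single pass, no intermediate lists.

    The sqrt/sin branch is only computed for elements where v > 0.
    """
    return [math.sqrt(abs(v)) + math.sin(v) if v > 0 else 0.0 for v in xs]


xs = [-2.0, -0.5, 0.0, 0.5, 2.0]
print(staged(xs) == fused(xs))
```

    Both functions do the same arithmetic in the same order, so the results match exactly. The fused form just avoids the extra memory traffic and the wasted work on the branch that isn't selected.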

    It's early stage: f64 only, 1-D only, and an expression subset only (intentional, since the restricted subset is what gives the parallel safety guarantee). Warm Numba is 3.2x faster than ironkernel (LLVM JIT vs. an interpreter).