Traced it down to three things: libm implementations that use different polynomial approximations, whether FMA instructions get used, and compiler optimizations. Even the same binary can produce different results depending on CPU microarchitecture, for instance when the math library picks an FMA code path at runtime.
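To make the FMA point concrete, here is a minimal Python sketch (not code from the project). It assumes IEEE 754 doubles and emulates a fused multiply-add with exact rational arithmetic, so the only difference between the two paths is whether the product is rounded before the add. The function names are made up for illustration.

```python
from fractions import Fraction

def unfused_multiply_add(a: float, b: float, c: float) -> float:
    # Two roundings: the product is rounded to a double, then the sum is.
    return a * b + c

def fused_multiply_add(a: float, b: float, c: float) -> float:
    # One rounding: compute a*b + c exactly, then round once to a double.
    # This emulates what a hardware FMA instruction does.
    return float(Fraction(a) * Fraction(b) + Fraction(c))

a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30   # a * b is exactly 1 - 2**-60, which rounds to 1.0
c = -1.0

unfused = unfused_multiply_add(a, b, c)
fused = fused_multiply_add(a, b, c)

print(unfused)                # 0.0
print(fused == -(2.0**-60))   # True
```

Same inputs, same formula, two different answers. A compiler that is free to contract `a * b + c` into an FMA can flip between these depending on flags and target CPU.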
Ended up building a REST API that forces determinism by using fixed Remez polynomial coefficients, disabling FMA, and enforcing strict evaluation order. Every response includes a SHA-256 hash of the output bytes.
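A rough sketch of the general idea in Python, not the actual implementation. Assumptions: the coefficients here are plain Taylor terms for exp(x), standing in for real Remez coefficients; `poly_exp` and `result_with_hash` are hypothetical names; and the arithmetic is IEEE 754 binary64 with no extended precision. CPython rounds each multiply and each add separately, so the Horner loop has a fixed evaluation order and no fused operations.

```python
import hashlib
import math
import struct

# Illustrative coefficients only: Taylor terms 1/k! for exp(x), k = 0..12.
# A real service would ship a fixed table of Remez-fitted constants instead.
EXP_COEFFS = [1.0 / math.factorial(k) for k in range(13)]

def poly_exp(x: float) -> float:
    # Horner's scheme, highest-degree coefficient first.
    # Each step is one rounded multiply followed by one rounded add.
    acc = EXP_COEFFS[-1]
    for coeff in reversed(EXP_COEFFS[:-1]):
        acc = acc * x + coeff
    return acc

def result_with_hash(x: float) -> tuple[float, str]:
    y = poly_exp(x)
    # Serialize with an explicit byte order so the hash does not depend
    # on the host's endianness.
    raw = struct.pack("<d", y)
    return y, hashlib.sha256(raw).hexdigest()

y, digest = result_with_hash(0.5)
print(y, digest)
```

Hashing the raw bytes rather than a printed decimal string avoids a second source of drift, since float formatting can also differ between runtimes.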
Just shipped an update with 19 validated unary functions (trig, hyperbolic, exp, log, roots) plus support for compound expressions. Tested across M1 Mac, x86 Linux, and H100 GPU. Same inputs produce identical hashes on all three.
Curious if others have hit this problem. How do you handle determinism in distributed systems where floating point consistency matters? Is there interest in a tool like this or are people just accepting the drift?
Demo binary: https://github.com/RegularJoe-CEO/LuxiDemo/releases/tag/v2.0...