Show HN: AI-optimized x86-64 assembly vs. GCC -O3 on three production kernels (github.com)

1 point by cod-e 4 hours ago | 1 comment

cod-e 4 hours ago
Author here. Some context on how this works and what it doesn't do. The system doesn't replace the compiler. It sits on top of it.

The key insight (which took a few failed experiments to learn) is that AI-generated assembly is dangerous for code with error handling, state, and control flow, but strong on pure computational kernels. We tried having the AI rewrite an entire packet parser. It shipped two bugs (flag clobbering, unsigned underflow) and was 1.23x slower than GCC. Then we split the architecture: the compiler owns all structural code (validation, error paths, bounds checks, state management), and the AI only optimizes the inner kernel, which runs after all checks pass. Same parser, zero bugs on first try, clean performance win.

That's the design principle behind everything here. The compiler guarantees correctness by construction. The AI only touches pure load/transform/store kernels with no branches. Then we verify with a 100K-input differential fuzz: run random inputs through both versions and compare the output byte by byte.

What the AI is good at: spotting SIMD opportunities GCC misses. The base64 case is textbook. GCC sees a 256-byte lookup table and generates scalar loads. The AI recognizes that base64's alphabet can be decomposed into nibble ranges and uses pshufb to do 16 parallel lookups. That's not a novel technique (simdjson and others use it), but the point is the AI found and applied it automatically.

What the AI is bad at: pure ALU scheduling. SipHash is adds, rotates, and XORs with tight data dependencies. GCC's instruction scheduler already does this near-optimally. The AI tried and lost. The system reports that honestly.

The verification reports and build scripts are in the repo, and every number is one shell command to reproduce. Happy to answer questions about the architecture, the failure cases, or where this goes next.
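For anyone who wants the shape of this in code, here is a minimal sketch in portable C. It is not the code from the repo. Every name in it (`table256`, `ROLL`, `decode_ref`, `decode_nibble`, `all_valid`, `differential_fuzz`) is made up for illustration, and the candidate kernel is a scalar model of the nibble-indexed lookup rather than real pshufb assembly. It assumes the kernel only ever sees bytes that already passed validation, which is the split described above: validation lives in ordinary compiled code, the kernel is a branch-free load/transform/store loop, and a differential fuzz compares the two versions byte by byte.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static const char ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Reference path: a 256-entry table, one scalar load per input byte. */
static uint8_t table256[256];

static void init_table256(void) {
    memset(table256, 0xFF, sizeof table256);   /* 0xFF marks an invalid byte */
    for (int i = 0; i < 64; i++)
        table256[(uint8_t)ALPHABET[i]] = (uint8_t)i;
}

/* Structural code: validation stays outside the kernel. */
static int all_valid(const uint8_t *in, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (table256[in[i]] == 0xFF) return 0;
    return 1;
}

static void decode_ref(const uint8_t *in, uint8_t *out, size_t n) {
    for (size_t i = 0; i < n; i++) out[i] = table256[in[i]];
}

/* Candidate path: a 16-entry offset table indexed by the high nibble.
 * With pshufb the same 16-entry lookup would run on 16 bytes at once;
 * this loop only models the arithmetic (assumption: input is validated). */
static const int8_t ROLL[16] = {
    0, 16, 19, 4, -65, -65, -71, -71, 0, 0, 0, 0, 0, 0, 0, 0
};

static void decode_nibble(const uint8_t *in, uint8_t *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        uint8_t c = in[i];
        /* '+' (0x2B) and '/' (0x2F) share high nibble 2, so '/' is
         * moved down to slot 1 to get its own offset. */
        uint8_t idx = (uint8_t)((c >> 4) - (c == 0x2F));
        out[i] = (uint8_t)(c + ROLL[idx]);
    }
}

/* Differential fuzz: random valid inputs through both versions,
 * outputs compared byte by byte. Returns the number of mismatches. */
static int differential_fuzz(unsigned seed, int trials) {
    enum { MAX_LEN = 64 };
    uint8_t in[MAX_LEN], a[MAX_LEN], b[MAX_LEN];
    int mismatches = 0;
    srand(seed);
    for (int t = 0; t < trials; t++) {
        size_t n = (size_t)(rand() % MAX_LEN) + 1;
        for (size_t i = 0; i < n; i++)
            in[i] = (uint8_t)ALPHABET[rand() % 64];
        assert(all_valid(in, n));              /* kernel precondition */
        decode_ref(in, a, n);
        decode_nibble(in, b, n);
        if (memcmp(a, b, n) != 0) mismatches++;
    }
    return mismatches;
}
```

To use it, call `init_table256()` once and then `differential_fuzz(seed, 100000)` from a `main` of your own; a nonzero return means the two paths disagreed on at least one input. A real harness would also log the failing input so it can be replayed.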