1 point by khaeldur 10 hours ago | 1 comment
  • khaeldur 10 hours ago
    Hi HN, I built NeuralForge because I wanted to fine-tune small LLMs on my MacBook without renting cloud GPUs or setting up CUDA.

    It uses Apple's Neural Engine directly (not Metal, not CPU) to hit ~1.2 TFLOPS on a consumer Mac. The app wraps a C/Obj-C training engine in a SwiftUI GUI with live loss curves, LoRA support, and one-click export to GGUF/CoreML.

    What actually works today:
    - Real training on ANE (110M-parameter llama2.c models)
    - LoRA fine-tuning (rank 4-64 on attention weights)
    - Cosine LR schedule with warmup
    - Checkpoint save/resume (survives crashes)
    - Text generation at 66 ms/token with top-p sampling
    - Export to GGUF, llama2c, and CoreML formats
    - 648 automated tests (unit + integration on real hardware)

    Current limitations:
    - Only llama2.c-format models (110M tested, larger planned)
    - macOS 14+ on Apple Silicon only
    - Some UI features are still stubs (being honest)

    Setup: git clone + bash setup.sh (one command)

    The hardest part was ANE itself. There's basically zero documentation on using it for training — it's designed for inference only. I had to reverse-engineer the MIL compiler, figure out the 119-kernel compilation limit per process, and build an exec() restart mechanism that transparently re-launches the training process to get fresh kernel budget.

    MIT licensed: https://github.com/Khaeldur/NeuralForge

    Happy to answer questions about ANE internals.