269 points by sarkory a day ago | 6 comments
  • simonw a day ago
    MLX is worth paying attention to. It's still pretty young (just over a year old) but the amount of activity in that ecosystem is really impressive, and it's quickly becoming the best way to run LLMs (and vision LLMs and increasingly audio models) on a Mac.

    Here's a fun way to start interacting with it (this loads and runs Llama 3.2 3B in a terminal chat UI):

      uv run --isolated --with mlx-lm python -m mlx_lm.chat
    • masto 20 hours ago
      Ran it and it crapped out with a huge backtrace. I spotted `./build_bundled.sh: line 21: cmake: command not found` in it, so I guessed I needed cmake installed. `brew install cmake` and try again. Then it crapped out with `Compatibility with CMake < 3.5 has been removed from CMake.`. Then I give up.

      This is typical of what happens any time I try to run something written in Python. It may be easier than setting up an NVIDIA GPU, but that's a low bar.

      • H3X_K1TT3N 19 hours ago
        This is absolutely every experience I have with python.
      • simonw 20 hours ago
        Which Python version was that? Could be that MLX has binary wheels for some versions but not others.
        • masto 20 hours ago
          Adding `-p 3.12` made it work. Leaving that here in case it helps someone.
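          That is, the original command with the Python version pinned:

            uv run --isolated -p 3.12 --with mlx-lm python -m mlx_lm.chat
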
          • porridgeraisin 19 hours ago
            Aha, knew you wouldn't give up. Not what our kind do.
            • hnfong 8 hours ago
              Never gonna give you up…

              (Sorry, I’ll excuse myself now…)

      • jack_pp 14 hours ago
        For the record, these problems don't really exist on Linux in my experience.
        • mobiuscog 7 hours ago
          Python problems exist on all platforms. It's just that most people using Python have figured out their 'happy path' workarounds in the past and keep using them.

          Python is awesome in many ways, one of my favourite languages, but unless you are happy with venv manipulation (or live in Conda), it's often a nightmare that ends up worse than DLL-hell.
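
          For reference, the venv dance in question is usually some variation of:

            python3 -m venv .venv
            source .venv/bin/activate
            pip install mlx-lm   # or whatever the project needs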

          • masto an hour ago
            Python is in a category of things you can't just use without being an expert in the minutiae. This is unfortunate because there are a lot of people who are not Python developers who would like to run programs which happen to be written in Python.

            Python is by no means alone in this or particularly egregious. Having been a heavy Perl developer in the 2000s, I was part of the problem. I didn't understand why other people had so much trouble doing things that seemed simple to me, because I was eating, breathing, and sleeping Perl. I knew how to prolong the intervals between breaking my installation, and how to troubleshoot and repair it, but there was no reason why anyone who wanted to deploy, or even develop on, my code base should have needed that encyclopedic knowledge.

            This is why, for all their faults, I count containers as the biggest revolution in the software industry, at least for us "backend" folks.

    • esafak a day ago
      • marci 19 hours ago
        For those who have never heard of these:

        mlx is similar to numpy/pytorch, but only for Apple Silicon.

        mlx-lm is a llama.cpp equivalent, but built on top of mlx.

        https://github.com/ml-explore/mlx-lm
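
        If you've used numpy, the mlx API will look familiar. A minimal sketch of the array side (assuming `pip install mlx`; Apple Silicon only):

          import mlx.core as mx

          a = mx.array([1.0, 2.0, 3.0])
          b = mx.ones(3)
          c = a + b   # ops are recorded lazily
          mx.eval(c)  # evaluation happens here, in memory shared by CPU and GPU
          print(c)    # array([2, 3, 4], dtype=float32)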

    • mathfailure 18 hours ago
      How much disk & RAM does it need?

      What's your tokens/sec rate (and on which device)?

      • simonw 17 hours ago
        I've been running it on a 64GB M2. My favorite models to run tend to be about 20GB to download (eg Mistral Small 3.1) and use about 20GB of RAM while they are running.

        I don't have a token/second figure to hand but it's fast enough that I'm not frustrated by it.
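
        If you'd rather script it than use the chat UI, mlx-lm also has a small Python API. A rough sketch (the model name is one of the mlx-community 4-bit conversions and is illustrative; pick one that fits your RAM):

          from mlx_lm import load, generate

          # first call downloads the weights from Hugging Face
          model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")
          response = generate(model, tokenizer, prompt="Explain MLX in one sentence.", verbose=True)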

    • _bin_ 18 hours ago
      I wish Apple would spend some more time paying attention to jax-metal :) it still crashes on just a few lines of code, and it seems like an obvious need if Apple wants to be serious about enabling ML work on their new MBPs.

      MLX looks really nice from the demo-level playing around I've done, but I usually stick to jax so, you know, I can actually deploy it on a server without trying to find someone who racks Macs.
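
      A cheap sanity check for whether the Metal plugin is active at all (the exact device string depends on the jax-metal version):

        import jax

        print(jax.devices())            # expect something like [METAL(id=0)] when jax-metal loads
        print(jax.numpy.ones(3) * 2.0)  # even a trivial op can surface the crashes mentioned above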

      • dkga 18 hours ago
        So, on an M4 I sometimes get faster training on plain vanilla jax compared to the same model in pytorch or tensorflow. And jax-metal often breaks :/
        • _bin_ 16 hours ago
          No kidding? Might switch to CPU then. And yeah, jax-metal is so utterly unreliable. I ran across an issue that turns out to reduce to like a 2-line repro example, which has been open on GitHub for the better part of a year without updates.
  • fsiefken a day ago
    That's great. Like the AMD Ryzen AI Max 395, Apple Silicon chips are also more energy efficient for LLMs (or gaming) than Nvidia.

    For 4-bit deepseek-r1-distill-llama-70b on a MacBook Pro M4 Max with the MLX version on LM Studio: 10.2 tok/sec on power and 4.2 tok/sec on battery / low power

    For 4-bit gemma-3-27b-it-qat I get: 26.37 tok/sec on power and 9.7 tok/sec on battery / low power

    It'd be nice to know all the possible power tweaks to get the value higher, and to get additional insight into how LLMs work and interact with the CPU and memory.
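
    One place to start is checking whether macOS Low Power Mode kicks in on battery; a sketch (the `lowpowermode` key exists on recent macOS laptops, but check your `pmset -g` output before setting anything):

      pmset -g                        # dump current power settings; look for "lowpowermode"
      sudo pmset -b lowpowermode 0    # if supported: keep Low Power Mode off on battery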

    • nico a day ago
      Thank you for the numbers

      What have you used those models for, and how would you rate them in those tasks?

      • realo 21 hours ago
        RPG prompts work very, very well with many of the models, but not the reasoning ones, because they end up thinking endlessly about how to be the absolute best game master possible...
        • nico 20 hours ago
          Great use case. And very funny situation with the reasoning models! :)
    • vlovich123 12 hours ago
      How does mlx compare with the llama.cpp backend for LM Studio?
    • bigyabai 17 hours ago
      > Apple Silicon chips are also more energy efficient for LLMs (or gaming) than Nvidia.

      Which benchmarks are you working off of, exactly? Unless your memory is bottlenecked, neither raster nor compute workloads on M4 are more energy efficient than Nvidia's 50-series silicon: https://browser.geekbench.com/opencl-benchmarks

  • robbru a day ago
    TinyLLM is very cool to see! I will def tinker with it. I've been using MLX format for local LLMs as of late. Kinda amazing to see these models become cheaper and faster. Check out the MLX community on HuggingFace. https://huggingface.co/mlx-community
  • pj_mukh a day ago
    Super cool, and will definitely check it out.

    But as a measure of what you can achieve with a course like this: does anyone know what the max tok/s vs. iPhone model plot looks like, and how does MLX change that plot?

  • gitroom a day ago
    dang, i've been messing with mlx too and it's blowing my mind how quick this stuff is getting on macs. feels like something's changing every time i blink