104 points by sipjca 4 days ago | 12 comments
  • chrismorgan 4 hours ago
    I’m curious: does this fundamentally need to contain an actual model, or would it be okay if it generated a synthetic model itself, full of random weights? I’m picturing downloading just, say, a 20MB file instead of the multi-gigabyte one, and…

    Hang on, why is https://blob.localscore.ai/localscore-0.9.2 380MB? I remember llamafile being only a few megabytes. From https://github.com/Mozilla-Ocho/llamafile/releases, looks like it steadily grew from adding support for GPUs on more platforms, up to 28.5MiB¹ in 0.8.12, and then rocketed up to 230MiB in 0.8.13:

    > The llamafile executable size is increased from 30mb to 200mb by this release. This is caused by https://github.com/ggml-org/llama.cpp/issues/7156. We're already employing some workarounds to minimize the impact of upstream development contributions on binary size, and we're aiming to find more in the near future.

    Ah, of course, CUDA. Honestly I might be more surprised that it’s only this big. That monstrosity will happily consume a dozen gigabytes of disk space.

    llamafile-0.9.0 was still 231MiB, then llamafile-0.9.1 was 391MiB, now llamafile-0.9.2 is 293MiB. Fluctuating all over the place, but growing a lot. And localscore-0.9.2 is 363MiB. Why 70MiB extra on top of llamafile-0.9.2? I’m curious, but not curious enough to investigate concretely.

    Well, this became a grumble about bloat, but I’d still like to know whether it would be feasible to ship a smaller localscore that would synthesise a suitable model, according to the size required, at runtime.
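    Something along these lines is what I have in mind — a rough sketch only, assuming llama.cpp's gguf-py package; the metadata and tensor set here are nowhere near what a real loader needs, and weight values don't matter for a throughput benchmark anyway:

      # Rough sketch of the idea above (not how LocalScore actually works):
      # synthesise a Llama-style set of random weights at runtime instead of
      # shipping real ones. Assumes llama.cpp's gguf-py package ("pip install
      # gguf"); a real loader would also want a tokenizer, norm and FFN
      # tensors, rope parameters, and probably quantised tensor types.
      import numpy as np
      import gguf

      dim, n_layers, n_heads, vocab = 2048, 16, 16, 32000  # toy ~1B config
      rng = np.random.default_rng(0)

      def rand(*shape):
          return rng.standard_normal(shape, dtype=np.float32) * 0.02

      writer = gguf.GGUFWriter("synthetic.gguf", "llama")
      writer.add_architecture()
      writer.add_block_count(n_layers)
      writer.add_embedding_length(dim)
      writer.add_head_count(n_heads)

      writer.add_tensor("token_embd.weight", rand(vocab, dim))
      for i in range(n_layers):
          for part in ("attn_q", "attn_k", "attn_v", "attn_output"):
              writer.add_tensor(f"blk.{i}.{part}.weight", rand(dim, dim))
      writer.add_tensor("output.weight", rand(vocab, dim))

      writer.write_header_to_file()
      writer.write_kv_data_to_file()
      writer.write_tensors_to_file()
      writer.close()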

    —⁂—

    ¹ Eww, GitHub is using the “MB” suffix for its file sizes, but they’re actually mebibytes (2²⁰ bytes, 1048576 bytes, MiB). I thought we’d basically settled on returning the M/mega- prefix to SI with its traditional 10⁶ definition, at least for file sizes, ten or fifteen years ago.

  • FloatArtifact 6 hours ago
    I've been waiting for something like this. Have you considered the following based on the benchmark data that's submitted beyond the GPU?

    1. The user selects a model, size, token output speed, and latency, and the website generates a list of hardware components that should meet those performance requirements.

    2. The user selects hardware components and the website generates a list of models that perform well on that hardware.

    3. Monetize through affiliate links for the components to fund the project, like PCPartPicker.

    I know there's going to be some variability in the benchmarks due to the software stack, but it should give AI enthusiasts an educated perspective on what hardware is relevant for their use case.
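    A sketch of what idea 2 could look like against the submitted data — the schema, field names, and numbers below are invented for illustration, not LocalScore's actual fields:

      # Hypothetical sketch of idea 2: filter submitted benchmark rows down to
      # accelerators that hit a target generation speed for a chosen model.
      # Field names and example numbers are invented for illustration.
      from dataclasses import dataclass

      @dataclass
      class Result:
          accelerator: str
          model: str
          prompt_tps: float   # prompt processing, tokens/s
          gen_tps: float      # text generation, tokens/s

      def hardware_for(results: list[Result], model: str, min_gen_tps: float) -> list[str]:
          """Return accelerators with at least one submitted result meeting the target."""
          return sorted(
              {r.accelerator for r in results if r.model == model and r.gen_tps >= min_gen_tps}
          )

      results = [
          Result("RTX 4090", "Qwen2.5-14B-Q4_K_M", 2500, 55),
          Result("M4 Pro", "Qwen2.5-14B-Q4_K_M", 400, 18),
      ]
      print(hardware_for(results, "Qwen2.5-14B-Q4_K_M", min_gen_tps=30))  # ['RTX 4090']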

  • jsatok 15 hours ago
    Contributed scores for the M3 Ultra 512 GB unified memory: https://www.localscore.ai/accelerator/404

    Happy to test larger models that utilize the memory capacity if helpful.

    • deanputney 9 hours ago
      That's very interesting. I guess it just can't compete with any of the Nvidia cards? I would think your results should show up if sorted by "generation" – maybe the leaderboard is cached...
  • mentalgear 21 hours ago
    Congrats on the effort - the local-first / private space needs more performant AI, and AI in general needs more comparable and trustworthy benchmarks.

    Notes:
    - Ollama integration would be nice.
    - Is there anonymous federated score sharing? That way, users could approximate a model's performance before downloading it.

  • omneity 11 hours ago
    This is great, congrats for launching!

    A couple of ideas: I would like to benchmark a remote headless server, as well as different methods of running the LLM (vLLM vs TGI vs llama.cpp ...) on my local machine, and in either case llamafile is quite limiting. Connecting over an OpenAI-like API instead would be great!
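    For the remote/headless case, such a mode could talk to any OpenAI-compatible endpoint. A minimal sketch — the endpoint URL and model name are placeholders, and this deliberately ignores streaming:

      # Hypothetical sketch of benchmarking over an OpenAI-compatible API
      # (vLLM, TGI, llama.cpp server, ...). Endpoint and model are placeholders.
      # Note: a non-streaming request lumps prompt processing and time-to-first-
      # token into one measurement; streaming would let you separate them.
      import time
      import requests

      BASE_URL = "http://my-headless-box:8000/v1"   # placeholder
      MODEL = "my-model"                            # placeholder

      def bench_generation(prompt: str, max_tokens: int = 256) -> float:
          """Return generated tokens per second for one non-streaming request."""
          t0 = time.monotonic()
          resp = requests.post(
              f"{BASE_URL}/chat/completions",
              json={
                  "model": MODEL,
                  "messages": [{"role": "user", "content": prompt}],
                  "max_tokens": max_tokens,
              },
              timeout=300,
          )
          elapsed = time.monotonic() - t0
          completion_tokens = resp.json()["usage"]["completion_tokens"]
          return completion_tokens / elapsed

      print(f"{bench_generation('Write a short story about a robot.'):.1f} tok/s")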

  • jborichevskiy 4 days ago
    Congrats on launching!

    Stoked to have this dataset out in the open. I submitted a bunch of tests for some models I'm experimenting with on my M4 Pro. Rather paltry scores compared to having a dedicated GPU but I'm excited that running a 24B model locally is actually feasible at this point.
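    For anyone wondering why a 24B model fits, a back-of-envelope estimate, assuming a ~4.5 bits/weight quantization and ignoring KV cache and runtime overhead:

      # Rough memory estimate for a 24B-parameter model at ~4.5 bits/weight
      # (e.g. a Q4_K_M-style GGUF); KV cache and overhead are ignored.
      params = 24e9
      bits_per_weight = 4.5
      weight_bytes = params * bits_per_weight / 8
      print(f"~{weight_bytes / 2**30:.1f} GiB of weights")  # ~12.6 GiB, well within an M4 Pro's unified memory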

  • sharmasachin98 8 hours ago
    This looks super useful, especially with so many folks experimenting with local LLMs now. Curious how well it handles edge devices. Will give it a try!
  • gunalx 7 hours ago
    Why choose a combination of Llama and Qwen when you could have used just Qwen models, with their more permissive license?
  • ftbsqcfjm 14 hours ago
    Interesting approach to making local recommendations more personalized and relevant. I'm curious about the cold start problem for new users and how the platform handles privacy. Partnering with local businesses to augment data could be a smart move. Will be watching to see how this develops!
  • roxolotl 18 hours ago
    This is super cool. I finally upgraded my desktop, and one thing I'm curious to do with it is run local models. Of course the RAM is late, so I've been googling to get an idea of what I could expect, and there's not much out there to compare against unless you're running state-of-the-art stuff.

    I'll make sure to run it and contribute my benchmark to this once my RAM comes in.

  • alchemist1e9 20 hours ago
    Really awesome project!

    Clicking on a GPU gives a nice, simple visualization. I was thinking you could maybe make that kind of visual representation intuitively accessible right on the landing page.

    cpubenchmark.net could be an example of how to draw the site visitor into the paradigm.