Ggml.ai joins Hugging Face to ensure the long-term progress of Local AI (github.com)

348 points by lairv 3 hours ago | 23 comments

mythz 3 hours ago
I consider HuggingFace more "Open AI" than OpenAI - one of the few quiet heroes (along with Chinese OSS) helping bring on-premise AI to the masses.
I'm old enough to remember when traffic was expensive, so I've no idea how they've managed to offer free hosting for so many models. Hopefully it's backed by a sustainable business model, as the ecosystem would be meaningfully worse without them.
We still need good value hardware to run Kimi/GLM in-house, but at least we've got the weights and distribution sorted.
- data-ottawa 2 hours ago
  Can we toss in the work unsloth does too as an unsung hero?
  They provide excellent documentation and they’re often very quick to get high quality quants up in major formats. They’re a very trustworthy brand.
  - disiplus 2 hours ago
    Yeah, they're the good guys. I suspect the open source work is mostly advertisements for them to sell consulting and services to enterprises. Otherwise, the work they do doesn't make sense to offer for free.
  - cubie 2 hours ago
    I'm a big fan of their work as well, good shout.
- Tepix an hour ago
  It's insane how much traffic HF must be pushing out of the door. I routinely download models that are hundreds of gigabytes in size from them. A fantastic service to the sovereign AI community.
- zozbot234 2 hours ago
  > We still need good value hardware to run Kimi/GLM in-house
  If you stream weights in from SSD storage and freely use swap to extend your KV cache it will be really slow (multiple seconds per token!) but run on basically anything. And that's still really good for stuff that can be computed overnight, perhaps even by batching many requests simultaneously. It gets progressively better as you add more compute, of course.
  - HPsquared 2 hours ago
    At a certain point the energy starts to cost more than renting some GPUs.
- sowbug 2 hours ago
  Why doesn't HF support BitTorrent? I know about hf-torrent and hf_transfer, but those aren't nearly as accessible as a link in the web UI.
  - embedding-shape an hour ago
    > Why doesn't HF support BitTorrent?
    Harder to track downloads then. Only when clients hit the tracker would they be able to get download stats, and forget about private repositories or the "gated" ones that Meta/Facebook does for their "open" models.
    Still, if vanity metrics weren't so important, it'd be a great option. I've even thought of creating my own torrent mirror of HF as a public service, since access to models will eventually be restricted, and it would be nice to be a bit better prepared for that moment.
    sowbug an hour ago
    I thought of the tracking and gate questions, too, when I vibed up an HF torrent service a few nights ago. (Super annoying BTW to have to download the files just to hash the parts, especially when webseeds exist.) Model owners could disable or gate torrents the same way they gate the models, and HF could still measure traffic by .torrent downloads and magnet clicks.
    It's a bit like any legalization question -- the black market exists anyway, so a regulatory framework could bring at least some of it into the sunlight.
    embedding-shape an hour ago
    > Model owners could disable or gate torrents the same way they gate the models, and HF could still measure traffic by .torrent downloads and magnet clicks.
    But that'll only stop a small part: anyone could share the infohash, and if you're using the DHT/magnet links without .torrent files or clicks on a website, no one can count those downloads unless they too scrape the DHT for peers reporting that they've completed the download.
    sowbug 43 minutes ago
    Right, but that's already happening today. That's the black-market point.
- Fin_Code an hour ago
  I still don't know why they are not distributing via torrent. It's the perfect use case.
  - freedomben an hour ago
    That would shut out most people working for big corp, which is probably a huge percentage of the user base. It's dumb, but that's just the way corp IT is (no torrenting allowed).
    zozbot234 an hour ago
    It's a sensible option, even when not everyone can really use it. Linux distros are routinely transferred via torrent, so why not other massive, openly licensed data?
    freedomben an hour ago
    Oh, as an option, yeah, I agree it makes a ton of sense. I would just expect a very, very small percentage of people to use the torrent over the direct download. With Linux distros, the vast majority of downloads still come from standard web servers. When I download distro images I opt for torrents, but very few people do the same.
    zrm 19 minutes ago
    With Linux distros they typically put the web link right on the main page and have a torrent available if you go look for it, because they want you to try their distro more than they want to save some bandwidth.
    I suppose HF would do the opposite, because the bandwidth saved is much larger and they're less concerned that you might download a different model from someone else.
  - heliumtera an hour ago
    How can you be the man in the middle in a truly P2P environment?
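For context on sowbug's remark about having to download the files just to hash the parts: a BitTorrent v1 torrent (BEP 3) stores a SHA-1 digest of every fixed-size piece of the concatenated file data, so whoever builds the .torrent needs every byte in hand. Below is a minimal Python sketch of only that piece-hashing step. It assumes a 4 MiB piece length (an arbitrary choice here), leaves out bencoding, webseeds and v2 (BEP 52) hashing, and `piece_hashes` is a hypothetical helper name, not part of any HF tool.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def piece_hashes(paths: Iterable[Path], piece_length: int = 4 * 1024 * 1024) -> bytes:
    """Return the concatenated 20-byte SHA-1 digests of each piece.

    The files are treated as one contiguous byte stream in the given order,
    so a single piece may span the end of one file and the start of the next.
    """
    digests = bytearray()
    buf = bytearray()
    for path in paths:
        with open(path, "rb") as f:
            # Read in 1 MiB chunks so large weight files never sit fully in RAM.
            while chunk := f.read(1 << 20):
                buf.extend(chunk)
                while len(buf) >= piece_length:
                    digests += hashlib.sha1(buf[:piece_length]).digest()
                    del buf[:piece_length]
    if buf:  # the final piece may be shorter than piece_length
        digests += hashlib.sha1(buf).digest()
    return bytes(digests)
```

Because pieces can straddle file boundaries, the files must be passed in the same order they are listed in the torrent's info dictionary, otherwise every hash after the first boundary changes.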
sheepscreek a minute ago
Curious about the financials behind this deal. Did they close above what they raised? What’s in it for HuggingFace?
HanClinto 3 hours ago
I'm regularly amazed that HuggingFace is able to make money. It does so much good for the world.
How solid is its business model? Is it long-term viable? Will they ever "sell out"?
- microsoftedging an hour ago
  FT had a solid piece a few weeks back: "Why AI start-up Hugging Face turned down a $500mn Nvidia deal"
  https://giftarticle.ft.com/giftarticle/actions/redeem/9b4eca...
  - jackbravo an hour ago
    sounds very interesting, but even though it says giftarticle.ft, I got blocked by a paywall.
    nerevarthelame 36 minutes ago
    https://archive.is/zSyUc
    To summarize, they rejected Nvidia's offer because they didn't want one outsized investor who could sway decisions. And "the company was also able to turn down Nvidia due to its stable finances. Hugging Face operates a 'freemium' business model. Three per cent of customers, usually large corporations, pay for additional features such as more storage space and the ability to set up private repositories."
    bee_rider 20 minutes ago
    Freemium seems to be working pretty well for them. What's the alternative website, after all? They seem to command their niche.
- dmezzetti 3 hours ago
  They have paid hosting - https://huggingface.co/enterprise and paid accounts. Also consulting services. Seems like a pretty good foundation to me.
  - julien_c an hour ago
    and a lot of traction on paid (private in particular) storage these days; sneak peek at new landing page: https://huggingface.co/storage
- heliumtera an hour ago
  >Will they ever "sell out"?
  Oh no, never. Don't worry, the usual investors are very well known for fighting for user autonomy (AMD, Nvidia, Intel, IBM, Qualcomm)
  They are all very pro consumers and all backers are certainly here for your enjoyment only
  - zozbot234 an hour ago
    These are all big hardware firms, which makes a lot of sense as a classic 'commoditize the complement' play. Not exactly pro-consumer, but not quite anti-consumer either!
- I_am_tiberius 3 hours ago
  I once tried Hugging Face because I wanted to work through some tutorial. As far as I remember, they wanted my credit card details during registration. After a month they invoiced me some amount of money and I had no idea what it was for. To be honest, I don't understand exactly what they do or what services I was paying for, but I cancelled my account and never touched it again. For me that was a totally opaque process.
  - shafyy 3 hours ago
    Their pricing seems pretty transparent: https://huggingface.co/pricing
jgrahamc an hour ago
This is great news. I've been sponsoring ggml/llama.cpp/Georgi since 2023 via Github. Glad to see this outcome. I hope you don't mind Georgi but I'm going to cancel my sponsorship now you and the code have found a home!
mnewme 3 hours ago
Huggingface is the silent GOAT of the AI space, such a great community and platform
- lairv 3 hours ago
  Truly amazing that they've managed to build an open and profitable platform without shady practices
  - al_borland 2 hours ago
    It’s such a sad state of affairs when shady practices are so normal that finding a company without them is noteworthy.
beoberha 3 hours ago
Seems like a great fit - kinda surprised it didn’t happen sooner. I think we are deep in the valley of local AI, but I’d be willing to bet it breaks out in the next 2-3 years. Here’s hoping!
tkp-415 2 hours ago
Can anyone point me in the direction of getting a model to run locally and efficiently inside something like a Docker container on a system without much computing power (e.g. an M1 MacBook with 8GB of memory)?
Is my only option to invest in a system with more computing power? These local models look great, especially something like https://huggingface.co/AlicanKiraz0/Cybersecurity-BaronLLM_O... for assisting in penetration testing.
I've experimented with a variety of configurations on my local system, but in the end it turns into a makeshift heater.
- mft_ 2 hours ago
  There’s no way around needing a powerful-enough system to run the model. So you either choose a model that can fit on what you have —i.e. via a small model, or a quantised slightly larger model— or you access more powerful hardware, either by buying it or renting it. (IME you don’t need Docker. For an easy start just install LM Studio and have a play.)
  I picked up a second-hand 64GB M1 Max MacBook Pro a while back for not too much money for such experimentation. It’s sufficiently fast at running any LLM models that it can fit in memory, but the gap between those models and Claude is considerable. However, this might be a path for you? It can also run all manner of diffusion models, but there the performance suffers (vs. an older discrete GPU) and you’re waiting sometimes many minutes for an edit or an image.
  - ryandrake an hour ago
    I wasn't able to have very satisfying success until I bit the bullet and threw a GPU at the problem. Found an actually reasonably priced A4000 Ada generation 20GB GPU on eBay and never looked back. I still can't run the insanely large models, but 20GB should hold me over for a while, and I didn't have to upgrade my 10 year old Ivy Bridge vintage homelab.
  - sigbottle 2 hours ago
    Are Mac kernels optimized compared to CUDA kernels? I know that the unified GPU approach is inherently slower, but I thought a ton of optimizations were at the kernel level too (CUDA itself is a moat).
- HanClinto an hour ago
  Maybe check out Docker Model Runner -- it's built on llama.cpp (in a good way -- not like Ollama) and handles I think most of what you're looking for?
  https://www.docker.com/blog/run-llms-locally/
  As far as how to find good models to run locally, I found this site recently, and I liked the data it provides:
  https://localclaw.io/
- zozbot234 2 hours ago
  The general rule of thumb is that you should feel free to quantize even as low as 2 bits average if this helps you run a model with more active parameters. Quantized models are not perfect at all, but they're preferable to the models with fewer, bigger parameters. With 8GB usable, you could run models with up to 32B active at heavy quantization.
- xrd 2 hours ago
  I think a better bet is to ask on Reddit.
  https://www.reddit.com/r/LocalLLM/
  Every time I ask the same thing here, people point me there.
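zozbot234's figure (at about 2 bits per weight, 8GB holds the weights of a model with up to 32B parameters) follows from the weights taking roughly parameters times bits divided by 8 bytes. A back-of-the-envelope Python sketch, counting the weights only; the model sizes and bit widths are illustrative assumptions, not measurements of any particular model or quant format. It ignores the KV cache, runtime buffers and whatever the OS keeps for itself, so weights that come to exactly 8 GB on paper will not actually fit on an 8GB machine.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9  # bits -> bytes -> GB


# Illustrative (parameters in billions, average bits per weight) pairs.
for params, bits in [(32, 2.0), (14, 4.0), (8, 8.0)]:
    size = weight_footprint_gb(params, bits)
    print(f"{params}B at {bits} bits/weight: ~{size:.1f} GB of weights")
```

The same arithmetic shows the trade-off in the rule of thumb: halving the bits per weight lets roughly twice as many parameters fit in the same memory.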
androiddrew 2 hours ago
One of the few acquisitions I do support
the__alchemist 3 hours ago
Does anyone have a good comparison of HuggingFace/Candle to Burn? I am testing them concurrently, and Burn seems to have an easier-to-use API. (And can use Candle as a backend, which is confusing) When I ask on Reddit or Discord channels, people overwhelmingly recommend Burn, but provide no concrete reasons beyond "Candle is more for inference while Burn is training and inference". This doesn't track, as I've done training on Candle. So, if you've used both: Thoughts?
- csunoser 37 minutes ago
  I have used both (albeit 2 years ago, and things change really fast). At the time, Candle didn't have 2D conv backprop with strides properly implemented. And getting Burn running on the libtorch (tch) backend was just a lot simpler.
  I did use candle for wasm based inference for teaching purposes - that was reasonably painless and pretty nice.
periodjet an hour ago
Prediction: Amazon will end up buying HuggingFace. Screenshot this.
jimmydoe 3 hours ago
Amazing. I like the openness of both projects and am really excited for them.
Hopefully this doesn't mean consolidation because resources are drying up, but a true fusion of the best of both.
segmondy 2 hours ago
Great news! I have always worried about ggml and its long-term prospects, and wished for them to be rewarded for their effort.
stephantul an hour ago
Georgi is such a legend. Glad to see this happening
dhruv3006 2 hours ago
Huggingface is actually something that's driving good in the world. Good to see this collab.
superkuh an hour ago
I'm glad llama.cpp and ggml are getting consistent, reliable economic backing. I'm glad that ggerganov is getting rewarded for making such excellent tools.
I am somewhat anxious about "integration with the Hugging Face transformers library" and possible python ecosystem entanglements that might cause. I know llama.cpp and ggml already have plenty of python tooling but it's not strictly required unless you're quantizing models yourself or other such things.
dmezzetti 3 hours ago
This is really great news. I've been one of the strongest supporters of local AI, dedicating thousands of hours to building a framework to enable it. I'm looking forward to seeing what comes of it!
- logicallee 2 hours ago
  >I've been one of the strongest supporters of local AI, dedicating thousands of hours towards building a framework to enable it.
  Sounds like you're very serious about supporting local AI. I have a query for you (and anyone else who feels like donating) about whether you'd be willing to donate some memory/bandwidth resources p2p to hosting an offline model:
  We have a local model we would like to distribute but don't have a good CDN.
  As a user/supporter question: would you be willing to donate some spare memory/bandwidth via a simple dedicated browser tab that you keep open on your desktop? The tab would play silent audio (so it isn't put in the background and unloaded), allocate 100 MB to 1 GB of RAM, and act as a WebRTC peer serving checksummed models.[1] (Our server then only has to check from time to time that you still have the file, by sending you a salt and a part of the file to hash; your tab proves it still has the data by doing so.) This doesn't require any trust, and the receiving user will also hash the file and report any mismatch.
  Our server federates the P2P connections, so when someone downloads, they do so from a trusted peer (one who has contributed and passed the audits) like you. We considered building a binary for people to run, but we concluded that people couldn't trust our binaries, or that someone would target our build process somehow. We are paranoid about trust, whereas a web model is inherently untrusted and therefore safer. Why do all this?
  The purpose of this would be to host an offline model: we successfully ported a 1 GB model from C++ and Python to WASM and WebGPU (you can see Claude doing so here, we livestreamed some of it[2]), but the model weights at 1 GB are too much for us to host.
  Please let us know whether this is something you would contribute a background tab to hosting on your desktop. It wouldn't impact you much and you could set how much memory to dedicate to it, but you would have the good feeling of knowing that you're helping people run a trusted offline model if they want - from their very own browser, no download required. The model we ported is fast enough for anyone to run on their own machines. Let me know if this is something you'd be willing to keep a tab open for.
  [1] filesharing over webrtc works like this: https://taonexus.com/p2pfilesharing/ you can try it in 2 browser tabs.
  [2] https://www.youtube.com/watch?v=tbAkySCXyp0 and some other videos
  - HanClinto an hour ago
    Couldn't you upload the model weights for a project like this to a Space on Hugging Face?
    What services would you need that Hugging Face doesn't provide?
  - echoangle an hour ago
    Maybe stupid question but why not just put it in a torrent?
    logicallee 33 minutes ago
    Torrents require users to download and install a torrent client! In addition, we would like to keep the ability to push live updates to the latest version of a sovereign fine-tuned file; torrents don't auto-update. We want to keep improving what people get.
    Finally, we would like the possibility of setting up market dynamics in the future: if you aren't currently using all your RAM, why not rent it out? This matches the P2P edge architecture we envision.
    In addition, our work on WebGPU would allow you to rent out your GPU to a background tab whenever you're not using it. Why have all that silicon sit idle when you could rent it out?
    You could also donate it to help fine tune our own sovereign model.
    All of this will let us bootstrap to the point where we could be trusted with a download.
    We have a rather paranoid approach to security.
  - liuliu an hour ago
    > We have a local model we would like to distribute but don't have a good CDN.
    That is not true. I am serving models off Cloudflare R2. It is 1 petabyte per month in egress use and I basically pay peanuts (~$200 everything included).
    logicallee 29 minutes ago
    1 petabyte per month is 1 million downloads of a 1 GB file. We intend to scale to more than 1 million downloads per month. We have a specific scaling architecture in mind. We're qualified to say this because we've ported a billion parameter model to run in your browser - fast - on either webgpu or wasm. (You can see us doing it live at the youtube link in my comment above.) There is a lot of demand for that.
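The audit logicallee describes (the server sends a salt plus a region of the file, and the peer proves possession by hashing them) is a standard challenge-response check. A minimal Python sketch, assuming SHA-256, a 16-byte random salt and a 64 KiB audited range; the function names are hypothetical and this is not their actual implementation.

```python
import hashlib
import os
import secrets


def make_challenge(file_size: int, span: int = 64 * 1024) -> tuple[bytes, int, int]:
    """Server side: pick a fresh random salt and a random byte range to audit."""
    salt = secrets.token_bytes(16)
    length = min(span, file_size)
    offset = secrets.randbelow(file_size - length + 1)
    return salt, offset, length


def prove(data: bytes, salt: bytes, offset: int, length: int) -> bytes:
    """Peer side: hash the salt together with the requested slice of the file."""
    return hashlib.sha256(salt + data[offset:offset + length]).digest()


def verify(reference: bytes, salt: bytes, offset: int, length: int, proof: bytes) -> bool:
    """Server side: recompute the answer from its own copy and compare."""
    expected = prove(reference, salt, offset, length)
    return secrets.compare_digest(expected, proof)


if __name__ == "__main__":
    model = os.urandom(1_000_000)  # stand-in for a weights file
    salt, off, n = make_challenge(len(model))
    print(verify(model, salt, off, n, prove(model, salt, off, n)))
```

The fresh salt stops a peer from replaying a digest it computed once and cached, but each audit only proves possession of the sampled range, so it has to be repeated with new random ranges, and the verifier needs its own copy of the file (or precomputed challenge/answer pairs) to check the reply.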
ukblewis 13 minutes ago
Honestly, I'm shocked to be the only one I see with this opinion: HuggingFace's `accelerate`, `transformers` and `datasets` are some of the worst open source Python libraries I have ever had to use. They constantly break backwards compatibility, even on APIs that are not underscore/dunder-prefixed, and even in minor version releases, without documenting it. They refuse PRs that fix their missing `overloads` type annotations, which breaks type checking against their libraries, and their code generally seems like spaghetti. I am not excited that another team is joining them and consolidating more engineering might in the hands of these people.
- ukblewis 11 minutes ago
  And clearly I say all of this in my own name and not my employer's.
- ukblewis 12 minutes ago
  And I said all of that despite us continuing to use their platform and libraries extensively… We just don’t have a choice due to their dominance of open source ML
geooff_ 3 hours ago
As someone who's been in the "AI" space for a while, it's strange how Hugging Face went from one of the biggest names to not being part of the discussion at all.
- r_lee 3 hours ago
  I think that's because there's less local AI usage now since there's all kinds of image models by the big labs, so there's really no rush of people self hosting stable diffusion etc anymore
  the space moved from Consumer to Enterprise pretty fast due to models getting bigger
  - zozbot234 3 hours ago
    Today's free models are not really bigger when you account for the use of MoE (with ever increasing sparsity, meaning a smaller fraction of active parameters), and better ways of managing KV caching. You can do useful things with very little RAM/VRAM, it just gets slower and slower the more you try to squeeze it where it doesn't quite belong. But that's not a problem if you're willing to wait for every answer.
- segmondy 2 hours ago
  part of what discussion? anyone in the AI space knows and uses HF, but the public doesn't care, and why should they? It's just an advanced site where nerds download AI stuff. HF is super valuable with their transformers library, their code, tutorials, smol-models, etc., but how does that translate to investor dollars?
- LatencyKills 3 hours ago
  It isn't necessary to be part of the discussion if you are truly adding value (which HF continues to do). It's nice to see a company doing what it does best without constantly driving the hype train.
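zozbot234's two comments in this thread (streaming weights from SSD at multiple seconds per token, and MoE keeping the active fraction of parameters small) can be tied together with a rough estimate: if a token's active weights have to be read from storage, the time per token is about active parameters times bits divided by 8, over the drive's read bandwidth. A Python sketch with assumed numbers (32B active parameters, 4-bit weights, 3 GB/s reads). It ignores compute, KV cache traffic, random-read penalties and the fact that consecutive tokens often reuse the same experts, so treat it as an order-of-magnitude estimate rather than a bound.

```python
def seconds_per_token(active_params_billion: float,
                      bits_per_weight: float,
                      read_gb_per_s: float,
                      cached_fraction: float = 0.0) -> float:
    """Rough time to stream one token's active weights from storage.

    cached_fraction is the assumed share of those bytes already resident in
    RAM (for example attention and shared layers), which is not re-read.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    bytes_per_token = active_params_billion * 1e9 * bits_per_weight / 8
    streamed_bytes = bytes_per_token * (1.0 - cached_fraction)
    return streamed_bytes / (read_gb_per_s * 1e9)


print(f"{seconds_per_token(32, 4, 3.0):.1f} s/token with nothing cached")
print(f"{seconds_per_token(32, 4, 3.0, cached_fraction=0.5):.1f} s/token with half cached")
```

Under these assumptions the result lands in the "multiple seconds per token" range zozbot234 mentions, and it shows why sparser MoE models (fewer active parameters per token) are what make this approach tolerable at all.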
option 2 hours ago
Isn't HF banned in China? Also, how are many Chinese labs on Twitter all the time?
In either case - huge thanks to them for keeping AI open!
- dragonwriter an hour ago
  > Isn't HF banned in China?
  I think, for some definition of “banned”, that’s the case. It doesn’t stop the Chinese labs from having organization accounts on HF and distributing models there. ModelScope is apparently the HF-equivalent for reaching Chinese users.
- disiplus 2 hours ago
  I think in the West we think everything is blocked. But for example, if you book an eSIM, when you visit you already get direct access to Western services because they route it to some other server. Hong Kong is totally different: they basically use WhatsApp and Google Maps, and everything worked when I was there.
  - embedding-shape an hour ago
    But also yes, parent is right: HF is more or less inaccessible, and ModelScope is frequently cited as the mirror to use (although many Chinese labs seem to treat HF as the mirror, and ModelScope as the "real" origin).
- woadwarrior01 2 hours ago
  HF is indeed banned in China. The Chinese equivalent of HF is ModelScope[1].
  [1]: https://modelscope.cn/
rvz 3 hours ago
This acquisition is almost the same as the acquisition of Bun by Anthropic.
Both are $0-revenue "companies", but both have created software that is essential to the wider ecosystem and has mindshare value: Bun for JavaScript and ggml for AI models.
But of course the VCs needed an exit sooner or later. That was inevitable.
- andsoitis 2 hours ago
  I believe ggml.ai was funded by angel investors, not VC.