It'd also need to be much more precise in hardware specs and cover a lot more models and their variants to be actually useful.
Grading the compatibility is also an absolute requirement – it's rarely a hard yes or no, but more often a question of available GPU memory. There are a lot of other factors too which don't seem to be considered.
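Even a crude grading would already be more useful than a binary answer. A minimal sketch of what I mean (the 1.2x overhead factor and the bytes-per-parameter figures are my own rough guesses, not anything the site appears to use):

    type Fit = "fits in VRAM (fast)" | "spills into system RAM (slow)" | "won't fit";

    // Rough grading: the weights have to fit somewhere; where they land decides speed.
    // bytesPerParam depends on quantization (e.g. ~0.6 for Q4-ish, 2 for FP16).
    function gradeFit(paramsB: number, bytesPerParam: number,
                      vramGB: number, ramGB: number): Fit {
      const weightsGB = paramsB * bytesPerParam;   // paramsB = billions of parameters
      const neededGB  = weightsGB * 1.2;           // + KV cache / runtime overhead, a guess
      if (neededGB <= vramGB) return "fits in VRAM (fast)";
      if (neededGB <= vramGB + ramGB) return "spills into system RAM (slow)";
      return "won't fit";
    }

    gradeFit(7, 0.6, 12, 16);   // 7B at ~Q4 on 12GB VRAM + 16GB RAM -> "fits in VRAM (fast)"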
Are you sure it's not powered by an LLM inside?
I can absolutely run models that this site says cannot be run. Shared RAM is a thing - even with limited VRAM, shared RAM can compensate to run larger models. (Slowly, admittedly, but they work.)
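This is basically what llama.cpp-style partial offload does: as many layers as fit go into VRAM, the rest run from system RAM. A back-of-envelope for the split (my own rough math, assuming roughly equal-sized layers):

    // How many transformer layers fit in VRAM; the rest stay in system RAM.
    function gpuLayerSplit(modelGB: number, nLayers: number, freeVramGB: number) {
      const gbPerLayer = modelGB / nLayers;
      const onGpu = Math.min(nLayers, Math.floor(freeVramGB / gbPerLayer));
      return { onGpu, onCpu: nLayers - onGpu };
    }

    gpuLayerSplit(40, 80, 12);  // ~40GB model, 80 layers, 12GB VRAM -> { onGpu: 24, onCpu: 56 }

The more layers end up on the CPU side, the slower it gets, but it still runs - which is exactly the "slowly, admittedly" case.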
> coined the term in February 2025
> Vibe coding is a new coding style [...] A programmer can describe a program in words and get an AI tool to generate working code, without requiring an understanding of the code. [...] [The programmer] surrenders to the "vibes" of the AI [without reading the resulting code.] When errors arise, he simply copies them into the system without further explanation.
> Yes, you can run this model! Your system has sufficient resources (16GB RAM, 12GB VRAM) to run the smaller distilled version (likely 7B parameters or less) of this model.
Last I checked DeepSeek R1 was a 671B model, not a 7B model. Was this site made with AI?
OP said they "vibe coded" it, so yes.
Here[0] are some 1.5B and 8B distilled+quantized derivatives of DeepSeek. However, I don’t find a 7B model; that seems made up out of whole cloth. Also, I personally wouldn’t call this 8B model “DeepSeek”.
0: https://www.reddit.com/r/LocalLLaMA/comments/1iskrsp/quantiz...
It's not technically the full R1 model; it's talking about the distillations, where DeepSeek trained Qwen and Llama models on R1 output.
> Yes, you can run this model! Your system has sufficient resources (16GB RAM, 12GB VRAM) to run this model.
No mention of distillations. This was definitely either made by AI, or by someone picking numbers for the models totally at random.
That's not ideal from a token-throughput perspective, but I can see gains in the minimum working set of weight memory if you can load pieces into VRAM for each token.
In my experience, LM Studio does a pretty great job of making this a non-issue. Also, whatever heuristics this site is based on are incorrect — I'm running models on a 64GB Mac Studio M1 Max that it claims I can't.
- When I select "no dedicated GPU" because mine isn't listed, it'll just answer the same "you need more (V)RAM" for everything I click. It might as well color those models red in the list already, or at minimum show the result without having to click "Check" after selecting everything. The UX flow isn't great
- I have 24GB RAM (8GB fixed soldered, extended with 1x16GB SO-DIMM), but that's not an option to select. Instead of using a dropdown for a number, maybe make it a numeric input field, optionally with a slider like <input type=range min=1 max=128 step=2>, or mention whether to round up or down when one has an in-between value (I presume down? I'm not into this yet, that's why I'm here / why this site sounded useful)
- I'm wondering if this website could just be a table with like three columns (model name, minimum RAM, minimum VRAM). To answer my own question, I tried checking the source code, but it's obfuscated with no source map available, so I'm not sure whether this suggestion would work
- Edit2: while the tab is open, one CPU core is at 100%. That's impressive: browsers are supposed to not let a background tab fire timer code more than about once per second, and if it were an infinite loop the page would hang. WTF is this doing? When I pause the debugger at a random moment, it's in scheduler.production.min.js, according to the comment above the place where it drops me </edit2>.
Edit: thinking about this again...
what if you flip the whole concept?
1. Put in your specs
2. It shows a list of models you can run
The list could be sorted descending by size (presuming that loosely corresponds to best quality, per my layperson understanding). At the bottom, it could show a list of models that the website is aware of but that your hardware can't run.
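Roughly this, as a sketch (the model list and the per-model requirements are placeholders, not the site's actual data):

    interface Model { name: string; minRamGB: number; minVramGB: number; sizeB: number }

    // Split the known models into runnable / not runnable for the given specs;
    // runnable sorted largest-first as a loose proxy for quality.
    function modelsForSpecs(models: Model[], ramGB: number, vramGB: number) {
      const runnable = models
        .filter(m => ramGB >= m.minRamGB && vramGB >= m.minVramGB)
        .sort((a, b) => b.sizeB - a.sizeB);
      const tooBig = models.filter(m => !runnable.includes(m));
      return { runnable, tooBig };
    }

That would also double as the three-column table I mentioned above.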
Also, my Mac with 36 GB of memory can't be selected.
Ideal scenario (YMMV): add more hardware parameters (like chipset, CPU, actual RAM type/timings - with presets for the most common setups) and extra model settings (quantization and context size come to mind), then answer like this: "you have sufficient RAM to load the model, and you should expect performance around 10 tok/sec with 3s to the first token". Or maybe rather list all the models you know about and give the expected performance for each. An inverse search ("what rig do I need to run this model with at least this performance?") would also be very cool. It might also be nice to be able to parse the output of common system information tools (like Windows wmic/Get-ComputerInfo, macOS system_profiler or GNU/Linux dmidecode - not sure if all the info is there, but as a rough idea: give the user some commands to run, then parse their output in search of the specs).
Of course, this would be very non-trivial to implement and you'll probably have to dig a lot for anecdotal data on how various hardware performs (hmm... maybe a good task for an agentic LLM?), but that would actually make this a serious tool that people can use and link to, rather than a toy.
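For the tok/sec part there's at least a decent first-order approximation: single-stream generation is usually memory-bandwidth-bound, so tokens/sec is roughly usable memory bandwidth divided by the bytes read per token (about the size of the active weights). Something like this (the 0.6 efficiency factor is a guess):

    // Very rough single-user decode estimate: bandwidth-bound, no batching.
    function estimateTokPerSec(weightsGB: number, memBandwidthGBs: number,
                               usableFraction = 0.6 /* real-world efficiency, a guess */) {
      return (memBandwidthGBs * usableFraction) / weightsGB;
    }

    estimateTokPerSec(4.2, 400);  // ~4GB quantized model at 400 GB/s -> ~57 tok/s

Time to first token is a different story (prefill is more compute-bound), and you'd still need anecdotal benchmarks to calibrate that efficiency factor per backend - which is where the digging comes in.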