Every model with ~4B parameters runs perfectly fine on even a GeForce 1070 Mobile GPU with 8GB of memory.
If you have some patience you can probably go a little crazy and run a model with ~27B parameters on a Radeon 890M with 32GB of memory as well (which means you'll probably have to get about 96GB of system memory if you want to get some work done too, but oh well).
In theory you could even run a model which fits in 64GB of video memory on that "little" GPU (with 128GB of system memory).
No, you can't run something like Grok 2 (whose quantized models start at around 82GB and go up from there), but why on earth would you ever want to run something like that locally?
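For reference, the back-of-the-envelope math behind those numbers. This little sketch assumes 4-bit quantization and roughly 20% overhead for KV cache and runtime; actual usage depends on the quant and context length:

    # Rough VRAM estimate: parameters x bytes per weight, plus ~20% overhead.
    # Assumed numbers, just to show the back-of-the-envelope math.
    def approx_vram_gb(params_billion, bits_per_weight=4, overhead=1.2):
        bytes_total = params_billion * 1e9 * (bits_per_weight / 8)
        return bytes_total * overhead / 1e9

    print(approx_vram_gb(4))    # ~4B model at 4-bit  -> ~2.4 GB, fits in 8GB easily
    print(approx_vram_gb(27))   # ~27B model at 4-bit -> ~16.2 GB, fits in 32GB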
One of the things I am currently experimenting with is building out my own agentic/assisted computing environment which, instead of extending into Google/Microsoft/Apple-owned cloud-based services, extends into services running in my homelab.
As a simple example: a local model that hooks into an MCP service which understands calendars and appointments and talks to my own locally hosted Radicale CalDAV service, enabling me to quickly make an appointment through text (or possibly even STT later). I'm curious how far I can get in making something like Thunderbird disappear.
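A minimal sketch of what such a bridge could look like, using the Python MCP SDK (the "mcp" package) and the caldav library; the URL, credentials and the "first calendar" lookup are placeholders, not a finished implementation:

    # Sketch: expose "create_appointment" as an MCP tool backed by a Radicale CalDAV server.
    # Assumes the Python MCP SDK ("mcp") and the "caldav" library; URL/credentials are placeholders.
    from datetime import datetime, timedelta

    import caldav
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("calendar")

    RADICALE_URL = "https://radicale.example.lan/user/"  # placeholder
    USERNAME = "user"
    PASSWORD = "secret"

    @mcp.tool()
    def create_appointment(summary: str, start_iso: str, duration_minutes: int = 60) -> str:
        """Create a calendar event on the local Radicale CalDAV server."""
        start = datetime.fromisoformat(start_iso)
        end = start + timedelta(minutes=duration_minutes)
        client = caldav.DAVClient(url=RADICALE_URL, username=USERNAME, password=PASSWORD)
        calendar = client.principal().calendars()[0]  # first calendar, for simplicity
        calendar.save_event(dtstart=start, dtend=end, summary=summary)
        return f"Created '{summary}' at {start.isoformat()}"

    if __name__ == "__main__":
        mcp.run()  # stdio transport by default

The model never talks CalDAV itself; it only sees a "create_appointment" tool with a summary, a start time and a duration, which keeps the prompt-facing surface small.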
A somewhat more advanced example: an idea that popped up recently, which I'm quite excited about and hope will work out, is teaching a model the concepts of a "package repository", a "package manager" and "systems", which (hopefully) means I can install, uninstall, update and track the status of software packages on my Linux systems without using the terminal or shelling into a system myself.
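A rough sketch of what that could look like, again as MCP tools, shelling out to apt/dpkg on the host the server runs on; the Debian-ish tooling and the privilege handling (no sudo here) are assumptions:

    # Sketch: let a model query and install packages through MCP tools instead of a terminal.
    # Wraps plain dpkg/apt-get calls via subprocess; assumes a Debian-ish system and that
    # the MCP server itself runs with the needed privileges.
    import subprocess

    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("packages")

    @mcp.tool()
    def package_status(name: str) -> str:
        """Return the installed version of a package, or 'not installed'."""
        result = subprocess.run(
            ["dpkg-query", "-W", "-f=${Version}", name],
            capture_output=True, text=True,
        )
        return result.stdout if result.returncode == 0 else "not installed"

    @mcp.tool()
    def install_package(name: str) -> str:
        """Install a package non-interactively with apt-get."""
        result = subprocess.run(
            ["apt-get", "install", "-y", name],
            capture_output=True, text=True,
        )
        return result.stdout[-2000:] if result.returncode == 0 else result.stderr[-2000:]

    if __name__ == "__main__":
        mcp.run()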
Summarized: I think some of the things Big Tech wants to build are pretty neat, but I would like something like that without the heavy involvement of Big Tech (and/or subscription-based computing).
What I can see myself trying is some new ways of working with bodies of text notes. Local RAG for chatting with documents is also interesting.
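The retrieval half of that can stay entirely local. A bare-bones sketch, assuming sentence-transformers with a small CPU-friendly embedding model; the note chunks and the downstream "ask the local LLM" step are placeholders:

    # Bare-bones local RAG retrieval: embed note chunks, rank them by cosine similarity.
    # Assumes sentence-transformers; the chunks below are placeholders.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # small, runs fine on CPU

    chunks = [
        "Radicale is a small CalDAV/CardDAV server.",
        "The homelab backup job runs every night at 02:00.",
    ]  # placeholder note chunks

    chunk_vecs = model.encode(chunks, normalize_embeddings=True)

    def retrieve(question: str, k: int = 2) -> list[str]:
        """Return the k chunks most similar to the question."""
        q = model.encode([question], normalize_embeddings=True)[0]
        scores = chunk_vecs @ q  # cosine similarity, since vectors are normalized
        return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

    print(retrieve("When do backups run?"))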
And yes, with 'subscription-based computing' whatever shreds of privacy we had are gone.
One reason is that I tend to make significant use of pre-compiled libraries, so my build times tend to be reasonable.
And I also like the feedback from testing on a lower-powered machine. If it runs well on a low-end machine, better hardware is generally not a problem.
The reverse is often not the case. Software blunders can be completely masked with enough hardware.
But since it requires less than 16GB, the author is still right.