https://i.imgur.com/6TRe1NE.png
Thank you for posting! It's unbelievable how someone sometimes just drops something that fits right into what you're doing. However bizarre it seems.
I developed a browser-based CP/M emulator & IDE: https://lockboot.github.io/desktop/
I was going to post that, but wanted a 'cool demo' instead, and fell down the rabbit hole.
From what I remember of the TV show, most of what he investigates or talks about is indeed path dependence in one way or another, although not everything was like that.
The interaction is surprisingly good despite the lack of attention mechanism and the limitation of the "context" to trigrams from the last sentence.
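For anyone wondering what "trigrams from the last sentence" could look like as a model input, here is a minimal sketch. It assumes character trigrams hashed into a fixed-size count vector. The function names, the bucket count of 128, and the hash are all made up for illustration and are not taken from the project.

```python
import re

def last_sentence(text):
    """Return the final non-empty sentence of text (split on . ! ?)."""
    parts = [p.strip() for p in re.split(r"[.!?]+", text) if p.strip()]
    return parts[-1] if parts else ""

def char_trigrams(sentence):
    """Lowercase the sentence, pad with one space each side, slide a 3-char window."""
    padded = f" {sentence.lower()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def trigram_features(text, buckets=128):
    """Hash each trigram of the last sentence into a fixed-size count vector."""
    vec = [0] * buckets
    for tri in char_trigrams(last_sentence(text)):
        # Simple deterministic polynomial hash (a hypothetical choice).
        h = 0
        for ch in tri:
            h = (h * 31 + ord(ch)) % buckets
        vec[h] += 1
    return vec
```

Everything before the last sentence is thrown away, so the vector has the same size no matter how long the conversation gets. That is presumably what makes this kind of "context" cheap enough for very small hardware.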
This could have worked on 60s-era hardware and would have completely changed the world (and science fiction) back then. Great job.
Tin foil hat on: I think that a huge part of the major buyout of RAM by AI companies is to keep people from realising that we are essentially at the home computer revolution stage of LLMs. I have a 1 TB RAM machine which, with custom agents, outperforms all the proprietary models. It's private, secure and won't let me be monetized.
“Planting Undetectable Backdoors in Machine Learning Models”
“ … On the surface, such a backdoored classifier behaves normally, but in reality, the learner maintains a mechanism for changing the classification of any input, with only a slight perturbation. Importantly, without the appropriate "backdoor key", the mechanism is hidden and cannot be detected by any computationally-bounded observer. We demonstrate two frameworks for planting undetectable backdoors, with incomparable guarantees. …”
It could with a network this small. More generally this falls under "interpretability."
You can buy a kid’s Tiger Electronics-style toy that plays 20 questions.
It’s not like this LLM is a bastion of glorious efficiency, it’s just stripped down to fit on the hardware.
Slack/Teams handles company-wide video calls and can render anything a web browser can, and they run an entire App Store of apps, all from a cross-platform application.
Including Jira in the conversation doesn’t even make logical sense. It’s not a desktop application that consumes memory. Jira has such a wide scope that the word “Jira” doesn’t even describe a single product.
The 4th Gen iPod touch had 256 meg of RAM and also did those things, with video calling via FaceTime (and probably others, but I don't care). Well, except "cross platform", what with it being the platform.
That's a bug, not a feature, and strongly coupled to the root cause of Slack's bloat.
By itself, I would agree.
However, in this metaphor, concrete got 15x cheaper in the same timeframe. Not enough to fully compensate for the difference, but enough that a whole generation are now used to much larger edifices.
Biggest pain point is likely the text input.
Have you experimented with having it less quantized, and evaluated the quality drop?
Regardless, very cool project.
It depends on the model, but from my experiments (quantizing one layer of a model to 2-bit and then training the model with that layer in 2-bit to fix the damage) the first layer is the most sensitive, and yes, the last layer is sensitive too. The middle layers take quantization best.
Different components of a layer also have a different sensitivity; e.g. the MLP downscale block damages the model the most when quantized, while quantizing the Q projection in self attention damages the model the least.
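For readers who have not tried it, here is a rough sketch of what quantizing one weight matrix to 2 bits means. It assumes plain round-to-nearest with a per-row offset and scale onto 4 levels. The comment above does not say which scheme was used, the function names are hypothetical, and the retraining step is not shown.

```python
def quantize_row_2bit(row):
    """Map floats to 2-bit codes (0..3) using a per-row offset and scale."""
    lo, hi = min(row), max(row)
    scale = (hi - lo) / 3 or 1.0  # 4 levels means 3 steps; fall back to 1.0 for a constant row
    codes = [min(3, max(0, round((w - lo) / scale))) for w in row]
    return codes, lo, scale

def dequantize_row(codes, lo, scale):
    """Recover approximate floats from the 2-bit codes."""
    return [lo + c * scale for c in codes]

def mean_squared_error(a, b):
    """Average squared difference between two equal-length rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def fake_quantize(matrix):
    """Quantize then dequantize every row, as a stand-in for a 2-bit layer."""
    return [dequantize_row(*quantize_row_2bit(row)) for row in matrix]
```

Swapping one layer's weights for `fake_quantize(weights)` and comparing the model's loss before and after is one simple way to rank layers by sensitivity. The weight-level `mean_squared_error` alone is only a crude proxy for that.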
Even with modern supercomputing the computation would be outpaced by the heat death of the universe, so token output must be limited to a single integer.
Speaking of - I remember my first digital camera (Fujitsu 1Mb resolution using SmartMedia)… it used so much power that you could take 20-30 photos and then needed to replace all 4 batteries lol