I coded up the demo myself and didn't anticipate how disruptive the intermittent warning messages about waiting users would become. The demo is quite resource-intensive: each session currently requires its own H100 GPU, and I'm already using a dispatcher-worker setup with 8 parallel workers. Unfortunately, demand exceeded my setup, causing significant lag, and I had to limit sessions to an additional 60 seconds when others are waiting. Additionally, the underlying diffusion model itself is slow to run, resulting in a frame rate typically below 2 fps, further compounded by network bottlenecks.
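For anyone curious about the setup, here's a rough sketch of the dispatcher-worker pattern I described (not the actual demo code; the session object and its methods are hypothetical placeholders):

```python
import asyncio
import time

NUM_WORKERS = 8
SOFT_LIMIT_SECONDS = 60  # extra time granted once others are waiting


async def worker(gpu_id: int, queue: asyncio.Queue) -> None:
    """Each worker owns one GPU and serves one session at a time."""
    while True:
        session = await queue.get()
        deadline = None
        while session.active():  # hypothetical session API
            # Once someone else is waiting, start the 60-second countdown.
            if deadline is None and not queue.empty():
                deadline = time.monotonic() + SOFT_LIMIT_SECONDS
            if deadline is not None and time.monotonic() > deadline:
                session.warn_and_close()  # hypothetical: the warning message
                break
            await session.render_next_frame(gpu_id)  # hypothetical coroutine
        queue.task_done()


async def dispatcher(sessions) -> None:
    """Fan incoming sessions out to the fixed pool of GPU workers."""
    queue: asyncio.Queue = asyncio.Queue()
    workers = [asyncio.create_task(worker(i, queue)) for i in range(NUM_WORKERS)]
    for s in sessions:
        await queue.put(s)
    await queue.join()
```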
As for model capabilities, NeuralOS is indeed quite limited at this point (as acknowledged in my paper abstract). That's why the demo interactions shown in my tweet were minimal (opening Firefox, typing a URL).
Overall, this is meant as a proof-of-concept demonstrating the potential of generative, neural-network-powered GUIs. It's fully open-source, and I hope others can help improve it going forward!
Thanks again for the honest feedback.
Could you talk about your hopes for the future of this project? What are your thoughts on a more simplified interface that could combine inputs in a more abstract way, or are you only interested in simulating a traditional OS?
Thanks again.
PS: the waiting time while Firefox “loads” made me laugh. I presume this is also simulated.
However, my real dream behind this project is to blur the boundaries across applications, not just simulate traditional OS interactions. For example, imagine converting a movie we're watching directly into an interactive video game, or instantly changing the interface of an app (like Signal) to something we prefer (like Facebook Messenger) on the fly.
Of course, the current training data severely limits what's achievable today. But looking forward, I envision combining techniques from controllable text generation (such as Zhiting Hu's "Toward Controlled Generation of Text" paper) or synthesizing new interaction data to achieve greater controllability and customization. I believe this is a promising path toward creating truly generative and personalized interfaces.
Thanks again for your interest!
Although this is of course ridiculously wasteful right now, I can see this being the optimal solution for many things if a technology like thermodynamic-well-based neural networks gets to the point of viability.
A thermodynamic-well-based model could have a trillion parameters in the size of an SD card at a few milliwatts of power.
In a case like that, it's easy to imagine that mass-produced implementations could be a one-size-fits-all solution for all but the most trivial or advanced computing tasks. For perhaps less than a dollar for a 100B-parameter chip, you get the ability to “imagine” video, sound, etc., and a strong general-purpose “reasoning” capability embedded into everything right down to children's toys and toasters.
Kinda makes me think of Rick and Morty with the butter passing robot. A lot of pointless capabilities, but still cheaper than a purpose built deterministic computing device. OTOH having embedded knowledge as an ambient part of everyday life would be kinda neat, even if it would almost surely mean the end of human civilization lol.
What are the implications of relying on deep networks for instantiating and running the abstractions we usually hang on physics and transistors?
Is this a type of VM?
Is an imagined VM Turing complete?
Fascinating question. My “vibe” opinion is that it is, but there are limits on the meaning of Turing completeness that do not apply within traditional computing paradigms, vis-à-vis scaling costs. My intuition is that scaling costs in imaginary VMs would be quadratic rather than linear, e.g. a task that takes twice the memory takes four times the compute instead of twice.
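To make that intuition concrete, here's a toy back-of-the-envelope under that (purely assumed) quadratic cost model; it is not a measured property of NeuralOS:

```python
# Toy cost model: compute ~ memory ** exponent (exponent=1 is linear).
def compute_cost(memory_units: float, exponent: float = 2.0) -> float:
    return memory_units ** exponent


print(compute_cost(2, exponent=1.0))  # 2.0 -> linear: 2x memory, 2x compute
print(compute_cost(2, exponent=2.0))  # 4.0 -> quadratic: 2x memory, 4x compute
```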
It's an interesting project. I'll totally accept "for fun" or "because" but I'm interested in the why. Even if just for a very narrow thing, are there any benefits we would get from using an ML-based OS? I mean, it is definitely cool, and that has merit in its own right, but people talk about Neural OSs and I just don't "get it"
Unlike other ML-based OS projects (such as Gemini OS, which generates code and renders traditional UIs), NeuralOS directly generates every pixel. While this makes it susceptible to hallucination, in my opinion the other side of hallucination is full flexibility. In the future, I imagine operating systems running entirely (or mostly) on GPUs, adapting to user intent on the fly rather than relying on pre-designed menus and options.
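To make the contrast concrete, here's an illustrative sketch of what "generating every pixel" means; this is not the project's actual API, and all names are placeholders for the real latent-diffusion pipeline. The next screen frame is sampled from the model, conditioned on the previous frame plus raw mouse/keyboard state:

```python
import torch


def next_frame(model: torch.nn.Module,
               prev_frame: torch.Tensor,          # (C, H, W) last rendered screen
               cursor_xy: tuple[float, float],
               keys_down: list[bool]) -> torch.Tensor:
    # Encode raw input events as a flat conditioning vector.
    cond = torch.tensor([*cursor_xy, *map(float, keys_down)])
    with torch.no_grad():
        # The model predicts a plausible next frame; no widget toolkit or
        # OS code runs -- every pixel comes out of the network.
        return model(prev_frame.unsqueeze(0), cond.unsqueeze(0)).squeeze(0)
```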
That isn't to say I think there shouldn't be neural OS's. But I do imagine them being something radically different. Do we really want them to mimic what we have now? Or is that not, in some vague way, more like a mind?
Regardless, I think this is really neat. I'm a big fan of doing things "just because" and "I wonder what would happen if". So I'm not trying to knock you down. I mean, I'm wrong about a lot of things haha
This essentially is the idea of Star Trek computers, where there were "neural gel packs" being programmed/primed for different purposes on the starship's systems.
Damn, I have to think about this more. Essentially you are building a holodeck computer, where the users interacting with it just describe roughly what they want and the computer just generates it - with human language being the primary interface.
Note: The Space is intended as a template, so please duplicate it and run with your own GPU for a better experience. (The default Space has only one worker.)
Recommended GPU: At least an L40, ideally an A100-large. (The original demo at neural-os.com used H100s.)
All code and models are self-contained in the Hugging Face Space.
See my tweet for more details: https://x.com/yuntiandeng/status/1944802154314916331
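If you'd rather script the duplication, huggingface_hub ships a duplicate_space helper; the Space id below is a placeholder, so substitute the real one (and note that GPU hardware requires billing to be set up on your account):

```python
from huggingface_hub import duplicate_space

# Placeholder id -- replace with the actual demo Space before running.
url = duplicate_space(
    from_id="author/neuralos-demo",
    hardware="a100-large",  # or an L40-class flavor, per the GPU note above
)
print(url)  # URL of your own copy of the Space
```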
However, I was able to click on a folder; it opened and looked fairly convincing. The only indicator that something was off - other than lag - was that at the bottom of the file browser, it mentioned how much disk space was available: the first digit was clearly 6, the second was flickering and blurring between different numbers.
Pretty interesting idea though. What framerate should it run at? I felt I was getting <5 fps.
Looks like the entire mucky internet will be fixed with just some careful prompting as soon as this thing runs efficiently!
More seriously, it would be fun - and probably instructive - to play with a system that consistently (shallowly) simulated that. A kind of oasis.
> imagine a Petrovich layer over another operating system, such as Microsoft Windows (TM). Every time Windows does something you don't like, you could punish it, and it would never do it again...
There is no underlying kernel, no function calls, no program execution, and no networking. Everything is purely visual and imagined by the neural model. You can think of it as a safe, isolated container where nothing can actually run or cause harm, since no real code executes. It's essentially an interactive video simulation, conditioned entirely on user inputs.
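To illustrate, here's a tiny, purely hypothetical sketch (not the demo's actual code) of what a click amounts to in this setup - it becomes model input, and nothing else happens:

```python
# In a normal OS a click dispatches to an event handler that executes code;
# here it is only data appended to the model's conditioning history.
conditioning_history: list[dict] = []


def on_click(x: int, y: int) -> None:
    # No syscall, no process, no handler runs -- the click becomes input
    # for the next sampled frame and nothing else.
    conditioning_history.append({"type": "mouse_down", "x": x, "y": y})


on_click(412, 87)
print(conditioning_history)  # [{'type': 'mouse_down', 'x': 412, 'y': 87}]
```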
The purpose of an OS is to manage the resources of the computer: CPU, RAM, devices, etc. This is simply a UI generated by an NN.
Also, it isn't an OS in any way, shape, or form. It's just another slop video generator. It even tries to "simulate" the applications themselves. One can just run the application itself instead of simulating it. Case in point: try going anywhere except google.com in the "browser".
What problem is this trying to solve? And "to show how it might look" is not a valid answer, because it is designed to look like xfce4. It is not trying to generate a UI or something. And I can just run xfce4 in termux on my phone right now and be able to see exactly how it looks. How do you expect this to be a useful UI framework? Remember, the existing xfce4 works perfectly all the time, and this is just designed to (badly) simulate it only most of the time. What is the value proposition of something like this?
Although I wasn't able to really use it due to the lag.