1 point by Ruikhu 5 hours ago | 2 comments
  • Ruikhu 5 hours ago
    Hi, author here. One clarification: the goal is not to let an AI freely control a computer. I built a fixed local library of action skills, where each skill is a deterministic OS operation (open an app, switch a window, run a command, structured input). The model does not generate UI steps or mouse actions; it only selects a skill, and the gateway executes it. So the LLM is making decisions, not performing motor control. The computer isn't remotely driven by the model: the model chooses from a constrained set of allowed actions. This is mainly an experiment in making computer-using agents more predictable and auditable. I'd especially value thoughts from people working on agent safety.
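    To make the "selection, not motor control" idea concrete, here is a minimal sketch of such a gateway, assuming a simple allowlist design. All names (`SKILLS`, `dispatch`, the placeholder skill bodies) are illustrative, not the project's actual code:

```python
# Hypothetical constrained skill gateway: the model may only name a skill
# from a fixed allowlist; anything outside the library is rejected.
import shlex
from typing import Callable, Dict


def open_app(name: str) -> str:
    # Placeholder: a real skill would launch the app deterministically.
    return f"opened {name}"


def run_command(cmd: str) -> str:
    # Placeholder: a real skill might additionally whitelist commands.
    return f"ran {shlex.split(cmd)!r}"


# The fixed skill library: the model selects a key, nothing more.
SKILLS: Dict[str, Callable[[str], str]] = {
    "open_app": open_app,
    "run_command": run_command,
}


def dispatch(skill: str, arg: str) -> str:
    """Execute a skill only if it is in the fixed library."""
    if skill not in SKILLS:
        raise PermissionError(f"skill {skill!r} is not in the allowlist")
    return SKILLS[skill](arg)
```

    The point of the allowlist is that the model's output can only ever trigger one of the predeclared, deterministic operations, which is what makes the system auditable.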
  • Ruikhu 5 hours ago
    Another clarification, since a few people messaged me privately: this is not just a conceptual architecture. We actually tested it with the official Claude mobile app controlling a real desktop computer. The phone runs the model inside the official app, which produces instructions in natural language. Our gateway parses the intent and maps it to a verified local action skill (keyboard/window/command primitives). So the model is neither embedded in the OS nor calling an API; it is literally the mobile LLM app interacting with a real operating system through a constrained execution layer. We wanted to know whether an official consumer LLM app (without system privileges) could still reliably operate a computer when paired with a deterministic action layer.
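    The natural-language-to-skill mapping step could be sketched as a simple pattern matcher. This is an assumption about one possible implementation, not the project's actual parser; the patterns and skill names are made up for illustration:

```python
# Illustrative intent parser: maps a natural-language instruction from the
# model to exactly one verified skill, or to nothing at all.
import re
from typing import Optional, Tuple

# Each pattern captures the skill's single argument (hypothetical set).
INTENT_PATTERNS = [
    (r"open (?:the )?app (?P<arg>\w+)", "open_app"),
    (r"switch to window (?P<arg>\w+)", "switch_window"),
    (r"run (?:the )?command (?P<arg>.+)", "run_command"),
]


def parse_intent(text: str) -> Optional[Tuple[str, str]]:
    """Return (skill, argument) for a recognized instruction, else None."""
    for pattern, skill in INTENT_PATTERNS:
        m = re.search(pattern, text, re.IGNORECASE)
        if m:
            return skill, m.group("arg").strip()
    return None  # unrecognized instructions are simply never executed
```

    Failing closed (returning None for anything unrecognized) is what lets an app with no system privileges drive the machine safely: the gateway, not the model, decides what counts as a valid action.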