Built ScreenCommander to solve a personal gap: individual app integrations severely limit what local AI agents can actually achieve. This macOS CLI tool captures your desktop as screenshots so that local agents like Codex can interpret what's on screen and act on it visually (clicks, keystrokes, navigation). It requires the macOS Accessibility and Screen Recording permissions. It's like giving your agent actual eyes and hands, augmenting or bypassing rigid app skills altogether.
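The core loop is simple: grab a screenshot for the model to read, then synthesize input events at the coordinates it picks. Here's a minimal Swift sketch of that mechanism, using CGDisplayCreateImage for capture and CGEvent for the click; the function names and the /tmp/screen.png path are illustrative, not ScreenCommander's actual interface.

```swift
import AppKit
import CoreGraphics

// Capture the main display to a PNG file. Needs the Screen Recording permission.
func captureMainDisplay(to url: URL) throws {
    guard let image = CGDisplayCreateImage(CGMainDisplayID()) else {
        throw NSError(domain: "ScreenCapture", code: 1,
                      userInfo: [NSLocalizedDescriptionKey: "Capture failed (permission missing?)"])
    }
    let rep = NSBitmapImageRep(cgImage: image)
    guard let png = rep.representation(using: .png, properties: [:]) else {
        throw NSError(domain: "ScreenCapture", code: 2,
                      userInfo: [NSLocalizedDescriptionKey: "PNG encoding failed"])
    }
    try png.write(to: url)
}

// Post a synthetic left click at global screen coordinates. Needs the Accessibility permission.
func click(at point: CGPoint) {
    for type in [CGEventType.leftMouseDown, .leftMouseUp] {
        CGEvent(mouseEventSource: nil, mouseType: type,
                mouseCursorPosition: point, mouseButton: .left)?
            .post(tap: .cghidEventTap)
    }
}

// Example round trip: snapshot the screen for the agent, then click a point it chose.
do {
    try captureMainDisplay(to: URL(fileURLWithPath: "/tmp/screen.png"))
    click(at: CGPoint(x: 640, y: 400))
} catch {
    print("screen capture failed:", error)
}
```

The same event-posting approach extends to keystrokes and scrolling, which is all an agent needs to navigate arbitrary apps without a per-app integration.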
Works well with Codex, but not great with Claude Code or Gemini CLI yet (both struggle with novel CLI tools, even when given a skills file). It also works well in conjunction with other skills (Atlas or AppleScript), especially with non-vision models like Spark.
It was initially one-shotted from a GPT-5.2 Pro briefing into Codex-5.3-Codex-xHigh, then iterated on to fix performance issues and expand capabilities.