vscreen sits between a headless Chromium instance and your browser. It captures the viewport via the Chrome DevTools Protocol, encodes H.264 or VP9 video plus Opus audio, and streams both over WebRTC. You get a live, low-latency video feed of what the AI is doing. Mouse, keyboard, scroll, and clipboard events are relayed back to Chromium, so the session is interactive in both directions.
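To make the capture step concrete, here is a minimal sketch of the kind of Chrome DevTools Protocol messages involved. It assumes frames are pulled with the protocol's screencast methods (`Page.startScreencast`, `Page.screencastFrame`, `Page.screencastFrameAck`); the post does not say which CDP mechanism vscreen actually uses, and the helper names `start_screencast` and `handle_event` are made up for illustration. No network I/O is performed; the code only builds and parses messages.

```python
import base64

def start_screencast(msg_id, width, height):
    """Build the CDP command that asks Chromium to start pushing frames."""
    return {
        "id": msg_id,
        "method": "Page.startScreencast",
        "params": {
            "format": "jpeg",
            "quality": 80,
            "maxWidth": width,
            "maxHeight": height,
            "everyNthFrame": 1,
        },
    }

def handle_event(event, next_id):
    """Return (frame_bytes, ack_message) for a screencast frame event.

    Any other event yields (None, None). Each frame has to be acknowledged,
    otherwise Chromium stops sending new ones.
    """
    if event.get("method") != "Page.screencastFrame":
        return None, None
    params = event["params"]
    frame = base64.b64decode(params["data"])  # frame image arrives base64-encoded
    ack = {
        "id": next_id,
        "method": "Page.screencastFrameAck",
        "params": {"sessionId": params["sessionId"]},
    }
    return frame, ack

# Simulated event, standing in for what would arrive over the CDP WebSocket.
fake_event = {
    "method": "Page.screencastFrame",
    "params": {"data": base64.b64encode(b"jpeg-bytes").decode(), "sessionId": 7},
}
frame, ack = handle_event(fake_event, next_id=2)
```

In a real pipeline the decoded frame would be handed to the video encoder and the ack sent back over the same WebSocket.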
For AI automation, there's a built-in MCP server with 63 tools — navigate, screenshot, click elements, type text, wait for selectors, solve CAPTCHAs, manage cookies, and more. Multiple isolated instances can run in parallel.
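As a rough illustration of what an MCP client sends, the sketch below builds JSON-RPC 2.0 requests for MCP's `tools/call` method. The tool names (`navigate`, `click`) and argument keys (`url`, `selector`) are hypothetical, chosen to mirror the capabilities listed above; the names and schemas vscreen actually exposes should be discovered with a `tools/list` request.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids, unique within a session

def tool_call(name, arguments):
    """Build a JSON-RPC 2.0 request for MCP's `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }

# Hypothetical tool names and arguments, for illustration only.
requests = [
    tool_call("navigate", {"url": "https://example.com"}),
    tool_call("click", {"selector": "#login"}),
]

for request in requests:
    print(json.dumps(request))
```

An MCP client such as Cursor or Claude constructs these messages for you; the sketch only shows the wire shape a custom agent would need to produce.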
Written in Rust (tokio, axum, webrtc-rs). No Electron, no Puppeteer wrapper: it is a purpose-built media pipeline. Audio is also available via RTSP for external consumers such as VLC or GStreamer.
Highlights:
- Full-page screenshots with automatic coordinate translation
- Instance locking for multi-agent coordination
- Works with any MCP client (Cursor, Claude, custom agents)
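One way to picture the coordinate translation for full-page screenshots: a point picked on the full-page image has to be mapped to a scroll position plus a viewport-relative coordinate before it can be clicked. The sketch below is an assumption about how such a mapping could work, not a description of vscreen's implementation; `page_to_viewport` is a hypothetical helper.

```python
def page_to_viewport(x, y, viewport_w, viewport_h, page_w, page_h):
    """Map full-page pixel (x, y) to a scroll offset and viewport coordinates.

    The scroll offset tries to center the point in the viewport and is
    clamped to the scrollable range of the page.
    """
    max_scroll_x = max(page_w - viewport_w, 0)
    max_scroll_y = max(page_h - viewport_h, 0)
    scroll_x = min(max(x - viewport_w // 2, 0), max_scroll_x)
    scroll_y = min(max(y - viewport_h // 2, 0), max_scroll_y)
    return (scroll_x, scroll_y), (x - scroll_x, y - scroll_y)

# A point far down a 1280x4000 page, seen through a 1280x720 viewport.
scroll, point = page_to_viewport(640, 2500, 1280, 720, 1280, 4000)
print(scroll, point)  # (0, 2140) (640, 360)
```

Near the bottom of the page the scroll offset is clamped, so the point lands off-center in the viewport rather than outside it.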
Source-available, non-commercial license.