1 point by lowjax 4 hours ago | 1 comment
    I built vscreen because AI agents couldn't actually use the internet. They can call APIs, but they can't browse a website, click a cookie banner, fill out a multi-step form, or watch a video. So I built the missing piece.

    vscreen sits between a headless Chromium and your browser. It captures the viewport via the Chrome DevTools Protocol, encodes H.264 or VP9 video plus Opus audio, and streams everything over WebRTC, giving you a live, low-latency video feed of what the AI is doing. Mouse, keyboard, scroll, and clipboard events relay back to the browser, so input flows in both directions.
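    To make the input-relay direction concrete, here is a minimal sketch (not vscreen's actual API; the function and names are invented for illustration) of how a client mouse event could be translated into the CDP `Input.dispatchMouseEvent` command that gets sent over the DevTools WebSocket:

    ```rust
    // Hypothetical sketch: mapping a relayed mouse event onto the CDP
    // Input.dispatchMouseEvent command. CDP speaks a JSON-RPC-like protocol:
    // {"id":..,"method":..,"params":{..}} over a WebSocket.

    #[derive(Debug, Clone, Copy)]
    enum MouseAction {
        Pressed,
        Released,
        Moved,
    }

    fn cdp_mouse_command(id: u64, action: MouseAction, x: f64, y: f64) -> String {
        let kind = match action {
            MouseAction::Pressed => "mousePressed",
            MouseAction::Released => "mouseReleased",
            MouseAction::Moved => "mouseMoved",
        };
        format!(
            r#"{{"id":{id},"method":"Input.dispatchMouseEvent","params":{{"type":"{kind}","x":{x},"y":{y},"button":"left","clickCount":1}}}}"#
        )
    }

    fn main() {
        // A click relayed from the viewer at viewport coordinates (120, 48.5):
        println!("{}", cdp_mouse_command(1, MouseAction::Pressed, 120.0, 48.5));
        println!("{}", cdp_mouse_command(2, MouseAction::Released, 120.0, 48.5));
    }
    ```

    The real pipeline would of course serialize with a proper JSON library and track command ids per session; the point is only that each relayed input event becomes one CDP command.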

    For AI automation, there's a built-in MCP server with 63 tools — navigate, screenshot, click elements, type text, wait for selectors, solve CAPTCHAs, manage cookies, and more. Multiple isolated instances can run in parallel.
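    An MCP server essentially maps tool names from incoming `tools/call` requests onto handlers. A toy sketch of that dispatch, with invented handler names and signatures (vscreen's real tools and types will differ):

    ```rust
    use std::collections::HashMap;

    // Hypothetical sketch of MCP-style tool dispatch. Tool names echo the ones
    // mentioned above ("navigate", "screenshot"); the handlers are stand-ins.

    type ToolHandler = fn(&str) -> String;

    fn navigate(args: &str) -> String {
        format!("navigating to {args}")
    }

    fn screenshot(_args: &str) -> String {
        "captured viewport".to_string()
    }

    fn main() {
        let mut tools: HashMap<&str, ToolHandler> = HashMap::new();
        tools.insert("navigate", navigate);
        tools.insert("screenshot", screenshot);

        // An MCP client sends {"method":"tools/call","params":{"name":"navigate",..}};
        // the server looks up the named tool and runs it with the arguments.
        if let Some(handler) = tools.get("navigate") {
            println!("{}", handler("https://example.com"));
        }
    }
    ```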

    Written in Rust (tokio, axum, webrtc-rs). No Electron, no Puppeteer wrapper — purpose-built media pipeline. Audio also available via RTSP for external consumers like VLC or GStreamer.

    Highlights:

    - Full-page screenshots with automatic coordinate translation
    - Instance locking for multi-agent coordination
    - Works with any MCP client (Cursor, Claude, custom agents)
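    The coordinate translation in the first bullet boils down to mapping a point in full-page image space back to viewport space before dispatching a click. A minimal sketch under assumed names (`Point`, `page_to_viewport` are illustrative, not vscreen's API):

    ```rust
    // Hypothetical sketch: an element located in a stitched full-page screenshot
    // must be translated into viewport coordinates (subtracting the scroll
    // offset) before input can be dispatched at it.

    #[derive(Debug, Clone, Copy, PartialEq)]
    struct Point {
        x: f64,
        y: f64,
    }

    /// Map a full-page point to viewport space given the current scroll offset.
    /// Returns None if the point is currently scrolled out of view.
    fn page_to_viewport(p: Point, scroll: Point, vw: f64, vh: f64) -> Option<Point> {
        let v = Point { x: p.x - scroll.x, y: p.y - scroll.y };
        if v.x >= 0.0 && v.y >= 0.0 && v.x < vw && v.y < vh {
            Some(v)
        } else {
            None
        }
    }

    fn main() {
        // Element at page y=2000 with the viewport scrolled to y=1800:
        let scroll = Point { x: 0.0, y: 1800.0 };
        println!("{:?}", page_to_viewport(Point { x: 100.0, y: 2000.0 }, scroll, 1280.0, 720.0));
        // Element at page y=5000 is out of view at this scroll position:
        println!("{:?}", page_to_viewport(Point { x: 100.0, y: 5000.0 }, scroll, 1280.0, 720.0));
    }
    ```

    If the target is out of view, the automation has to scroll first and then retranslate, which is presumably why this is automatic rather than left to the agent.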

    Source-available, non-commercial license.

    GitHub: https://github.com/lowjax-com/vscreen