1 point by lowjax 4 hours ago | 1 comment
    I built vscreen because AI agents couldn't actually use the internet. They can call APIs, but they can't browse a website, click a cookie banner, fill out a multi-step form, or watch a video. So I built the missing piece.

    vscreen sits between a headless Chromium and your browser. It captures the viewport via the Chrome DevTools Protocol, encodes H.264 or VP9 video plus Opus audio, and streams everything over WebRTC, giving you a live, low-latency video feed of what the AI is doing. Mouse, keyboard, scroll, and clipboard events relay back to the browser, so input flows in both directions.
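    To make the input-relay direction concrete, here is a minimal sketch (not vscreen's actual API; the function and names are invented for illustration) of how a client mouse event could be translated into the CDP `Input.dispatchMouseEvent` command that gets sent over the DevTools WebSocket:

    ```rust
    // Hypothetical sketch: mapping a relayed mouse event onto the CDP
    // Input.dispatchMouseEvent command. CDP speaks a JSON-RPC-like protocol:
    // {"id":..,"method":..,"params":{..}} over a WebSocket.

    #[derive(Debug, Clone, Copy)]
    enum MouseAction {
        Pressed,
        Released,
        Moved,
    }

    fn cdp_mouse_command(id: u64, action: MouseAction, x: f64, y: f64) -> String {
        let kind = match action {
            MouseAction::Pressed => "mousePressed",
            MouseAction::Released => "mouseReleased",
            MouseAction::Moved => "mouseMoved",
        };
        format!(
            r#"{{"id":{id},"method":"Input.dispatchMouseEvent","params":{{"type":"{kind}","x":{x},"y":{y},"button":"left","clickCount":1}}}}"#
        )
    }

    fn main() {
        // A click relayed from the viewer at viewport coordinates (120, 48.5):
        println!("{}", cdp_mouse_command(1, MouseAction::Pressed, 120.0, 48.5));
        println!("{}", cdp_mouse_command(2, MouseAction::Released, 120.0, 48.5));
    }
    ```

    The real pipeline would of course serialize with a proper JSON library and track command ids per session; the point is only that each relayed input event becomes one CDP command.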

    For AI automation, there's a built-in MCP server with 63 tools — navigate, screenshot, click elements, type text, wait for selectors, solve CAPTCHAs, manage cookies, and more. Multiple isolated instances can run in parallel.
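    An MCP server essentially maps tool names from incoming `tools/call` requests onto handlers. A toy sketch of that dispatch, with invented handler names and signatures (vscreen's real tools and types will differ):

    ```rust
    use std::collections::HashMap;

    // Hypothetical sketch of MCP-style tool dispatch. Tool names echo the ones
    // mentioned above ("navigate", "screenshot"); the handlers are stand-ins.

    type ToolHandler = fn(&str) -> String;

    fn navigate(args: &str) -> String {
        format!("navigating to {args}")
    }

    fn screenshot(_args: &str) -> String {
        "captured viewport".to_string()
    }

    fn main() {
        let mut tools: HashMap<&str, ToolHandler> = HashMap::new();
        tools.insert("navigate", navigate);
        tools.insert("screenshot", screenshot);

        // An MCP client sends {"method":"tools/call","params":{"name":"navigate",..}};
        // the server looks up the named tool and runs it with the arguments.
        if let Some(handler) = tools.get("navigate") {
            println!("{}", handler("https://example.com"));
        }
    }
    ```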

    Written in Rust (tokio, axum, webrtc-rs). No Electron, no Puppeteer wrapper — purpose-built media pipeline. Audio also available via RTSP for external consumers like VLC or GStreamer.

    Highlights:

    - Full-page screenshots with automatic coordinate translation
    - Instance locking for multi-agent coordination
    - Works with any MCP client (Cursor, Claude, custom agents)
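    The coordinate translation in the first bullet boils down to mapping a point in full-page image space back to viewport space before dispatching a click. A minimal sketch under assumed names (`Point`, `page_to_viewport` are illustrative, not vscreen's API):

    ```rust
    // Hypothetical sketch: an element located in a stitched full-page screenshot
    // must be translated into viewport coordinates (subtracting the scroll
    // offset) before input can be dispatched at it.

    #[derive(Debug, Clone, Copy, PartialEq)]
    struct Point {
        x: f64,
        y: f64,
    }

    /// Map a full-page point to viewport space given the current scroll offset.
    /// Returns None if the point is currently scrolled out of view.
    fn page_to_viewport(p: Point, scroll: Point, vw: f64, vh: f64) -> Option<Point> {
        let v = Point { x: p.x - scroll.x, y: p.y - scroll.y };
        if v.x >= 0.0 && v.y >= 0.0 && v.x < vw && v.y < vh {
            Some(v)
        } else {
            None
        }
    }

    fn main() {
        // Element at page y=2000 with the viewport scrolled to y=1800:
        let scroll = Point { x: 0.0, y: 1800.0 };
        println!("{:?}", page_to_viewport(Point { x: 100.0, y: 2000.0 }, scroll, 1280.0, 720.0));
        // Element at page y=5000 is out of view at this scroll position:
        println!("{:?}", page_to_viewport(Point { x: 100.0, y: 5000.0 }, scroll, 1280.0, 720.0));
    }
    ```

    If the target is out of view, the automation has to scroll first and then retranslate, which is presumably why this is automatic rather than left to the agent.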

    Source-available, non-commercial license.

    GitHub: https://github.com/lowjax-com/vscreen