2 points by michaeloblak 2 hours ago | 2 comments
  • michaeloblak 2 hours ago
    Hey HN, I built this because my AI agent was spending 8 seconds and 300MB of RAM just to search X. That felt wrong — the data is right there behind one HTTP request, but the "standard" approach is to launch a full browser, render the page, and scrape the DOM.

    web2cli makes direct HTTP requests using your browser cookies. No Chromium, no Selenium, no headless anything. The tricky part was TLS fingerprinting: Cloudflare blocks Python's default TLS stack (the JA3 fingerprint doesn't match a real browser), so web2cli uses curl_cffi with BoringSSL to impersonate Chrome at the TLS level. X.com was even harder: its search requires a cryptographic nonce generated by obfuscated browser JS, which the community has reverse-engineered.
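    The request path itself stays tiny. A minimal sketch of the impersonation idea with curl_cffi (its `impersonate` parameter is real; the function name, URL, and cookie dict here are placeholders, not web2cli's actual code):

```python
def fetch_like_chrome(url: str, cookies: dict) -> str:
    """Fetch a page with a Chrome-matching TLS fingerprint.

    curl_cffi exposes a requests-like API plus `impersonate=`, which makes
    the TLS ClientHello (and therefore the JA3 hash) look like a real
    Chrome build instead of Python's default TLS stack.
    """
    from curl_cffi import requests  # pip install curl_cffi

    resp = requests.get(url, cookies=cookies, impersonate="chrome")
    resp.raise_for_status()
    return resp.text
```

    Compared to launching a headless browser, this is one socket and a few milliseconds of handshake, which is where the 8-second/300MB savings comes from.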

    Six adapters today: HN, X, Discord, Slack, Stack Overflow, Reddit. Each adapter is a YAML file: writing a new one takes ~30 minutes (or ~3 minutes for your coding agent), and most sites need no Python code at all (though you can add a custom Python provider, as I did for X).
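    To illustrate the declarative idea (this is not web2cli's real schema — the spec is shown as a Python dict, with made-up field names, though the Algolia HN search endpoint and its `query`/`tags` parameters are real), an adapter boils down to mapping a spec onto a single HTTP request:

```python
from urllib.parse import urlencode

# Hypothetical adapter spec: the real web2cli adapters are YAML files
# with their own schema; this dict just sketches the declarative shape.
ADAPTER = {
    "name": "hn",
    "search_url": "https://hn.algolia.com/api/v1/search",
    "params": {"query": "{query}", "tags": "story"},
}

def build_request_url(adapter: dict, query: str) -> str:
    # Substitute the user's query into the declared parameters,
    # then render one GET URL -- no browser, no DOM.
    params = {k: v.format(query=query) for k, v in adapter["params"].items()}
    return adapter["search_url"] + "?" + urlencode(params)
```

    Because the adapter is pure data, adding a site means describing its endpoints, not writing scraping logic — which is also why a coding agent can generate one quickly.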

    I'm working on web2cli Cloud - think "OAuth for sites that don't have OAuth." Your users log in via a sandboxed browser, your agent gets an opaque session token, cookies never touch your server.
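    The cloud product isn't public, but the "opaque session token" pattern it describes looks roughly like this (all names hypothetical — a sketch of the idea, not the service):

```python
import secrets

# Server-side cookie store: the agent never sees cookie values,
# only a random handle that the cloud side can resolve.
_SESSIONS: dict[str, dict] = {}

def store_session(cookies: dict) -> str:
    # The token is random, so it reveals nothing about the cookies.
    token = secrets.token_urlsafe(32)
    _SESSIONS[token] = cookies
    return token

def cookies_for(token: str) -> dict:
    # Only the cloud side resolves tokens back to cookies; the agent
    # just forwards the token with each request.
    return _SESSIONS[token]
```
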

    Happy to go deep on the adapter architecture, anti-bot bypasses, or the economics of browser automation vs direct HTTP.

  • pancsta 38 minutes ago
    „Every website” == 6 websites. I like the table layout of the results tho.
    • michaeloblak 28 minutes ago
      Ha, fair point — "every website" is the vision, 6 is the MVP :)

      The adapter model is designed so adding a new site is a single YAML file (~30 min of work, or ~3 min with a coding agent). No Python code needed for most sites. PRs welcome if there's a site you'd want to see!