1 point by gregoryli360 5 hours ago | 1 comment
    The biggest gap to bridge between browser automation and agentic AIs is the interface to the web client. Perplexity’s Comet attempted to bridge it, but the app quickly became a RAM-hogging, flaming pile of garbage. Then we had OpenClaw, which promised so much but usually ended up gobbling tokens while making little progress. So I thought of a lightweight alternative: why not combine the strengths of browser automation (Selenium) with agentic coding AIs (Claude Code)?

    I first came to this idea when I had to scrape Amazon listings dynamically for a project. Given Amazon’s notoriously aggressive anti-bot and anti-scraping enforcement, I knew I had to get creative with the automation. I realized that Selenium could navigate to and load each page dynamically, then save the rendered page to a static `.html` file. Claude could then read these files as they were saved and decide its next actions based on their contents.
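A minimal sketch of that snapshot step, assuming Selenium 4 driving Chrome. The helper names `slugify` and `snapshot_page` and the `snapshots/` output folder are illustrative and not taken from `browse.py`; the function relies only on Selenium's `driver.get()` and `driver.page_source`.

```python
import re
from pathlib import Path


def slugify(url: str) -> str:
    """Turn a URL into a filesystem-safe file stem (hypothetical naming scheme)."""
    stem = re.sub(r"^https?://", "", url)
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", stem).strip("_")
    return stem[:100] or "page"


def snapshot_page(driver, url: str, out_dir: str = "snapshots") -> Path:
    """Load `url` in a live Selenium session and dump the rendered DOM to disk.

    `driver` is any Selenium WebDriver (e.g. webdriver.Chrome()). Only its
    .get() method and .page_source property are used, so any stand-in object
    with those two members also works.
    """
    # Under Selenium's default page-load strategy, get() waits for the load to complete.
    driver.get(url)
    # Serialized DOM as it is right now, including content JavaScript has already rendered.
    html = driver.page_source
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / f"{slugify(url)}.html"
    path.write_text(html, encoding="utf-8")
    return path
```

In practice you would pass in `webdriver.Chrome()` from `selenium` and call `driver.quit()` when finished; the coding agent then opens the saved `.html` file with its ordinary file-reading tools. Because `page_source` captures the DOM only as it stands when read, content that loads later may need an explicit wait (for example Selenium's `WebDriverWait`) before taking the snapshot.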

    A few days later, I had a new project where I had to review applications from an admin portal. This was when the core idea struck me again, like déjà vu. I refined the system to use my default browser profile, so it automatically picked up my saved cookies and session data. Then I designed a simple command system for the AI to interact with the browser. The result was a bare-bones yet fully functional system that combines the strengths of AI coding agents and browser automation tools.
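One way the profile reuse and command system could look, as a sketch under stated assumptions: Selenium 4, Chrome, and a tiny made-up command vocabulary (`goto`, `click`, `type`, `back`, `save`). The function names (`chrome_profile_args`, `launch_with_profile`, `run_command`) and the commands are hypothetical and may not match what `browse.py` actually does. `--user-data-dir` and `--profile-directory` are standard Chrome switches for selecting an existing profile.

```python
import shlex
from pathlib import Path

# Hypothetical command vocabulary for illustration; browse.py's real commands may differ.
# "css selector" is the string value of Selenium's By.CSS_SELECTOR.
CSS = "css selector"


def chrome_profile_args(user_data_dir: str, profile: str = "Default") -> list[str]:
    """Chrome switches that point a Selenium-launched browser at an existing profile,
    so saved cookies and logins come along."""
    return [f"--user-data-dir={user_data_dir}", f"--profile-directory={profile}"]


def launch_with_profile(user_data_dir: str, profile: str = "Default"):
    """Start Chrome on an existing profile (requires Selenium and a local Chrome)."""
    from selenium import webdriver  # imported lazily so the rest runs without Selenium

    opts = webdriver.ChromeOptions()
    for arg in chrome_profile_args(user_data_dir, profile):
        opts.add_argument(arg)
    return webdriver.Chrome(options=opts)


def run_command(driver, line: str, out_dir: str = "snapshots") -> str:
    """Execute one text command from the agent and return an ok/error status string."""
    try:
        parts = shlex.split(line)  # allows quoted arguments, e.g. type "#q" "two words"
    except ValueError as exc:  # e.g. an unclosed quote
        return f"error: {exc}"
    if not parts:
        return "error: empty command"
    cmd, args = parts[0].lower(), parts[1:]
    try:
        if cmd == "goto" and len(args) == 1:
            driver.get(args[0])
        elif cmd == "click" and len(args) == 1:
            driver.find_element(CSS, args[0]).click()
        elif cmd == "type" and len(args) == 2:
            driver.find_element(CSS, args[0]).send_keys(args[1])
        elif cmd == "back" and not args:
            driver.back()
        elif cmd == "save" and len(args) == 1:
            out = Path(out_dir)
            out.mkdir(parents=True, exist_ok=True)
            path = out / f"{Path(args[0]).name}.html"  # keep only the bare file name
            path.write_text(driver.page_source, encoding="utf-8")
            return f"ok: saved {path}"
        else:
            return f"error: unknown or malformed command {line!r}"
    except Exception as exc:  # e.g. NoSuchElementException; report it back to the agent
        return f"error: {type(exc).__name__}: {exc}"
    return f"ok: {cmd}"
```

Each line the agent writes maps to one `run_command` call, and the returned `ok:`/`error:` string is what it reads back, so a missing selector becomes feedback instead of a crash. One practical caveat: Chrome generally won't share a profile directory with an instance that is already running, so close your regular Chrome window first or point `user_data_dir` at a copy of the profile.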

    If you'd like to check out the project, the GitHub repo is at https://github.com/GregoryLi360/Agentic-Browser-Automation/. The main `browse.py` file is a mere ~250 lines of code. Any feedback is appreciated!