2 points by janreges 3 hours ago | 1 comment
    Hi HN, I'm the author. I originally built SiteOne Crawler in PHP+Swoole back in 2023. Last year I rewrote it entirely in Rust: compared with the PHP version, it has 25% faster execution and 30% lower memory usage, and it ships as a single native binary with zero runtime dependencies.

    The feature I'm most excited about is CI/CD quality gating. The idea is simple: crawl your entire website after deploy and block the pipeline if quality regresses.

    Example:

       siteone-crawler --url=https://example.com --ci \
          --ci-min-score=7.5 \
          --ci-max-404=0 \
          --ci-max-redirects=5
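    To make the three `--ci-*` thresholds in this example concrete, here is a toy model in shell. It assumes (not confirmed by the post) that `--ci-min-score` is compared against a single overall 0-10 score and that `--ci-max-404` and `--ci-max-redirects` are upper bounds on the number of 404 responses and redirects. The return value 10 mirrors the exit code the crawler uses for a breached threshold. `check_thresholds` is a hypothetical function for illustration only, not siteone-crawler's implementation.

    ```shell
    #!/usr/bin/env bash
    # Toy model of threshold gating (NOT siteone-crawler's actual code).
    # Assumptions: one overall 0-10 score, plus counts of 404 responses and
    # redirects, each checked against the limits used in the example command.

    # Usage: check_thresholds SCORE NUM_404 NUM_REDIRECTS
    # Prints one line per breached threshold; returns 10 if any breached, else 0.
    check_thresholds() {
      local score="$1" n404="$2" nredir="$3" status=0
      local min_score=7.5 max_404=0 max_redirects=5

      # Bash arithmetic is integer-only, so awk does the decimal comparison.
      # awk exits 0 (success) when score < min_score.
      if awk -v s="$score" -v m="$min_score" 'BEGIN { exit !((s + 0) < (m + 0)) }'; then
        echo "score $score is below $min_score"
        status=10
      fi
      if (( n404 > max_404 )); then
        echo "404 responses: $n404 (max $max_404)"
        status=10
      fi
      if (( nredir > max_redirects )); then
        echo "redirects: $nredir (max $max_redirects)"
        status=10
      fi
      return "$status"
    }
    ```

    Every check runs before the function returns, so a single run reports all breaches rather than stopping at the first one; a score of exactly 7.5 or exactly 5 redirects still passes in this model.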
    
    Install:

       # Debian/Ubuntu repo setup:
       curl -1sLf 'https://dl.cloudsmith.io/public/janreges/siteone-crawler/setup.deb.sh' | sudo -E bash
    
       brew install janreges/tap/siteone-crawler    # macOS / Linux
       sudo apt-get install siteone-crawler         # Debian/Ubuntu (after adding repo)
       sudo dnf install siteone-crawler             # Fedora/RHEL (after adding the RPM repo)
       cargo install siteone-crawler                # from source, any platform
    
       # Windows: https://github.com/janreges/siteone-crawler/releases
    
    The example command above crawls every page, scores it across 5 categories (Security, Performance, SEO, Accessibility, Best Practices) on a 0–10 scale, and exits with code 10 if any threshold is breached. Because it is a single binary, you can drop it into GitHub Actions, GitLab CI, or any other pipeline with no Docker, no Node, and no runtime to install.
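    Since a breached threshold yields exit code 10, a pipeline step can tell a quality regression apart from a crawler crash. Below is a minimal shell sketch, assuming that any non-zero code other than 10 means the crawl itself failed (an assumption, not something the post documents). `ci_verdict` and `run_gate` are hypothetical helper names, not part of siteone-crawler; the crawler flags are copied from the example command.

    ```shell
    #!/usr/bin/env bash
    # Hypothetical CI helpers around the example command (not part of siteone-crawler).
    # Assumption: exit code 10 = a --ci threshold was breached (as stated in the post);
    # any other non-zero code is treated as a crawler or infrastructure failure.

    # Translate the crawler's exit code into a CI verdict.
    # Prints one line; returns 0 (pass), 1 (quality gate failed) or 2 (other error).
    ci_verdict() {
      local code="$1"
      case "$code" in
        0)  echo "quality gate passed"; return 0 ;;
        10) echo "quality gate failed: a --ci threshold was breached"; return 1 ;;
        *)  echo "crawler error (exit code $code), not a quality verdict"; return 2 ;;
      esac
    }

    # Run the example crawl against URL $1 and report the verdict.
    # The function's return value is ci_verdict's return value.
    run_gate() {
      local rc=0
      # "|| rc=$?" keeps a non-zero exit from aborting the step under "set -e"
      # (GitHub Actions runs bash steps with -e by default).
      siteone-crawler --url="$1" --ci \
        --ci-min-score=7.5 \
        --ci-max-404=0 \
        --ci-max-redirects=5 || rc=$?
      ci_verdict "$rc"
    }
    ```

    In a CI job the step command would then be something like `run_gate https://example.com`; the step fails whenever `run_gate` returns non-zero, and the distinct return values 1 and 2 make it obvious in the logs whether quality regressed or the crawl never completed.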

    Beyond CI/CD, it also does:
    - Offline website archiving with a built-in HTTP server for self-hosting
    - Full-site markdown export with deduplicated content (great for feeding to LLMs)
    - Interactive HTML audit reports you can email via built-in SMTP
    - Sitemap generation

    Sample HTML report: https://crawler.siteone.io/html/2024-08-23/forever/cl8xw4r-f...
    GitHub: https://github.com/janreges/siteone-crawler

    I'd love to hear your feedback, especially if you're already doing something similar in your CI/CD pipelines. Which thresholds would you find useful?