The feature I'm most excited about is CI/CD quality gating. The idea is simple: crawl your entire website after deploy and block the pipeline if quality regresses.
Example:

  siteone-crawler --url=https://example.com --ci \
    --ci-min-score=7.5 \
    --ci-max-404=0 \
    --ci-max-redirects=5
Install:

  # macOS / Linux (Homebrew)
  brew install janreges/tap/siteone-crawler

  # Debian/Ubuntu (add repo, then install)
  curl -1sLf 'https://dl.cloudsmith.io/public/janreges/siteone-crawler/setup.deb.sh' | sudo -E bash
  sudo apt-get install siteone-crawler

  # Fedora/RHEL (after adding repo)
  sudo dnf install siteone-crawler

  # From source, any platform
  cargo install siteone-crawler

  # Windows: binaries at https://github.com/janreges/siteone-crawler/releases
This crawls every page, scores it across 5 categories (Security, Performance, SEO, Accessibility, Best Practices) on a 0–10 scale, and exits with code 10 if any threshold is breached. Drop it into GitHub Actions, GitLab CI, or any pipeline as a single binary: no Docker, no Node, no runtime needed.

Beyond CI/CD, it also does:

- Offline website archiving with a built-in HTTP server for self-hosting
- Full-site markdown export with deduplicated content (great for feeding to LLMs)
- Interactive HTML audit reports you can email via built-in SMTP
- Sitemap generation
Sample HTML report: https://crawler.siteone.io/html/2024-08-23/forever/cl8xw4r-f...

GitHub: https://github.com/janreges/siteone-crawler
I'd love to hear your feedback — especially if you're already doing something similar in your CI/CD pipelines. What thresholds would you find useful?