I built a simple HTML→Markdown pipeline in Rust that works on any public URL: it strips scripts, styles, and boilerplate while preserving structure and links. On a 100-URL set it reduced input size by ~70–80%, often close to 80%.
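To make the script/style stripping step concrete, here is a deliberately naive, stdlib-only sketch. It is not the pipeline's actual code: `strip_element` is a hypothetical helper that scans the raw string, whereas a real pipeline would almost certainly use a proper HTML parser. It also matches tag-name prefixes (so `<scripted>` would match `script`) and expects a closing tag written exactly as `</tag>`.

```rust
/// Naive sketch: remove every `<tag ...>...</tag>` block, case-insensitively.
/// Hypothetical helper for illustration only; no real HTML parsing is done.
fn strip_element(html: &str, tag: &str) -> String {
    // ASCII lowercasing keeps byte offsets identical to the original string.
    let lower = html.to_ascii_lowercase();
    let open = format!("<{}", tag.to_ascii_lowercase());
    let close = format!("</{}>", tag.to_ascii_lowercase());

    let mut out = String::with_capacity(html.len());
    let mut pos = 0;
    while let Some(rel_start) = lower[pos..].find(&open) {
        let start = pos + rel_start;
        // Keep everything before the opening tag.
        out.push_str(&html[pos..start]);
        match lower[start..].find(&close) {
            // Skip past the closing tag and continue scanning.
            Some(rel_end) => pos = start + rel_end + close.len(),
            // Unclosed element: drop the rest of the input.
            None => {
                pos = html.len();
                break;
            }
        }
    }
    // Keep whatever follows the last removed block.
    out.push_str(&html[pos..]);
    out
}

fn main() {
    let html = "<p>Hi</p><SCRIPT>x()</SCRIPT><style>p{}</style><p>Bye</p>";
    let cleaned = strip_element(&strip_element(html, "script"), "style");
    println!("{} bytes -> {} bytes: {}", html.len(), cleaned.len(), cleaned);
}
```

Markdown conversion and boilerplate removal (nav bars, footers, cookie banners) are separate steps and are not covered by this sketch.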
Benchmark on the same 100 URLs:
Rust server mode: p50 ~0.4s, p95 ~1.3s, memory ~100MB stable
Node baseline (JSDOM + Turndown): p50 ~1.2s, p95 ~50s, memory grew from hundreds of MB into the GB range
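For anyone reproducing the numbers, this is one way p50/p95 could be computed from per-URL latencies. It is a sketch using the nearest-rank method, which may differ from what the repo's scripts do. `percentile_ms` and the sample values are made up for illustration and are not the measured data.

```rust
/// Nearest-rank percentile over latencies in milliseconds.
/// Returns None for an empty sample set or a percentile above 100.
fn percentile_ms(samples: &[u64], p: usize) -> Option<u64> {
    if samples.is_empty() || p > 100 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // rank = ceil(p / 100 * n), computed in integers to avoid float rounding.
    let rank = (p * sorted.len() + 99) / 100;
    // Ranks are 1-based; p = 0 maps to the smallest sample.
    Some(sorted[rank.max(1) - 1])
}

fn main() {
    // Hypothetical latencies in ms: nine ordinary pages and one pathological one.
    let samples: [u64; 10] = [380, 410, 395, 1250, 420, 405, 390, 400, 415, 52000];
    println!("p50 = {:?} ms", percentile_ms(&samples, 50));
    println!("p95 = {:?} ms", percentile_ms(&samples, 95));
}
```

With only 100 URLs, p95 is set by the five slowest pages, so a handful of pathological inputs is enough to dominate the tail.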
Scripts + methodology are in the repo: <link>
Curious what others use for boilerplate removal and how you keep p95 tails under control when parsing nasty pages.