2 points by vaibhavlodha98 4 hours ago | 1 comment
  • vaibhavlodha98 3 hours ago
    Hey HN! I'm the creator of Klovr.

      I built this to solve a problem I had when building AI agents: with raw HTML, 60-95% of the tokens go to markup rather than content, and Cloudflare's
      new "Markdown for Agents" only works on ~5% of the web (opt-in only).
    
      THE PROBLEM:
      I tested 100 popular websites by sending the Accept: text/markdown header that Cloudflare's feature uses. Only 3 actually served markdown. The rest?
      Still HTML. It turns out Cloudflare's markdown feature requires website owners to opt in, which most won't do for years (if ever).
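
      For anyone who wants to repeat this kind of test, here is a minimal sketch of one way to run it. It is not Klovr's code: the names isMarkdownResponse and probe are made up for illustration, and it assumes a site counts as "serving markdown" when the response Content-Type is text/markdown. It relies on the global fetch available in Node 18+.

```typescript
// Hypothetical helper: did the server actually answer with markdown?
// Assumption: "served markdown" means the Content-Type media type is text/markdown.
function isMarkdownResponse(contentType: string | null): boolean {
  if (!contentType) return false;
  // Drop parameters such as "; charset=utf-8" and compare the bare media type.
  const mediaType = (contentType.split(";")[0] ?? "").trim().toLowerCase();
  return mediaType === "text/markdown";
}

// Hypothetical probe: request a URL while asking for markdown, then classify the reply.
async function probe(url: string): Promise<boolean> {
  const res = await fetch(url, { headers: { Accept: "text/markdown" } });
  return isMarkdownResponse(res.headers.get("content-type"));
}
```

      Running probe over a list of URLs and counting the true results gives the same kind of tally as above. A stricter check could also inspect the body, since a server may label HTML incorrectly.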
    
      MY SOLUTION:
      Klovr converts any webpage to markdown on-demand:
      - Same Accept: text/markdown header as Cloudflare (100% compatible)
      - No opt-in needed from site owners (the exception is sites that block by IP, see limitations below)
      - Redis caching with 7-day TTL (10-100x speedup on repeated URLs)
      - Playwright for dynamic content (better anti-detection than Puppeteer)
      - Content-Signal headers for AI compliance
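
      The Accept-header compatibility above comes down to content negotiation. Below is a minimal sketch of how a server could decide between markdown and HTML from the Accept header. It is an illustration under assumptions, not Klovr's implementation: parseAccept, qualityFor, and chooseRepresentation are hypothetical names, and the policy (serve markdown only when text/markdown is named explicitly, never for wildcards) is a design choice made for this example.

```typescript
type Representation = "markdown" | "html";
interface AcceptEntry { type: string; q: number }

// Parse an Accept header into media types with their q-values (q defaults to 1).
function parseAccept(header: string): AcceptEntry[] {
  const entries: AcceptEntry[] = [];
  for (const part of header.split(",")) {
    const [rawType, ...params] = part.split(";").map((s) => s.trim().toLowerCase());
    if (!rawType) continue;
    const qParam = params.find((p) => p.startsWith("q="));
    const parsed = qParam ? Number.parseFloat(qParam.slice(2)) : 1;
    entries.push({ type: rawType, q: Number.isNaN(parsed) ? 0 : parsed });
  }
  return entries;
}

// Return the q-value of the first candidate found in the header, or 0 if none match.
function qualityFor(entries: AcceptEntry[], candidates: string[]): number {
  for (const candidate of candidates) {
    const match = entries.find((e) => e.type === candidate);
    if (match) return match.q;
  }
  return 0;
}

// Assumed policy: markdown only when asked for by name and at least as preferred as HTML.
function chooseRepresentation(accept: string | null): Representation {
  const entries = parseAccept(accept ?? "");
  const markdownQ = qualityFor(entries, ["text/markdown"]);
  const htmlQ = qualityFor(entries, ["text/html", "text/*", "*/*"]);
  return markdownQ > 0 && markdownQ >= htmlQ ? "markdown" : "html";
}
```

      With this policy a browser's default Accept header still gets HTML, while an agent sending Accept: text/markdown gets markdown. Because the response depends on the request header, it should also carry Vary: Accept so shared caches keep the two variants apart.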
    
      TECH STACK:
      - Next.js 15 (App Router) + Vercel
      - Playwright for browser automation
      - Redis (via ioredis) for caching
      - Drizzle ORM + Neon PostgreSQL
      - Readability.js + Turndown for conversion
    
      FREE TIER: 10,000 conversions/month (no credit card)
    
      WHAT I LEARNED:
      1. puppeteer-extra doesn't work on Vercel (ESM/CommonJS conflicts)
      2. Playwright has better anti-detection out of the box
      3. Redis caching is critical: the first request for a URL takes ~2000 ms, a cached one ~50 ms (about 40x faster)
      4. Most sites still don't support Cloudflare's markdown (hence the need for universal conversion)
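
      The caching point follows the usual cache-aside pattern: look up the URL, convert only on a miss, then store the result with a TTL. Here is a minimal sketch of that pattern, not Klovr's actual caching layer. KeyValueStore, MemoryStore, and getMarkdown are hypothetical names, and the in-memory store stands in for Redis so the example runs on its own. With ioredis, the equivalent write would be redis.set(key, value, "EX", ttlSeconds).

```typescript
// Hypothetical storage interface; a thin wrapper around a Redis client could implement it.
interface KeyValueStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

const SEVEN_DAYS_SECONDS = 7 * 24 * 60 * 60; // the 7-day TTL mentioned in the post

// In-memory stand-in for Redis with per-key expiry; the clock is injectable for testing.
class MemoryStore implements KeyValueStore {
  private entries = new Map<string, { value: string; expiresAt: number }>();
  private now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  async get(key: string): Promise<string | null> {
    const entry = this.entries.get(key);
    if (!entry) return null;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: behave as a miss
      return null;
    }
    return entry.value;
  }

  async set(key: string, value: string, ttlSeconds: number): Promise<void> {
    this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
  }
}

// Cache-aside: return the cached markdown if present, otherwise convert and store it.
async function getMarkdown(
  url: string,
  store: KeyValueStore,
  convert: (url: string) => Promise<string>, // the slow path (fetch + HTML-to-markdown)
): Promise<{ markdown: string; cached: boolean }> {
  const key = `md:${url}`;
  const hit = await store.get(key);
  if (hit !== null) return { markdown: hit, cached: true };
  const markdown = await convert(url);
  await store.set(key, markdown, SEVEN_DAYS_SECONDS);
  return { markdown, cached: false };
}
```

      One trade-off of a fixed 7-day TTL is that frequently updated pages can be served stale. A shorter TTL or a TTL per content type would be one way to tune that.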
    
      CURRENT LIMITATIONS:
      - Payment processing is in development (everyone on free tier for now)
      - Dynamic content (Playwright) temporarily disabled for launch (re-enabling next week)
      - IP-based blocking (Reddit, LinkedIn) still happens: requests come from datacenter IPs, and there's no way around that right now
    
      I'd love feedback on:
      - Architecture choices (should I use a different caching strategy?)
      - The positioning (am I framing the Cloudflare comparison correctly?)
      - What features would make this more useful for your AI agents?
    
      GitHub isn't public yet, but happy to share code snippets for specific parts (stealth script, caching layer, etc.).