Show HN: I built a zero-log PII redaction API – no AI, just regex and checksums (pii-firewall-edge-web.vercel.app)

1 point by Raviteja_ a month ago | 3 comments

Raviteja_ a month ago
Quick technical notes for HN:
Why no AI?
The irony of sending PII to an AI model to detect PII is lost on most "privacy" APIs. This is pure algorithmic detection – the same approach your credit card company uses to validate card numbers.
What's validated (not just pattern-matched):
- Credit cards → Luhn checksum
- Aadhaar → Verhoeff (the algorithm that catches single-digit and transposition errors)
- IBAN → Mod 97 (same as banks use)
- Singapore NRIC → Mod 11 with offset
- Brazilian CPF → Dual Mod 11
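To illustrate what "validated, not just pattern-matched" means, here is a minimal sketch of two of the checks named above: Luhn for card numbers and Mod 97 for IBANs. The function names `luhn_valid` and `iban_valid` are hypothetical and this is not the API's actual code. A production IBAN check would also verify the country-specific length, which this sketch skips.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


def iban_valid(iban: str) -> bool:
    """Return True if `iban` passes the Mod 97 check (length rules not checked)."""
    s = iban.replace(" ", "").upper()
    if len(s) < 5 or not (s.isascii() and s.isalnum()):
        return False
    # Move country code and check digits to the end, then map A=10 ... Z=35.
    rearranged = s[4:] + s[:4]
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1
```

For example, `luhn_valid("4111 1111 1111 1111")` passes, while the same number with the last digit changed to 2 fails, which is why a random 16-digit string rarely triggers a card match.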
Latency breakdown:
- Heuristic scan: O(n) single pass for trigger characters (@, -, digits)
- Pattern matching: Only runs if triggers found
- Validation: Only on pattern matches
- Total: 2-5ms for /fast, 5-15ms for /deep
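The staged pipeline above can be sketched roughly as follows. This is an assumption-laden illustration, not the API's implementation: `TRIGGERS`, `EMAIL_RE`, `DIGIT_RUN_RE`, `has_trigger`, and `find_candidates` are all hypothetical names, and the regexes are deliberately simplified.

```python
import re

# Hypothetical trigger set: the characters listed in the post.
TRIGGERS = set("@-0123456789")

# Simplified candidate patterns (assumptions, not the API's real patterns).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
DIGIT_RUN_RE = re.compile(r"\d(?:[ -]?\d){12,18}")  # 13 to 19 digits


def has_trigger(text: str) -> bool:
    """Single O(n) pass that stops at the first trigger character."""
    return any(ch in TRIGGERS for ch in text)


def find_candidates(text: str) -> list[tuple[str, str]]:
    """Run the regexes only when the cheap trigger scan finds something."""
    if not has_trigger(text):
        return []  # fast path: no regex work at all
    found = [("email", m.group()) for m in EMAIL_RE.finditer(text)]
    found += [("digits", m.group()) for m in DIGIT_RUN_RE.finditer(text)]
    return found
```

Text with no trigger characters returns immediately, and a checksum such as Luhn would then run only on the `"digits"` candidates.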
False positive mitigation:
- "Order ID: 123-45-6789" won't trigger SSN (negative context)
- Timestamps won't match phone patterns (separator requirements)
- Random 16-digit numbers won't trigger credit card (Luhn must pass)
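One way the negative-context rule could work is to inspect a short window of text before each SSN-shaped match and skip it when the window ends in an order-style label. The sketch below assumes a hypothetical label list and window size; `SSN_RE`, `NEGATIVE_CONTEXT_RE`, and `find_ssns` are illustrative names, not the API's.

```python
import re

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical labels that suggest the number is not an SSN.
NEGATIVE_CONTEXT_RE = re.compile(
    r"\b(?:order|invoice|ticket|tracking)\s*(?:id|no|number|#)?\s*[:#]?\s*$",
    re.IGNORECASE,
)


def find_ssns(text: str, window: int = 24) -> list[str]:
    """Return SSN-shaped matches that are not preceded by a negative label."""
    hits = []
    for m in SSN_RE.finditer(text):
        # Look only at the text immediately before the match.
        prefix = text[max(0, m.start() - window):m.start()]
        if NEGATIVE_CONTEXT_RE.search(prefix):
            continue  # e.g. "Order ID: " right before the number
        hits.append(m.group())
    return hits
```

With this sketch, `find_ssns("Order ID: 123-45-6789")` returns an empty list, while `find_ssns("SSN: 123-45-6789")` returns the number.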
comfytummyedgy a month ago
We integrated AI into our product recently and are looking for ways to protect our users' data. Definitely going to check it out and try it in our workflow.
max_aucube a month ago
The project is great, honestly. But I put a space in the email by mistake, and it wasn't censored.
- Raviteja_ a month ago
  Great catch! Emails with spaces around @ (like "test @ example.com") slip through. This is a classic obfuscation bypass.
  The current pattern intentionally matches only RFC 5321-compliant emails (no spaces). Supporting spaced variants is a trade-off: we would catch more bypass attempts but also increase false positives on text like "send @ 5pm". I'll add this to the roadmap. Appreciate the feedback! This is exactly the kind of edge case I need to hear about to make the API better.
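A rough sketch of the trade-off, using two hypothetical patterns (`STRICT_EMAIL_RE` and `LOOSE_EMAIL_RE` are illustrative and not the API's real regexes): the loose variant tolerates whitespace around @ but still requires a dotted domain with an alphabetic TLD, which keeps "send @ 5pm" from matching.

```python
import re

DOMAIN = r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"

# Strict: no whitespace allowed around "@".
STRICT_EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@" + DOMAIN)

# Loose: optional whitespace around "@", same dotted-domain requirement.
LOOSE_EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+\s*@\s*" + DOMAIN)

for sample in ("test @ example.com", "send @ 5pm"):
    strict = bool(STRICT_EMAIL_RE.search(sample))
    loose = bool(LOOSE_EMAIL_RE.search(sample))
    print(f"{sample!r}: strict={strict}, loose={loose}")
```

The loose pattern would still misfire on prose such as "meet @ noon.Then we leave", so the false-positive cost does not disappear; it only shrinks.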