1 point by Raviteja_ a month ago | 3 comments
  • Raviteja_ a month ago
    Quick technical notes for HN:

    Why no AI?

    The irony of sending PII to an AI model to detect PII is lost on most "privacy" APIs. This is pure algorithmic detection – the same approach your credit card company uses to validate card numbers.

    What's validated (not just pattern-matched):
    - Credit cards → Luhn checksum
    - Aadhaar → Verhoeff (the algorithm that catches single-digit and transposition errors)
    - IBAN → Mod 97 (same as banks use)
    - Singapore NRIC → Mod 11 with offset
    - Brazilian CPF → Dual Mod 11
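For readers who have not seen checksum validation before, here is a minimal sketch of two of the checks listed above: Luhn for card numbers and Mod 97 for IBANs. The function names `luhn_valid` and `iban_valid` are made up for illustration and are not taken from the API, and real validators typically also check per-country lengths and card prefixes, which this sketch skips.

```python
def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, check mod 10."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:       # every second digit, counting from the right
            d *= 2
            if d > 9:        # same as summing the two digits of the product
                d -= 9
        total += d
    return total % 10 == 0


def iban_valid(iban: str) -> bool:
    """ISO 13616 Mod 97 check only (no per-country length check, an omission of this sketch)."""
    s = iban.replace(" ", "").upper()
    if len(s) < 5 or not (s.isascii() and s.isalnum()):
        return False
    rearranged = s[4:] + s[:4]                       # move country code + check digits to the end
    numeric = "".join(str(int(c, 36)) for c in rearranged)  # A=10 ... Z=35, digits unchanged
    return int(numeric) % 97 == 1


if __name__ == "__main__":
    print(luhn_valid("4111 1111 1111 1111"))          # well-known test card number
    print(iban_valid("GB82 WEST 1234 5698 7654 32"))  # commonly cited example IBAN
```

Both functions only say whether the checksum holds; a passing checksum does not prove the number was ever issued.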

    Latency breakdown:
    - Heuristic scan: O(n) single pass for trigger characters (@, -, digits)
    - Pattern matching: Only runs if triggers found
    - Validation: Only on pattern matches
    - Total: 2-5ms for /fast, 5-15ms for /deep
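The staged pipeline described above could be sketched roughly as follows. This is an illustration under assumptions, not the API's actual code: the regexes, the `scan` function, and the two detectors shown are hypothetical, and only the three-stage shape (cheap trigger scan, then patterns, then checksum) comes from the notes.

```python
import re

# Stage 1: characters that make it worth running the regexes at all.
TRIGGERS = set("@-0123456789")

# Stage 2: illustrative patterns (assumed, much simpler than a production set).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")  # 13 to 19 digits, optional separators


def has_trigger(text: str) -> bool:
    """O(n) single pass; stops at the first trigger character."""
    return any(ch in TRIGGERS for ch in text)


def luhn_ok(digits: str) -> bool:
    """Stage 3 validator for card candidates (expects digits only)."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0


def scan(text: str) -> list[tuple[str, str]]:
    if not has_trigger(text):          # most plain prose exits here
        return []
    findings = [("email", m.group()) for m in EMAIL_RE.finditer(text)]
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if luhn_ok(digits):            # validation runs only on pattern matches
            findings.append(("credit_card", m.group()))
    return findings
```

The early return is where the latency win comes from: text without any trigger character never reaches the regex engine.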

    False positive mitigation:
    - "Order ID: 123-45-6789" won't trigger SSN (negative context)
    - Timestamps won't match phone patterns (separator requirements)
    - Random 16-digit numbers won't trigger credit card (Luhn must pass)
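As a rough idea of how a negative-context rule like the "Order ID" one might work, the sketch below looks at a short window of text before each SSN-shaped match and skips it if a disqualifying word appears. The keyword list, the 30-character window, and the name `find_ssns` are all assumptions for illustration; the actual rules are not described in the notes.

```python
import re

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical keywords that suggest the number is an identifier, not an SSN.
NEGATIVE_CONTEXT_RE = re.compile(
    r"\b(?:order|invoice|ticket|tracking|ref(?:erence)?)\b", re.IGNORECASE
)
WINDOW = 30  # characters of preceding text to inspect (arbitrary choice)


def find_ssns(text: str) -> list[str]:
    hits = []
    for m in SSN_RE.finditer(text):
        # Slicing can cut a word in half at the window edge; acceptable for a sketch.
        before = text[max(0, m.start() - WINDOW):m.start()]
        if NEGATIVE_CONTEXT_RE.search(before):
            continue  # negative context: treat as an order/invoice number
        hits.append(m.group())
    return hits
```

With these assumptions, `find_ssns("Order ID: 123-45-6789")` returns an empty list, while `find_ssns("SSN: 123-45-6789")` returns the match.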

  • comfytummyedgy a month ago
    We integrated AI into our product recently and are looking for ways to protect our users' data. Definitely going to check it out and try it in our workflow.
  • max_aucube a month ago
    The project is great, honestly. But when I put a space in an email by mistake, it wasn't censored.
    • Raviteja_ a month ago
      Great catch! Emails with spaces around @ (like "test @ example.com") slip through. This is a classic obfuscation bypass.

      The current pattern intentionally matches RFC 5321 compliant emails (no spaces). Adding support for spaced variants creates a trade-off: we would catch more bypass attempts but also increase false positives on text like "send @ 5pm". I'll add this to the roadmap. Appreciate the feedback! This is exactly the kind of edge case I need to hear about to make my API better.
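To make the trade-off concrete, here is one possible middle ground, sketched under assumptions: allow spaces or tabs around the @ but still require a dotted domain ending in a letters-only TLD. Both regexes and the `find_emails` helper are hypothetical and are not the API's patterns.

```python
import re

_LOCAL = r"[A-Za-z0-9._%+-]+"
_DOMAIN = r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"

STRICT_EMAIL_RE = re.compile(_LOCAL + "@" + _DOMAIN)
# Same shape, but tolerates spaces/tabs on either side of the @.
SPACED_EMAIL_RE = re.compile(_LOCAL + r"[ \t]*@[ \t]*" + _DOMAIN)


def find_emails(text: str, allow_spaced: bool = False) -> list[str]:
    pattern = SPACED_EMAIL_RE if allow_spaced else STRICT_EMAIL_RE
    return [m.group() for m in pattern.finditer(text)]
```

In this sketch the spaced variant catches "test @ example.com" and still ignores "send @ 5pm", because "5pm" has no dotted TLD. It does not remove the false-positive risk, though: a phrase like "deploy @ staging.internal" would be flagged, so the looser pattern is probably better suited to a deeper, opt-in scan mode.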