  • mnagas, 6 hours ago
    We've been building AI apps for the past year and the PII problem kept nagging us. You want to use GPT/Claude for customer support, medical summaries, financial analysis — but you can't just ship user emails, SSNs, and health records to a third-party API. Compliance teams say no. Often, the law says no.

    The result: teams either avoid AI entirely or roll the dice with unprotected data. Both outcomes suck. We built Blindfold to remove that tradeoff — let teams adopt AI without the PII risk.

    It detects PII in text and protects it before it reaches the model. The key design choice: two modes.

    Local mode runs entirely in your process — 86 regex detectors covering 80+ entity types across 30+ countries. Credit cards (with Luhn validation), IBANs, SSNs, phone numbers, emails, IP addresses, etc. No API key, no network calls, no data leaving your machine. Completely free, no limits.
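    The Luhn validation mentioned above is a public checksum algorithm, so here's what that check looks like in Python (this is the standard algorithm, not Blindfold's code): it weeds out digit strings that merely look like card numbers.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Double every second digit from the right; subtract 9 if the result exceeds 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```

    Running a candidate through a checksum like this on top of the regex match is what keeps false positives down for structured entities.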

    Cloud mode adds NLP-powered detection (GLiNER) on top of the regex layer. This catches what regex can't — names, addresses, medical terms, and other unstructured PII. EU and US regions for data residency.

    It's fast: local mode runs in single-digit milliseconds. And you don't have to apply a protection method at all. In detect-only mode it just scans for PII and returns what was found, where, and with what confidence. Use that to block messages from being sent, flag them for review, or build an audit trail of prevented leakage. Useful for compliance reporting.
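    To make the detect-only shape concrete, here's a toy scanner that returns findings with entity type, span, and a confidence score. The patterns, entity names, and scores are illustrative assumptions, not Blindfold's actual detectors or API:

```python
import re

# Toy detectors: (pattern, fixed confidence). Illustrative only.
DETECTORS = {
    "EMAIL": (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), 0.95),
    "US_SSN": (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), 0.85),
}

def detect(text: str) -> list[dict]:
    """Scan text and report each match: what, where, and how confident."""
    findings = []
    for entity, (pattern, confidence) in DETECTORS.items():
        for m in pattern.finditer(text):
            findings.append({
                "entity": entity,
                "text": m.group(),
                "start": m.start(),
                "end": m.end(),
                "confidence": confidence,
            })
    return sorted(findings, key=lambda f: f["start"])
```

    A gateway can then decide per finding: block, flag, or just log for the audit trail.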

    Another use case: RAG pipelines with role-based views. You can protect documents at ingestion time so the vector store never contains raw PII, or apply different protection levels at query time based on the user's role — an admin sees full records, a support agent sees masked data, an analyst sees fully redacted output. Same documents, different views.
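    A minimal sketch of that query-time idea, assuming detection has already produced spans. The role names and masking rules here are my assumptions for illustration, not Blindfold's API:

```python
def render(text: str, findings: list[dict], role: str) -> str:
    """Render the same text differently per role: full, masked, or redacted."""
    if role == "admin":
        return text  # admin sees full records
    out, cursor = [], 0
    for f in sorted(findings, key=lambda f: f["start"]):
        out.append(text[cursor:f["start"]])
        span = text[f["start"]:f["end"]]
        if role == "agent":
            # support agent: mask all but the last 4 characters
            out.append("*" * max(len(span) - 4, 0) + span[-4:])
        else:
            # analyst: fully redacted, entity label only
            out.append(f"[{f['entity']}]")
        cursor = f["end"]
    out.append(text[cursor:])
    return "".join(out)
```

    The point is that the stored documents never change; only the view applied on the way out does.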

    Beyond detection, Blindfold supports 6 protection methods. The one we use most is tokenize — it replaces "John Doe" with "<Person_1>", sends the safe text to the LLM, then restores the originals in the response. The model never sees real data, but your output is complete. You can also redact, mask, hash, synthesize (generate fake replacements), or encrypt (AES-256, reversible).
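    The tokenize/detokenize loop is simple to sketch. This uses a deliberately naive name detector (two capitalized words) just to show the round-trip shape; it is not Blindfold's SDK:

```python
import re

# Toy detector: two capitalized words. Real detection is the hard part.
NAME_RE = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")

def tokenize(text: str) -> tuple[str, dict]:
    """Replace each distinct name with a stable <Person_N> token."""
    token_for: dict[str, str] = {}  # original -> token
    mapping: dict[str, str] = {}    # token -> original, kept for restore
    def repl(m: re.Match) -> str:
        original = m.group()
        if original not in token_for:
            token = f"<Person_{len(token_for) + 1}>"
            token_for[original] = token
            mapping[token] = original
        return token_for[original]
    return NAME_RE.sub(repl, text), mapping

def detokenize(text: str, mapping: dict) -> str:
    """Restore originals in the LLM's response."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text
```

    The safe text goes to the model, the mapping stays on your side, and the model's response is detokenized before the user sees it.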

    There are 5 built-in compliance policies (basic, GDPR, HIPAA, PCI DSS, strict) that configure which entity types to detect and at what thresholds.

    Why another PII library? Tools like Presidio are great for data anonymization. But we needed something designed specifically for the LLM round-trip — tokenize PII before the model, get a response back, restore the originals. That tokenize-LLM-detokenize loop is the core of Blindfold. We also wanted built-in compliance policies we could just pick (GDPR, HIPAA, PCI DSS) without configuring individual recognizers. And practically — our stack uses Python, TypeScript, and Go. We wanted one consistent API that works the same way across all of them, not a library we'd have to wrap or rewrite for every service. So we built native SDKs for Python, JavaScript, Go, Java, and .NET that all share the same interface.

    Free tier: 500K characters/month for cloud mode. Local mode is unlimited.

    Docs: https://docs.blindfold.dev

    GitHub: https://github.com/blindfold-dev

    Built by a small team. Happy to talk architecture or any other related topic.