2 points by neon_share 13 hours ago | 1 comment
  • neon_share 13 hours ago
    Hi HackerNews,

    We’re Ash and George from Fastino Labs, and today we’re releasing GLiNER2-PII, a 0.3B-parameter open-source encoder model for PII detection.

    Removing personally identifiable information (PII) from documentation and data sources remains a challenge. Because PII looks different depending on the country, context, and document type, most models struggle to keep up.

    GLiNER2-PII addresses this with a compact 0.3B-parameter encoder architecture that outperforms OpenAI's Privacy Filter and all existing GLiNER PII variants.

    It supports zero-shot extraction of unseen entity types and was fine-tuned on 42 fine-grained entity types across seven semantic categories:

    - API Keys, Passwords & Credentials
    - Person & Identity
    - Contact & Location
    - Government & Tax Identifiers
    - Banking & Payment
    - Digital Identity
    - Sensitive Dates
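
    To make the removal step concrete: once a detector returns character spans, redaction amounts to replacing each span with a placeholder. Below is a minimal sketch, assuming detections arrive as non-overlapping (start, end, label) tuples. The `redact` helper, the labels, and the sample spans are hypothetical and do not reflect GLiNER2-PII's actual output format.

```python
from typing import List, Tuple

# Hypothetical detection format: (start offset, end offset, entity label).
Span = Tuple[int, int, str]


def redact(text: str, spans: List[Span]) -> str:
    """Replace each detected span with a bracketed label placeholder.

    Assumes spans do not overlap. Spans are processed from the end of the
    string backwards so that earlier offsets stay valid after each edit.
    """
    for start, end, label in sorted(spans, key=lambda s: s[0], reverse=True):
        text = text[:start] + f"[{label.upper()}]" + text[end:]
    return text


# Hand-written sample spans standing in for model output.
text = "Contact Jane Doe at jane@example.com."
spans = [(8, 16, "person"), (20, 36, "email")]
redacted = redact(text, spans)
print(redacted)
```

    Working right to left avoids having to recompute offsets after each replacement; overlapping spans would need to be merged first.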

    On the SPY benchmark, GLiNER2-PII achieves the highest span-level F1 (0.471) across legal and medical documents, outperforming OpenAI's Privacy Filter and all existing GLiNER PII variants. Notably, it maintains high recall (0.722 legal / 0.681 medical) while preserving competitive precision.
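
    For readers unfamiliar with span-level F1, here is a minimal sketch, assuming exact-match scoring on (start, end, label) triples. The SPY benchmark's actual matching rules may differ, and the `span_f1` function and sample spans below are made up for illustration; they are not the evaluation code behind the numbers above.

```python
from typing import Iterable, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, entity label)


def span_f1(predicted: Iterable[Span], gold: Iterable[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact span-and-label matching."""
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative data only: 3 gold spans, 4 predictions, 2 exact matches.
gold = [(0, 8, "person"), (22, 38, "email"), (50, 62, "phone")]
predicted = [(0, 8, "person"), (22, 38, "email"), (70, 75, "date"), (80, 90, "person")]
precision, recall, f1 = span_f1(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

    Because F1 is the harmonic mean of precision and recall, a model with high recall but many spurious spans still ends up with a modest F1, which is why recall and precision are worth reading alongside the headline number.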

    Training data was generated synthetically using our Pioneer Agent framework, producing multilingual annotated examples across document types, locales, and entity distributions.

    GLiNER2-PII is part of the GLiNER family of models for named entity recognition, text classification, and structured extraction.

    We are happy to release GLiNER2-PII to the open source community under the Apache 2.0 license.

    Model weights are available now on Hugging Face.

    Model: https://huggingface.co/fastino/gliner2-privacy-filter-PII-mu...
    Read the blog: https://pioneer.ai/research/gliner2-pii-a-multilingual-model...