GLiNER2-PII: 0.3B open-source PII model outperforms OpenAI's Privacy Filter (pioneer.ai)

2 points by neon_share 13 hours ago | 1 comment

neon_share 13 hours ago
Hi HackerNews,
We’re Ash and George from Fastino Labs, and today we’re releasing GLiNER2-PII, a 0.3B-parameter open-source encoder model for PII detection.
Removing personally identifiable information (PII) from documentation and data sources remains a challenge. Because PII looks different depending on the country, context, and document type, it’s difficult for most models to keep up.
GLiNER2-PII addresses this with a compact 0.3B-parameter encoder architecture that outperforms OpenAI's Privacy Filter and all existing GLiNER PII variants.
In addition to supporting zero-shot extraction of unseen entity types, it was fine-tuned on 42 fine-grained entity types across seven semantic categories:
- API Keys, Passwords & Credentials
- Person & Identity
- Contact & Location
- Government & Tax Identifiers
- Banking & Payment
- Digital Identity
- Sensitive Dates
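For readers wondering how detected entities turn into redacted text, here is a minimal sketch. It does not call the model: it assumes a detector has already returned character-offset spans as `(start, end, label)` tuples, which is a hypothetical output format chosen for illustration and not necessarily what GLiNER2-PII emits. The `redact` helper and the labels are likewise made up for the example.

```python
from typing import List, Tuple

# Hypothetical span format: (start offset, end offset, label), end exclusive.
Span = Tuple[int, int, str]


def redact(text: str, spans: List[Span]) -> str:
    """Replace each detected span with a [LABEL] placeholder."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            # Skip spans that overlap one already redacted.
            continue
        pieces.append(text[cursor:start])
        pieces.append(f"[{label.upper()}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "Contact Jane Doe at jane@example.com."
# Spans written by hand for illustration, not produced by any model.
spans = [(8, 16, "person"), (20, 36, "email")]
redacted = redact(text, spans)
print(redacted)
```

Placeholders keep the entity type, so downstream readers can still tell what kind of value was removed.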
On the SPY benchmark, GLiNER2-PII achieves the highest span-level F1 (0.471) across legal and medical documents, outperforming OpenAI's Privacy Filter and all existing GLiNER PII variants. Notably, it maintains high recall (0.722 legal / 0.681 medical) while preserving competitive precision.
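For context on the metric, span-level F1 is commonly computed by counting a prediction as correct only when its boundaries and label match a gold span exactly. The sketch below assumes that exact-match convention; the SPY benchmark's actual matching rules may differ, and the spans are invented for illustration.

```python
from typing import Iterable, Tuple

# Assumed span representation: (start offset, end offset, label).
Span = Tuple[int, int, str]


def span_prf(gold: Iterable[Span], pred: Iterable[Span]) -> Tuple[float, float, float]:
    """Exact-match span-level precision, recall, and F1."""
    gold_set, pred_set = set(gold), set(pred)
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Invented example: 3 gold spans, 4 predictions, 2 exact matches.
gold = [(0, 8, "person"), (20, 36, "email"), (50, 61, "ssn")]
pred = [(0, 8, "person"), (20, 36, "email"), (40, 45, "date"), (70, 75, "phone")]
precision, recall, f1 = span_prf(gold, pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because F1 is the harmonic mean of precision and recall, a model can have fairly high recall and still land at a moderate F1 when it also produces extra spans.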
Training data was generated synthetically using our Pioneer Agent framework, producing multilingual annotated examples across document types, locales, and entity distributions.
GLiNER2-PII is part of the GLiNER family of models for named entity recognition, text classification, and structured extraction.
We are happy to release GLiNER2-PII to the open-source community under the Apache 2.0 license.
Model weights are available now on Hugging Face.
Model: https://huggingface.co/fastino/gliner2-privacy-filter-PII-mu...
Read the blog: https://pioneer.ai/research/gliner2-pii-a-multilingual-model...