Personal Identifiable Information PII Masking Standard

Purpose: Identify and mask personally identifiable information (PII) in text with low latency across multiple languages.
Training: 100,000 labeled examples (ten parts of 10,000 each; see Training Datasets) focused on PII detection and redaction patterns.
Primary value: High-accuracy masking with tunable precision/recall and entity-level controls suitable for real-time pipelines.
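The card does not say how precision/recall tuning or entity-level controls are exposed. One common pattern uses a per-label allow-list plus per-label score thresholds. The sketch below assumes detections come back as labeled character spans with a confidence score. The `Span` type, the label names, the `select_spans` helper, and the threshold values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Mapping, Set


@dataclass(frozen=True)
class Span:
    start: int    # character offset (inclusive)
    end: int      # character offset (exclusive)
    label: str    # entity type, e.g. "EMAIL" (hypothetical label set)
    score: float  # hypothetical confidence in [0, 1]


def select_spans(
    spans: Iterable[Span],
    enabled_labels: Set[str],
    thresholds: Mapping[str, float],
    default_threshold: float = 0.5,
) -> List[Span]:
    """Keep spans whose label is enabled and whose score clears its threshold.

    Lowering a label's threshold masks more candidates (higher recall, lower
    precision); raising it does the opposite.
    """
    return [
        s
        for s in spans
        if s.label in enabled_labels
        and s.score >= thresholds.get(s.label, default_threshold)
    ]
```

Per-label thresholds let a pipeline mask aggressively on high-risk types (for example, government IDs) while keeping noisier types such as names at a stricter cutoff.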

⸻

Intended Use
• Use cases: real-time chat redaction, log scrubbing, customer-support transcripts, analytics pipelines, ETL/ELT preprocessing, data sharing/anonymization.
• Users: platform engineers, data engineers, privacy/compliance teams, researchers preparing shareable corpora.
• Input: UTF-8 text strings (plain text).
• Output: Redacted text plus optional entity spans and labels.
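The card does not document the exact output schema. The following is a minimal sketch of templated redaction that returns both the redacted text and the entity spans. It assumes spans arrive as hypothetical `(start, end, label)` tuples with character offsets into the original string. The `redact` helper and the `"[{label}]"` template are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical span format: (start, end, label) with character offsets,
# end exclusive. The card does not document the real output schema.
Span = Tuple[int, int, str]


def redact(text: str, spans: List[Span], template: str = "[{label}]") -> Dict[str, object]:
    """Replace each span with a label placeholder; return text plus entities."""
    ordered = sorted(spans, key=lambda s: (s[0], s[1]))
    for start, end, _ in ordered:
        if not 0 <= start <= end <= len(text):
            raise ValueError(f"span ({start}, {end}) is outside the text")
    # Overlapping spans would make the replacement ambiguous, so reject them.
    for (_, prev_end, _), (next_start, _, _) in zip(ordered, ordered[1:]):
        if next_start < prev_end:
            raise ValueError("overlapping spans")
    parts: List[str] = []
    cursor = 0
    for start, end, label in ordered:
        parts.append(text[cursor:start])            # untouched text before the entity
        parts.append(template.format(label=label))  # templated placeholder
        cursor = end
    parts.append(text[cursor:])
    entities = [{"start": s, "end": e, "label": lab} for s, e, lab in ordered]
    return {"text": "".join(parts), "entities": entities}
```

The returned entity offsets refer to the original text, not the redacted text. Keep this in mind if downstream code needs to locate placeholders.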

Out-of-Scope
• Re-identification or linkage of anonymized text.
• Legal guarantees of anonymization (use this as a technical control, not a regulatory determination).
• Images or scanned documents (unless the text has been reliably extracted, e.g. via OCR).

⸻

Model Details
• Task: Named entity detection of PII + deterministic/templated redaction.
• Languages: Multilingual (training targeted multi-language coverage). Expect strongest performance on high-resource languages (e.g., English, Spanish, French, German, Portuguese, Italian) with graceful degradation on others.
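The card calls its redaction "deterministic/templated" but does not define the deterministic variant. One common interpretation is consistent pseudonymization, in which the same value always maps to the same placeholder so that joins and counts survive redaction in analytics pipelines. This is only an assumption; the sketch below is not the model's documented behavior. The `pseudonymize` helper and the key handling are hypothetical. A keyed HMAC is used instead of a plain hash because small value spaces such as phone numbers can otherwise be recovered by hashing every candidate.

```python
import hashlib
import hmac


def pseudonymize(value: str, label: str, key: bytes, digits: int = 8) -> str:
    """Map a detected PII value to a stable placeholder like "[EMAIL_1a2b3c4d]".

    The same (label, value, key) always yields the same placeholder. Without
    the secret key, nobody can rebuild the mapping by brute-forcing values.
    """
    message = f"{label}:{value}".encode("utf-8")
    digest = hmac.new(key, message, hashlib.sha256).hexdigest()
    # Truncated digests collide more often; raise `digits` for large corpora.
    return f"[{label}_{digest[:digits]}]"
```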

Basic Information

• Base Model: Standard Base
• Created by: Michaelminibase
• Times imported: 625
• Released: Sep 26, 2025
• Model Size: 368 MB
• Model Type: Causal Language Model
• Format: HIGH

Technical Details

• Hidden Size: 960
• Hidden Layers: 32
• Attention Heads: 15
• Vocabulary Size: 49,152
• Max Context Length: 8,192 tokens
• Precision: BFloat16 (BF16)
• Learning Rate: 0.00005 (5 × 10⁻⁵)
• Training Epochs: 3
• Effective Batch Size: 16
• Optimizer: AdamW
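With an 8,192-token context, long logs and transcripts have to be split before masking. The sketch below packs whole lines into chunks, which keeps most single-line entities (emails, phone numbers) inside one chunk. It relies on several assumptions. First, the model echoes the input as masked text, so the input and output each get half of the window left after the prompt. Second, the 512-token prompt reserve is arbitrary. Third, `count_tokens` stands in for the model's real tokenizer. The `chunk_budget` and `chunk_lines` helpers are hypothetical.

```python
from typing import Callable, List

MAX_CONTEXT = 8_192  # "Max Context Length" from the model card


def chunk_budget(max_context: int = MAX_CONTEXT, prompt_reserve: int = 512) -> int:
    """Tokens available for one input chunk.

    Assumes (hypothetically) that the model echoes the input as masked text,
    so input and output each get half of what remains after the prompt.
    """
    return (max_context - prompt_reserve) // 2


def chunk_lines(text: str, count_tokens: Callable[[str], int], budget: int) -> List[str]:
    """Greedily pack whole lines into chunks of at most `budget` tokens.

    A single line longer than the budget becomes its own (oversized) chunk;
    callers that need a hard limit must split such lines further.
    """
    chunks: List[str] = []
    current: List[str] = []
    used = 0
    for line in text.splitlines(keepends=True):
        n = count_tokens(line)
        # Flush the current chunk if this line would push it over budget.
        if current and used + n > budget:
            chunks.append("".join(current))
            current, used = [], 0
        current.append(line)
        used += n
    if current:
        chunks.append("".join(current))
    return chunks
```

Because `keepends=True` preserves newlines, joining the masked chunks back together restores the original line structure. A whitespace word count such as `lambda s: len(s.split())` is only a crude stand-in, since subword tokenizers usually produce more tokens than words.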

Training Datasets

Name	Type	Examples	Size
Multilingual PII Masking (Part 1)	SFT	10,000	4.9 MB
Multilingual PII Masking (Part 2)	SFT	10,000	4.8 MB
Multilingual PII Masking (Part 3)	SFT	10,000	4.8 MB
Multilingual PII Masking (Part 4)	SFT	10,000	4.9 MB
Multilingual PII Masking (Part 5)	SFT	10,000	4.8 MB
Multilingual PII Masking (Part 6)	SFT	10,000	4.8 MB
Multilingual PII Masking (Part 7)	SFT	10,000	4.8 MB
Multilingual PII Masking (Part 8)	SFT	10,000	4.9 MB
Multilingual PII Masking (Part 9)	SFT	10,000	4.8 MB
Multilingual PII Masking (Part 10)	SFT	10,000	4.8 MB
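The figures in the table and in Technical Details can be cross-checked with a little arithmetic. The step count assumes every example is seen once per epoch at the stated effective batch size, with no sequence packing or dropped examples. The per-head dimension assumes the usual equal split of the hidden size across attention heads. Neither assumption is confirmed by the card.

```python
import math

# (examples, size in MB) for Parts 1-10, copied from the table above.
PARTS = [
    (10_000, 4.9), (10_000, 4.8), (10_000, 4.8), (10_000, 4.9), (10_000, 4.8),
    (10_000, 4.8), (10_000, 4.8), (10_000, 4.9), (10_000, 4.8), (10_000, 4.8),
]
EFFECTIVE_BATCH_SIZE = 16
EPOCHS = 3
HIDDEN_SIZE = 960
ATTENTION_HEADS = 15

total_examples = sum(n for n, _ in PARTS)
total_mb = round(sum(mb for _, mb in PARTS), 1)  # round away float noise
# Assumes every example is seen once per epoch, with no packing or dropping.
steps_per_epoch = math.ceil(total_examples / EFFECTIVE_BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
# Assumes the usual equal split of the hidden size across attention heads.
head_dim = HIDDEN_SIZE // ATTENTION_HEADS

print(total_examples, total_mb, steps_per_epoch, total_steps, head_dim)
```

This prints `100000 48.3 6250 18750 64`. In other words, about 48.3 MB of SFT data, roughly 18,750 optimizer steps over three epochs, and 64-dimensional attention heads.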
