Marketplace Model: Personal Identifiable Information PII Masking Standard

From Minibase

Purpose: Identify and mask personally identifiable information (PII) in text with low latency across multiple languages.
Training: ~100k labeled examples focused on PII detection and redaction patterns.
Primary value: High-accuracy masking with tunable precision/recall and entity-level controls suitable for real-time pipelines.



Intended Use
• Use cases: real-time chat redaction, log scrubbing, customer-support transcripts, analytics pipelines, ETL/ELT preprocessing, data sharing/anonymization.
• Users: platform engineers, data engineers, privacy/compliance teams, researchers preparing shareable corpora.
• Input: UTF-8 text strings (plain text).
• Output: Redacted text plus optional entity spans and labels.
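
As an illustration of how the optional entity spans and labels might be consumed downstream, here is a minimal sketch. The span layout (character offsets plus a label), the label names, and the `apply_spans` helper are assumptions made for this example; they are not the model's documented output schema. The optional `mask_labels` argument shows one way to apply entity-level control by masking only selected entity types.

```python
from typing import Iterable, NamedTuple, Optional, Set


class Span(NamedTuple):
    start: int   # inclusive character offset (assumed layout)
    end: int     # exclusive character offset (assumed layout)
    label: str   # hypothetical entity label, e.g. "EMAIL"


def apply_spans(text: str, spans: Iterable[Span],
                mask_labels: Optional[Set[str]] = None) -> str:
    """Replace each selected span with a [LABEL] placeholder.

    Assumes spans do not overlap. If mask_labels is None, every span is masked.
    """
    selected = [s for s in spans if mask_labels is None or s.label in mask_labels]
    # Work right to left so earlier offsets stay valid after each replacement.
    for s in sorted(selected, key=lambda s: s.start, reverse=True):
        text = text[:s.start] + f"[{s.label}]" + text[s.end:]
    return text


text = "Contact Ana at ana@example.com or 555-0100."
spans = [Span(8, 11, "NAME"), Span(15, 30, "EMAIL"), Span(34, 42, "PHONE")]

print(apply_spans(text, spans))
# Contact [NAME] at [EMAIL] or [PHONE].
print(apply_spans(text, spans, mask_labels={"EMAIL", "PHONE"}))
# Contact Ana at [EMAIL] or [PHONE].
```

Replacing from the end of the string backwards avoids recomputing offsets after each substitution, which keeps the helper short for log-scrubbing or ETL steps.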

Out-of-Scope
• Re-identification or linkage of anonymized text.
• Legal guarantees of anonymization (use this as a technical control, not a regulatory determination).
• Image or scanned-document inputs (run OCR first and pass only reliably extracted text).



Model Details
• Task: Named entity detection of PII + deterministic/templated redaction.
• Languages: Multilingual (training targeted multi-language coverage). Expect strongest performance on high-resource languages (e.g., English, Spanish, French, German, Portuguese, Italian) with graceful degradation on others.
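
One possible reading of "deterministic/templated redaction" is that each detected entity is replaced by a placeholder built from a fixed template, and that the same value always receives the same placeholder. The sketch below illustrates that idea only; the `[LABEL_n]` template, the `(start, end, label)` tuple layout, and the `templated_redact` helper are assumptions, not the model's confirmed behavior.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def templated_redact(text: str, entities: List[Tuple[int, int, str]]) -> str:
    """Replace (start, end, label) entities with numbered placeholders.

    The same surface string under the same label always maps to the same
    placeholder, so the output is deterministic for a given input.
    Assumes entities do not overlap.
    """
    counters: Dict[str, int] = defaultdict(int)
    assigned: Dict[Tuple[str, str], str] = {}
    pieces: List[str] = []
    cursor = 0
    for start, end, label in sorted(entities):
        key = (label, text[start:end])
        if key not in assigned:
            counters[label] += 1
            assigned[key] = f"[{label}_{counters[label]}]"
        pieces.append(text[cursor:start])   # untouched text before the entity
        pieces.append(assigned[key])        # templated placeholder
        cursor = end
    pieces.append(text[cursor:])            # trailing text after the last entity
    return "".join(pieces)


msg = "Bob emailed bob@example.com; reply to bob@example.com or amy@example.com."

# Stand-in for detector output: locate every occurrence of two known values.
ents = []
for value, label in [("bob@example.com", "EMAIL"), ("amy@example.com", "EMAIL")]:
    pos = msg.find(value)
    while pos != -1:
        ents.append((pos, pos + len(value), label))
        pos = msg.find(value, pos + 1)

print(templated_redact(msg, ents))
# Bob emailed [EMAIL_1]; reply to [EMAIL_1] or [EMAIL_2].
```

Numbered placeholders keep references to the same entity linked within a document without exposing the original value, which can matter for analytics on redacted transcripts.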

Basic Information

Base Model: Standard Base
Created by: Michaelminibase
Times imported: 625
Released: Sep 26, 2025
Model Size: 368 MB
Model Type: Causal Language Model
Format: HIGH

Technical Details

Hidden Size: 960
Hidden Layers: 32
Attention Heads: 15
Vocabulary Size: 49,152
Max Context Length: 8,192 tokens
Precision: BFloat16 (BF16)
Learning Rate: 0.000050
Training Epochs: 3
Effective Batch Size: 16
Optimizer: AdamW
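
A few quantities follow from the figures above by simple arithmetic. The sketch below works them out under stated assumptions: that all ~100k examples are used in every epoch with no held-out split, that the per-head dimension is hidden size divided by attention heads (the usual multi-head attention layout), and that a token embedding matrix has shape vocabulary size by hidden size. None of these is confirmed by the model card itself.

```python
import math

examples = 100_000        # approximate total (~100k labeled examples)
effective_batch = 16      # Effective Batch Size
epochs = 3                # Training Epochs
hidden_size = 960         # Hidden Size
attention_heads = 15      # Attention Heads
vocab_size = 49_152       # Vocabulary Size

# Optimizer steps, assuming every example is seen once per epoch.
steps_per_epoch = math.ceil(examples / effective_batch)
total_steps = steps_per_epoch * epochs

# Per-head dimension, assuming hidden_size is split evenly across heads.
head_dim = hidden_size // attention_heads

# Size of one token embedding matrix (vocab_size x hidden_size).
embedding_params = vocab_size * hidden_size

print(steps_per_epoch, total_steps, head_dim, embedding_params)
# 6250 18750 64 47185920
```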

Training Datasets

Name                                  Type   Examples   Size
Multilingual PII Masking (Part 1)     SFT    10,000     4.9 MB
Multilingual PII Masking (Part 2)     SFT    10,000     4.8 MB
Multilingual PII Masking (Part 3)     SFT    10,000     4.8 MB
Multilingual PII Masking (Part 4)     SFT    10,000     4.9 MB
Multilingual PII Masking (Part 5)     SFT    10,000     4.8 MB
Multilingual PII Masking (Part 6)     SFT    10,000     4.8 MB
Multilingual PII Masking (Part 7)     SFT    10,000     4.8 MB
Multilingual PII Masking (Part 8)     SFT    10,000     4.9 MB
Multilingual PII Masking (Part 9)     SFT    10,000     4.8 MB
Multilingual PII Masking (Part 10)    SFT    10,000     4.8 MB