Marketplace Model: Detoxify Small Language
From Minibase
Language Detoxification Model
The Language Detoxification Model is a lightweight AI model designed to reduce toxic, offensive, or harmful language in text. It is built on the Small Base Model and fine-tuned with nearly 30,000 curated examples of toxic language across a wide variety of contexts, enabling it to recognize and neutralize problematic wording while preserving the intended meaning of the original text.
Purpose
This model is optimized for scenarios where maintaining civility, inclusivity, and compliance with community or platform standards is critical. Rather than simply filtering or blocking text, it reconstructs responses into cleaner, safer language while preserving semantic integrity.
Typical Use Cases
• Content Moderation: Automatically rewriting user submissions to remove offensive terms while retaining overall context.
• Assistive Communication: Helping users rephrase emotionally charged or inappropriate statements into professional, respectful language.
• Safety Filters in Applications: Serving as a preprocessing layer before output is displayed in chatbots, forums, or customer support systems.
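The safety-filter use case above can be sketched as a thin preprocessing layer that every message passes through before display. The `detoxify` function here is a placeholder standing in for a call to the model's inference endpoint; the toy word replacement is an illustration, not the model's actual behavior:

```python
# Sketch of a safety-filter preprocessing layer (illustrative only).
# detoxify() stands in for a call to the detoxification model; a real
# deployment would invoke the model's inference API instead.

def detoxify(text: str) -> str:
    """Placeholder rewrite step; assume the model returns cleaned text."""
    replacements = {"stupid": "unhelpful"}  # toy mapping for the sketch
    for toxic, neutral in replacements.items():
        text = text.replace(toxic, neutral)
    return text

def display_safe(user_text: str) -> str:
    """Run every message through the detox layer before it is shown."""
    return detoxify(user_text)

print(display_safe("this answer is stupid"))
# -> this answer is unhelpful
```

The design point is that the filter sits between user input and rendering, so downstream components (chatbot UI, forum page, support console) only ever see the rewritten text.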
Model Specs
• Base Architecture: Small Base Model (lightweight and efficient)
• Training Dataset: ~30,000 toxic and neutral examples (detoxification pairs)
• Format Type: causal_lm
• Max Sequence Length: 1,024 tokens
• Context Window: 1,024 tokens
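Because the listed context window is 1,024 tokens and must hold both the toxic input and the rewritten output, callers need a simple token-budget check before sending text to the model. The sketch below uses whitespace splitting as a crude stand-in for the model's real tokenizer, and the 256-token output reserve is an assumption:

```python
# Token-budget check for the 1,024-token context window listed above.
# Whitespace splitting is a rough proxy for the real BPE tokenizer.

MAX_CONTEXT = 1024

def fits_context(prompt: str, reserve_for_output: int = 256) -> bool:
    """Return True if the prompt plus an output reserve fits the window."""
    approx_tokens = len(prompt.split())  # crude token-count estimate
    return approx_tokens + reserve_for_output <= MAX_CONTEXT

print(fits_context("please rewrite this politely"))
# -> True
```

Inputs that fail this check would need to be truncated or split into chunks before detoxification.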
Key Advantages
• Efficiency: Compact model size ensures fast inference and low memory footprint.
• Practical Coverage: Trained on diverse toxic language scenarios for robustness.
• Preservation of Meaning: Focused on rewriting rather than censoring, avoiding loss of context.
Basic Information
Base Model: Small Base
Created by: Michaelminibase
Times imported: 1,125
Released: Sep 5, 2025
Model Size: 138 MB
Model Type: Causal Language Model
Format: HIGH
Technical Details
Hidden Size: 576
Hidden Layers: 30
Attention Heads: 9
Vocabulary Size: 49,152
Max Context Length: 2,048 tokens
Precision: BFloat16 (BF16)
Learning Rate: 0.000050 (5e-5)
Training Epochs: 3
Effective Batch Size: 16
Optimizer: AdamW
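An effective batch size of 16 on a small model is commonly reached via gradient accumulation. The per-device batch size and accumulation steps below are assumptions for illustration; only the effective batch size of 16 and the dataset total of 28,656 examples come from this listing:

```python
# Effective batch size via gradient accumulation (values below the
# listed "Effective Batch Size: 16" are illustrative assumptions).

per_device_batch = 4   # assumed, not stated in the listing
grad_accum_steps = 4   # assumed, not stated in the listing
effective_batch = per_device_batch * grad_accum_steps
assert effective_batch == 16  # matches the listed training config

# Optimizer steps per epoch over the four SFT datasets combined:
total_examples = 7_453 + 8_276 + 11_927 + 1_000  # 28,656, from the table
steps_per_epoch = total_examples // effective_batch
print(steps_per_epoch)
# -> 1791
```

Across the listed 3 epochs this would come to roughly 5,373 optimizer steps under these assumptions.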
Training Datasets
| Name | Type | Examples | Size |
|---|---|---|---|
| Detoxify Language (Part 1) | SFT | 7,453 | 3.9 MB |
| Detoxify Language (Part 2) | SFT | 8,276 | 2.7 MB |
| Detoxify Language (Part 3) | SFT | 11,927 | 2.3 MB |
| Detoxify Language (Part 4) | SFT | 1,000 | 220.3 KB |