Training Uncensored Image Models: Complete Guide Part 2 - Stable Diffusion
| Part 2 of 3 |
In Part 1, we covered training uncensored Large Language Models. Now we'll tackle image generation models—specifically Stable Diffusion—to create unrestricted visual AI for adult content, artistic freedom, and specialized applications.
Why Train Custom Image Models?
Business Opportunities
- Adult Content Creation: Generate custom imagery for legal adult entertainment
- Character Consistency: Create consistent characters across hundreds of images
- Style Transfer: Train on specific art styles or photography styles
- Product Visualization: Generate product mockups and variations
- Game Asset Creation: Unlimited game art and textures
- Film Pre-visualization: Concept art and storyboarding
Technical Benefits
Architecture Overview
We'll focus on Stable Diffusion variants:
Model Options
- Stable Diffusion XL (SDXL) - Most popular
  - Resolution: 1024x1024
  - Quality: Excellent
  - Speed: Moderate
  - VRAM: 24GB for training
- Stable Diffusion 1.5 - Budget-friendly
  - Resolution: 512x512
  - Quality: Good
  - Speed: Fast
  - VRAM: 12GB for training
- Stable Diffusion 3 - Latest
  - Resolution: Up to 2048x2048
  - Quality: Best
  - Speed: Slower
  - VRAM: 40GB+ for training
- Flux.1 - Alternative
  - Resolution: 1024x1024
  - Quality: Excellent
  - Speed: Fast
  - VRAM: 24GB for training
Recommendation: Start with SD 1.5 for learning, move to SDXL for production.
Hardware Requirements
| Model Type | VRAM | Recommended GPU | Training Time | Dataset Size |
|---|---|---|---|---|
| SD 1.5 LoRA | 12GB | RTX 3060 12GB | 2-6 hours | 50-500 images |
| SD 1.5 DreamBooth | 16GB | RTX 4060 Ti 16GB | 4-8 hours | 10-30 images |
| SDXL LoRA | 24GB | RTX 4090 24GB | 4-12 hours | 100-1000 images |
| SDXL DreamBooth | 24GB | RTX 4090 24GB | 6-16 hours | 20-50 images |
| SDXL Full Fine-tune | 80GB | A100 80GB | 2-7 days | 10k+ images |
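To see which row of the table applies to your machine, you can query each GPU's total VRAM with PyTorch. A small sketch, nothing model-specific:

import torch

# Report each visible GPU and its total VRAM so you can match it against the table above
if not torch.cuda.is_available():
    print("No CUDA GPU detected")
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GB VRAM")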
Budget Options
Training Methods Comparison
| Method | Dataset Size | Training Time | Use Case | Quality |
|---|---|---|---|---|
| LoRA | 50-1000 images | 2-12 hours | Styles, concepts, objects | 85-95% |
| DreamBooth | 10-50 images | 4-16 hours | Specific subjects, characters | 90-98% |
| Textual Inversion | 5-20 images | 1-4 hours | Simple concepts, tokens | 70-85% |
| Full Fine-tune | 10k+ images | Days-weeks | Complete custom model | 100% |
Best for Most Users: LoRA (efficient, versatile, easy to share)
Step 1: Environment Setup
Install dependencies for image model training:
Installation
# Create environment
conda create -n uncensored-sd python=3.10
conda activate uncensored-sd
# Install core dependencies
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
# Install Diffusers and Transformers
pip install diffusers transformers accelerate
# Install performance optimizations
pip install xformers triton
# Install utilities
pip install wandb tensorboard pillow opencv-python
# Clone Kohya's training scripts (most popular SD trainer)
git clone https://github.com/kohya-ss/sd-scripts.git
cd sd-scripts
pip install -r requirements.txt
# Upgrade to latest
pip install --upgrade diffusers transformers accelerate
Verification
import torch
import diffusers
from diffusers import StableDiffusionXLPipeline  # verifies the pipeline class imports cleanly
print(f"PyTorch: {torch.__version__}")
print(f"CUDA Available: {torch.cuda.is_available()}")
print(f"Diffusers: {diffusers.__version__}")
try:
    import xformers
    print(f"xformers: {xformers.__version__}")
except ImportError:
    print("xformers: not installed")
Step 2: Download Base Model
Choose and download your base model:
Option 1: SDXL (Recommended)
from diffusers import StableDiffusionXLPipeline
import torch
# Download SDXL base (uncensored version)
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
pipe = StableDiffusionXLPipeline.from_pretrained(
model_id,
torch_dtype=torch.float16,
variant="fp16",
use_safetensors=True
)
# Save locally for training
pipe.save_pretrained("./sdxl-base")
print("SDXL base model downloaded!")
Option 2: SD 1.5 (Budget-Friendly)
from diffusers import StableDiffusionPipeline
import torch
model_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(
model_id,
torch_dtype=torch.float16,
safety_checker=None # Remove safety checker
)
pipe.save_pretrained("./sd15-base")
print("SD 1.5 base model downloaded!")
Uncensored Base Models
| Model | Version | Censorship | Quality | Download |
|---|---|---|---|---|
| stabilityai/sdxl-base-1.0 | SDXL | Minimal | 9/10 | HuggingFace |
| runwayml/sd-v1-5 | SD 1.5 | Very low | 7.5/10 | HuggingFace |
| stablediffusionapi/deliberate-v2 | SD 1.5 | None | 8.5/10 | CivitAI |
| prompthero/openjourney | SD 1.5 | None | 7/10 | HuggingFace |
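Models hosted on CivitAI are usually distributed as a single .safetensors checkpoint rather than a diffusers folder. A minimal sketch for converting one with diffusers' from_single_file loader; the local file path below is a placeholder for whatever you downloaded:

import torch
from diffusers import StableDiffusionPipeline

# Load a single-file SD 1.5 checkpoint (e.g. downloaded from CivitAI)
pipe = StableDiffusionPipeline.from_single_file(
    "./downloads/deliberate_v2.safetensors",  # placeholder path, not a real download
    torch_dtype=torch.float16,
)
# Save it in diffusers folder format so the training scripts can point at it
pipe.save_pretrained("./deliberate-v2-base")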
Step 3: Prepare Training Dataset
Quality dataset = quality results.
Dataset Structure
Organize images in this format:
training_data/
├── 10_nsfw_photography/
│ ├── image001.jpg
│ ├── image001.txt
│ ├── image002.jpg
│ ├── image002.txt
│ └── ...
├── 15_adult_art/
│ ├── art001.png
│ ├── art001.txt
│ └── ...
└── 20_explicit_content/
├── explicit001.jpg
├── explicit001.txt
└── ...
Folder Naming Convention:
- 10_ = number of repeats (training iterations per image)
- nsfw_photography = category name (for organization)
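The repeat prefix directly determines how many optimization steps each epoch contains. A quick back-of-the-envelope calculation; the image count here is made up for illustration:

# steps per epoch = images x repeats / batch size
num_images = 120   # files in 10_nsfw_photography/ (illustrative)
repeats = 10       # the "10_" folder prefix
epochs = 10
batch_size = 1

steps_per_epoch = num_images * repeats // batch_size
print(f"{steps_per_epoch} steps/epoch, {steps_per_epoch * epochs} steps over {epochs} epochs")
# -> 1200 steps/epoch, 12000 steps over 10 epochs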
Caption Files
Each image needs a .txt file with the same name:
image001.txt:
nude photography, artistic nudity, sensual pose, studio lighting,
professional photography, adult content, beautiful woman, intimate,
explicit, NSFW, high quality, masterpiece
Caption Tips:
- Start with most important keywords
- Include style descriptors (photography, art, realistic, etc.)
- Add quality tags (masterpiece, high quality, detailed)
- Be specific (don't just say "woman", say "beautiful woman")
- Include NSFW/explicit tags if relevant
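Before training, it is worth verifying that every image actually has a caption, since a missing .txt file is a common silent failure. A small helper sketch, assuming the folder layout above; the trigger phrase is just an example taken from the sample caption:

import os

def check_captions(image_dir, trigger_word=None):
    """Report images without captions; optionally prepend a trigger word to each caption."""
    exts = ('.jpg', '.jpeg', '.png', '.webp')
    missing = []
    for filename in os.listdir(image_dir):
        if not filename.lower().endswith(exts):
            continue
        caption_path = os.path.join(image_dir, os.path.splitext(filename)[0] + ".txt")
        if not os.path.exists(caption_path):
            missing.append(filename)
            continue
        if trigger_word:
            with open(caption_path) as f:
                caption = f.read().strip()
            # Make sure every caption starts with the trigger word
            if not caption.lower().startswith(trigger_word.lower()):
                with open(caption_path, 'w') as f:
                    f.write(f"{trigger_word}, {caption}")
    print(f"{len(missing)} images missing captions: {missing}")

check_captions("training_data/10_nsfw_photography", trigger_word="nude photography")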
Legal Data Sources
Professional Content:
Your Own Content:
Important: Ensure you have rights to train on all images!
Image Preprocessing Script
import os
from PIL import Image
def prepare_training_images(input_dir, output_dir, target_size=1024):
    """
    Prepare images for SDXL training
    - Resize to target_size x target_size
    - Convert to RGB
    - Save as high-quality JPG
    """
    os.makedirs(output_dir, exist_ok=True)
    processed = 0
    for filename in os.listdir(input_dir):
        if not filename.lower().endswith(('.png', '.jpg', '.jpeg', '.webp', '.bmp')):
            continue
        input_path = os.path.join(input_dir, filename)
        output_filename = f"{os.path.splitext(filename)[0]}.jpg"
        output_path = os.path.join(output_dir, output_filename)
        try:
            # Open and convert to RGB
            img = Image.open(input_path)
            if img.mode != 'RGB':
                img = img.convert('RGB')
            # Resize maintaining aspect ratio
            img.thumbnail((target_size, target_size), Image.Resampling.LANCZOS)
            # Create square canvas
            new_img = Image.new('RGB', (target_size, target_size), (255, 255, 255))
            # Paste centered
            paste_x = (target_size - img.size[0]) // 2
            paste_y = (target_size - img.size[1]) // 2
            new_img.paste(img, (paste_x, paste_y))
            # Save high quality
            new_img.save(output_path, quality=95, optimize=True)
            processed += 1
            print(f"✓ Processed: {filename}")
        except Exception as e:
            print(f"✗ Error processing {filename}: {e}")
    print(f"\nProcessed {processed} images")
    return processed

# Usage
prepare_training_images(
    input_dir="raw_images",
    output_dir="training_data/10_nsfw_photography",
    target_size=1024  # 1024 for SDXL, 512 for SD 1.5
)
Auto-Captioning (Optional)
Use BLIP or other models to generate captions automatically:
from transformers import BlipProcessor, BlipForConditionalGeneration
from PIL import Image
import os
# Load BLIP model
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large")
def auto_caption_images(image_dir):
    """Generate captions for images using BLIP"""
    for filename in os.listdir(image_dir):
        if not filename.lower().endswith(('.jpg', '.png', '.jpeg')):
            continue
        image_path = os.path.join(image_dir, filename)
        caption_path = os.path.join(image_dir, f"{os.path.splitext(filename)[0]}.txt")
        # Skip if caption already exists
        if os.path.exists(caption_path):
            continue
        # Generate caption (convert to RGB so RGBA/greyscale inputs don't trip up the processor)
        image = Image.open(image_path).convert('RGB')
        inputs = processor(image, return_tensors="pt")
        out = model.generate(**inputs, max_length=50)
        caption = processor.decode(out[0], skip_special_tokens=True)
        # Add your custom tags
        caption += ", high quality, detailed, masterpiece"
        # Save caption
        with open(caption_path, 'w') as f:
            f.write(caption)
        print(f"✓ Captioned: {filename}")

# Run
auto_caption_images("training_data/10_nsfw_photography")
Step 4: LoRA Training Configuration
LoRA is the most popular training method.
Using Kohya's sd-scripts
Create training script:
accelerate launch --num_cpu_threads_per_process 8 \
sdxl_train_network.py \
--pretrained_model_name_or_path="./sdxl-base" \
--train_data_dir="./training_data" \
--caption_extension=".txt" \
--output_dir="./output/uncensored-sdxl-lora" \
--output_name="uncensored-lora" \
--save_model_as=safetensors \
--prior_loss_weight=1.0 \
--max_train_steps=10000 \
--learning_rate=1e-4 \
--optimizer_type="AdamW8bit" \
--xformers \
--mixed_precision="fp16" \
--cache_latents \
--gradient_checkpointing \
--network_module=networks.lora \
--network_dim=128 \
--network_alpha=64 \
--train_batch_size=1 \
--resolution=1024 \
--enable_bucket \
--min_bucket_reso=256 \
--max_bucket_reso=2048 \
--bucket_reso_steps=64 \
--save_every_n_epochs=1 \
--logging_dir="./logs" \
--log_prefix="uncensored-sdxl"
Configuration File (train_config.toml)
[model_arguments]
pretrained_model_name_or_path = "./sdxl-base"
v2 = false
v_pred = false
[dataset_arguments]
resolution = 1024
batch_size = 1
enable_bucket = true
min_bucket_reso = 256
max_bucket_reso = 2048
bucket_reso_steps = 64
[training_arguments]
output_dir = "./output/uncensored-sdxl-lora"
output_name = "uncensored-lora"
save_precision = "fp16"
save_every_n_epochs = 1
max_train_epochs = 10
max_train_steps = 10000
train_batch_size = 1
gradient_accumulation_steps = 4
learning_rate = 1e-4
lr_scheduler = "cosine"
lr_warmup_steps = 100
optimizer_type = "AdamW8bit"
mixed_precision = "fp16"
xformers = true
gradient_checkpointing = true
# LoRA settings
network_module = "networks.lora"
network_dim = 128 # Higher = more capacity (64-256 typical)
network_alpha = 64 # Usually half of network_dim
# Note: kohya's training scripts have no built-in content filter to turn off;
# the safety checker is an inference-time component of the diffusers pipeline,
# not part of training (see Step 7).
# Logging
logging_dir = "./logs"
log_with = "tensorboard"
Understanding LoRA Parameters
- network_dim: LoRA rank (32-256)
  - 32-64: Simple styles/concepts
  - 64-128: Most use cases (recommended)
  - 128-256: Complex subjects/styles
- network_alpha: Scaling factor
  - Usually set to network_dim / 2
  - Higher = stronger effect
- learning_rate: How fast the model learns
  - 1e-4 (0.0001): Most common
  - 5e-5 (0.00005): Safer, slower
  - 1e-3 (0.001): Aggressive, risky
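To make the dim/alpha relationship concrete: inside the network the LoRA update is scaled by network_alpha / network_dim, so the usual alpha = dim / 2 setting halves the raw update strength. A quick check:

# Effective LoRA scaling factor for a few (dim, alpha) pairs
for dim, alpha in [(64, 32), (128, 64), (128, 128)]:
    print(f"network_dim={dim}, network_alpha={alpha} -> scale={alpha / dim}")
# network_dim=128, network_alpha=64 -> scale=0.5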
Step 5: Execute Training
Start the training process:
Launch Training
# Activate environment
conda activate uncensored-sd
# Run training
accelerate launch sdxl_train_network.py \
--config_file train_config.toml
# Or use the command line version from Step 4
Monitor Progress
# In separate terminal
tensorboard --logdir ./logs --port 6006
# Open browser: http://localhost:6006
What to Watch
- Loss: Should decrease to 0.05-0.15
- Sample Images: Generated every N steps
- Learning Rate: Should follow scheduler
- ETA: Estimated time remaining
Training Time Estimates
- SD 1.5 LoRA: 2-6 hours (100-500 images)
- SDXL LoRA: 4-12 hours (100-1000 images)
- DreamBooth: 2x LoRA time
- Full Fine-tune: Days to weeks
Step 6: Alternative - DreamBooth Training
DreamBooth is better for specific subjects (people, characters, objects):
DreamBooth Script
accelerate launch train_dreambooth_lora_sdxl.py \
--pretrained_model_name_or_path="./sdxl-base" \
--instance_data_dir="./training_data/subject" \
--output_dir="./output/dreambooth-lora" \
--instance_prompt="a photo of sks person" \
--resolution=1024 \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--learning_rate=1e-4 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--max_train_steps=2000 \
--rank=128 \
--mixed_precision="fp16" \
--use_8bit_adam
DreamBooth vs LoRA
| Aspect | DreamBooth | LoRA |
|---|---|---|
| Dataset Size | 10-50 images | 50-1000 images |
| Best For | Specific subjects | Styles, concepts |
| Training Time | Longer | Shorter |
| Result Quality | Higher fidelity | More versatile |
| Overfitting Risk | Higher | Lower |
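One common way to manage DreamBooth's higher overfitting risk is prior preservation: alongside your subject, the trainer sees generic "class" images of the broader category. A sketch of pre-generating them with the base model; the count of 200 and the folder name are illustrative, and they would be passed to train_dreambooth_lora_sdxl.py via its prior-preservation options (--with_prior_preservation, --class_data_dir, --class_prompt):

import os
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "./sdxl-base", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
).to("cuda")

# Generate generic "class" images that describe the category, not your specific subject
os.makedirs("./regularization_data/person", exist_ok=True)
for i in range(200):  # 200 is an illustrative count
    image = pipe("a photo of a person", num_inference_steps=25).images[0]
    image.save(f"./regularization_data/person/class_{i:04d}.png")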
Step 7: Testing Your Model
Generate images with your trained model:
Load and Test
from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler
import torch
# Load base model
pipe = StableDiffusionXLPipeline.from_pretrained(
"./sdxl-base",
torch_dtype=torch.float16,
variant="fp16",
use_safetensors=True
)
# Load your LoRA weights
pipe.load_lora_weights("./output/uncensored-sdxl-lora/uncensored-lora.safetensors")
# Use efficient scheduler
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
# Move to GPU
pipe = pipe.to("cuda")
# Note: StableDiffusionXLPipeline ships without a safety checker, so there is
# nothing to disable here; safety_checker=None only applies to SD 1.5-style
# pipelines (see the SD 1.5 download in Step 2).
print("Model loaded successfully!")
Generate Images
def generate_uncensored(
    prompt: str,
    negative_prompt: str = "",
    num_images: int = 1,
    steps: int = 30,
    guidance_scale: float = 7.5,
    width: int = 1024,
    height: int = 1024,
    seed: int = -1
):
    """Generate uncensored images"""
    # Set seed for reproducibility
    generator = None
    if seed != -1:
        generator = torch.Generator("cuda").manual_seed(seed)
    # Generate
    images = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        num_inference_steps=steps,
        guidance_scale=guidance_scale,
        width=width,
        height=height,
        num_images_per_prompt=num_images,
        generator=generator
    ).images
    return images
# Test with adult content
prompt = """
photorealistic, nude photography, artistic nudity, sensual pose,
studio lighting, professional photography, beautiful woman,
intimate, adult content, explicit, NSFW, masterpiece, high quality
"""
negative_prompt = """
cartoon, anime, 3d render, low quality, blurry, deformed,
ugly, bad anatomy, worst quality
"""
images = generate_uncensored(
prompt=prompt,
negative_prompt=negative_prompt,
num_images=4,
steps=30,
guidance_scale=7.5,
seed=42
)
# Save images
for i, image in enumerate(images):
    image.save(f"output_{i}.png")
    print(f"Saved output_{i}.png")
Batch Generation Script
import json
import os

def batch_generate_from_file(prompts_file: str, output_dir: str):
    """Generate images from a JSON file of prompts"""
    with open(prompts_file, 'r') as f:
        prompts = json.load(f)
    os.makedirs(output_dir, exist_ok=True)
    for i, prompt_data in enumerate(prompts):
        print(f"\nGenerating {i+1}/{len(prompts)}...")
        images = generate_uncensored(
            prompt=prompt_data['prompt'],
            negative_prompt=prompt_data.get('negative_prompt', ''),
            num_images=prompt_data.get('num_images', 1),
            seed=prompt_data.get('seed', -1)
        )
        for j, image in enumerate(images):
            filename = f"{i:04d}_{j}.png"
            image.save(os.path.join(output_dir, filename))
        print(f"✓ Saved {len(images)} images")
# Usage
# Create prompts.json with your test prompts
batch_generate_from_file("prompts.json", "batch_output")
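For reference, here is what prompts.json might look like. The keys mirror the prompt_data.get() calls in the batch script above; the prompt text is only an example:

import json

example_prompts = [
    {
        "prompt": "photorealistic portrait, studio lighting, masterpiece, high quality",
        "negative_prompt": "low quality, blurry, deformed",
        "num_images": 2,
        "seed": 42,
    },
    {
        "prompt": "artistic nude photography, soft natural light, high quality",
        "num_images": 1,
    },
]

with open("prompts.json", "w") as f:
    json.dump(example_prompts, f, indent=2)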
Advanced Techniques
Removing Safety Filters from Pre-trained Models
Some models have baked-in safety classifiers:
from diffusers import StableDiffusionPipeline
import torch
def remove_safety_checker(model_path, output_path):
"""Remove safety checker from any SD model"""
pipe = StableDiffusionPipeline.from_pretrained(
model_path,
torch_dtype=torch.float16,
safety_checker=None,
requires_safety_checker=False
)
# Explicitly set to None
pipe.safety_checker = None
pipe.feature_extractor = None
# Save uncensored version
pipe.save_pretrained(output_path)
print(f"✓ Uncensored model saved to {output_path}")
# Remove safety from any model
remove_safety_checker(
"runwayml/stable-diffusion-v1-5",
"./sd15-uncensored"
)
Textual Inversion (Embeddings)
Create custom tokens for concepts:
accelerate launch textual_inversion_sdxl.py \
--pretrained_model_name_or_path="./sdxl-base" \
--train_data_dir="./concept_images" \
--learnable_property="object" \
--placeholder_token="<adult-concept>" \
--initializer_token="photography" \
--resolution=1024 \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--max_train_steps=3000 \
--learning_rate=5.0e-04 \
--scale_lr \
--lr_scheduler="constant" \
--output_dir="./textual_inversion_output"
Multi-Concept Training
Some DreamBooth-style trainers accept a JSON concepts list for training multiple subjects or styles in a single run:
{
"concepts": [
{
"instance_prompt": "a photo of sks1 person",
"instance_data_dir": "./training_data/person1",
"class_prompt": "a photo of a person",
"class_data_dir": "./regularization_data/person"
},
{
"instance_prompt": "sks2 style photography",
"instance_data_dir": "./training_data/style1",
"class_prompt": "photography",
"class_data_dir": "./regularization_data/photography"
}
]
}
Common Issues and Solutions
Problem: Low Quality Output
Solutions:
- Increase training steps (10k → 20k)
- Use higher quality training images
- Increase network_dim (128 → 256)
- Adjust learning rate (try 5e-5)
- Add more diverse training data
Problem: Overfitting
Signs: Outputs look like near-copies of the training images regardless of the prompt
Solutions:
- Reduce training steps/epochs, or save checkpoints frequently and keep an earlier one
- Lower the learning rate (e.g., 1e-4 → 5e-5)
- Add more varied training images
- Lower network_dim
- For DreamBooth, add regularization (class) images with prior preservation
Problem: Model Ignores LoRA
Solutions:
# Load the LoRA normally, then raise its influence at generation time
pipe.load_lora_weights("lora.safetensors")
images = pipe(prompt, cross_attention_kwargs={"scale": 1.2}).images
# Or emphasize the concept with prompt weighting (syntax understood by UIs such as
# AUTOMATIC1111/ComfyUI, not by plain diffusers):
prompt = "(your concept:1.5), other tags"
Problem: Out of Memory
Solutions:
# Enable gradient checkpointing
--gradient_checkpointing
# Use 8-bit optimizer
--optimizer_type="AdamW8bit"
# Reduce batch size
--train_batch_size=1
# Use smaller resolution
--resolution=768
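Those flags address training-time OOM. If generation itself also exhausts VRAM, diffusers has inference-side switches worth trying; a short sketch using standard pipeline methods (use CPU offload instead of moving the whole pipeline to CUDA):

import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "./sdxl-base", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe.enable_model_cpu_offload()   # move submodules to the GPU only while they run (instead of pipe.to("cuda"))
pipe.enable_vae_slicing()         # decode latents in slices to cut the VRAM spike at the end
pipe.enable_xformers_memory_efficient_attention()  # only if xformers is installed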
Next Steps
Congratulations! You can now train uncensored image generation models.
Continue to Part 3
Learn how to deploy and monetize your models:
➡️ Part 3: Deployment, Monetization & Scaling
Or Review Part 1
Train uncensored LLMs for text generation:
⬅️ Part 1: Training Uncensored LLMs
Quick Reference
Training Commands Cheatsheet
# LoRA training (SDXL)
accelerate launch sdxl_train_network.py --config_file config.toml
# DreamBooth training
accelerate launch train_dreambooth_lora_sdxl.py --config_file db_config.toml
# Monitor training
tensorboard --logdir ./logs
# Test model
python generate_test.py --model_path ./output/lora.safetensors
Recommended Starter Setup
- Model: SD 1.5 or SDXL base
- Method: LoRA training
- Dataset: 50-200 images
- GPU: RunPod RTX 3090 ($0.34/hr)
- Training Time: 2-6 hours
- Total Cost: $1-3