Pre-trained models are powerful, but they don't always understand your specific domain.
I worked on a customer support ticket classification system. Using a pre-trained sentiment model gave mediocre results:
Generic categories didn't match our ticket types
Domain-specific language confused the model
Accuracy was only 65% - not production-ready
After fine-tuning on 5,000 labeled tickets, accuracy jumped to 92%.
Fine-tuning is how you make pre-trained models experts in your domain. Let me show you how.
When to Fine-tune
Use pre-trained models when:
Task is generic (sentiment, NER, translation)
Limited labeled data (< 100 examples)
Quick prototyping
Fine-tune when:
Domain-specific language (medical, legal, technical)
Custom categories/labels
You have labeled data (500+ examples ideal, 100+ minimum)
Need higher accuracy
I fine-tune whenever:
Generic models give < 80% accuracy
I have domain-specific data
Time permits (training can take hours/days)
Preparing Your Dataset
Dataset Structure
Loading Data with Datasets Library
Train/Test Split
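A quick sketch of the split I use. The hard-coded tickets below are just placeholders; in practice `dataset` is the Dataset you loaded above.

from datasets import Dataset

# Placeholder tickets; replace with the Dataset built from your own data
dataset = Dataset.from_dict({
    "text": [
        "How do I reset my password?",
        "My order hasn't arrived yet",
        "I can't log into my account",
        "Product is defective",
    ],
    "label": [0, 1, 0, 2],
})

# Hold out 20% of tickets for evaluation; the seed makes the split reproducible
split = dataset.train_test_split(test_size=0.2, seed=42)
train_dataset = split["train"]
test_dataset = split["test"]
print(train_dataset.num_rows, test_dataset.num_rows)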
Tokenizing the Dataset
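Here's the tokenization pattern, sketched against the split above; bert-base-uncased and a max_length of 128 match the rest of this post, but both are choices you can change.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize_function(examples):
    # Pad/truncate every ticket to a fixed 128 tokens so batches have equal length
    return tokenizer(examples["text"], padding="max_length", truncation=True, max_length=128)

# batched=True tokenizes many examples per call, which is much faster
tokenized_train = train_dataset.map(tokenize_function, batched=True)
tokenized_test = test_dataset.map(tokenize_function, batched=True)
print(tokenized_train.column_names)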
Fine-tuning with Trainer API
The Trainer API makes fine-tuning straightforward.
Basic Fine-tuning Example
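A minimal sketch, reusing the tokenized_train and tokenized_test splits from above; bert-base-uncased and the three ticket labels are assumptions carried over from the running example.

from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

# Three ticket categories: Account, Shipping, Product
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    evaluation_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_test,
)

trainer.train()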
This is the basic pattern I use for most fine-tuning tasks.
Training Arguments Explained
Key parameters I always tune:
learning_rate: Start with 2e-5, adjust based on validation loss
num_train_epochs: Usually 3-5 for fine-tuning
per_device_train_batch_size: Largest that fits in memory
warmup_steps: ~10% of total training steps (see the quick estimate below)
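For the warmup rule of thumb, here's a quick back-of-the-envelope estimate; the dataset size, batch size, and epoch count are made-up numbers.

# Hypothetical setup: 4,000 training examples, batch size 16, 3 epochs
num_examples = 4000
batch_size = 16
num_epochs = 3

steps_per_epoch = -(-num_examples // batch_size)  # ceiling division -> 250
total_steps = steps_per_epoch * num_epochs        # 750
warmup_steps = int(0.1 * total_steps)             # ~10% -> 75
print(warmup_steps)
# Alternatively, set warmup_ratio=0.1 in TrainingArguments and skip the math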
Adding Evaluation Metrics
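By default the Trainer only reports loss during evaluation. Passing a compute_metrics function adds accuracy (or anything else) to every evaluation. A sketch using the evaluate library (assumed installed), reusing the model, training arguments, and splits from the basic example:

import numpy as np
import evaluate
from transformers import Trainer

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    # eval_pred is a (logits, labels) pair collected during evaluation
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy.compute(predictions=predictions, references=labels)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_test,
    compute_metrics=compute_metrics,  # reported at every evaluation
)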
Custom Metrics
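If accuracy alone isn't enough (class imbalance is common in support tickets), compute_metrics can return any dict of metrics. A sketch using scikit-learn:

import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    # Weighted averaging accounts for class imbalance across ticket types
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="weighted", zero_division=0
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }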
Complete Fine-tuning Example
Here's a production-ready fine-tuning script I use:
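The sketch below shows what that looks like for the ticket classifier; the tickets.csv path, the three labels, and the evaluate dependency are assumptions to adapt to your own data.

import numpy as np
import evaluate
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    EarlyStoppingCallback,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "bert-base-uncased"
NUM_LABELS = 3  # Account, Shipping, Product

# 1. Load and split the labeled tickets
dataset = load_dataset("csv", data_files="tickets.csv")
dataset = dataset["train"].train_test_split(test_size=0.2, seed=42)

# 2. Tokenize
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True, max_length=128)

tokenized = dataset.map(tokenize_function, batched=True)

# 3. Model
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=NUM_LABELS)

# 4. Metrics
accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return accuracy.compute(predictions=preds, references=labels)

# 5. Training arguments
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    num_train_epochs=5,
    weight_decay=0.01,
    warmup_ratio=0.1,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    save_total_limit=2,
    logging_steps=10,
    seed=42,
)

# 6. Train with early stopping
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
)
trainer.train()

# 7. Evaluate on the held-out split and save
print(trainer.evaluate())
trainer.save_model("./fine-tuned-model")
tokenizer.save_pretrained("./fine-tuned-model")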
This is my go-to script for classification fine-tuning. Adjust for your use case.
Fine-tuning Other Tasks
Named Entity Recognition (NER)
Question Answering
Text Generation (Causal LM)
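Causal language models are fine-tuned to predict the next token, so the data collator builds the labels for you. A sketch assuming GPT-2 as the base model and a tokenized text-only dataset named tokenized_dataset:

from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

# mlm=False means standard next-token (causal) language modeling
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

training_args = TrainingArguments(
    output_dir="./causal-lm",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    learning_rate=5e-5,
)

# Same Trainer workflow; the collator creates labels from input_ids
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    data_collator=data_collator,
)
trainer.train()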
Tracking Training Progress
TensorBoard Integration
Weights & Biases Integration
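Weights & Biases hooks in through report_to: with the wandb package installed and wandb login done, the Trainer logs metrics there automatically. A minimal sketch (the project and run names are made up):

import os
from transformers import TrainingArguments

os.environ["WANDB_PROJECT"] = "ticket-classifier"  # hypothetical project name

training_args = TrainingArguments(
    output_dir="./results",
    report_to="wandb",              # send metrics to Weights & Biases
    run_name="bert-tickets-run-1",  # hypothetical run name
    logging_steps=10,
    evaluation_strategy="epoch",
)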
Custom Callbacks
Hyperparameter Tuning
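The Trainer has a built-in hyperparameter_search that works with Optuna (among other backends). A sketch assuming optuna is installed and reusing the tokenized splits and compute_metrics from earlier; the search ranges are assumptions to tighten as you learn what works.

from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

def model_init():
    # A fresh model for every trial so runs don't contaminate each other
    return AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)

def hp_space(trial):
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 5e-5, log=True),
        "num_train_epochs": trial.suggest_int("num_train_epochs", 2, 5),
        "per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", [8, 16, 32]),
    }

training_args = TrainingArguments(
    output_dir="./hp-search",
    evaluation_strategy="epoch",
    save_strategy="no",  # don't keep checkpoints for every trial
)

trainer = Trainer(
    model_init=model_init,  # note: model_init instead of model
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_test,
    compute_metrics=compute_metrics,
)

best_run = trainer.hyperparameter_search(
    hp_space=hp_space,
    backend="optuna",
    n_trials=10,
    direction="maximize",  # maximize the eval metric (accuracy here)
)
print(best_run.hyperparameters)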
Best Practices
From my fine-tuning experience:
1. Start with a small learning rate (2e-5 to 5e-5) - you're fine-tuning, not training from scratch.
2. Use early stopping - prevent overfitting on small datasets.
3. Monitor validation metrics - train accuracy can be misleading.
4. Use warmup - helps training stability.
5. Save checkpoints - training can fail, and checkpoints let you resume instead of starting over.
6. Test on held-out data - separate test set for final evaluation.
7. Use appropriate batch sizes - larger = faster but more memory.
8. Try different base models - DistilBERT (fast), BERT (balanced), RoBERTa (accurate).
9. Data quality matters more than quantity - 1000 clean examples > 10000 noisy ones.
10. Document experiments - track what worked and what didn't.
Common Issues
Issue 1: Overfitting
Symptoms: Training accuracy much higher than validation accuracy
Solutions:
Issue 2: Out of Memory
Solutions:
Issue 3: Poor Performance
Solutions:
What's Next?
You now know how to fine-tune models on custom data. In Part 4, we'll explore advanced features like custom models, quantization, and parameter-efficient fine-tuning (PEFT/LoRA).
# Simple classification dataset
dataset = [
{"text": "How do I reset my password?", "label": 0}, # Account
{"text": "My order hasn't arrived yet", "label": 1}, # Shipping
{"text": "I can't log into my account", "label": 0}, # Account
{"text": "Product is defective", "label": 2}, # Product
]
# Label mapping
label_names = {0: "Account", 1: "Shipping", 2: "Product"}
from datasets import Dataset, DatasetDict
import pandas as pd
# From pandas DataFrame
df = pd.DataFrame({
'text': [
"How do I reset my password?",
"My order hasn't arrived yet",
"I can't log into my account",
"Product is defective"
],
'label': [0, 1, 0, 2]
})
dataset = Dataset.from_pandas(df)
print(dataset)
from transformers import (
AutoModelForSequenceClassification,
AutoTokenizer,
Trainer,
TrainingArguments
)
from datasets import load_dataset
# 1. Load and prepare data
dataset = load_dataset('csv', data_files='tickets.csv')
train_test = dataset['train'].train_test_split(test_size=0.2)
# 2. Tokenize
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
def tokenize_function(examples):
return tokenizer(examples['text'], padding='max_length', truncation=True, max_length=128)
tokenized_datasets = train_test.map(tokenize_function, batched=True)
# 3. Load model
model = AutoModelForSequenceClassification.from_pretrained(
"bert-base-uncased",
num_labels=3 # Number of classes
)
# 4. Define training arguments
training_args = TrainingArguments(
output_dir='./results',
evaluation_strategy='epoch',
learning_rate=2e-5,
per_device_train_batch_size=16,
per_device_eval_batch_size=16,
num_train_epochs=3,
weight_decay=0.01,
logging_dir='./logs',
logging_steps=10,
save_strategy='epoch',
load_best_model_at_end=True,
)
# 5. Create Trainer
trainer = Trainer(
model=model,
args=training_args,
train_dataset=tokenized_datasets['train'],
eval_dataset=tokenized_datasets['test'],
)
# 6. Train
trainer.train()
# 7. Save model
trainer.save_model('./fine-tuned-model')
tokenizer.save_pretrained('./fine-tuned-model')
from transformers import TrainingArguments
training_args = TrainingArguments(
# Output
output_dir='./results', # Where to save checkpoints
overwrite_output_dir=True, # Overwrite existing output
# Evaluation
evaluation_strategy='epoch', # Evaluate after each epoch
    eval_steps=500,                   # Used if evaluation_strategy='steps'
# Training
num_train_epochs=3, # Number of training epochs
learning_rate=2e-5, # Learning rate
weight_decay=0.01, # L2 regularization
warmup_steps=500, # Linear warmup for learning rate
# Batch sizes
per_device_train_batch_size=16, # Batch size per GPU/CPU for training
per_device_eval_batch_size=32, # Batch size for evaluation (can be larger)
# Saving
save_strategy='epoch', # Save checkpoint each epoch
save_steps=500, # If save_strategy='steps'
save_total_limit=2, # Only keep last 2 checkpoints
load_best_model_at_end=True, # Load best model when finished
# Logging
logging_dir='./logs', # TensorBoard logs
logging_steps=10, # Log every 10 steps
# Other
seed=42, # Random seed for reproducibility
fp16=True, # Use half precision (faster on GPU)
)
from transformers import AutoModelForTokenClassification
# Load model for token classification
model = AutoModelForTokenClassification.from_pretrained(
"bert-base-uncased",
    num_labels=9  # O, B-PER, I-PER, B-ORG, I-ORG, B-LOC, I-LOC, B-MISC, I-MISC
)
# Dataset format
ner_dataset = [
{
"tokens": ["John", "lives", "in", "New", "York"],
"ner_tags": [1, 0, 0, 3, 4] # B-PER, O, O, B-LOC, I-LOC
}
]
# Same training process with Trainer
from transformers import AutoModelForQuestionAnswering
model = AutoModelForQuestionAnswering.from_pretrained("bert-base-uncased")
# Dataset format
qa_dataset = [
{
"question": "What is AI?",
"context": "AI stands for Artificial Intelligence.",
"answers": {
"text": ["Artificial Intelligence"],
"answer_start": [16]
}
}
]
# Training arguments with logging
training_args = TrainingArguments(
output_dir='./results',
logging_dir='./logs',
logging_steps=10,
evaluation_strategy='epoch',
)
# Run training
trainer.train()
# View in TensorBoard
# tensorboard --logdir ./logs
from transformers import TrainerCallback
class CustomCallback(TrainerCallback):
"""Custom callback for training monitoring."""
def on_epoch_end(self, args, state, control, **kwargs):
"""Called at the end of each epoch."""
print(f"\nEpoch {state.epoch} complete!")
print(f"Training loss: {state.log_history[-1].get('loss', 'N/A')}")
def on_train_end(self, args, state, control, **kwargs):
"""Called at the end of training."""
print("\nTraining complete!")
# Add to Trainer
trainer = Trainer(
model=model,
args=training_args,
train_dataset=train_dataset,
callbacks=[CustomCallback()]
)
# 1. Increase dropout (set at load time so the dropout layers actually pick it up)
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=3,
    hidden_dropout_prob=0.2,
    attention_probs_dropout_prob=0.2,
)
# 2. Use weight decay
training_args = TrainingArguments(weight_decay=0.01)
# 3. Early stopping (requires load_best_model_at_end=True and an eval strategy)
from transformers import EarlyStoppingCallback
trainer = Trainer(callbacks=[EarlyStoppingCallback(early_stopping_patience=2)])
# 4. More data or data augmentation
# 1. Reduce batch size
training_args = TrainingArguments(per_device_train_batch_size=8)
# 2. Gradient accumulation (effective batch = per-device batch size x accumulation steps)
training_args = TrainingArguments(gradient_accumulation_steps=2)
# 3. Use smaller model
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=3)
# 4. Mixed precision
training_args = TrainingArguments(fp16=True)
# 1. More epochs
training_args = TrainingArguments(num_train_epochs=5)
# 2. Adjust learning rate
training_args = TrainingArguments(learning_rate=3e-5)
# 3. Try different model
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=3)
# 4. Better data preprocessing
# Clean and balance your dataset