Part 3: Fine-tuning and Training with Trainer API

Part of the Hugging Face Transformers 101 Series

Why I Had to Fine-tune

Pre-trained models are powerful, but they don't always understand your specific domain.

I worked on a customer support ticket classification system. Using a pre-trained sentiment model gave mediocre results:

  • Generic categories didn't match our ticket types

  • Domain-specific language confused the model

  • Accuracy was only 65% - not production-ready

After fine-tuning on 5,000 labeled tickets, accuracy jumped to 92%.

Fine-tuning is how you make pre-trained models experts in your domain. Let me show you how.

When to Fine-tune

Use pre-trained models when:

  • Task is generic (sentiment, NER, translation)

  • Limited labeled data (< 100 examples)

  • Quick prototyping

Fine-tune when:

  • Domain-specific language (medical, legal, technical)

  • Custom categories/labels

  • You have labeled data (500+ examples ideal, 100+ minimum)

  • Need higher accuracy

I fine-tune whenever:

  • Generic models give < 80% accuracy

  • I have domain-specific data

  • Time permits (training can take hours/days)

Preparing Your Dataset

Dataset Structure
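
For classification, all the Trainer really needs is text paired with integer labels. A minimal sketch - the `text`/`label` column names and the three ticket categories below are illustrative, not requirements:

```python
# A classification dataset is just text paired with integer labels.
# Column names and categories here are examples, not requirements.
raw_data = {
    "text": [
        "My order never arrived",
        "How do I reset my password?",
        "I was charged twice this month",
    ],
    "label": [0, 1, 2],
}

# Keep a mapping from label ids to human-readable names.
label_names = ["shipping", "account", "billing"]
```

The same structure works whether the source is a CSV export, a database dump, or a JSON file.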

Loading Data with Datasets Library

Train/Test Split

Tokenizing the Dataset

Fine-tuning with Trainer API

The Trainer API makes fine-tuning straightforward.

Basic Fine-tuning Example
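
A compact sketch of the loop, wrapped in a function so you can point it at your own splits. The checkpoint and label count are assumptions to adjust; datasets are expected to have `text` and `label` columns:

```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

def fine_tune(train_dataset, eval_dataset,
              checkpoint="distilbert-base-uncased", num_labels=3):
    """Minimal fine-tuning loop over datasets with text/label columns."""
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=num_labels
    )

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True)

    train_dataset = train_dataset.map(tokenize, batched=True)
    eval_dataset = eval_dataset.map(tokenize, batched=True)

    args = TrainingArguments(
        output_dir="./results",
        learning_rate=2e-5,
        per_device_train_batch_size=16,
        num_train_epochs=3,
    )
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        # Pads each batch dynamically to its longest sequence.
        data_collator=DataCollatorWithPadding(tokenizer),
    )
    trainer.train()
    return trainer
```

Call it with the splits from earlier, e.g. `fine_tune(splits["train"], splits["test"])`; training uses a GPU automatically when one is available.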

This is the basic pattern I use for most fine-tuning tasks.

Training Arguments Explained

Key parameters I always tune:

  • learning_rate: Start with 2e-5, adjust based on validation loss

  • num_train_epochs: Usually 3-5 for fine-tuning

  • per_device_train_batch_size: Largest that fits in memory

  • warmup_steps: ~10% of total training steps

Adding Evaluation Metrics
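
The Trainer reports only loss unless you pass a `compute_metrics` function. A minimal accuracy implementation in plain NumPy - the `evaluate` library's `accuracy` metric gives the same result:

```python
import numpy as np

def compute_metrics(eval_pred):
    """Trainer passes (logits, labels); return a dict of named metrics."""
    logits, labels = eval_pred
    # Predicted class is the index of the largest logit per row.
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": float((predictions == labels).mean())}

# Hook it in with:  Trainer(..., compute_metrics=compute_metrics)
```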

Custom Metrics
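
When classes are imbalanced, accuracy alone hides problems, so it's worth tracking precision, recall, and F1 as well. A sketch using scikit-learn (assumed installed):

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

def compute_metrics(eval_pred):
    """Weighted precision/recall/F1 alongside accuracy."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, predictions, average="weighted", zero_division=0
    )
    return {
        "accuracy": float((predictions == labels).mean()),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```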

Complete Fine-tuning Example

Here's a production-ready fine-tuning script I use:

This is my go-to script for classification fine-tuning. Adjust for your use case.

Fine-tuning Other Tasks

Named Entity Recognition (NER)
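
Token classification swaps in `AutoModelForTokenClassification` and `DataCollatorForTokenClassification`; the tricky part is aligning word-level labels with subword tokens. A sketch of that alignment step (the helper name is mine):

```python
def align_labels_with_tokens(word_ids, word_labels):
    """Map word-level NER labels onto subword tokens: special tokens
    get -100 (ignored by the loss), and only the first subword of
    each word keeps its label."""
    aligned, previous = [], None
    for word_id in word_ids:
        if word_id is None:            # [CLS], [SEP], padding
            aligned.append(-100)
        elif word_id != previous:      # first subword of a word
            aligned.append(word_labels[word_id])
        else:                          # continuation subword
            aligned.append(-100)
        previous = word_id
    return aligned

# Used inside dataset.map with a fast tokenizer:
#   enc = tokenizer(batch["tokens"], is_split_into_words=True,
#                   truncation=True)
#   enc["labels"] = [align_labels_with_tokens(enc.word_ids(i), labels)
#                    for i, labels in enumerate(batch["ner_tags"])]
# then train AutoModelForTokenClassification with the usual Trainer setup.
```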

Question Answering
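
Extractive QA uses `AutoModelForQuestionAnswering`, whose head predicts the start and end token positions of the answer inside the context. A sketch of the setup (the checkpoint is a placeholder):

```python
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

def load_qa_model(checkpoint="distilbert-base-uncased"):
    """Extractive QA predicts the answer span inside the context, so
    the model head outputs start/end logits for every token."""
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForQuestionAnswering.from_pretrained(checkpoint)
    return tokenizer, model

# Preprocessing pairs each question with its context and converts the
# character-level answer span to token positions via offset mapping:
#   enc = tokenizer(question, context, truncation="only_second",
#                   return_offsets_mapping=True)
# The Trainer loop itself is the same as for classification.
```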

Text Generation (Causal LM)
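
For causal language modeling there are no separate labels - the data collator shifts the inputs so the model learns next-token prediction. A sketch assuming GPT-2:

```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
)

def causal_lm_setup(checkpoint="gpt2"):
    """Model, tokenizer, and collator for next-token fine-tuning."""
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token
    model = AutoModelForCausalLM.from_pretrained(checkpoint)
    # mlm=False selects causal (next-token) rather than masked LM.
    collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
    return tokenizer, model, collator
```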

Tracking Training Progress

TensorBoard Integration

Weights & Biases Integration

Custom Callbacks
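
For anything the built-in logging doesn't cover, subclass `TrainerCallback`. A sketch that prints a line after each epoch:

```python
from transformers import TrainerCallback

class LogEpochCallback(TrainerCallback):
    """Print a short status line at the end of every epoch."""

    def on_epoch_end(self, args, state, control, **kwargs):
        # state carries the live training counters.
        print(f"epoch {int(state.epoch)} done, "
              f"global step {state.global_step}")

# Register with:  Trainer(..., callbacks=[LogEpochCallback()])
```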

Hyperparameter Tuning
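
`Trainer.hyperparameter_search` automates the search when Optuna (or Ray Tune) is installed; a fresh model per trial comes from `model_init`. A sketch with an Optuna search space - the ranges are illustrative:

```python
from transformers import AutoModelForSequenceClassification, Trainer

def hp_search(args, train_dataset, eval_dataset,
              checkpoint="distilbert-base-uncased", num_labels=3):
    """Search learning rate, epochs, and batch size with Optuna
    (pip install optuna). Datasets must already be tokenized."""
    def model_init():
        # A fresh model for every trial, so runs don't contaminate
        # each other.
        return AutoModelForSequenceClassification.from_pretrained(
            checkpoint, num_labels=num_labels
        )

    def search_space(trial):
        return {
            "learning_rate": trial.suggest_float(
                "learning_rate", 1e-5, 5e-5, log=True
            ),
            "num_train_epochs": trial.suggest_int("num_train_epochs", 2, 5),
            "per_device_train_batch_size": trial.suggest_categorical(
                "per_device_train_batch_size", [8, 16, 32]
            ),
        }

    trainer = Trainer(
        model_init=model_init,
        args=args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
    )
    best = trainer.hyperparameter_search(
        hp_space=search_space,
        backend="optuna",
        n_trials=10,
        direction="minimize",  # default objective is eval loss
    )
    return best
```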

Best Practices

From my fine-tuning experience:

1. Start with a small learning rate (2e-5 to 5e-5) - you're fine-tuning, not training from scratch.

2. Use early stopping - prevent overfitting on small datasets.

3. Monitor validation metrics - train accuracy can be misleading.

4. Use warmup - helps training stability.

5. Save checkpoints - training can fail, saves time.

6. Test on held-out data - separate test set for final evaluation.

7. Use appropriate batch sizes - larger = faster but more memory.

8. Try different base models - DistilBERT (fast), BERT (balanced), RoBERTa (accurate).

9. Data quality matters more than quantity - 1000 clean examples > 10000 noisy ones.

10. Document experiments - track what worked and what didn't.

Common Issues

Issue 1: Overfitting

Symptoms: Training accuracy much higher than validation

Solutions:

  • Enable early stopping or reduce the number of epochs

  • Increase weight_decay (e.g. from 0.01 to 0.1)

  • Add more training data, or augment what you have

  • Switch to a smaller model like DistilBERT

Issue 2: Out of Memory

Symptoms: CUDA "out of memory" errors during training or evaluation

Solutions:

  • Reduce per_device_train_batch_size

  • Use gradient_accumulation_steps to keep the effective batch size

  • Enable mixed precision (fp16=True) on GPU

  • Turn on gradient_checkpointing to trade compute for memory

Issue 3: Poor Performance

Solutions:

  • Check label quality and class balance before touching hyperparameters

  • Lower the learning rate or train for more epochs

  • Try a stronger base model (RoBERTa, DeBERTa)

  • Make sure the tokenizer matches the base checkpoint

What's Next?

You now know how to fine-tune models on custom data. In Part 4, we'll explore advanced features like custom models, quantization, and parameter-efficient fine-tuning (PEFT/LoRA).

Next: Part 4 - Advanced Features and Techniques


Previous: Part 2 - Understanding Models, Tokenizers, and Preprocessing

This article is part of the Hugging Face Transformers 101 series. Check out the series overview for more content.
