Part 3: Building Neural Networks with torch.nn

Part of the PyTorch 101 Series

My First Neural Network

I built my first neural network to classify images of defective vs non-defective products. Started with a tutorial's code - worked perfectly on MNIST digits.

Then I tried it on my actual images: 51% accuracy. Random guessing!

The problem? I blindly copied layers without understanding what they do. Once I learned torch.nn properly, I built a custom architecture: 94% accuracy.

Understanding the building blocks transforms you from copy-paster to architect.

Let me show you how torch.nn works.

The nn.Module Foundation

Every PyTorch model inherits from nn.Module. This base class provides:

  • Parameter management

  • GPU/CPU movement

  • Saving/loading

  • Training/evaluation modes

Basic nn.Module

import torch
import torch.nn as nn

class SimpleModel(nn.Module):
    """Basic neural network structure."""
    
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        
        # Define layers
        self.layer1 = nn.Linear(input_size, hidden_size)
        self.activation = nn.ReLU()
        self.layer2 = nn.Linear(hidden_size, output_size)
    
    def forward(self, x):
        """Forward pass - define computation."""
        x = self.layer1(x)
        x = self.activation(x)
        x = self.layer2(x)
        return x

# Create model
model = SimpleModel(input_size=10, hidden_size=20, output_size=2)

# Use model
input_data = torch.randn(5, 10)  # Batch of 5 samples
output = model(input_data)

print(f"Input shape: {input_data.shape}")
print(f"Output shape: {output.shape}")
print(f"Number of parameters: {sum(p.numel() for p in model.parameters())}")

Output:

Key points:

  • __init__: Define layers

  • forward: Define computation

  • Call model with input: model(x) automatically calls forward(x)

Common Layers

Linear (Fully Connected)

Convolutional Layers

With stride and padding:

Pooling Layers

Activation Functions

Crucial for non-linearity - without them, network is just linear regression.

When to use which:

  • ReLU: Default choice, fast and effective

  • LeakyReLU: When training struggles (dying ReLU problem)

  • GELU: Transformers and modern architectures

  • Tanh/Sigmoid: Output layers (when range matters)

  • Softmax: Multi-class classification output

Dropout and Regularization

Prevent overfitting:

Batch Normalization:

I use both - BatchNorm for stability, Dropout for regularization.

Building Real Architectures

Image Classifier (CNN)

My production image classifier for defect detection:

This architecture gets 94% accuracy on my defect detection task.

Text Classifier (RNN/LSTM)

Sequential vs Custom Forward

Using nn.Sequential

Good for simple linear flows. I use it for building blocks:

Custom Forward (More Flexible)

I use custom forward when I need:

  • Multiple paths (inception-style)

  • Skip connections (ResNet-style)

  • Dynamic behavior

Loss Functions

Classification Losses

Regression Losses

Custom Loss

I use Focal Loss when dealing with severely imbalanced datasets.

Real Production Model

Here's my actual product recommendation model (simplified):

This model achieved 12% improvement over matrix factorization alone.

Model Inspection

Count parameters:

Layer-wise parameters:

Summary (requires torchsummary):

Best Practices

From building dozens of models:

1. Initialize weights properly:

2. Use BatchNorm before activation:

3. Set model to train/eval mode:

4. Move model to device:

5. Use inplace=True for ReLU to save memory:

6. Freeze layers when fine-tuning:

What's Next?

You now know how to build neural networks with torch.nn. In Part 4, we'll learn how to train these models effectively.

Next: Part 4 - Training and Optimization


Previous: Part 2 - Autograd and Automatic Differentiation

This article is part of the PyTorch 101 series. All examples use Python 3 and are based on real projects.

Last updated