Part 4: Resource and Performance Errors

Introduction

I once deployed a background job that processed user uploads. Within hours, the server ran out of memory and crashed. The culprit? I was loading entire files into memory without closing them, creating thousands of open file handles. This incident taught me that resource management isn't optional—it's critical for building reliable systems.

Resource and performance errors often don't appear during development with small datasets. They emerge in production under real load, making them particularly insidious.

Resource Errors

What Are Resource Errors?

Resource errors occur when a program exhausts or mismanages system resources like memory, file handles, network connections, or CPU time. These errors can cause crashes, slowdowns, or system instability.

Real-World Examples from My Projects

Memory Leaks from Unclosed Files

From a log processing service:

```python
import os

# Incorrect - files are opened but never explicitly closed
def process_all_logs(log_directory):
    """Process all log files in directory"""
    log_files = []

    for filename in os.listdir(log_directory):
        if filename.endswith('.log'):
            # Bug: every file is opened up front and stays open
            f = open(os.path.join(log_directory, filename), 'r')
            log_files.append(f)

    # Process files (process_log_content is the service's own handler)
    for log_file in log_files:
        process_log_content(log_file.read())  # read() loads the whole file

    # Nothing calls close(): all handles stay open at the same time

# With thousands of log files, this exhausts the per-process file handle limit
process_all_logs('/var/logs/app/')  # Eventually: OSError: [Errno 24] Too many open files
```

Correct version:
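A minimal sketch of the fix, assuming `process_log_content` takes a file's text as in the example above (stubbed here as a hypothetical line counter so the block runs on its own). Each file is opened inside a `with` block, so at most one handle is open at a time, and it is closed even if processing raises.

```python
import os

def process_log_content(content):
    """Hypothetical stand-in for the real log handler: counts lines."""
    return content.count("\n")

def process_all_logs(log_directory):
    """Process all .log files in a directory, one open handle at a time."""
    total = 0
    for filename in sorted(os.listdir(log_directory)):
        if not filename.endswith(".log"):
            continue
        path = os.path.join(log_directory, filename)
        # The with-block closes the file when this iteration ends,
        # even if process_log_content raises an exception.
        with open(path, "r") as f:
            total += process_log_content(f.read())
    return total
```

If individual files can be very large, iterating over `f` line by line instead of calling `f.read()` keeps memory bounded as well.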

Loading Large Datasets into Memory

From a data analytics script:

Correct version:
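A sketch of chunked processing using only the standard library, assuming the data is a CSV file with a numeric `amount` column (a hypothetical schema; `iter_batches` and `total_amount` are illustrative names). Rows are streamed from disk in fixed-size batches, so peak memory depends on the batch size rather than the file size. pandas offers the same idea through `read_csv(..., chunksize=...)`.

```python
import csv
from itertools import islice

def iter_batches(rows, batch_size):
    """Yield lists of up to batch_size items from any iterable."""
    rows = iter(rows)
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            return
        yield batch

def total_amount(csv_path, batch_size=10_000):
    """Sum the hypothetical 'amount' column without loading the whole file."""
    total = 0.0
    # newline="" is what the csv module expects for files it reads.
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)  # lazily yields one row at a time
        for batch in iter_batches(reader, batch_size):
            # Only batch_size rows are held in memory at this point.
            total += sum(float(row["amount"]) for row in batch)
    return total
```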

Connection Pool Exhaustion

From a microservice I built:

Correct version:
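A generic sketch of the two properties that prevent pool exhaustion: a fixed upper bound on connections, and a context manager that always returns a connection to the pool, even when the request handler raises. Acquiring with a timeout turns exhaustion into a fast, visible error instead of a hang. `ConnectionPool`, `PoolExhausted`, and the `factory` callable are hypothetical; real database drivers and HTTP clients usually ship their own pools.

```python
import queue
from contextlib import contextmanager

class PoolExhausted(Exception):
    """Raised when no connection frees up within the timeout."""

class ConnectionPool:
    """Hypothetical fixed-size pool; factory() creates one connection."""

    def __init__(self, factory, size=5):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())  # create all connections up front

    @contextmanager
    def connection(self, timeout=2.0):
        try:
            conn = self._idle.get(timeout=timeout)
        except queue.Empty:
            raise PoolExhausted(f"no connection free after {timeout}s") from None
        try:
            yield conn
        finally:
            # Always hand the connection back, even if the caller raised.
            self._idle.put(conn)

    def available(self):
        return self._idle.qsize()
```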

Database Connection Leaks

From a web application:

Correct version:
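A sketch with the standard-library `sqlite3` module standing in for the application's database, since the original driver isn't named. One detail worth knowing: using a `sqlite3` connection as a context manager only commits or rolls back the transaction; it does not close the connection. Wrapping it in `contextlib.closing` guarantees the close. The `users` table and `get_user_email` helper are hypothetical.

```python
import sqlite3
from contextlib import closing

def get_user_email(db_path, user_id):
    """Hypothetical lookup that always releases its connection."""
    # closing() calls conn.close() on exit; "with conn:" alone would
    # only commit or roll back and leave the connection open.
    with closing(sqlite3.connect(db_path)) as conn:
        with conn:  # transaction: commit on success, rollback on error
            row = conn.execute(
                "SELECT email FROM users WHERE id = ?", (user_id,)
            ).fetchone()
    return row[0] if row else None
```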

How I Prevent Resource Errors

1. Always Use Context Managers
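Any acquire/release pair can be given a `with` block, not just built-in objects like files. This sketch shows two standard-library helpers: `contextlib.contextmanager` wrapping a hypothetical resource (`acquired`, which just records events), and `contextlib.ExitStack` for a number of resources known only at runtime, here a hypothetical `merge_sorted_files` that genuinely needs several files open at once.

```python
import heapq
from contextlib import ExitStack, contextmanager

@contextmanager
def acquired(event_log, name):
    """Hypothetical resource: records acquire/release events."""
    event_log.append(f"acquire {name}")
    try:
        yield name
    finally:
        # Runs on normal exit and when the with-body raises.
        event_log.append(f"release {name}")

def merge_sorted_files(paths, out_path):
    """Hypothetical task: merge already-sorted text files line by line."""
    with ExitStack() as stack:
        # Every file entered here is closed when the block exits,
        # even if opening a later file fails partway through.
        inputs = [stack.enter_context(open(p)) for p in paths]
        out = stack.enter_context(open(out_path, "w"))
        # heapq.merge consumes the inputs lazily; no file is read whole.
        out.writelines(heapq.merge(*inputs))
```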

2. Implement Resource Limits
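One way to apply limits in application code, sketched with the standard library: reject inputs above a size cap before reading them, and cap how many jobs hold resources at once with a semaphore. `MAX_UPLOAD_BYTES`, `MAX_CONCURRENT_JOBS`, `LimitExceeded`, and `handle_upload` are hypothetical names and values; OS-level limits (such as `ulimit` or container memory limits) complement these checks rather than replace them.

```python
import os
import threading

MAX_UPLOAD_BYTES = 50 * 1024 * 1024   # hypothetical cap: 50 MiB
MAX_CONCURRENT_JOBS = 4                # hypothetical cap on parallel work

_job_slots = threading.BoundedSemaphore(MAX_CONCURRENT_JOBS)

class LimitExceeded(Exception):
    """Hypothetical error for inputs over the configured limit."""

def handle_upload(path, process, max_bytes=MAX_UPLOAD_BYTES):
    """Run process(path) only if the file is small enough and a slot is free."""
    size = os.path.getsize(path)
    if size > max_bytes:
        # Fail fast instead of letting one huge file exhaust memory.
        raise LimitExceeded(f"{path} is {size} bytes; limit is {max_bytes}")
    # At most MAX_CONCURRENT_JOBS threads run process() at the same time.
    with _job_slots:
        return process(path)
```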

3. Monitor Resource Usage
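For Python-level memory, the standard-library `tracemalloc` module reports current and peak allocations around a suspect code path. This is a minimal sketch: `measure_peak_memory`, `build_list`, and `sum_stream` are hypothetical helpers, and `tracemalloc` only sees memory allocated through Python's allocators, not memory that native extensions obtain on their own.

```python
import tracemalloc

def measure_peak_memory(func, *args, **kwargs):
    """Run func and return (result, peak bytes traced during the call)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak

def build_list(n):
    """Materializes every value at once."""
    return [i for i in range(n)]

def sum_stream(n):
    """Consumes values one at a time from a generator."""
    return sum(i for i in range(n))
```

Comparing the two peaks for the same `n` makes the cost of materializing data visible before it becomes a production incident.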

Time Limit Exceeded Errors

What Are Time Limit Errors?

Time limit errors occur when operations take longer than expected or allowed, often due to inefficient algorithms, blocking operations, or external service delays.

Real-World Examples

Inefficient Algorithm

From a data deduplication script:

Correct version:
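A common shape for this bug is checking membership in a list inside a loop: each check is O(n), so the whole pass is O(n²). The sketch contrasts that with a set-based version that keeps the first occurrence of each item in its original order. It assumes items (or the optional `key` values) are hashable; `dedupe_slow` and `dedupe` are illustrative names.

```python
def dedupe_slow(items):
    """O(n^2): 'in' on a list scans every element seen so far."""
    seen = []
    for item in items:
        if item not in seen:
            seen.append(item)
    return seen

def dedupe(items, key=None):
    """O(n) on average: set lookups are constant time; order is kept."""
    seen = set()
    result = []
    for item in items:
        k = item if key is None else key(item)
        if k not in seen:
            seen.add(k)
            result.append(item)
    return result
```

For hashable items without a key, `list(dict.fromkeys(items))` is an equivalent one-liner, since dicts preserve insertion order.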

Blocking I/O Operations

From an API service:

Correct version:
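A typical fix for sequential blocking calls is to issue independent I/O concurrently and bound each call with a timeout. This sketch uses `asyncio`, with `asyncio.sleep` standing in for a real network request; `fetch`, `fetch_all`, and the delays are hypothetical. With an async HTTP client the structure would be the same.

```python
import asyncio

async def fetch(name, delay):
    """Hypothetical I/O call; asyncio.sleep stands in for a network wait."""
    await asyncio.sleep(delay)
    return f"{name}: ok"

async def fetch_all(requests, timeout=1.0):
    """Run all requests concurrently; each is cancelled after `timeout` s."""
    async def guarded(name, delay):
        try:
            return await asyncio.wait_for(fetch(name, delay), timeout)
        except asyncio.TimeoutError:
            return f"{name}: timed out"
    # gather() runs the coroutines concurrently, so total time is roughly
    # the slowest call (capped by timeout), not the sum of all calls.
    return await asyncio.gather(*(guarded(n, d) for n, d in requests))
```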

Unbounded Recursive Calls

From a file system traversal tool:

Correct version:
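A sketch of an iterative traversal, assuming the original tool recursed into every subdirectory. An explicit stack replaces the call stack, so deep trees cannot hit Python's recursion limit (1000 frames by default), and not following symbolic links avoids endless loops when a link points back to an ancestor. `iter_files` and `count_files` are hypothetical names.

```python
import os

def iter_files(root):
    """Yield file paths under root using an explicit stack, not recursion."""
    stack = [root]
    while stack:
        directory = stack.pop()
        try:
            with os.scandir(directory) as entries:
                for entry in entries:
                    # follow_symlinks=False: a link to a parent directory
                    # is never descended into, so cycles cannot loop forever.
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    elif entry.is_file(follow_symlinks=False):
                        yield entry.path
        except PermissionError:
            continue  # skip unreadable directories instead of crashing

def count_files(root):
    return sum(1 for _ in iter_files(root))
```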

Database Query Performance

From a reporting system:

Correct version:
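Reporting slowdowns often come from the N+1 pattern (one extra query per row of an outer result) or from filtering on unindexed columns. This sketch uses an in-memory `sqlite3` database with hypothetical `customers` and `orders` tables to contrast a per-customer loop with a single aggregate query, and indexes the column used for the join.

```python
import sqlite3

def create_schema(conn):
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY,
                             customer_id INTEGER, total REAL);
        -- Index the column used to join and filter orders.
        CREATE INDEX idx_orders_customer ON orders(customer_id);
    """)

def revenue_n_plus_one(conn):
    """Slow pattern: 1 query for customers + 1 query per customer."""
    report = {}
    for cid, name in conn.execute("SELECT id, name FROM customers"):
        (total,) = conn.execute(
            "SELECT COALESCE(SUM(total), 0) FROM orders WHERE customer_id = ?",
            (cid,),
        ).fetchone()
        report[name] = total
    return report

def revenue_single_query(conn):
    """One round trip: the database joins and aggregates."""
    rows = conn.execute("""
        SELECT c.name, COALESCE(SUM(o.total), 0)
        FROM customers c
        LEFT JOIN orders o ON o.customer_id = c.id
        GROUP BY c.id, c.name
    """)
    return dict(rows)
```

Both functions return the same report; the difference is that the first issues one query per customer, which grows linearly with the customer count.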

How I Prevent Performance Errors

1. Profile Before Optimizing
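A minimal profiling sketch with the standard-library `cProfile` and `pstats` modules. `slow_part`, `fast_part`, and `handle_request` are hypothetical functions that give the profiler something to find, and `profile_call` is an illustrative helper. Sorting by cumulative time points at the call paths where the time actually goes.

```python
import cProfile
import io
import pstats

def slow_part():
    return sum(i * i for i in range(200_000))

def fast_part():
    return sum(range(1_000))

def handle_request():
    return slow_part() + fast_part()

def profile_call(func, limit=5):
    """Run func under cProfile and return the top entries as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(limit)
    return out.getvalue()
```

Printing `profile_call(handle_request)` shows `slow_part` near the top, which is where optimization effort would pay off.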

2. Use Appropriate Data Structures
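Three standard-library examples of matching the structure to the access pattern (all function names are hypothetical): a dict index turns repeated lookups by ID from a list scan into an average O(1) lookup; `collections.deque` gives O(1) removal from the front of a queue, where `list.pop(0)` shifts every remaining element; and a `deque` with `maxlen` keeps a bounded buffer of recent items.

```python
from collections import deque

def build_index(users):
    """Map id -> user once, instead of scanning the list per lookup."""
    return {user["id"]: user for user in users}

def process_queue(tasks):
    """Drain tasks in FIFO order; popleft() is O(1) on a deque."""
    pending = deque(tasks)
    done = []
    while pending:
        task = pending.popleft()  # list.pop(0) would be O(n) here
        done.append(task.upper())
    return done

def last_n(events, n):
    """Keep only the n most recent events; older ones are discarded."""
    return list(deque(events, maxlen=n))
```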

3. Implement Caching
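For functions whose results depend only on their arguments, `functools.lru_cache` is the standard-library option. `expensive_lookup` below is a hypothetical slow call, and `CALLS` exists only to count real lookups. `maxsize` bounds the cache so it cannot itself grow without limit; entries never expire on their own, so data that changes over time needs a TTL cache or an explicit `cache_clear()`.

```python
from functools import lru_cache

CALLS = {"count": 0}  # counts real (uncached) lookups for demonstration

@lru_cache(maxsize=1024)  # bounded: least recently used entries are evicted
def expensive_lookup(key):
    """Hypothetical slow call (e.g. a remote API); depends only on key."""
    CALLS["count"] += 1
    return key.upper()
```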

4. Add Timeouts to All External Calls
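A sketch of the principle with the standard library: `urllib.request.urlopen` and `socket.create_connection` both accept a `timeout` in seconds, and without one (and with no global default set) a stalled peer can block the caller indefinitely. The check below uses a local socket that accepts connections but never replies, standing in for an unresponsive service. `fetch_url`, `read_first_byte`, and the 5-second default are hypothetical choices.

```python
import socket
import urllib.request

DEFAULT_TIMEOUT = 5.0  # hypothetical budget in seconds

def fetch_url(url, timeout=DEFAULT_TIMEOUT):
    """HTTP GET that gives up if connecting or any single read stalls."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()

def read_first_byte(host, port, timeout=DEFAULT_TIMEOUT):
    """Raw TCP read; raises socket.timeout instead of waiting forever."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        return sock.recv(1)
```

Note that these timeouts apply per blocking operation, not to the request as a whole; an overall deadline needs an outer mechanism such as `asyncio.wait_for`.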

Tools I Use

Resource Monitoring
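The specific tools aren't listed here, so as one standard-library option on Linux: `resource.getrusage` reports the process's peak resident memory, and `/proc/self/fd` lists its open file descriptors, which makes a leak like the log-processing bug above visible as a steadily rising count. `resource_snapshot` and `open_fd_count` are hypothetical helpers; the `resource` module is Unix-only, and cross-platform libraries such as psutil expose similar numbers.

```python
import os
import resource

def open_fd_count():
    """Number of open file descriptors (Linux-specific: reads /proc)."""
    return len(os.listdir("/proc/self/fd"))

def resource_snapshot():
    """Hypothetical helper: peak memory and open descriptors right now."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return {
        # ru_maxrss is in kilobytes on Linux (bytes on macOS).
        "peak_rss_kb": usage.ru_maxrss,
        "open_fds": open_fd_count(),
    }
```

Logging a snapshot periodically, or before and after each batch of work, shows whether descriptor counts return to their baseline.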

Performance Testing
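As a standard-library starting point, `timeit` can time a function at growing input sizes: if quadrupling n multiplies the time by roughly 16, the code is likely quadratic. `time_at_sizes`, `quadratic`, and `linear` are hypothetical names. Absolute timings vary by machine, so the ratio between sizes carries the signal.

```python
import timeit

def time_at_sizes(func, sizes, repeat=3):
    """Best-of-`repeat` seconds for func(list(range(n))) at each n."""
    results = {}
    for n in sizes:
        data = list(range(n))
        # min() of several runs filters out noise from other processes.
        results[n] = min(timeit.repeat(lambda: func(data), number=1, repeat=repeat))
    return results

def quadratic(data):
    """O(n^2): compares every pair of elements."""
    return sum(1 for x in data for y in data if x == y)

def linear(data):
    """O(n): one pass to build a set."""
    return len(set(data))
```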

Key Takeaways

  1. Use context managers: Always close resources properly

  2. Implement connection pooling: Reuse expensive connections

  3. Process data in chunks: Don't load everything into memory

  4. Add timeouts everywhere: External calls should never block indefinitely

  5. Choose the right algorithms: O(n) vs. O(n²) matters at scale

  6. Profile before optimizing: Measure to find real bottlenecks

  7. Monitor resource usage: Track memory, CPU, and connections

  8. Set resource limits: Prevent runaway processes

Next in Series

In Part 5: Interface and Integration Errors, we'll explore errors that occur when different parts of a system interact—API mismatches, version conflicts, and the challenges of integrating with external services.


Lessons from scaling systems from prototype to production, handling millions of requests, and debugging memory leaks at 3 AM.
