Concurrency Patterns

The CSV Import That Brought Down Production

It was a Wednesday morning when our support team reported: "Customer uploads are timing out."

We had a feature that let customers upload CSV files to import product data. A typical file had 1,000 products. Each product needed:

  1. Validation (check required fields, format)

  2. Image download from URL (if provided)

  3. Price calculation (apply tax, discounts)

  4. Database insertion

The naive implementation, in the Python service we had at the time:

def import_products(csv_file):
    products = parse_csv(csv_file)  # 1000 products
    
    for product in products:
        # Sequential processing
        validate_product(product)                # ~10ms
        image = download_image(product)          # ~200ms (network-bound)
        price = calculate_price(product)         # ~5ms
        save_to_database(product, image, price)  # ~15ms
        
    return f"Imported {len(products)} products"

The math: 1,000 products × 230ms = 230 seconds (3 minutes 50 seconds)

One customer uploaded a file with 10,000 products. The request timed out after 30 seconds. They tried again. And again. Each attempt spawned a worker process that kept running. Within an hour, our server ran out of memory.

I spent that afternoon learning Go's concurrency patterns and rewrote the import system in Go.

Result: 10,000 products imported in 12 seconds. No memory issues. No timeouts.

This article covers the concurrency patterns that made that possible.


Worker Pool Pattern

The most common pattern - a fixed number of workers processing jobs from a queue.

Basic Worker Pool
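A minimal sketch of the pattern: a fixed number of workers pull jobs from a shared channel, and doubling a number stands in for the real work (the worker count and job values here are illustrative).

```go
package main

import (
	"fmt"
	"sync"
)

// processAll fans jobs out to a fixed pool of workers and sums
// the results; doubling stands in for real work like resizing
// an image or saving a row.
func processAll(nums []int, workers int) int {
	jobs := make(chan int)
	results := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for n := range jobs { // each worker pulls from the shared queue
				results <- n * 2
			}
		}()
	}

	go func() {
		for _, n := range nums {
			jobs <- n
		}
		close(jobs) // workers exit their range loop when the queue drains
	}()

	go func() {
		wg.Wait()
		close(results) // safe to close once every worker has finished
	}()

	sum := 0
	for r := range results {
		sum += r
	}
	return sum
}

func main() {
	fmt.Println(processAll([]int{1, 2, 3, 4, 5, 6, 7, 8, 9}, 3)) // 90
}
```

The key property: no matter how many jobs arrive, only `workers` goroutines ever run at once, so memory and CPU stay bounded.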

Real Example: Image Processor


Fan-Out/Fan-In Pattern

Fan-out: Distribute work across multiple goroutines.
Fan-in: Combine results from multiple goroutines.
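A sketch of both halves: several `square` goroutines fan out over one input channel, and `merge` fans their output channels back into one (squaring is a placeholder for real work).

```go
package main

import (
	"fmt"
	"sync"
)

// gen converts a slice into a channel of values.
func gen(nums ...int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for _, n := range nums {
			out <- n
		}
	}()
	return out
}

// square is one fan-out stage: several of these can run
// concurrently over the same input channel.
func square(in <-chan int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for n := range in {
			out <- n * n
		}
	}()
	return out
}

// merge fans in: it combines several result channels into one,
// closing the output only after every input is drained.
func merge(chans ...<-chan int) <-chan int {
	out := make(chan int)
	var wg sync.WaitGroup
	for _, c := range chans {
		wg.Add(1)
		go func(c <-chan int) {
			defer wg.Done()
			for n := range c {
				out <- n
			}
		}(c)
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

func main() {
	in := gen(1, 2, 3, 4)
	// Fan-out: two square workers read from the same channel.
	sum := 0
	for n := range merge(square(in), square(in)) {
		sum += n
	}
	fmt.Println(sum) // 1+4+9+16 = 30
}
```

The order of results is nondeterministic, which is why fan-in works best when results are independent or carry their own keys.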

Real Example: Log Aggregator


Pipeline Pattern

Chain multiple processing stages together.
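A small illustrative pipeline: each stage receives from the previous stage's channel and sends on its own, so the data flow reads left to right at the call site (the string transformations are stand-ins for real stages).

```go
package main

import (
	"fmt"
	"strings"
)

// source feeds raw values into the pipeline.
func source(words ...string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for _, w := range words {
			out <- w
		}
	}()
	return out
}

// trim is the first processing stage.
func trim(in <-chan string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for w := range in {
			out <- strings.TrimSpace(w)
		}
	}()
	return out
}

// upper is the second processing stage.
func upper(in <-chan string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for w := range in {
			out <- strings.ToUpper(w)
		}
	}()
	return out
}

func main() {
	// Stages compose: close propagates down the chain, so the
	// range loop ends when the source runs dry.
	for w := range upper(trim(source("  go ", " rocks "))) {
		fmt.Println(w)
	}
}
```

Because every stage is just "channel in, channel out", stages can be reordered, tested in isolation, or given their own worker pools without touching the others.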

Real Example: Data Processing Pipeline


Context Package

The context package provides cancellation, timeouts, and request-scoped values.

Context Cancellation

Context Timeout

Context Values

Real Example: HTTP Request with Context


Sync Package Primitives

WaitGroup

Wait for a collection of goroutines to finish:
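A minimal sketch: `Add` before launching, `Done` inside each goroutine, `Wait` to block until the counter reaches zero (the atomic counter just proves every goroutine ran).

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runAll launches one goroutine per task and blocks until all
// of them have finished.
func runAll(tasks int) int64 {
	var done int64
	var wg sync.WaitGroup
	for i := 0; i < tasks; i++ {
		wg.Add(1) // register before launching, never inside the goroutine
		go func() {
			defer wg.Done() // signal completion even if the body panics
			atomic.AddInt64(&done, 1)
		}()
	}
	wg.Wait() // block until the counter reaches zero
	return done
}

func main() {
	fmt.Println(runAll(5)) // 5
}
```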

Mutex (Mutual Exclusion)

Protect shared data:

RWMutex (Read-Write Mutex)

Allow multiple readers or one writer:
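A sketch of a read-heavy cache: `RLock` lets any number of readers proceed in parallel, while `Lock` gives a writer exclusive access (the cache itself is an illustrative example).

```go
package main

import (
	"fmt"
	"sync"
)

// Cache allows many concurrent readers, but writers get
// exclusive access.
type Cache struct {
	mu   sync.RWMutex
	data map[string]string
}

func NewCache() *Cache { return &Cache{data: make(map[string]string)} }

func (c *Cache) Get(key string) (string, bool) {
	c.mu.RLock() // shared lock: readers don't block each other
	defer c.mu.RUnlock()
	v, ok := c.data[key]
	return v, ok
}

func (c *Cache) Set(key, value string) {
	c.mu.Lock() // exclusive lock: blocks readers and other writers
	defer c.mu.Unlock()
	c.data[key] = value
}

func main() {
	c := NewCache()
	c.Set("region", "eu-west")
	if v, ok := c.Get("region"); ok {
		fmt.Println(v) // eu-west
	}
}
```

RWMutex pays off when reads vastly outnumber writes; under write-heavy load a plain Mutex is often just as fast and simpler.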

sync.Once

Execute code exactly once:


Real Example: Complete Import System

Putting it all together - the CSV import system I built:
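The production code isn't reproduced here, so this is a minimal sketch of the overall shape such a system takes: a bounded worker pool pulls products off a channel, each worker runs the validate → download → price → save steps, and a mutex-guarded counter tracks completions. The `Product` fields and the per-step placeholders are assumptions, not the original code.

```go
package main

import (
	"fmt"
	"sync"
)

// Product is a simplified CSV row; the real fields (image URLs,
// tax rules, etc.) are assumed.
type Product struct {
	Name  string
	Price float64
}

// importProducts processes every product with a fixed pool of
// workers, so a 10,000-row file uses the same memory as a
// 100-row file.
func importProducts(products []Product, workers int) int {
	jobs := make(chan Product)
	var wg sync.WaitGroup
	var mu sync.Mutex
	imported := 0

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range jobs {
				// Placeholders for the real per-product steps:
				// validate(p); downloadImage(p); calculatePrice(p); save(p)
				_ = p
				mu.Lock()
				imported++
				mu.Unlock()
			}
		}()
	}

	for _, p := range products {
		jobs <- p
	}
	close(jobs) // no more rows: workers drain the queue and exit
	wg.Wait()
	return imported
}

func main() {
	products := make([]Product, 1000)
	fmt.Println(importProducts(products, 50)) // 1000
}
```

The worker count is the tuning knob: high enough to overlap the ~200ms image downloads, low enough to keep memory and downstream load predictable.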


Your Challenge

Build a concurrent URL health checker:


Key Takeaways

  1. Worker pools: Fixed number of workers process jobs from queue

  2. Fan-out/fan-in: Distribute work, merge results

  3. Pipelines: Chain processing stages for clean data flow

  4. Context: Handle cancellation, timeouts, request-scoped values

  5. WaitGroup: Wait for multiple goroutines to complete

  6. Mutex: Protect shared data from concurrent access

  7. RWMutex: Allow multiple readers or exclusive writer

  8. sync.Once: Initialize code exactly once


What I Learned

That CSV import rewrite taught me that Go's concurrency patterns aren't just about speed:

  • Pipeline pattern made the code readable - clear data flow

  • Worker pools provided predictable resource usage - no memory explosions

  • Context enabled graceful cancellation - no orphaned goroutines

  • 12-second imports vs. 230-second sequential processing

Coming from Python's multiprocessing complexity, Go's patterns felt elegant. The import system has processed millions of products over 18 months with zero concurrency bugs.

The 20x speedup was impressive. The zero downtime was better.


Next: Package Management

In the next article, we'll explore Go modules and dependency management. You'll learn how I escaped dependency hell and why go.mod changed how I think about package management.
