# Understanding Memory Leaks and How to Avoid Them in Backend Development

> *Personal Knowledge Sharing: What Go's garbage collector does not protect you from — and how I learned to find and fix memory leaks in production Go services*

***

## Introduction

Go has a garbage collector. That fact made me complacent early on. I assumed that if nothing was obviously wrong, memory was being managed for me. Then I started noticing a pattern in one of my Go microservices: the container would start at around 60MB, and after a few days of traffic it would be sitting at 400MB with no sign of ever coming back down. Restarting fixed it temporarily. The number climbed again.

The Go GC is excellent at cleaning up objects that are no longer referenced. What it cannot do is clean up memory that is still referenced — even if that reference is held by a goroutine nobody is waiting on, a cache nobody evicts, or a channel that nobody will ever read from again. From the GC's perspective, that memory is still in use.

This article covers the memory leak patterns I have actually hit in Go backend services, how to detect them with `pprof` and `runtime`, and the practices I put in place to avoid them going forward.

***

## Table of Contents

* [Why Go Still Gets Memory Leaks](#why-go-still-gets-memory-leaks)
* [Pattern 1: Goroutine Leaks](#pattern-1-goroutine-leaks)
  * [Blocked channel receiver](#blocked-channel-receiver)
  * [Context not propagated](#context-not-propagated)
  * [Worker with no shutdown signal](#worker-with-no-shutdown-signal)
* [Pattern 2: Unclosed HTTP Response Bodies](#pattern-2-unclosed-http-response-bodies)
* [Pattern 3: Slice Backing Array Retention](#pattern-3-slice-backing-array-retention)
* [Pattern 4: Unbounded Map Growth](#pattern-4-unbounded-map-growth)
* [Pattern 5: Timers and Tickers Not Stopped](#pattern-5-timers-and-tickers-not-stopped)
* [Pattern 6: defer Inside a Loop](#pattern-6-defer-inside-a-loop)
* [Pattern 7: String and \[\]byte Conversion Under Load](#pattern-7-string-and-byte-conversion-under-load)
* [Detecting Memory Leaks with pprof](#detecting-memory-leaks-with-pprof)
  * [Enabling the pprof endpoint](#enabling-the-pprof-endpoint)
  * [Heap profiling](#heap-profiling)
  * [Goroutine profiling](#goroutine-profiling)
  * [runtime.ReadMemStats for in-process monitoring](#runtimereadmemstats-for-in-process-monitoring)
* [Building a Leak-Aware Service](#building-a-leak-aware-service)
* [What I Learned](#what-i-learned)

***

## Why Go Still Gets Memory Leaks

The Go garbage collector uses a **tricolor mark-and-sweep algorithm** that runs concurrently with your application. It frees any memory that is unreachable from the root set (global variables, goroutine stacks, registers).

The key word is *unreachable*. If a reference to an object exists anywhere — in a slice, a map value, a channel buffer, a goroutine's stack frame — the GC will not touch it. Memory leaks in Go are almost always one of two things:

1. **References that are held longer than intended** — a cache that grows without eviction, a slice that keeps a reference to a large backing array, a map that accumulates entries and never shrinks.
2. **Goroutines that are never terminated** — a goroutine blocked on a receive from a channel that will never send is alive from the GC's perspective; everything on its stack and heap that it references is reachable.

{% @mermaid/diagram content="flowchart TD
A\[Go GC runs] --> B{Is the object reachable?}
B -->|No| C\[Object freed]
B -->|Yes — goroutine still holds reference| D\[Object stays in heap]
B -->|Yes — map/slice still holds reference| D
B -->|Yes — global cache holds reference| D
style C fill:#2E7D32,color:#fff
style D fill:#B71C1C,color:#fff" %}

Understanding this distinction shifts how you think about memory management in Go. The GC is not the primary safeguard — your structural choices are.
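To make the distinction concrete, here is a minimal, self-contained sketch (the function names are mine) showing that an explicit GC cycle reclaims nothing from goroutines blocked on a channel nobody will ever send on:

```go
package main

import (
	"fmt"
	"runtime"
)

// leak starts a goroutine blocked forever on a receive from a channel that
// has no sender. The goroutine, and the 1MB buffer it captures, stays
// reachable from the GC's point of view.
func leak() {
	buf := make([]byte, 1<<20)
	ch := make(chan struct{})
	go func() {
		<-ch    // blocks forever: no sender exists
		_ = buf // keeps buf referenced from this goroutine's stack
	}()
}

// leakedGoroutines leaks n goroutines and reports how much the goroutine
// count grew, even after forcing a GC cycle.
func leakedGoroutines(n int) int {
	before := runtime.NumGoroutine()
	for i := 0; i < n; i++ {
		leak()
	}
	runtime.GC() // reclaims nothing: every leaked goroutine is still "live"
	return runtime.NumGoroutine() - before
}

func main() {
	fmt.Println("leaked goroutines:", leakedGoroutines(100)) // leaked goroutines: 100
}
```

Each leaked goroutine pins its megabyte of heap indefinitely, which is exactly the slow climb described in the introduction.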

***

## Pattern 1: Goroutine Leaks

This was the root cause of the gradual memory growth I described in the introduction. Goroutines are cheap to start, which makes it easy to fire them off without thinking carefully about how and when they stop.

### Blocked channel receiver

```go
// internal/worker/processor.go

// This function starts a goroutine that processes jobs.
// The problem: if the caller stops sending on jobsCh but never closes it,
// the goroutine blocks on <-jobsCh forever.
func startProcessor(jobsCh <-chan Job) {
    go func() {
        for job := range jobsCh {
            process(job)
        }
    }()
}

// In the HTTP handler:
func (h *Handler) SubmitBatch(w http.ResponseWriter, r *http.Request) {
    jobs := make(chan Job, 10)

    // Goroutine started, but jobs channel goes out of scope when
    // this handler returns without closing it.
    startProcessor(jobs)

    for _, j := range parseBatch(r.Body) {
        jobs <- j
    }
    // BUG: jobs is never closed — goroutine leaks here
    w.WriteHeader(http.StatusAccepted)
}
```

Every HTTP request to this endpoint leaks one goroutine. Under modest traffic, a few thousand goroutines accumulate in memory. The fix is to close the channel once every job has been sent, so the processor's `range` loop terminates:

```go
func (h *Handler) SubmitBatch(w http.ResponseWriter, r *http.Request) {
    jobs := make(chan Job, 10)
    startProcessor(jobs)    // its range loop exits once jobs is closed

    for _, j := range parseBatch(r.Body) {
        jobs <- j
    }
    close(jobs)             // lets the processor goroutine finish and exit
    w.WriteHeader(http.StatusAccepted)
}
```

### Context not propagated

A goroutine that does network I/O without respecting context cancellation will block until the remote call either completes or times out at the transport layer — which may be much later (or never) than the point at which the caller gave up:

```go
// Leak: no context passed — if the caller cancels, this goroutine keeps running
func fetchInventory(productID string) (Inventory, error) {
    resp, err := http.Get("https://inventory.internal/products/" + productID)
    if err != nil {
        return Inventory{}, err
    }
    defer resp.Body.Close()
    return decodeInventory(resp.Body)
}

// Correct: pass context so the HTTP call is cancelled when the caller is done
func fetchInventory(ctx context.Context, productID string) (Inventory, error) {
    req, err := http.NewRequestWithContext(ctx, http.MethodGet,
        "https://inventory.internal/products/"+productID, nil)
    if err != nil {
        return Inventory{}, err
    }

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return Inventory{}, err
    }
    defer resp.Body.Close()
    return decodeInventory(resp.Body)
}
```

Every function that launches a goroutine or makes a blocking call should accept and respect `context.Context`. This is not just good practice — it is the primary mechanism for making goroutines terminable.

### Worker with no shutdown signal

Background workers started in `main.go` or during server initialization need an explicit shutdown path. A common pattern is passing a `ctx` derived from a cancellable context and listening on the `Done` channel:

```go
// internal/worker/sync_worker.go

type SyncWorker struct {
    db     *sql.DB
    ticker *time.Ticker
}

func NewSyncWorker(db *sql.DB) *SyncWorker {
    return &SyncWorker{
        db:     db,
        ticker: time.NewTicker(30 * time.Second),
    }
}

// Run blocks until ctx is cancelled; caller is responsible for cancellation
func (w *SyncWorker) Run(ctx context.Context) {
    defer w.ticker.Stop()  // always stop the ticker (covered in Pattern 5)

    for {
        select {
        case <-w.ticker.C:
            if err := w.syncNow(ctx); err != nil {
                slog.ErrorContext(ctx, "sync failed", "error", err)
            }
        case <-ctx.Done():
            slog.Info("sync worker shutting down")
            return
        }
    }
}

// cmd/server/main.go
func main() {
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    db := mustOpenDB()
    worker := worker.NewSyncWorker(db)

    go worker.Run(ctx)

    // On SIGTERM/SIGINT, cancel() fires → worker exits cleanly
    waitForSignal(cancel)
}
```

When the server receives `SIGTERM`, `cancel()` is called, `ctx.Done()` fires, and the worker goroutine exits. Without this, the goroutine outlives the shutdown sequence and holds its allocations — the ticker, the `*sql.DB` reference, anything on its stack — until the process is killed.
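The `waitForSignal` helper above is a placeholder; one way to get the same behaviour with the standard library alone is `signal.NotifyContext` (Go 1.16+). This sketch sends itself `SIGTERM` after 100ms purely so it terminates as a demo (Unix only):

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// runUntilSignal blocks until SIGTERM/SIGINT arrives, then returns.
func runUntilSignal() string {
	// signal.NotifyContext returns a ctx that is cancelled on the listed
	// signals, so no hand-rolled signal channel is needed.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM, syscall.SIGINT)
	defer stop()

	// go worker.Run(ctx) // the SyncWorker above would exit when ctx is cancelled

	go func() { // demo only: simulate the orchestrator sending SIGTERM
		time.Sleep(100 * time.Millisecond)
		p, _ := os.FindProcess(os.Getpid())
		p.Signal(syscall.SIGTERM)
	}()

	<-ctx.Done() // blocks here until the signal arrives
	return "shutting down cleanly"
}

func main() {
	fmt.Println(runUntilSignal())
}
```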

***

## Pattern 2: Unclosed HTTP Response Bodies

This one is subtle enough to miss in code review. When using `net/http` to make outbound requests, the response body must be explicitly closed, even when you do not intend to read it. Failing to do so holds the underlying TCP connection open, preventing it from being returned to the connection pool:

```go
// Leak: response body not closed when err == nil
func checkHealth(url string) bool {
    resp, err := http.Get(url)
    if err != nil {
        return false
    }
    return resp.StatusCode == http.StatusOK
    // resp.Body is never closed — connection leaks
}
```

Under sustained traffic, connections accumulate: an unclosed body is never returned to the pool, so each new request dials a fresh connection, and every live connection keeps its transport goroutines and buffers alive. If `Transport.MaxConnsPerHost` is set, requests eventually block waiting for a connection instead. Either way, memory climbs.

The correct pattern uses `defer` immediately after the nil check:

```go
func checkHealth(ctx context.Context, url string) bool {
    req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
    if err != nil {
        return false
    }

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return false
    }
    defer resp.Body.Close()                     // always close, immediately after nil check

    _, _ = io.Copy(io.Discard, resp.Body)       // drain body before closing for connection reuse
    return resp.StatusCode == http.StatusOK
}
```

The `io.Copy(io.Discard, resp.Body)` line matters for connection reuse. Closing the body without draining it forces the transport to discard the underlying TCP connection rather than returning it to the pool, so the next request pays for a fresh dial and handshake.

***

## Pattern 3: Slice Backing Array Retention

Go slices are a view over an underlying array. When you take a sub-slice, both the slice header and the original backing array remain allocated:

```go
// Large log line: 100KB
rawLine := readLogLine()  // []byte, backed by a 100KB allocation

// We only want the first 32 bytes (the timestamp prefix)
timestamp := rawLine[:32]

// rawLine goes out of scope — but its 100KB backing array stays alive
// because timestamp still references it!
processBatch(timestamp)
```

This was a real problem in a log-ingestion pipeline. I was extracting short substrings from large log lines and storing them in a cache. Each cache entry appeared small but silently retained the full original allocation.

The fix is to copy the data you need into a new allocation:

```go
rawLine := readLogLine()

// Copy only the bytes we need — new allocation, 32 bytes
timestamp := make([]byte, 32)
copy(timestamp, rawLine[:32])

// rawLine can now be GC'd; timestamp owns its own memory
processBatch(timestamp)
```

The same applies to string slicing from a large string:

```go
// Retains the entire large string in memory
shortKey := largePayload[0:16]

// New allocation with only the needed content
shortKey = strings.Clone(largePayload[0:16])  // strings.Clone available since Go 1.18
// or: shortKey = string([]byte(largePayload[0:16]))
```
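You can observe the retention directly: `cap()` reveals that a sub-slice still spans the whole backing array. A runnable check:

```go
package main

import "fmt"

func main() {
	large := make([]byte, 100_000) // simulated 100KB log line

	sub := large[:32]
	// len is 32, but cap shows the slice still references the full array
	fmt.Println(len(sub), cap(sub)) // 32 100000

	owned := make([]byte, 32)
	copy(owned, large[:32])
	fmt.Println(len(owned), cap(owned)) // 32 32
}
```

A `cap()` far larger than `len()` on a long-lived slice is a cheap heuristic for spotting this pattern in a debugger or test.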

***

## Pattern 4: Unbounded Map Growth

Go maps do not shrink after deletion. Once a map allocates buckets for N entries, those buckets are reused but the underlying memory is not returned to the OS even if the map is emptied. If a map grows to millions of entries and then shrinks, the allocated memory stays:

```go
// internal/cache/in_memory.go

// This cache grows without bound — no eviction, no TTL
type MetricsCache struct {
    mu    sync.RWMutex
    store map[string]MetricValue
}

func (c *MetricsCache) Set(key string, value MetricValue) {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.store[key] = value  // grows forever if keys are unique (e.g. request IDs)
}
```

If the key space is unbounded — request IDs, user-generated strings, session tokens — the map will grow indefinitely. There are two practical approaches.

**Approach 1: TTL-based eviction with a background cleaner**

```go
type entry struct {
    value     MetricValue
    expiresAt time.Time
}

type MetricsCache struct {
    mu    sync.RWMutex
    store map[string]entry
}

func NewMetricsCache(ctx context.Context, cleanInterval time.Duration) *MetricsCache {
    c := &MetricsCache{store: make(map[string]entry)}
    go c.runCleaner(ctx, cleanInterval)
    return c
}

func (c *MetricsCache) Set(key string, value MetricValue, ttl time.Duration) {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.store[key] = entry{value: value, expiresAt: time.Now().Add(ttl)}
}

func (c *MetricsCache) Get(key string) (MetricValue, bool) {
    c.mu.RLock()
    defer c.mu.RUnlock()
    e, ok := c.store[key]
    if !ok || time.Now().After(e.expiresAt) {
        return MetricValue{}, false
    }
    return e.value, true
}

func (c *MetricsCache) runCleaner(ctx context.Context, interval time.Duration) {
    ticker := time.NewTicker(interval)
    defer ticker.Stop()

    for {
        select {
        case <-ticker.C:
            c.mu.Lock()
            now := time.Now()
            for k, v := range c.store {
                if now.After(v.expiresAt) {
                    delete(c.store, k)
                }
            }
            c.mu.Unlock()
        case <-ctx.Done():
            return
        }
    }
}
```

**Approach 2: LRU with a size cap**

For cases where you want bounded memory regardless of TTL, use an LRU cache (I typically reach for `github.com/hashicorp/golang-lru/v2`):

```go
import lru "github.com/hashicorp/golang-lru/v2"

// Cache that holds at most 10,000 entries, evicting least-recently-used on overflow
cache, err := lru.New[string, MetricValue](10_000)
if err != nil {
    log.Fatal(err)
}

cache.Add("key", value)     // auto-evicts LRU entry when full
v, ok := cache.Get("key")
```
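If pulling in a dependency is not an option, the same size-capped behaviour can be sketched with `container/list` from the standard library. This toy version (`int` values, no locking) is illustrative, not production-ready:

```go
package main

import (
	"container/list"
	"fmt"
)

// lruCache is a minimal size-capped cache: front of the list is the most
// recently used entry, the back is the eviction candidate.
type lruCache struct {
	capacity int
	ll       *list.List
	items    map[string]*list.Element
}

type lruEntry struct {
	key   string
	value int
}

func newLRU(capacity int) *lruCache {
	return &lruCache{capacity: capacity, ll: list.New(), items: make(map[string]*list.Element)}
}

func (c *lruCache) Get(key string) (int, bool) {
	if el, ok := c.items[key]; ok {
		c.ll.MoveToFront(el) // touching an entry makes it most-recently-used
		return el.Value.(*lruEntry).value, true
	}
	return 0, false
}

func (c *lruCache) Add(key string, value int) {
	if el, ok := c.items[key]; ok {
		c.ll.MoveToFront(el)
		el.Value.(*lruEntry).value = value
		return
	}
	c.items[key] = c.ll.PushFront(&lruEntry{key, value})
	if c.ll.Len() > c.capacity { // over capacity: evict the back of the list
		oldest := c.ll.Back()
		c.ll.Remove(oldest)
		delete(c.items, oldest.Value.(*lruEntry).key)
	}
}

func main() {
	c := newLRU(2)
	c.Add("a", 1)
	c.Add("b", 2)
	c.Add("c", 3) // evicts "a", the least recently used
	_, ok := c.Get("a")
	fmt.Println(ok) // false
}
```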

***

## Pattern 5: Timers and Tickers Not Stopped

`time.NewTimer` and `time.NewTicker` register timers inside the runtime. An unstopped ticker keeps firing for the life of the process and holds its channel and buffers alive; on Go versions before 1.23, a ticker that was never stopped could never be garbage-collected, even after every reference to it was dropped:

```go
// Leak in a request handler — new ticker per request, never stopped
func (h *Handler) PollStatus(w http.ResponseWriter, r *http.Request) {
    ticker := time.NewTicker(500 * time.Millisecond)

    for range ticker.C {
        status := h.checkStatus(r.Context())
        if status.Done {
            json.NewEncoder(w).Encode(status)
            return  // BUG: ticker never stopped on return
        }
    }
}
```

Every request that returns without stopping the ticker leaves a live runtime timer firing every 500ms — and, before Go 1.23, a `Ticker` that could never be collected. Fix: always stop with `defer`:

```go
func (h *Handler) PollStatus(w http.ResponseWriter, r *http.Request) {
    ticker := time.NewTicker(500 * time.Millisecond)
    defer ticker.Stop()    // always, regardless of exit path

    for {
        select {
        case <-ticker.C:
            status := h.checkStatus(r.Context())
            if status.Done {
                json.NewEncoder(w).Encode(status)
                return
            }
        case <-r.Context().Done():
            // client disconnected — exit without writing
            return
        }
    }
}
```

The same applies to `time.AfterFunc`. The timer returned must be stopped when no longer needed:

```go
// Store the timer so it can be cancelled
timer := time.AfterFunc(5*time.Second, func() {
    expireSession(sessionID)
})

// On explicit logout, cancel the pending expiry
defer timer.Stop()
```

***

## Pattern 6: defer Inside a Loop

`defer` lines up a call to run when the *function* returns, not when the loop iteration ends. Inside a tight loop, this accumulates deferred calls that all fire at once when the function exits — keeping every resource open for the full duration of the loop:

```go
// Leak: file handle stays open until processAllFiles returns
func processAllFiles(paths []string) error {
    for _, path := range paths {
        f, err := os.Open(path)
        if err != nil {
            return err
        }
        defer f.Close()     // BAD: deferred until function exit, not iteration end

        if err := processFile(f); err != nil {
            return err
        }
    }
    return nil
}
```

With thousands of files, this holds thousands of file handles open simultaneously. The fix is to extract the per-iteration work into a function (or a closure called immediately):

```go
func processAllFiles(paths []string) error {
    for _, path := range paths {
        if err := processOne(path); err != nil {
            return err
        }
    }
    return nil
}

func processOne(path string) error {
    f, err := os.Open(path)
    if err != nil {
        return err
    }
    defer f.Close()    // now deferred until processOne returns — correct
    return processFile(f)
}
```

***

## Pattern 7: String and \[]byte Conversion Under Load

In Go, converting between `string` and `[]byte` always copies the data. Under high throughput this generates significant allocator pressure — many short-lived allocations that the GC must track and collect:

```go
// In a hot path — called thousands of times per second
func buildCacheKey(userID string, resource string) string {
    // Two []byte allocations + one string allocation per call
    key := []byte(userID)
    key = append(key, ':')
    key = append(key, []byte(resource)...)
    return string(key)
}
```

Prefer `strings.Builder` for string construction in moderate-frequency paths, and investigate `unsafe.String` / `unsafe.SliceData` tricks only if profiling proves the copies are a bottleneck. For cache keys specifically, concatenation with `+` is often fine because the compiler turns a single concatenation expression into one allocation:

```go
// Simplest — compiler usually avoids intermediate allocations
func buildCacheKey(userID, resource string) string {
    return userID + ":" + resource
}

// For multiple segments or when order is dynamic, strings.Builder avoids copies
func buildCacheKey(parts ...string) string {
    var b strings.Builder
    for i, p := range parts {
        if i > 0 {
            b.WriteByte(':')
        }
        b.WriteString(p)
    }
    return b.String()
}
```

Use `sync.Pool` to reuse allocations in hot paths that create many short-lived objects (e.g. `bytes.Buffer` instances used during JSON encoding):

```go
var bufPool = sync.Pool{
    New: func() any { return new(bytes.Buffer) },
}

func encodeMetric(m Metric) ([]byte, error) {
    buf := bufPool.Get().(*bytes.Buffer)
    buf.Reset()
    defer bufPool.Put(buf)

    if err := json.NewEncoder(buf).Encode(m); err != nil {
        return nil, err
    }
    result := make([]byte, buf.Len())
    copy(result, buf.Bytes())
    return result, nil
}
```

***

## Detecting Memory Leaks with pprof

Go ships with `net/http/pprof` in the standard library. Registering the endpoint takes two lines:

### Enabling the pprof endpoint

```go
// cmd/server/main.go
import (
    "net/http"
    _ "net/http/pprof"   // registers /debug/pprof/* handlers on http.DefaultServeMux
)

func main() {
    // Serve pprof on a separate port — never expose this to the public internet
    go func() {
        if err := http.ListenAndServe("localhost:6060", nil); err != nil {
            slog.Error("pprof server failed", "error", err)
        }
    }()

    // ... start your main server
}
```

### Heap profiling

```bash
# Take a heap snapshot
go tool pprof http://localhost:6060/debug/pprof/heap

# In the pprof REPL:
(pprof) top10          # shows top 10 allocation sites by inuse_space
(pprof) list fetchInventory   # shows source-annotated allocations in a function
(pprof) web            # opens an SVG call graph in the browser (requires graphviz)
```

To compare two snapshots and find what grew between them:

```bash
# Snapshot 1 — baseline after startup
curl -s http://localhost:6060/debug/pprof/heap > heap1.prof

# Wait for traffic, then snapshot 2
curl -s http://localhost:6060/debug/pprof/heap > heap2.prof

# Diff: shows what grew between the two snapshots
go tool pprof -base heap1.prof heap2.prof
(pprof) top10
```

### Goroutine profiling

When you suspect a goroutine leak, the goroutine profile tells you exactly how many goroutines are running and what they are blocked on:

```bash
# View goroutine counts and stack traces
go tool pprof http://localhost:6060/debug/pprof/goroutine

(pprof) top10          # most common goroutine stack prefixes
(pprof) traces         # full stack trace for each goroutine group
```

A healthy long-running service should have a fairly stable goroutine count. If the count climbs steadily under traffic and does not drop when traffic stops, you have a goroutine leak.

### runtime.ReadMemStats for in-process monitoring

For continuous monitoring (e.g. emitting metrics to Prometheus), poll `runtime.ReadMemStats`:

```go
// internal/metrics/runtime_metrics.go

func CollectRuntimeMetrics(ctx context.Context, interval time.Duration) {
    ticker := time.NewTicker(interval)
    defer ticker.Stop()

    var stats runtime.MemStats

    for {
        select {
        case <-ticker.C:
            runtime.ReadMemStats(&stats)

            // HeapInuse: bytes in in-use heap spans
            heapInuseGauge.Set(float64(stats.HeapInuse))
            // HeapIdle: bytes in idle heap spans (returned or reclaimable)
            heapIdleGauge.Set(float64(stats.HeapIdle))
            // HeapObjects: number of live heap objects
            heapObjectsGauge.Set(float64(stats.HeapObjects))
            // NumGoroutine: always useful to track
            goroutineGauge.Set(float64(runtime.NumGoroutine()))

        case <-ctx.Done():
            return
        }
    }
}
```

Plotting `HeapInuse` and `NumGoroutine` over time reveals leaks clearly: a leak shows up as a monotonically increasing line that does not return to baseline after a traffic reduction.

***

## Building a Leak-Aware Service

The patterns above suggest a set of structural practices I now apply from the start of any Go backend service:

**Always pass and respect context**

```go
// Every function that does I/O or spawns goroutines takes a context
func (s *Service) SyncInventory(ctx context.Context, productIDs []string) error {
    g, ctx := errgroup.WithContext(ctx)

    for _, id := range productIDs {
        id := id  // capture loop variable (required before Go 1.22)
        g.Go(func() error {
            return s.syncOne(ctx, id)
        })
    }
    return g.Wait()
}
```

**Expose goroutine count as a metric**

Alerting on abnormal goroutine growth catches leaks before they become incidents:

```go
// In your Prometheus metrics setup. This loop lives for the whole process,
// but an explicit ticker (rather than time.Tick) keeps a shutdown path open:
go func() {
    ticker := time.NewTicker(10 * time.Second)
    defer ticker.Stop()
    for range ticker.C {
        goroutineGauge.Set(float64(runtime.NumGoroutine()))
    }
}()
```

**Use goleak in tests**

`go.uber.org/goleak` fails a test if it exits with goroutines that were leaked by the code under test:

```go
import "go.uber.org/goleak"

func TestMain(m *testing.M) {
    goleak.VerifyTestMain(m)
}

func TestSyncInventory(t *testing.T) {
    // Any goroutine started by SyncInventory that is still running
    // after the test exits will cause goleak to fail the test
    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()

    svc := newTestService(t)
    err := svc.SyncInventory(ctx, []string{"prod-001", "prod-002"})
    require.NoError(t, err)
}
```

`goleak` is the cheapest way to catch goroutine leaks — it runs at test time, not in production.

**Cap all caches at construction**

```go
// Pass max size at construction — never let it grow unboundedly
func NewProductCache(maxEntries int) *ProductCache {
    cache, _ := lru.New[string, Product](maxEntries)
    return &ProductCache{inner: cache}
}
```

**Structure worker lifecycle explicitly**

```go
type Worker struct {
    cancel context.CancelFunc
    done   chan struct{}
}

func (w *Worker) Start(ctx context.Context) {
    ctx, w.cancel = context.WithCancel(ctx)
    w.done = make(chan struct{})
    go func() {
        defer close(w.done)
        w.run(ctx)
    }()
}

func (w *Worker) Stop() {
    w.cancel()
    <-w.done    // wait for the goroutine to actually exit before returning
}
```

Waiting on `<-w.done` ensures the goroutine has released all its resources before `Stop()` returns — important during graceful shutdown when the process is about to exit anyway.

***

## What I Learned

Working through memory growth in Go backend services changed how I think about the runtime:

1. **The GC is not a safety net for leaks — it is a collector of unreachable memory.** Your job is to make sure that memory you no longer need becomes unreachable.
2. **Every goroutine needs an exit condition at the time you start it.** If you cannot describe the exact circumstances that will cause a goroutine to return, it will likely leak.
3. **Context is the shutdown mechanism.** Threading `context.Context` through every I/O call and goroutine is not just API hygiene — it is what makes goroutines terminable.
4. **Unbounded growth is the most common real-world leak.** Before shipping any cache or accumulator, ask: "What is the maximum size of this thing? What removes entries from it?"
5. **pprof heap diffing is the fastest way to diagnose a live leak.** Two heap snapshots before and after a traffic window show exactly which allocation sites grew.
6. **goleak in tests catches goroutine leaks at development time** — far cheaper than finding them in production at 3 AM with a container restart.
7. **The goroutine count metric is the canary.** A healthy service has a stable goroutine count at steady-state traffic. A rising count is always worth investigating immediately.
