Hashes

πŸ“– Introduction

Hash tables are the unsung heroes of efficient programming. Every time I use a Python dictionary to store user sessions, cache API responses, or count word frequencies, I'm leveraging one of the most powerful data structures in computer science.

What makes hash tables magical is their ability to provide near-constant time lookups, insertions, and deletionsβ€”regardless of how much data you store. Understanding how they achieve this, and their potential pitfalls, has helped me make better design decisions and debug performance issues that seemed mysterious.


🎯 What Is a Hash Table?

A hash table maps keys to values using a hash function to compute an index into an array of buckets.

Core Concept

spinner
# Simplified concept
def simple_hash_table():
    # Array of buckets
    buckets = [None] * 10
    
    # Insert: hash key to find bucket
    key = "alice"
    index = hash(key) % len(buckets)
    buckets[index] = ("alice", 42)
    
    # Lookup: hash to find bucket
    index = hash("alice") % len(buckets)
    return buckets[index]  # ("alice", 42)

Time Complexity

Operation
Average
Worst

Insert

O(1)

O(n)

Lookup

O(1)

O(n)

Delete

O(1)

O(n)

*Worst case occurs with many collisions


πŸ”§ Hash Functions

A hash function converts keys into array indices. Good hash functions are:

  • Deterministic: Same key always produces same hash

  • Uniform: Distributes keys evenly across buckets

  • Fast: Computes quickly

Python's Built-in Hash

Implementing Hash Functions


πŸ’₯ Collision Resolution

Collisions occur when different keys hash to the same index. Two main strategies handle this.

Strategy 1: Chaining

Each bucket stores a list of (key, value) pairs.

spinner

Strategy 2: Open Addressing

All entries stored in the array itself. On collision, probe for next open slot.

Probing Strategies


🐍 Python Dict Internals

Python's dict is one of the most optimized hash table implementations.

Key Features

Dict Comprehension and Methods

DefaultDict and Counter


πŸ”’ Sets

Sets are hash tables without valuesβ€”just keys.


πŸ“Š Common Hash Table Problems

Two Sum

Group Anagrams

Longest Consecutive Sequence

Subarray Sum Equals K

LRU Cache

Valid Sudoku


πŸ“ Practice Exercises

Exercise 1: First Unique Character

chevron-rightSolutionhashtag

Exercise 2: Isomorphic Strings

chevron-rightSolutionhashtag

Exercise 3: Design HashMap

chevron-rightSolutionhashtag

πŸ”‘ Key Takeaways

  1. Hash tables provide O(1) average case for insert/lookup/delete

  2. Collisions are inevitableβ€”handle with chaining or open addressing

  3. Load factor affects performanceβ€”resize when threshold exceeded

  4. Python dicts are highly optimized and maintain insertion order

  5. Sets are hash tables with only keysβ€”great for membership testing

  6. Counter and defaultdict simplify common patterns


πŸš€ What's Next?

In the next article, we'll explore trees:

  • Binary trees and BSTs

  • Tree traversals (DFS and BFS)

  • Balanced trees

  • Common tree problems

Next: Article 8 - Trees


πŸ“š References

  • Python dict implementation details (CPython source)

  • CLRS Chapter 11: Hash Tables

  • "Designing Data-Intensive Applications" - Hash indexes

Last updated