Message Queues & Async Processing

Why Asynchronous Processing Matters

One of the most transformative architectural decisions I've made is moving from synchronous to asynchronous processing for non-critical operations. It dramatically improved system responsiveness, reliability, and scalability.

Message queues decouple services, enable async processing, and provide resilience when components fail. This article covers the patterns I've used in production.

Message Queue Fundamentals

When to Use Message Queues

Good use cases (from my experience):

Email/SMS notifications
Image/video processing
Report generation
Data synchronization between services
Event logging and analytics
Background job processing

Poor use cases:

Critical path operations that users wait for
Operations requiring immediate response
Simple request-response patterns
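
To make the split above concrete, here is a minimal sketch of a request handler that finishes the critical work itself and hands the notification to a background worker. It assumes an in-process `queue.Queue` as a stand-in for a real broker; `handle_signup`, `notification_worker`, and the `sent` list are hypothetical names used only for illustration.

```python
import queue
import threading

# In-process stand-in for a broker. A production system would use
# RabbitMQ, Kafka, or SQS instead of queue.Queue.
notifications = queue.Queue()
sent = []

def handle_signup(email: str) -> dict:
    """Request handler: do the critical work, enqueue the rest."""
    user = {"email": email, "status": "created"}              # critical path
    notifications.put({"to": email, "subject": "Welcome!"})   # deferred work
    return user  # respond without waiting for the email to be sent

def notification_worker() -> None:
    """Background worker: drains the queue until it sees the None sentinel."""
    while True:
        task = notifications.get()
        if task is None:
            break
        sent.append(task["to"])  # stands in for actually sending the email

worker = threading.Thread(target=notification_worker)
worker.start()
response = handle_signup("user@example.com")
notifications.put(None)  # sentinel: tell the worker to stop
worker.join()
print(response, sent)
```

The handler returns as soon as the task is enqueued, so a slow or failing email provider no longer affects the user's response time.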

Key Concepts

# Basic message queue pattern
from datetime import datetime
from typing import Callable, Dict

class SimpleQueue:
    """
    Basic queue concept illustration.
    Production systems use RabbitMQ, Kafka, or SQS.
    """
    
    def __init__(self):
        self.messages = []
        self.handlers = {}
    
    def publish(self, queue_name: str, message: Dict):
        """Producer: Send message to queue."""
        self.messages.append({
            "queue": queue_name,
            "message": message,
            "timestamp": datetime.utcnow()
        })
    
    def subscribe(self, queue_name: str, handler: Callable):
        """Consumer: Register handler for queue."""
        self.handlers[queue_name] = handler
    
    def process(self):
        """Process pending messages."""
        while self.messages:
            item = self.messages.pop(0)
            queue_name = item["queue"]
            
            if queue_name in self.handlers:
                handler = self.handlers[queue_name]
                handler(item["message"])

RabbitMQ Patterns

RabbitMQ is my go-to for traditional message queuing.

Work Queue Pattern

Distribute tasks across multiple workers.

import pika
import json
from typing import Dict

class RabbitMQProducer:
    """
    RabbitMQ producer for task distribution.
    I use this for background job processing.
    """
    
    def __init__(self, host: str = 'localhost'):
        self.connection = pika.BlockingConnection(
            pika.ConnectionParameters(host=host)
        )
        self.channel = self.connection.channel()
    
    def send_task(self, queue_name: str, task_data: Dict):
        """Send task to queue."""
        # Declare queue (idempotent)
        self.channel.queue_declare(
            queue=queue_name,
            durable=True  # Survive broker restart
        )
        
        # Publish message
        self.channel.basic_publish(
            exchange='',
            routing_key=queue_name,
            body=json.dumps(task_data),
            properties=pika.BasicProperties(
                delivery_mode=2,  # Make message persistent
                content_type='application/json'
            )
        )
        print(f"Sent task to {queue_name}: {task_data}")
    
    def close(self):
        """Close connection."""
        self.connection.close()

# Usage: Send email task
producer = RabbitMQProducer()
producer.send_task('email_queue', {
    'to': '[email protected]',
    'subject': 'Welcome!',
    'body': 'Thanks for signing up.'
})

import pika
import json
import time
from typing import Callable, Dict

class RabbitMQConsumer:
    """
    RabbitMQ consumer/worker.
    Multiple instances can run in parallel to process tasks.
    """
    
    def __init__(self, host: str = 'localhost'):
        self.connection = pika.BlockingConnection(
            pika.ConnectionParameters(host=host)
        )
        self.channel = self.connection.channel()
    
    def start_consuming(self, queue_name: str, callback: Callable):
        """Start consuming messages."""
        # Declare queue
        self.channel.queue_declare(queue=queue_name, durable=True)
        
        # Fair dispatch - don't give worker new task until previous is done
        self.channel.basic_qos(prefetch_count=1)
        
        # Setup consumer
        self.channel.basic_consume(
            queue=queue_name,
            on_message_callback=self._create_callback(callback),
            auto_ack=False  # Manual acknowledgment
        )
        
        print(f'Waiting for messages in {queue_name}...')
        self.channel.start_consuming()
    
    def _create_callback(self, user_callback: Callable):
        """Create callback wrapper with error handling."""
        def callback(ch, method, properties, body):
            try:
                # Parse message
                message = json.loads(body)
                print(f"Processing: {message}")
                
                # Execute user callback
                user_callback(message)
                
                # Acknowledge message
                ch.basic_ack(delivery_tag=method.delivery_tag)
                print("Task completed successfully")
                
            except Exception as e:
                print(f"Error processing message: {e}")
                # Reject and requeue message. Caution: a message that always
                # fails is redelivered indefinitely; cap retries or use a DLQ.
                ch.basic_nack(
                    delivery_tag=method.delivery_tag,
                    requeue=True
                )
        
        return callback

# Worker implementation
def process_email(task_data: Dict):
    """Process email sending task."""
    to = task_data['to']
    subject = task_data['subject']
    body = task_data['body']
    
    # Simulate email sending
    print(f"Sending email to {to}...")
    time.sleep(2)  # Simulated work
    print(f"Email sent to {to}")

# Start worker
consumer = RabbitMQConsumer()
consumer.start_consuming('email_queue', process_email)
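
The manual acknowledgment flow used by the worker can be hard to picture without a broker running. The following is a toy in-memory model of it, not pika's API: `AckQueue` and its methods are hypothetical, and it only assumes that a delivered message stays "unacked" until the worker acks or nacks it.

```python
from collections import deque

class AckQueue:
    """Toy queue with manual acknowledgments (illustration only)."""

    def __init__(self):
        self.ready = deque()   # messages waiting for delivery
        self.unacked = {}      # delivery tag -> message held by a worker
        self._next_tag = 1

    def publish(self, body):
        self.ready.append(body)

    def get(self):
        """Deliver one message; it stays unacked until ack() or nack()."""
        if not self.ready:
            return None
        tag = self._next_tag
        self._next_tag += 1
        body = self.ready.popleft()
        self.unacked[tag] = body
        return tag, body

    def ack(self, tag):
        """Worker finished: the message is removed for good."""
        del self.unacked[tag]

    def nack(self, tag, requeue=True):
        """Worker failed: optionally put the message back at the front."""
        body = self.unacked.pop(tag)
        if requeue:
            self.ready.appendleft(body)

q = AckQueue()
q.publish("task-1")

tag, body = q.get()
q.nack(tag, requeue=True)   # first attempt fails, message returns to the queue
tag2, body2 = q.get()       # the same message is delivered again
q.ack(tag2)                 # second attempt succeeds
```

Because a message can be delivered more than once, the handler has to tolerate duplicates.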

Pub/Sub Pattern

Multiple consumers receive the same message.

import json
from datetime import datetime
from typing import Callable, Dict

import pika

class RabbitMQPubSub:
    """
    Pub/Sub pattern using RabbitMQ exchanges.
    I use this for event broadcasting.
    """
    
    def __init__(self, host: str = 'localhost'):
        self.connection = pika.BlockingConnection(
            pika.ConnectionParameters(host=host)
        )
        self.channel = self.connection.channel()
    
    def publish_event(self, exchange_name: str, event_type: str, event_data: Dict):
        """Publish event to all subscribers."""
        # Declare fanout exchange (broadcasts to all queues)
        self.channel.exchange_declare(
            exchange=exchange_name,
            exchange_type='fanout',
            durable=True
        )
        
        message = {
            'event_type': event_type,
            'data': event_data,
            'timestamp': datetime.utcnow().isoformat()
        }
        
        self.channel.basic_publish(
            exchange=exchange_name,
            routing_key='',  # Ignored for fanout
            body=json.dumps(message)
        )
        print(f"Published event: {event_type}")
    
    def subscribe(self, exchange_name: str, callback: Callable):
        """Subscribe to events from exchange."""
        # Declare exchange
        self.channel.exchange_declare(
            exchange=exchange_name,
            exchange_type='fanout',
            durable=True
        )
        
        # Create exclusive queue for this consumer
        result = self.channel.queue_declare(queue='', exclusive=True)
        queue_name = result.method.queue
        
        # Bind queue to exchange
        self.channel.queue_bind(exchange=exchange_name, queue=queue_name)
        
        # Start consuming
        self.channel.basic_consume(
            queue=queue_name,
            on_message_callback=lambda ch, method, props, body: callback(json.loads(body)),
            auto_ack=True
        )
        
        print(f"Subscribed to {exchange_name}")
        self.channel.start_consuming()

# Publisher
pubsub = RabbitMQPubSub()
pubsub.publish_event('user_events', 'user.registered', {
    'user_id': '12345',
    'email': '[email protected]'
})

# Subscriber 1: Send welcome email
def handle_user_registered_email(event: Dict):
    """Send welcome email when user registers."""
    if event['event_type'] == 'user.registered':
        user_id = event['data']['user_id']
        print(f"Sending welcome email to user {user_id}")

# Subscriber 2: Update analytics
def handle_user_registered_analytics(event: Dict):
    """Track user registration in analytics."""
    if event['event_type'] == 'user.registered':
        print("Recording registration event in analytics")

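# Run each subscriber in its own process; subscribe() blocks.
# Start subscribers before publishing: a fanout exchange drops
# messages when no queue is bound to it.
subscriber1 = RabbitMQPubSub()
subscriber1.subscribe('user_events', handle_user_registered_email)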
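
A small in-memory model may help show what the fanout exchange does with each message. This is a sketch under simplifying assumptions, not RabbitMQ's implementation: `FanoutExchange`, `bind`, and `publish` are hypothetical names, and plain lists stand in for queues.

```python
from collections import defaultdict

class FanoutExchange:
    """Toy fanout: every bound queue receives its own copy of each message."""

    def __init__(self):
        self.bindings = defaultdict(list)  # exchange name -> bound queues

    def bind(self, exchange: str) -> list:
        """Create a fresh queue for one subscriber and bind it."""
        subscriber_queue = []
        self.bindings[exchange].append(subscriber_queue)
        return subscriber_queue

    def publish(self, exchange: str, message: dict) -> None:
        """Copy the message to every bound queue; with none bound it is dropped."""
        for subscriber_queue in self.bindings[exchange]:
            subscriber_queue.append(dict(message))

broker = FanoutExchange()

# Published before anyone subscribed: nobody receives this event.
broker.publish("user_events", {"event_type": "user.registered", "user_id": "1"})

email_queue = broker.bind("user_events")
analytics_queue = broker.bind("user_events")
broker.publish("user_events", {"event_type": "user.registered", "user_id": "2"})
```

Both subscribers see the second event and neither sees the first, which is why subscribers should be running before the publisher starts.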

Apache Kafka

Kafka is my choice for high-throughput event streaming and building event-driven architectures.

Kafka Producer

from datetime import datetime
from typing import Dict
import json

from kafka import KafkaProducer
from kafka.errors import KafkaError

class EventProducer:
    """
    Kafka producer for event streaming.
    I use this for high-volume event publishing.
    """
    
    def __init__(self, bootstrap_servers: list[str]):
        self.producer = KafkaProducer(
            bootstrap_servers=bootstrap_servers,
            value_serializer=lambda v: json.dumps(v).encode('utf-8'),
            acks='all',  # Wait for all replicas
            retries=3,
            max_in_flight_requests_per_connection=5,  # With retries, values > 1 can reorder messages
            compression_type='gzip'
        )
    
    def publish_event(self, topic: str, event_type: str, event_data: Dict, key: str = None):
        """Publish event to Kafka topic."""
        message = {
            'event_type': event_type,
            'data': event_data,
            'timestamp': datetime.utcnow().isoformat()
        }
        
        # Send, then block on the returned future so failures surface here
        future = self.producer.send(
            topic,
            value=message,
            key=key.encode('utf-8') if key else None
        )
        
        try:
            record_metadata = future.get(timeout=10)
            print(f"Event sent to {record_metadata.topic} partition {record_metadata.partition}")
        except KafkaError as e:
            print(f"Failed to send event: {e}")
    
    def close(self):
        """Flush and close producer."""
        self.producer.flush()
        self.producer.close()

# Usage
producer = EventProducer(['kafka-1:9092', 'kafka-2:9092', 'kafka-3:9092'])

# Publish order created event
producer.publish_event(
    topic='orders',
    event_type='order.created',
    event_data={
        'order_id': 'ord_123',
        'user_id': 'usr_456',
        'total': 99.99
    },
    key='usr_456'  # Ensures all events for same user go to same partition
)
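
The comment about keys and partitions can be illustrated with a few lines of plain Python. This sketch assumes CRC32 as a stand-in hash; Kafka's default partitioner uses a murmur2 hash, so the actual partition numbers will differ. `partition_for` is a hypothetical helper, not part of kafka-python.

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition: same key, same partition, every time."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

NUM_PARTITIONS = 6

first = partition_for("usr_456", NUM_PARTITIONS)
second = partition_for("usr_456", NUM_PARTITIONS)
print(first == second)  # the mapping is deterministic for a given key

# Different users spread across the available partitions.
spread = {partition_for(f"usr_{i}", NUM_PARTITIONS) for i in range(100)}
```

Ordering is only guaranteed within a partition, so the key should be whatever entity needs its events processed in order. Changing the partition count changes the mapping for existing keys.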

Kafka Consumer

from typing import Callable, Dict
import json

from kafka import KafkaConsumer

class EventConsumer:
    """
    Kafka consumer for event processing.
    Supports consumer groups for parallel processing.
    """
    
    def __init__(
        self,
        topics: list[str],
        bootstrap_servers: list[str],
        group_id: str
    ):
        self.consumer = KafkaConsumer(
            *topics,
            bootstrap_servers=bootstrap_servers,
            group_id=group_id,
            auto_offset_reset='earliest',  # Start from beginning if no offset
            enable_auto_commit=False,  # Manual commit for reliability
            value_deserializer=lambda m: json.loads(m.decode('utf-8')),
            max_poll_records=100  # Process in batches
        )
    
    def start_consuming(self, callback: Callable):
        """Start consuming and processing events."""
        print(f"Starting consumer for topics: {self.consumer.subscription()}")
        
        try:
            for message in self.consumer:
                try:
                    # Process message
                    callback(message.value)
                    
                    # Commit offset after successful processing
                    self.consumer.commit()
                    
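                except Exception as e:
                    print(f"Error processing message: {e}")
                    # Offset is not committed here, but iteration continues and
                    # the next successful commit() also covers this message.
                    # To reprocess it, seek back to message.offset or route it
                    # to a dead letter topic.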
                    
        except KeyboardInterrupt:
            print("Shutting down consumer...")
        finally:
            self.consumer.close()

# Consumer implementation
def process_order_event(event: Dict):
    """Process order events."""
    event_type = event['event_type']
    data = event['data']
    
    if event_type == 'order.created':
        order_id = data['order_id']
        print(f"Processing new order: {order_id}")
        # Send confirmation email, update inventory, etc.
    
    elif event_type == 'order.cancelled':
        order_id = data['order_id']
        print(f"Processing cancelled order: {order_id}")
        # Refund payment, restore inventory, etc.

# Start consumer
consumer = EventConsumer(
    topics=['orders'],
    bootstrap_servers=['kafka-1:9092', 'kafka-2:9092'],
    group_id='order-processor'
)
consumer.start_consuming(process_order_event)
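
Committing only after successful processing gives at-least-once delivery, provided the consumer stops at a failed message instead of moving past it. The sketch below models that with a list standing in for one partition and an integer standing in for the committed offset; `run_consumer` and `handler` are hypothetical, and no Kafka client is involved.

```python
log = ["e0", "e1", "e2", "e3"]   # messages in one partition, by offset

def run_consumer(log, committed, handler):
    """Process from the committed offset; return the new committed offset."""
    offset = committed
    while offset < len(log):
        try:
            handler(log[offset])
        except Exception:
            break        # stop here; the failed message stays uncommitted
        offset += 1      # "commit": the next offset to read
    return offset

processed = []
fail_once = {"e2"}       # e2 fails on its first attempt only

def handler(message):
    if message in fail_once:
        fail_once.discard(message)
        raise RuntimeError("transient failure")
    processed.append(message)

committed = run_consumer(log, 0, handler)           # stops at e2
after_first_run = committed
committed = run_consumer(log, committed, handler)   # restart resumes at e2
```

After the restart, processing resumes at the first uncommitted offset, so nothing is skipped. A crash between processing and committing would replay that message, which is why handlers should be idempotent.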

Kafka Consumer Groups

Multiple consumers in a group share the workload.

# Consumer Group Pattern
"""
Topic: orders (6 partitions)
Consumer Group: order-processors (3 consumers)

Partition Assignment:
- Consumer 1: Partitions 0, 1
- Consumer 2: Partitions 2, 3
- Consumer 3: Partitions 4, 5

If Consumer 2 fails, partitions 2 and 3 are reassigned to remaining consumers.
"""

# Start multiple consumers in same group
# Terminal 1
consumer1 = EventConsumer(
    topics=['orders'],
    bootstrap_servers=['kafka:9092'],
    group_id='order-processors'  # Same group
)
consumer1.start_consuming(process_order_event)

# Terminal 2
consumer2 = EventConsumer(
    topics=['orders'],
    bootstrap_servers=['kafka:9092'],
    group_id='order-processors'  # Same group
)
consumer2.start_consuming(process_order_event)

# Terminal 3
consumer3 = EventConsumer(
    topics=['orders'],
    bootstrap_servers=['kafka:9092'],
    group_id='order-processors'  # Same group
)
consumer3.start_consuming(process_order_event)
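
The assignment described above can be reproduced with a short function. This is a simplified range-style assignment written for illustration; the real outcome depends on the configured partition assignor, and `assign` is a hypothetical helper.

```python
def assign(partitions, consumers):
    """Give each consumer a contiguous chunk of partitions."""
    consumers = sorted(consumers)
    per_consumer, extra = divmod(len(partitions), len(consumers))
    assignment, start = {}, 0
    for i, consumer in enumerate(consumers):
        # The first `extra` consumers take one additional partition.
        count = per_consumer + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment

partitions = list(range(6))

before = assign(partitions, ["consumer-1", "consumer-2", "consumer-3"])
# Consumer 2 fails; the group rebalances across the survivors.
after = assign(partitions, ["consumer-1", "consumer-3"])
print(before)
print(after)
```

With more consumers than partitions, the extra consumers receive nothing, so the partition count caps the parallelism of a group.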

Event-Driven Architecture

Building systems around events enables loose coupling and scalability.

Event Sourcing Pattern

from typing import Dict, List
from datetime import datetime
import json

class EventStore:
    """
    Event sourcing implementation.
    Store all changes as events, rebuild state from events.
    """
    
    def __init__(self, kafka_producer):
        self.producer = kafka_producer
        # In-memory copy so this example can read events back.
        # In production, read from Kafka or a dedicated event store.
        self._events = {}
    
    def append_event(self, aggregate_id: str, event_type: str, event_data: Dict):
        """Append event to event stream."""
        event = {
            'aggregate_id': aggregate_id,
            'event_type': event_type,
            'data': event_data,
            'timestamp': datetime.utcnow().isoformat(),
            'version': self._get_next_version(aggregate_id)
        }
        
        # Publish to Kafka
        self.producer.publish_event(
            topic='event_store',
            event_type=event_type,
            event_data=event,
            key=aggregate_id  # All events for same aggregate in same partition
        )
        
        self._events.setdefault(aggregate_id, []).append(event)
        return event
    
    def get_events(self, aggregate_id: str) -> List[Dict]:
        """Get all events for an aggregate."""
        return list(self._events.get(aggregate_id, []))
    
    def _get_next_version(self, aggregate_id: str) -> int:
        """Get next version number for aggregate."""
        return len(self.get_events(aggregate_id)) + 1

class OrderAggregate:
    """
    Order aggregate rebuilt from events.
    """
    
    def __init__(self, order_id: str):
        self.order_id = order_id
        self.status = 'pending'
        self.items = []
        self.total = 0.0
        self.version = 0
    
    def apply_event(self, event: Dict):
        """Apply event to rebuild state."""
        event_type = event['event_type']
        data = event['data']
        
        if event_type == 'order.created':
            self.items = data['items']
            self.total = data['total']
            self.status = 'created'
        
        elif event_type == 'order.paid':
            self.status = 'paid'
        
        elif event_type == 'order.shipped':
            self.status = 'shipped'
            self.tracking_number = data['tracking_number']
        
        elif event_type == 'order.cancelled':
            self.status = 'cancelled'
            self.cancellation_reason = data['reason']
        
        self.version += 1
    
    def rebuild_from_events(self, events: List[Dict]):
        """Rebuild order state from event stream."""
        for event in sorted(events, key=lambda e: e['version']):
            self.apply_event(event)

# Usage (kafka_producer is an EventProducer instance from the Kafka Producer section)
event_store = EventStore(kafka_producer)

# Record events
event_store.append_event('order_123', 'order.created', {
    'items': [{'id': 'prod_1', 'quantity': 2}],
    'total': 99.99
})

event_store.append_event('order_123', 'order.paid', {
    'payment_id': 'pay_456'
})

event_store.append_event('order_123', 'order.shipped', {
    'tracking_number': 'TRACK123'
})

# Rebuild state
order = OrderAggregate('order_123')
events = event_store.get_events('order_123')
order.rebuild_from_events(events)
print(f"Order status: {order.status}")  # shipped
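
Rebuilding state is a fold over the event stream, which also makes it possible to reconstruct the state as of an earlier version. The following self-contained sketch assumes simplified events that carry only `event_type` and `version`; `apply` and `replay` are hypothetical names, separate from the classes above.

```python
from functools import reduce

def apply(state: dict, event: dict) -> dict:
    """Pure transition: return a new state with the event applied."""
    transitions = {
        "order.created": {"status": "created"},
        "order.paid": {"status": "paid"},
        "order.shipped": {"status": "shipped"},
    }
    new_state = {**state, **transitions.get(event["event_type"], {})}
    new_state["version"] = event["version"]
    return new_state

def replay(events, up_to_version=None):
    """Fold events in version order, optionally stopping at a version."""
    ordered = sorted(events, key=lambda e: e["version"])
    if up_to_version is not None:
        ordered = [e for e in ordered if e["version"] <= up_to_version]
    return reduce(apply, ordered, {"status": "pending", "version": 0})

# Events may arrive out of order; sorting by version restores the sequence.
events = [
    {"event_type": "order.shipped", "version": 3},
    {"event_type": "order.created", "version": 1},
    {"event_type": "order.paid", "version": 2},
]

current = replay(events)
as_of_v2 = replay(events, up_to_version=2)
```

Here `current` ends up shipped at version 3, while `as_of_v2` shows the order as it stood after payment.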

Saga Pattern

Manage distributed transactions across services.

class OrderSaga:
    """
    Saga pattern for distributed order processing.
    Coordinates multiple services with compensating transactions.
    """
    
    def __init__(self, event_producer):
        self.producer = event_producer
    
    def create_order(self, order_data: Dict):
        """
        Execute order creation saga.
        Steps: Reserve inventory -> Process payment -> Create shipment
        """
        order_id = order_data['order_id']
        
        try:
            # Step 1: Reserve inventory
            self.producer.publish_event('inventory', 'inventory.reserve', {
                'order_id': order_id,
                'items': order_data['items']
            })
            
            # Step 2: Process payment
            self.producer.publish_event('payments', 'payment.process', {
                'order_id': order_id,
                'amount': order_data['total']
            })
            
            # Step 3: Create shipment
            self.producer.publish_event('shipping', 'shipment.create', {
                'order_id': order_id,
                'address': order_data['shipping_address']
            })
            
            # Success
            self.producer.publish_event('orders', 'order.completed', {
                'order_id': order_id
            })
            
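        except Exception:
            # Only a failed publish lands here. A step that fails later inside
            # its own service must report back through a reply event.
            # Trigger compensating transactions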
            self.compensate_order(order_id)
            raise
    
    def compensate_order(self, order_id: str):
        """Execute compensating transactions to rollback."""
        # Reverse the operations
        self.producer.publish_event('shipping', 'shipment.cancel', {
            'order_id': order_id
        })
        
        self.producer.publish_event('payments', 'payment.refund', {
            'order_id': order_id
        })
        
        self.producer.publish_event('inventory', 'inventory.release', {
            'order_id': order_id
        })
        
        self.producer.publish_event('orders', 'order.failed', {
            'order_id': order_id
        })
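
One refinement worth sketching is compensating only the steps that actually completed, in reverse order. The example below assumes each step is a synchronous callable that raises on failure, which is a simplification of the event-based flow above; `run_saga` and the step functions are hypothetical.

```python
def run_saga(steps):
    """Run (name, action, compensation) steps; undo completed ones on failure."""
    completed = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            # Roll back in reverse order, touching only what succeeded.
            for _done_name, undo in reversed(completed):
                undo()
            return False
        completed.append((name, compensate))
    return True

log = []

def fail_payment():
    raise RuntimeError("card declined")

steps = [
    ("inventory", lambda: log.append("inventory.reserve"),
                  lambda: log.append("inventory.release")),
    ("payment",   fail_payment,
                  lambda: log.append("payment.refund")),
    ("shipping",  lambda: log.append("shipment.create"),
                  lambda: log.append("shipment.cancel")),
]

ok = run_saga(steps)
print(ok, log)
```

Because payment fails, only the inventory reservation is released. No refund or shipment cancellation is issued for steps that never ran.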

Dead Letter Queues

Handle failed messages that can't be processed.

class DeadLetterQueue:
    """
    Dead letter queue for failed messages.
    I use this to prevent message loss and enable debugging.
    """
    
    def __init__(self, rabbitmq_channel):
        self.channel = rabbitmq_channel
        self._setup_dlq()
    
    def _setup_dlq(self):
        """Setup dead letter exchange and queue."""
        # Declare DLQ exchange
        self.channel.exchange_declare(
            exchange='dlx',
            exchange_type='topic',
            durable=True
        )
        
        # Declare DLQ
        self.channel.queue_declare(
            queue='dead_letter_queue',
            durable=True
        )
        
        # Bind DLQ to DLX
        self.channel.queue_bind(
            exchange='dlx',
            queue='dead_letter_queue',
            routing_key='#'
        )
    
    def setup_queue_with_dlq(self, queue_name: str, max_retries: int = 3):
        """Setup queue with dead letter queue."""
        # Main queue with DLX configured
        self.channel.queue_declare(
            queue=queue_name,
            durable=True,
            arguments={
                'x-dead-letter-exchange': 'dlx',
                'x-dead-letter-routing-key': f'failed.{queue_name}',
                'x-max-length': 10000,  # Prevent queue overflow
            }
        )

# Setup (channel is an open pika channel, e.g. connection.channel())
dlq = DeadLetterQueue(channel)
dlq.setup_queue_with_dlq('email_queue')

# Consumer with retry logic
# send_email and producer.send_delayed_task are helpers not defined in this
# article; delayed delivery needs e.g. a TTL queue or a delayed-message plugin.
def process_with_retry(message: Dict, max_retries: int = 3):
    """Process message with retries, send to DLQ after max retries."""
    retry_count = message.get('retry_count', 0)
    
    try:
        # Process message
        send_email(message['to'], message['subject'], message['body'])
    except Exception:
        if retry_count < max_retries:
            # Retry with backoff
            delay = 2 ** retry_count  # Exponential backoff: 1, 2, 4
            message['retry_count'] = retry_count + 1
            
            # Re-publish with delay
            producer.send_delayed_task('email_queue', message, delay)
        else:
            # Max retries exceeded: re-raise so the consumer can reject the
            # message with requeue=False, which routes it to the DLX
            raise
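
The retry-then-dead-letter flow can be traced end to end without a broker. This sketch assumes plain lists for the main queue and the dead letter queue and records the backoff delays instead of sleeping; `attempt_delivery` and `always_fails` are hypothetical names.

```python
def attempt_delivery(message, handler, max_retries=3):
    """Try once; return ('done' | 'retry' | 'dead', backoff_delay)."""
    retry_count = message.get("retry_count", 0)
    try:
        handler(message)
        return "done", 0
    except Exception:
        if retry_count < max_retries:
            message["retry_count"] = retry_count + 1
            return "retry", 2 ** retry_count   # exponential backoff
        return "dead", 0

main_queue = [{"to": "a@example.com"}]
dead_letters = []
delays = []

def always_fails(message):
    raise RuntimeError("SMTP unavailable")

while main_queue:
    msg = main_queue.pop(0)
    outcome, delay = attempt_delivery(msg, always_fails)
    if outcome == "retry":
        delays.append(delay)       # a real system would wait this long
        main_queue.append(msg)
    elif outcome == "dead":
        dead_letters.append(msg)   # parked for inspection and replay

print(delays, dead_letters)
```

The message is attempted four times in total (one initial try plus three retries) before it is parked, and the recorded delays double each time.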

Real-World Architecture

Here's a production event-driven system I built:

┌─────────────┐
│   API       │ ──┐
│   Gateway   │   │
└─────────────┘   │
                  ▼
            ┌──────────┐
            │  Kafka   │
            │  Cluster │
            └──────────┘
                  │
        ┌─────────┼─────────┐
        ▼         ▼         ▼
    ┌──────┐  ┌──────┐  ┌──────┐
    │Email │  │Push  │  │SMS   │
    │Worker│  │Worker│  │Worker│
    └──────┘  └──────┘  └──────┘

Benefits achieved:

Handled 10,000+ events/second
99.9% delivery reliability
Zero downtime deployments
Easy to add new consumers
Full event audit trail

Lessons Learned

What worked:

Kafka for high-throughput event streaming
RabbitMQ for traditional queuing with complex routing
Always use dead letter queues
Monitor queue depth and consumer lag
Idempotent consumers (handle duplicate messages gracefully)

What didn't work:

Synchronous processing for everything
Not setting message TTL (queues filled up)
Auto-acknowledge before processing (lost messages on failures)
Shared queue for different priority tasks
Not monitoring consumer lag
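
Idempotent consumers are the lesson that most benefits from an example. A minimal sketch, assuming every message carries a unique `message_id` and that an in-memory set is an acceptable dedup store for illustration (a real system would persist it); `IdempotentConsumer` is a hypothetical name.

```python
class IdempotentConsumer:
    """Skip messages whose ID has already been processed."""

    def __init__(self, handler):
        self.handler = handler
        self.seen = set()   # stand-in for a persistent dedup store

    def handle(self, message: dict) -> bool:
        """Return True if the message was processed, False if it was a duplicate."""
        message_id = message["message_id"]
        if message_id in self.seen:
            return False
        self.handler(message)
        self.seen.add(message_id)   # record only after the handler succeeds
        return True

charges = []
consumer = IdempotentConsumer(lambda m: charges.append(m["order_id"]))

message = {"message_id": "msg-1", "order_id": "ord_123"}
results = [consumer.handle(message), consumer.handle(message)]  # redelivery
print(results, charges)
```

The second delivery is recognized and skipped, so the order is charged once even though the broker delivered the message twice.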

What's Next

With async processing covered, the next article explores API design patterns:

API Design: REST, GraphQL, and gRPC best practices

Last updated 1 month ago
