Event-Driven Architecture Basics

Introduction

When building my multi-tenant POS system with 6 microservices, I quickly learned that direct service-to-service HTTP calls created tight coupling. Every time inventory changed, I had to update the POS Core service manually. When a payment completed, I needed to notify multiple services. This tight coupling made changes risky and deployments complicated.

Event-driven architecture (EDA) removed that coupling. Instead of calling each other directly, services publish events when something important happens and subscribe to the events they care about. This lets services evolve independently while still keeping their data in sync across the system.

In this article, I'll share how I implemented event-driven patterns in my POS system, starting with a simple in-process event bus and evolving to handle multi-tenant scenarios. We'll use Python to build practical, production-ready event infrastructure.

Understanding Events in Distributed Systems

Events represent facts about things that have happened in your system. Unlike commands (which request an action), events are notifications about completed state changes.

Key characteristics of events:

  • Immutable: Events describe the past and cannot be changed

  • Named in past tense: OrderPlaced, PaymentCompleted, InventoryUpdated

  • Contain sufficient context: Include all data needed by subscribers

  • Fire-and-forget: Publishers don't wait for subscriber responses

In my POS system, I identified three categories of events:

  1. Domain Events: Business-meaningful changes (OrderPlaced, ProductOutOfStock)

  2. Integration Events: Cross-service notifications (InventoryStockChanged, PaymentProcessed)

  3. System Events: Infrastructure concerns (ServiceStarted, CacheInvalidated)

Event Types in the POS System

Let me show you the core events flowing through my 6-microservice architecture:

This event structure gives me everything I need:

  • Tenant isolation: Every event carries tenant_id

  • Traceability: Unique event_id and timestamp

  • Versioning: Support for event schema evolution

  • Type safety: Python dataclasses with validation

Implementing an In-Process Event Bus

For services that need internal event handling (before implementing a message broker), I built a simple in-process event bus:
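A bus along these lines can be sketched in a few dozen lines of asyncio. This is a simplified illustration under my own assumptions: the names EventBus, subscribe, use, and publish are hypothetical, and middleware is modeled as simple async hooks that run before the handlers.

```python
import asyncio
import logging
from collections import defaultdict
from typing import Any, Awaitable, Callable

logger = logging.getLogger("event_bus")

Handler = Callable[[Any], Awaitable[None]]
Middleware = Callable[[Any], Awaitable[None]]


class EventBus:
    """Dispatches events to async handlers keyed by the event's exact type."""

    def __init__(self) -> None:
        self._handlers: dict[type, list[Handler]] = defaultdict(list)
        self._middleware: list[Middleware] = []

    def subscribe(self, event_type: type, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def use(self, middleware: Middleware) -> None:
        """Register a hook that runs before the handlers (logging, metrics)."""
        self._middleware.append(middleware)

    async def publish(self, event: Any) -> int:
        """Run all handlers concurrently; return how many of them failed."""
        for hook in self._middleware:
            await hook(event)
        handlers = list(self._handlers.get(type(event), []))
        # return_exceptions=True keeps one failing handler from cancelling the rest.
        results = await asyncio.gather(
            *(handler(event) for handler in handlers), return_exceptions=True
        )
        failures = 0
        for handler, result in zip(handlers, results):
            if isinstance(result, Exception):
                failures += 1
                logger.error("handler %r failed: %r", handler, result)
        return failures
```

Matching on the exact type keeps the sketch short; a fuller version might also dispatch to handlers registered for a base class.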

This simple event bus gives me:

  • Async support: Handlers run concurrently

  • Error isolation: One handler failure doesn't affect others

  • Middleware: Cross-cutting concerns (logging, metrics)

  • Type safety: Handlers subscribe to specific event types

Publisher-Subscriber Pattern

Here's how I use the event bus in the Inventory Service:
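The sketch below shows the general shape, assuming an in-memory dictionary in place of the database and a tiny stand-in bus so the example runs on its own. The class and handler names (TinyBus, InventoryService, refresh_pos_cache, alert_on_low_stock) and the low-stock threshold are hypothetical.

```python
import asyncio
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryStockChanged:
    tenant_id: str
    product_id: str
    new_quantity: int


class TinyBus:
    """Stand-in for the event bus so this sketch is self-contained."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    async def publish(self, event):
        # A failing subscriber does not stop the others.
        await asyncio.gather(
            *(h(event) for h in self._handlers[type(event)]),
            return_exceptions=True,
        )


class InventoryService:
    def __init__(self, bus):
        self._bus = bus
        self._stock = {}  # in-memory stand-in for the database

    async def set_stock(self, tenant_id, product_id, quantity):
        if quantity < 0:
            raise ValueError("quantity must be non-negative")
        # 1. Persist first (a real service would commit a transaction here).
        self._stock[(tenant_id, product_id)] = quantity
        # 2. Publish only after the write succeeded.
        await self._bus.publish(
            InventoryStockChanged(tenant_id, product_id, quantity)
        )


pos_cache = {}
low_stock_alerts = []


async def refresh_pos_cache(event):
    pos_cache[(event.tenant_id, event.product_id)] = event.new_quantity


async def alert_on_low_stock(event):
    if event.new_quantity < 5:  # arbitrary threshold for the example
        low_stock_alerts.append(event.product_id)


async def main():
    bus = TinyBus()
    bus.subscribe(InventoryStockChanged, refresh_pos_cache)
    bus.subscribe(InventoryStockChanged, alert_on_low_stock)
    await InventoryService(bus).set_stock("tenant-a", "sku-1", 3)
```

Here the publisher awaits dispatch but never inspects what the subscribers return, which is what keeps the two sides decoupled.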

Key patterns I learned:

  • Publish after persistence: Only publish events after database commits succeed

  • Multiple subscribers: Different services handle the same event differently

  • Fire-and-forget: Publishers don't wait for subscriber responses

  • Idempotency: Handlers should handle duplicate events gracefully

Event Versioning for Multi-Tenancy

As my POS system evolved, I needed to change event schemas without breaking existing subscribers. Here's my versioning strategy:
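One common way to do this is upcasting: old payloads are upgraded one version at a time until they match the latest schema. The sketch below assumes events are stored as dictionaries with a version key, and that version 2 of OrderPlaced adds an optional currency field; both the field and its default are illustrative assumptions.

```python
from typing import Any, Callable

Upcaster = Callable[[dict[str, Any]], dict[str, Any]]


def _order_placed_v1_to_v2(payload: dict[str, Any]) -> dict[str, Any]:
    """v2 adds an optional currency field; old events get a default."""
    upgraded = dict(payload)
    upgraded.setdefault("currency", "USD")  # default value is an assumption
    upgraded["version"] = 2
    return upgraded


# (event type, from_version) -> function producing the next version
UPCASTERS: dict[tuple[str, int], Upcaster] = {
    ("OrderPlaced", 1): _order_placed_v1_to_v2,
}
LATEST_VERSION = {"OrderPlaced": 2}


def upcast(event_type: str, payload: dict[str, Any]) -> dict[str, Any]:
    """Apply upcasters step by step until the payload is at the latest version."""
    current = dict(payload)  # never mutate the stored event
    latest = LATEST_VERSION[event_type]
    while current.get("version", 1) < latest:
        step = UPCASTERS[(event_type, current.get("version", 1))]
        current = step(current)
    return current
```

Handlers then only need to understand the latest version, regardless of which version a given tenant's publisher emitted.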

This versioning approach lets me:

  • Add new fields without breaking existing code

  • Upcast old events to new schema for unified processing

  • Support multiple tenants with different event versions

  • Evolve schemas incrementally

Production Lessons Learned

Lesson 1: Event Ordering Matters

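I discovered ordering issues when processing rapid inventory updates. A sale reducing stock by 5 units followed immediately by a restock of 10 units could arrive out of order. When each event carries the resulting stock level, whichever event is applied last wins, so a late-arriving older event leaves the wrong quantity.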

Solution: Add sequence numbers to events:
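A minimal sketch of the idea, assuming the publisher assigns a per-product sequence number that increases with every change and that each event carries the resulting stock level. The names InventoryUpdated, StockView, sequence, and new_quantity are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryUpdated:
    tenant_id: str
    product_id: str
    sequence: int      # assumed: increases per product at the publisher
    new_quantity: int  # assumed: the stock level after the change


class StockView:
    """Keeps the latest stock level per product, ignoring stale events."""

    def __init__(self) -> None:
        self.quantity: dict[tuple[str, str], int] = {}
        self._last_seq: dict[tuple[str, str], int] = {}

    def apply(self, event: InventoryUpdated) -> bool:
        """Return True if the event was applied, False if it was stale."""
        key = (event.tenant_id, event.product_id)
        if event.sequence <= self._last_seq.get(key, 0):
            return False  # an equal or newer event was already applied
        self._last_seq[key] = event.sequence
        self.quantity[key] = event.new_quantity
        return True
```

If the restock (sequence 2) arrives before the sale (sequence 1), the late sale event is dropped and the stock level stays correct. Events that carry deltas instead of absolute levels would need a different approach, such as buffering until the gap is filled.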

Lesson 2: Idempotent Event Handlers

During a deployment, some events were published twice. Handlers that weren't idempotent caused duplicate notifications and incorrect calculations.

Solution: Track processed events:
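The sketch below illustrates the technique with a decorator that remembers which event IDs each handler has already processed. It assumes an in-memory set as the store; a real deployment would more likely use a database table with a unique constraint or a shared cache so the record survives restarts. The names idempotent, PaymentCompleted, and send_receipt are hypothetical.

```python
import asyncio
import functools
from dataclasses import dataclass


@dataclass(frozen=True)
class PaymentCompleted:
    event_id: str
    amount_cents: int


def idempotent(store: set):
    """Skip events whose (handler, event_id) pair was already processed."""

    def decorator(handler):
        @functools.wraps(handler)
        async def wrapper(event) -> bool:
            key = (handler.__name__, event.event_id)
            if key in store:
                return False  # duplicate delivery, nothing to do
            await handler(event)
            # Mark only after success so a failed attempt can be retried.
            store.add(key)
            return True

        return wrapper

    return decorator


async def demo():
    processed: set = set()
    receipts_sent: list = []

    @idempotent(processed)
    async def send_receipt(event):
        receipts_sent.append(event.event_id)

    event = PaymentCompleted(event_id="evt-1", amount_cents=1200)
    first = await send_receipt(event)
    second = await send_receipt(event)  # simulated duplicate delivery
    return first, second, receipts_sent


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Note that this check-then-mark approach is not safe against two concurrent deliveries of the same event; closing that gap needs an atomic insert in the backing store.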

Lesson 3: Event Monitoring

Without visibility into event flow, debugging production issues was painful. I added event telemetry:
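As a rough illustration of what such telemetry can capture, the sketch below wraps a handler call and records counts, failures, and durations per event type. It keeps everything in memory; exporting to a real metrics backend is left out, and the names EventMetrics and handle_with_metrics are hypothetical.

```python
import asyncio
import time
from collections import Counter, defaultdict


class EventMetrics:
    """In-memory counters; a real service would export these to a metrics backend."""

    def __init__(self) -> None:
        self.handled: Counter = Counter()
        self.failed: Counter = Counter()
        self.durations: dict[str, list[float]] = defaultdict(list)

    def record(self, event_name: str, seconds: float, ok: bool) -> None:
        self.handled[event_name] += 1
        if not ok:
            self.failed[event_name] += 1
        self.durations[event_name].append(seconds)


async def handle_with_metrics(metrics: EventMetrics, handler, event) -> None:
    """Run one handler, record its duration and outcome, and re-raise errors."""
    start = time.perf_counter()
    ok = True
    try:
        await handler(event)
    except Exception:
        ok = False
        raise
    finally:
        metrics.record(type(event).__name__, time.perf_counter() - start, ok)


async def demo() -> EventMetrics:
    class OrderPlaced:  # placeholder event type for the example
        pass

    async def update_dashboard(event):
        await asyncio.sleep(0)

    metrics = EventMetrics()
    await handle_with_metrics(metrics, update_dashboard, OrderPlaced())
    return metrics


if __name__ == "__main__":
    print(asyncio.run(demo()).handled)
```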

Best Practices

Based on my production experience with the POS system:

  1. Events are facts, not commands

    • Name events in past tense: OrderPlaced, not PlaceOrder

    • Events describe what happened, not what should happen

  2. Include sufficient context

    • Events should contain all data subscribers need

    • Avoid forcing subscribers to make additional queries

  3. Publish after persistence

    • Only publish events after database transactions commit

    • This prevents subscribers from acting on a change that was later rolled back

  4. Design for failure

    • Make handlers idempotent (can process same event multiple times safely)

    • Implement retry logic with exponential backoff

    • Use dead-letter queues for permanently failed events

  5. Version your events

    • Add version field to all events

    • Use optional fields for schema evolution

    • Implement upcasting for backward compatibility

  6. Monitor event flow

    • Track event publishing and consumption rates

    • Alert on event processing delays

    • Log event lifecycle for debugging

  7. Tenant isolation

    • Always include tenant_id in events

    • Validate tenant context in subscribers

    • Separate event streams per tenant if needed
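Practice 4 (retry with exponential backoff, then a dead-letter queue) can be sketched as follows. This is a simplified illustration: the dead-letter queue is a plain list, the function name deliver_with_retry and its parameters are hypothetical, and a production version would usually add jitter and distinguish retryable from permanent errors.

```python
import asyncio


async def deliver_with_retry(
    handler,
    event,
    dead_letters: list,
    max_attempts: int = 3,
    base_delay: float = 0.1,
) -> bool:
    """Try the handler up to max_attempts times; park the event if all fail."""
    for attempt in range(1, max_attempts + 1):
        try:
            await handler(event)
            return True
        except Exception as exc:
            if attempt == max_attempts:
                # Give up and keep the event with its last error for inspection.
                dead_letters.append((event, repr(exc)))
                return False
            # Exponential backoff: base_delay, 2x, 4x, ...
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    return False
```

Because a retried handler may run more than once for the same event, this only works safely together with idempotent handlers.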

Next Steps

In this article, we explored event-driven architecture basics using an in-process event bus. This pattern works well for:

  • Internal service communication

  • Decoupling components within a service

  • Small to medium traffic volumes

For production systems with higher scale, you'll want to introduce a message broker (RabbitMQ, Kafka, AWS SNS/SQS). The event patterns we've learned here translate directly to those systems.

In the next article, we'll explore Caching & Session Management, where events play a crucial role in cache invalidation strategies across our distributed POS system.


This is part of the Software Architecture 101 series, where I share lessons learned building a production multi-tenant POS system with 6 microservices.
