Event-Driven Architecture Basics
Introduction
When building my multi-tenant POS system with 6 microservices, I quickly learned that direct service-to-service HTTP calls created tight coupling. Every time inventory changed, I had to update the POS Core service manually. When a payment completed, I needed to notify multiple services. This tight coupling made changes risky and deployments complicated.
Event-driven architecture (EDA) changed everything. Instead of services calling each other directly, they publish events when something important happens and subscribe to events they care about. This decoupling allowed services to evolve independently while maintaining consistency across the system.
In this article, I'll share how I implemented event-driven patterns in my POS system, starting with a simple in-process event bus and evolving to handle multi-tenant scenarios. We'll use Python to build practical, production-ready event infrastructure.
Understanding Events in Distributed Systems
Events represent facts about things that have happened in your system. Unlike commands (which request an action), events are notifications about completed state changes.
Key characteristics of events:
Immutable: Events describe the past and cannot be changed
Named in past tense: OrderPlaced, PaymentCompleted, InventoryUpdated
Contain sufficient context: Include all data needed by subscribers
Fire-and-forget: Publishers don't wait for subscriber responses
In my POS system, I identified three categories of events:
Domain Events: Business-meaningful changes (OrderPlaced, ProductOutOfStock)
Integration Events: Cross-service notifications (InventoryStockChanged, PaymentProcessed)
System Events: Infrastructure concerns (ServiceStarted, CacheInvalidated)
Event Types in the POS System
Let me show you the core events flowing through my 6-microservice architecture:
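The original code listing was not preserved here, so the following is a minimal sketch of what such an event structure could look like, based on the characteristics described below. The class and field names (Event, OrderPlaced, InventoryStockChanged, occurred_at, total_cents) are illustrative, not the exact ones from the POS system.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)  # frozen: events are immutable facts about the past
class Event:
    tenant_id: str  # every event carries the tenant for isolation
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))  # traceability
    occurred_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    version: int = 1  # supports schema evolution

@dataclass(frozen=True)
class OrderPlaced(Event):
    order_id: str = ""
    total_cents: int = 0

@dataclass(frozen=True)
class InventoryStockChanged(Event):
    product_id: str = ""
    delta: int = 0
    new_quantity: int = 0
```

Because the dataclasses are frozen, any attempt to mutate a published event raises an error, which enforces the "events are immutable" rule at the language level.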
This event structure gives me everything I need:
Tenant isolation: Every event carries tenant_id
Traceability: Unique event_id and timestamp
Versioning: Support for event schema evolution
Type safety: Python dataclasses with validation
Implementing an In-Process Event Bus
For services that need internal event handling (before implementing a message broker), I built a simple in-process event bus:
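The event bus listing did not survive extraction, so here is a minimal sketch of an in-process bus with the four properties listed below (async handlers, error isolation, middleware, type-based subscription). The names EventBus, Handler, and Middleware are illustrative stand-ins for the production implementation.

```python
import asyncio
import logging
from collections import defaultdict
from typing import Awaitable, Callable, Type

logger = logging.getLogger(__name__)

Handler = Callable[[object], Awaitable[None]]
Middleware = Callable[[object], Awaitable[None]]

class EventBus:
    def __init__(self) -> None:
        self._handlers: dict = defaultdict(list)
        self._middleware: list = []

    def subscribe(self, event_type: Type, handler: Handler) -> None:
        """Register a handler for one specific event type."""
        self._handlers[event_type].append(handler)

    def add_middleware(self, middleware: Middleware) -> None:
        """Cross-cutting concerns (logging, metrics) run before handlers."""
        self._middleware.append(middleware)

    async def publish(self, event: object) -> None:
        for middleware in self._middleware:
            await middleware(event)
        handlers = self._handlers[type(event)]
        # Run handlers concurrently; return_exceptions=True means one
        # failing handler does not affect the others.
        results = await asyncio.gather(
            *(handler(event) for handler in handlers), return_exceptions=True
        )
        for result in results:
            if isinstance(result, Exception):
                logger.error("handler failed for %s: %s",
                             type(event).__name__, result)
```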
This simple event bus gives me:
Async support: Handlers run concurrently
Error isolation: One handler failure doesn't affect others
Middleware: Cross-cutting concerns (logging, metrics)
Type safety: Handlers subscribe to specific event types
Publisher-Subscriber Pattern
Here's how I use the event bus in the Inventory Service:
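The original Inventory Service listing is missing, so this is a simplified sketch of the publish-after-persistence flow with multiple subscribers. The repository, bus, and service classes here are illustrative stand-ins, not the production code.

```python
import asyncio
from dataclasses import dataclass

@dataclass(frozen=True)
class InventoryStockChanged:
    tenant_id: str
    product_id: str
    new_quantity: int

class SimpleBus:
    def __init__(self): self._subscribers = []
    def subscribe(self, handler): self._subscribers.append(handler)
    async def publish(self, event):
        await asyncio.gather(*(h(event) for h in self._subscribers),
                             return_exceptions=True)

class InMemoryStockRepo:
    def __init__(self): self._stock = {}
    async def adjust(self, tenant_id, product_id, delta):
        key = (tenant_id, product_id)
        self._stock[key] = self._stock.get(key, 0) + delta
        return self._stock[key]

class InventoryService:
    def __init__(self, repo, bus):
        self.repo, self.bus = repo, bus

    async def record_sale(self, tenant_id, product_id, quantity):
        # 1. Persist first -- publish only after the write succeeds
        new_qty = await self.repo.adjust(tenant_id, product_id, -quantity)
        # 2. Fire and forget: we don't wait on subscriber outcomes
        await self.bus.publish(
            InventoryStockChanged(tenant_id, product_id, new_qty))
        return new_qty

# Two subscribers handling the same event differently:
async def notify_pos_core(event):
    print(f"POS Core: {event.product_id} now at {event.new_quantity}")

async def check_reorder_level(event):
    if event.new_quantity < 5:
        print(f"Reorder needed for {event.product_id}")
```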
Key patterns I learned:
Publish after persistence: Only publish events after database commits succeed
Multiple subscribers: Different services handle the same event differently
Fire and forget: Publishers don't wait for subscriber responses
Idempotency: Handlers should handle duplicate events gracefully
Event Versioning for Multi-Tenancy
As my POS system evolved, I needed to change event schemas without breaking existing subscribers. Here's my versioning strategy:
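The versioning listing was lost in extraction; the sketch below shows one way to implement the approach described next: a version field on every event, new fields added as optionals with safe defaults, and an upcaster that lifts old events to the latest schema. Event and field names (ProductPriceChanged, currency) are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProductPriceChangedV1:
    version: int
    tenant_id: str
    product_id: str
    price: float

@dataclass(frozen=True)
class ProductPriceChangedV2:
    version: int
    tenant_id: str
    product_id: str
    price: float
    currency: str = "USD"  # new field is optional with a safe default

def upcast(event):
    """Lift a V1 event to the V2 schema so handlers see one shape."""
    if isinstance(event, ProductPriceChangedV2):
        return event
    return ProductPriceChangedV2(
        version=2,
        tenant_id=event.tenant_id,
        product_id=event.product_id,
        price=event.price,
        currency="USD",  # assumed default for tenants still emitting V1
    )
```

Subscribers then handle only the latest version; the upcaster runs once at the edge, so tenants can migrate to the new schema at their own pace.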
This versioning approach lets me:
Add new fields without breaking existing code
Upcast old events to new schema for unified processing
Support multiple tenants with different event versions
Evolve schemas incrementally
Production Lessons Learned
Lesson 1: Event Ordering Matters
I discovered ordering issues when processing rapid inventory updates. A sale reducing stock by 5 units followed immediately by a restock of 10 units could arrive out of order, causing incorrect stock levels.
Solution: Add sequence numbers to events:
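The code for this fix was not preserved, so here is a sketch of the idea: events carry a per-(tenant, product) sequence number, and a subscriber skips any event whose sequence is not newer than the last one applied. Skipping is safe here because each event carries the absolute new_quantity rather than a delta; the names are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StockEvent:
    tenant_id: str
    product_id: str
    sequence: int       # monotonically increasing per tenant+product
    new_quantity: int   # absolute snapshot, so stale events can be dropped

class OrderedStockProjection:
    """Apply events only in sequence order; stale arrivals are ignored."""
    def __init__(self):
        self._last_seq = {}
        self.quantities = {}

    def apply(self, event):
        key = (event.tenant_id, event.product_id)
        if event.sequence <= self._last_seq.get(key, 0):
            return False  # out-of-order or duplicate: skip safely
        self._last_seq[key] = event.sequence
        self.quantities[key] = event.new_quantity
        return True
```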
Lesson 2: Idempotent Event Handlers
During a deployment, some events were published twice. Handlers that weren't idempotent caused duplicate notifications and incorrect calculations.
Solution: Track processed events:
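The listing for this solution is missing, so here is a minimal sketch of the pattern: a wrapper that remembers processed event_ids and turns duplicate deliveries into no-ops. For brevity the set lives in memory; in production it would need to be a durable store (for example a database table keyed by event_id).

```python
class IdempotentHandler:
    """Wrap a handler so duplicate events are skipped."""
    def __init__(self, handler):
        self._handler = handler
        self._processed = set()  # in production: durable, per-handler store

    async def __call__(self, event):
        if event.event_id in self._processed:
            return  # duplicate delivery: no-op
        await self._handler(event)
        # Mark only after the handler succeeds, so a failed attempt
        # can be retried.
        self._processed.add(event.event_id)
```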
Lesson 3: Event Monitoring
Without visibility into event flow, debugging production issues was painful. I added event telemetry:
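The telemetry code did not survive extraction; below is a sketch of one way to do it with the middleware hook described earlier: a middleware that counts published events per type and logs each one. In a real deployment these counters would be exported to a metrics backend rather than kept in a dict.

```python
import logging
from collections import Counter

logger = logging.getLogger("events")

class EventTelemetry:
    """Bus middleware: count and log every published event."""
    def __init__(self):
        self.published = Counter()

    async def __call__(self, event):
        name = type(event).__name__
        self.published[name] += 1
        logger.info("published %s tenant=%s",
                    name, getattr(event, "tenant_id", "?"))
```

Registered once via something like bus.add_middleware(EventTelemetry()), this gives publish rates per event type and a searchable log line for every event's lifecycle.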
Best Practices
Based on my production experience with the POS system:
Events are facts, not commands
Name events in past tense: OrderPlaced, not PlaceOrder
Events describe what happened, not what should happen
Include sufficient context
Events should contain all data subscribers need
Avoid forcing subscribers to make additional queries
Publish after persistence
Only publish events after database transactions commit
Prevents inconsistencies if rollback occurs
Design for failure
Make handlers idempotent (can process same event multiple times safely)
Implement retry logic with exponential backoff
Use dead-letter queues for permanently failed events
Version your events
Add version field to all events
Use optional fields for schema evolution
Implement upcasting for backward compatibility
Monitor event flow
Track event publishing and consumption rates
Alert on event processing delays
Log event lifecycle for debugging
Tenant isolation
Always include tenant_id in events
Validate tenant context in subscribers
Separate event streams per tenant if needed
Next Steps
In this article, we explored event-driven architecture basics using an in-process event bus. This pattern works well for:
Internal service communication
Decoupling components within a service
Small to medium traffic volumes
For production systems with higher scale, you'll want to introduce a message broker (RabbitMQ, Kafka, AWS SNS/SQS). The event patterns we've learned here translate directly to those systems.
In the next article, we'll explore Caching & Session Management, where events play a crucial role in cache invalidation strategies across our distributed POS system.
This is part of the Software Architecture 101 series, where I share lessons learned building a production multi-tenant POS system with 6 microservices.