Circuit Breaker Pattern

Introduction

The circuit breaker pattern is a design pattern for managing complex multi-step workflows in Ansible. It centralizes error handling, lets a workflow fail gracefully, and prevents cascading failures. It is especially valuable when orchestrating complex API integrations, multi-system configurations, or service provisioning workflows.

In this guide, I'll explain how to implement the circuit breaker pattern using a control variable that acts as a "kill switch" for your playbook execution when errors occur.

Understanding the Circuit Breaker Pattern

The circuit breaker pattern in software design is inspired by electrical circuit breakers: when something goes wrong, the breaker "trips" and stops the flow of execution. In Ansible, this translates to using a boolean variable that controls whether tasks should continue executing.


Core Concepts

The continue_play Variable

The continue_play variable serves as the circuit breaker switch:

  • true: Circuit is CLOSED - tasks execute normally

  • false: Circuit is OPEN - subsequent tasks are skipped

The play_error_message Array

This array collects all errors that occur during playbook execution, providing comprehensive error reporting at the end.

Implementation Pattern

1. Initialization

Always initialize the circuit breaker variables at the start of your playbook:
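A minimal sketch of the initialization, assuming the state is kept as host facts set with ansible.builtin.set_fact (the play name and the localhost target are illustrative):

```yaml
---
- name: Workflow with a circuit breaker
  hosts: localhost
  gather_facts: false
  tasks:
    # First task of the play: start with the circuit CLOSED and no errors.
    - name: Initialize circuit breaker state
      ansible.builtin.set_fact:
        continue_play: true
        play_error_message: []
```

Facts set this way are stored per host, so in a multi-host play each host carries its own breaker.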

2. Task Execution with Circuit Breaker Check

Each task should check the circuit breaker state before executing:
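As a sketch, a guarded task might look like the fragment below (tasks placed under the tasks: section of a play). The endpoint and the api_user and api_password variables are hypothetical. The failed_when: false line is an assumption of this sketch: it keeps Ansible from aborting on the error so that a separate validation step can decide whether to trip the breaker.

```yaml
- name: Authenticate against the API
  ansible.builtin.uri:
    url: "https://api.example.com/auth"   # hypothetical endpoint
    method: POST
    body_format: json
    body:
      username: "{{ api_user }}"          # hypothetical variable
      password: "{{ api_password }}"      # hypothetical variable
  register: auth_result
  failed_when: false            # never abort here; the result is validated later
  when: continue_play | bool    # skipped entirely once the circuit is OPEN
```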

3. Error Detection and Circuit Breaking

After each critical task, validate the result and trip the circuit breaker if needed:
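One way to write the validation step, assuming continue_play and play_error_message are already initialized and that an earlier ansible.builtin.uri task registered its result as auth_result (a name chosen for this sketch):

```yaml
- name: Trip the breaker if authentication failed
  ansible.builtin.set_fact:
    continue_play: false    # open the circuit
    # Append a description of the failure to the collected errors.
    play_error_message: "{{ play_error_message + ['Authentication failed (status ' ~ (auth_result.status | default('unknown')) ~ ')'] }}"
  when:
    - continue_play | bool                     # only validate if the task actually ran
    - auth_result.status | default(-1) != 200  # anything other than 200 counts as a failure
```

Treating only HTTP 200 as success is an assumption; adjust the condition to whatever the real service returns.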

4. Using Previous Results in Subsequent Tasks

Tasks can safely use results from previous tasks because they won't execute if earlier tasks failed:
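This works because Ansible evaluates a task's when condition before it templates the task's arguments, so a skipped task never touches a result that may be missing. A sketch, assuming a hypothetical /users endpoint, a new_user_name variable, and a token returned in auth_result.json.token:

```yaml
- name: Create the user with the token from the authentication step
  ansible.builtin.uri:
    url: "https://api.example.com/users"   # hypothetical endpoint
    method: POST
    headers:
      # Only templated when the task runs, i.e. when authentication succeeded.
      Authorization: "Bearer {{ auth_result.json.token }}"
    body_format: json
    body:
      name: "{{ new_user_name }}"          # hypothetical variable
  register: create_result
  failed_when: false
  when: continue_play | bool
```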

5. Cleanup and Reporting

Always execute cleanup and reporting tasks WITHOUT the continue_play check:
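A sketch of the closing tasks, assuming the two circuit breaker variables were initialized earlier. Failing the play explicitly at the very end is a design choice of this sketch rather than a requirement of the pattern; it makes the run report a non-zero result after everything else has finished.

```yaml
# No "when: continue_play" here: these tasks run whatever the circuit state is.
- name: Report all collected errors
  ansible.builtin.debug:
    msg: "{{ play_error_message }}"
  when: play_error_message | length > 0

- name: Mark the run as failed if the breaker tripped
  ansible.builtin.fail:
    msg: "Workflow stopped: {{ play_error_message | join('; ') }}"
  when: not (continue_play | bool)
```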

Complete Example

Here's a complete example demonstrating the circuit breaker pattern for a user provisioning workflow:
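The playbook below is a sketch under stated assumptions, not a reference implementation: api_base_url, the /auth and /users endpoints, the credentials variables, and the token field in the response are all hypothetical stand-ins for a real provisioning API.

```yaml
---
- name: Provision a user with a circuit breaker
  hosts: localhost
  gather_facts: false
  vars:
    api_base_url: "https://api.example.com"   # hypothetical service
    new_user_name: "jdoe"                     # hypothetical input
  tasks:
    # 1. Initialization
    - name: Initialize circuit breaker state
      ansible.builtin.set_fact:
        continue_play: true
        play_error_message: []

    # 2. Guarded task
    - name: Authenticate
      ansible.builtin.uri:
        url: "{{ api_base_url }}/auth"
        method: POST
        body_format: json
        body:
          username: "{{ api_user | default('svc-ansible') }}"
          password: "{{ api_password | default('') }}"
      register: auth_result
      failed_when: false
      when: continue_play | bool

    # 3. Error detection
    - name: Trip the breaker if authentication failed
      ansible.builtin.set_fact:
        continue_play: false
        play_error_message: "{{ play_error_message + ['Authentication failed (status ' ~ (auth_result.status | default('unknown')) ~ ')'] }}"
      when:
        - continue_play | bool
        - auth_result.status | default(-1) != 200

    # 4. Guarded task that depends on the previous result
    - name: Create the user
      ansible.builtin.uri:
        url: "{{ api_base_url }}/users"
        method: POST
        headers:
          Authorization: "Bearer {{ auth_result.json.token }}"
        body_format: json
        body:
          name: "{{ new_user_name }}"
      register: create_result
      failed_when: false
      when: continue_play | bool

    - name: Trip the breaker if user creation failed
      ansible.builtin.set_fact:
        continue_play: false
        play_error_message: "{{ play_error_message + ['User creation failed (status ' ~ (create_result.status | default('unknown')) ~ ')'] }}"
      when:
        - continue_play | bool
        - create_result.status | default(-1) not in [200, 201]

    # 5. Reporting: no continue_play guard, so these always run
    - name: Report the outcome
      ansible.builtin.debug:
        msg: "{{ 'Provisioning succeeded' if (continue_play | bool) else play_error_message }}"

    - name: Mark the run as failed if the breaker tripped
      ansible.builtin.fail:
        msg: "{{ play_error_message | join('; ') }}"
      when: not (continue_play | bool)
```

Run against the placeholder host, the authentication call cannot succeed, so the breaker trips at step 3, the user creation step is skipped, and the reporting tasks still run.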

Advanced Patterns

Multiple Error Collection Points

You can collect multiple validation errors before breaking the circuit:
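A sketch of this idea, assuming hypothetical user_name and user_email inputs: each check only appends to play_error_message, and a single task at the end opens the circuit if anything was collected.

```yaml
- name: Collect an error if the user name is missing
  ansible.builtin.set_fact:
    play_error_message: "{{ play_error_message + ['user_name is required'] }}"
  when: user_name | default('') | length == 0

- name: Collect an error if the email address has no @ sign
  ansible.builtin.set_fact:
    play_error_message: "{{ play_error_message + ['user_email is not a valid address'] }}"
  when: "'@' not in (user_email | default(''))"

# Only now decide whether to open the circuit.
- name: Trip the breaker if any validation failed
  ansible.builtin.set_fact:
    continue_play: false
  when: play_error_message | length > 0
```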

Conditional Circuit Breaking

Not all errors should break the circuit. Some failures might be acceptable:
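For instance, a failed notification could be recorded without stopping the workflow. In this sketch the chat webhook URL and the play_warning_message list are hypothetical; the warning list is not part of the pattern described above, just one way to keep non-critical problems separate from the errors that trip the breaker.

```yaml
- name: Send an optional start notification
  ansible.builtin.uri:
    url: "https://chat.example.com/hook"   # hypothetical webhook
    method: POST
    body_format: json
    body:
      text: "Provisioning started"
  register: notify_result
  failed_when: false
  when: continue_play | bool

- name: Record the failure but leave the circuit closed
  ansible.builtin.set_fact:
    # continue_play is deliberately not changed here.
    play_warning_message: "{{ play_warning_message | default([]) + ['Start notification could not be sent'] }}"
  when:
    - continue_play | bool
    - notify_result.status | default(-1) != 200
```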

Partial Rollback with Circuit Breaker

Implement rollback logic when the circuit breaks:
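A rollback task is guarded by the opposite condition: it runs only when the circuit is open and there is something to undo. The sketch below assumes a hypothetical API in which an earlier task registered create_result with the new user's ID in create_result.json.id, and in which a DELETE request removes that user.

```yaml
- name: Roll back the user created before the failure
  ansible.builtin.uri:
    url: "{{ api_base_url }}/users/{{ create_result.json.id }}"   # hypothetical endpoint
    method: DELETE
    headers:
      Authorization: "Bearer {{ auth_result.json.token }}"
  register: rollback_result
  failed_when: false
  when:
    - not (continue_play | bool)           # the circuit has tripped
    - create_result.json.id is defined     # and the user was actually created
```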

Flow Diagrams

Basic Circuit Breaker Flow

[Diagram: basic circuit breaker flow]

Multi-Phase Circuit Breaker Flow

[Diagram: multi-phase circuit breaker flow]

Use Cases

1. Prevents Cascading Failures

When authentication fails, there's no point in attempting to create resources that require authentication:
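When several dependent tasks share the same guard, the condition can be written once on a block. Ansible applies a block-level when to every task inside it and evaluates it as each task starts, so tripping the breaker partway through skips the rest. A sketch with hypothetical endpoints:

```yaml
- name: Steps that all need a valid session
  when: continue_play | bool    # inherited and re-evaluated by every task below
  block:
    - name: Authenticate
      ansible.builtin.uri:
        url: "https://api.example.com/auth"        # hypothetical endpoint
        method: POST
      register: auth_result
      failed_when: false

    - name: Trip the breaker if authentication failed
      ansible.builtin.set_fact:
        continue_play: false
        play_error_message: "{{ play_error_message + ['Authentication failed'] }}"
      when: auth_result.status | default(-1) != 200

    - name: Create a resource (skipped once the breaker is open)
      ansible.builtin.uri:
        url: "https://api.example.com/resources"   # hypothetical endpoint
        method: POST
        headers:
          Authorization: "Bearer {{ auth_result.json.token }}"
      register: resource_result
      failed_when: false
```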

2. Comprehensive Error Collection

Multiple validation checks can fail, and all errors are collected for reporting:
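A compact variant uses a loop to check a list of required inputs in one task. The variable names in the loop are hypothetical, and the check relies on the ansible.builtin.vars lookup with an empty-string default to detect unset values.

```yaml
- name: Collect one error per missing required variable
  ansible.builtin.set_fact:
    play_error_message: "{{ play_error_message + [item ~ ' is required but not set'] }}"
  loop:
    - user_name         # hypothetical required inputs
    - user_email
    - user_department
  when: "lookup('ansible.builtin.vars', item, default='') == ''"

- name: Trip the breaker if anything is missing
  ansible.builtin.set_fact:
    continue_play: false
  when: play_error_message | length > 0
```

With all three variables unset, the final report would list three separate messages instead of stopping at the first.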

3. External System Integration

When integrating with ticketing systems, monitoring platforms, or CMDB systems, the collected errors can be forwarded once the circuit has tripped:
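As an illustration, the task below posts the error list to a ticketing API. The URL and the payload fields are hypothetical; a real ticketing system will define its own schema and authentication.

```yaml
- name: Open an incident when the breaker has tripped
  ansible.builtin.uri:
    url: "https://tickets.example.com/api/incidents"   # hypothetical endpoint
    method: POST
    body_format: json
    body:
      summary: "Ansible workflow failed on {{ inventory_hostname }}"
      details: "{{ play_error_message }}"
  register: ticket_result
  failed_when: false      # a ticketing outage should not hide the original errors
  when: not (continue_play | bool)
```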

4. Ensures Cleanup Happens Even on Failure

Critical cleanup tasks should always run, regardless of circuit state:
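A sketch of two such tasks. Neither checks continue_play, but each may still be conditional on there being something to clean up. The logout endpoint, the token field, and the work_dir variable are hypothetical.

```yaml
- name: Close the API session if one was opened
  ansible.builtin.uri:
    url: "https://api.example.com/logout"   # hypothetical endpoint
    method: POST
    headers:
      Authorization: "Bearer {{ auth_result.json.token }}"
  failed_when: false
  when: auth_result.json.token is defined   # depends on the session, not on the breaker

- name: Remove the temporary working directory
  ansible.builtin.file:
    path: "{{ work_dir }}"                  # hypothetical variable
    state: absent
  when: work_dir is defined
```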

Benefits of the Circuit Breaker Pattern

Fail Fast: Skips all remaining guarded tasks as soon as a critical error occurs, saving time and resources

No Redundant Operations: Prevents wasting time on operations that will inevitably fail

Better Debugging: All errors collected in a single array for comprehensive troubleshooting

Graceful Degradation: Cleanup and reporting tasks always execute, maintaining system consistency

External Integration: Provides detailed error logs for integration with monitoring and ticketing systems

Prevents Partial State: Avoids creating half-configured resources that are difficult to clean up

Audit Trail: Complete record of what succeeded and what failed for compliance and troubleshooting

Reduces Complexity: Eliminates the need for complex nested block/rescue statements throughout the playbook

Alternative Approaches (Without Circuit Breaker)

Without the circuit breaker pattern, you would need:

Complex Error Handling per Task
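For comparison, a sketch of per-task handling with hypothetical endpoints. Each task must name every earlier result it depends on, and because a skipped result does not count as failed, it also has to rule out skips explicitly.

```yaml
- name: Authenticate
  ansible.builtin.uri:
    url: "https://api.example.com/auth"
    method: POST
  register: auth_result
  ignore_errors: true

- name: Create the user
  ansible.builtin.uri:
    url: "https://api.example.com/users"
    method: POST
  register: create_result
  ignore_errors: true
  when: auth_result is succeeded

- name: Add the user to a group
  ansible.builtin.uri:
    url: "https://api.example.com/groups/members"
    method: POST
  ignore_errors: true
  when:
    - auth_result is succeeded
    - create_result is not skipped    # a skipped task still passes "is succeeded"
    - create_result is succeeded
```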

Multiple Nested Block/Rescue Statements
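The same two-step workflow written with nested block/rescue, again as a sketch with hypothetical endpoints; every additional step adds another level of nesting.

```yaml
- name: Authentication phase
  block:
    - name: Authenticate
      ansible.builtin.uri:
        url: "https://api.example.com/auth"
        method: POST
      register: auth_result

    - name: User creation phase
      block:
        - name: Create the user
          ansible.builtin.uri:
            url: "https://api.example.com/users"
            method: POST
          register: create_result
      rescue:
        - name: Handle the user creation failure
          ansible.builtin.debug:
            msg: "User creation failed"
  rescue:
    - name: Handle the authentication failure
      ansible.builtin.debug:
        msg: "Authentication failed"
```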

This becomes difficult to read and maintain, especially with complex workflows.

Best Practices

  1. Initialize Early: Always initialize continue_play and play_error_message at the start of your playbook

  2. Be Consistent: Apply the when: continue_play check consistently to all critical tasks

  3. Descriptive Error Messages: Make error messages clear and actionable

  4. Separate Cleanup Logic: Clearly distinguish between tasks that check circuit state and those that always run

  5. Test Failure Paths: Explicitly test scenarios where the circuit breaks to ensure proper error handling

  6. Document Your Flow: Add comments explaining when and why the circuit should break

  7. Use with External Systems: Integrate circuit breaker status with external monitoring and ticketing systems

  8. Consider Idempotency: Ensure that partial executions don't leave systems in inconsistent states

Conclusion

The circuit breaker pattern is an elegant, maintainable solution for managing complex multi-step Ansible workflows. It provides:

  • Centralized error handling

  • Comprehensive error collection

  • Graceful failure management

  • Clean separation between business logic and error handling

  • Easy integration with external systems

By using a simple boolean variable and consistent conditional checks, you can build robust, production-ready automation that fails fast, reports comprehensively, and always cleans up properly.
