The circuit breaker pattern is a practical way to manage multi-step workflows in Ansible. It centralizes error handling, lets a playbook fail gracefully, and prevents one failure from cascading into the steps that depend on it. The pattern is especially valuable when orchestrating API integrations, multi-system configurations, or service provisioning workflows.
In this guide, I'll explain how to implement the circuit breaker pattern using a control variable that acts as a "kill switch" for your playbook execution when errors occur.
Understanding the Circuit Breaker Pattern
The circuit breaker pattern in software design is inspired by electrical circuit breakers: when something goes wrong, the breaker "trips" and stops the flow of execution. In Ansible, this translates to a boolean variable that controls whether the remaining tasks run.
Core Concepts
The continue_play Variable
The continue_play variable serves as the circuit breaker switch:
true: Circuit is CLOSED - tasks execute normally
false: Circuit is OPEN - subsequent tasks are skipped
The play_error_message Array
This array collects all errors that occur during playbook execution, providing comprehensive error reporting at the end.
Implementation Pattern
1. Initialization
Always initialize the circuit breaker variables at the start of your playbook:
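A minimal sketch of the initialization, assuming both variables are declared as play-level vars. The play name, host, and the debug task are placeholders for illustration:

```yaml
- name: Example workflow with a circuit breaker
  hosts: localhost
  gather_facts: false

  vars:
    continue_play: true       # circuit CLOSED: tasks are allowed to run
    play_error_message: []    # collects error strings for the final report

  tasks:
    - name: Show the initial circuit state
      debug:
        msg: "continue_play={{ continue_play }}, errors={{ play_error_message | length }}"
```

Because set_fact takes precedence over play vars, later tasks can flip continue_play to false. Extra vars passed with -e take precedence over set_fact, so these two names should not be supplied on the command line.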
2. Task Execution with Circuit Breaker Check
Each task should check the circuit breaker state before executing:
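As a sketch, a guarded task might look like the following. The URL, and the api_user and api_password variables, are assumptions for illustration:

```yaml
- name: Step 1 - Authenticate to API
  uri:
    url: "https://api.example.com/auth/login"
    method: POST
    body_format: json
    body:
      username: "{{ api_user }}"        # hypothetical variable
      password: "{{ api_password }}"    # hypothetical variable
    status_code: [200, 201]
  register: auth_result
  when: continue_play     # skipped once the circuit is open
  failed_when: false      # never abort here; a later task inspects auth_result
```

Setting failed_when: false keeps Ansible from stopping the play on a bad response, which leaves the decision to the validation task that follows.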
3. Error Detection and Circuit Breaking
After each critical task, validate the result and trip the circuit breaker if needed:
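A possible validation task is sketched below, assuming the previous task registered its result as auth_result. Treating a missing status as a failure (via default) is a choice made for this sketch, not something the pattern requires:

```yaml
- name: Trip the circuit breaker if authentication failed
  set_fact:
    continue_play: false
    play_error_message: "{{ play_error_message + ['Authentication failed with status ' + (auth_result.status | default('unknown') | string)] }}"
  when:
    - continue_play                                            # only the first failure trips the breaker
    - (auth_result.status | default(-1)) not in [200, 201]     # a missing status counts as a failure
```

The list concatenation appends to play_error_message rather than overwriting it, so messages recorded by earlier checks are preserved.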
4. Using Previous Results in Subsequent Tasks
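Tasks can safely reference results from earlier tasks. Ansible evaluates when: before it templates a task's arguments, so once the circuit is open the task is skipped and an expression such as auth_result.json.token is never rendered: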
5. Cleanup and Reporting
Always execute cleanup and reporting tasks WITHOUT the continue_play check:
Complete Example
Here's a complete example demonstrating the circuit breaker pattern for a user provisioning workflow:
Advanced Patterns
Multiple Error Collection Points
You can collect multiple validation errors before breaking the circuit:
Conditional Circuit Breaking
Not all errors should break the circuit. Some failures might be acceptable:
Partial Rollback with Circuit Breaker
Implement rollback logic when the circuit breaks:
Flow Diagrams
Basic Circuit Breaker Flow
Multi-Phase Circuit Breaker Flow
Use Cases
1. Prevents Cascading Failures
When authentication fails, there's no point in attempting to create resources that require authentication:
2. Comprehensive Error Collection
Multiple validation checks can fail, and all errors are collected for reporting:
3. External System Integration
When integrating with ticketing systems, monitoring platforms, or CMDB systems:
4. Ensures Cleanup Happens Even on Failure
Critical cleanup tasks should always run, regardless of circuit state:
Benefits of the Circuit Breaker Pattern
✅ Fail Fast: Stops execution immediately when critical errors occur, saving time and resources
✅ No Redundant Operations: Prevents wasting time on operations that will inevitably fail
✅ Better Debugging: All errors collected in a single array for comprehensive troubleshooting
✅ Graceful Degradation: Cleanup and reporting tasks always execute, maintaining system consistency
✅ External Integration: Provides detailed error logs for integration with monitoring and ticketing systems
✅ Prevents Partial State: Avoids creating half-configured resources that are difficult to clean up
✅ Audit Trail: Complete record of what succeeded and what failed for compliance and troubleshooting
✅ Reduces Complexity: Eliminates need for complex nested block/rescue statements throughout playbook
Alternative Approaches (Without Circuit Breaker)
Without the circuit breaker pattern, you would need:
Complex Error Handling per Task
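For comparison, a rough sketch of per-task handling without a shared control variable. The URLs are placeholders, and the task bodies are trimmed to the parts that matter here:

```yaml
- name: Authenticate to API
  uri:
    url: "https://api.example.com/auth/login"
    method: POST
  register: auth_result
  ignore_errors: true

- name: Create user account
  uri:
    url: "https://api.example.com/users"
    method: POST
  register: create_result
  ignore_errors: true
  when: auth_result is succeeded

- name: Assign default permissions
  uri:
    url: "https://api.example.com/users/{{ new_username }}/permissions"
    method: POST
  register: permission_result
  ignore_errors: true
  when:
    - auth_result is succeeded      # every later task repeats the earlier checks
    - create_result is succeeded
```

Each additional step has to repeat the conditions of every step before it, so the when: lists grow with the length of the workflow.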
Multiple Nested Block/Rescue Statements
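A sketch of the nested form for just two steps, again with placeholder URLs:

```yaml
- name: Authenticate, then create the user
  block:
    - name: Authenticate to API
      uri:
        url: "https://api.example.com/auth/login"
        method: POST
      register: auth_result

    - name: Create user (nested error handling)
      block:
        - name: Create user account
          uri:
            url: "https://api.example.com/users"
            method: POST
            headers:
              Authorization: "Bearer {{ auth_result.json.token }}"
          register: create_result
      rescue:
        - name: Record user creation failure
          debug:
            msg: "User creation failed"
  rescue:
    - name: Record authentication failure
      debug:
        msg: "Authentication failed"
  always:
    - name: Logout from API
      uri:
        url: "https://api.example.com/logout"
        method: POST
      failed_when: false
```

Every further step that needs its own error handling adds another level of nesting.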
This becomes difficult to read and maintain, especially with complex workflows.
Best Practices
Initialize Early: Always initialize continue_play and play_error_message at the start of your playbook
Be Consistent: Apply the when: continue_play check consistently to all critical tasks
Descriptive Error Messages: Make error messages clear and actionable
Separate Cleanup Logic: Clearly distinguish between tasks that check circuit state and those that always run
Test Failure Paths: Explicitly test scenarios where the circuit breaks to ensure proper error handling
Document Your Flow: Add comments explaining when and why the circuit should break
Use with External Systems: Integrate circuit breaker status with external monitoring and ticketing systems
Consider Idempotency: Ensure that partial executions don't leave systems in inconsistent states
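One way to exercise the failure path is to end a test run with an assertion on the circuit state. The sketch below assumes a hypothetical expect_failure flag that a test run sets, for example while pointing api_base_url at an endpoint known to reject the request. The flag is not part of the playbook described in this guide:

```yaml
- name: Verify that the circuit opened during a failure-path test
  assert:
    that:
      - not continue_play
      - play_error_message | length > 0
    fail_msg: "Expected an open circuit with at least one recorded error"
    success_msg: "Circuit breaker tripped and recorded {{ play_error_message | length }} error(s)"
  when: expect_failure | default(false) | bool   # hypothetical flag, set only in test runs
```

Placing this task after the reporting tasks means a normal run skips it, while a test run fails loudly if the breaker did not trip.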
Conclusion
The circuit breaker pattern is an elegant, maintainable solution for managing complex multi-step Ansible workflows. It provides:
Centralized error handling
Comprehensive error collection
Graceful failure management
Clean separation between business logic and error handling
Easy integration with external systems
By using a simple boolean variable and consistent conditional checks, you can build robust, production-ready automation that fails fast, reports comprehensively, and always cleans up properly.
# Example for "4. Using Previous Results in Subsequent Tasks"
- name: Step 2 - Create user account
  uri:
    url: "https://api.example.com/users"
    method: POST
    headers:
      Authorization: "Bearer {{ auth_result.json.token }}"
    body_format: json
    body:
      username: "{{ new_username }}"
      email: "{{ new_email }}"
  register: create_result
  when: continue_play  # Safe to use auth_result.json.token
# Example for "5. Cleanup and Reporting"
- name: Logout from API
  uri:
    url: "https://api.example.com/logout"
    method: POST
    headers:
      Authorization: "Bearer {{ auth_result.json.token }}"
  when: auth_result is defined and auth_result.json.token is defined
  # Note: No continue_play check - cleanup should always attempt

- name: Report final status
  debug:
    msg: |
      Status: {% if continue_play %}Success{% else %}Failed{% endif %}
      {% if not continue_play %}
      Errors:
      {% for error in play_error_message %}
        - {{ error }}
      {% endfor %}
      {% endif %}
# Complete example: user provisioning workflow
---
- name: User Provisioning with Circuit Breaker Pattern
  hosts: localhost
  gather_facts: no

  vars:
    continue_play: true
    play_error_message: []
    api_base_url: "https://api.example.com"
    new_username: "john.doe"
    new_email: "john.doe@example.com"

  tasks:
    # ===== STEP 1: VALIDATE INPUT =====
    - name: Validate required variables
      set_fact:
        continue_play: false
        play_error_message: "{{ play_error_message + ['Missing required variable: ' + item] }}"
      when:
        - continue_play
        - vars[item] is not defined or vars[item] | length == 0
      loop:
        - new_username
        - new_email
        - api_base_url
      loop_control:
        label: "{{ item }}"

    # ===== STEP 2: AUTHENTICATE =====
    - name: Authenticate to API
      uri:
        url: "{{ api_base_url }}/auth/login"
        method: POST
        body_format: json
        body:
          username: "{{ lookup('env', 'API_USER') }}"
          password: "{{ lookup('env', 'API_PASSWORD') }}"
        status_code: [200, 201]
      register: auth_result
      when: continue_play
      failed_when: false  # Don't fail immediately, let circuit breaker handle it

    - name: Validate authentication response
      set_fact:
        continue_play: false
        play_error_message: "{{ play_error_message + ['Authentication failed with status ' + (auth_result.status | string)] }}"
      when:
        - continue_play
        - auth_result.status is defined
        - auth_result.status not in [200, 201]

    # ===== STEP 3: CHECK IF USER EXISTS =====
    - name: Check if user already exists
      uri:
        url: "{{ api_base_url }}/users/{{ new_username }}"
        method: GET
        headers:
          Authorization: "Bearer {{ auth_result.json.token }}"
        status_code: [200, 404]
      register: user_check
      when: continue_play
      failed_when: false

    - name: Fail if user already exists
      set_fact:
        continue_play: false
        play_error_message: "{{ play_error_message + ['User ' + new_username + ' already exists'] }}"
      when:
        - continue_play
        - user_check.status == 200

    # ===== STEP 4: CREATE USER =====
    - name: Create new user account
      uri:
        url: "{{ api_base_url }}/users"
        method: POST
        headers:
          Authorization: "Bearer {{ auth_result.json.token }}"
        body_format: json
        body:
          username: "{{ new_username }}"
          email: "{{ new_email }}"
          active: true
        status_code: [200, 201]
      register: create_result
      when: continue_play
      failed_when: false

    - name: Validate user creation
      set_fact:
        continue_play: false
        play_error_message: "{{ play_error_message + ['User creation failed with status ' + (create_result.status | string)] }}"
      when:
        - continue_play
        - create_result.status is defined
        - create_result.status not in [200, 201]

    # ===== STEP 5: ASSIGN PERMISSIONS =====
    - name: Assign default permissions
      uri:
        url: "{{ api_base_url }}/users/{{ new_username }}/permissions"
        method: POST
        headers:
          Authorization: "Bearer {{ auth_result.json.token }}"
        body_format: json
        body:
          permissions:
            - read
            - write
        status_code: [200, 201]
      register: permission_result
      when: continue_play
      failed_when: false

    - name: Validate permission assignment
      set_fact:
        continue_play: false
        play_error_message: "{{ play_error_message + ['Permission assignment failed'] }}"
      when:
        - continue_play
        - permission_result.status is defined
        - permission_result.status not in [200, 201]

    # ===== CLEANUP: ALWAYS RUN =====
    - name: Logout from API
      uri:
        url: "{{ api_base_url }}/auth/logout"
        method: POST
        headers:
          Authorization: "Bearer {{ auth_result.json.token }}"
      when:
        - auth_result is defined
        - auth_result.json is defined
        - auth_result.json.token is defined
      failed_when: false
      # Note: No continue_play check - cleanup always attempts

    # ===== REPORTING: ALWAYS RUN =====
    - name: Set success status
      set_fact:
        final_status: "Success"
        final_message: "User {{ new_username }} created successfully with default permissions"
      when: continue_play

    - name: Set failure status
      set_fact:
        final_status: "Failed"
        final_message: "User provisioning failed"
      when: not continue_play

    - name: Display final report
      debug:
        msg: |
          =====================================
          User Provisioning Report
          =====================================
          Status: {{ final_status }}
          Message: {{ final_message }}
          {% if not continue_play %}
          Errors Encountered:
          {% for error in play_error_message %}
          {{ loop.index }}. {{ error }}
          {% endfor %}
          {% endif %}
          =====================================
# Example for "Multiple Error Collection Points"
- name: Validate all input parameters
  set_fact:
    validation_errors: []

- name: Check username format
  set_fact:
    validation_errors: "{{ validation_errors + ['Username must be alphanumeric'] }}"
  when:
    - new_username is defined
    - not (new_username | regex_search('^[a-zA-Z0-9]+$'))

- name: Check email format
  set_fact:
    validation_errors: "{{ validation_errors + ['Invalid email format'] }}"
  when:
    - new_email is defined
    - not (new_email | regex_search('^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'))

- name: Break circuit if validation failed
  set_fact:
    continue_play: false
    play_error_message: "{{ play_error_message + validation_errors }}"
  when: validation_errors | length > 0
# Example for "Conditional Circuit Breaking"
- name: Optional step - Send notification email
  uri:
    url: "{{ notification_api }}/send"
    method: POST
    body_format: json  # required so the dictionary body is sent as JSON
    body:
      to: "{{ admin_email }}"
      subject: "New user created"
  register: notify_result
  when: continue_play
  failed_when: false  # Don't break circuit on notification failure

- name: Log notification failure (but don't break circuit)
  debug:
    msg: "Warning: Failed to send notification email, but continuing..."
  when:
    - continue_play
    - notify_result.status is defined
    - notify_result.status != 200
# Example for "Partial Rollback with Circuit Breaker"
- name: Rollback - Delete created user
  uri:
    url: "{{ api_base_url }}/users/{{ new_username }}"
    method: DELETE
    headers:
      Authorization: "Bearer {{ auth_result.json.token }}"
  when:
    - not continue_play                     # Circuit is broken
    - create_result is defined              # User was created
    - create_result.status in [200, 201]    # Creation succeeded
  failed_when: false
# Example for use case "1. Prevents Cascading Failures"
# If authentication fails, circuit breaks
- name: Authenticate to system
  include_role:
    name: authenticate
  when: continue_play

# This won't run if authentication failed
- name: Create resources
  include_role:
    name: create_resources
  when: continue_play
# Example for use case "3. External System Integration"
# At the end of playbook
- name: Update ticket status to Completed
  uri:
    url: "{{ ticketing_system }}/api/tickets/{{ ticket_id }}"
    method: PUT
    body_format: json  # required so the dictionary body is sent as JSON
    body:
      status: "Completed"
      resolution_notes: "{{ final_message }}"
  when: continue_play

- name: Update ticket status to Failed
  uri:
    url: "{{ ticketing_system }}/api/tickets/{{ ticket_id }}"
    method: PUT
    body_format: json  # required so the dictionary body is sent as JSON
    body:
      status: "Failed"
      error_log: "{{ play_error_message | join('\n') }}"
  when: not continue_play
# Example for use case "4. Ensures Cleanup Happens Even on Failure"
# Logout runs WITHOUT continue_play check
- name: Logout from all systems
  include_tasks: logout.yml
  # No when: continue_play - always runs

# Status reporting always runs
- name: Report execution status
  include_tasks: report_status.yml
  # No when: continue_play - always runs

# But individual cleanup tasks can check if resources were created
- name: Disconnect from VPN
  include_tasks: vpn_disconnect.yml
  when: vpn_connected is defined and vpn_connected
  # Checks if VPN was actually connected, not circuit state