Upgrading AHV Firmware with Ansible Automation Platform

Last updated: March 4, 2026


Why Automate Firmware Upgrades?

Nutanix's Life Cycle Manager (LCM) does a good job of managing firmware and software upgrades through the Prism Central UI. You click through a wizard, LCM inventories available updates, you select what to apply, and it handles the node-by-node rolling upgrade process.

But there are real operational reasons to drive this through Ansible and AAP instead:

  • Consistent change window enforcement: AAP job templates enforce that upgrades only run during approved maintenance windows via schedules and approval gates

  • Audit trail: every upgrade run is logged in AAP with who triggered it, what the before/after versions were, and the full output

  • Pre/post gating: I want automated health checks before and after every firmware change, not manual ones I might skip at 11pm

  • Multi-cluster coordination: I manage three CE clusters. A single AAP workflow upgrades all three sequentially with health validation between stages

This article shows the playbooks and AAP configuration I use to automate AHV and node firmware upgrades on my CE clusters.


Nutanix Upgrade Concepts: LCM vs Manual

Life Cycle Manager (LCM)

LCM is Nutanix's built-in upgrade orchestration framework, accessible via the Prism Central UI (the LCM page) or its REST API at https://pc:9440/lcm/v1.r0/.

LCM handles:

Component        What LCM Can Upgrade
AOS              Nutanix operating system on CVMs
AHV              Hypervisor version on each node
Firmware         BIOS, BMC, HBA, NIC, SSD firmware
Prism Central    PC appliance software
NCC              Nutanix Cluster Check health framework
Foundation       Imaging tool on nodes

LCM operates by:

  1. Inventory: scans available updates from the Nutanix portal or a dark-site bundle

  2. Pre-upgrade checks: validates cluster health, version compatibility, and disk space

  3. Rolling upgrade: upgrades one node at a time; for AHV upgrades it live-migrates VMs off the node, upgrades it, then brings it back

Manual Upgrade (for reference)

Without LCM:
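As a sketch of the manual path, these are the CVM-side status and health commands involved, run over SSH as the nutanix user (verify command availability against the CLI reference for your AOS version):

```shell
# Run from any CVM over SSH as the nutanix user.
ncc health_checks run_all    # full NCC health check before touching anything
cluster status               # confirm every service is UP on every CVM
upgrade_status               # AOS upgrade progress, if one is in flight
host_upgrade_status          # AHV (host) upgrade progress per node
```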

Manual upgrades are rarely needed but useful to understand: LCM wraps similar calls internally.


Understanding AHV Firmware Upgrade Scope

When I say "AHV firmware upgrade" in this article, I mean both:

  1. AHV hypervisor version upgrade (e.g., 20230302.x -> 20240304.x): updates the KVM/QEMU layer version across nodes

  2. Node hardware firmware (BIOS, BMC/iDRAC, NIC, SSD): applied by LCM during a firmware update cycle

Both are managed through the LCM API via Prism Central. The Ansible playbooks in this article call the LCM REST API to:

  1. Run an LCM inventory (discover available updates)

  2. Create an LCM update plan for the target components

  3. Execute the plan

  4. Poll until completion

  5. Run NCC post-upgrade checks


Ansible Automation Platform Setup

I run AAP 2.4 on a small VM (4 vCPU, 8 GB RAM) inside my Nutanix CE cluster. Yes, it lives on the same cluster it manages, which is fine for CE but something to think carefully about for production.

AAP Version and Collections Required

No dedicated Nutanix collection exists for LCM; LCM operations use the standard Prism Central REST API via ansible.builtin.uri.

Execution Environment

I created a custom Execution Environment image with Nutanix-specific Python dependencies:
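A minimal definition sketch in the ansible-builder v3 format; the base image tag and the dependency pins are placeholders to adjust for your registry and AAP release:

```yaml
# execution-environment.yml -- illustrative; image tag and pins are placeholders
version: 3
images:
  base_image:
    name: registry.redhat.io/ansible-automation-platform-24/ee-minimal-rhel8:latest
dependencies:
  python:
    - requests>=2.28                   # HTTP calls to the Prism Central / LCM REST APIs
  system:
    - openssh-clients [platform:rpm]   # SSH to CVMs for NCC and cluster checks
```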

Build and push to your private registry, then register in AAP:
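A sketch of that step; the registry URL, image name, and tag are placeholders, and the definition filename is assumed to be execution-environment.yml:

```shell
# Build the EE image and push it to a private registry
ansible-builder build --tag registry.example.com/aap/nutanix-ee:1.0 \
  --file execution-environment.yml
podman push registry.example.com/aap/nutanix-ee:1.0
```

Then add the image under Administration -> Execution Environments in the AAP UI and select it on the job templates.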


Inventory and Credentials

Inventory
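A sketch of a static inventory; hostnames and IPs are placeholders. The playbooks mostly run on localhost and talk to Prism Central over HTTPS, so the cvms group exists for the SSH-based health checks:

```yaml
# inventory.yml -- illustrative; names and IPs are placeholders
all:
  children:
    prism_central:
      hosts:
        pc.lab.local:
          ansible_host: 192.168.1.50
    cvms:
      hosts:
        cvm-node1:
          ansible_host: 192.168.1.61
        cvm-node2:
          ansible_host: 192.168.1.62
        cvm-node3:
          ansible_host: 192.168.1.63
      vars:
        ansible_user: nutanix
```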

Credentials in AAP

Create two credentials in AAP (the first needs a custom credential type):

Prism Central API Credential (Custom type):
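A sketch of the custom credential type's configuration; the field names (pc_username, pc_password) are my own convention, so match them to whatever variables your playbooks expect. Input configuration:

```yaml
# Custom credential type -- input configuration
fields:
  - id: pc_username
    type: string
    label: Prism Central Username
  - id: pc_password
    type: string
    label: Prism Central Password
    secret: true
required:
  - pc_username
  - pc_password
```

Injector configuration, mapping the fields to extra vars:

```yaml
# Custom credential type -- injector configuration
extra_vars:
  pc_username: '{{ pc_username }}'
  pc_password: '{{ pc_password }}'
```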

CVM SSH Credential (Machine type): a standard Machine credential holding the nutanix user and an SSH private key for CVM access.


Pre-Upgrade Health Check Playbook

Run this playbook before any firmware or AHV upgrade. It fails immediately if the cluster is not in a healthy state, preventing upgrades on a degraded cluster.
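A condensed sketch of that gate, assuming the pc_host, pc_username, and pc_password variables from the custom credential and a cvms inventory group; the exact failure conditions are illustrative:

```yaml
---
# pre_upgrade_health_check.yml -- illustrative sketch
- name: Pre-upgrade cluster health gate
  hosts: localhost
  gather_facts: false
  vars:
    pc_url: "https://{{ pc_host }}:9440"
  tasks:
    - name: Query cluster state from the Prism Central v3 API
      ansible.builtin.uri:
        url: "{{ pc_url }}/api/nutanix/v3/clusters/list"
        method: POST
        user: "{{ pc_username }}"
        password: "{{ pc_password }}"
        force_basic_auth: true
        validate_certs: false        # CE lab with self-signed certs
        body_format: json
        body: { kind: cluster }
      register: clusters

    - name: Run NCC on a CVM and fail the job on any bad result
      ansible.builtin.command: ncc health_checks run_all
      delegate_to: "{{ groups['cvms'] | first }}"
      register: ncc_out
      failed_when: "'FAIL' in ncc_out.stdout or 'ERR' in ncc_out.stdout"
```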


Trigger LCM Firmware Upgrade via Prism Central API

The LCM update flow is:

  1. List entities (lcm/v1.r0/resources/entities/list): identify UUIDs and target versions for the components you want to upgrade

  2. POST /lcm/v1.r0/operations/update: create the update plan

  3. Poll the returned task UUID until completion
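The steps above can be sketched as a playbook. The variable names, the zip-based construction of the spec list, and the shape of the update response are assumptions; verify against the LCM API docs for your version:

```yaml
---
# lcm_trigger_update.yml -- illustrative sketch
- name: Trigger an LCM update through Prism Central
  hosts: localhost
  gather_facts: false
  vars:
    pc_url: "https://{{ pc_host }}:9440"
    target_uuids: []       # entity UUIDs picked from the entities/list output
    target_versions: []    # matching to_version strings, same order
  tasks:
    - name: List LCM entities and their available versions
      ansible.builtin.uri:
        url: "{{ pc_url }}/lcm/v1.r0/resources/entities/list"
        method: POST
        user: "{{ pc_username }}"
        password: "{{ pc_password }}"
        force_basic_auth: true
        validate_certs: false
        body_format: json
        body: {}
      register: lcm_entities

    - name: Build entity_update_spec_list by zipping UUIDs with versions
      ansible.builtin.set_fact:
        entity_update_spec_list: >-
          {{ dict(target_uuids | zip(target_versions))
             | dict2items(key_name='uuid', value_name='to_version') }}

    - name: Create and start the LCM update plan
      ansible.builtin.uri:
        url: "{{ pc_url }}/lcm/v1.r0/operations/update"
        method: POST
        user: "{{ pc_username }}"
        password: "{{ pc_password }}"
        force_basic_auth: true
        validate_certs: false
        body_format: json
        body:
          entity_update_spec_list: "{{ entity_update_spec_list }}"
        status_code: [200, 202]
      register: lcm_update
      # Task UUID location in the response varies by LCM version -- inspect
      # lcm_update.json once and capture it for the polling playbook.
```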

Note on the entity_update_spec_list format: The LCM API expects a list of objects with uuid and to_version. The approach above is illustrative; for production use, build the list explicitly with a set_fact loop over known entity UUIDs and target versions instead of relying on zip transforms.


Poll Upgrade Progress

LCM upgrades for AHV can take 30-90 minutes (node by node, with VMs live-migrated off each node). This playbook polls until the task completes.
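A sketch of the polling loop; lcm_task_uuid is assumed to have been captured from the trigger step, and the retry budget is sized generously:

```yaml
---
# lcm_poll_task.yml -- illustrative sketch
- name: Poll the LCM task until it finishes
  hosts: localhost
  gather_facts: false
  vars:
    pc_url: "https://{{ pc_host }}:9440"
  tasks:
    - name: Poll the v3 tasks API (not the lcm/v1.r0 endpoint)
      ansible.builtin.uri:
        url: "{{ pc_url }}/api/nutanix/v3/tasks/{{ lcm_task_uuid }}"
        method: GET
        user: "{{ pc_username }}"
        password: "{{ pc_password }}"
        force_basic_auth: true
        validate_certs: false
      register: task
      until: task.json.status in ['SUCCEEDED', 'FAILED']
      retries: 120          # 120 x 60s = up to 2 hours
      delay: 60
      failed_when: task.json.status == 'FAILED'
```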


Post-Upgrade Validation Playbook

After LCM completes, validate that:

  1. All CVMs are UP

  2. All node services are running

  3. NCC health check passes

  4. AHV version matches the expected target
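A condensed sketch covering all four checks; the expected_ahv_version variable, the cvms group, and the exact v3 field path for the hypervisor version are assumptions:

```yaml
---
# post_upgrade_validate.yml -- illustrative sketch
- name: Post-upgrade validation
  hosts: localhost
  gather_facts: false
  vars:
    pc_url: "https://{{ pc_host }}:9440"
  tasks:
    - name: Confirm every CVM service is UP (checks 1 and 2)
      ansible.builtin.command: cluster status
      delegate_to: "{{ groups['cvms'] | first }}"
      register: cstat
      failed_when: "'DOWN' in cstat.stdout"

    - name: Run NCC post-upgrade (check 3)
      ansible.builtin.command: ncc health_checks run_all
      delegate_to: "{{ groups['cvms'] | first }}"
      register: ncc_out
      failed_when: "'FAIL' in ncc_out.stdout"

    - name: Fetch the host list from the v3 API (check 4)
      ansible.builtin.uri:
        url: "{{ pc_url }}/api/nutanix/v3/hosts/list"
        method: POST
        user: "{{ pc_username }}"
        password: "{{ pc_password }}"
        force_basic_auth: true
        validate_certs: false
        body_format: json
        body: { kind: host }
      register: hosts

    - name: Assert every node reports the expected AHV version
      ansible.builtin.assert:
        that: >-
          hosts.json.entities
          | map(attribute='status.resources.hypervisor.hypervisor_full_name')
          | select('search', expected_ahv_version) | list | length
          == hosts.json.entities | length
```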


Full Pipeline: Job Template Chain in AAP

In AAP, chain the playbooks into a Workflow Job Template with approval gates:
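The shape of the workflow, sketched; the node names are my labels, and every node's failure path routes to a notification:

```
[Pre-Upgrade Health Check]
          | on success
[Approval gate]   <- human sign-off inside the change window
          |
[Trigger LCM Update]
          |
[Poll Upgrade Progress]
          |
[Post-Upgrade Validation]

any node failing -> [Failure notification]
```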

AAP Schedule for Maintenance Windows
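AAP schedules are expressed as iCal recurrence rules. A sketch for a weekly Saturday 22:00 window; the timezone, date, and day are placeholders:

```
DTSTART;TZID=America/New_York:20260307T220000 RRULE:FREQ=WEEKLY;INTERVAL=1;BYDAY=SA
```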

Notification on Failure


Rollback Considerations

Nutanix LCM does not have a one-click rollback for AHV or firmware upgrades. Here is what I do instead:

AHV Version Rollback

AHV upgrade is managed by LCM and is typically forward-only once applied. However:

  1. Nutanix Support can provide a downgrade path for AHV; contact them with your ticket before attempting it

  2. For CE, the fastest recovery from a bad AHV upgrade is to re-image the node using Foundation

Pre-Upgrade Snapshot of CVMs

Before triggering LCM, snapshot all CVM VMs as a precaution. CVM snapshots are rarely needed, and Nutanix recommends against them in production, but for CE this is acceptable insurance:
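A sketch of the snapshot step, assuming the CVMs are addressable by name through acli (VM names are placeholders, and the acli invocation details should be verified on your build; on some builds CVMs are hidden from acli, in which case skip this):

```yaml
---
# cvm_snapshot.yml -- illustrative; acli invocation details are assumptions
- name: Snapshot CVMs before LCM runs
  hosts: localhost
  gather_facts: false
  vars:
    cvm_vm_names: [cvm-node1, cvm-node2, cvm-node3]   # placeholders
  tasks:
    - name: Create a crash-consistent snapshot of each CVM
      ansible.builtin.command: >-
        acli vm.snapshot_create {{ item }}
        snapshot_name={{ item }}-pre-lcm-{{ now(fmt='%Y%m%d') }}
      delegate_to: "{{ groups['cvms'] | first }}"
      loop: "{{ cvm_vm_names }}"
```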

Firmware Rollback

For BMC/BIOS firmware changes:

  • Nutanix LCM stores the previous firmware version metadata in its internal DB

  • In some versions, LCM supports rollback for specific components; check lcm/v1.r0/resources/entities/list for rollback_version in entity metadata

  • Hardware vendor tools (e.g., iDRAC firmware rollback) remain an option if LCM rollback is not available


What I Learned the Hard Way

1. Never run LCM upgrades with degraded storage

I once triggered a firmware upgrade while a disk was showing Warning in Prism. LCM started but paused mid-way when the node it was upgrading couldn't confirm RF2 compliance. The cluster entered a partially upgraded state that took an hour to resolve. The pre-upgrade NCC check should catch this; do not skip it.

2. VM live migration must work before AHV upgrade

LCM drains each node before upgrading AHV by live-migrating all VMs off it. If your CPUs between nodes are different generations (e.g., mixed NUC 12 and NUC 13), live migration may fail for VMs without CPU masking. Test live migration manually before LCM runs:
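A quick manual test from a CVM; the VM and host names are placeholders:

```shell
# Pick a disposable test VM and migrate it to each of the other nodes in turn
acli host.list                        # list node names/UUIDs
acli vm.migrate testvm host=node-2    # failures here predict LCM drain failures
```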

Fix CPU compatibility issues (enable EVC/CPU masking) before triggering any AHV upgrade.

3. LCM task API vs v3 task API

LCM uses https://pc:9440/lcm/v1.r0/ for its own operations, but the task polling endpoint for LCM tasks uses the standard v3 tasks API at https://pc:9440/api/nutanix/v3/tasks/<uuid>. This inconsistency caught me off guard; I was polling the LCM endpoint for task status and getting 404 errors until I found this in the LCM API docs.

4. CE LCM requires internet access for inventory

CE clusters need outbound HTTPS to download.nutanix.com for LCM to discover available updates. In a fully air-gapped CE setup, you need to set up a dark-site LCM bundle. In AAP, make sure your execution environment can reach the Prism Central IP, and that PC can reach Nutanix download servers during inventory.


Next Steps


Tags: Nutanix, AHV, LCM, Life Cycle Manager, Ansible, Ansible Automation Platform, AAP, Firmware Upgrade, AOS Upgrade, Prism Central API, Community Edition
