Automating Nutanix Backup with Ansible

Last updated: March 4, 2026




Why I Automated Nutanix Backups with Ansible

Nutanix's built-in protection domain scheduler is functional — you set a cron-like schedule in Prism Element and snapshots happen automatically. But this approach has limitations that matter for my home-lab projects:

  • No pre/post hooks: I cannot quiesce a database or flush a write cache before the snapshot fires

  • No external notification: I want a webhook or log entry that tells me backups succeeded or failed

  • No conditional logic: I want to skip a backup if a particular service is actively running a migration

Ansible gives me full control: I write the backup flow as tasks, add pre/post database quiescing, log to a file, and run it on a schedule via cron or Ansible Automation Platform (AAP) if it's available.


Nutanix Backup and Snapshot Concepts

Before writing playbooks, it helps to understand the distinct layers of backup in Nutanix:

VM Snapshots (Local)

A VM snapshot in Nutanix is a point-in-time copy of a VM's vDisks, stored locally on the same NDFS cluster. Snapshots use redirect-on-write semantics: reads of unchanged data are served from the original blocks, while new writes go to new locations.

Local snapshots are:

  • Fast to create and restore

  • Limited by local cluster capacity

  • Not a substitute for off-site backup

Protection Domains (Prism Element)

A Protection Domain (PD) is an older but still widely used mechanism managed at the Prism Element level. A PD:

  • Contains one or more VMs (or consistency groups)

  • Has a local snapshot schedule (how often snapshots run, how many to keep)

  • Optionally has remote replication configured to replicate snapshots to a remote Nutanix cluster

Protection Policies and Nutanix DR (Prism Central)

The newer model, managed via Prism Central, uses:

  • Protection Policies — define RPO (Recovery Point Objective), retention count, and optional replication targets

  • Recovery Plans — define how applications fail over to a remote site

For my single-cluster home-lab, I work primarily with local VM snapshots and Protection Domains via the Prism Element API.


Installing the nutanix.ncp Ansible Collection

The official Nutanix Ansible collection is published on Ansible Galaxy:
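One way to record the dependency is a Galaxy requirements file. This is a minimal sketch; the filename requirements.yml is an assumption chosen for illustration, and no version is pinned.

```yaml
---
# requirements.yml: collections this project depends on
collections:
  - name: nutanix.ncp
```

Running ansible-galaxy collection install -r requirements.yml installs everything listed; for a one-off install, ansible-galaxy collection install nutanix.ncp has the same effect.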

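Verify the installation by running ansible-galaxy collection list nutanix.ncp, which prints the installed version and the path it was installed to.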

Key modules relevant to backup:

| Module | Purpose |
|---|---|
| nutanix.ncp.ntnx_vms_info | Query VM details (get UUID by name) |
| nutanix.ncp.ntnx_vm_snapshots | Create/delete VM snapshots (v4 API) |
| nutanix.ncp.ntnx_vm_snapshots_info | List existing snapshots |
| nutanix.ncp.ntnx_protection_rules | Manage PC protection rules |

The nutanix.ncp modules target the Prism Central (PC) API and do not cover Prism Element protection domains. For PE-level PD management, I use the uri module against the Prism Element REST API directly.


Inventory and Credentials Setup

Inventory File

The ansible_connection=local for the Prism Central entry is intentional — the nutanix.ncp modules make API calls from the control node, not via SSH to Prism Central.
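The ansible_connection=local syntax above is INI-style. For comparison, here is a sketch of the same idea in Ansible's YAML inventory format. All hostnames, group names, and the ansible_user value are hypothetical placeholders, not values from the original setup.

```yaml
---
# inventory.yml: hypothetical hosts, adjust to your lab
all:
  children:
    prism_central:
      hosts:
        pc.lab.example:
          # API calls are made from the control node, not over SSH
          ansible_connection: local
    database_vms:
      hosts:
        pg01.lab.example:
          ansible_user: ansible
```

Pass it with ansible-playbook -i inventory.yml; the database_vms group is only needed for playbooks that log in to guest VMs, such as the quiescing tasks later on.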

Vault-Encrypted Credentials File

Contents (vars/nutanix_credentials.yml):
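A plausible shape for this file is sketched below. Only nutanix_validate_certs appears elsewhere in this article; the other variable names and all values are assumptions for illustration.

```yaml
---
# vars/nutanix_credentials.yml (shown decrypted; encrypt before committing)
nutanix_host: "pc.lab.example"      # hypothetical Prism Central address
nutanix_username: "admin"           # placeholder account name
nutanix_password: "CHANGE_ME"       # placeholder, never commit a real value
nutanix_validate_certs: false       # CE ships a self-signed certificate
```

Encrypt an existing file with ansible-vault encrypt vars/nutanix_credentials.yml, or create it encrypted from the start with ansible-vault create vars/nutanix_credentials.yml.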

Reference in playbooks:
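A minimal sketch of loading the vaulted file through vars_files follows. The variable names nutanix_host and nutanix_password are assumed to be defined in that file; the assert task is only there to show the variables are available without printing them.

```yaml
---
- name: Example play that loads the vaulted credentials
  hosts: localhost
  connection: local
  gather_facts: false
  vars_files:
    - vars/nutanix_credentials.yml   # decrypted at runtime by Ansible Vault
  tasks:
    - name: Confirm the variables loaded without printing the secret
      ansible.builtin.assert:
        that:
          - nutanix_host is defined
          - nutanix_password is defined
        quiet: true
```

Run it with ansible-playbook followed by either --ask-vault-pass or --vault-password-file so Ansible can decrypt the file.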


Taking VM Snapshots with Ansible

The nutanix.ncp.ntnx_vm_snapshots module uses the v4 API, so it requires Prism Central 2023.x or later.

Playbook: Create VM Snapshot

Playbook: List Existing Snapshots


Working with Protection Domains via Prism Element API

For protection domain management (more control over schedules and retention count), I use the Prism Element REST API v2 via Ansible's uri module, since the nutanix.ncp collection targets Prism Central.

Playbook: List Protection Domains
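The sketch below lists protection domains with the uri module. It assumes the v2.0 endpoint path /PrismGateway/services/rest/v2.0/protection_domains/ and a response with an entities list whose items carry a name field; confirm both in the REST API Explorer on your cluster. The pe_host value is hypothetical, and the credential variable names are assumed to come from the vault file.

```yaml
---
- name: List protection domains on Prism Element
  hosts: localhost
  connection: local
  gather_facts: false
  vars_files:
    - vars/nutanix_credentials.yml
  vars:
    pe_host: "pe.lab.example"   # hypothetical Prism Element address
    pe_api: "https://{{ pe_host }}:9440/PrismGateway/services/rest/v2.0"
  tasks:
    - name: Fetch all protection domains
      ansible.builtin.uri:
        url: "{{ pe_api }}/protection_domains/"
        method: GET
        user: "{{ nutanix_username }}"
        password: "{{ nutanix_password }}"
        force_basic_auth: true
        validate_certs: "{{ nutanix_validate_certs }}"
      register: pd_list

    - name: Show the protection domain names
      ansible.builtin.debug:
        msg: "{{ pd_list.json.entities | map(attribute='name') | list }}"
```

Prism Element may use different credentials from Prism Central; if so, keep a separate pair of variables for it.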

Playbook: Trigger a Manual Snapshot of a Protection Domain
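A manual (out-of-band) snapshot can be requested by POSTing to the protection domain's oob_schedules resource. This is a sketch under assumptions: the endpoint and the body fields app_consistent and snapshot_retention_time_secs reflect my understanding of the v2.0 API and should be checked against your cluster's API Explorer. The names pe_host and pd_name are hypothetical.

```yaml
---
- name: Trigger a manual snapshot of a protection domain
  hosts: localhost
  connection: local
  gather_facts: false
  vars_files:
    - vars/nutanix_credentials.yml
  vars:
    pe_host: "pe.lab.example"   # hypothetical Prism Element address
    pd_name: "pd-homelab"       # hypothetical protection domain name
    pe_api: "https://{{ pe_host }}:9440/PrismGateway/services/rest/v2.0"
  tasks:
    - name: Request an out-of-band snapshot
      ansible.builtin.uri:
        url: "{{ pe_api }}/protection_domains/{{ pd_name }}/oob_schedules"
        method: POST
        user: "{{ nutanix_username }}"
        password: "{{ nutanix_password }}"
        force_basic_auth: true
        validate_certs: "{{ nutanix_validate_certs }}"
        body_format: json
        body:
          app_consistent: false
          snapshot_retention_time_secs: 604800   # 7 days
        status_code: [200, 201]
      register: oob_result

    - name: Inspect the raw response before relying on any field
      ansible.builtin.debug:
        var: oob_result.json
```

The request schedules the snapshot rather than waiting for it, so a follow-up check that the snapshot exists is worth adding before treating the backup as done.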

Playbook: Add a VM to a Protection Domain
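Adding a VM is sketched below as a POST to the protect_vms resource with a list of VM names. The endpoint and the names body field are assumptions based on the v2.0 API as I understand it; pe_host, pd_name, and vm_name are hypothetical values.

```yaml
---
- name: Add a VM to a protection domain
  hosts: localhost
  connection: local
  gather_facts: false
  vars_files:
    - vars/nutanix_credentials.yml
  vars:
    pe_host: "pe.lab.example"   # hypothetical Prism Element address
    pd_name: "pd-homelab"       # hypothetical protection domain name
    vm_name: "pg01"             # hypothetical VM name
    pe_api: "https://{{ pe_host }}:9440/PrismGateway/services/rest/v2.0"
  tasks:
    - name: Protect the VM
      ansible.builtin.uri:
        url: "{{ pe_api }}/protection_domains/{{ pd_name }}/protect_vms"
        method: POST
        user: "{{ nutanix_username }}"
        password: "{{ nutanix_password }}"
        force_basic_auth: true
        validate_certs: "{{ nutanix_validate_certs }}"
        body_format: json
        body:
          names:
            - "{{ vm_name }}"
        status_code: [200, 201]
      register: protect_result
```

The uri module always reports changed for a POST, so for idempotent runs it helps to list the domain's VMs first and skip this task when the VM is already protected.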


Application-Consistent Snapshots with Pre/Post Tasks

A crash-consistent snapshot is fine for stateless VMs. For database VMs, I want an application-consistent snapshot, which means:

  1. Tell the database to flush its write buffers to disk

  2. Take the snapshot

  3. Resume normal database operation

Nutanix Guest Tools (NGT) can request application-consistent snapshots, using VSS on Windows and pre-freeze/post-thaw scripts on Linux, but that requires NGT to be installed and configured inside each guest. For PostgreSQL on Linux, I handle quiescing manually in the playbook with a pre/post hook instead.
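The three steps above can be sketched with a block and an always section, so the thaw runs even if the snapshot call fails. This is one possible approach, not necessarily the original playbook: it uses CHECKPOINT plus fsfreeze, assumes the PostgreSQL data directory sits on its own filesystem (freezing the root filesystem would hang the SSH session), and reuses the assumed oob_schedules endpoint. All names and paths are hypothetical.

```yaml
---
- name: Application-consistent snapshot of a PostgreSQL VM
  hosts: database_vms
  become: true
  gather_facts: false
  vars_files:
    - vars/nutanix_credentials.yml
  vars:
    pe_host: "pe.lab.example"        # hypothetical Prism Element address
    pd_name: "pd-databases"          # hypothetical protection domain name
    pg_mount: "/var/lib/postgresql"  # assumed to be a dedicated mount point
    pe_api: "https://{{ pe_host }}:9440/PrismGateway/services/rest/v2.0"
  tasks:
    - name: Quiesce, snapshot, and always thaw
      block:
        - name: Flush dirty buffers to disk
          ansible.builtin.command: psql -c "CHECKPOINT;"
          become_user: postgres
          changed_when: false

        - name: Freeze the data filesystem
          ansible.builtin.command: fsfreeze --freeze {{ pg_mount }}
          register: freeze_result

        # The API call returns once the snapshot is scheduled; a real
        # playbook should poll until the snapshot exists before thawing.
        - name: Request the snapshot from the control node
          ansible.builtin.uri:
            url: "{{ pe_api }}/protection_domains/{{ pd_name }}/oob_schedules"
            method: POST
            user: "{{ nutanix_username }}"
            password: "{{ nutanix_password }}"
            force_basic_auth: true
            validate_certs: "{{ nutanix_validate_certs }}"
            body_format: json
            body:
              app_consistent: false
              snapshot_retention_time_secs: 604800
            status_code: [200, 201]
          delegate_to: localhost
          become: false
      always:
        - name: Thaw the data filesystem if it was frozen
          ansible.builtin.command: fsfreeze --unfreeze {{ pg_mount }}
          when: freeze_result is defined and freeze_result is succeeded
```

Keeping the freeze window short matters because every write to the frozen filesystem blocks until the thaw.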


Snapshot Retention and Cleanup

Without a cleanup policy, snapshots accumulate and consume NDFS capacity. My retention policy: keep the last 7 daily snapshots for each VM.
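The selection logic for a keep-last-7 policy can be tried without a cluster. The sketch below runs against hard-coded sample data; the id and created_usecs field names are hypothetical and will differ from what the Nutanix APIs return.

```yaml
---
- name: Work out which snapshots fall outside a keep-last-7 policy
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    keep_count: 7
    # Hypothetical sample data: nine snapshots, larger number = newer
    snapshots:
      - { id: "snap-01", created_usecs: 1 }
      - { id: "snap-02", created_usecs: 2 }
      - { id: "snap-03", created_usecs: 3 }
      - { id: "snap-04", created_usecs: 4 }
      - { id: "snap-05", created_usecs: 5 }
      - { id: "snap-06", created_usecs: 6 }
      - { id: "snap-07", created_usecs: 7 }
      - { id: "snap-08", created_usecs: 8 }
      - { id: "snap-09", created_usecs: 9 }
  tasks:
    - name: Sort newest first and skip the first keep_count entries
      ansible.builtin.set_fact:
        expired: "{{ (snapshots | sort(attribute='created_usecs', reverse=true))[keep_count:] }}"

    - name: Show what a cleanup task would delete
      ansible.builtin.debug:
        msg: "{{ expired | map(attribute='id') | list }}"
```

With this sample data the expired list holds the two oldest entries, snap-02 and snap-01. A real cleanup would loop over expired and issue one delete call per snapshot.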


Backup Verification Playbook

Creating a snapshot is not the same as verifying the data is recoverable. For critical VMs, I run a periodic verification by restoring a snapshot to a temporary VM and checking that the data is present.
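The data check at the end of such a verification could look like the sketch below. It covers only the check, not the restore itself, and every specific in it is hypothetical: the clone's address, the verify_user role, the appdb database, and the orders table. It also assumes psql is installed on the control node and that authentication is handled by a ~/.pgpass file.

```yaml
---
- name: Check data on a VM restored from a snapshot
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    verify_host: "192.0.2.50"   # hypothetical address of the temporary clone
  tasks:
    - name: Wait for PostgreSQL to accept connections on the clone
      ansible.builtin.wait_for:
        host: "{{ verify_host }}"
        port: 5432
        timeout: 300

    - name: Count rows in a table that must not be empty
      ansible.builtin.command: >-
        psql -h {{ verify_host }} -U verify_user -d appdb -At
        -c "SELECT count(*) FROM orders;"
      register: row_count
      changed_when: false

    - name: Fail the run if the table is empty
      ansible.builtin.assert:
        that:
          - row_count.stdout | int > 0
        fail_msg: "Restored database has no rows in the checked table"
```

A row count only proves the table is readable; comparing against a known marker row written shortly before the backup gives a stronger signal.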


Scheduling Playbooks with AAP or Cron

Option 1: Cron on the Control Node

For a home-lab with no AAP, a simple cron job works fine:
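The cron entry can itself be managed with Ansible's cron module, which keeps the schedule in the same repository as the playbooks. In this sketch the project directory, playbook filename, vault password file, and log path are all hypothetical.

```yaml
---
- name: Install the nightly snapshot cron entry on the control node
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Run the snapshot playbook every day at 02:00
      ansible.builtin.cron:
        name: "nutanix-daily-snapshot"
        minute: "0"
        hour: "2"
        job: >-
          cd $HOME/nutanix-backup &&
          ansible-playbook daily_snapshot.yml
          --vault-password-file $HOME/.vault_pass
          >> $HOME/nutanix-backup.log 2>&1
```

This produces a crontab line starting with 0 2 * * * for the user running the play. The vault password file should be readable only by that user.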

Option 2: Ansible Automation Platform Job Templates

If you have AAP available, configure:

  1. Project: Point to the Git repo containing your playbooks

  2. Credential: Vault credential (for the vault password) and a custom credential type for the Nutanix API

  3. Job Template per playbook: nutanix-daily-snapshot, nutanix-weekly-cleanup, nutanix-backup-verify

  4. Schedule: Set the recurrence within the job template (daily at 02:00, weekly cleanup, etc.)

This gives you full visibility into job run history, output logs, and failure notifications without managing cron manually.


What Works Well and What Does Not

What Works Well

  • nutanix.ncp collection is well-maintained: The collection modules are idiomatic Ansible — good changed state tracking, consistent return values

  • PE REST API is straightforward: For protection domain operations that lack a dedicated collection module, the raw REST API via uri is reliable and well-documented

  • Vault integration: Standard Ansible Vault keeps Prism Central credentials out of plaintext files

  • Pre/post snapshot hooks: The flexibility to quiesce applications before snapping is the main reason I chose Ansible over the built-in Nutanix scheduler

What Does Not Work Well (or Needs Workarounds)

  • The ntnx_vm_snapshots module requires Prism Central 2023.x+ (v4 API): Older PC versions need the Prism Element v2 API directly via uri

  • Finding the right JSON path for VM IPs after snapshot restore: The response structure from ntnx_vms after a clone-from-snapshot operation changed between collection versions — always inspect actual output with debug: var=result first

  • Snapshot restore to a new VM with networking: After cloning from snapshot, the VM comes up with the same network config as the original. If you're verifying on the same network segment, there will be a MAC/IP conflict unless you modify the clone's NIC config before powering it on

  • The nutanix_validate_certs: false requirement for CE: Community Edition ships with a self-signed cert. In production, invest time in proper certificate management so that disabling validation does not become a habit


Next Steps

This completes the Nutanix 101 series. For further reading:

Go back to Nutanix 101 series index.
