When a Terraform apply introduces a breaking change, such as a misconfigured subnet that takes down a critical load balancer, quickly reverting the infrastructure can be the difference between uptime and an incident. Some teams now combine Pulumi and Ansible into an automated rollback pipeline that detects a failed apply and restores the last known-good state without manual intervention. This guide walks you through setting up a rollback system that works across public clouds and hybrid environments.
Why a Dedicated Rollback Strategy Matters
Traditional Terraform workflows treat the state file as the source of truth but provide no built-in, automated recovery path. If a terraform apply fails partway through, the resources created before the failure stay in place and in state, leaving a partially applied configuration that must be cleaned up manually. In high-availability services, even a few minutes of downtime can translate into significant revenue loss.
A dedicated rollback strategy offers several advantages:
- Speed: Automates the revert, so recovery takes minutes of pipeline time rather than hours of manual cleanup.
- Safety: Uses proven IaC patterns (Pulumi state snapshots, Ansible idempotence) to ensure the environment returns to a consistent baseline.
- Auditability: Records rollback actions in logs and version control, making post‑mortem analysis easier.
- Scalability: Works across multi‑region deployments, Kubernetes clusters, and on‑prem hardware.
Architecture Overview
Our rollback pipeline is composed of three main components:
- Pre‑apply Hook (Pulumi): Captures a full state snapshot before the Terraform apply starts.
- Apply Monitor (Ansible): Wraps the Terraform command and watches for failures.
- Rollback Executor (Pulumi): If a failure is detected, this step restores the last snapshot and re‑applies the Terraform configuration to re‑establish the environment.
The flow looks like this:
    ┌──────────────────────┐
    │ Pulumi Pre-apply     │
    │ Snapshot Creation    │
    └──────────┬───────────┘
               │
    ┌──────────▼───────────┐
    │ Ansible Playbook     │
    │ - Runs 'terraform    │
    │   apply'             │
    │ - Monitors exit code │
    └──────────┬───────────┘
               │
    ┌──────────▼───────────┐
    │ Pulumi Rollback      │
    │ (if failure detected)│
    └──────────────────────┘
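The control flow in the diagram can be sketched in a few lines of Node.js. This is only an illustration of the ordering, not the pipeline itself: `runWithRollback` and its three callbacks are hypothetical names, and in a real setup the callbacks would shell out to Pulumi, Ansible, and Terraform.

```javascript
// Minimal sketch of the snapshot -> apply -> rollback ordering.
// All names here are illustrative assumptions, not part of any tool's API.
function runWithRollback({ snapshot, apply, rollback }) {
  // 1. Capture the known-good state before anything changes.
  const snapshotId = snapshot();
  try {
    // 2. Attempt the change.
    apply();
    return { status: 'applied', snapshotId };
  } catch (err) {
    // 3. On failure, restore from the snapshot taken in step 1.
    rollback(snapshotId);
    return { status: 'rolled-back', snapshotId, error: err.message };
  }
}
```

The important property is that the snapshot is taken unconditionally before the apply, while the rollback runs only on the failure path.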
Step 1: Prepare Your Pulumi Project for State Snapshots
1.1 Configure Pulumi Stack Snapshots
In Pulumi, a stack is an isolated instance of a program, typically one per deployment target. Pulumi does not provide a snapshot resource type that can be declared in Pulumi.yaml, so the project file needs only the usual metadata:

    name: infra-rollback
    runtime: nodejs
    description: "Infrastructure with automated rollback"

To capture the state before a Terraform run, export the stack's checkpoint with pulumi stack export --stack <stack-name> --file state/snapshots/<stack-name>.json. The export contains the full recorded state of every resource the stack manages, and only those resources. Because it can include sensitive configuration, keep the snapshot directory in a secure, encrypted location such as an S3 bucket, and commit it to your CI repository only if that repository is access-controlled.
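One way to take the pre-apply snapshot from a script is to wrap the `pulumi stack export` command. The sketch below assumes the Pulumi CLI is on PATH and already logged in to a backend; the helper names and the `state/snapshots` default are illustrative choices, not Pulumi conventions.

```javascript
const { execFileSync } = require('node:child_process');
const fs = require('node:fs');
const path = require('node:path');

// Build the argument list for `pulumi stack export`.
// `snapshotDir` is an assumed location; adjust it to your layout.
function exportArgs(stackName, snapshotDir = 'state/snapshots') {
  const file = path.join(snapshotDir, `${stackName}.json`);
  return ['stack', 'export', '--stack', stackName, '--file', file];
}

// Run the export, creating the snapshot directory first if needed.
// Requires the pulumi CLI on PATH; throws if the command exits non-zero.
function takeSnapshot(stackName, snapshotDir = 'state/snapshots') {
  fs.mkdirSync(snapshotDir, { recursive: true });
  execFileSync('pulumi', exportArgs(stackName, snapshotDir), { stdio: 'inherit' });
}
```

Calling `takeSnapshot('dev')` before the Terraform step would write `state/snapshots/dev.json`. Keeping argument construction separate from execution makes the command easy to check without invoking the CLI.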
1.2 Enable Cloud Provider Backups
Most cloud providers support automated backups of critical resources (EBS volumes, RDS snapshots, etc.). While Pulumi snapshots capture the IaC representation, you should still enforce provider‑level backups for data durability.
Step 2: Build the Ansible Playbook Wrapper
2.1 Install Required Collections
Run the following command to ensure you have the community.general collection, which provides the terraform module:
    ansible-galaxy collection install community.general
2.2 Playbook Skeleton
Create apply-and-rollback.yml:
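    ---
    - name: Apply Terraform and roll back on failure
      hosts: localhost
      gather_facts: false
      vars:
        tf_dir: "./terraform"
        stack_name: "dev"  # placeholder; set to your Pulumi stack name
        rollback_flag: false
      tasks:
        # state: planned runs init and writes the plan file; nothing is applied yet.
        - name: Run terraform init and plan
          community.general.terraform:
            project_path: "{{ tf_dir }}"
            state: planned
            force_init: true
            plan_file: "apply.plan"
          register: plan_result

        # The module applies non-interactively, so no auto-approve option is needed.
        - name: Run terraform apply
          community.general.terraform:
            project_path: "{{ tf_dir }}"
            state: present
            plan_file: "apply.plan"
          register: apply_result
          ignore_errors: true  # keep going so the rollback task can run

        - name: Check for apply failure
          ansible.builtin.set_fact:
            rollback_flag: true
          when: apply_result is failed

        - name: Trigger Pulumi rollback
          ansible.builtin.command:
            cmd: pulumi up --stack {{ stack_name }} --yes --skip-preview
          when: rollback_flag | bool

        # Surface the original failure so the CI job does not end green.
        - name: Fail the run after rollback
          ansible.builtin.fail:
            msg: "terraform apply failed; Pulumi rollback was executed."
          when: rollback_flag | bool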
2.3 Error Handling & Logging
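Setting ignore_errors on the apply task lets the playbook continue after a Terraform failure so that the rollback task can execute. The rollback_flag fact is set only when the apply fails, ensuring Pulumi runs only when necessary. Because ignore_errors also hides the failure from the play's final status, fail the run explicitly after a rollback so that CI reports it.
Ansible prints task results to standard output; add -v for more detail when debugging. For production, also write the output to a log file, for example by setting log_path in ansible.cfg or the ANSIBLE_LOG_PATH environment variable, so that timestamped records are available for audits.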
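As a rough sketch of the log-file setup, a small Node.js wrapper can start the playbook with `ANSIBLE_LOG_PATH` set to a per-run file. `ANSIBLE_LOG_PATH` is a standard Ansible setting; the helper names and the `rollback-<timestamp>.log` naming pattern are assumptions made for this example.

```javascript
const { spawnSync } = require('node:child_process');

// Build the environment for a playbook run that also writes a log file.
// The file name pattern is an assumption; pick whatever suits your retention policy.
function loggingEnv(baseEnv, logDir, now = new Date()) {
  // ISO timestamps contain ':' and '.', which are awkward in file names.
  const stamp = now.toISOString().replace(/[:.]/g, '-');
  return { ...baseEnv, ANSIBLE_LOG_PATH: `${logDir}/rollback-${stamp}.log` };
}

// Run the playbook and return its exit code
// (null if the process was killed by a signal or could not be started).
function runPlaybook(playbook, logDir) {
  const result = spawnSync('ansible-playbook', [playbook], {
    env: loggingEnv(process.env, logDir),
    stdio: 'inherit',
  });
  return result.status;
}
```

Using one log file per run keeps each rollback's record separate, which makes it easier to attach to a post-mortem.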
Step 3: Pulumi Rollback Logic
3.1 Restore the Snapshot
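Within your Pulumi program, add logic to read the snapshot file and declare the resources it recorded:

    const fs = require('fs');
    const path = require('path');
    const pulumi = require('@pulumi/pulumi');
    const aws = require('@pulumi/aws');

    // Snapshot written before the apply; assumed here to be a `pulumi stack export` file.
    const snapshotPath = path.join('state/snapshots', `${pulumi.getStack()}.json`);

    if (fs.existsSync(snapshotPath)) {
      const snapshot = JSON.parse(fs.readFileSync(snapshotPath, 'utf8'));
      const resources = (snapshot.deployment && snapshot.deployment.resources) || [];

      // Example: re-declare the VPC from its recorded inputs.
      const saved = resources.find((r) => r.type === 'aws:ec2/vpc:Vpc');
      if (saved) {
        const vpc = new aws.ec2.Vpc('vpc', {
          cidrBlock: saved.inputs.cidrBlock,
          tags: saved.inputs.tags,
        });
      }
    }

Adjust the code to match your resource types. The key is that the program reads the snapshot and declares the same resources that existed before the failed apply. Alternatively, pulumi stack import --file <snapshot> replaces the stack's recorded state with the exported checkpoint; it changes only Pulumi's bookkeeping, not the cloud resources, so a pulumi up is still needed afterward.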
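To generalize beyond a single VPC, it can help to pull resources out of the snapshot by type. The sketch below assumes the snapshot file has the shape written by `pulumi stack export`, with a `deployment.resources` array whose entries carry `urn`, `type`, and `inputs`; `findResourceInputs` is a hypothetical helper name.

```javascript
// Return the recorded inputs of every resource of a given type in a
// stack export document (assumed shape: { deployment: { resources: [...] } }).
function findResourceInputs(exportDoc, typeToken) {
  const resources = exportDoc?.deployment?.resources ?? [];
  return resources
    .filter((r) => r.type === typeToken)
    .map((r) => ({ urn: r.urn, inputs: r.inputs ?? {} }));
}
```

For example, `findResourceInputs(snapshot, 'aws:ec2/subnet:Subnet')` would list the recorded subnets, each of which could then be passed to the matching resource constructor.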
3.2 Re‑Apply Terraform After Rollback
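Once Pulumi has restored the state, revert or fix the change that caused the failure, then run terraform apply again to bring the environment up to date with any subsequent changes. Re-applying the unchanged configuration would simply fail again. The second apply can be triggered by another Ansible task or by embedding the Terraform command within the Pulumi run script.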
Step 4: CI/CD Integration
4.1 GitHub Actions Example
Below is a concise GitHub Actions workflow that triggers the rollback pipeline on push to the main branch:
    name: IaC Apply
    on:
      push:
        branches:
          - main
    jobs:
      apply:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - name: Set up Node.js
            uses: actions/setup-node@v4
            with:
              node-version: 20
          # The playbook calls the terraform binary, so install it on the runner.
          - name: Set up Terraform
            uses: hashicorp/setup-terraform@v3
          - name: Install Pulumi
            run: |
              curl -fsSL https://get.pulumi.com | sh
              # GITHUB_PATH (not GITHUB_ENV) adds a directory to PATH for later steps.
              echo "$HOME/.pulumi/bin" >> "$GITHUB_PATH"
          - name: Install Ansible
            run: |
              sudo apt-get update
              sudo apt-get install -y ansible
          - name: Run apply-and-rollback playbook
            env:
              PULUMI_ACCESS_TOKEN: ${{ secrets.PULUMI_ACCESS_TOKEN }}
              AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
              AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
            run: |
              ansible-playbook apply-and-rollback.yml
4.2 Secret Management
Never hard-code credentials. Store Pulumi tokens, cloud provider keys, and any other sensitive data in your CI platform’s secrets store. Feed values into Terraform with -var-file or TF_VAR_-prefixed environment variables, and into Pulumi with environment variables or pulumi config set --secret.
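A cheap safeguard is to check that the expected secrets are present before any tool runs, so a missing credential fails fast instead of halfway through an apply. The sketch below uses the three variable names from the workflow in section 4.1; `missingSecrets` is a hypothetical helper.

```javascript
// Names taken from the workflow in section 4.1; extend the list for other providers.
const REQUIRED_SECRETS = ['PULUMI_ACCESS_TOKEN', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'];

// Return the names of required variables that are unset or blank.
// Only names are returned, never values, so the result is safe to log.
function missingSecrets(env, required = REQUIRED_SECRETS) {
  return required.filter((name) => !env[name] || env[name].trim() === '');
}
```

A pre-flight step could call `missingSecrets(process.env)` and exit with a non-zero status when the returned list is not empty.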
Step 5: Testing the Rollback Path
5.1 Simulate a Failure
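A syntax error (e.g., a missing closing bracket) is rejected at plan time, before any resource changes, so it does not exercise the rollback path. Instead, introduce a value that passes validation but fails during apply, such as a subnet CIDR block that lies outside its VPC’s range. Run the playbook locally or against a staging environment and confirm that it detects the failed apply and triggers the Pulumi rollback.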
5.2 Verify Idempotence
After a rollback, re-run the playbook without changes. It should report no changes needed, confirming that the rollback restored the correct state.
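The "no changes" check can be automated by reading the host line of Ansible's PLAY RECAP. The sketch below assumes the default recap format (`host : ok=N changed=N ... failed=N ...`) with color output disabled; both function names are hypothetical.

```javascript
// Parse one host line of Ansible's PLAY RECAP, for example:
// "localhost : ok=3 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0"
function parseRecapLine(line) {
  const match = /^(\S+)\s*:\s*(.*)$/.exec(line.trim());
  if (!match) return null;
  const counts = {};
  // Collect every key=number pair that follows the host name.
  for (const [, key, value] of match[2].matchAll(/(\w+)=(\d+)/g)) {
    counts[key] = Number(value);
  }
  return { host: match[1], counts };
}

// A re-run is treated as idempotent when nothing changed and nothing failed.
function isIdempotentRun(recap) {
  return recap !== null && recap.counts.changed === 0 && recap.counts.failed === 0;
}
```

A CI step could feed the last recap line of the second run into `isIdempotentRun` and fail the job when it returns false.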
5.3 Load Testing
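Run the full playbook against a large, multi-service staging environment and time each stage to gauge how long a rollback takes; terraform plan on its own measures only planning time. Optimize by caching provider plugins (for example, with the TF_PLUGIN_CACHE_DIR environment variable) and by running terraform init ahead of time so the backend and modules are already initialized.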
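For rough timing numbers, each stage can be wrapped in a small timer. This is a minimal sketch: `timeStep` is a hypothetical helper, and it times synchronous work only, such as a call to `execFileSync`.

```javascript
// Time a synchronous step and return its result and duration in milliseconds.
function timeStep(label, fn) {
  const start = process.hrtime.bigint(); // monotonic clock, in nanoseconds
  const result = fn();
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { label, result, elapsedMs };
}
```

Wrapping the snapshot, apply, and rollback commands separately shows which stage dominates, which is more useful for tuning than a single end-to-end figure.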
Common Pitfalls & How to Avoid Them
- Incomplete Snapshots: Ensure Pulumi captures all resources, including provider-specific metadata. Use pulumi up --refresh (or pulumi refresh) before snapshotting if you suspect drift.
- State Corruption: Regularly back up Pulumi state to an encrypted S3 bucket. Use pulumi stack export and pulumi stack import to save and restore copies of a stack's state.
- Race Conditions: When multiple CI jobs run concurrently, serialize access to state with a shared lock (e.g., DynamoDB locks for Terraform's S3 backend). Pulumi's backends lock a stack while an update is running; the --parallel flag only controls how many resource operations run at once within a single update and does not provide locking.
- Idempotency Violations: Most Ansible modules are idempotent, but command tasks such as the Pulumi step report a change every time they run. Rely on Terraform's own state comparison to skip resources that already exist, and check that the terraform module points at the correct project_path and workspace.
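For the race-condition case, the simplest form of mutual exclusion is a lock file created with an exclusive flag. The sketch below is illustrative and only helps when the competing jobs share a filesystem, which hosted CI runners usually do not; in GitHub Actions, a workflow-level `concurrency` group serves the same purpose. The function names are hypothetical.

```javascript
const fs = require('node:fs');

// Try to take an exclusive lock by creating the file with the 'wx' flag,
// which fails with EEXIST if the file already exists.
function acquireLock(lockPath) {
  try {
    const fd = fs.openSync(lockPath, 'wx');
    fs.writeSync(fd, String(process.pid)); // record the owner for debugging
    fs.closeSync(fd);
    return true;
  } catch (err) {
    if (err.code === 'EEXIST') return false; // someone else holds the lock
    throw err;
  }
}

// Release the lock; `force` makes this a no-op if the file is already gone.
function releaseLock(lockPath) {
  fs.rmSync(lockPath, { force: true });
}
```

Call `releaseLock` in a `finally` block so that a failed apply or rollback does not leave a stale lock behind.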
Extending the Rollback Workflow
For organizations with more complex requirements, consider the following extensions:
- Multi‑Region Rollback: Deploy Pulumi stacks per region and coordinate rollbacks across them with a central orchestration service.
- Slack/Teams Notifications: Add Ansible handlers that post rollback status to a communication channel.
- Metrics & Alerting: Push rollback counts to Prometheus and set alerts for high failure rates.
- Policy Enforcement: Use policy as code, such as Pulumi CrossGuard or Open Policy Agent (OPA), to prevent certain destructive changes from being applied unless explicitly approved.
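As an example of the notification extension, the sketch below posts a status message to a Slack incoming webhook from Node.js rather than from an Ansible handler. It assumes Node 18 or later for the built-in `fetch`; the message wording, the function names, and the `stack`, `status`, and `commit` fields are illustrative choices.

```javascript
// Build a Slack incoming-webhook payload; the message wording is illustrative.
function rollbackMessage({ stack, status, commit }) {
  const icon = status === 'rolled-back' ? ':warning:' : ':white_check_mark:';
  return {
    text: `${icon} IaC pipeline for stack "${stack}" finished with status ${status} (commit ${commit})`,
  };
}

// Post the payload. `webhookUrl` should come from your secrets store.
async function notify(webhookUrl, payload) {
  const res = await fetch(webhookUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });
  if (!res.ok) throw new Error(`Webhook returned HTTP ${res.status}`);
}
```

Treat the webhook URL like any other credential, since anyone who has it can post to the channel.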
Conclusion
By integrating Pulumi’s state management with Ansible’s orchestration, you can build an automated rollback pipeline that protects your services from Terraform missteps. This approach reduces downtime and encourages a disciplined IaC culture in which failures are detected and handled automatically. Implementing the steps outlined here makes it far less likely that a broken apply will leave your infrastructure in an inconsistent state.
