Rollback Terraform Failures with Pulumi & Ansible Automation ‣ 2026-04-15

When your Terraform plan introduces a breaking change—say, a misconfigured subnet that brings down a critical load balancer—quickly reverting the infrastructure can be the difference between uptime and an incident. In 2026, many teams are combining Pulumi and Ansible to create an automated rollback pipeline that catches failures in real time and restores the last known‑good state without manual intervention. This guide walks you through setting up a resilient rollback system that works across public clouds and hybrid environments.

Why a Dedicated Rollback Strategy Matters

Traditional Terraform workflows treat the state file as the source of truth but provide no built-in, automated recovery path. If a terraform apply fails partway through, Terraform does not undo the resources it has already created or changed, so you are left with a partially applied configuration that must be cleaned up manually. In high-availability services, even a few minutes of downtime can translate to significant revenue loss.

A dedicated rollback strategy offers several advantages:

Speed: Automates the revert so recovery starts as soon as the failure is detected, rather than after manual triage.
Safety: Uses proven IaC patterns (Pulumi state snapshots, Ansible idempotence) to ensure the environment returns to a consistent baseline.
Auditability: Records rollback actions in logs and version control, making post‑mortem analysis easier.
Scalability: Works across multi‑region deployments, Kubernetes clusters, and on‑prem hardware.

Architecture Overview

Our rollback pipeline is composed of three main components:

Pre‑apply Hook (Pulumi): Captures a full state snapshot before the Terraform apply starts.
Apply Monitor (Ansible): Wraps the Terraform command and watches for failures.
Rollback Executor (Pulumi): If a failure is detected, this step restores the last snapshot and re‑applies the Terraform configuration to re‑establish the environment.

The flow looks like this:

┌─────────────────────┐
│ Pulumi Pre‑apply    │
│ Snapshot Creation   │
└─────────┬───────────┘
          │
┌─────────▼───────────┐
│ Ansible Playbook    │
│ - Runs 'terraform   │
│   apply'            │
│ - Monitors exit code│
└─────────┬───────────┘
          │
┌─────────▼───────────┐
│ Pulumi Rollback     │
│ (if apply fails)    │
└─────────────────────┘
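
The control flow in the diagram can be sketched in a few lines of Node.js. This is a minimal illustration rather than a full implementation: runPipeline and the three injected step functions are hypothetical names, and in a real pipeline each step would shell out to the Pulumi, Ansible, or Terraform CLI.

```javascript
// Hypothetical sketch of the snapshot -> apply -> rollback flow.
// Each step is an injected function, so the control flow can be
// exercised without touching real infrastructure.
function runPipeline({ snapshot, apply, rollback }) {
  const events = [];
  snapshot();
  events.push('snapshot');
  try {
    apply();
    events.push('apply');
    return { ok: true, events };
  } catch (err) {
    // A thrown error stands in for a non-zero exit code from the apply.
    events.push('apply-failed');
    rollback();
    events.push('rollback');
    return { ok: false, events, error: err.message };
  }
}

// Usage: a failing apply triggers the rollback step.
const result = runPipeline({
  snapshot: () => {},
  apply: () => { throw new Error('subnet misconfigured'); },
  rollback: () => {},
});
console.log(result.events.join(' -> '));
// prints: snapshot -> apply-failed -> rollback
```

Injecting the steps keeps the ordering logic testable on its own, which is useful when the real commands are slow or destructive.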

Step 1: Prepare Your Pulumi Project for State Snapshots

1.1 Configure Pulumi Stack Snapshots

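In Pulumi, a stack is an isolated instance of a project, typically one per deployment target. Pulumi.yaml holds only project-level settings, and Pulumi does not offer a resource type that snapshots a stack, so the snapshot is taken from the CLI rather than declared in the project file:

name: infra-rollback
runtime: nodejs
description: "Infrastructure with automated rollback"

Capture the snapshot immediately before the Terraform run:

mkdir -p state/snapshots
pulumi stack export --stack <stack-name> --file state/snapshots/<stack-name>.json

pulumi stack export writes the stack's complete deployment (every resource with its inputs and outputs) to the given file. The export covers only resources managed by this Pulumi stack, so anything you expect the rollback to restore must be defined in it. Keep the snapshot directory in a secure S3 bucket or, if it must live in your CI repository, make sure it was exported without --show-secrets.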
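
A snapshot of this kind can also be captured from a script with the pulumi stack export command. The sketch below only builds the argument list; snapshotArgs is a hypothetical helper, and it assumes simple stack names (not fully qualified org/project/stack names) and the state/snapshots directory used above.

```javascript
const path = require('path');

// Build the arguments for `pulumi stack export`, which writes a stack's
// deployment to a file. Assumes a simple stack name such as "dev";
// anything else is rejected so it cannot end up in a file path unchecked.
function snapshotArgs(stackName, dir = path.join('state', 'snapshots')) {
  if (!/^[A-Za-z0-9._-]+$/.test(stackName)) {
    throw new Error(`unexpected stack name: ${stackName}`);
  }
  const file = path.join(dir, `${stackName}.json`);
  return ['stack', 'export', '--stack', stackName, '--file', file];
}

// Usage: print the command that would be run for the "dev" stack.
console.log(['pulumi', ...snapshotArgs('dev')].join(' '));
```

To run it for real, pass the array to child_process.execFileSync('pulumi', snapshotArgs('dev')) after creating the target directory. Passing arguments as an array avoids shell quoting problems.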

1.2 Enable Cloud Provider Backups

Most cloud providers support automated backups of critical resources (EBS volumes, RDS snapshots, etc.). While Pulumi snapshots capture the IaC representation, you should still enforce provider‑level backups for data durability.

Step 2: Build the Ansible Playbook Wrapper

2.1 Install Required Collections

Run the following command to ensure you have the community.general collection, which provides the terraform module:

ansible-galaxy collection install community.general

2.2 Playbook Skeleton

Create apply-and-rollback.yml:

---
- name: Apply Terraform and Rollback on Failure
  hosts: localhost
  gather_facts: false
  vars:
    tf_dir: "./terraform"
    # Placeholder Pulumi stack name; override with -e stack_name=<stack>
    stack_name: "dev"
    rollback_flag: false

  tasks:
    - name: Run terraform init
      ansible.builtin.command:
        cmd: terraform init -input=false
        chdir: "{{ tf_dir }}"
      changed_when: false

    - name: Run terraform apply
      # The module plans and applies non-interactively, so no approval flag is needed
      community.general.terraform:
        project_path: "{{ tf_dir }}"
        state: present
      register: apply_result
      # Continue after a failed apply so the rollback task can run
      ignore_errors: true

    - name: Check for apply failure
      ansible.builtin.set_fact:
        rollback_flag: true
      when: apply_result is failed

    - name: Trigger Pulumi rollback
      ansible.builtin.command:
        cmd: pulumi up --stack {{ stack_name }} --yes --skip-preview
      when: rollback_flag | bool

2.3 Error Handling & Logging

Ansible's ignore_errors setting allows the playbook to continue after a Terraform failure so that the rollback task can execute. The rollback_flag fact is set only when the apply fails, ensuring Pulumi runs only when necessary.

Ansible prints each task's result to stdout, with more detail at higher verbosity (-v through -vvvv). For production, also set log_path in ansible.cfg (or the ANSIBLE_LOG_PATH environment variable) so that every run is written to a log file with timestamped entries for audit purposes.
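
If the rollback is also driven from a Node.js script, an audit trail can be kept as one JSON object per line. This is a small sketch; auditLine and its field names (ts, stack, event, detail) are illustrative choices, not part of Ansible or Pulumi.

```javascript
// Format one audit record as a single JSON line with an ISO 8601 timestamp.
// `now` is injectable so the output is reproducible in tests.
function auditLine(stack, event, detail = '', now = new Date()) {
  return JSON.stringify({ ts: now.toISOString(), stack, event, detail });
}

// Usage: print a record for a triggered rollback.
console.log(auditLine('dev', 'rollback-triggered', 'terraform apply failed'));
```

Appending each line to a file with fs.appendFileSync gives a log that is easy to grep and to load into a log pipeline later.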

Step 3: Pulumi Rollback Logic

3.1 Restore the Snapshot

Within your Pulumi stack, add logic to read the snapshot file and replace the current state:

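const fs = require('fs');
const path = require('path');
const pulumi = require('@pulumi/pulumi');
const aws = require('@pulumi/aws');

const snapshotPath = path.join('state', 'snapshots', `${pulumi.getStack()}.json`);

if (fs.existsSync(snapshotPath)) {
  const snapshot = JSON.parse(fs.readFileSync(snapshotPath, 'utf8'));
  // Assumes the layout written by `pulumi stack export`:
  // { deployment: { resources: [{ type, inputs, ... }] } }
  const resources = (snapshot.deployment && snapshot.deployment.resources) || [];
  const savedVpc = resources.find((r) => r.type === 'aws:ec2/vpc:Vpc');
  if (savedVpc) {
    // Example: re-create the VPC from its recorded inputs
    const vpc = new aws.ec2.Vpc('vpc', {
      cidrBlock: savedVpc.inputs.cidrBlock,
      tags: savedVpc.inputs.tags,
    });
  }
}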

Adjust the code to match your resource types. The key is that the stack reads the snapshot and constructs the same resources that existed before the failed apply.
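
When adapting this to other resource types, it can help to separate the snapshot lookup from the resource construction. The sketch below assumes the snapshot file uses the layout of pulumi stack export (a deployment object holding a resources array); inputsByType is a hypothetical helper and the sample data is hand-written.

```javascript
// Return the recorded inputs of every resource of a given type.
// Assumes the shape written by `pulumi stack export`:
// { deployment: { resources: [{ urn, type, inputs, outputs }] } }
function inputsByType(snapshot, type) {
  const resources = (snapshot.deployment && snapshot.deployment.resources) || [];
  return resources.filter((r) => r.type === type).map((r) => r.inputs || {});
}

// Usage with a hand-written sample (values are illustrative).
const sample = {
  deployment: {
    resources: [
      { type: 'pulumi:pulumi:Stack', inputs: {} },
      { type: 'aws:ec2/vpc:Vpc', inputs: { cidrBlock: '10.0.0.0/16', tags: { env: 'dev' } } },
    ],
  },
};
console.log(inputsByType(sample, 'aws:ec2/vpc:Vpc'));
```

Because the helper is a pure function, it can be unit-tested against saved snapshot files without running Pulumi.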

3.2 Re‑Apply Terraform After Rollback

Once Pulumi has restored the state, run terraform apply again to bring the environment up to date with any subsequent changes. This can be triggered by a second Ansible task or by embedding the Terraform command within the Pulumi run script.
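
A re-apply that fails again would trigger another rollback, so it is worth bounding the number of attempts. The following sketch assumes the apply step is wrapped in a function that throws on failure; reapply and maxAttempts are hypothetical names.

```javascript
// Re-run an apply step a bounded number of times after a rollback.
// `apply` is an injected function that throws on failure; maxAttempts
// guards against an endless apply/rollback loop.
function reapply(apply, maxAttempts = 2) {
  let lastError = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    try {
      apply(attempt);
      return { ok: true, attempts: attempt };
    } catch (err) {
      lastError = err;
    }
  }
  const error = lastError ? lastError.message : 'no attempts made';
  return { ok: false, attempts: maxAttempts, error };
}

// Usage: the first attempt fails, the second succeeds.
let calls = 0;
const outcome = reapply(() => {
  calls += 1;
  if (calls < 2) throw new Error('transient provider error');
});
console.log(outcome);
// prints: { ok: true, attempts: 2 }
```

Retrying helps only with transient errors. If the configuration itself is broken, the re-apply should wait until a fix has been merged.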

Step 4: CI/CD Integration

4.1 GitHub Actions Example

Below is a concise GitHub Actions workflow that triggers the rollback pipeline on push to the main branch:

name: IaC Apply

on:
  push:
    branches:
      - main

jobs:
  apply:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: 20

      - name: Install Pulumi
        run: |
          curl -fsSL https://get.pulumi.com | sh
          echo "$HOME/.pulumi/bin" >> "$GITHUB_PATH"

      - name: Install Ansible
        run: |
          sudo apt-get update
          sudo apt-get install -y ansible

      - name: Install Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_wrapper: false

      - name: Run apply-and-rollback playbook
        env:
          PULUMI_ACCESS_TOKEN: ${{ secrets.PULUMI_ACCESS_TOKEN }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: |
          ansible-playbook apply-and-rollback.yml

4.2 Secret Management

Never hard-code credentials. Store Pulumi tokens, cloud provider keys, and any other sensitive data in your CI platform's secrets store. Feed values into Terraform with -var-file or TF_VAR_-prefixed environment variables, and into Pulumi with environment variables or pulumi config set --secret.
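
A pipeline can fail fast when a secret is missing, before any infrastructure command runs. The sketch below checks for the three variables used in the workflow above and reports only their names, never their values; missingSecrets is a hypothetical helper.

```javascript
// Names of the environment variables the workflow expects to be set.
const REQUIRED = ['PULUMI_ACCESS_TOKEN', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'];

// Report which required variables are absent or blank, without
// printing any secret values.
function missingSecrets(env, required = REQUIRED) {
  return required.filter((name) => !env[name] || env[name].trim() === '');
}

// Usage with a sample environment object.
console.log(missingSecrets({ PULUMI_ACCESS_TOKEN: 'set' }));
// prints: [ 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY' ]
```

In CI, call missingSecrets(process.env) at the start of the job and exit with a non-zero status if the returned list is not empty.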

Step 5: Testing the Rollback Path

5.1 Simulate a Failure

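Inject a deliberate failure into the Terraform configuration. A syntax error (e.g., a missing closing bracket) is rejected before any resource is touched, so it only verifies that the playbook detects the failure; to exercise a partial apply, use a value that passes validation but that the provider rejects at apply time. Run the playbook locally to confirm that it catches the error and triggers the Pulumi rollback.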

5.2 Verify Idempotence

After a rollback, re-run the playbook without changes. It should report no changes needed, confirming that the rollback restored the correct state.
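
One way to check this mechanically is terraform plan -detailed-exitcode, which exits with 0 when there are no changes, 1 on error, and 2 when changes are pending. The sketch below maps those codes to a verdict; planVerdict and its labels are hypothetical.

```javascript
// Map the exit code of `terraform plan -detailed-exitcode` to a verdict.
// 0 = no changes, 1 = error, 2 = changes pending.
function planVerdict(exitCode) {
  switch (exitCode) {
    case 0:
      return 'clean';
    case 1:
      return 'error';
    case 2:
      return 'drift';
    default:
      return 'unknown';
  }
}

// Usage: a zero exit code means the rollback left nothing to change.
console.log(planVerdict(0));
// prints: clean
```

The exit code can be obtained from the status field returned by child_process.spawnSync('terraform', ['plan', '-detailed-exitcode', '-input=false'], { cwd: './terraform' }).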

5.3 Load Testing

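Time the full rollback path (snapshot, failed apply, Pulumi rollback) against a large, multi-service environment to gauge how long a rollback takes. Optimize by caching provider plugins, for example with Terraform's TF_PLUGIN_CACHE_DIR setting, and by running terraform init ahead of time.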
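
To put numbers on the rollback path, each step can be wrapped in a small timer. This is a minimal sketch for synchronous steps; timed is a hypothetical helper and the step shown is a stand-in for a real command.

```javascript
// Time a synchronous step and return its result with elapsed milliseconds.
function timed(label, step) {
  const start = process.hrtime.bigint();
  const value = step();
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { label, value, elapsedMs };
}

// Usage: the arrow function stands in for a real rollback command.
const run = timed('rollback', () => 'done');
console.log(`${run.label} took ${run.elapsedMs.toFixed(1)} ms`);
```

Recording these timings per run makes it possible to see whether rollback time grows with the size of the environment.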

Common Pitfalls & How to Avoid Them

Incomplete Snapshots: Ensure Pulumi captures all resources, including provider-specific metadata. Run pulumi refresh (or pulumi up --refresh) before snapshotting if you suspect drift.
State Corruption: Regularly back up Pulumi state to an encrypted S3 bucket. Use pulumi stack export and pulumi stack import to save and restore a stack's state.
Race Conditions: When multiple CI jobs run concurrently, rely on backend locking (e.g., DynamoDB locks for Terraform's S3 backend). Pulumi locks a stack while an update is in progress, so a second concurrent pulumi up against the same stack fails instead of interleaving; the --parallel flag only sets how many resource operations run at once and does not provide exclusive access.
Idempotency Violations: Most Ansible modules are designed to be idempotent, but command tasks such as the Pulumi rollback step are not. The terraform module relies on Terraform's own state to skip resources that already exist, so make sure every run uses the same backend and workspace.
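
For the race-condition case, jobs that share a filesystem can coordinate with a simple lock file. The sketch below uses exclusive file creation; acquireLock and releaseLock are hypothetical helpers, and this approach complements backend-level locking rather than replacing it.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Try to take an exclusive lock by creating a file with the 'wx' flag,
// which fails with EEXIST if the file already exists.
function acquireLock(lockPath) {
  try {
    const fd = fs.openSync(lockPath, 'wx');
    fs.writeSync(fd, String(process.pid));
    fs.closeSync(fd);
    return true;
  } catch (err) {
    if (err.code === 'EEXIST') return false;
    throw err;
  }
}

// Remove the lock file; a missing file is not an error.
function releaseLock(lockPath) {
  fs.rmSync(lockPath, { force: true });
}

// Usage: the second attempt fails until the lock is released.
const lockDir = fs.mkdtempSync(path.join(os.tmpdir(), 'iac-lock-'));
const lockPath = path.join(lockDir, 'apply.lock');
console.log(acquireLock(lockPath)); // prints: true
console.log(acquireLock(lockPath)); // prints: false
releaseLock(lockPath);
```

A crashed job leaves its lock file behind, so a production version would also need a staleness check, for example based on the recorded process ID or the file's age.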

Extending the Rollback Workflow

For organizations with more complex requirements, consider the following extensions:

Multi‑Region Rollback: Deploy Pulumi stacks per region and coordinate rollbacks across them with a central orchestration service.
Slack/Teams Notifications: Add Ansible handlers that post rollback status to a communication channel.
Metrics & Alerting: Push rollback counts to Prometheus and set alerts for high failure rates.
Policy Enforcement: Use policy as code, such as Pulumi CrossGuard policy packs or Open Policy Agent (OPA) checks in CI, to prevent certain destructive changes from being applied unless explicitly approved.
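
For the metrics extension, rollback counts can be rendered in the Prometheus text exposition format. This is a sketch only: the metric name iac_rollbacks_total, the stack label, and the rollbackMetric helper are illustrative choices.

```javascript
// Render rollback counts per stack in the Prometheus text exposition format.
function rollbackMetric(counts) {
  const lines = ['# TYPE iac_rollbacks_total counter'];
  for (const [stack, count] of Object.entries(counts)) {
    // Escape backslashes, double quotes, and newlines in the label value.
    const label = stack
      .replace(/\\/g, '\\\\')
      .replace(/"/g, '\\"')
      .replace(/\n/g, '\\n');
    lines.push(`iac_rollbacks_total{stack="${label}"} ${count}`);
  }
  return lines.join('\n') + '\n';
}

// Usage: write the metric text for two stacks to stdout.
process.stdout.write(rollbackMetric({ dev: 1, prod: 0 }));
```

Because CI jobs are short-lived, text like this is typically pushed to a Pushgateway rather than scraped from the job directly.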

Conclusion

By integrating Pulumi's state management with Ansible's orchestration, you can build an automated rollback pipeline that protects your services from Terraform missteps. This approach not only reduces downtime but also enforces a disciplined IaC culture where failures are caught and handled automatically. Implementing the steps outlined here makes it far less likely that a broken apply will leave your infrastructure in an inconsistent state.
