Testing Infrastructure as Code

Test infrastructure code with unit tests, integration tests, policy checks, and compliance scanning for Terraform and CDK

Infrastructure as Code brought software engineering discipline to infrastructure provisioning, but most teams skip the part that makes software engineering actually work: testing. Treating Terraform modules and CDK stacks as untested scripts is how you end up with a security group that allows 0.0.0.0/0 on port 22 in production. This article covers the full testing pyramid for IaC, from static analysis through integration testing, with practical examples you can implement today.

Prerequisites

  • Node.js 18+ installed
  • Terraform 1.6+ installed (for terraform test support; the mock_provider examples need 1.7+)
  • AWS CDK v2 installed (npm install -g aws-cdk)
  • Basic familiarity with Terraform HCL and AWS CDK constructs
  • Go 1.21+ installed (for Terratest examples)
  • An AWS account with credentials configured for integration tests

Why Test Infrastructure Code

Infrastructure code has a unique failure profile. When application code breaks, you get an error message and a stack trace. When infrastructure code breaks, you get a $47,000 AWS bill, a data breach, or a four-hour outage at 3 AM. The blast radius is enormous.

There are three categories of defects in IaC that testing catches:

Correctness defects — the infrastructure does not do what you intended. Your VPC has no internet gateway, your Lambda function has the wrong runtime, your RDS instance is in a single AZ.

Security defects — the infrastructure is configured in a way that exposes attack surface. Public S3 buckets, overly permissive IAM policies, unencrypted EBS volumes.

Compliance defects — the infrastructure violates organizational or regulatory policies. Wrong region, missing tags, instance types outside the approved list, no encryption at rest.

Testing IaC is not optional. It is the mechanism that turns your infrastructure definitions from "scripts we hope work" into "validated specifications we can reason about."

The Testing Pyramid for Infrastructure as Code

The traditional testing pyramid applies to IaC, but the layers look different:

         /  E2E Tests  \          ← Deploy and validate in real environment
        / Integration    \        ← Terratest, deploy modules in isolation
       / Contract Tests    \      ← Module interface validation
      / Unit Tests           \    ← CDK assertions, terraform test
     / Policy & Compliance     \  ← OPA, Sentinel, Checkov
    / Static Analysis            \ ← tflint, cdk-nag, tfsec

The bottom layers run in seconds with no cloud credentials. The top layers require real infrastructure and take minutes. A healthy IaC pipeline runs all of them, but invests most heavily in the bottom three.

Static Analysis

Static analysis catches structural errors and security misconfigurations without executing anything. These tools parse your IaC files and check them against rule databases.

tflint for Terraform

tflint catches issues that terraform validate misses — deprecated syntax, invalid instance types, naming convention violations.

# Install tflint
curl -s https://raw.githubusercontent.com/terraform-linters/tflint/master/install_linux.sh | bash

# Initialize with AWS ruleset
cat > .tflint.hcl <<EOF
plugin "aws" {
  enabled = true
  version = "0.30.0"
  source  = "github.com/terraform-linters/tflint-ruleset-aws"
}

rule "terraform_naming_convention" {
  enabled = true
  format  = "snake_case"
}

rule "terraform_documented_variables" {
  enabled = true
}
EOF

tflint --init
tflint --recursive

Checkov for Multi-Framework Scanning

Checkov is the Swiss Army knife of IaC security scanning. It supports Terraform, CloudFormation, Kubernetes manifests, Dockerfiles, and CDK.

pip install checkov

# Scan Terraform directory
checkov -d ./modules/networking --framework terraform

# Scan with custom policy directory
checkov -d ./modules/networking --external-checks-dir ./policies

# Output as JUnit XML for CI integration
checkov -d . --output junitxml > checkov-results.xml

cdk-nag for CDK Stacks

cdk-nag checks your CDK constructs against rule packs such as AwsSolutions, which encodes AWS best practices. It runs during synthesis, so issues surface before anything reaches CloudFormation.

var cdk = require("aws-cdk-lib");
var nag = require("cdk-nag");
var MyStack = require("./lib/my-stack");

var app = new cdk.App();
var stack = new MyStack(app, "MyStack");

// Apply AWS Solutions checks
cdk.Aspects.of(app).add(new nag.AwsSolutionsChecks({ verbose: true }));

// Suppress specific rules when justified
nag.NagSuppressions.addStackSuppressions(stack, [
  {
    id: "AwsSolutions-S1",
    reason: "Access logging bucket does not need its own access logs"
  }
]);

app.synth();

Unit Testing CDK with Assertions

The aws-cdk-lib/assertions module lets you write unit tests that inspect the synthesized CloudFormation template without deploying anything. Because nothing is deployed, these tests typically finish in seconds or less.

var cdk = require("aws-cdk-lib");
var assertions = require("aws-cdk-lib/assertions");
var assert = require("assert");

// The stack under test
var NetworkStack = require("../lib/network-stack");

function testVpcConfiguration() {
  var app = new cdk.App();
  var stack = new NetworkStack(app, "TestNetworkStack", {
    environment: "production",
    cidrBlock: "10.0.0.0/16"
  });

  var template = assertions.Template.fromStack(stack);

  // Assert VPC exists with correct CIDR
  template.hasResourceProperties("AWS::EC2::VPC", {
    CidrBlock: "10.0.0.0/16",
    EnableDnsHostnames: true,
    EnableDnsSupport: true
  });

  // Assert we have exactly 3 private and 3 public subnets
  template.resourceCountIs("AWS::EC2::Subnet", 6);

  // Assert NAT Gateway exists in production
  template.resourceCountIs("AWS::EC2::NatGateway", 3);

  // Assert flow logs are enabled
  template.hasResourceProperties("AWS::EC2::FlowLog", {
    ResourceType: "VPC",
    TrafficType: "ALL"
  });

  console.log("VPC configuration tests passed");
}

function testSecurityGroupRules() {
  var app = new cdk.App();
  var stack = new NetworkStack(app, "TestNetworkStack", {
    environment: "production",
    cidrBlock: "10.0.0.0/16"
  });

  var template = assertions.Template.fromStack(stack);

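  // Assert no security group allows unrestricted SSH
  var securityGroups = template.findResources("AWS::EC2::SecurityGroup");
  Object.keys(securityGroups).forEach(function(key) {
    var sg = securityGroups[key];
    var ingress = sg.Properties.SecurityGroupIngress || [];
    ingress.forEach(function(rule) {
      // Match any rule whose port range includes 22, not only exact matches
      var coversSsh = rule.IpProtocol === "-1" ||
        (rule.FromPort <= 22 && rule.ToPort >= 22);
      if (coversSsh) {
        assert.notStrictEqual(
          rule.CidrIp,
          "0.0.0.0/0",
          "SSH must not be open to the world: " + key
        );
        assert.notStrictEqual(
          rule.CidrIpv6,
          "::/0",
          "SSH must not be open to the world over IPv6: " + key
        );
      }
    });
  });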

  console.log("Security group tests passed");
}

function testTaggingPolicy() {
  var app = new cdk.App();
  var stack = new NetworkStack(app, "TestNetworkStack", {
    environment: "production",
    cidrBlock: "10.0.0.0/16"
  });

  var template = assertions.Template.fromStack(stack);

  // Assert all taggable resources have required tags
  var allResources = template.toJSON().Resources;
  var taggableTypes = [
    "AWS::EC2::VPC",
    "AWS::EC2::Subnet",
    "AWS::EC2::SecurityGroup"
  ];

  Object.keys(allResources).forEach(function(key) {
    var resource = allResources[key];
    if (taggableTypes.indexOf(resource.Type) !== -1) {
      var tags = resource.Properties.Tags || [];
      var tagNames = tags.map(function(t) { return t.Key; });
      assert.ok(
        tagNames.indexOf("Environment") !== -1,
        "Missing Environment tag on " + key
      );
      assert.ok(
        tagNames.indexOf("ManagedBy") !== -1,
        "Missing ManagedBy tag on " + key
      );
    }
  });

  console.log("Tagging policy tests passed");
}

testVpcConfiguration();
testSecurityGroupRules();
testTaggingPolicy();
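
A check like the SSH rule above is only useful if it actually fails on a bad template. One way to gain that confidence is to pull the logic into a pure function and run it against a hand-written fixture. The sketch below assumes a hypothetical helper name, findOpenSshGroups, and a fixture that imitates the shape of a synthesized template; neither comes from the CDK library.

```javascript
// Hypothetical helper: returns the logical IDs of security groups whose
// inline ingress rules expose port 22 to every IPv4 or IPv6 address.
function findOpenSshGroups(templateJson) {
  var resources = templateJson.Resources || {};
  return Object.keys(resources).filter(function(key) {
    var resource = resources[key];
    if (resource.Type !== "AWS::EC2::SecurityGroup") {
      return false;
    }
    var ingress = (resource.Properties || {}).SecurityGroupIngress || [];
    return ingress.some(function(rule) {
      // IpProtocol "-1" means all protocols and ports
      var coversSsh = rule.IpProtocol === "-1" ||
        (rule.FromPort <= 22 && rule.ToPort >= 22);
      var openToWorld = rule.CidrIp === "0.0.0.0/0" ||
        rule.CidrIpv6 === "::/0";
      return coversSsh && openToWorld;
    });
  });
}

// Hand-written fixture in the shape of a synthesized template (not real CDK output)
var fixture = {
  Resources: {
    OpenGroup: {
      Type: "AWS::EC2::SecurityGroup",
      Properties: {
        SecurityGroupIngress: [
          { IpProtocol: "tcp", FromPort: 0, ToPort: 1024, CidrIp: "0.0.0.0/0" }
        ]
      }
    },
    ClosedGroup: {
      Type: "AWS::EC2::SecurityGroup",
      Properties: {
        SecurityGroupIngress: [
          { IpProtocol: "tcp", FromPort: 22, ToPort: 22, CidrIp: "10.0.0.0/8" }
        ]
      }
    },
    Bucket: { Type: "AWS::S3::Bucket", Properties: {} }
  }
};

console.log(findOpenSshGroups(fixture)); // [ 'OpenGroup' ]
```

In a real test you would pass template.toJSON() instead of the fixture and assert that the returned list is empty.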

Unit Testing Terraform with terraform test

Terraform 1.6 introduced native testing with terraform test, and Terraform 1.7 added mock providers. Test files use the .tftest.hcl extension. With a mock provider, they run against your modules without cloud credentials or real infrastructure.

# tests/networking.tftest.hcl

# Mock the AWS provider to avoid real API calls
mock_provider "aws" {}

variables {
  environment    = "production"
  vpc_cidr       = "10.0.0.0/16"
  azs            = ["us-east-1a", "us-east-1b", "us-east-1c"]
  enable_nat     = true
  enable_flowlog = true
}

run "vpc_creates_correct_cidr" {
  command = plan

  assert {
    condition     = aws_vpc.main.cidr_block == "10.0.0.0/16"
    error_message = "VPC CIDR block does not match expected value"
  }
}

run "production_enables_nat_gateway" {
  command = plan

  assert {
    condition     = length(aws_nat_gateway.main) == 3
    error_message = "Production should have 3 NAT gateways, one per AZ"
  }
}

run "staging_disables_nat_gateway" {
  command = plan

  variables {
    environment = "staging"
    enable_nat  = false
  }

  assert {
    condition     = length(aws_nat_gateway.main) == 0
    error_message = "Staging should not have NAT gateways"
  }
}

run "flow_logs_enabled" {
  command = plan

  assert {
    condition     = aws_flow_log.main[0].traffic_type == "ALL"
    error_message = "Flow logs should capture all traffic"
  }
}

run "private_subnets_tagged_correctly" {
  command = plan

  assert {
    condition = alltrue([
      for subnet in aws_subnet.private :
        subnet.tags["Tier"] == "private"
    ])
    error_message = "All private subnets must have Tier=private tag"
  }
}

Run the tests with:

terraform test -verbose

Integration Testing with Terratest

Terratest is a Go library that deploys real infrastructure, validates it, and tears it down. These tests are slow (5-30 minutes) and cost money, but they catch issues that unit tests cannot — IAM permission errors, API quota limits, cross-service integration failures.

// test/networking_test.go
package test

import (
    "fmt"
    "testing"
    "time"

    "github.com/gruntwork-io/terratest/modules/aws"
    "github.com/gruntwork-io/terratest/modules/retry"
    "github.com/gruntwork-io/terratest/modules/terraform"
    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/require"
)

func TestNetworkingModule(t *testing.T) {
    t.Parallel()

    // Use a unique name to avoid collisions
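    // Nanosecond resolution: parallel tests started in the same second
    // would otherwise share an ID
    uniqueId := fmt.Sprintf("test-%d", time.Now().UnixNano())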
    awsRegion := "us-east-1"

    terraformOptions := terraform.WithDefaultRetryableErrors(t, &terraform.Options{
        TerraformDir: "../modules/networking",
        Vars: map[string]interface{}{
            "environment": "test",
            "vpc_cidr":    "10.99.0.0/16",
            "name_prefix": uniqueId,
            "enable_nat":  true,
        },
        EnvVars: map[string]string{
            "AWS_DEFAULT_REGION": awsRegion,
        },
    })

    // Destroy infrastructure after test completes
    defer terraform.Destroy(t, terraformOptions)

    // Deploy the infrastructure
    terraform.InitAndApply(t, terraformOptions)

    // Validate outputs exist
    vpcId := terraform.Output(t, terraformOptions, "vpc_id")
    require.NotEmpty(t, vpcId) // stop here on failure: later checks need a valid VPC ID

    privateSubnetIds := terraform.OutputList(t, terraformOptions, "private_subnet_ids")
    assert.Equal(t, 3, len(privateSubnetIds))

    // Validate VPC properties via AWS API
    vpc := aws.GetVpcById(t, vpcId, awsRegion)
    assert.Equal(t, "10.99.0.0/16", vpc.CidrBlock)

    // Validate subnets have internet access through NAT
    for _, subnetId := range privateSubnetIds {
        routeTable := aws.GetRouteTableForSubnet(t, subnetId, awsRegion)
        hasNatRoute := false
        for _, route := range routeTable.Routes {
            if route.NatGatewayId != nil {
                hasNatRoute = true
                break
            }
        }
        assert.True(t, hasNatRoute,
            "Private subnet %s should route through NAT gateway", subnetId)
    }

    // Validate DNS resolution works inside the VPC
    retry.DoWithRetry(t, "Check DNS resolution", 5, 10*time.Second,
        func() (string, error) {
            // Verify VPC DNS attributes are enabled
            dnsSupport := aws.GetDnsSupport(t, vpcId, awsRegion)
            if !dnsSupport {
                return "", fmt.Errorf("DNS support not enabled")
            }
            return "DNS OK", nil
        },
    )
}

func TestNetworkingModuleWithoutNat(t *testing.T) {
    t.Parallel()

    // Nanosecond resolution avoids clashing with other parallel tests
    uniqueId := fmt.Sprintf("test-%d", time.Now().UnixNano())

    terraformOptions := terraform.WithDefaultRetryableErrors(t, &terraform.Options{
        TerraformDir: "../modules/networking",
        Vars: map[string]interface{}{
            "environment": "dev",
            "vpc_cidr":    "10.98.0.0/16",
            "name_prefix": uniqueId,
            "enable_nat":  false,
        },
    })

    defer terraform.Destroy(t, terraformOptions)
    terraform.InitAndApply(t, terraformOptions)

    // Verify no NAT gateways were created
    natGatewayIds := terraform.OutputList(t, terraformOptions, "nat_gateway_ids")
    assert.Equal(t, 0, len(natGatewayIds))
}

Contract Testing for Modules

Contract tests validate that a module's interface — its inputs and outputs — behaves according to its documented specification. This prevents breaking changes when modules are shared across teams.

// test/module-contract.test.js
var assert = require("assert");
var fs = require("fs");
var path = require("path");
var child_process = require("child_process");

function loadTerraformOutputs(moduleDir) {
  var result = child_process.execSync(
    "terraform output -json",
    { cwd: moduleDir, encoding: "utf-8" }
  );
  return JSON.parse(result);
}

function loadTerraformVariables(moduleDir) {
  var variablesFile = path.join(moduleDir, "variables.tf");
  var content = fs.readFileSync(variablesFile, "utf-8");

  // Parse variable blocks (simplified parser)
  var variables = [];
  var regex = /variable\s+"(\w+)"\s*\{/g;
  var match;
  while ((match = regex.exec(content)) !== null) {
    variables.push(match[1]);
  }
  return variables;
}

function testNetworkModuleContract() {
  var moduleDir = path.resolve(__dirname, "../modules/networking");

  // Contract: required input variables
  var requiredInputs = [
    "environment",
    "vpc_cidr",
    "name_prefix",
    "azs",
    "enable_nat",
    "enable_flowlog"
  ];

  var actualVariables = loadTerraformVariables(moduleDir);

  requiredInputs.forEach(function(input) {
    assert.ok(
      actualVariables.indexOf(input) !== -1,
      "Module must accept variable: " + input
    );
  });

  // Contract: required output values
  var requiredOutputs = [
    "vpc_id",
    "vpc_cidr",
    "private_subnet_ids",
    "public_subnet_ids",
    "nat_gateway_ids"
  ];

  var outputsFile = path.join(moduleDir, "outputs.tf");
  var outputContent = fs.readFileSync(outputsFile, "utf-8");

  requiredOutputs.forEach(function(output) {
    assert.ok(
      outputContent.indexOf('output "' + output + '"') !== -1,
      "Module must export output: " + output
    );
  });

  console.log("Contract tests passed: networking module");
}

testNetworkModuleContract();
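
The simplified parser above only checks that a variable exists. A contract can also break when a variable loses its default, because callers that never set it start failing. The following is a minimal sketch of a slightly richer parser; parseVariableBlocks is a hypothetical name, and it assumes braces inside each variable block are balanced (it does not understand braces inside comments or unbalanced string literals).

```javascript
// Hypothetical helper: splits variables.tf content into { name, hasDefault }
// records by matching braces, so nested blocks such as validation {} stay
// inside the variable they belong to.
function parseVariableBlocks(content) {
  var variables = [];
  var header = /variable\s+"([^"]+)"\s*\{/g;
  var match;
  while ((match = header.exec(content)) !== null) {
    var start = header.lastIndex; // first character after the opening brace
    var depth = 1;
    var i = start;
    while (i < content.length && depth > 0) {
      if (content[i] === "{") { depth++; }
      if (content[i] === "}") { depth--; }
      i++;
    }
    var body = content.slice(start, i - 1);
    variables.push({
      name: match[1],
      hasDefault: /^\s*default\s*=/m.test(body)
    });
    header.lastIndex = i; // continue scanning after this block
  }
  return variables;
}

// Sample input written for this example
var sample = [
  'variable "environment" {',
  '  type = string',
  '}',
  'variable "enable_nat" {',
  '  type    = bool',
  '  default = true',
  '}'
].join("\n");

var requiredNames = parseVariableBlocks(sample)
  .filter(function(v) { return !v.hasDefault; })
  .map(function(v) { return v.name; });

console.log(requiredNames); // [ 'environment' ]
```

Comparing requiredNames against a list checked into the repository turns "a new required input appeared" into a failing test instead of a surprise for downstream teams.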

Policy Testing with OPA and Sentinel

Policy-as-code tools let you write organizational rules that infrastructure must satisfy. Open Policy Agent (OPA) is open source and uses the Rego language, while HashiCorp Sentinel is proprietary and, for Terraform, available only in Terraform Cloud and Terraform Enterprise.

OPA with Terraform Plan

# policies/terraform/networking.rego
package terraform.networking

import input as tfplan

# Deny public SSH access
deny[msg] {
  resource := tfplan.resource_changes[_]
  resource.type == "aws_security_group_rule"
  resource.change.after.type == "ingress"
  resource.change.after.from_port <= 22
  resource.change.after.to_port >= 22
  resource.change.after.cidr_blocks[_] == "0.0.0.0/0"
  msg := sprintf("Security group rule %s allows SSH from 0.0.0.0/0", [resource.address])
}

# Require encryption on EBS volumes
deny[msg] {
  resource := tfplan.resource_changes[_]
  resource.type == "aws_ebs_volume"
  not resource.change.after.encrypted
  msg := sprintf("EBS volume %s must be encrypted", [resource.address])
}

# Enforce approved instance types
approved_types := {
  "t3.micro", "t3.small", "t3.medium", "t3.large",
  "m5.large", "m5.xlarge", "m5.2xlarge",
  "r5.large", "r5.xlarge"
}

deny[msg] {
  resource := tfplan.resource_changes[_]
  resource.type == "aws_instance"
  instance_type := resource.change.after.instance_type
  not approved_types[instance_type]
  msg := sprintf("Instance type %s is not approved. Resource: %s",
    [instance_type, resource.address])
}

# Require specific tags on all resources
required_tags := {"Environment", "Team", "CostCenter"}

deny[msg] {
  resource := tfplan.resource_changes[_]
  tags := resource.change.after.tags
  tag := required_tags[_]
  not tags[tag]
  msg := sprintf("Resource %s missing required tag: %s", [resource.address, tag])
}

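Run OPA against a Terraform plan. The policy above uses pre-1.0 Rego syntax (deny[msg] { ... }); on OPA 1.0 or later, pass --v0-compatible or rewrite the rules in the form deny contains msg if { ... }: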

# Generate plan JSON
terraform plan -out=tfplan.binary
terraform show -json tfplan.binary > tfplan.json

# Evaluate policy
opa eval \
  --data policies/terraform/ \
  --input tfplan.json \
  "data.terraform.networking.deny" \
  --format pretty

Automating OPA Checks in Node.js

// scripts/check-policy.js
var child_process = require("child_process");
var path = require("path");

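function runOpaEval(policyDir, planFile, policyPackage) {
  // Pass arguments as an array so paths containing spaces or shell
  // metacharacters are never interpreted by a shell
  var args = [
    "eval",
    "--data", policyDir,
    "--input", planFile,
    "--format", "json",
    "data." + policyPackage + ".deny"
  ];

  var result = child_process.execFileSync("opa", args, { encoding: "utf-8" });
  var parsed = JSON.parse(result);

  // An empty result means the query was undefined, e.g. a wrong package name
  if (!parsed.result || parsed.result.length === 0) {
    throw new Error("OPA query returned no result for package " + policyPackage);
  }

  return parsed.result[0].expressions[0].value;
}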

function checkPolicies(planFile) {
  var policyDir = path.resolve(__dirname, "../policies/terraform");

  var violations = runOpaEval(policyDir, planFile, "terraform.networking");

  if (violations.length > 0) {
    console.error("Policy violations found:");
    violations.forEach(function(v) {
      console.error("  - " + v);
    });
    process.exit(1);
  }

  console.log("All policies passed");
}

var planFile = process.argv[2];
if (!planFile) {
  console.error("Usage: node check-policy.js <plan.json>");
  process.exit(1);
}

checkPolicies(planFile);
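
The part of this script most worth testing on its own is the handling of OPA's JSON output, since it decides whether the pipeline passes. The sketch below isolates that step as a pure function so it can be exercised without the opa binary. The name extractViolations is hypothetical, and the sample objects are hand-written to imitate the output shape the script above already relies on (result[0].expressions[0].value).

```javascript
// Hypothetical helper: pulls the deny messages out of parsed `opa eval` output.
// Throws when the result is missing, because an undefined query (for example
// a typo in the package path) would otherwise look like "no violations".
function extractViolations(parsed) {
  var results = (parsed && parsed.result) || [];
  if (results.length === 0) {
    throw new Error("OPA returned no result; check the policy package path");
  }
  var violations = [];
  results.forEach(function(entry) {
    (entry.expressions || []).forEach(function(expression) {
      if (Array.isArray(expression.value)) {
        violations = violations.concat(expression.value);
      }
    });
  });
  return violations;
}

// Hand-written samples in the shape of `opa eval --format json` output
var withViolation = {
  result: [
    { expressions: [{ value: ["EBS volume aws_ebs_volume.data must be encrypted"] }] }
  ]
};
var clean = { result: [{ expressions: [{ value: [] }] }] };

console.log(extractViolations(withViolation).length); // 1
console.log(extractViolations(clean).length);         // 0
```

Failing loudly on a missing result is a deliberate choice here: a policy gate that passes because it evaluated nothing is worse than one that fails.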

Snapshot Testing for CloudFormation

Snapshot testing captures a known-good CloudFormation template and alerts you when the synthesized output changes. This catches unintended drift caused by CDK version upgrades or dependency changes.

// test/snapshot.test.js
var cdk = require("aws-cdk-lib");
var assertions = require("aws-cdk-lib/assertions");
var fs = require("fs");
var path = require("path");
var assert = require("assert");

var NetworkStack = require("../lib/network-stack");

function updateSnapshot(snapshotPath, template) {
  fs.writeFileSync(snapshotPath, JSON.stringify(template, null, 2));
  console.log("Snapshot updated: " + snapshotPath);
}

function testSnapshot() {
  var app = new cdk.App();
  var stack = new NetworkStack(app, "SnapshotStack", {
    environment: "production",
    cidrBlock: "10.0.0.0/16"
  });

  var template = assertions.Template.fromStack(stack);
  var templateJson = template.toJSON();

  var snapshotPath = path.join(__dirname, "__snapshots__", "network-stack.json");
  var shouldUpdate = process.env.UPDATE_SNAPSHOTS === "true";

  if (shouldUpdate || !fs.existsSync(snapshotPath)) {
    fs.mkdirSync(path.dirname(snapshotPath), { recursive: true });
    updateSnapshot(snapshotPath, templateJson);
    return;
  }

  var savedSnapshot = JSON.parse(fs.readFileSync(snapshotPath, "utf-8"));

  // Compare resource counts
  var currentResources = Object.keys(templateJson.Resources);
  var snapshotResources = Object.keys(savedSnapshot.Resources);

  assert.deepStrictEqual(
    currentResources.sort(),
    snapshotResources.sort(),
    "Resource list has changed. Run with UPDATE_SNAPSHOTS=true to update."
  );

  // Compare resource types
  currentResources.forEach(function(key) {
    assert.strictEqual(
      templateJson.Resources[key].Type,
      savedSnapshot.Resources[key].Type,
      "Resource type changed for " + key
    );
  });

  console.log("Snapshot test passed");
}

testSnapshot();
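
Snapshots stay stable across CDK upgrades more easily if volatile metadata is stripped before saving or comparing. As a rough sketch, the hypothetical normalizeTemplate function below drops AWS::CDK::Metadata resources and the aws:cdk:path metadata key from a copy of the template; the fixture is hand-written, not real synthesized output.

```javascript
// Hypothetical helper: returns a deep copy of the template without the
// CDK-generated metadata that tends to change between CDK versions.
function normalizeTemplate(templateJson) {
  var copy = JSON.parse(JSON.stringify(templateJson));
  var resources = copy.Resources || {};
  Object.keys(resources).forEach(function(key) {
    var resource = resources[key];
    if (resource.Type === "AWS::CDK::Metadata") {
      delete resources[key];
      return;
    }
    if (resource.Metadata) {
      delete resource.Metadata["aws:cdk:path"];
      // Drop the Metadata object entirely once nothing is left in it
      if (Object.keys(resource.Metadata).length === 0) {
        delete resource.Metadata;
      }
    }
  });
  return copy;
}

// Hand-written fixture for illustration
var rawTemplate = {
  Resources: {
    Vpc: {
      Type: "AWS::EC2::VPC",
      Properties: { CidrBlock: "10.0.0.0/16" },
      Metadata: { "aws:cdk:path": "SnapshotStack/Vpc/Resource" }
    },
    CDKMetadata: {
      Type: "AWS::CDK::Metadata",
      Properties: { Analytics: "v2:deflate64:..." }
    }
  }
};

console.log(JSON.stringify(normalizeTemplate(rawTemplate)));
// {"Resources":{"Vpc":{"Type":"AWS::EC2::VPC","Properties":{"CidrBlock":"10.0.0.0/16"}}}}
```

Applying the same function to both the saved snapshot and the fresh template keeps the comparison symmetric.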

Compliance Testing

Compliance testing validates that infrastructure meets regulatory or organizational standards. Unlike policy tests that check individual resources, compliance tests verify cross-cutting concerns across your entire deployment.

// test/compliance.test.js
var assert = require("assert");
var child_process = require("child_process");

function getTerraformState(dir) {
  var result = child_process.execSync(
    "terraform show -json",
    { cwd: dir, encoding: "utf-8" }
  );
  return JSON.parse(result);
}

function checkEncryptionAtRest(state) {
  var violations = [];
  var resources = state.values.root_module.resources || [];

  resources.forEach(function(resource) {
    switch (resource.type) {
      case "aws_s3_bucket":
        // S3 default encryption is enforced at bucket level since Jan 2023
        break;
      case "aws_db_instance":
        if (!resource.values.storage_encrypted) {
          violations.push("RDS instance " + resource.address + " is not encrypted");
        }
        break;
      case "aws_ebs_volume":
        if (!resource.values.encrypted) {
          violations.push("EBS volume " + resource.address + " is not encrypted");
        }
        break;
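      case "aws_dynamodb_table":
        // DynamoDB always encrypts at rest with an AWS owned key; this block
        // is only enabled when a KMS key is configured instead
        var sse = resource.values.server_side_encryption;
        if (!sse || !sse[0] || !sse[0].enabled) {
          violations.push("DynamoDB table " + resource.address + " does not use a KMS key for SSE");
        }
        break;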
    }
  });

  return violations;
}

function checkNetworkSegmentation(state) {
  var violations = [];
  var resources = state.values.root_module.resources || [];

  // Databases must be in private subnets
  var dbSubnetGroups = resources.filter(function(r) {
    return r.type === "aws_db_subnet_group";
  });

  dbSubnetGroups.forEach(function(group) {
    var subnetIds = group.values.subnet_ids || [];
    subnetIds.forEach(function(subnetId) {
      var subnet = resources.find(function(r) {
        return r.type === "aws_subnet" && r.values.id === subnetId;
      });
      if (subnet && subnet.values.map_public_ip_on_launch) {
        violations.push(
          "DB subnet group " + group.address +
          " includes public subnet " + subnetId
        );
      }
    });
  });

  return violations;
}

function runComplianceChecks(dir) {
  var state = getTerraformState(dir);
  var allViolations = [];

  var encryptionViolations = checkEncryptionAtRest(state);
  allViolations = allViolations.concat(encryptionViolations);

  var networkViolations = checkNetworkSegmentation(state);
  allViolations = allViolations.concat(networkViolations);

  if (allViolations.length > 0) {
    console.error("Compliance violations:");
    allViolations.forEach(function(v) {
      console.error("  FAIL: " + v);
    });
    process.exit(1);
  }

  console.log("All compliance checks passed");
}

runComplianceChecks(process.argv[2] || ".");
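
One caveat: the checks above read only state.values.root_module.resources, so resources declared inside modules are skipped. In the terraform show -json output, those live under child_modules, which can nest. A minimal sketch of a recursive collector follows; collectResources is a hypothetical name, and the state object is a hand-written fixture.

```javascript
// Hypothetical helper: gathers resources from a module node and all of its
// nested child_modules, in depth-first order.
function collectResources(moduleNode) {
  if (!moduleNode) {
    return [];
  }
  var own = (moduleNode.resources || []).slice();
  return (moduleNode.child_modules || []).reduce(function(all, child) {
    return all.concat(collectResources(child));
  }, own);
}

// Hand-written fixture in the shape of `terraform show -json` output
var state = {
  values: {
    root_module: {
      resources: [
        { address: "aws_vpc.main", type: "aws_vpc", values: {} }
      ],
      child_modules: [
        {
          address: "module.db",
          resources: [
            { address: "module.db.aws_db_instance.main", type: "aws_db_instance", values: {} }
          ],
          child_modules: [
            {
              address: "module.db.module.storage",
              resources: [
                { address: "module.db.module.storage.aws_ebs_volume.data", type: "aws_ebs_volume", values: {} }
              ]
            }
          ]
        }
      ]
    }
  }
};

var addresses = collectResources(state.values.root_module).map(function(r) {
  return r.address;
});
console.log(addresses.length); // 3
```

Swapping state.values.root_module.resources for collectResources(state.values.root_module) in the check functions would extend them to module-managed resources.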

Cost Estimation in Tests

Deploying infrastructure without understanding the cost is as reckless as deploying without testing. Infracost integrates into your test pipeline to catch cost surprises before they hit your bill.

// scripts/cost-check.js
var child_process = require("child_process");
var path = require("path");

function getInfracostBreakdown(terraformDir) {
  var cmd = [
    "infracost", "breakdown",
    "--path", terraformDir,
    "--format", "json",
    "--no-color"
  ].join(" ");

  var result = child_process.execSync(cmd, { encoding: "utf-8" });
  return JSON.parse(result);
}

function checkCostThresholds(breakdown, maxMonthlyCost, maxHourlyCost) {
  var totalMonthlyCost = parseFloat(breakdown.totalMonthlyCost);
  var totalHourlyCost = parseFloat(breakdown.totalHourlyCost);

  var violations = [];

  if (totalMonthlyCost > maxMonthlyCost) {
    violations.push(
      "Monthly cost $" + totalMonthlyCost.toFixed(2) +
      " exceeds threshold $" + maxMonthlyCost.toFixed(2)
    );
  }

  if (totalHourlyCost > maxHourlyCost) {
    violations.push(
      "Hourly cost $" + totalHourlyCost.toFixed(2) +
      " exceeds threshold $" + maxHourlyCost.toFixed(2)
    );
  }

  // Check for expensive individual resources
  var projects = breakdown.projects || [];
  projects.forEach(function(project) {
    var resources = project.breakdown.resources || [];
    resources.forEach(function(resource) {
      var resourceMonthlyCost = parseFloat(resource.monthlyCost || 0);
      if (resourceMonthlyCost > 500) {
        violations.push(
          "Resource " + resource.name + " costs $" +
          resourceMonthlyCost.toFixed(2) + "/month — needs review"
        );
      }
    });
  });

  return violations;
}

var terraformDir = process.argv[2] || ".";
var maxMonthly = parseFloat(process.argv[3]) || 1000;
var maxHourly = parseFloat(process.argv[4]) || 2;

var breakdown = getInfracostBreakdown(terraformDir);
var violations = checkCostThresholds(breakdown, maxMonthly, maxHourly);

if (violations.length > 0) {
  console.error("Cost policy violations:");
  violations.forEach(function(v) {
    console.error("  - " + v);
  });
  process.exit(1);
}

console.log("Cost check passed: $" +
  parseFloat(breakdown.totalMonthlyCost).toFixed(2) + "/month");

CI/CD Pipeline Integration

All these tests need to run automatically. Here is a GitHub Actions workflow that implements the full IaC testing pyramid:

# .github/workflows/iac-tests.yml
name: Infrastructure Tests

on:
  pull_request:
    paths:
      - 'infrastructure/**'
      - 'modules/**'
      - 'policies/**'

env:
  TF_VERSION: "1.7.0"
  AWS_REGION: "us-east-1"

jobs:
  static-analysis:
    name: Static Analysis
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run tflint
        uses: terraform-linters/setup-tflint@v4
        with:
          tflint_version: latest
      - run: |
          tflint --init
          tflint --recursive --format compact

      - name: Run Checkov
        uses: bridgecrewio/checkov-action@master
        with:
          directory: infrastructure/
          framework: terraform
          output_format: junitxml
          output_file_path: checkov-results.xml

      - name: Upload Checkov results
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: checkov-results
          path: checkov-results.xml

  unit-tests:
    name: Unit Tests
    runs-on: ubuntu-latest
    needs: static-analysis
    steps:
      - uses: actions/checkout@v4

      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ${{ env.TF_VERSION }}

      - name: Terraform Unit Tests
        run: |
          cd modules/networking
          terraform init
          terraform test -verbose

      - uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: CDK Unit Tests
        run: |
          cd infrastructure/cdk
          npm ci
          npm test

  policy-check:
    name: Policy Evaluation
    runs-on: ubuntu-latest
    needs: unit-tests
    steps:
      - uses: actions/checkout@v4

      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ${{ env.TF_VERSION }}

      - name: Generate plan
        run: |
          cd infrastructure
          terraform init -backend=false
          terraform plan -out=tfplan.binary
          terraform show -json tfplan.binary > tfplan.json

      - name: OPA policy check
        uses: open-policy-agent/setup-opa@v2
      - run: |
          opa eval \
            --data policies/ \
            --input infrastructure/tfplan.json \
            --format json \
            "data.terraform.networking.deny" | \
          node -e "
            var input = '';
            process.stdin.on('data', function(d){ input += d; });
            process.stdin.on('end', function(){
              var result = JSON.parse(input);
              var violations = result.result[0].expressions[0].value;
              if (violations.length > 0) {
                violations.forEach(function(v){ console.error(v); });
                process.exit(1);
              }
              console.log('All policies passed');
            });
          "

      - name: Cost estimation
        uses: infracost/actions/setup@v3
        with:
          api-key: ${{ secrets.INFRACOST_API_KEY }}
      - run: |
          # cost-check.js invokes infracost breakdown itself
          node scripts/cost-check.js infrastructure/ 2000 5

  integration-tests:
    name: Integration Tests
    runs-on: ubuntu-latest
    needs: [unit-tests, policy-check]
    if: contains(github.event.pull_request.labels.*.name, 'run-integration')
    permissions:
      id-token: write
      contents: read
    steps:
      - uses: actions/checkout@v4

      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_TEST_ROLE_ARN }}
          aws-region: ${{ env.AWS_REGION }}

      - uses: actions/setup-go@v5
        with:
          go-version: '1.21'

      - name: Run Terratest
        run: |
          cd test
          go test -v -timeout 30m ./...
        env:
          AWS_DEFAULT_REGION: ${{ env.AWS_REGION }}

Complete Working Example

Here is a complete test suite for a Terraform networking module. This ties together unit tests, contract tests, and policy tests into a single runnable project.

Module Structure

modules/
  networking/
    main.tf
    variables.tf
    outputs.tf
    tests/
      networking.tftest.hcl
test/
  contract.test.js
  policy.test.js
  networking_test.go
  run-tests.js
policies/
  terraform/
    networking.rego

The Module Under Test

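# modules/networking/main.tf
# Excerpt: resources referenced but not shown here, such as
# aws_cloudwatch_log_group.flow_log and aws_iam_role.flow_log, are omitted.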
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = merge(var.common_tags, {
    Name = "${var.name_prefix}-vpc"
  })
}

resource "aws_subnet" "private" {
  count             = length(var.azs)
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 8, count.index)
  availability_zone = var.azs[count.index]

  tags = merge(var.common_tags, {
    Name = "${var.name_prefix}-private-${var.azs[count.index]}"
    Tier = "private"
  })
}

resource "aws_subnet" "public" {
  count                   = length(var.azs)
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index + length(var.azs))
  availability_zone       = var.azs[count.index]
  map_public_ip_on_launch = true

  tags = merge(var.common_tags, {
    Name = "${var.name_prefix}-public-${var.azs[count.index]}"
    Tier = "public"
  })
}

resource "aws_nat_gateway" "main" {
  count         = var.enable_nat ? length(var.azs) : 0
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id

  tags = merge(var.common_tags, {
    Name = "${var.name_prefix}-nat-${var.azs[count.index]}"
  })
}

resource "aws_eip" "nat" {
  count  = var.enable_nat ? length(var.azs) : 0
  domain = "vpc"

  tags = merge(var.common_tags, {
    Name = "${var.name_prefix}-eip-${var.azs[count.index]}"
  })
}

resource "aws_flow_log" "main" {
  count                = var.enable_flowlog ? 1 : 0
  vpc_id               = aws_vpc.main.id
  traffic_type         = "ALL"
  log_destination_type = "cloud-watch-logs"
  log_destination      = aws_cloudwatch_log_group.flow_log[0].arn
  iam_role_arn         = aws_iam_role.flow_log[0].arn
}

Test Runner Script

// test/run-tests.js
var child_process = require("child_process");
var path = require("path");

var results = {
  passed: 0,
  failed: 0,
  errors: []
};

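// Runs one test command synchronously and records the outcome.
// execSync throws on a non-zero exit code, so the catch branch marks a failure.
// Command output goes straight to the terminal because stdio is inherited.
function runTest(name, command, cwd) {
  console.log("\n=== " + name + " ===");
  try {
    child_process.execSync(command, {
      cwd: cwd || process.cwd(),
      stdio: "inherit"
    });
    results.passed++;
    console.log("PASS: " + name);
  } catch (err) {
    results.failed++;
    results.errors.push(name + ": " + (err.message || "unknown error"));
    console.error("FAIL: " + name);
  }
}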

// Layer 1: Static Analysis
runTest(
  "tflint",
  "tflint --recursive",
  path.resolve(__dirname, "../modules")
);

runTest(
  "checkov",
  "checkov -d . --framework terraform --quiet",
  path.resolve(__dirname, "../modules/networking")
);

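// Layer 2: Unit Tests
// terraform test needs an initialized working directory. -backend=false
// installs providers and modules without configuring remote state.
runTest(
  "terraform init",
  "terraform init -backend=false -input=false",
  path.resolve(__dirname, "../modules/networking")
);

runTest(
  "terraform test",
  "terraform test -verbose",
  path.resolve(__dirname, "../modules/networking")
);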

// Layer 3: Contract Tests
runTest(
  "contract tests",
  "node " + path.resolve(__dirname, "contract.test.js")
);

// Layer 4: Policy Tests
runTest(
  "OPA policy evaluation",
  "node " + path.resolve(__dirname, "policy.test.js")
);

// Summary
console.log("\n=== Test Summary ===");
console.log("Passed: " + results.passed);
console.log("Failed: " + results.failed);

if (results.errors.length > 0) {
  console.error("\nFailures:");
  results.errors.forEach(function(e) {
    console.error("  - " + e);
  });
  process.exit(1);
}

console.log("\nAll tests passed");

Common Issues and Troubleshooting

1. Terratest tests leave orphaned resources after failure

Deferred calls such as defer terraform.Destroy() run when a test fails or panics in its own goroutine, but not when the go test timeout fires or the CI runner kills the process. Use a scheduled cleanup job that scans for resources tagged with Environment=test older than 4 hours and deletes them. AWS Resource Groups Tag Editor helps find these. Also set TF_CLI_ARGS_apply="-lock-timeout=5m" to handle lock contention from parallel tests.
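The selection step of such a cleanup job can be sketched as a pure function. This is a minimal sketch, not a full cleanup tool: the resource shape ({ id, tags, createdAt }) and the helper name findStaleTestResources are assumptions for illustration, and the inventory would in practice come from whatever listing API or tag query you already use.

```javascript
// Hypothetical helper: pick test resources old enough to count as orphaned.
// The { id, tags, createdAt } shape is an assumption, not an AWS API response.
var MAX_AGE_HOURS = 4;

function findStaleTestResources(resources, now) {
  var cutoff = now.getTime() - MAX_AGE_HOURS * 60 * 60 * 1000;
  return resources.filter(function (resource) {
    var tags = resource.tags || {};
    var isTest = tags.Environment === "test";
    var isOld = new Date(resource.createdAt).getTime() < cutoff;
    // Both conditions must hold, so untagged or production resources are never selected.
    return isTest && isOld;
  });
}

// Sample inventory with made-up IDs and timestamps.
var now = new Date("2024-01-01T12:00:00Z");
var inventory = [
  { id: "vpc-old", tags: { Environment: "test" }, createdAt: "2024-01-01T06:00:00Z" },
  { id: "vpc-new", tags: { Environment: "test" }, createdAt: "2024-01-01T11:00:00Z" },
  { id: "vpc-prod", tags: { Environment: "production" }, createdAt: "2023-06-01T00:00:00Z" }
];

var stale = findStaleTestResources(inventory, now);
stale.forEach(function (resource) {
  console.log("stale test resource: " + resource.id);
});
```

Keeping the selection logic separate from the deletion calls makes it easy to unit test the filter and to run the job in a dry-run mode first.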

2. CDK snapshot tests break on every CDK version upgrade

CDK generates logical IDs and metadata that change between versions. Filter out aws:cdk:path metadata and CDKMetadata resources from your snapshots. Compare only the resource types, properties, and dependency graph, not the entire template. Some teams skip snapshot testing entirely and rely on property-based assertions instead.
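One way to apply that filtering is to normalize the synthesized template before it reaches the snapshot. The sketch below assumes the template is a plain object, such as the result of Template.fromStack(stack).toJSON() from aws-cdk-lib/assertions; the function name normalizeTemplate and the sample template are hypothetical.

```javascript
// Hypothetical normalizer: removes version-sensitive CDK metadata from a
// synthesized CloudFormation template so snapshots stay stable.
function normalizeTemplate(template) {
  // Deep copy so the caller's template object is left untouched.
  var normalized = JSON.parse(JSON.stringify(template));
  var resources = normalized.Resources || {};

  Object.keys(resources).forEach(function (logicalId) {
    var resource = resources[logicalId];
    if (resource.Type === "AWS::CDK::Metadata") {
      delete resources[logicalId];
      return;
    }
    if (resource.Metadata) {
      delete resource.Metadata["aws:cdk:path"];
      if (Object.keys(resource.Metadata).length === 0) {
        delete resource.Metadata;
      }
    }
  });

  return normalized;
}

// Sample template with placeholder values.
var sampleTemplate = {
  Resources: {
    Bucket: {
      Type: "AWS::S3::Bucket",
      Properties: { VersioningConfiguration: { Status: "Enabled" } },
      Metadata: { "aws:cdk:path": "Stack/Bucket/Resource" }
    },
    CDKMetadata: {
      Type: "AWS::CDK::Metadata",
      Properties: { Analytics: "placeholder" }
    }
  }
};

var cleaned = normalizeTemplate(sampleTemplate);
console.log(JSON.stringify(cleaned, null, 2));
```

In a Jest test the normalized object would be what you pass to expect(...).toMatchSnapshot(). Logical ID churn is not addressed here; property-based assertions remain the more robust option for that.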

3. terraform test mock provider does not simulate all behaviors

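The mock_provider in terraform test (available from Terraform 1.7) never calls the real provider, so computed attributes receive generated placeholders: zero for numbers, false for booleans, empty collections, and random strings for string attributes. If your module logic or assertions depend on realistically shaped values such as ARNs or IDs, they will fail against those placeholders. Use override_resource blocks, or mock_resource defaults inside the mock_provider block, to supply realistic values for the computed attributes that matter to your test logic.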

4. OPA policies pass locally but fail in CI

This usually happens because the Terraform plan JSON structure differs between Terraform versions. Pin your Terraform version in CI to match local development. Also verify that the terraform show -json output matches what your Rego policies expect: the schema changed between Terraform 0.12, 0.13, and 1.x, and the format_version field at the top of the plan JSON tells you which revision you are reading.
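A cheap guard is to check the plan's version fields before handing it to OPA, so a version mismatch fails with a clear message instead of a confusing policy result. This is a sketch under assumptions: the function name checkPlanCompatibility, the shape of the expected object, and the version numbers are all illustrative. The format_version and terraform_version keys are top-level fields in the JSON that terraform show -json emits.

```javascript
// Hypothetical guard run before policy evaluation.
// `expected` is an assumed shape: { formatMajor: "1", terraformVersion: "x.y.z" }.
function checkPlanCompatibility(plan, expected) {
  var problems = [];
  var formatVersion = String(plan.format_version || "");
  var formatMajor = formatVersion.split(".")[0];

  if (formatVersion === "") {
    problems.push("plan JSON has no format_version field");
  } else if (formatMajor !== expected.formatMajor) {
    problems.push(
      "format_version " + formatVersion +
      " does not match expected major version " + expected.formatMajor
    );
  }

  if (plan.terraform_version !== expected.terraformVersion) {
    problems.push(
      "terraform_version " + plan.terraform_version +
      " differs from pinned version " + expected.terraformVersion
    );
  }

  return problems;
}

// Example values only; use the versions your team actually pins.
var pinned = { formatMajor: "1", terraformVersion: "1.6.6" };
var localPlan = { format_version: "1.2", terraform_version: "1.6.6", resource_changes: [] };
var ciPlan = { format_version: "1.2", terraform_version: "1.7.0", resource_changes: [] };

var localProblems = checkPlanCompatibility(localPlan, pinned);
var ciProblems = checkPlanCompatibility(ciPlan, pinned);
console.log("local problems: " + localProblems.length);
console.log("CI problems: " + ciProblems.length);
```

Comparing only the major part of format_version is a design choice for the sketch; a stricter check could require an exact match.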

5. Checkov false positives block the pipeline

Checkov is aggressive by default. Use .checkov.yaml to skip rules that do not apply to your environment. Document every suppression with a reason. Consider running Checkov in "soft-fail" mode during initial adoption and gradually making it a hard gate as you fix existing violations.
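The "document every suppression" rule can itself be checked automatically. Besides the config file, Checkov supports inline suppressions of the form #checkov:skip=<check_id>:<reason>, and the sketch below assumes that inline style. The function name findUndocumentedSkips and the sample Terraform snippet are hypothetical.

```javascript
// Matches inline Checkov suppressions: "#checkov:skip=CHECK_ID" with an
// optional ":reason" suffix.
var SKIP_PATTERN = /#\s*checkov:skip=([A-Za-z0-9_]+)(?::\s*(.*))?$/;

// Hypothetical lint step: report suppressions that carry no reason text.
function findUndocumentedSkips(source) {
  var findings = [];
  source.split("\n").forEach(function (line, index) {
    var match = SKIP_PATTERN.exec(line.trim());
    if (!match) {
      return;
    }
    var reason = (match[2] || "").trim();
    if (reason.length === 0) {
      findings.push({ line: index + 1, checkId: match[1] });
    }
  });
  return findings;
}

// Sample Terraform source; the bucket and check IDs are illustrative.
var exampleSource = [
  'resource "aws_s3_bucket" "site" {',
  "  #checkov:skip=CKV_AWS_20:Public static website bucket, approved by security",
  "  #checkov:skip=CKV_AWS_18",
  '  bucket = "example-site"',
  "}"
].join("\n");

var undocumented = findUndocumentedSkips(exampleSource);
undocumented.forEach(function (finding) {
  console.log("line " + finding.line + ": " + finding.checkId + " is skipped without a reason");
});
```

Running a check like this in the static analysis layer keeps suppressions reviewable as the list grows.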

6. Integration tests are too slow for PR workflows

Run integration tests only on merge to main, or trigger them with a PR label (run-integration). Use Terraform workspaces or unique naming prefixes to enable parallel test execution. Cache Terraform provider binaries in CI to save 30-60 seconds per run.
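The gating rule described above can be written as a small decision function that a CI script calls before launching the slow suite. This is only a sketch: the event shape ({ type, branch, labels }) and the function name shouldRunIntegrationTests are assumptions, and you would map your CI system's real payload onto it.

```javascript
// Hypothetical gate: decide whether the integration suite should run.
// The event object is an assumed, simplified shape, not a real CI payload.
function shouldRunIntegrationTests(event) {
  var labels = event.labels || [];

  // Always run after a merge to main.
  if (event.type === "push" && event.branch === "main") {
    return true;
  }

  // On pull requests, run only when someone opts in with the label.
  if (event.type === "pull_request" && labels.indexOf("run-integration") !== -1) {
    return true;
  }

  return false;
}

var examples = [
  { type: "push", branch: "main" },
  { type: "pull_request", branch: "feature/vpc", labels: ["run-integration"] },
  { type: "pull_request", branch: "feature/vpc", labels: ["docs"] }
];

examples.forEach(function (event) {
  console.log(event.type + " on " + event.branch + ": " + shouldRunIntegrationTests(event));
});
```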

Best Practices

  • Test at the lowest possible level first. Static analysis and unit tests run in seconds with zero cloud cost. Push as much validation as possible into these layers before reaching for integration tests.

  • Use unique naming in all test infrastructure. Append timestamps or random strings to resource names. Parallel test runs will collide on hardcoded names, producing flaky failures that are painful to debug.

  • Tag all test resources consistently. Use Environment=test and CreatedBy=ci tags on everything. This makes cleanup trivial and prevents accidental deletion of real resources.

  • Run integration tests in an isolated AWS account. Never run Terratest against your production or staging account. Use AWS Organizations to create a dedicated test account with restricted permissions and budget alerts.

  • Pin tool versions in CI. Terraform, tflint, Checkov, OPA, and Infracost all release frequently. Version drift between local development and CI causes false failures. Use lockfiles, version constraints, and pinned action versions.

  • Treat policy violations as build failures. Once a policy is adopted, make it a hard gate in the pipeline. "Soft fail" modes create technical debt because nobody goes back to fix warnings.

  • Test module upgrades in isolation before rolling them out. When you update a shared module version, run the full test suite against the new version in a feature branch before merging. Breaking changes in modules cascade across every consumer.

  • Keep test infrastructure costs visible. Run Infracost on every PR and post the cost diff as a PR comment. Engineers make better decisions when cost impact is visible at review time, not 30 days later on the AWS bill.

  • Version your policy rules alongside your infrastructure code. Store Rego policies, Checkov configurations, and Sentinel rules in the same repository as your Terraform modules. Policy and infrastructure should evolve together.
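The unique naming and consistent tagging practices can be combined in one small helper that every test run calls. The sketch below assumes the module exposes name_prefix and common_tags variables, as the networking module in this article does; the helper names buildTestIdentity and toTfvarsJson are hypothetical.

```javascript
var crypto = require("crypto");

// Hypothetical helper: generate a collision-resistant name prefix plus the
// standard test tags for one test run.
function buildTestIdentity(baseName) {
  var suffix = crypto.randomBytes(4).toString("hex"); // 8 hex characters
  return {
    namePrefix: baseName + "-" + suffix,
    tags: { Environment: "test", CreatedBy: "ci" }
  };
}

// Serialize the identity as the contents of a .tfvars.json file, which
// Terraform can load with -var-file.
function toTfvarsJson(identity) {
  return JSON.stringify(
    { name_prefix: identity.namePrefix, common_tags: identity.tags },
    null,
    2
  );
}

var identity = buildTestIdentity("nettest");
var tfvars = toTfvarsJson(identity);
console.log(tfvars);
```

Writing the result to a file such as test.tfvars.json and passing it with -var-file keeps names and tags out of hardcoded test fixtures, so parallel runs do not collide.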
