Terraform Modules: Reusable Infrastructure Patterns
Build reusable Terraform modules for Node.js infrastructure with composition patterns, versioning, and testing strategies
Overview
Terraform modules are self-contained packages of Terraform configuration that encapsulate a set of resources into a single, reusable unit. They solve the same problem that functions solve in programming -- you define the logic once, parameterize the inputs, and call it as many times as you need. If you are copying and pasting .tf files between projects, you need modules.
I have managed Terraform codebases ranging from a handful of resources to thousands across dozens of AWS accounts. The teams that invested early in well-structured modules shipped infrastructure changes faster, had fewer drift issues, and onboarded new engineers in days instead of weeks. The teams that treated Terraform like a pile of flat configuration files ended up with unmaintainable sprawl. This article covers how to build, structure, version, test, and compose Terraform modules for real production infrastructure.
Prerequisites
- Terraform 1.3+ installed locally
- AWS CLI configured with valid credentials
- Basic understanding of Terraform resources, providers, and state
- Git for module versioning
- Go 1.19+ (for Terratest examples)
- Node.js 18+ (for the microservice deployment example)
What Modules Are and When to Create Them
A Terraform module is any directory containing .tf files. Every Terraform configuration is technically a module -- the root directory you run terraform apply in is the "root module." When people say "module," they usually mean a child module: a reusable package called from the root module using a module block.
Create a module when:
- You deploy the same resource pattern more than twice
- Multiple teams need the same infrastructure with different parameters
- You want to enforce organizational standards (tagging, naming, security groups)
- A logical grouping of resources represents a single concept (a VPC, a microservice, a database cluster)
Do not create a module when:
- You have a one-off resource that will never be reused
- The abstraction adds complexity without reducing duplication
- You are wrapping a single resource with no additional logic
The rule of thumb: if you would write a function for it in application code, write a module for it in Terraform.
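To make the function analogy concrete, here is a minimal sketch of calling the same hypothetical child module twice with different inputs -- the ./modules/s3-bucket path and its variables are placeholders, not a module defined in this article:

# Two "calls" to the same hypothetical module, the way a function is
# called twice with different arguments.
module "logs_bucket" {
  source      = "./modules/s3-bucket" # placeholder local module
  bucket_name = "acme-app-logs"
  versioning  = false
}

module "artifacts_bucket" {
  source      = "./modules/s3-bucket"
  bucket_name = "acme-app-artifacts"
  versioning  = true
}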
Module Structure
A well-structured module follows a consistent file layout. Every module I write uses this structure:
modules/
  vpc/
    main.tf          # Primary resource definitions
    variables.tf     # Input variable declarations
    outputs.tf       # Output value declarations
    versions.tf      # Required providers and Terraform version
    README.md        # Usage documentation
    examples/
      simple/
        main.tf      # Minimal usage example
      complete/
        main.tf      # Full-featured usage example
    tests/
      vpc_test.go    # Terratest integration tests
variables.tf
Declare every input with a description, type constraint, and sensible default where appropriate:
# variables.tf
variable "name" {
description = "Name prefix for all resources in this module"
type = string
validation {
condition = length(var.name) >= 3 && length(var.name) <= 24
error_message = "Name must be between 3 and 24 characters."
}
}
variable "vpc_cidr" {
description = "CIDR block for the VPC"
type = string
default = "10.0.0.0/16"
validation {
condition = can(cidrhost(var.vpc_cidr, 0))
error_message = "vpc_cidr must be a valid CIDR block."
}
}
variable "availability_zones" {
description = "List of availability zones to deploy into"
type = list(string)
default = ["us-east-1a", "us-east-1b", "us-east-1c"]
}
variable "enable_nat_gateway" {
description = "Whether to create NAT gateways for private subnets"
type = bool
default = true
}
variable "single_nat_gateway" {
description = "Use a single NAT gateway instead of one per AZ (cost savings for non-production)"
type = bool
default = false
}
variable "tags" {
description = "Map of tags to apply to all resources"
type = map(string)
default = {}
}
outputs.tf
Expose the values that callers need. Think of outputs as the module's public API:
# outputs.tf
output "vpc_id" {
description = "ID of the created VPC"
value = aws_vpc.this.id
}
output "public_subnet_ids" {
description = "List of public subnet IDs"
value = aws_subnet.public[*].id
}
output "private_subnet_ids" {
description = "List of private subnet IDs"
value = aws_subnet.private[*].id
}
output "nat_gateway_ips" {
description = "Elastic IP addresses of the NAT gateways"
value = aws_eip.nat[*].public_ip
}
versions.tf
Pin the Terraform version and provider versions. This prevents surprise breakage:
# versions.tf
terraform {
required_version = ">= 1.3.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = ">= 5.0, < 6.0"
}
}
}
Local vs Remote Modules
Local Modules
Local modules live in your repository and are referenced by relative path:
module "vpc" {
source = "../../modules/vpc"
name = "production"
vpc_cidr = "10.0.0.0/16"
enable_nat_gateway = true
}
Use local modules when you are iterating rapidly or when the module is only used within a single repository. The downside is that every consumer of the module is pinned to whatever version is in the repo -- there is no independent versioning.
Remote Modules
Remote modules are published to a registry or stored in a Git repository. They support versioning:
# From a Git repository with a tag
module "vpc" {
source = "git::https://github.com/your-org/terraform-modules.git//modules/vpc?ref=v2.1.0"
name = "production"
vpc_cidr = "10.0.0.0/16"
enable_nat_gateway = true
}
# From the Terraform Registry
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "5.1.0"
name = "production"
cidr = "10.0.0.0/16"
}
Use remote modules when multiple teams or repositories consume the same module. The version pinning gives you controlled rollouts -- team A can upgrade to v3.0.0 while team B stays on v2.1.0 until they are ready.
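As a sketch of that controlled rollout, two consuming configurations can pin different tags of the same module source -- the repository URL and CIDRs below are placeholders:

# team-a/main.tf -- already migrated to the v3 interface
module "vpc" {
  source   = "git::https://github.com/your-org/terraform-modules.git//modules/vpc?ref=v3.0.0"
  name     = "team-a"
  vpc_cidr = "10.10.0.0/16"
}

# team-b/main.tf -- stays on v2 until the team schedules the upgrade
module "vpc" {
  source   = "git::https://github.com/your-org/terraform-modules.git//modules/vpc?ref=v2.1.0"
  name     = "team-b"
  vpc_cidr = "10.20.0.0/16"
}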
Terraform Registry Modules
The Terraform Registry at registry.terraform.io hosts thousands of community and official modules. Before writing your own module, check if a well-maintained one already exists. The official AWS VPC module (terraform-aws-modules/vpc/aws) has been battle-tested by thousands of organizations and handles edge cases you have not thought of yet.
That said, registry modules are often more complex than you need. If you only need a VPC with public and private subnets, a 50-line module you wrote is easier to understand and debug than a 2000-line community module with 80 input variables. Use registry modules for complex, well-standardized patterns (VPC, EKS, RDS). Write your own for organization-specific patterns.
Creating a VPC Module
Here is a production VPC module that creates public subnets, private subnets, NAT gateways, and route tables:
# modules/vpc/main.tf
resource "aws_vpc" "this" {
cidr_block = var.vpc_cidr
enable_dns_support = true
enable_dns_hostnames = true
tags = merge(var.tags, {
Name = "${var.name}-vpc"
})
}
resource "aws_internet_gateway" "this" {
vpc_id = aws_vpc.this.id
tags = merge(var.tags, {
Name = "${var.name}-igw"
})
}
resource "aws_subnet" "public" {
count = length(var.availability_zones)
vpc_id = aws_vpc.this.id
cidr_block = cidrsubnet(var.vpc_cidr, 8, count.index)
availability_zone = var.availability_zones[count.index]
map_public_ip_on_launch = true
tags = merge(var.tags, {
Name = "${var.name}-public-${var.availability_zones[count.index]}"
Tier = "public"
})
}
resource "aws_subnet" "private" {
count = length(var.availability_zones)
vpc_id = aws_vpc.this.id
cidr_block = cidrsubnet(var.vpc_cidr, 8, count.index + length(var.availability_zones))
availability_zone = var.availability_zones[count.index]
tags = merge(var.tags, {
Name = "${var.name}-private-${var.availability_zones[count.index]}"
Tier = "private"
})
}
resource "aws_eip" "nat" {
count = var.enable_nat_gateway ? (var.single_nat_gateway ? 1 : length(var.availability_zones)) : 0
domain = "vpc"
tags = merge(var.tags, {
Name = "${var.name}-nat-eip-${count.index}"
})
}
resource "aws_nat_gateway" "this" {
count = var.enable_nat_gateway ? (var.single_nat_gateway ? 1 : length(var.availability_zones)) : 0
allocation_id = aws_eip.nat[count.index].id
subnet_id = aws_subnet.public[count.index].id
tags = merge(var.tags, {
Name = "${var.name}-nat-${count.index}"
})
depends_on = [aws_internet_gateway.this]
}
resource "aws_route_table" "public" {
vpc_id = aws_vpc.this.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.this.id
}
tags = merge(var.tags, {
Name = "${var.name}-public-rt"
})
}
resource "aws_route_table_association" "public" {
count = length(var.availability_zones)
subnet_id = aws_subnet.public[count.index].id
route_table_id = aws_route_table.public.id
}
resource "aws_route_table" "private" {
count = var.enable_nat_gateway ? (var.single_nat_gateway ? 1 : length(var.availability_zones)) : 0
vpc_id = aws_vpc.this.id
route {
cidr_block = "0.0.0.0/0"
nat_gateway_id = aws_nat_gateway.this[var.single_nat_gateway ? 0 : count.index].id
}
tags = merge(var.tags, {
Name = "${var.name}-private-rt-${count.index}"
})
}
resource "aws_route_table_association" "private" {
count = length(var.availability_zones)
subnet_id = aws_subnet.private[count.index].id
route_table_id = aws_route_table.private[var.single_nat_gateway ? 0 : count.index].id
}
Creating an Application Deployment Module
This module deploys a Node.js application to ECS Fargate with an ALB, auto-scaling, and CloudWatch logging:
# modules/node-service/main.tf
resource "aws_ecs_task_definition" "this" {
family = var.service_name
network_mode = "awsvpc"
requires_compatibilities = ["FARGATE"]
cpu = var.cpu
memory = var.memory
execution_role_arn = aws_iam_role.execution.arn
task_role_arn = aws_iam_role.task.arn
container_definitions = jsonencode([
{
name = var.service_name
image = "${var.ecr_repository_url}:${var.image_tag}"
cpu = var.cpu
memory = var.memory
essential = true
portMappings = [
{
containerPort = var.container_port
protocol = "tcp"
}
]
environment = [
for key, value in var.environment_variables : {
name = key
value = value
}
]
secrets = [
for key, arn in var.secret_arns : {
name = key
valueFrom = arn
}
]
logConfiguration = {
logDriver = "awslogs"
options = {
"awslogs-group" = aws_cloudwatch_log_group.this.name
"awslogs-region" = data.aws_region.current.name
"awslogs-stream-prefix" = var.service_name
}
}
healthCheck = {
command = ["CMD-SHELL", "curl -f http://localhost:${var.container_port}${var.health_check_path} || exit 1"]
interval = 30
timeout = 5
retries = 3
startPeriod = 60
}
}
])
tags = var.tags
}
resource "aws_ecs_service" "this" {
name = var.service_name
cluster = var.ecs_cluster_id
task_definition = aws_ecs_task_definition.this.arn
desired_count = var.desired_count
launch_type = "FARGATE"
network_configuration {
subnets = var.private_subnet_ids
security_groups = [aws_security_group.service.id]
assign_public_ip = false
}
load_balancer {
target_group_arn = aws_lb_target_group.this.arn
container_name = var.service_name
container_port = var.container_port
}
deployment_circuit_breaker {
enable = true
rollback = true
}
depends_on = [aws_lb_listener_rule.this]
tags = var.tags
}
resource "aws_appautoscaling_target" "this" {
max_capacity = var.max_count
min_capacity = var.min_count
resource_id = "service/${var.ecs_cluster_name}/${aws_ecs_service.this.name}"
scalable_dimension = "ecs:service:DesiredCount"
service_namespace = "ecs"
}
resource "aws_appautoscaling_policy" "cpu" {
name = "${var.service_name}-cpu-scaling"
policy_type = "TargetTrackingScaling"
resource_id = aws_appautoscaling_target.this.resource_id
scalable_dimension = aws_appautoscaling_target.this.scalable_dimension
service_namespace = aws_appautoscaling_target.this.service_namespace
target_tracking_scaling_policy_configuration {
predefined_metric_specification {
predefined_metric_type = "ECSServiceAverageCPUUtilization"
}
target_value = var.cpu_scaling_target
scale_in_cooldown = 300
scale_out_cooldown = 60
}
}
resource "aws_cloudwatch_log_group" "this" {
name = "/ecs/${var.service_name}"
retention_in_days = var.log_retention_days
tags = var.tags
}
data "aws_region" "current" {}
Module Composition Patterns
The real power of modules appears when you compose them together. There are three patterns I use consistently.
Pattern 1: Layered Composition
Stack modules in layers where each layer's outputs feed the next layer's inputs:
# environments/production/main.tf
module "network" {
source = "../../modules/vpc"
name = "prod"
vpc_cidr = "10.0.0.0/16"
enable_nat_gateway = true
single_nat_gateway = false
tags = local.common_tags
}
module "cluster" {
source = "../../modules/ecs-cluster"
name = "prod"
vpc_id = module.network.vpc_id
private_subnet_ids = module.network.private_subnet_ids
tags = local.common_tags
}
module "api_service" {
source = "../../modules/node-service"
service_name = "api"
ecs_cluster_id = module.cluster.cluster_id
ecs_cluster_name = module.cluster.cluster_name
private_subnet_ids = module.network.private_subnet_ids
vpc_id = module.network.vpc_id
ecr_repository_url = "123456789.dkr.ecr.us-east-1.amazonaws.com/api"
image_tag = var.api_image_tag
container_port = 3000
desired_count = 3
cpu = 512
memory = 1024
environment_variables = {
NODE_ENV = "production"
PORT = "3000"
}
tags = local.common_tags
}
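The ecs-cluster module referenced here is not shown elsewhere in this article. A minimal sketch that satisfies the inputs and outputs used above might look like this (vpc_id and private_subnet_ids are accepted for future wiring even though this sketch does not use them):

# modules/ecs-cluster/main.tf -- minimal sketch; a real module might add
# capacity providers, a default security group, or VPC endpoints.
variable "name" {
  type = string
}

variable "vpc_id" {
  type = string
}

variable "private_subnet_ids" {
  type = list(string)
}

variable "tags" {
  type    = map(string)
  default = {}
}

resource "aws_ecs_cluster" "this" {
  name = var.name

  setting {
    name  = "containerInsights"
    value = "enabled"
  }

  tags = var.tags
}

output "cluster_id" {
  value = aws_ecs_cluster.this.id
}

output "cluster_name" {
  value = aws_ecs_cluster.this.name
}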
Pattern 2: Factory Pattern
Use for_each to stamp out multiple instances of a module from a configuration map:
# Service definitions
locals {
services = {
api = {
cpu = 512
memory = 1024
desired_count = 3
container_port = 3000
image_tag = "v1.4.2"
health_path = "/health"
}
worker = {
cpu = 1024
memory = 2048
desired_count = 2
container_port = 8080
image_tag = "v2.0.1"
health_path = "/ready"
}
gateway = {
cpu = 256
memory = 512
desired_count = 2
container_port = 4000
image_tag = "v1.1.0"
health_path = "/health"
}
}
}
module "services" {
source = "../../modules/node-service"
for_each = local.services
service_name = each.key
ecs_cluster_id = module.cluster.cluster_id
ecs_cluster_name = module.cluster.cluster_name
private_subnet_ids = module.network.private_subnet_ids
vpc_id = module.network.vpc_id
ecr_repository_url = "123456789.dkr.ecr.us-east-1.amazonaws.com/${each.key}"
image_tag = each.value.image_tag
container_port = each.value.container_port
desired_count = each.value.desired_count
cpu = each.value.cpu
memory = each.value.memory
health_check_path = each.value.health_path
tags = merge(local.common_tags, {
Service = each.key
})
}
Pattern 3: Wrapper Module
Create a higher-level module that composes lower-level modules internally:
# modules/microservice-stack/main.tf
# This module creates everything a Node.js microservice needs
module "database" {
source = "../rds-postgres"
name = var.service_name
vpc_id = var.vpc_id
private_subnet_ids = var.private_subnet_ids
instance_class = var.db_instance_class
engine_version = "15.4"
tags = var.tags
}
module "service" {
source = "../node-service"
service_name = var.service_name
ecs_cluster_id = var.ecs_cluster_id
ecs_cluster_name = var.ecs_cluster_name
private_subnet_ids = var.private_subnet_ids
vpc_id = var.vpc_id
ecr_repository_url = var.ecr_repository_url
image_tag = var.image_tag
container_port = var.container_port
desired_count = var.desired_count
cpu = var.cpu
memory = var.memory
environment_variables = merge(var.environment_variables, {
DATABASE_URL = module.database.connection_string
})
secret_arns = merge(var.secret_arns, {
DB_PASSWORD = module.database.password_secret_arn
})
tags = var.tags
}
module "monitoring" {
source = "../cloudwatch-alarms"
service_name = var.service_name
cluster_name = var.ecs_cluster_name
alarm_actions = var.alarm_sns_topic_arns
tags = var.tags
}
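One detail that is easy to miss with wrapper modules: callers can only see what the wrapper re-exports, so nested module outputs must be passed through explicitly. A short sketch of the wrapper's outputs.tf (the service_name output assumes the node-service module exposes one):

# modules/microservice-stack/outputs.tf
# Callers of the wrapper only see what it re-exports from the nested modules.
output "database_connection_string" {
  description = "Connection string from the nested rds-postgres module"
  value       = module.database.connection_string
  sensitive   = true
}

output "database_password_secret_arn" {
  description = "Secrets Manager ARN holding the database password"
  value       = module.database.password_secret_arn
}

output "service_name" {
  description = "Name of the deployed ECS service (assumes the node-service module exposes this output)"
  value       = module.service.service_name
}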
Versioning Modules with Git Tags
When your modules live in a Git repository, use semantic versioning with Git tags:
# Tag a release
git tag -a v1.0.0 -m "Initial release of VPC module"
git push origin v1.0.0
# Tag a breaking change
git tag -a v2.0.0 -m "Remove deprecated single_nat_gateway variable"
git push origin v2.0.0
In your module source, reference the tag:
module "vpc" {
source = "git::https://github.com/your-org/terraform-modules.git//modules/vpc?ref=v1.0.0"
}
Follow these versioning rules:
- Patch (v1.0.1): Bug fixes, documentation updates, adding a new output
- Minor (v1.1.0): New optional variables with defaults, new resources that do not affect existing ones
- Major (v2.0.0): Removing variables, renaming resources (causes destroy/recreate), changing variable types
A major version bump in a Terraform module can mean resources get destroyed and recreated. Communicate these changes clearly in a CHANGELOG and give teams time to migrate. I have seen a "minor" module update destroy a production RDS instance because a resource was renamed -- treat resource naming changes as breaking.
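Since Terraform 1.1 you can soften a rename by shipping a moved block with the new module version, so existing state is re-addressed instead of destroyed and recreated. A sketch for a hypothetical rename inside the VPC module:

# modules/vpc/moved.tf
# Records that the resource formerly addressed as aws_vpc.main is now
# aws_vpc.this, so consumers upgrading the module keep their existing VPC.
moved {
  from = aws_vpc.main
  to   = aws_vpc.this
}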
Module Testing with Terratest
Terratest is a Go library that lets you write automated tests for Terraform modules. Tests deploy real infrastructure, validate it, and tear it down:
// tests/vpc_test.go
package test
import (
"testing"
"github.com/gruntwork-io/terratest/modules/aws"
"github.com/gruntwork-io/terratest/modules/terraform"
"github.com/stretchr/testify/assert"
)
func TestVpcModule(t *testing.T) {
t.Parallel()
awsRegion := "us-east-1"
terraformOptions := terraform.WithDefaultRetryableErrors(t, &terraform.Options{
TerraformDir: "../examples/simple",
Vars: map[string]interface{}{
"name": "test-vpc",
"vpc_cidr": "10.99.0.0/16",
"availability_zones": []string{
"us-east-1a",
"us-east-1b",
},
"enable_nat_gateway": true,
"single_nat_gateway": true,
},
EnvVars: map[string]string{
"AWS_DEFAULT_REGION": awsRegion,
},
})
defer terraform.Destroy(t, terraformOptions)
terraform.InitAndApply(t, terraformOptions)
// Validate VPC was created
vpcId := terraform.Output(t, terraformOptions, "vpc_id")
assert.NotEmpty(t, vpcId)
// Validate VPC CIDR
vpc := aws.GetVpcById(t, vpcId, awsRegion)
assert.Equal(t, "10.99.0.0/16", vpc.CidrBlock)
// Validate subnets were created
publicSubnetIds := terraform.OutputList(t, terraformOptions, "public_subnet_ids")
assert.Equal(t, 2, len(publicSubnetIds))
privateSubnetIds := terraform.OutputList(t, terraformOptions, "private_subnet_ids")
assert.Equal(t, 2, len(privateSubnetIds))
// Validate subnets are in different AZs
subnets := aws.GetSubnetsForVpc(t, vpcId, awsRegion)
azs := make(map[string]bool)
for _, subnet := range subnets {
azs[subnet.AvailabilityZone] = true
}
assert.Equal(t, 2, len(azs))
}
func TestVpcModuleWithoutNat(t *testing.T) {
t.Parallel()
terraformOptions := terraform.WithDefaultRetryableErrors(t, &terraform.Options{
TerraformDir: "../examples/simple",
Vars: map[string]interface{}{
"name": "test-no-nat",
"vpc_cidr": "10.98.0.0/16",
"enable_nat_gateway": false,
},
})
defer terraform.Destroy(t, terraformOptions)
terraform.InitAndApply(t, terraformOptions)
natGatewayIps := terraform.OutputList(t, terraformOptions, "nat_gateway_ips")
assert.Equal(t, 0, len(natGatewayIps))
}
Run tests with:
cd modules/vpc/tests
go test -v -timeout 30m
Terratest creates real AWS resources, so run tests in a dedicated sandbox account with cost alerts. Tests typically take 5-15 minutes because NAT gateways and load balancers take time to provision.
Input Validation
Terraform 1.x supports custom validation rules on variables. Use them to catch mistakes at plan time instead of at apply time:
variable "cpu" {
description = "CPU units for the Fargate task (256, 512, 1024, 2048, 4096)"
type = number
validation {
condition = contains([256, 512, 1024, 2048, 4096], var.cpu)
error_message = "CPU must be one of: 256, 512, 1024, 2048, 4096."
}
}
variable "memory" {
description = "Memory in MB for the Fargate task"
type = number
validation {
condition = var.memory >= 512 && var.memory <= 30720
error_message = "Memory must be between 512 and 30720 MB."
}
}
variable "service_name" {
description = "Name of the ECS service"
type = string
validation {
condition = can(regex("^[a-z][a-z0-9-]{2,28}[a-z0-9]$", var.service_name))
error_message = "Service name must be 4-30 characters, lowercase alphanumeric and hyphens, starting with a letter."
}
}
variable "environment" {
description = "Deployment environment"
type = string
validation {
condition = contains(["dev", "staging", "production"], var.environment)
error_message = "Environment must be dev, staging, or production."
}
}
Validation catches errors before Terraform talks to AWS. Without it, you get cryptic API errors 5 minutes into an apply. With it, you get a clear message immediately after terraform plan.
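Validation blocks have historically only been able to reference their own variable, so cross-variable rules -- such as valid Fargate CPU and memory pairings -- are easier to express as a lifecycle precondition on a resource (available since Terraform 1.2). A sketch attached to the task definition shown earlier:

locals {
  # Valid Fargate memory values (MB) for each CPU setting
  fargate_memory = {
    "256"  = [512, 1024, 2048]
    "512"  = [1024, 2048, 3072, 4096]
    "1024" = [2048, 3072, 4096, 5120, 6144, 7168, 8192]
    "2048" = [for i in range(4, 17) : i * 1024] # 4 GB to 16 GB in 1 GB steps
    "4096" = [for i in range(8, 31) : i * 1024] # 8 GB to 30 GB in 1 GB steps
  }
}

resource "aws_ecs_task_definition" "this" {
  # ... the arguments shown earlier in this article ...

  lifecycle {
    precondition {
      condition     = contains(local.fargate_memory[tostring(var.cpu)], var.memory)
      error_message = "The memory value is not a valid Fargate pairing for the selected CPU units."
    }
  }
}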
Conditional Resource Creation
Use the count meta-argument to conditionally create resources based on input variables:
variable "create_dns_record" {
description = "Whether to create a Route53 DNS record for the service"
type = bool
default = true
}
variable "enable_waf" {
description = "Whether to attach a WAF web ACL to the ALB"
type = bool
default = false
}
resource "aws_route53_record" "this" {
count = var.create_dns_record ? 1 : 0
zone_id = var.dns_zone_id
name = "${var.service_name}.${var.domain_name}"
type = "A"
alias {
name = aws_lb.this.dns_name
zone_id = aws_lb.this.zone_id
evaluate_target_health = true
}
}
resource "aws_wafv2_web_acl_association" "this" {
count = var.enable_waf ? 1 : 0
resource_arn = aws_lb.this.arn
web_acl_arn = var.waf_acl_arn
}
# When referencing conditional resources, index the single instance and guard with try()
output "dns_name" {
description = "DNS name of the service (empty if DNS record not created)"
value = try(aws_route53_record.this[0].fqdn, "")
}
For more complex conditions, use for_each with a conditional map:
variable "enable_alarms" {
description = "Whether to create CloudWatch alarms"
type = bool
default = true
}
locals {
alarms = var.enable_alarms ? {
high_cpu = {
metric = "CPUUtilization"
threshold = 80
comparison = "GreaterThanThreshold"
}
high_memory = {
metric = "MemoryUtilization"
threshold = 85
comparison = "GreaterThanThreshold"
}
error_rate = {
metric = "HTTPCode_Target_5XX_Count"
threshold = 10
comparison = "GreaterThanThreshold"
}
} : {}
}
resource "aws_cloudwatch_metric_alarm" "this" {
for_each = local.alarms
alarm_name = "${var.service_name}-${each.key}"
comparison_operator = each.value.comparison
evaluation_periods = 2
metric_name = each.value.metric
namespace = "AWS/ECS"
period = 300
statistic = "Average"
threshold = each.value.threshold
alarm_actions = var.alarm_sns_topic_arns
dimensions = {
ClusterName = var.ecs_cluster_name
ServiceName = var.service_name
}
tags = var.tags
}
Module Documentation
Every module should have a README that shows how to call it. I generate these with terraform-docs:
# Install terraform-docs
brew install terraform-docs
# Generate README from your module's variables and outputs
terraform-docs markdown table ./modules/vpc > ./modules/vpc/README.md
This produces a table of all inputs, outputs, and resources. But do not rely solely on auto-generated docs. Add a usage example at the top:
## Usage
```hcl
module "vpc" {
source = "git::https://github.com/your-org/terraform-modules.git//modules/vpc?ref=v1.0.0"
name = "production"
vpc_cidr = "10.0.0.0/16"
enable_nat_gateway = true
single_nat_gateway = false
availability_zones = [
"us-east-1a",
"us-east-1b",
"us-east-1c",
]
tags = {
Environment = "production"
Team = "platform"
}
}
```
Complete Working Example
Here is a reusable Terraform module for deploying Node.js microservices with configurable compute (Lambda or ECS Fargate), an RDS PostgreSQL database, and CloudWatch monitoring. This is the kind of module I use to let application teams self-service their infrastructure.
Module Structure
modules/
  nodejs-microservice/
    main.tf
    variables.tf
    outputs.tf
    versions.tf
    compute_ecs.tf
    compute_lambda.tf
    database.tf
    monitoring.tf
variables.tf
# modules/nodejs-microservice/variables.tf
variable "service_name" {
description = "Name of the microservice"
type = string
validation {
condition = can(regex("^[a-z][a-z0-9-]{2,28}[a-z0-9]$", var.service_name))
error_message = "Service name must be 4-30 characters, lowercase, alphanumeric and hyphens."
}
}
variable "environment" {
description = "Deployment environment"
type = string
validation {
condition = contains(["dev", "staging", "production"], var.environment)
error_message = "Environment must be dev, staging, or production."
}
}
variable "compute_type" {
description = "Compute platform: 'ecs' for ECS Fargate or 'lambda' for Lambda"
type = string
default = "ecs"
validation {
condition = contains(["ecs", "lambda"], var.compute_type)
error_message = "Compute type must be 'ecs' or 'lambda'."
}
}
# VPC Configuration
variable "vpc_id" {
description = "VPC ID for resource placement"
type = string
}
variable "private_subnet_ids" {
description = "Private subnet IDs for compute and database"
type = list(string)
}
variable "public_subnet_ids" {
description = "Public subnet IDs for load balancer (ECS only)"
type = list(string)
default = []
}
# ECS Configuration
variable "ecs_cluster_id" {
description = "ECS cluster ID (required when compute_type = 'ecs')"
type = string
default = ""
}
variable "ecs_cluster_name" {
description = "ECS cluster name (required when compute_type = 'ecs')"
type = string
default = ""
}
variable "ecr_repository_url" {
description = "ECR repository URL for the container image (ECS only)"
type = string
default = ""
}
variable "image_tag" {
description = "Container image tag (ECS only)"
type = string
default = "latest"
}
variable "cpu" {
description = "CPU units for Fargate task"
type = number
default = 256
validation {
condition = contains([256, 512, 1024, 2048, 4096], var.cpu)
error_message = "CPU must be one of: 256, 512, 1024, 2048, 4096."
}
}
variable "memory" {
description = "Memory in MB for Fargate task"
type = number
default = 512
}
variable "desired_count" {
description = "Desired number of ECS tasks"
type = number
default = 2
}
variable "container_port" {
description = "Port the Node.js application listens on"
type = number
default = 3000
}
# Lambda Configuration
variable "lambda_handler" {
description = "Lambda function handler (Lambda only)"
type = string
default = "index.handler"
}
variable "lambda_runtime" {
description = "Lambda runtime"
type = string
default = "nodejs20.x"
}
variable "lambda_timeout" {
description = "Lambda function timeout in seconds"
type = number
default = 30
}
variable "lambda_memory_size" {
description = "Lambda function memory in MB"
type = number
default = 256
}
variable "lambda_s3_bucket" {
description = "S3 bucket containing the Lambda deployment package"
type = string
default = ""
}
variable "lambda_s3_key" {
description = "S3 key for the Lambda deployment package"
type = string
default = ""
}
# Database Configuration
variable "enable_database" {
description = "Whether to create an RDS PostgreSQL instance"
type = bool
default = true
}
variable "db_instance_class" {
description = "RDS instance class"
type = string
default = "db.t3.micro"
}
variable "db_allocated_storage" {
description = "RDS allocated storage in GB"
type = number
default = 20
}
variable "db_name" {
description = "PostgreSQL database name"
type = string
default = "app"
}
# Monitoring Configuration
variable "enable_monitoring" {
description = "Whether to create CloudWatch alarms"
type = bool
default = true
}
variable "alarm_sns_topic_arns" {
description = "SNS topic ARNs for alarm notifications"
type = list(string)
default = []
}
# General
variable "environment_variables" {
description = "Environment variables for the application"
type = map(string)
default = {}
}
variable "tags" {
description = "Tags to apply to all resources"
type = map(string)
default = {}
}
compute_ecs.tf
# modules/nodejs-microservice/compute_ecs.tf
resource "aws_ecs_task_definition" "this" {
count = var.compute_type == "ecs" ? 1 : 0
family = "${var.service_name}-${var.environment}"
network_mode = "awsvpc"
requires_compatibilities = ["FARGATE"]
cpu = var.cpu
memory = var.memory
execution_role_arn = aws_iam_role.ecs_execution[0].arn
task_role_arn = aws_iam_role.ecs_task[0].arn
container_definitions = jsonencode([
{
name = var.service_name
image = "${var.ecr_repository_url}:${var.image_tag}"
essential = true
portMappings = [{
containerPort = var.container_port
protocol = "tcp"
}]
environment = concat(
[for k, v in var.environment_variables : { name = k, value = v }],
var.enable_database ? [
{ name = "DATABASE_HOST", value = aws_db_instance.this[0].address },
{ name = "DATABASE_PORT", value = tostring(aws_db_instance.this[0].port) },
{ name = "DATABASE_NAME", value = var.db_name },
] : []
)
logConfiguration = {
logDriver = "awslogs"
options = {
"awslogs-group" = aws_cloudwatch_log_group.this.name
"awslogs-region" = data.aws_region.current.name
"awslogs-stream-prefix" = "ecs"
}
}
}
])
tags = var.tags
}
resource "aws_ecs_service" "this" {
count = var.compute_type == "ecs" ? 1 : 0
name = "${var.service_name}-${var.environment}"
cluster = var.ecs_cluster_id
task_definition = aws_ecs_task_definition.this[0].arn
desired_count = var.desired_count
launch_type = "FARGATE"
network_configuration {
subnets = var.private_subnet_ids
security_groups = [aws_security_group.compute.id]
assign_public_ip = false
}
deployment_circuit_breaker {
enable = true
rollback = true
}
tags = var.tags
}
resource "aws_iam_role" "ecs_execution" {
count = var.compute_type == "ecs" ? 1 : 0
name = "${var.service_name}-${var.environment}-ecs-exec"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "ecs-tasks.amazonaws.com"
}
}]
})
tags = var.tags
}
resource "aws_iam_role_policy_attachment" "ecs_execution" {
count = var.compute_type == "ecs" ? 1 : 0
role = aws_iam_role.ecs_execution[0].name
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}
resource "aws_iam_role" "ecs_task" {
count = var.compute_type == "ecs" ? 1 : 0
name = "${var.service_name}-${var.environment}-ecs-task"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "ecs-tasks.amazonaws.com"
}
}]
})
tags = var.tags
}
compute_lambda.tf
# modules/nodejs-microservice/compute_lambda.tf
resource "aws_lambda_function" "this" {
count = var.compute_type == "lambda" ? 1 : 0
function_name = "${var.service_name}-${var.environment}"
role = aws_iam_role.lambda[0].arn
handler = var.lambda_handler
runtime = var.lambda_runtime
timeout = var.lambda_timeout
memory_size = var.lambda_memory_size
s3_bucket = var.lambda_s3_bucket
s3_key = var.lambda_s3_key
vpc_config {
subnet_ids = var.private_subnet_ids
security_group_ids = [aws_security_group.compute.id]
}
environment {
variables = merge(
var.environment_variables,
var.enable_database ? {
DATABASE_HOST = aws_db_instance.this[0].address
DATABASE_PORT = tostring(aws_db_instance.this[0].port)
DATABASE_NAME = var.db_name
} : {},
{
NODE_ENV = var.environment
}
)
}
tags = var.tags
depends_on = [aws_cloudwatch_log_group.lambda]
}
resource "aws_cloudwatch_log_group" "lambda" {
count = var.compute_type == "lambda" ? 1 : 0
name = "/aws/lambda/${var.service_name}-${var.environment}"
retention_in_days = 30
tags = var.tags
}
resource "aws_iam_role" "lambda" {
count = var.compute_type == "lambda" ? 1 : 0
name = "${var.service_name}-${var.environment}-lambda"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "lambda.amazonaws.com"
}
}]
})
tags = var.tags
}
resource "aws_iam_role_policy_attachment" "lambda_basic" {
count = var.compute_type == "lambda" ? 1 : 0
role = aws_iam_role.lambda[0].name
policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole"
}
database.tf
# modules/nodejs-microservice/database.tf
resource "aws_db_subnet_group" "this" {
count = var.enable_database ? 1 : 0
name = "${var.service_name}-${var.environment}"
subnet_ids = var.private_subnet_ids
tags = merge(var.tags, {
Name = "${var.service_name}-${var.environment}-db-subnet"
})
}
resource "aws_db_instance" "this" {
count = var.enable_database ? 1 : 0
identifier = "${var.service_name}-${var.environment}"
engine = "postgres"
engine_version = "15.4"
instance_class = var.db_instance_class
allocated_storage = var.db_allocated_storage
max_allocated_storage = var.db_allocated_storage * 2
db_name = var.db_name
username = "${var.service_name}_admin"
password = random_password.db[0].result
db_subnet_group_name = aws_db_subnet_group.this[0].name
vpc_security_group_ids = [aws_security_group.database[0].id]
backup_retention_period = var.environment == "production" ? 7 : 1
skip_final_snapshot = var.environment != "production"
final_snapshot_identifier = "${var.service_name}-${var.environment}-final" # required when skip_final_snapshot is false
deletion_protection = var.environment == "production"
tags = var.tags
}
resource "random_password" "db" {
count = var.enable_database ? 1 : 0
length = 32
special = false
}
resource "aws_secretsmanager_secret" "db_password" {
count = var.enable_database ? 1 : 0
name = "${var.service_name}-${var.environment}-db-password"
tags = var.tags
}
resource "aws_secretsmanager_secret_version" "db_password" {
count = var.enable_database ? 1 : 0
secret_id = aws_secretsmanager_secret.db_password[0].id
secret_string = random_password.db[0].result
}
resource "aws_security_group" "database" {
count = var.enable_database ? 1 : 0
name_prefix = "${var.service_name}-${var.environment}-db-"
vpc_id = var.vpc_id
ingress {
from_port = 5432
to_port = 5432
protocol = "tcp"
security_groups = [aws_security_group.compute.id]
}
tags = merge(var.tags, {
Name = "${var.service_name}-${var.environment}-db"
})
}
monitoring.tf
# modules/nodejs-microservice/monitoring.tf
resource "aws_cloudwatch_metric_alarm" "high_error_rate" {
count = var.enable_monitoring && var.compute_type == "ecs" ? 1 : 0
alarm_name = "${var.service_name}-${var.environment}-high-errors"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = 2
metric_name = "HTTPCode_Target_5XX_Count"
namespace = "AWS/ApplicationELB"
period = 300
statistic = "Sum"
threshold = 10
alarm_actions = var.alarm_sns_topic_arns
tags = var.tags
}
resource "aws_cloudwatch_metric_alarm" "lambda_errors" {
count = var.enable_monitoring && var.compute_type == "lambda" ? 1 : 0
alarm_name = "${var.service_name}-${var.environment}-lambda-errors"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = 2
metric_name = "Errors"
namespace = "AWS/Lambda"
period = 300
statistic = "Sum"
threshold = 5
alarm_actions = var.alarm_sns_topic_arns
dimensions = {
FunctionName = aws_lambda_function.this[0].function_name
}
tags = var.tags
}
resource "aws_cloudwatch_metric_alarm" "db_cpu" {
count = var.enable_monitoring && var.enable_database ? 1 : 0
alarm_name = "${var.service_name}-${var.environment}-db-cpu"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = 3
metric_name = "CPUUtilization"
namespace = "AWS/RDS"
period = 300
statistic = "Average"
threshold = 80
alarm_actions = var.alarm_sns_topic_arns
dimensions = {
DBInstanceIdentifier = aws_db_instance.this[0].identifier
}
tags = var.tags
}
resource "aws_cloudwatch_metric_alarm" "db_storage" {
count = var.enable_monitoring && var.enable_database ? 1 : 0
alarm_name = "${var.service_name}-${var.environment}-db-storage"
comparison_operator = "LessThanThreshold"
evaluation_periods = 1
metric_name = "FreeStorageSpace"
namespace = "AWS/RDS"
period = 300
statistic = "Average"
threshold = 5000000000 # 5 GB
alarm_actions = var.alarm_sns_topic_arns
dimensions = {
DBInstanceIdentifier = aws_db_instance.this[0].identifier
}
tags = var.tags
}
main.tf and outputs.tf
# modules/nodejs-microservice/main.tf
data "aws_region" "current" {}
resource "aws_security_group" "compute" {
name_prefix = "${var.service_name}-${var.environment}-compute-"
vpc_id = var.vpc_id
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = merge(var.tags, {
Name = "${var.service_name}-${var.environment}-compute"
})
}
resource "aws_cloudwatch_log_group" "this" {
name = "/${var.compute_type}/${var.service_name}-${var.environment}"
retention_in_days = var.environment == "production" ? 90 : 14
tags = var.tags
}
# modules/nodejs-microservice/outputs.tf
output "compute_type" {
description = "The compute platform used (ecs or lambda)"
value = var.compute_type
}
output "service_endpoint" {
description = "Service endpoint URL"
value = var.compute_type == "lambda" ? try(aws_lambda_function.this[0].invoke_arn, "") : ""
}
output "ecs_service_name" {
description = "ECS service name (empty if Lambda)"
value = var.compute_type == "ecs" ? try(aws_ecs_service.this[0].name, "") : ""
}
output "lambda_function_name" {
description = "Lambda function name (empty if ECS)"
value = var.compute_type == "lambda" ? try(aws_lambda_function.this[0].function_name, "") : ""
}
output "database_endpoint" {
description = "RDS endpoint (empty if database not enabled)"
value = var.enable_database ? aws_db_instance.this[0].endpoint : ""
}
output "database_password_secret_arn" {
description = "ARN of the Secrets Manager secret containing the database password"
value = var.enable_database ? aws_secretsmanager_secret.db_password[0].arn : ""
}
output "log_group_name" {
description = "CloudWatch log group name"
value = aws_cloudwatch_log_group.this.name
}
output "security_group_id" {
description = "Security group ID for the compute resources"
value = aws_security_group.compute.id
}
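The versions.tf listed in the module layout is not shown above. Because the module uses both the AWS provider and random_password from the random provider, a reasonable sketch pins both (the exact constraints are a choice, not a requirement):

# modules/nodejs-microservice/versions.tf
terraform {
  required_version = ">= 1.3.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.0, < 6.0"
    }
    random = {
      source  = "hashicorp/random"
      version = ">= 3.5"
    }
  }
}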
Calling the Module
Here is how application teams consume this module. An Express.js API deployed to ECS with a database and monitoring:
# environments/production/main.tf
module "user_api" {
source = "../../modules/nodejs-microservice"
service_name = "user-api"
environment = "production"
compute_type = "ecs"
# Network
vpc_id = module.vpc.vpc_id
private_subnet_ids = module.vpc.private_subnet_ids
public_subnet_ids = module.vpc.public_subnet_ids
# ECS
ecs_cluster_id = module.cluster.cluster_id
ecs_cluster_name = module.cluster.cluster_name
ecr_repository_url = "123456789.dkr.ecr.us-east-1.amazonaws.com/user-api"
image_tag = "v2.3.1"
cpu = 512
memory = 1024
desired_count = 3
container_port = 3000
# Database
enable_database = true
db_instance_class = "db.t3.small"
db_allocated_storage = 50
db_name = "users"
# Monitoring
enable_monitoring = true
alarm_sns_topic_arns = [aws_sns_topic.alerts.arn]
environment_variables = {
NODE_ENV = "production"
PORT = "3000"
LOG_LEVEL = "info"
REDIS_URL = module.redis.endpoint
}
tags = {
Team = "backend"
Environment = "production"
Service = "user-api"
}
}
# A lightweight webhook handler deployed to Lambda (no database needed)
module "webhook_handler" {
source = "../../modules/nodejs-microservice"
service_name = "webhook-handler"
environment = "production"
compute_type = "lambda"
# Network
vpc_id = module.vpc.vpc_id
private_subnet_ids = module.vpc.private_subnet_ids
# Lambda
lambda_s3_bucket = "my-deployments"
lambda_s3_key = "webhook-handler/v1.0.0.zip"
lambda_timeout = 15
lambda_memory_size = 128
# No database
enable_database = false
# Monitoring
enable_monitoring = true
alarm_sns_topic_arns = [aws_sns_topic.alerts.arn]
environment_variables = {
NODE_ENV = "production"
WEBHOOK_SECRET = "ssm:/production/webhook-handler/secret"
}
tags = {
Team = "integrations"
Environment = "production"
Service = "webhook-handler"
}
}
And here is a simple Node.js health check endpoint that works with both compute types:
// index.js -- health check handler compatible with both ECS and Lambda
const http = require("http");
const port = process.env.PORT || 3000;

function healthPayload() {
  return {
    status: "healthy",
    service: process.env.SERVICE_NAME || "unknown",
    environment: process.env.NODE_ENV || "development",
    uptime: process.uptime(),
    timestamp: new Date().toISOString()
  };
}

function handleRequest(req, res) {
  if (req.url === "/health" && req.method === "GET") {
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify(healthPayload()));
    return;
  }
  res.writeHead(200, { "Content-Type": "application/json" });
  res.end(JSON.stringify({ message: "OK" }));
}

// ECS mode: run as a long-lived HTTP server
if (process.env.COMPUTE_TYPE !== "lambda") {
  const server = http.createServer(handleRequest);
  server.listen(port, () => {
    console.log(`Server listening on port ${port}`);
  });
}

// Lambda mode: export an async handler
exports.handler = async (event, context) => ({
  statusCode: 200,
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    status: "healthy",
    service: process.env.SERVICE_NAME || "unknown",
    requestId: context.awsRequestId
  })
});
Common Issues and Troubleshooting
1. Module Source Not Found
Error: Failed to download module
Could not download module "vpc" (main.tf:3) source code from
"git::https://github.com/your-org/terraform-modules.git//modules/vpc?ref=v1.0.0":
error downloading
'https://github.com/your-org/terraform-modules.git?ref=v1.0.0': /usr/bin/git
exited with 128: fatal: could not read Username for 'https://github.com':
terminal prompts disabled
Fix: Configure Git credentials for HTTPS, or use SSH source URLs:
# Use SSH instead of HTTPS
module "vpc" {
source = "git::[email protected]:your-org/terraform-modules.git//modules/vpc?ref=v1.0.0"
}
In CI pipelines, configure the GIT_SSH_COMMAND environment variable or use a GitHub App token with git config.
2. Module Version Conflicts with State
Error: Resource instance managed by newer provider version
The current state of module.vpc.aws_vpc.this was created by a newer provider
version than is currently selected. Upgrade the hashicorp/aws provider to work
with this resource.
Fix: This happens when a module upgrade bumps the provider version and someone runs terraform apply, then another team member with an older provider tries to plan. Pin provider versions in your module's versions.tf and ensure all team members use the same version. Use a .terraform-version file with tfenv to enforce the Terraform version.
3. Count vs For_Each State Migration
Error: Invalid index
on main.tf line 15, in module "service":
15: vpc_id = module.vpc[0].vpc_id
module.vpc does not have an element with a count of 0
Fix: If you change a module from using count to for_each (or vice versa), Terraform sees the resources as new and wants to destroy the old ones and create new ones. Use terraform state mv to migrate:
# Moving from count to for_each
terraform state mv 'module.service[0]' 'module.service["api"]'
terraform state mv 'module.service[1]' 'module.service["worker"]'
Always plan these migrations carefully. Run terraform plan first to confirm no resources will be destroyed.
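Since Terraform 1.1, the same migration can also be declared with moved blocks in the root configuration, which I prefer for teams because the move is reviewed as part of a normal plan rather than applied as an out-of-band state edit. A sketch matching the commands above:

# Declarative equivalent of the terraform state mv commands
moved {
  from = module.service[0]
  to   = module.service["api"]
}

moved {
  from = module.service[1]
  to   = module.service["worker"]
}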
4. Circular Dependency Between Modules
Error: Cycle: module.service.aws_security_group.compute,
module.database.aws_security_group.db, module.service.aws_ecs_service.this
Fix: Circular dependencies happen when module A references module B's output, and module B references module A's output. Break the cycle by extracting the shared resource (usually a security group) into its own module or the root module:
# Create the security group in the root module
resource "aws_security_group" "service" {
name_prefix = "service-"
vpc_id = module.vpc.vpc_id
}
# Pass it to both modules
module "service" {
source = "./modules/service"
security_group_id = aws_security_group.service.id
}
module "database" {
source = "./modules/database"
allowed_security_groups = [aws_security_group.service.id]
}
5. Provider Configuration in Modules
Error: Module module.vpc contains provider configuration
Providers cannot be configured within modules using count, for_each or depends_on.
Fix: Never put provider blocks inside reusable modules. Instead, pass providers from the root module:
# Root module
provider "aws" {
region = "us-east-1"
}
provider "aws" {
alias = "west"
region = "us-west-2"
}
module "vpc_west" {
source = "./modules/vpc"
providers = {
aws = aws.west
}
name = "west-vpc"
vpc_cidr = "10.1.0.0/16"
}
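If the child module itself needs more than one AWS provider configuration -- for example a primary and a replica region -- declare the extra configuration with configuration_aliases in its required_providers block so callers know which providers to pass. A sketch:

# modules/vpc/versions.tf (multi-provider variant)
terraform {
  required_version = ">= 1.3.0"

  required_providers {
    aws = {
      source                = "hashicorp/aws"
      version               = ">= 5.0, < 6.0"
      configuration_aliases = [aws.replica]
    }
  }
}

# Resources inside the module select the aliased configuration explicitly:
# resource "aws_vpc" "replica" {
#   provider   = aws.replica
#   cidr_block = var.replica_vpc_cidr   # assumed variable
# }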
Best Practices
Keep modules focused. A module should represent one logical concept. A VPC module should not also create an ECS cluster. If you cannot describe what a module does in one sentence, it is doing too much.
Use semantic versioning for every release. Tag your modules in Git with the v1.0.0 format. Never use branch references like ref=main in production -- a branch is a moving target that can break your infrastructure without warning.
Always pin provider versions in modules. Use version constraints like >= 5.0, < 6.0 to allow minor and patch updates but prevent major version surprises. Your CI pipeline should test against the latest allowed version weekly.
Validate inputs aggressively. Every variable that has constraints should have a validation block. Catching a bad CIDR block at plan time saves 10 minutes of waiting for an API error during apply.
Write examples, not just documentation. A working example in examples/simple/main.tf is worth more than a page of README. New users copy-paste examples; they skim documentation.
Test with real infrastructure. Unit testing Terraform with mocks catches syntax errors but misses the API interactions that actually break. Invest in Terratest running against a sandbox AWS account. The cost of a few test VPCs per day is negligible compared to the cost of a broken production deployment.
Use for_each over count for resources that might change. With count, removing an item from the middle of a list causes all subsequent resources to be destroyed and recreated. With for_each over a map, each resource is independently addressable by key.
Expose the minimum necessary outputs. Every output is a public API contract. Once another team depends on an output, removing it is a breaking change. Start with fewer outputs and add more when requested.
Never hardcode AWS account IDs, regions, or ARN partitions. Use data sources like data.aws_caller_identity.current and data.aws_region.current to make modules portable across accounts and regions.
Run terraform fmt and terraform validate in CI. Format enforcement eliminates style debates in code reviews. Validation catches syntax errors before they reach plan.
References
- Terraform Module Documentation -- Official guide to module syntax and usage
- Terraform Registry -- Community and official module registry
- Terratest Documentation -- Go library for infrastructure testing
- terraform-docs -- Auto-generate documentation from Terraform modules
- Terraform Best Practices -- Community-maintained best practices guide
- Semantic Versioning -- Versioning specification for module releases
- AWS Provider Documentation -- Terraform AWS provider reference