Secret Management in IaC
Secure secrets in infrastructure as code with Vault, AWS Secrets Manager, SOPS, and automated rotation strategies
Managing secrets in infrastructure as code is the problem that separates production-grade deployments from ticking time bombs. The moment you define infrastructure in code, every database password, API key, and TLS certificate needs a path from secure storage into your running system without touching plain text in version control, state files, or CI logs. This article covers the full spectrum of secret management strategies for Terraform-based infrastructure, from Vault integration to automated rotation, with working examples you can deploy today.
Prerequisites
- Terraform 1.5+ installed
- Basic understanding of Terraform providers and state
- Node.js 18+ for scripting examples
- Docker installed (for running Vault locally)
- An AWS or Azure account for cloud-specific sections
- Familiarity with Git and CI/CD concepts
The Secret Sprawl Problem
Secret sprawl is what happens when credentials replicate across your infrastructure like bacteria in a petri dish. It starts innocently. A database password in a .env file. An API key hardcoded in a Terraform variable. A TLS certificate copied to three different servers. Before you know it, you have the same secret in twelve places and no idea who has access to any of them.
The root causes are predictable:
- No centralized secret store. Teams store secrets wherever is convenient: environment variables, config files, Terraform .tfvars, CI/CD platform secrets, sticky notes.
- Static secrets with no rotation. A database password set two years ago is still the same password today.
- Overly broad access. The CI/CD pipeline has admin credentials because someone needed to debug a deployment at 2 AM and never scoped it back down.
- Terraform state as a secret leak. Every terraform apply writes sensitive values into the state file in plain text.
The cost of sprawl is real. When a secret leaks, you cannot rotate it in one place. You have to hunt through every system that references it, update each one, and pray you did not miss a Lambda function someone deployed from their laptop six months ago.
Secrets in Terraform State
This is the problem most teams discover too late. Terraform state files contain every resource attribute, including sensitive ones. Run terraform show after provisioning an RDS instance and you will see the master password in plain text.
# This password ends up in plain text in terraform.tfstate
resource "aws_db_instance" "main" {
identifier = "production-db"
engine = "postgres"
engine_version = "15.4"
instance_class = "db.t3.medium"
username = "admin"
password = "SuperSecret123!" # DO NOT DO THIS
}
The state file is the single biggest secret leak vector in Terraform. Mitigations:
Use remote state with encryption. S3 with server-side encryption and DynamoDB locking is the baseline for AWS:
terraform {
backend "s3" {
bucket = "mycompany-terraform-state"
key = "production/terraform.tfstate"
region = "us-east-1"
encrypt = true
dynamodb_table = "terraform-lock"
kms_key_id = "arn:aws:kms:us-east-1:123456789:key/abcd-1234"
}
}
Restrict state access. The S3 bucket policy should allow only the CI/CD role and a break-glass admin role. Nobody else reads state directly.
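As a sketch, a deny-by-default bucket policy can enforce this; the two role ARNs below are placeholders for your own CI/CD and break-glass roles:
resource "aws_s3_bucket_policy" "state" {
  bucket = "mycompany-terraform-state"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid       = "DenyAllExceptStateRoles"
      Effect    = "Deny"
      Principal = "*"
      Action    = "s3:*"
      Resource = [
        "arn:aws:s3:::mycompany-terraform-state",
        "arn:aws:s3:::mycompany-terraform-state/*"
      ]
      Condition = {
        StringNotLike = {
          "aws:PrincipalArn" = [
            "arn:aws:iam::123456789:role/terraform-ci",     # placeholder CI/CD role
            "arn:aws:iam::123456789:role/break-glass-admin" # placeholder admin role
          ]
        }
      }
    }]
  })
}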
Never commit state files. Your .gitignore must include *.tfstate and *.tfstate.backup. This is non-negotiable.
HashiCorp Vault Integration
Vault is the gold standard for secret management in Terraform workflows. It provides centralized storage, fine-grained access control, audit logging, and dynamic secret generation.
Setting Up the Vault Provider
provider "vault" {
address = "https://vault.internal.company.com:8200"
# Auth via VAULT_TOKEN env var or AppRole
}
# Read a static secret from Vault KV v2
data "vault_kv_secret_v2" "database" {
mount = "secret"
name = "production/database"
}
resource "aws_db_instance" "main" {
identifier = "production-db"
engine = "postgres"
engine_version = "15.4"
instance_class = "db.t3.medium"
username = data.vault_kv_secret_v2.database.data["username"]
password = data.vault_kv_secret_v2.database.data["password"]
}
AppRole Authentication for CI/CD
Never use root tokens in automation. AppRole gives your pipeline a scoped identity:
provider "vault" {
address = "https://vault.internal.company.com:8200"
auth_login {
path = "auth/approle/login"
parameters = {
role_id = var.vault_role_id
secret_id = var.vault_secret_id
}
}
}
The role_id is not secret and can live in your Terraform variables. The secret_id is sensitive and should be injected by the CI/CD platform at runtime. In GitHub Actions:
- name: Terraform Apply
env:
TF_VAR_vault_role_id: ${{ vars.VAULT_ROLE_ID }}
TF_VAR_vault_secret_id: ${{ secrets.VAULT_SECRET_ID }}
run: terraform apply -auto-approve
AWS Secrets Manager with Terraform
If you are already on AWS and do not want to operate Vault, Secrets Manager is a solid managed alternative:
# Create a secret
resource "aws_secretsmanager_secret" "db_password" {
name = "production/db-password"
recovery_window_in_days = 7
kms_key_id = aws_kms_key.secrets.arn
}
resource "aws_secretsmanager_secret_version" "db_password" {
secret_id = aws_secretsmanager_secret.db_password.id
secret_string = jsonencode({
username = "admin"
password = random_password.db.result
})
}
# Generate a random password - never hardcode
resource "random_password" "db" {
length = 32
special = true
}
# Reference in another resource
data "aws_secretsmanager_secret_version" "db_password" {
secret_id = aws_secretsmanager_secret.db_password.id
}
locals {
db_creds = jsondecode(data.aws_secretsmanager_secret_version.db_password.secret_string)
}
The critical detail: even with Secrets Manager, the random_password value still lands in Terraform state. The difference is that the canonical source of truth is Secrets Manager, and applications read from there at runtime rather than from Terraform outputs.
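Runtime access should also be least-privilege. Here is a minimal sketch of an IAM policy that lets the application read only this one secret; aws_iam_role.app is a hypothetical stand-in for whatever role your application runs under:
resource "aws_iam_role_policy" "read_db_secret" {
  name = "read-db-secret"
  role = aws_iam_role.app.id # hypothetical application role
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["secretsmanager:GetSecretValue"]
      # Scope to the single secret ARN, not secretsmanager:*
      Resource = aws_secretsmanager_secret.db_password.arn
    }]
  })
}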
Azure Key Vault with Terraform
Azure Key Vault integrates cleanly with Terraform through the azurerm provider:
data "azurerm_client_config" "current" {}
resource "azurerm_key_vault" "main" {
name = "prod-secrets-kv"
location = azurerm_resource_group.main.location
resource_group_name = azurerm_resource_group.main.name
tenant_id = data.azurerm_client_config.current.tenant_id
sku_name = "standard"
purge_protection_enabled = true
}
resource "azurerm_key_vault_access_policy" "terraform" {
key_vault_id = azurerm_key_vault.main.id
tenant_id = data.azurerm_client_config.current.tenant_id
object_id = data.azurerm_client_config.current.object_id
secret_permissions = ["Get", "List", "Set", "Delete"]
}
resource "azurerm_key_vault_secret" "db_password" {
name = "db-password"
value = random_password.db.result
key_vault_id = azurerm_key_vault.main.id
depends_on = [azurerm_key_vault_access_policy.terraform]
}
For applications running on Azure, use Managed Identity to read secrets at runtime. No credentials needed in the application at all:
var { SecretClient } = require("@azure/keyvault-secrets");
var { DefaultAzureCredential } = require("@azure/identity");
var credential = new DefaultAzureCredential();
var client = new SecretClient("https://prod-secrets-kv.vault.azure.net", credential);
function getDbPassword() {
return client.getSecret("db-password").then(function(secret) {
return secret.value;
});
}
SOPS for Encrypted Files
Mozilla SOPS encrypts values in structured files while keeping keys visible. This lets you store encrypted secrets in Git and review diffs meaningfully.
Install SOPS and create a .sops.yaml configuration:
# .sops.yaml
creation_rules:
- path_regex: \.enc\.json$
kms: "arn:aws:kms:us-east-1:123456789:key/abcd-1234"
- path_regex: \.enc\.yaml$
pgp: "FBC7B9E2A4F9289AC0C1D4843D16CEE4A27381B4"
Encrypt a secrets file:
sops --encrypt secrets.json > secrets.enc.json
Use the sops Terraform provider to decrypt at plan time:
terraform {
required_providers {
sops = {
source = "carlpett/sops"
version = "~> 0.7"
}
}
}
data "sops_file" "secrets" {
source_file = "secrets.enc.json"
}
resource "aws_db_instance" "main" {
identifier = "production-db"
engine = "postgres"
instance_class = "db.t3.medium"
username = data.sops_file.secrets.data["db_username"]
password = data.sops_file.secrets.data["db_password"]
}
SOPS is excellent for small teams that need encrypted secrets in Git without running a Vault cluster. The tradeoff is that secrets are still static and require manual rotation.
git-crypt for Repository Encryption
git-crypt provides transparent encryption for specific files in a Git repository. Unlike SOPS, it encrypts entire files rather than individual values.
# Initialize git-crypt in your repo
git-crypt init
# Add GPG keys for team members
git-crypt add-gpg-user [email protected]
# Define which files to encrypt in .gitattributes (see below)
The .gitattributes file controls what gets encrypted:
secrets/** filter=git-crypt diff=git-crypt
*.secret.tfvars filter=git-crypt diff=git-crypt
Now terraform.secret.tfvars is encrypted at rest in Git but decrypted transparently on developer machines and CI runners that have the key.
The limitation: git-crypt encrypts entire files, so diffs are meaningless for encrypted content. Use SOPS if you need reviewable diffs on encrypted data.
Environment Variable Injection in CI/CD
The CI/CD platform is the last mile of secret delivery. Every major platform provides encrypted secret storage, but the implementation details matter.
GitHub Actions
jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production
    permissions:
      id-token: write # required to assume the AWS role via OIDC
      contents: read
    steps:
      - uses: actions/checkout@v4
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: us-east-1
      # configure-aws-credentials exports temporary AWS credentials to the
      # environment automatically - no manual AWS_ACCESS_KEY_ID needed
      - name: Terraform Apply
        env:
          TF_VAR_vault_token: ${{ secrets.VAULT_TOKEN }}
        run: |
          terraform init
          terraform apply -auto-approve
GitLab CI
deploy:
stage: deploy
environment: production
variables:
TF_VAR_vault_token: $VAULT_TOKEN
script:
- terraform init
- terraform apply -auto-approve
rules:
- if: $CI_COMMIT_BRANCH == "main"
Critical rule: Never echo, print, or log environment variables in CI scripts. A single env or printenv command dumps every secret to the build log. Mask variables in your CI platform settings and audit your scripts for accidental exposure.
Dynamic Secrets with Vault
Static secrets are a liability. Dynamic secrets are generated on demand, scoped to a specific consumer, and automatically revoked after a TTL. This is where Vault shines.
Database Dynamic Secrets
Configure Vault to generate short-lived database credentials:
# Enable the database secrets engine
vault secrets enable database
# Configure PostgreSQL connection
vault write database/config/production \
plugin_name=postgresql-database-plugin \
connection_url="postgresql://{{username}}:{{password}}@db.internal:5432/production" \
allowed_roles="app-readonly,app-readwrite" \
username="vault_admin" \
password="vault_admin_password"
# Create a role that generates read-only credentials
vault write database/roles/app-readonly \
db_name=production \
creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}'; GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";" \
default_ttl="1h" \
max_ttl="24h"
In Terraform, request a dynamic credential:
data "vault_generic_secret" "db_creds" {
path = "database/creds/app-readonly"
}
output "db_username" {
value = data.vault_generic_secret.db_creds.data["username"]
sensitive = true
}
In your Node.js application, request credentials directly from Vault:
var vault = require("node-vault")({
apiVersion: "v1",
endpoint: process.env.VAULT_ADDR,
token: process.env.VAULT_TOKEN
});
function getDatabaseCredentials() {
return vault.read("database/creds/app-readonly").then(function(result) {
var creds = result.data;
console.log("Lease ID:", result.lease_id);
console.log("Lease duration:", result.lease_duration, "seconds");
return {
username: creds.username,
password: creds.password,
leaseId: result.lease_id
};
});
}
function renewLease(leaseId) {
return vault.write("sys/leases/renew", {
lease_id: leaseId,
increment: 3600
});
}
// Renew before expiry
function startLeaseRenewal(leaseId, intervalSeconds) {
var renewalInterval = setInterval(function() {
renewLease(leaseId).catch(function(err) {
console.error("Lease renewal failed:", err.message);
clearInterval(renewalInterval);
// Re-acquire credentials
getDatabaseCredentials();
});
}, intervalSeconds * 1000);
}
Dynamic secrets eliminate the rotation problem entirely. When a credential expires, the application requests a new one. When a credential leaks, it expires automatically. No human intervention required.
Secret Rotation Automation
For systems that require static secrets (third-party API keys, legacy database passwords), automated rotation is essential. AWS Secrets Manager provides built-in rotation with Lambda:
resource "aws_secretsmanager_secret_rotation" "db_password" {
secret_id = aws_secretsmanager_secret.db_password.id
rotation_lambda_arn = aws_lambda_function.rotate_secret.arn
rotation_rules {
automatically_after_days = 30
}
}
resource "aws_lambda_function" "rotate_secret" {
filename = "rotate_secret.zip"
function_name = "rotate-db-password"
role = aws_iam_role.rotation_lambda.arn
handler = "index.handler"
runtime = "nodejs20.x"
timeout = 30
environment {
variables = {
DB_HOST = aws_db_instance.main.address
}
}
}
The rotation Lambda follows a four-step protocol: createSecret, setSecret, testSecret, finishSecret. Here is a minimal Node.js implementation:
// NOTE: the aws-sdk v2 package is not bundled in the nodejs18.x+ Lambda
// runtimes - ship it in the deployment zip or port these calls to
// @aws-sdk/client-secrets-manager.
var AWS = require("aws-sdk");
var { Client } = require("pg");
var crypto = require("crypto");
var secretsManager = new AWS.SecretsManager();
exports.handler = function(event, context, callback) {
  var secretId = event.SecretId;
  var token = event.ClientRequestToken; // version id for the new AWSPENDING version
  var step = event.Step;
  switch (step) {
    case "createSecret":
      createSecret(secretId, token).then(function() {
        callback(null, "createSecret complete");
      }).catch(callback);
      break;
case "setSecret":
setSecret(secretId).then(function() {
callback(null, "setSecret complete");
}).catch(callback);
break;
case "testSecret":
testSecret(secretId).then(function() {
callback(null, "testSecret complete");
}).catch(callback);
break;
case "finishSecret":
finishSecret(secretId).then(function() {
callback(null, "finishSecret complete");
}).catch(callback);
break;
default:
callback(new Error("Unknown step: " + step));
}
};
function createSecret(secretId, token) {
  // base64url yields a password with no quotes or shell-hostile characters
  var newPassword = crypto.randomBytes(32).toString("base64url");
  return secretsManager.getSecretValue({
    SecretId: secretId,
    VersionStage: "AWSCURRENT"
  }).promise().then(function(current) {
    var secret = JSON.parse(current.SecretString);
    secret.password = newPassword;
    return secretsManager.putSecretValue({
      SecretId: secretId,
      SecretString: JSON.stringify(secret),
      VersionStages: ["AWSPENDING"],
      ClientRequestToken: token // the rotation token, not the AWSCURRENT version id
    }).promise();
  });
}
function setSecret(secretId) {
  return secretsManager.getSecretValue({
    SecretId: secretId,
    VersionStage: "AWSPENDING"
  }).promise().then(function(pending) {
    var secret = JSON.parse(pending.SecretString);
    // Connect with the still-valid AWSCURRENT credentials and ALTER ROLE
    // to set the new password on the database
    return secretsManager.getSecretValue({
      SecretId: secretId,
      VersionStage: "AWSCURRENT"
    }).promise().then(function(current) {
      var currentSecret = JSON.parse(current.SecretString);
      var adminClient = new Client({
        host: process.env.DB_HOST,
        user: currentSecret.username,
        password: currentSecret.password,
        database: "production"
      });
      return adminClient.connect().then(function() {
        // The generated password contains no single quotes, so the
        // string literal is safe here
        return adminClient.query(
          "ALTER ROLE \"" + secret.username + "\" WITH PASSWORD '" + secret.password + "'"
        );
      }).then(function() {
        return adminClient.end();
      });
    });
  });
}
function testSecret(secretId) {
return secretsManager.getSecretValue({
SecretId: secretId,
VersionStage: "AWSPENDING"
}).promise().then(function(pending) {
var secret = JSON.parse(pending.SecretString);
var client = new Client({
host: process.env.DB_HOST,
user: secret.username,
password: secret.password,
database: "production"
});
return client.connect().then(function() {
return client.query("SELECT 1");
}).then(function() {
return client.end();
});
});
}
function finishSecret(secretId) {
return secretsManager.describeSecret({ SecretId: secretId })
.promise().then(function(metadata) {
var versions = metadata.VersionIdsToStages;
var currentVersion, pendingVersion;
Object.keys(versions).forEach(function(versionId) {
if (versions[versionId].indexOf("AWSCURRENT") !== -1) {
currentVersion = versionId;
}
if (versions[versionId].indexOf("AWSPENDING") !== -1) {
pendingVersion = versionId;
}
});
return secretsManager.updateSecretVersionStage({
SecretId: secretId,
VersionStage: "AWSCURRENT",
MoveToVersionId: pendingVersion,
RemoveFromVersionId: currentVersion
}).promise();
});
}
Terraform Sensitive Variables and Outputs
Terraform provides the sensitive flag to prevent values from appearing in plan output and CLI logs. Use it everywhere secrets are involved:
variable "db_password" {
type = string
sensitive = true
}
variable "api_key" {
type = string
sensitive = true
validation {
condition = length(var.api_key) >= 20
error_message = "API key must be at least 20 characters."
}
}
output "connection_string" {
value = "postgresql://${local.db_user}:${var.db_password}@${aws_db_instance.main.endpoint}/production"
sensitive = true
}
The sensitive flag is not encryption. It only controls display. The values are still in state. It prevents accidental exposure in terraform plan output and CI logs, but anyone with state access can read them.
Preventing Secret Leaks in Plan Output
Even with sensitive = true, secrets can leak through interpolation in non-sensitive contexts. Terraform 1.0+ catches most of these, but older configurations may still expose values.
Additional safeguards:
# nonsensitive() exists for deliberately exposing a derived value
# that is not itself secret; a plain hostname needs no special handling
output "db_endpoint" {
  value = aws_db_instance.main.endpoint
}
# NEVER do this - it defeats the purpose
output "db_password_unsafe" {
value = nonsensitive(var.db_password) # DO NOT DO THIS
}
Redirect plan output to a file and restrict access:
terraform plan -out=tfplan
# The binary plan file contains secrets
# Treat it like the state file
chmod 600 tfplan
Secret Scanning in CI Pipelines
Prevention beats detection, but detection catches what prevention misses. Add secret scanning to your CI pipeline:
# GitHub Actions example with multiple scanners
name: Security Scan
on: [pull_request]
jobs:
secret-scan:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0 # Full history for git-based scanning
- name: TruffleHog Scan
uses: trufflesecurity/trufflehog@main
with:
extra_args: --only-verified
- name: Gitleaks Scan
uses: gitleaks/gitleaks-action@v2
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Custom Terraform Secret Check
run: |
# Check for hardcoded secrets in Terraform files
if grep -rn 'password\s*=\s*"[^"]*"' --include="*.tf" .; then
echo "ERROR: Hardcoded passwords found in Terraform files"
exit 1
fi
if grep -rn 'secret_key\s*=\s*"[^"]*"' --include="*.tf" .; then
echo "ERROR: Hardcoded secret keys found in Terraform files"
exit 1
fi
For a Node.js-based pre-commit hook:
#!/usr/bin/env node
var { execSync } = require("child_process");
var fs = require("fs");
var PATTERNS = [
{ name: "AWS Access Key", regex: /AKIA[0-9A-Z]{16}/ },
{ name: "AWS Secret Key", regex: /[0-9a-zA-Z/+]{40}/ },
{ name: "Private Key", regex: /-----BEGIN (RSA |EC )?PRIVATE KEY-----/ },
{ name: "Generic Password", regex: /password\s*[:=]\s*["'][^"']{8,}["']/i },
{ name: "Connection String", regex: /postgresql:\/\/[^:]+:[^@]+@/ },
{ name: "Vault Token", regex: /hvs\.[a-zA-Z0-9]{24,}/ }
];
var stagedFiles = execSync("git diff --cached --name-only --diff-filter=ACMR")
.toString()
.trim()
.split("\n")
.filter(function(f) { return f.length > 0; });
var violations = [];
stagedFiles.forEach(function(file) {
if (file.match(/\.(tf|tfvars|js|json|yaml|yml|env)$/)) {
var content = fs.readFileSync(file, "utf8");
var lines = content.split("\n");
lines.forEach(function(line, index) {
PATTERNS.forEach(function(pattern) {
if (pattern.regex.test(line)) {
violations.push({
file: file,
line: index + 1,
pattern: pattern.name,
content: line.trim().substring(0, 80)
});
}
});
});
}
});
if (violations.length > 0) {
console.error("\nPotential secrets detected in staged files:\n");
violations.forEach(function(v) {
console.error(" " + v.file + ":" + v.line + " - " + v.pattern);
console.error(" " + v.content + "\n");
});
console.error("Commit blocked. Remove secrets before committing.");
process.exit(1);
}
console.log("No secrets detected in staged files.");
process.exit(0);
Complete Working Example
This example provisions a complete infrastructure stack with secrets managed through Vault, automatic rotation, and zero plain-text exposure.
Project Structure
infrastructure/
main.tf
providers.tf
vault.tf
database.tf
application.tf
variables.tf
outputs.tf
providers.tf
terraform {
required_version = ">= 1.5"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
vault = {
source = "hashicorp/vault"
version = "~> 3.20"
}
random = {
source = "hashicorp/random"
version = "~> 3.5"
}
}
backend "s3" {
bucket = "mycompany-terraform-state"
key = "production/terraform.tfstate"
region = "us-east-1"
encrypt = true
dynamodb_table = "terraform-lock"
}
}
provider "aws" {
region = var.aws_region
}
provider "vault" {
address = var.vault_address
auth_login {
path = "auth/approle/login"
parameters = {
role_id = var.vault_role_id
secret_id = var.vault_secret_id
}
}
}
vault.tf
# Enable the database secrets engine for dynamic credentials
resource "vault_mount" "database" {
path = "database"
type = "database"
}
# Configure Vault to talk to our RDS instance
resource "vault_database_secret_backend_connection" "postgres" {
backend = vault_mount.database.path
name = "production"
allowed_roles = ["app-readonly", "app-readwrite"]
postgresql {
connection_url = "postgresql://{{username}}:{{password}}@${aws_db_instance.main.endpoint}/${var.db_name}"
username = "vault_admin"
password = random_password.vault_db_admin.result
}
depends_on = [aws_db_instance.main]
}
# Read-only role for application queries
resource "vault_database_secret_backend_role" "app_readonly" {
backend = vault_mount.database.path
name = "app-readonly"
db_name = vault_database_secret_backend_connection.postgres.name
default_ttl = 3600 # 1 hour
max_ttl = 86400 # 24 hours
creation_statements = [
"CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}';",
"GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";",
"ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO \"{{name}}\";"
]
revocation_statements = [
"REVOKE ALL PRIVILEGES ON ALL TABLES IN SCHEMA public FROM \"{{name}}\";",
"DROP ROLE IF EXISTS \"{{name}}\";"
]
}
# Read-write role for migrations and admin tasks
resource "vault_database_secret_backend_role" "app_readwrite" {
backend = vault_mount.database.path
name = "app-readwrite"
db_name = vault_database_secret_backend_connection.postgres.name
default_ttl = 1800 # 30 minutes
max_ttl = 3600 # 1 hour
creation_statements = [
"CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}';",
"GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO \"{{name}}\";",
"GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO \"{{name}}\";"
]
}
# Store the application API keys in Vault KV
resource "vault_kv_secret_v2" "app_secrets" {
mount = "secret"
name = "production/app"
data_json = jsonencode({
stripe_api_key = var.stripe_api_key
sendgrid_api_key = var.sendgrid_api_key
jwt_signing_key = random_password.jwt_key.result
})
}
# Vault policy for the application
resource "vault_policy" "app" {
name = "production-app"
policy = <<-EOT
path "database/creds/app-readonly" {
capabilities = ["read"]
}
path "secret/data/production/app" {
capabilities = ["read"]
}
EOT
}
database.tf
resource "random_password" "vault_db_admin" {
length = 32
special = false # Some RDS configurations struggle with special chars
}
resource "random_password" "jwt_key" {
length = 64
special = true
}
resource "aws_db_subnet_group" "main" {
name = "production-db"
subnet_ids = var.private_subnet_ids
}
resource "aws_security_group" "database" {
name_prefix = "production-db-"
vpc_id = var.vpc_id
ingress {
from_port = 5432
to_port = 5432
protocol = "tcp"
security_groups = [aws_security_group.application.id]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
resource "aws_db_instance" "main" {
identifier = "production-db"
engine = "postgres"
engine_version = "15.4"
instance_class = var.db_instance_class
db_name = var.db_name
username = "vault_admin"
password = random_password.vault_db_admin.result
db_subnet_group_name = aws_db_subnet_group.main.name
vpc_security_group_ids = [aws_security_group.database.id]
storage_encrypted = true
kms_key_id = aws_kms_key.database.arn
backup_retention_period = 7
skip_final_snapshot = false
final_snapshot_identifier = "production-db-final"
# Prevent password from triggering replacement
lifecycle {
ignore_changes = [password]
}
}
resource "aws_kms_key" "database" {
description = "Encryption key for production database"
deletion_window_in_days = 30
enable_key_rotation = true
}
application.tf
resource "aws_security_group" "application" {
name_prefix = "production-app-"
vpc_id = var.vpc_id
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
resource "aws_ecs_task_definition" "app" {
family = "production-app"
network_mode = "awsvpc"
requires_compatibilities = ["FARGATE"]
cpu = 512
memory = 1024
execution_role_arn = aws_iam_role.ecs_execution.arn
task_role_arn = aws_iam_role.ecs_task.arn
container_definitions = jsonencode([{
name = "app"
image = "${var.ecr_repo_url}:${var.app_version}"
portMappings = [{
containerPort = 3000
protocol = "tcp"
}]
environment = [
{ name = "NODE_ENV", value = "production" },
{ name = "VAULT_ADDR", value = var.vault_address },
{ name = "DB_HOST", value = aws_db_instance.main.address },
{ name = "DB_NAME", value = var.db_name }
]
secrets = [
{
name = "VAULT_TOKEN"
valueFrom = aws_secretsmanager_secret.vault_app_token.arn
}
]
logConfiguration = {
logDriver = "awslogs"
options = {
"awslogs-group" = "/ecs/production-app"
"awslogs-region" = var.aws_region
"awslogs-stream-prefix" = "app"
}
}
}])
}
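One note on the task definition above: it injects VAULT_TOKEN from an aws_secretsmanager_secret.vault_app_token that is not shown. A minimal sketch of that resource follows; issuing the token value itself (for example, a scheduled job running vault token create -policy=production-app) is assumed to happen outside Terraform, and the ECS execution role needs secretsmanager:GetSecretValue on it:
resource "aws_secretsmanager_secret" "vault_app_token" {
  name                    = "production/vault-app-token"
  recovery_window_in_days = 7
  # The token value is written out of band by whatever process issues
  # Vault tokens bound to the production-app policy.
}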
variables.tf
variable "aws_region" {
type = string
default = "us-east-1"
}
variable "vault_address" {
type = string
}
variable "vault_role_id" {
type = string
}
variable "vault_secret_id" {
type = string
sensitive = true
}
variable "stripe_api_key" {
type = string
sensitive = true
}
variable "sendgrid_api_key" {
type = string
sensitive = true
}
variable "vpc_id" {
type = string
}
variable "private_subnet_ids" {
type = list(string)
}
variable "db_name" {
type = string
default = "production"
}
variable "db_instance_class" {
type = string
default = "db.t3.medium"
}
variable "ecr_repo_url" {
type = string
}
variable "app_version" {
type = string
default = "latest"
}
outputs.tf
output "db_endpoint" {
value = aws_db_instance.main.endpoint
description = "Database endpoint (not sensitive)"
}
output "vault_db_role" {
value = vault_database_secret_backend_role.app_readonly.name
description = "Vault role name for database access"
}
# No passwords, no tokens, no keys in outputs
Application Startup (Node.js)
var vault = require("node-vault")({
apiVersion: "v1",
endpoint: process.env.VAULT_ADDR,
token: process.env.VAULT_TOKEN
});
var express = require("express");
var { Pool } = require("pg");
var pool = null;
var leaseId = null;
var renewalTimer = null;
function initializeDatabase() {
return vault.read("database/creds/app-readonly").then(function(result) {
leaseId = result.lease_id;
pool = new Pool({
host: process.env.DB_HOST,
database: process.env.DB_NAME,
user: result.data.username,
password: result.data.password,
max: 10,
idleTimeoutMillis: 30000
});
// Renew lease every 30 minutes
renewalTimer = setInterval(function() {
vault.write("sys/leases/renew", {
lease_id: leaseId,
increment: 3600
}).catch(function(err) {
console.error("Lease renewal failed, re-acquiring credentials");
clearInterval(renewalTimer);
if (pool) pool.end();
initializeDatabase();
});
}, 30 * 60 * 1000);
console.log("Database connection established with dynamic credentials");
return pool;
});
}
function getAppSecrets() {
return vault.read("secret/data/production/app").then(function(result) {
return result.data.data;
});
}
function startServer() {
Promise.all([initializeDatabase(), getAppSecrets()])
.then(function(results) {
var secrets = results[1];
var app = express();
app.get("/health", function(req, res) {
pool.query("SELECT 1").then(function() {
res.json({ status: "healthy", db: "connected" });
}).catch(function() {
res.status(503).json({ status: "unhealthy", db: "disconnected" });
});
});
app.listen(3000, function() {
console.log("Server running on port 3000");
});
// Graceful shutdown
process.on("SIGTERM", function() {
console.log("Shutting down gracefully");
clearInterval(renewalTimer);
if (leaseId) {
vault.write("sys/leases/revoke", { lease_id: leaseId })
.then(function() {
console.log("Vault lease revoked");
process.exit(0);
});
}
});
})
.catch(function(err) {
console.error("Failed to start:", err.message);
process.exit(1);
});
}
startServer();
This setup ensures that no secret ever appears in plain text in your Terraform code, state files, or application configuration. Database credentials are dynamic and expire automatically. Application secrets live in Vault and are fetched at runtime. The state file is encrypted at rest in S3 with KMS.
Common Issues and Troubleshooting
1. "Error reading Vault secret: permission denied"
The Vault token or AppRole does not have the required policy. Check the policy path carefully: Vault paths are exact matches by default. If your secret lives at secret/data/production/app (KV v2), the policy must reference secret/data/production/app, not secret/production/app. The KV v2 CLI hides the data/ segment, but policies match the underlying API path, so it must be explicit there.
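For the example above, a correct KV v2 policy stanza looks like this; the metadata/ path is needed only if clients also list keys:
path "secret/data/production/app" {
  capabilities = ["read"]
}
path "secret/metadata/production/*" {
  capabilities = ["list"]
}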
2. "Terraform wants to replace the database because the password changed"
Add lifecycle { ignore_changes = [password] } to your aws_db_instance resource. When Vault rotates the admin password or you change it manually, Terraform should not try to recreate the database. The ignore_changes directive tells Terraform to skip that attribute during plan comparison.
3. "Secrets Manager rotation Lambda times out"
The Lambda function needs network access to both Secrets Manager (via VPC endpoint or NAT gateway) and the database. Common fix: add a Secrets Manager VPC endpoint in the same VPC as your database. Also verify the Lambda security group allows outbound traffic on port 5432 to the database security group.
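A sketch of the endpoint in Terraform, reusing the vpc_id and private_subnet_ids variables from the complete example; aws_security_group.rotation_lambda is a hypothetical security group attached to the Lambda:
resource "aws_vpc_endpoint" "secretsmanager" {
  vpc_id              = var.vpc_id
  service_name        = "com.amazonaws.${var.aws_region}.secretsmanager"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.private_subnet_ids
  security_group_ids  = [aws_security_group.rotation_lambda.id] # hypothetical Lambda SG
  private_dns_enabled = true
}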
4. "SOPS cannot decrypt: no key available"
SOPS decryption requires access to the KMS key (for AWS) or the PGP private key. In CI/CD, ensure the pipeline IAM role has kms:Decrypt permission on the KMS key ARN referenced in .sops.yaml. Locally, verify your AWS CLI is configured with the correct profile and region.
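In Terraform, the grant might look like this, reusing the KMS key ARN from the .sops.yaml example; aws_iam_role.ci is a stand-in for your pipeline role:
resource "aws_iam_role_policy" "sops_decrypt" {
  name = "sops-decrypt"
  role = aws_iam_role.ci.id # hypothetical pipeline role
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      # Add kms:Encrypt if the pipeline also edits encrypted files
      Action   = ["kms:Decrypt"]
      Resource = "arn:aws:kms:us-east-1:123456789:key/abcd-1234"
    }]
  })
}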
5. "Terraform state contains sensitive values after applying sensitive flag"
The sensitive = true flag does not retroactively remove values from existing state. If secrets were previously stored without the flag, they also remain in older state versions kept by a versioned backend. You need to: (a) add the flag, (b) apply, and most importantly (c) rotate the exposed secrets, since they sat in state without the sensitive marker.
6. "Dynamic database credentials expire mid-request"
Your application must handle credential expiry gracefully. Implement a connection pool wrapper that catches authentication errors, requests new credentials from Vault, creates a new pool, and retries the failed query. The lease TTL should be longer than your longest expected transaction, and renewal should happen well before expiry.
Best Practices
Never hardcode secrets in Terraform files. Use variables with sensitive = true, Vault data sources, or cloud secret managers. No exceptions.
Encrypt state at rest and in transit. Use S3 with KMS encryption, Azure Blob with customer-managed keys, or GCS with CMEK. Enable versioning on the state bucket for audit trails.
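Versioning is a single resource; a minimal sketch against the state bucket from the examples above:
resource "aws_s3_bucket_versioning" "state" {
  bucket = "mycompany-terraform-state"
  versioning_configuration {
    status = "Enabled"
  }
}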
Prefer dynamic secrets over static secrets. Dynamic database credentials from Vault eliminate rotation complexity and limit blast radius. A leaked credential that expires in one hour is vastly less dangerous than one that lives forever.
Implement least-privilege access for CI/CD. The pipeline AppRole should only read the secrets it needs. Use separate roles for plan (read-only) and apply (read-write) stages. Never give the pipeline Vault root access.
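As a sketch, the plan/apply split can be expressed in Vault's Terraform provider; the paths and TTLs here are illustrative:
resource "vault_policy" "ci_plan" {
  name   = "ci-plan"
  policy = <<-EOT
    path "secret/data/production/*" {
      capabilities = ["read"]
    }
  EOT
}

resource "vault_approle_auth_backend_role" "ci_plan" {
  backend        = "approle"
  role_name      = "ci-plan"
  token_policies = ["ci-plan"]
  token_ttl      = 900 # 15 minutes is plenty for a plan
}

resource "vault_approle_auth_backend_role" "ci_apply" {
  backend        = "approle"
  role_name      = "ci-apply"
  token_policies = ["ci-apply"] # broader read-write policy, defined separately
  token_ttl      = 1800
}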
Run secret scanning on every pull request. Tools like TruffleHog, Gitleaks, and custom regex scanners catch secrets before they reach the main branch. Make the scan a required check that blocks merge on failure.
Rotate secrets immediately upon suspected exposure. Have a documented runbook for secret rotation. Know which systems reference each secret. Practice rotation in staging before you need it in production at 3 AM.
Separate secret management from infrastructure provisioning. Secrets should be created and managed through a dedicated process, not inline with infrastructure code. Use data sources to reference secrets, not resources to create them alongside infrastructure.
Audit secret access continuously. Vault provides detailed audit logs showing who accessed what and when. Enable audit logging and route it to your SIEM. Set up alerts for unusual access patterns.
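Enabling the audit device can itself be managed in Terraform; a minimal sketch that writes to a file, which a log agent is assumed to ship to your SIEM:
resource "vault_audit" "file" {
  type = "file"
  options = {
    file_path = "/vault/logs/audit.log"
  }
}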
Use short-lived credentials everywhere possible. AWS STS temporary credentials, Vault dynamic secrets, and OIDC federation all provide time-limited access that reduces risk without operational burden.
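The role-to-assume step in the GitHub Actions example earlier relies on exactly this pattern. A sketch of the AWS side, assuming a hypothetical mycompany/infrastructure repository; verify the current GitHub OIDC thumbprint before using it:
resource "aws_iam_openid_connect_provider" "github" {
  url            = "https://token.actions.githubusercontent.com"
  client_id_list = ["sts.amazonaws.com"]
  # GitHub's published thumbprint at the time of writing - verify before use
  thumbprint_list = ["6938fd4d98bab03faadb97b34396831e3780aea1"]
}

resource "aws_iam_role" "github_actions" {
  name = "github-actions-terraform"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Federated = aws_iam_openid_connect_provider.github.arn }
      Action    = "sts:AssumeRoleWithWebIdentity"
      Condition = {
        StringEquals = {
          "token.actions.githubusercontent.com:aud" = "sts.amazonaws.com"
        }
        StringLike = {
          # Restrict to one repo and branch - hypothetical repo name
          "token.actions.githubusercontent.com:sub" = "repo:mycompany/infrastructure:ref:refs/heads/main"
        }
      }
    }]
  })
}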
Version your secrets. Both Vault KV v2 and AWS Secrets Manager support versioning. When a rotation goes wrong, you need the ability to roll back to the previous version within seconds.