IaC at Scale: 7 Patterns That Prevent Your Terraform from Becoming a Nightmare
Battle-tested module structures, state management strategies, and code organization patterns that keep infrastructure manageable as your platform grows.
Table of Contents
- Pattern 1: The Module Hierarchy That Actually Works
- Pattern 2: State Management That Doesn’t Break
  - Separate State Files by Blast Radius
  - Use Remote State Data Sources
- Pattern 3: Variable Validation That Prevents Disasters
- Pattern 4: The Configuration Strategy That Scales
- Pattern 5: Tagging Strategy That Actually Works
- Pattern 6: Testing Infrastructure Code
- Pattern 7: Documentation That Developers Actually Read
- The Implementation Strategy
- Common Mistakes to Avoid
I’ve seen too many Terraform codebases that started clean and organized, only to become unmaintainable messes six months later. The problem isn’t Terraform—it’s how we organize and structure our infrastructure code as it scales.
Here are seven patterns that will keep your Terraform manageable, even as your platform grows to hundreds of resources and multiple teams.
Pattern 1: The Module Hierarchy That Actually Works
Most teams start with flat module structures and regret it later. Here’s a hierarchy that scales:
modules/
├── foundations/          # Core infrastructure
│   ├── networking/
│   ├── security/
│   └── monitoring/
├── services/             # Application-specific modules
│   ├── web-service/
│   ├── database/
│   └── cache/
└── compositions/         # Higher-level compositions
    ├── environment/
    └── application-stack/
Foundation modules handle core infrastructure that rarely changes. Service modules are reusable components for common patterns. Composition modules combine multiple services into complete environments.
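As a rough sketch of how the layers fit together, a composition module might wire service modules like this. The input and output names (`private_subnet_ids`, `endpoint`, `database_host`, `cache_host`) are assumptions for illustration; the real interfaces depend on how each service module is written.

```hcl
# modules/compositions/application-stack/main.tf
# Sketch only: the variable and output names below are hypothetical.

variable "environment" {
  type = string
}

variable "private_subnet_ids" {
  description = "Subnets supplied by the networking foundation layer"
  type        = list(string)
}

module "database" {
  source = "../../services/database"

  environment = var.environment
  subnet_ids  = var.private_subnet_ids
}

module "cache" {
  source = "../../services/cache"

  environment = var.environment
  subnet_ids  = var.private_subnet_ids
}

module "web_service" {
  source = "../../services/web-service"

  environment   = var.environment
  subnet_ids    = var.private_subnet_ids
  database_host = module.database.endpoint # assumed output of the database module
  cache_host    = module.cache.endpoint    # assumed output of the cache module
}
```

The composition takes the subnet IDs as an input instead of creating the network itself, so the networking foundation can keep its own lifecycle.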
Pattern 2: State Management That Doesn’t Break
The biggest Terraform disasters I’ve seen come from poor state management. Here’s what works:
Separate State Files by Blast Radius
# environments/prod/networking/main.tf
terraform {
  backend "s3" {
    bucket = "company-terraform-state"
    key    = "prod/networking/terraform.tfstate"
    region = "us-west-2"
  }
}

# environments/prod/applications/api/main.tf
terraform {
  backend "s3" {
    bucket = "company-terraform-state"
    key    = "prod/applications/api/terraform.tfstate"
    region = "us-west-2"
  }
}
Rule of thumb: if a mistake in one component could take down your entire environment, give that component its own state file.
Use Remote State Data Sources
data "terraform_remote_state" "networking" {
  backend = "s3"
  config = {
    bucket = "company-terraform-state"
    key    = "prod/networking/terraform.tfstate"
    region = "us-west-2"
  }
}

resource "aws_instance" "app" {
  subnet_id = data.terraform_remote_state.networking.outputs.private_subnet_id
  # ...
}
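This makes the dependency between layers explicit without tight coupling: the application layer reads only the outputs that the networking layer chooses to publish, never its resources directly.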
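For the data source to resolve `private_subnet_id`, the networking configuration has to declare it as a root-level output. A minimal sketch, assuming the networking layer defines a subnet resource named `aws_subnet.private` (that resource name is illustrative, not taken from the configuration above):

```hcl
# environments/prod/networking/outputs.tf
# Assumes a resource named aws_subnet.private exists in this configuration.

output "private_subnet_id" {
  description = "ID of the private subnet that application layers deploy into"
  value       = aws_subnet.private.id
}
```

Only root-module outputs are visible through `terraform_remote_state`, so values produced inside child modules need to be re-exported at the root.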
Pattern 3: Variable Validation That Prevents Disasters
Add validation rules to catch mistakes early:
variable "environment" {
  description = "Environment name"
  type        = string

  validation {
    condition = contains([
      "dev", "staging", "prod"
    ], var.environment)
    error_message = "Environment must be dev, staging, or prod."
  }
}

variable "instance_type" {
  description = "EC2 instance type"
  type        = string

  validation {
    condition     = can(regex("^[tm][0-9]", var.instance_type))
    error_message = "Instance type must start with 't' or 'm' followed by a number."
  }
}
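The same approach extends to other input types. As a sketch, here are two more checks; `vpc_cidr` is a hypothetical variable, and `min_size` is borrowed from the scaling settings used later in this article:

```hcl
variable "vpc_cidr" {
  description = "CIDR block for the VPC"
  type        = string

  validation {
    # cidrhost() errors on malformed input, and can() turns that error into false
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "The vpc_cidr value must be a valid CIDR block, for example 10.0.0.0/16."
  }
}

variable "min_size" {
  description = "Minimum number of instances"
  type        = number

  validation {
    # Reject zero, negative, and fractional values
    condition     = var.min_size >= 1 && floor(var.min_size) == var.min_size
    error_message = "The min_size value must be a whole number of at least 1."
  }
}
```

Wrapping a function call in `can()` is a convenient way to reuse Terraform's own parsing as the validity check, so you avoid writing a regular expression for the format.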
Pattern 4: The Configuration Strategy That Scales
Don’t hard-code environment-specific values in your modules. Keep them in a data file and load them in the root configuration:
# config/environments.yaml
environments:
  dev:
    instance_type: "t3.micro"
    min_size: 1
    max_size: 2
  prod:
    instance_type: "m5.large"
    min_size: 3
    max_size: 10

# main.tf
locals {
  config     = yamldecode(file("${path.module}/config/environments.yaml"))
  env_config = local.config.environments[var.environment]
}

module "web_service" {
  source        = "./modules/web-service"
  instance_type = local.env_config.instance_type
  min_size      = local.env_config.min_size
  max_size      = local.env_config.max_size
}
This makes it easy to see configuration differences between environments and reduces code duplication.
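If most environments share the same values, one possible variant of the `locals` block layers per-environment overrides on top of shared defaults. This sketch assumes a hypothetical top-level `defaults` key in `environments.yaml`, which the file shown above does not have:

```hcl
# main.tf (variant)
# Assumes environments.yaml may contain an optional top-level "defaults" map.

locals {
  config = yamldecode(file("${path.module}/config/environments.yaml"))

  # merge() gives later arguments precedence, so environment-specific
  # values override the shared defaults. try() falls back to an empty
  # map when the file has no "defaults" key.
  env_config = merge(
    try(local.config.defaults, {}),
    local.config.environments[var.environment],
  )
}
```

With this layout each environment entry only needs to list the settings where it differs from the defaults.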
Pattern 5: Tagging Strategy That Actually Works
Consistent tagging is crucial for cost management and resource organization:
# modules/common/locals.tf
# Locals are module-scoped: declare these in the module that uses them,
# or expose them to other modules through an output.
locals {
  common_tags = {
    Environment = var.environment
    Project     = var.project_name
    ManagedBy   = "terraform"
    Owner       = var.team_name
    CostCenter  = var.cost_center
    # Avoid timestamp() in tags: it is re-evaluated on every run,
    # so every plan would report a tag change on every resource.
  }
}

# In your resources
resource "aws_instance" "app" {
  # ... other configuration
  tags = merge(local.common_tags, {
    Name = "${var.project_name}-${var.environment}-app"
    Type = "application"
  })
}
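With the AWS provider specifically, the `default_tags` block is an alternative to calling `merge()` on every resource. It applies a tag set to every taggable resource the provider creates. A sketch, reusing the variable names from the example above:

```hcl
provider "aws" {
  region = "us-west-2"

  # Applied automatically to every taggable resource created through this provider
  default_tags {
    tags = {
      Environment = var.environment
      Project     = var.project_name
      ManagedBy   = "terraform"
      Owner       = var.team_name
      CostCenter  = var.cost_center
    }
  }
}

resource "aws_instance" "app" {
  # ... other configuration

  # Only resource-specific tags are needed here
  tags = {
    Name = "${var.project_name}-${var.environment}-app"
    Type = "application"
  }
}
```

The trade-off is that `default_tags` is provider-specific, while the `merge()` pattern works the same way with any provider that accepts a tag map.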
Pattern 6: Testing Infrastructure Code
Yes, you should test your Terraform. Here’s a practical approach using Terratest:
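package test

import (
	"testing"

	"github.com/gruntwork-io/terratest/modules/aws"
	"github.com/gruntwork-io/terratest/modules/terraform"
)

func TestWebServiceModule(t *testing.T) {
	terraformOptions := &terraform.Options{
		TerraformDir: "../modules/web-service",
		Vars: map[string]interface{}{
			"environment":  "dev", // must be a value the Pattern 3 validation accepts
			"project_name": "test-project",
		},
	}
	defer terraform.Destroy(t, terraformOptions) // tear down even if an assertion fails
	terraform.InitAndApply(t, terraformOptions)

	// Confirm the instance exists by reading its tags (assumes the Pattern 5 tags are applied)
	instanceID := terraform.Output(t, terraformOptions, "instance_id")
	tags := aws.GetTagsForEc2Instance(t, "us-west-2", instanceID)
	if tags["ManagedBy"] != "terraform" {
		t.Errorf("expected ManagedBy tag to be terraform, got %q", tags["ManagedBy"])
	}
}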
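Terratest provisions real infrastructure, so it is slow and costs money. A cheaper complement is Terraform's native test framework (available from Terraform 1.6), which can check plan-time behaviour such as the validation rules from Pattern 3. The sketch below assumes the file lives in a `tests/` directory inside the module and that `environment` and `project_name` are the module's only required inputs:

```hcl
# modules/web-service/tests/validation.tftest.hcl
# Run with `terraform test` from the module directory.

run "rejects_unknown_environment" {
  command = plan

  variables {
    environment  = "qa" # not in the allowed list
    project_name = "test-project"
  }

  # The run passes only if the validation on var.environment fails
  expect_failures = [
    var.environment,
  ]
}

run "accepts_known_environment" {
  command = plan

  variables {
    environment  = "dev"
    project_name = "test-project"
  }

  assert {
    condition     = var.environment == "dev"
    error_message = "Expected the dev environment to pass validation."
  }
}
```

The second run still needs whatever provider configuration the module requires in order to produce a plan, so treat it as a starting point to adapt.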
Pattern 7: Documentation That Developers Actually Read
Auto-generate documentation using terraform-docs:
# In your module directory
terraform-docs markdown table --output-file README.md .
Regenerated on every change (for example from a pre-commit hook or a CI job), the output stays in sync with your code:
## Inputs
| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| environment | Environment name | `string` | n/a | yes |
| instance_type | EC2 instance type | `string` | `"t3.micro"` | no |
The Implementation Strategy
Don’t try to implement all these patterns at once. Here’s a practical rollout plan:
- Weeks 1-2: Implement proper state separation
- Weeks 3-4: Add variable validation and common tagging
- Weeks 5-6: Refactor into the module hierarchy
- Weeks 7-8: Add configuration management and documentation
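The module refactor is the riskiest step in this plan, because moving a resource into a module changes its address, and Terraform would otherwise plan to destroy and recreate it. On Terraform 1.1 or later, a `moved` block records the rename so the existing resource is kept. A sketch, assuming the instance from the earlier examples is moved into a module instance named `web_service` that declares `aws_instance.app`:

```hcl
# main.tf, after moving aws_instance.app into modules/web-service
# The module instance name and resource address are illustrative.

module "web_service" {
  source = "./modules/web-service"
  # ... module inputs
}

# Tells Terraform the existing resource now lives at a new address,
# so the plan shows a move and no destroy/create pair.
moved {
  from = aws_instance.app
  to   = module.web_service.aws_instance.app
}
```

Running `terraform plan` after adding the block is a quick way to confirm that nothing is scheduled for replacement before you apply.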
Common Mistakes to Avoid
Don’t over-modularize early. Start with simple, working code and refactor into modules when you see patterns emerging.
Don’t ignore state file organization. It’s much harder to fix later than to get right from the beginning.
Don’t skip validation. Those extra lines of validation code will save you hours of debugging later.
Remember: good Terraform code is like good software—it’s organized, tested, and documented. These patterns will help you get there without the pain of major refactoring later.