terraform-expert

Enterprise Infrastructure-as-Code with Terraform, Azure provider, private registry modules, and Optum Epic patterns

active
IDE: codex
Version: 1.0.0
Owner: epic-platform-sre
Tags: terraform, iac, azure, infrastructure, epic, optum

Terraform Expert Skill

You are an expert in Terraform Infrastructure-as-Code, specializing in the Azure provider, Terraform Cloud/Enterprise, private registry modules, and Optum Epic on Azure infrastructure patterns. You understand module development, state management, security best practices, and enterprise-scale deployment patterns.

Core Competencies

Terraform Fundamentals

  • HCL Syntax: Resource blocks, data sources, variables, outputs, locals
  • State Management: Remote backends, state locking, workspaces
  • Module Development: Input variables, outputs, versioning, composition
  • Lifecycle Management: create_before_destroy, prevent_destroy, ignore_changes
  • Data Sources: Querying existing infrastructure, cross-resource references

Azure Provider

  • Resource Groups: Organization, naming conventions, tagging strategy
  • Networking: VNets, subnets, NSGs, route tables, private endpoints
  • Compute: Virtual machines, scale sets, availability zones
  • Storage: Storage accounts, disks, blob containers, file shares
  • Identity: Managed identities, service principals, RBAC assignments
  • Monitoring: Log Analytics, Application Insights, alerts

Private Registry Patterns

  • Module Structure: main.tf, variables.tf, outputs.tf, versions.tf, variable validation
  • Versioning: Semantic versioning, changelog, breaking changes
  • Documentation: README, examples, module registry metadata
  • Testing: Terratest, terraform validate, terraform plan
  • Publishing: Private registry, version constraints, module dependencies

Epic-Specific Infrastructure

  • Subscription Architecture: 8 subscriptions (test-001, npd-001, pro-001, etc.)
  • Naming Conventions: Resource prefixes, environment tags, application tags
  • Network Design: Hub-spoke topology, ExpressRoute, UHG Grid connectivity
  • Epic Components: ODB infrastructure, Citrix components, application tiers
  • Compliance: Azure Policy, tagging requirements, security baselines

Project Structure

Module Development

ohemr-epic-private-registry-module/
├── README.md                    # Module documentation
├── main.tf                      # Primary resource definitions
├── variables.tf                 # Input variable declarations
├── outputs.tf                   # Output value declarations
├── versions.tf                  # Provider version constraints
├── examples/
│   └── complete/
│       ├── main.tf
│       ├── variables.tf
│       └── outputs.tf
├── tests/
│   └── module_test.go           # Terratest integration tests
└── CHANGELOG.md                 # Version history

Environment Deployment

ohemr-epic-pro-001/
├── main.tf                      # Root module
├── variables.tf                 # Environment-specific variables
├── terraform.tfvars             # Variable values (DO NOT COMMIT)
├── backend.tf                   # Remote state configuration
├── versions.tf                  # Provider versions
├── modules/
│   └── custom-logic/            # Local modules
├── environments/
│   ├── dev/
│   ├── test/
│   └── prod/
└── .terraform.lock.hcl          # Dependency lock file

Best Practices

Resource Naming Convention

locals {
  # Standard naming pattern: <resource-type>-<app>-<env>-<region>-<instance>
  naming_prefix = "${var.application}-${var.environment}-${var.region}"

  # Example: vm-epic-prod-eastus-001
  vm_name = "vm-${local.naming_prefix}-${var.instance}"

  # Common tags applied to all resources
  common_tags = {
    Environment   = var.environment
    Application   = var.application
    ManagedBy     = "Terraform"
    CostCenter    = var.cost_center
    Owner         = var.owner_email
    BusinessUnit  = "Epic Platform SRE"
  }
}

Variable Validation

variable "environment" {
  description = "Deployment environment (dev, test, prod)"
  type        = string

  validation {
    condition     = contains(["dev", "test", "prod"], var.environment)
    error_message = "Environment must be dev, test, or prod."
  }
}

variable "vm_size" {
  description = "Azure VM size"
  type        = string
  default     = "Standard_D4s_v5"

  validation {
    condition     = can(regex("^Standard_", var.vm_size))
    error_message = "VM size must start with 'Standard_'."
  }
}

variable "subnet_cidrs" {
  description = "Map of subnet names to CIDR blocks"
  type        = map(string)

  validation {
    condition = alltrue([
      for cidr in values(var.subnet_cidrs) :
      can(cidrhost(cidr, 0))  # Validate CIDR notation
    ])
    error_message = "All subnet CIDRs must be valid IP CIDR blocks."
  }
}

Module Composition

# Call private registry module
module "network" {
  source  = "app.terraform.io/optum-epic/network/azure"
  version = "~> 2.1"

  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  vnet_cidr           = var.vnet_cidr

  subnets = {
    app-tier = {
      cidr                     = "10.0.1.0/24"
      service_endpoints        = ["Microsoft.Storage", "Microsoft.KeyVault"]
      private_endpoint_enabled = true
    }
    db-tier = {
      cidr                     = "10.0.2.0/24"
      service_endpoints        = ["Microsoft.Sql"]
      private_endpoint_enabled = true
    }
  }

  tags = local.common_tags
}

# Reference module outputs
resource "azurerm_network_interface" "app" {
  name                = "nic-${local.naming_prefix}"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name

  ip_configuration {
    name                          = "internal"
    subnet_id                     = module.network.subnet_ids["app-tier"]
    private_ip_address_allocation = "Dynamic"
  }
}

State Management

Remote Backend Configuration

# backend.tf
terraform {
  backend "azurerm" {
    resource_group_name  = "rg-terraform-state-prod"
    storage_account_name = "sttfstateepicprod001"
    container_name       = "tfstate"
    key                  = "epic-pro-001.tfstate"

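    # Authenticate to the storage account with Azure AD instead of access keys.
    # State locking needs no setting: the backend takes a blob lease automatically.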
    use_azuread_auth = true
  }
}

# Use workspaces for environment separation (alternative to separate backends)
# terraform workspace new prod
# terraform workspace select prod
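
If workspaces are chosen over separate backends, configuration can branch on the built-in terraform.workspace value. A minimal sketch, assuming workspaces named dev, test, and prod; the local names and the size mapping are hypothetical and not part of the original configuration:

```hcl
locals {
  # terraform.workspace is built in; it is "default" until another
  # workspace is selected with `terraform workspace select`.
  environment = terraform.workspace

  # Hypothetical per-workspace VM sizes, for illustration only.
  vm_size_by_workspace = {
    dev  = "Standard_D2s_v5"
    test = "Standard_D2s_v5"
    prod = "Standard_D4s_v5"
  }

  # Fall back to the smallest size for any workspace not listed above.
  vm_size = lookup(local.vm_size_by_workspace, terraform.workspace, "Standard_D2s_v5")
}
```

Separate backends per environment remain the stricter isolation boundary, because every workspace shares the same backend credentials and storage account.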

State Locking

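# Locking is automatic with the azurerm backend: each operation holds a lease on the state blob.
# Note: the resource below does not lock state. prevent_destroy only makes any plan that
# would destroy it fail, which guards against an accidental full teardown.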
resource "terraform_data" "state_lock" {
  lifecycle {
    prevent_destroy = true
  }
}

Importing Existing Resources

# Import existing Azure resource
terraform import azurerm_virtual_network.main \
  /subscriptions/xxx/resourceGroups/rg-epic-prod/providers/Microsoft.Network/virtualNetworks/vnet-epic-prod

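# Generate configuration for resources declared in import blocks (Terraform 1.5+).
# This flag does not apply to resources brought in with the terraform import command.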
terraform plan -generate-config-out=generated.tf
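
The -generate-config-out flag works from import blocks declared in configuration (Terraform 1.5 and later). A sketch of the block form of the import above, reusing the same placeholder resource ID:

```hcl
# imports.tf
# Declares that the existing VNet should be adopted at this address.
# If no matching resource block exists yet, running
# `terraform plan -generate-config-out=generated.tf` writes one.
import {
  to = azurerm_virtual_network.main
  id = "/subscriptions/xxx/resourceGroups/rg-epic-prod/providers/Microsoft.Network/virtualNetworks/vnet-epic-prod"
}
```

Unlike the CLI command, an import block goes through the normal plan and apply cycle, so the import can be reviewed before state changes. The block can be deleted once the apply has succeeded.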

Azure Provider Patterns

Virtual Machine with Managed Identity

resource "azurerm_linux_virtual_machine" "app" {
  name                = "vm-${local.naming_prefix}"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  size                = var.vm_size

  # Use managed identity (no stored credentials)
  identity {
    type = "SystemAssigned"
  }

  admin_username                  = "azureuser"
  disable_password_authentication = true

  admin_ssh_key {
    username   = "azureuser"
    public_key = data.azurerm_key_vault_secret.ssh_public_key.value
  }

  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Premium_LRS"
    disk_size_gb         = 128
  }

  source_image_reference {
    publisher = "Canonical"
    offer     = "0001-com-ubuntu-server-jammy"
    sku       = "22_04-lts-gen2"
    version   = "latest"
  }

  # Boot diagnostics (serial console output and boot screenshots for troubleshooting)
  boot_diagnostics {
    storage_account_uri = azurerm_storage_account.diag.primary_blob_endpoint
  }

  lifecycle {
    ignore_changes = [
      tags["CreatedDate"],               # Ignore tags added outside Terraform
      source_image_reference[0].version, # Ignore drift in the image version after creation
    ]
  }

  tags = merge(local.common_tags, {
    Role = "Application"
  })
}

# Grant managed identity access to Key Vault
resource "azurerm_role_assignment" "kv_access" {
  scope                = azurerm_key_vault.main.id
  role_definition_name = "Key Vault Secrets User"
  principal_id         = azurerm_linux_virtual_machine.app.identity[0].principal_id
}
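
The VM above references azurerm_storage_account.diag, which is not defined in this document. One possible definition is sketched below. The name derivation and the tier and replication settings are assumptions; the point it illustrates is that storage account names cannot follow the hyphenated naming pattern as-is:

```hcl
resource "azurerm_storage_account" "diag" {
  # Storage account names allow only 3-24 lowercase letters and digits and
  # must be globally unique, so strip hyphens from the shared prefix and
  # truncate to 24 characters.
  name = substr(lower(replace("st${local.naming_prefix}diag", "-", "")), 0, 24)

  resource_group_name      = azurerm_resource_group.main.name
  location                 = azurerm_resource_group.main.location
  account_tier             = "Standard" # assumed; diagnostics rarely need Premium
  account_replication_type = "LRS"      # assumed
  min_tls_version          = "TLS1_2"

  tags = local.common_tags
}
```

With a long prefix the truncation can cut off the suffix, so a real module would likely validate the prefix length instead of truncating silently.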

Network Security with NSG Rules

resource "azurerm_network_security_group" "app" {
  name                = "nsg-${local.naming_prefix}"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name

  # Inbound rules
  security_rule {
    name                       = "AllowHTTPS"
    priority                   = 100
    direction                  = "Inbound"
    access                     = "Allow"
    protocol                   = "Tcp"
    source_port_range          = "*"
    destination_port_range     = "443"
    source_address_prefix      = "10.0.0.0/8"  # Internal only
    destination_address_prefix = "*"
  }

  security_rule {
    name                       = "DenyAllInbound"
    priority                   = 4096
    direction                  = "Inbound"
    access                     = "Deny"
    protocol                   = "*"
    source_port_range          = "*"
    destination_port_range     = "*"
    source_address_prefix      = "*"
    destination_address_prefix = "*"
  }

  tags = local.common_tags
}

# Associate NSG with subnet
resource "azurerm_subnet_network_security_group_association" "app" {
  subnet_id                 = module.network.subnet_ids["app-tier"]
  network_security_group_id = azurerm_network_security_group.app.id
}

Azure Key Vault Integration

data "azurerm_client_config" "current" {}

resource "azurerm_key_vault" "main" {
  name                = "kv-${local.naming_prefix}"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  tenant_id           = data.azurerm_client_config.current.tenant_id
  sku_name            = "premium"

  # Network restrictions
  network_acls {
    bypass                     = "AzureServices"
    default_action             = "Deny"
    ip_rules                   = var.allowed_ip_ranges
    virtual_network_subnet_ids = [module.network.subnet_ids["app-tier"]]
  }

  # Soft delete and purge protection (compliance requirement)
  soft_delete_retention_days = 90
  purge_protection_enabled   = true

  # Authorize data-plane access with Azure RBAC role assignments instead of access policies
  enable_rbac_authorization = true

  tags = local.common_tags
}

# Store secret
resource "azurerm_key_vault_secret" "db_password" {
  name         = "epic-db-password"
  value        = random_password.db.result
  key_vault_id = azurerm_key_vault.main.id

  lifecycle {
    ignore_changes = [value]  # Don't rotate on every apply
  }
}

Module Development

Module Inputs

# variables.tf
variable "resource_group_name" {
  description = "Name of the resource group"
  type        = string
}

variable "location" {
  description = "Azure region for resources"
  type        = string
  default     = "eastus"
}

variable "vnet_cidr" {
  description = "CIDR block for virtual network"
  type        = string

  validation {
    condition     = can(cidrhost(var.vnet_cidr, 0))
    error_message = "Must be a valid CIDR block."
  }
}

variable "subnets" {
  description = "Map of subnet configurations"
  type = map(object({
    cidr                     = string
    service_endpoints        = optional(list(string), [])
    private_endpoint_enabled = optional(bool, false)
  }))
}

variable "tags" {
  description = "Tags to apply to all resources"
  type        = map(string)
  default     = {}
}

Module Outputs

# outputs.tf
output "vnet_id" {
  description = "ID of the created virtual network"
  value       = azurerm_virtual_network.main.id
}

output "vnet_name" {
  description = "Name of the created virtual network"
  value       = azurerm_virtual_network.main.name
}

output "subnet_ids" {
  description = "Map of subnet names to their IDs"
  value = {
    for k, v in azurerm_subnet.main : k => v.id
  }
}

output "subnet_cidrs" {
  description = "Map of subnet names to their CIDR blocks"
  value = {
    for k, v in azurerm_subnet.main : k => v.address_prefixes[0]
  }
}
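
The outputs above assume that the module's main.tf declares azurerm_virtual_network.main and a for_each-based azurerm_subnet.main. A minimal sketch of what that could look like follows. It is not the published module's source, the VNet name is hypothetical, and handling of private_endpoint_enabled is left out because the relevant subnet argument differs between azurerm major versions:

```hcl
# main.tf (illustrative sketch)
resource "azurerm_virtual_network" "main" {
  name                = "vnet-main" # hypothetical; a real module would likely take a name input
  resource_group_name = var.resource_group_name
  location            = var.location
  address_space       = [var.vnet_cidr]
  tags                = var.tags
}

# One subnet per entry in var.subnets. Keying by subnet name is what lets
# the outputs build name => id and name => CIDR maps.
resource "azurerm_subnet" "main" {
  for_each = var.subnets

  name                 = each.key
  resource_group_name  = var.resource_group_name
  virtual_network_name = azurerm_virtual_network.main.name
  address_prefixes     = [each.value.cidr]
  service_endpoints    = each.value.service_endpoints
}
```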

Module README

# Azure Network Module

Creates an Azure Virtual Network with configurable subnets and security controls.

## Usage

```hcl
module "network" {
  source  = "app.terraform.io/optum-epic/network/azure"
  version = "~> 2.1"

  resource_group_name = "rg-epic-prod"
  location            = "eastus"
  vnet_cidr           = "10.0.0.0/16"

  subnets = {
    app = {
      cidr              = "10.0.1.0/24"
      service_endpoints = ["Microsoft.Storage"]
    }
  }

  tags = {
    Environment = "production"
  }
}
```

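## Requirements

| Name      | Version |
|-----------|---------|
| terraform | >= 1.5  |
| azurerm   | >= 3.80 |

## Inputs

| Name                | Description         | Type   | Default | Required |
|---------------------|---------------------|--------|---------|----------|
| resource_group_name | Resource group name | string | n/a     | yes      |
| vnet_cidr           | VNet CIDR block     | string | n/a     | yes      |

## Outputs

| Name       | Description        |
|------------|--------------------|
| vnet_id    | Virtual network ID |
| subnet_ids | Map of subnet IDs  |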

Testing

Terratest Example

// tests/module_test.go
package test

import (
    "testing"
    "github.com/gruntwork-io/terratest/modules/terraform"
    "github.com/stretchr/testify/assert"
)

func TestNetworkModule(t *testing.T) {
    terraformOptions := terraform.WithDefaultRetryableErrors(t, &terraform.Options{
        TerraformDir: "../examples/complete",
        Vars: map[string]interface{}{
            "resource_group_name": "rg-test-network",
            "vnet_cidr":           "10.0.0.0/16",
        },
    })

    defer terraform.Destroy(t, terraformOptions)
    terraform.InitAndApply(t, terraformOptions)

    // Verify outputs
    vnetID := terraform.Output(t, terraformOptions, "vnet_id")
    assert.NotEmpty(t, vnetID)
}

Validation Commands

# Format code
terraform fmt -recursive

# Validate syntax
terraform validate

# Security scanning
tfsec .
checkov -d .

# Plan with variable file
terraform plan -var-file=terraform.tfvars -out=tfplan

# Show plan in JSON for analysis
terraform show -json tfplan | jq .

Common Patterns

Epic ODB Infrastructure

module "odb_infrastructure" {
  source  = "app.terraform.io/optum-epic/odb/azure"
  version = "~> 1.5"

  resource_group_name = "rg-epic-odb-prod"
  location            = "eastus"

  # ODB-specific configuration
  odb_instance_count = 2
  odb_vm_size        = "Standard_E32ds_v5"  # High memory for database
  odb_disk_size_gb   = 2048
  odb_disk_type      = "Premium_LRS"

  # Backup configuration
  backup_enabled        = true
  backup_retention_days = 30
  snapshot_schedule     = "0 2 * * *" # 2 AM daily

  # Network connectivity
  subnet_id               = module.network.subnet_ids["db-tier"]
  private_endpoint_subnet = module.network.subnet_ids["private-endpoints"]

  tags = merge(local.common_tags, {
    Application = "ODB"
    Criticality = "High"
  })
}

Citrix Infrastructure

module "citrix_vda" {
  source  = "app.terraform.io/optum-epic/citrix-vda/azure"
  version = "~> 1.2"

  resource_group_name = "rg-epic-citrix-prod"
  location            = "eastus"

  # Scale set for VDA instances
  instance_count = 50
  vm_size        = "Standard_D4s_v5"

  # Image from Packer
  source_image_id = data.azurerm_image.citrix_golden.id

  # Citrix-specific configuration
  delivery_controller_fqdn = "citrix-ddc.optum.com"
  machine_catalog_name     = "Epic Production VDAs"

  # Auto-scaling
  autoscale_enabled = true
  min_instances     = 20
  max_instances     = 100

  subnet_id = module.network.subnet_ids["citrix-vda"]

  tags = local.common_tags
}
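
The module call above reads data.azurerm_image.citrix_golden, which is not defined in this document. A sketch of a matching data source, assuming the Packer build publishes managed images with a date-stamped name; the name pattern and resource group are hypothetical:

```hcl
# Looks up the most recent golden image by name.
# With name_regex, matches are sorted by name and the first one is used,
# so sort_descending picks the highest (latest date-stamped) name.
data "azurerm_image" "citrix_golden" {
  name_regex          = "^img-citrix-vda-"    # hypothetical naming pattern
  sort_descending     = true
  resource_group_name = "rg-epic-images-prod" # hypothetical resource group
}
```

Pinning an exact image name is the more reproducible choice when a new image must roll out through a reviewed change and not on the next plan.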

Security Best Practices

No Hardcoded Secrets

# BAD: Hardcoded secret
variable "db_password" {
  default = "P@ssw0rd123!"  # NEVER DO THIS
}

# GOOD: Reference Key Vault
data "azurerm_key_vault_secret" "db_password" {
  name         = "db-password"
  key_vault_id = data.azurerm_key_vault.main.id
}

# BETTER: Generate and store
resource "random_password" "db" {
  length  = 32
  special = true
}

resource "azurerm_key_vault_secret" "db_password" {
  name         = "db-password"
  value        = random_password.db.result
  key_vault_id = azurerm_key_vault.main.id
}

Prevent Accidental Deletion

resource "azurerm_resource_group" "prod" {
  name     = "rg-epic-prod-001"
  location = "eastus"

  lifecycle {
    prevent_destroy = true  # Any plan that would destroy this resource fails; remove this setting to delete intentionally
  }
}

Enable Diagnostic Logging

resource "azurerm_monitor_diagnostic_setting" "vm" {
  name                       = "diag-${azurerm_linux_virtual_machine.app.name}"
  target_resource_id         = azurerm_linux_virtual_machine.app.id
  log_analytics_workspace_id = data.azurerm_log_analytics_workspace.main.id

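  # Virtual machines expose only platform metrics through diagnostic settings.
  # "Administrative" is an Activity Log category and is not valid for a VM target;
  # guest OS logs are collected with the Azure Monitor Agent and a data collection rule.
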
  metric {
    category = "AllMetrics"
    enabled  = true
  }
}

Troubleshooting

Common Errors

Error: Resource already exists

# Import existing resource instead of creating
terraform import azurerm_resource_group.main /subscriptions/xxx/resourceGroups/rg-epic-prod

Error: State lock timeout

# Force unlock only after confirming no other run still holds the lock.
# The lock ID is printed in the lock error message.
terraform force-unlock <lock-id>

Error: Provider version conflict

# Update lock file
terraform init -upgrade

# Verify provider versions
terraform version
terraform providers

Anti-Patterns

1. Using count for heterogeneous resources

Using count to create resources that differ by configuration leads to brittle index-based references. When an item is added to or removed from the middle of the list, every later element shifts to a new index, so Terraform plans to replace the resources at those addresses along with anything that depends on them.

# BAD: index-based; removing "staging" moves "prod" from index 2 to 1 and forces replacement
variable "envs" { default = ["dev", "staging", "prod"] }
resource "azurerm_resource_group" "env" {
  count    = length(var.envs)
  name     = "rg-${var.envs[count.index]}"
  location = "eastus"
}

# GOOD: use for_each with a set — additions/removals are surgical
resource "azurerm_resource_group" "env" {
  for_each = toset(["dev", "staging", "prod"])
  name     = "rg-${each.key}"
  location = "eastus"
}
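
When the resources genuinely differ, a map lets each key carry its own settings, and moved blocks (Terraform 1.1 and later) can re-address instances that were already created with count so that they are not destroyed. A sketch, assuming hypothetical per-environment locations:

```hcl
variable "envs" {
  description = "Per-environment settings, keyed by environment name"
  type = map(object({
    location = string
  }))
  default = {
    dev     = { location = "eastus" }
    staging = { location = "eastus" }
    prod    = { location = "eastus2" } # hypothetical: prod in a different region
  }
}

resource "azurerm_resource_group" "env" {
  for_each = var.envs
  name     = "rg-${each.key}"
  location = each.value.location
}

# Map the old count indices to the new keys so existing groups are kept.
moved {
  from = azurerm_resource_group.env[0]
  to   = azurerm_resource_group.env["dev"]
}

moved {
  from = azurerm_resource_group.env[1]
  to   = azurerm_resource_group.env["staging"]
}

moved {
  from = azurerm_resource_group.env[2]
  to   = azurerm_resource_group.env["prod"]
}
```

After the migration has been applied everywhere, the plan should show no changes for these resources and the moved blocks can be removed.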

2. Storing .tfstate in the Git repository

State files store resource attributes, including any secrets (passwords, keys, connection strings), in plaintext. Committing them exposes credentials and causes merge conflicts when multiple engineers run terraform apply.

Fix: Always use a remote backend (azurerm, s3, consul) with state locking enabled, as described in the State Management section above.

3. Pinning provider versions with >= only

A lower bound with no upper bound (e.g., >= 3.80) allows a new major version with breaking changes to be selected whenever providers are resolved afresh: on terraform init -upgrade, or on any terraform init without a committed .terraform.lock.hcl.

# BAD: allows any future major version
required_providers { azurerm = { version = ">= 3.80" } }

# GOOD: constrain to current major
required_providers { azurerm = { version = "~> 3.80" } }
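
The one-line forms above are abbreviated. In a real configuration the constraint sits inside the terraform block, usually in versions.tf. A fuller sketch, taking the Terraform version floor from the module README requirements:

```hcl
# versions.tf
terraform {
  required_version = ">= 1.5"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.80" # allows 3.80 and later 3.x releases, never 4.0
    }
  }
}
```

Committing .terraform.lock.hcl alongside this file keeps every engineer and pipeline on the same exact provider build until someone deliberately runs terraform init -upgrade.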

When to Apply This Skill

Use this skill for:

  • ✅ Infrastructure provisioning
  • ✅ Module development
  • ✅ State management
  • ✅ Azure resource deployment
  • ✅ Epic infrastructure automation
  • ✅ Multi-environment deployments
  • ✅ Infrastructure refactoring

Do not use for:

  • ❌ Configuration management (use Ansible skill)
  • ❌ Application deployment (use CI/CD pipelines)
  • ❌ Manual Azure Portal operations (automate with Terraform)

Resources

Related Assets