github-workflows-dojo360-terraform-ops
Terraform state management and troubleshooting operations including state lock resolution and force unlock capabilities
Terraform Operations Skill
Overview
The Terraform Operations workflow enables state management and troubleshooting for Terraform deployments across multi-cloud environments. This workflow is essential for resolving state lock issues that can occur when Terraform operations fail or are interrupted, leaving the state file locked and blocking subsequent operations.
Primary Use Cases:
- State Lock Resolution: Unlock Terraform state files when operations are interrupted
- Force Unlock: Forcefully remove locks when automatic resolution fails
- State Troubleshooting: Diagnose and resolve state file conflicts
- Emergency Recovery: Restore Terraform operations after failures
When to Use This Workflow:
- Terraform operations are blocked with "state locked" errors
- Previous Terraform runs failed and left state locks behind
- State management requires manual intervention
- Multi-environment deployment conflicts need troubleshooting
Workflow Reference
Repository: dojo360/pipelines-workflows
Workflow: .github/workflows/terraform-ops.yml
Version: v2.0.0 (stable) or @beta (latest)
Documentation: terraform-ops/index.md
Key Features
State Lock Management
- Force Unlock: Remove persistent state locks using unique lock IDs
- Multi-Cloud Support: AWS (awsOptum, awsChc20), Azure (azureOptum), GCP
- Backend Compatibility: Works with azurerm, S3, GCS, and TFE backends
- Safe Operations: Requires explicit lock ID to prevent accidental unlocks
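The workflow's internals are not shown in this document. As a rough sketch, assuming it ultimately wraps Terraform's own `terraform force-unlock` command, the explicit-lock-ID safeguard could look like the following. The function name `force_unlock_state` and the `TERRAFORM_BIN` override are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Hypothetical sketch: refuse to unlock unless an explicit lock ID is given.
# TERRAFORM_BIN is an illustrative override (not a workflow input) so the
# function can be dry-run with `echo` instead of the real binary.
TERRAFORM_BIN="${TERRAFORM_BIN:-terraform}"

force_unlock_state() {
  lock_id="$1"
  if [ -z "$lock_id" ]; then
    echo "lock ID is required" >&2
    return 1
  fi
  # -force skips the interactive confirmation prompt, which a CI job cannot answer.
  "$TERRAFORM_BIN" force-unlock -force "$lock_id"
}
```

Running it with `TERRAFORM_BIN=echo` prints the command that would be executed, which is a cheap way to review an unlock before doing it for real.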
Enterprise Integration
- OIDC Authentication: Keyless authentication for AWS and Azure
- Metadata API Integration: Automatic configuration from Dojo360 metadata
- Runner Selection: Automatic runner assignment based on cloud-type
- Terraform Mirroring: Uses enterprise Terraform provider mirrors
State File Operations
- Lock Identification: Retrieve and identify specific lock IDs
- State Inspection: View current state lock status
- TFE Support: Compatible with Terraform Enterprise workspaces
- Backend Flexibility: Supports multiple backend configurations
Prerequisites
Before using this workflow, ensure:
- Metadata API Onboarding
  - Product onboarded to Dojo360 Metadata API
  - OR local metadata file configured
- OIDC Configuration (Required for OIDC cloud types)
  - AWS: Configure AWS OIDC
  - Azure: Configure Azure OIDC
- Lock ID Identification
  - Obtain the lock ID from error messages or state backend
  - Lock ID format varies by backend type
- Access Permissions
  - Appropriate cloud permissions to modify state files
  - GitHub repository permissions for workflows
Requirements
Terraform and Provider Versions
- Terraform: ~> 1.9.x (default: 1.9.2)
- AWS Provider: ~> 5.xx (for AWS operations)
- AzureRM Provider: ~> 3.xx (for Azure operations)
- GCP Provider: ~> 6.xx (for GCP operations)
GitHub Actions Permissions
permissions:
  actions: read
  contents: write
  id-token: write # Required for OIDC authentication
  pull-requests: write
  security-events: write
Input Reference
Required Inputs
| Input | Description | Example |
|---|---|---|
| aide-id | AideId from aide.optum.com for metadata | 12345 |
| cloud-type | Target cloud platform (supported types) | awsOptum, azureOptum, gcp, awsChc20 |
| domain | Domain name for metadata lookup | platform-engineering |
| environment | Deployment environment (defines approval gates) | dev, qa, stage, prod |
| lock-id | Unique identifier of the Terraform state lock to remove | abc123-def456-ghi789 |
| team-name | Team name for metadata lookup | infrastructure-team |
Optional Inputs
| Input | Default | Description |
|---|---|---|
| backend-type | azurerm | Backend type: azurerm, s3, gcs, tfe |
| ref | HEAD | Branch, tag, or SHA to checkout |
| remote-state-file-name | '' | Filename of remote state file (when not using env vars) |
| runner-labels | '' | Comma-separated custom runner labels |
| terraform-directory | . | Directory path relative to repo root containing Terraform code |
| terraform-logging | off | Terraform logging level: off, trace, debug, info, warn, error |
| terraform-provider-network-mirror | https://repo1.uhc.com/artifactory/api/terraform/terraform-virtual/providers/ | Terraform provider mirror URL |
| terraform-version | 1.9.2 | Terraform version to use |
| tfe-hostname | '' | Terraform Enterprise hostname |
| tfe-organization | '' | Terraform Enterprise organization |
| tfe-workspace | '' | Terraform Enterprise workspace name |
Backend-Specific Inputs
For AzureRM Backend:
- State configuration from metadata or explicit backend config
For S3 Backend:
- S3 bucket and key configuration from metadata
For GCS Backend:
- GCS bucket and prefix configuration from metadata
For TFE Backend:
- Requires tfe-hostname, tfe-organization, and tfe-workspace
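A caller could check the TFE requirement before dispatching the workflow. The sketch below is one possible pre-flight check; the function name `validate_tfe_inputs` and its argument order are hypothetical and not part of the workflow itself.

```shell
#!/bin/sh
# Hypothetical pre-flight check: the TFE backend needs all three tfe-* inputs.
# Argument order (backend type, hostname, organization, workspace) is illustrative.
validate_tfe_inputs() {
  backend_type="$1"
  tfe_hostname="$2"
  tfe_organization="$3"
  tfe_workspace="$4"

  # Other backends take their configuration from metadata, so there is nothing to check.
  [ "$backend_type" = "tfe" ] || return 0

  missing=""
  [ -n "$tfe_hostname" ] || missing="$missing tfe-hostname"
  [ -n "$tfe_organization" ] || missing="$missing tfe-organization"
  [ -n "$tfe_workspace" ] || missing="$missing tfe-workspace"

  if [ -n "$missing" ]; then
    echo "missing required TFE inputs:$missing" >&2
    return 1
  fi
}
```

Collecting every missing input before failing means the caller sees the full list in one run instead of fixing them one at a time.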
Secrets Management
Required Secrets
| Secret | Description | Scope |
|---|---|---|
| GH_TOKEN | Classic GitHub Personal Access Token (PAT) with SSO authorization | Repository or Organization |
GH_TOKEN Requirements:
- Must have SSO authorization to Dojo360 and your GitHub organization
- Minimum scopes: repo (all) and workflow
- Used to read GitHub environment variables during workflow runs
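To confirm that a classic PAT carries the scopes listed above, one option is to inspect the `X-OAuth-Scopes` response header that the GitHub REST API returns for classic tokens. The helper below is a sketch; `has_scope` is a hypothetical name, and it only parses headers supplied on stdin.

```shell
#!/bin/sh
# Hypothetical helper: read HTTP response headers on stdin and succeed if the
# X-OAuth-Scopes header lists the requested scope.
has_scope() {
  wanted="$1"
  # Strip CRs, pick the scopes header (case-insensitive), split the
  # comma-separated list into one trimmed scope per line, then match exactly.
  tr -d '\r' |
    awk -F': ' 'tolower($1) == "x-oauth-scopes" { print $2 }' |
    tr ',' '\n' |
    sed 's/^ *//; s/ *$//' |
    grep -qxF "$wanted"
}
```

For example, `curl -sI -H "Authorization: token $GH_TOKEN" https://api.github.com/user | has_scope workflow` would succeed only when the token includes the workflow scope. SSO authorization is a separate setting and is not visible in this header.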
Usage Examples
Example 1: Basic State Lock Removal (AWS)
name: Unlock Terraform State
on:
  workflow_dispatch:
    inputs:
      lock-id:
        description: 'Lock ID to remove'
        required: true
        type: string
      environment:
        description: 'Environment to unlock'
        required: true
        type: choice
        options:
          - dev
          - qa
          - stage
          - prod
permissions:
  actions: read
  contents: write
  id-token: write
  pull-requests: write
  security-events: write
jobs:
  unlock-state:
    # A job that calls a reusable workflow cannot set runs-on;
    # use the runner-labels input to request custom runners
    uses: dojo360/pipelines-workflows/.github/workflows/terraform-ops.yml@v2.0.0
    with:
      # Required inputs
      aide-id: '<change me>'
      cloud-type: 'awsOptum'
      domain: '<change me>'
      environment: ${{ inputs.environment }}
      lock-id: ${{ inputs.lock-id }}
      team-name: '<change me>'
      # Optional inputs
      terraform-directory: 'terraform/infrastructure'
      backend-type: 's3'
    secrets:
      GH_TOKEN: ${{ secrets.GH_TOKEN }}
Example 2: Azure State Unlock with AzureRM Backend
name: Unlock Azure Terraform State
on:
  workflow_dispatch:
    inputs:
      lock-id:
        description: 'Lock ID from error message'
        required: true
      environment:
        description: 'Environment'
        required: true
        default: 'dev'
permissions:
  actions: read
  contents: write
  id-token: write
  pull-requests: write
  security-events: write
jobs:
  azure-unlock:
    uses: dojo360/pipelines-workflows/.github/workflows/terraform-ops.yml@v2.0.0
    with:
      aide-id: '<change me>'
      cloud-type: 'azureOptum'
      domain: '<change me>'
      environment: ${{ inputs.environment }}
      lock-id: ${{ inputs.lock-id }}
      team-name: '<change me>'
      # Azure-specific configuration
      backend-type: 'azurerm'
      terraform-directory: '.'
      terraform-version: '1.9.2'
    secrets:
      GH_TOKEN: ${{ secrets.GH_TOKEN }}
Example 3: GCP State Unlock with GCS Backend
name: Unlock GCP Terraform State
on:
  workflow_dispatch:
    inputs:
      lock-id:
        description: 'GCS state lock ID'
        required: true
      environment:
        description: 'Target environment'
        required: true
permissions:
  actions: read
  contents: write
  id-token: write
  pull-requests: write
  security-events: write
jobs:
  gcp-unlock:
    uses: dojo360/pipelines-workflows/.github/workflows/terraform-ops.yml@v2.0.0
    with:
      aide-id: '<change me>'
      cloud-type: 'gcp'
      domain: '<change me>'
      environment: ${{ inputs.environment }}
      lock-id: ${{ inputs.lock-id }}
      team-name: '<change me>'
      backend-type: 'gcs'
      terraform-directory: 'gcp/terraform'
    secrets:
      GH_TOKEN: ${{ secrets.GH_TOKEN }}
Example 4: Terraform Enterprise (TFE) Workspace Unlock
name: Unlock TFE Workspace State
on:
  workflow_dispatch:
    inputs:
      lock-id:
        description: 'TFE lock identifier'
        required: true
      workspace:
        description: 'TFE workspace name'
        required: true
permissions:
  actions: read
  contents: write
  id-token: write
  pull-requests: write
  security-events: write
jobs:
  tfe-unlock:
    uses: dojo360/pipelines-workflows/.github/workflows/terraform-ops.yml@v2.0.0
    with:
      aide-id: '<change me>'
      cloud-type: 'awsOptum'
      domain: '<change me>'
      environment: 'prod'
      lock-id: ${{ inputs.lock-id }}
      team-name: '<change me>'
      # TFE-specific configuration
      backend-type: 'tfe'
      tfe-hostname: 'app.terraform.io'
      tfe-organization: '<change me>'
      tfe-workspace: ${{ inputs.workspace }}
    secrets:
      GH_TOKEN: ${{ secrets.GH_TOKEN }}
Example 5: Custom Runner Labels and Advanced Configuration
name: Advanced State Unlock
on:
  workflow_dispatch:
    inputs:
      lock-id:
        description: 'Lock ID'
        required: true
      environment:
        description: 'Environment'
        required: true
      enable-logging:
        description: 'Enable Terraform debug logging'
        type: boolean
        default: false
permissions:
  actions: read
  contents: write
  id-token: write
  pull-requests: write
  security-events: write
jobs:
  advanced-unlock:
    # Runner selection is controlled by the runner-labels input below
    uses: dojo360/pipelines-workflows/.github/workflows/terraform-ops.yml@v2.0.0
    with:
      aide-id: '<change me>'
      cloud-type: 'awsOptum'
      domain: '<change me>'
      environment: ${{ inputs.environment }}
      lock-id: ${{ inputs.lock-id }}
      team-name: '<change me>'
      # Advanced configuration
      backend-type: 's3'
      terraform-directory: 'infrastructure/terraform'
      terraform-version: '1.9.5'
      terraform-logging: ${{ inputs.enable-logging && 'debug' || 'off' }}
      runner-labels: 'uhg-runner,large-runner'
      terraform-provider-network-mirror: 'https://repo1.uhc.com/artifactory/api/terraform/terraform-virtual/providers/'
    secrets:
      GH_TOKEN: ${{ secrets.GH_TOKEN }}
Example 6: Multi-Environment Lock Resolution with Conditional Logic
name: Emergency State Unlock
on:
  workflow_dispatch:
    inputs:
      environment:
        description: 'Environment to unlock'
        required: true
        type: choice
        options:
          - dev
          - qa
          - stage
          - prod
      lock-id:
        description: 'Lock ID (from error message)'
        required: true
      confirm-unlock:
        description: 'Type "CONFIRM" to proceed'
        required: true
permissions:
  actions: read
  contents: write
  id-token: write
  pull-requests: write
  security-events: write
jobs:
  validate-confirmation:
    runs-on: uhg-runner
    outputs:
      confirmed: ${{ steps.check.outputs.confirmed }}
    steps:
      - name: Validate Confirmation
        id: check
        env:
          # Pass free-text input through an environment variable instead of
          # interpolating it into the script
          CONFIRM_UNLOCK: ${{ inputs.confirm-unlock }}
        run: |
          if [ "$CONFIRM_UNLOCK" != "CONFIRM" ]; then
            echo "Confirmation failed. You must type 'CONFIRM' to unlock state."
            exit 1
          fi
          echo "confirmed=true" >> "$GITHUB_OUTPUT"
  emergency-unlock:
    needs: validate-confirmation
    # runs-on and job-level environment are not allowed on a reusable-workflow
    # call; approval gates are defined by the environment input
    uses: dojo360/pipelines-workflows/.github/workflows/terraform-ops.yml@v2.0.0
    with:
      aide-id: '<change me>'
      cloud-type: 'awsOptum'
      domain: '<change me>'
      environment: ${{ inputs.environment }}
      lock-id: ${{ inputs.lock-id }}
      team-name: '<change me>'
      backend-type: 's3'
      terraform-directory: '.'
    secrets:
      GH_TOKEN: ${{ secrets.GH_TOKEN }}
  notify-team:
    needs: emergency-unlock
    runs-on: uhg-runner
    if: always()
    steps:
      - name: Notify Team
        env:
          LOCK_ID: ${{ inputs.lock-id }}
        run: |
          echo "State unlock operation completed for ${{ inputs.environment }}"
          echo "Lock ID: $LOCK_ID"
          echo "Status: ${{ needs.emergency-unlock.result }}"
How to Obtain Lock ID
From Terraform Error Messages
When Terraform operations fail due to state locks, the error message includes the lock ID:
Error: Error acquiring the state lock

Error message: ConditionalCheckFailedException: The conditional request failed
Lock Info:
  ID:        abc123-def456-ghi789-jkl012
  Path:      terraform.tfstate
  Operation: OperationTypePlan
  Who:       <user>@<hostname>
  Version:   1.9.2
  Created:   2026-01-16 10:30:00.000000 UTC
  Info:

Terraform acquires a state lock to protect the state from being written
by multiple users at the same time. Please resolve the issue above and try
again.
Copy the ID value from the error message: abc123-def456-ghi789-jkl012
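Copying the ID by hand is error-prone, so the extraction can be scripted. The sketch below assumes the error output follows the "Lock Info:" layout shown above; `extract_lock_id` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the first "ID:" value that follows a "Lock Info:"
# line in Terraform output read on stdin. Prints nothing if no lock is reported.
extract_lock_id() {
  awk '
    /Lock Info:/ { in_lock_info = 1; next }
    in_lock_info && $1 == "ID:" { print $2; exit }
  '
}
```

A typical use would be `terraform plan 2>&1 | extract_lock_id`, with the result passed as the lock-id workflow input.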
From AWS S3 Backend
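# List DynamoDB lock table entries
aws dynamodb scan \
  --table-name terraform-state-lock \
  --region us-east-1
# Query a specific lock item. The S3 backend keys it as <bucket>/<state-key>;
# the item whose LockID ends in -md5 holds the state checksum, not the lock.
aws dynamodb get-item \
  --table-name terraform-state-lock \
  --key '{"LockID": {"S": "<bucket>/<state-key>"}}' \
  --region us-east-1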
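With the S3 backend, the lock item in DynamoDB is normally keyed by the bucket name joined to the state key, and the lock ID sits inside the item's `Info` attribute as JSON. The helpers below sketch both steps under that assumption; `lock_item_key` and `lock_id_from_info` are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: build the DynamoDB key for the lock item of a state file.
# Assumes the S3 backend convention of "<bucket>/<state key>" as the LockID.
lock_item_key() {
  bucket="$1"
  state_key="$2"
  printf '{"LockID": {"S": "%s/%s"}}' "$bucket" "$state_key"
}

# Hypothetical helper: pull the "ID" field out of the lock item's Info JSON
# (read on stdin, unescaped) without requiring jq.
lock_id_from_info() {
  sed -n 's/.*"ID"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}
```

The two can be chained as `aws dynamodb get-item --table-name <table> --key "$(lock_item_key <bucket> <state-key>)" --query 'Item.Info.S' --output text | lock_id_from_info`. The `--query` and `--output text` options matter here because they return the Info string unescaped.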
From Azure Blob Storage
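# Check blob lease status
az storage blob show \
  --account-name <storage-account> \
  --container-name <container> \
  --name terraform.tfstate \
  --query "properties.lease"
# The output shows lease state and status, not the lease ID itself.
# The lease ID is the lock ID for Azure; take it from the Terraform error message.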
From GCS Backend
# Check object metadata
gsutil stat gs://<bucket>/terraform.tfstate
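# Look for lock metadata in object attributes. If the state object shows none,
# check for a separate .tflock object stored next to the state file.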
Best Practices
1. Verify Before Unlocking
Always verify that no Terraform operations are actively running before force unlocking:
- Check CI/CD pipelines for running jobs
- Confirm with team members that no one is actively deploying
- Review recent workflow runs in GitHub Actions
- Verify that the operation that created the lock has fully terminated
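Part of this checklist can be automated with the GitHub CLI. The sketch below assumes `gh` is installed and authenticated for the repository; `no_active_runs` and the `GH_BIN` override are hypothetical names used only for illustration.

```shell
#!/bin/sh
# GH_BIN is an illustrative override so the function can be exercised with a stub.
GH_BIN="${GH_BIN:-gh}"

# Hypothetical guard: succeed only when no workflow runs are in progress
# in the current repository. Returns 2 if the query itself fails.
no_active_runs() {
  active="$("$GH_BIN" run list --status in_progress --json databaseId --jq 'length')" || return 2
  [ "$active" -eq 0 ]
}
```

This only covers GitHub Actions runs in one repository. It cannot see a teammate running Terraform from a laptop or another pipeline, so the manual confirmation steps still apply.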
2. Use Confirmation Gates
Implement manual approval or confirmation steps:
environment: production # Requires manual approval
3. Document Lock Incidents
After unlocking state:
- Document the incident and root cause
- Update runbooks if the issue recurs
- Review infrastructure logs for failures
- Consider implementing retry logic in Terraform
4. Backend-Specific Considerations
AWS S3 + DynamoDB:
- Ensure DynamoDB table exists and has correct permissions
- Verify S3 bucket access permissions
- Check DynamoDB table capacity
Azure Blob Storage:
- Verify storage account access
- Check blob lease status before unlocking
- Review Storage Account firewall rules
GCS:
- Confirm bucket permissions
- Verify service account has storage.objects.update permission
5. Production Safety
For production environments:
- Always require manual approval via GitHub environments
- Implement confirmation inputs (e.g., typing "CONFIRM")
- Use validation jobs before unlock operation
- Notify teams via Slack/Teams after unlock operations
- Log all unlock operations for audit purposes
6. Automation Guidelines
- Schedule regular state cleanup if locks persist frequently
- Implement timeout policies for Terraform operations
- Use consistent backend configurations across environments
- Monitor state lock metrics to identify patterns
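One possible reading of a timeout policy is Terraform's `-lock-timeout` flag, which makes a command keep retrying the lock for a set duration instead of failing immediately. The wrapper below is a sketch; `plan_with_lock_timeout`, `LOCK_TIMEOUT`, and `TERRAFORM_BIN` are hypothetical names, not workflow inputs.

```shell
#!/bin/sh
# TERRAFORM_BIN is an illustrative override so the wrapper can be dry-run with `echo`.
TERRAFORM_BIN="${TERRAFORM_BIN:-terraform}"

# Hypothetical wrapper: run plan, waiting up to LOCK_TIMEOUT (default 5m) for a
# held lock to be released. Extra arguments are passed through unchanged.
plan_with_lock_timeout() {
  "$TERRAFORM_BIN" plan -input=false -lock-timeout="${LOCK_TIMEOUT:-5m}" "$@"
}
```

A lock timeout helps with short-lived contention between pipelines. It does not clear a stale lock left by a crashed run, which still needs a force unlock.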
7. Emergency Procedures
In case of critical production blocks:
- Verify lock ID from error message
- Confirm no active operations
- Use emergency unlock workflow with approval
- Validate state consistency after unlock
- Document incident for post-mortem
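For the state consistency step, one approach is to run a plan with `-detailed-exitcode`, where exit code 0 means no changes, 2 means pending changes, and 1 means an error. The sketch below maps those codes to a short message; `check_state_after_unlock` and `TERRAFORM_BIN` are hypothetical names.

```shell
#!/bin/sh
# TERRAFORM_BIN is an illustrative override so the check can be run against a stub.
TERRAFORM_BIN="${TERRAFORM_BIN:-terraform}"

# Hypothetical post-unlock check: run a plan and report what its exit code means.
# Returns the plan's own exit code so callers can branch on it.
check_state_after_unlock() {
  # Run the plan inside the `if` so shells using `set -e` do not abort on codes 1 or 2.
  if "$TERRAFORM_BIN" plan -input=false -detailed-exitcode >/dev/null; then
    rc=0
  else
    rc=$?
  fi
  case "$rc" in
    0) echo "clean: state matches configuration" ;;
    2) echo "drift: plan shows pending changes, review before applying" ;;
    *) echo "error: plan failed, state may still be locked or inconsistent" ;;
  esac
  return "$rc"
}
```

Pending changes after an unlock are not necessarily a problem, since the interrupted run may simply not have finished applying. They are a prompt to review the plan before the next apply.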
8. Lock ID Management
- Store lock IDs in incident tickets
- Maintain a log of all force unlocks
- Track lock patterns to identify infrastructure issues
- Correlate locks with deployment failures
Troubleshooting
Issue 1: Lock ID Not Found
Symptoms:
- Workflow completes but lock persists
- Error: "Lock ID does not match"
Solutions:
- Verify lock ID is copied correctly (no extra spaces)
- Check if lock has already been released
- Confirm backend type matches state file location
- Try retrieving lock ID again from error message
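Stray whitespace from copy and paste is easy to miss by eye. A small normalizer such as the sketch below can be applied before the ID is passed on; `normalize_lock_id` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: strip carriage returns and leading/trailing whitespace
# from a pasted lock ID, leaving the ID itself untouched.
normalize_lock_id() {
  printf '%s' "$1" | tr -d '\r' | sed 's/^[[:space:]]*//; s/[[:space:]]*$//'
}
```

For example, `lock_id="$(normalize_lock_id "$raw_lock_id")"` yields a value that is safe to compare against the ID in the error message.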
Issue 2: Permission Denied
Symptoms:
- Cannot access state backend
- OIDC authentication fails
Solutions:
- Verify cloud permissions for state management
- Check OIDC configuration for cloud-type
- Confirm IAM roles have state file access
- Review GitHub runner permissions
Issue 3: Backend Configuration Mismatch
Symptoms:
- Cannot find state file
- Backend initialization fails
Solutions:
- Verify backend-type matches actual backend
- Check terraform-directory path is correct
- Confirm remote state configuration in metadata
- Review backend configuration in Terraform code
Issue 4: TFE Workspace Errors
Symptoms:
- Cannot connect to Terraform Enterprise
- Workspace not found
Solutions:
- Verify tfe-hostname is correct
- Confirm tfe-organization and tfe-workspace names
- Check TFE API token permissions
- Review network connectivity to TFE
Issue 5: Multiple Environments Locked
Symptoms:
- Locks across multiple environments
- Cascading lock failures
Solutions:
- Identify root cause of initial failure
- Unlock environments in reverse deployment order (prod → stage → qa → dev)
- Review shared infrastructure dependencies
- Implement circuit breakers for multi-environment deployments
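When several environments are locked at once, the unlock order can be derived mechanically. The sketch below assumes the dev, qa, stage, prod path used elsewhere in this document; `reverse_unlock_order` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: given the locked environments as arguments (in any order),
# print them in reverse deployment order, one per line. Unknown names are ignored.
reverse_unlock_order() {
  for env in prod stage qa dev; do
    for locked in "$@"; do
      if [ "$locked" = "$env" ]; then
        echo "$env"
      fi
    done
  done
}
```

Each environment has its own lock ID, so a loop over this output would dispatch the unlock workflow once per environment with the matching ID.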
Related Workflows
- Infrastructure Deployment: Deploy infrastructure with Terraform
- Terraform Destroy: Safely tear down infrastructure
- Infrastructure Promotion: Multi-environment promotion
- Azure Infrastructure: Azure-specific infrastructure deployment
Support & Documentation
- Workflow Source: terraform-ops.yml
- Official Documentation: terraform-ops/index.md
- Sample Applications: pipelines-infrastructure-sample-apps
- CloudBricks Learning: Working with CloudBricks
- Supported Cloud Types: Cloud Types Guide
- Dojo360 Platform: dojo360.optum.com
Version: 1.0.0
Last Updated: January 16, 2026
Maintained By: Platform Engineering Team
Related Assets
github-workflows-dojo360-azure-infrastructure
Deploy Azure infrastructure using Terraform with PCAM vaulted access and native Azure authentication through Dojo360 Azure Infrastructure workflow
Owner: pcorazao
github-workflows-dojo360-container-cd
Deploy containerized applications to AWS ECS/Azure ACS using Dojo360 Container CD workflow with blue-green and rolling update strategies
Owner: pcorazao
github-workflows-dojo360-container-promotion
Multi-environment container deployment promotion through prescribed deployment paths with automated approval gates and E2E testing
Owner: pcorazao
github-workflows-dojo360-database
Automate database schema updates using Liquibase via the Dojo360 database workflow (with rollback and validation patterns)
Owner: pcorazao
github-workflows-dojo360-database-promotion
Promote Liquibase database changes across environments (dev→qa→cert→prod) with deployment-path validation and approval gates
Owner: pcorazao
github-workflows-dojo360-dockerfile-ci
Build and scan container images from a Dockerfile using Optum golden images and the recommended UHG reusable workflow
Owner: pcorazao

