Overview#
Built an automated IAM policy lifecycle management system for a multi-account AWS environment supporting 200+ engineers. The framework enforces least-privilege access, automates policy drift detection, and reduces manual IAM management overhead by 80%.
Problem#
The organization struggled with IAM at scale:
- Policy Drift: Manual policy changes across 50+ AWS accounts led to inconsistent access controls
- Over-Permissioned Roles: Lack of automated access reviews resulted in privilege creep
- Compliance Gaps: Unable to demonstrate least-privilege adherence for SOC 2 audit
- Operational Overhead: IAM teams spent 60% of time on manual policy reviews and updates
Solution#
Designed and implemented a centralized IAM automation framework with four key components:
- Policy-as-Code Repository: Centralized Terraform modules defining all IAM roles, policies, and permission boundaries
- Automated Drift Detection: AWS Config rules monitoring for manual IAM changes with auto-remediation
- Least-Privilege Analyzer: Python service analyzing CloudTrail logs to identify unused permissions and recommend policy tightening
- Multi-Account Deployment: CI/CD pipeline deploying IAM changes across AWS Organizations with approval gates
Architecture#
Core Components:
- Terraform Cloud for state management and multi-account deployment
- AWS Organizations for centralized account management
- AWS Config for compliance monitoring and drift detection
- Lambda functions for automated remediation
- DynamoDB for tracking policy versions and approvals
- SNS for alerting on policy violations
Data Flow:
- Engineers submit IAM policy changes via GitLab MR
- Terraform Cloud runs plan showing impact across all accounts
- Security team reviews and approves changes
- CI/CD pipeline deploys to non-prod → staging → production
- AWS Config monitors for drift and triggers auto-remediation
- CloudTrail logs feed into analyzer for least-privilege recommendations
- Monthly reports generated showing unused permissions per role
Security Controls#
IAM & Access:
- Service Control Policies (SCPs) preventing manual IAM modifications
- Permission boundaries on all developer roles
- Cross-account roles with MFA enforcement
- Automated credential rotation for service accounts
Logging & Monitoring:
- CloudTrail enabled on all accounts with log file validation
- AWS Config recording all IAM resource changes
- CloudWatch alarms for privilege escalation attempts
- Security Hub aggregating findings across accounts
Network:
- VPC endpoints for AWS Config/CloudTrail to avoid internet egress
- Private subnets for Lambda remediation functions
Secrets:
- Terraform Cloud workspace variables for credentials
- AWS Secrets Manager for API keys with rotation
- No hardcoded credentials in code repository
CI/CD:
- Branch protection requiring security team approval
- Automated policy linting (cfn-nag, checkov)
- Terraform plan must pass security checks before apply
- Rollback procedures for failed deployments
Monitoring:
- Real-time alerting on policy changes
- Dashboard showing policy compliance scores
- Automated weekly access review reports
Results#
Measurable Outcomes:
- ✅ 80% reduction in manual IAM management time (example metric: from 30 hours/week to 6 hours/week)
- ✅ 95% policy compliance rate across all accounts (up from 62%)
- ✅ Zero manual IAM changes detected in production for 6 months
- ✅ 40% reduction in over-permissioned roles (identified 1,200 unused permissions)
- ✅ SOC 2 audit pass with zero IAM-related findings
- ✅ 3-hour average policy deployment time (down from 2 weeks)
Business Impact:
- Enabled rapid onboarding of new services while maintaining security posture
- Reduced compliance audit preparation time by 70%
- Improved engineer satisfaction (self-service IAM requests via GitLab)
Lessons Learned#
What Worked Well:
- Starting with a pilot account before multi-account rollout reduced risk
- Automated testing caught 90% of policy errors before production
- Permission boundaries prevented accidental privilege escalation
- Weekly reports created visibility and drove adoption
Challenges:
- Legacy applications required custom policy migration paths
- Change management resistance from teams used to manual IAM
- Initial CloudTrail log analysis required tuning to reduce noise
- Third-party integrations needed special handling for cross-account roles
Would Do Differently:
- Implement gradual rollout per team vs. big-bang migration
- Build self-service portal earlier to improve developer experience
- Add more granular RBAC for policy approval workflows
- Include cost analysis for IAM API calls in initial design
Tech Stack#
- IaC: Terraform, Terragrunt
- Cloud: AWS (IAM, Organizations, Config, CloudTrail, Lambda, DynamoDB, SNS, CloudWatch)
- Languages: Python (boto3), Bash
- CI/CD: GitLab CI, Terraform Cloud
- Security: AWS Security Hub, cfn-nag, checkov, tfsec
- Monitoring: Datadog, CloudWatch Dashboards
Links#
- GitHub Repository (public modules and documentation)
- Technical Blog Post (implementation patterns)