Terraform, ECS Fargate & RDS: Modernize Your AWS Infra

Hey guys! Today, we're diving deep into a super important topic for anyone running applications on AWS: moving your infrastructure to Infrastructure as Code (IaC) using Terraform, and leveraging powerful managed services like ECS Fargate and RDS. If you're still manually clicking around in the AWS console or managing your own EC2 instances, you're in for a treat. We're going to ditch the manual hassle and embrace automation, scalability, and reliability. This isn't just about upgrading; it's about building a future-proof foundation for your applications. Let's get this party started!

Why Bother Migrating? The Pain of Manual Infrastructure

So, why are we even talking about this? Let's be real, managing infrastructure manually, especially with EC2 instances, is a pain. Think about it: you're spending ages setting up servers, configuring them, making sure they're patched, and then crossing your fingers that everything stays up. The current manual EC2 setup often leads to a bunch of headaches we're all too familiar with. First off, manual server management is a time sink. Every time you need to scale, update, or fix something, it's a hands-on process. And when it comes to automatic scaling, forget about it. Your app might get slammed with traffic, and your EC2 setup won't magically add more servers to handle the load. This often leads to a single point of failure. If that one server goes down, your whole application is toast. Plus, you're stuck with the joy of OS maintenance – patching, updates, security fixes – all on you. High availability and backups? Yep, that's manual too, which is prone to human error and downtime. And the biggest kicker? Your infrastructure isn't versioned. You can't easily go back to a previous state, track changes, or collaborate effectively because there's no code to review.

The Awesome Benefits of IaC with ECS Fargate and RDS

Now, let's talk about the shiny new world we're moving into. By adopting Infrastructure as Code (IaC) with Terraform, combined with ECS Fargate and RDS, we unlock a treasure trove of benefits. First and foremost, you get infrastructure as code. This means your entire setup is versioned, auditable, and perfectly reproducible. Imagine deploying the exact same environment across development, staging, and production with a single command! Automatic scalability is a game-changer, especially with ECS Fargate. It handles scaling your containers up and down based on demand, so you don't have to. Say goodbye to downtime worries because high availability is built-in, often with multi-AZ deployments. Your database is now managed with RDS, meaning automatic backups, patching, and failover. The best part? With ECS Fargate, you're dealing with serverless containers. No more server management! You pay only for what you use, leading to optimized costs. And finally, deploying new versions of your application becomes a breeze with zero-downtime deployments. It's a complete paradigm shift, folks!

Our Proposed Architecture: A Bird's-Eye View

Let's visualize the slick new setup we're aiming for. At the top, we have the Internet, hitting our Route 53 for DNS resolution. Optionally, we can add CloudFront for content delivery network (CDN) magic to speed things up. Then, the traffic flows into an Application Load Balancer (ALB), which intelligently distributes requests. From the ALB, traffic is directed to our ECS Services running on Fargate. We'll have separate services, perhaps one for the API (astro/api) and another for the web frontend (astro/web), both running as serverless containers. For asynchronous tasks, we'll have a celery-worker service. These containers can leverage ElastiCache (Redis) for caching and connect to our managed PostgreSQL database running on RDS. The entire network is defined within a VPC, with a clear subnet strategy: public subnets for the ALB, private subnets for the ECS tasks, and dedicated database subnets for RDS. This tiered approach ensures security and proper traffic flow. We're talking about a well-defined VPC CIDR block (10.0.0.0/16), split across multiple Availability Zones (AZs) for high availability. This architecture is designed for scalability, resilience, and ease of management, all driven by code.

┌───────────────────────────────────────────────────────────────┐
│                           Internet                             │
└──────────────────────┬────────────────────────────────────────┘
                       │
                ┌──────▼──────┐
                │  Route 53   │ (DNS)
                └──────┬──────┘
                       │
                ┌──────▼──────┐
                │ CloudFront  │ (CDN - optional)
                └──────┬──────┘
                       │
           ┌───────────▼───────────┐
           │  Application Load     │
           │  Balancer (ALB)       │
           └───────────┬───────────┘
                       │
           ┌───────────┴────────────┐
           │                        │
      ┌────▼────┐              ┌────▼────┐
      │   ECS   │              │   ECS   │
      │ Service │              │ Service │
      │   API   │              │   Web   │
      │(Fargate)│              │(Fargate)│
      └────┬────┘              └────┬────┘
           │                        │
           │    ┌───────────┐       │
           ├───►│   Redis   │◄──────┘
           │    │ElastiCache│
           │    └───────────┘
           │
           │    ┌──────────┐
           └───►│   RDS    │
                │PostgreSQL│
                │ Multi-AZ │
                └──────────┘

VPC: 10.0.0.0/16
├── Public Subnets (2 AZs)
│   ├── 10.0.1.0/24 (us-east-1a)
│   └── 10.0.2.0/24 (us-east-1b)
├── Private Subnets (2 AZs)
│   ├── 10.0.10.0/24 (us-east-1a)
│   └── 10.0.20.0/24 (us-east-1b)
└── Database Subnets (2 AZs)
    ├── 10.0.100.0/24 (us-east-1a)
    └── 10.0.200.0/24 (us-east-1b)

Terraform Project Structure: Organized for Success

To manage this complexity, a well-defined Terraform project structure is crucial. We're adopting a modular approach, separating concerns into reusable components. At the top level, we have environments/ where each subdirectory (dev, staging, production) holds the specific configuration for that environment, including main.tf, environment-specific variables (terraform.tfvars), and backend configuration (backend.tf) for storing Terraform state remotely (usually in an S3 bucket). This keeps our configurations DRY (Don't Repeat Yourself) and manageable.
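
For reference, here's a minimal sketch of what one of those backend.tf files might contain (the bucket and DynamoDB table names are placeholders, not the project's actual values):

terraform {
  backend "s3" {
    bucket         = "astro-terraform-state"        # hypothetical state bucket
    key            = "production/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-lock"         # hypothetical table used for state locking
    encrypt        = true
  }
}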

Beneath environments/, we have the modules/ directory. This is where the magic of reusability happens. We've broken down our infrastructure into logical modules:

  • networking/: Handles everything VPC-related – VPC itself, subnets (public, private, database), Internet Gateway, NAT Gateways, route tables, and security groups.
  • ecs/: Manages the ECS cluster, Fargate services, task definitions, and auto-scaling configurations. This is where our containers live.
  • rds/: Provisions and configures our managed PostgreSQL database instances, including Multi-AZ deployments, backups, and storage settings.
  • elasticache/: Sets up our Redis cache clusters for faster data retrieval.
  • alb/: Configures the Application Load Balancer, including listeners, rules, and target groups for routing traffic.
  • ecr/: Creates and manages Amazon Elastic Container Registry (ECR) repositories for storing our Docker images.
  • secrets/: Integrates with AWS Secrets Manager to securely store sensitive information like database credentials and API keys.
  • cloudwatch/: Sets up logging, metrics, and alarms for monitoring our infrastructure's health and performance.

Finally, we have a scripts/ directory for handy utility scripts like deploy.sh, destroy.sh, and plan.sh to streamline common Terraform operations. This organized structure not only makes our Terraform code maintainable and scalable but also promotes collaboration among team members.

terraform/
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── terraform.tfvars
│   │   └── backend.tf
│   ├── staging/
│   │   ├── main.tf
│   │   ├── terraform.tfvars
│   │   └── backend.tf
│   └── production/
│       ├── main.tf
│       ├── terraform.tfvars
│       └── backend.tf
├── modules/
│   ├── networking/
│   │   ├── main.tf        # VPC, Subnets, IGW, NAT
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── ecs/
│   │   ├── main.tf        # ECS Cluster, Services, Tasks
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── rds/
│   │   ├── main.tf        # RDS PostgreSQL
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── elasticache/
│   │   ├── main.tf        # Redis cluster
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── alb/
│   │   ├── main.tf        # Application Load Balancer
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── ecr/
│   │   ├── main.tf        # Container Registry
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── secrets/
│   │   ├── main.tf        # Secrets Manager
│   │   ├── variables.tf
│   │   └── outputs.tf
│   └── cloudwatch/
│       ├── main.tf        # Logs, Metrics, Alarms
│       ├── variables.tf
│       └── outputs.tf
└── scripts/
    ├── deploy.sh
    ├── destroy.sh
    └── plan.sh

Key Infrastructure Components: The Building Blocks

Let's break down the essential pieces of our infrastructure, guys. Each component plays a vital role in delivering a scalable, reliable, and secure application. We're using Terraform modules to define and manage these resources, ensuring consistency and reusability across different environments.

1. Networking (VPC): The Foundation

Our entire AWS infrastructure will reside within a Virtual Private Cloud (VPC). This provides a logically isolated network space. We'll define a clear IP addressing scheme, like 10.0.0.0/16, and divide it into specific subnets. We need public subnets for resources that need direct internet access, such as our Application Load Balancer (ALB). Then, we have private subnets where our ECS Fargate tasks will run – these shouldn't be directly accessible from the internet. Finally, database subnets are designated for our RDS instance, ensuring it's isolated and secure. We'll configure an Internet Gateway for the VPC, NAT Gateways to allow instances in private subnets to access the internet for outbound traffic (like pulling dependencies), and robust Route Tables to control traffic flow. Security Groups will act as virtual firewalls, controlling inbound and outbound traffic to our resources. The networking module handles all of this, making it easy to spin up a secure and well-structured network.

module "vpc" {
  source = "./modules/networking"
  
  project_name = "astro-natal-chart"
  environment  = var.environment
  
  vpc_cidr             = "10.0.0.0/16"
  availability_zones   = ["us-east-1a", "us-east-1b"]
  public_subnet_cidrs  = ["10.0.1.0/24", "10.0.2.0/24"]
  private_subnet_cidrs = ["10.0.10.0/24", "10.0.20.0/24"]
  database_subnet_cidrs = ["10.0.100.0/24", "10.0.200.0/24"]
  
  enable_nat_gateway = true
  single_nat_gateway = var.environment == "dev" ? true : false
}

Resources created:

  • VPC with CIDR 10.0.0.0/16
  • 2 Public Subnets (for ALB)
  • 2 Private Subnets (for ECS tasks)
  • 2 Database Subnets (for RDS)
  • Internet Gateway
  • NAT Gateway(s)
  • Route Tables
  • Security Groups
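
To make the "virtual firewall" idea concrete, here's a rough sketch of what a security group for the ECS tasks inside the networking module could look like (resource names and the port are illustrative assumptions, not the module's actual contents):

resource "aws_security_group" "ecs_tasks" {
  name_prefix = "astro-ecs-tasks-"
  vpc_id      = aws_vpc.main.id   # assumes the module names its VPC resource "main"

  # Only the ALB may reach the tasks, and only on the container port
  ingress {
    from_port       = 8000
    to_port         = 8000
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]   # assumes an ALB security group in the same module
  }

  # Allow all outbound traffic (image pulls via NAT, RDS, Redis, external APIs)
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}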

2. ECR (Container Registry): Your Image Hub

Before we can run our applications in containers, we need a place to store those container images. That's where Amazon Elastic Container Registry (ECR) comes in. Terraform will set up ECR repositories for each of our services (e.g., astro/api, astro/web, astro/celery-worker). We'll configure image tag mutability and enable scan on push for security vulnerability detection. We can also implement lifecycle policies to automatically clean up old, unused images, keeping our registry tidy and cost-effective. This module ensures our Docker images are securely stored and accessible by ECS.

module "ecr" {
  source = "./modules/ecr"
  
  repositories = [
    "astro/api",
    "astro/web",
    "astro/celery-worker"
  ]
  
  image_tag_mutability = "MUTABLE"
  scan_on_push         = true
  lifecycle_policy     = {
    keep_last_n_images = 10
  }
}

3. RDS PostgreSQL: Managed Database Power

Moving our database to Amazon RDS (Relational Database Service) is a massive win. We're opting for PostgreSQL, a robust and popular open-source database. Terraform will provision a managed PostgreSQL instance. For production environments, we'll enable Multi-AZ deployments for high availability and automatic failover. We'll configure automatic backups with a retention period (e.g., 7 days for production) and set maintenance windows for applying patches with minimal disruption. Encryption at rest will be enabled to protect sensitive data. We'll also configure storage auto-scaling so the database can grow as needed. The instance will be placed in our private database subnets and associated with a strict security group. Performance Insights can also be enabled for better database performance monitoring. This takes a huge burden off our shoulders compared to managing self-hosted databases on EC2.

module "rds" {
  source = "./modules/rds"
  
  identifier     = "astro-postgres-${var.environment}"
  engine_version = "16.1"
  instance_class = var.environment == "production" ? "db.t4g.medium" : "db.t4g.micro"
  
  allocated_storage     = 20
  max_allocated_storage = 100
  storage_encrypted     = true
  
  database_name = "astro_${var.environment}"
  master_username = "astro_admin"
  
  multi_az               = var.environment == "production" ? true : false
  backup_retention_period = var.environment == "production" ? 7 : 1
  backup_window          = "03:00-04:00"
  maintenance_window     = "sun:04:00-sun:05:00"
  
  deletion_protection = var.environment == "production" ? true : false
  skip_final_snapshot = var.environment != "production"
  
  vpc_security_group_ids = [module.vpc.rds_security_group_id]
  db_subnet_group_name   = module.vpc.database_subnet_group_name
}

Configuration:

  • PostgreSQL 16
  • Multi-AZ for production (HA)
  • Automatic backups (7 days in production)
  • Encryption at rest
  • Storage auto-scaling
  • Performance Insights enabled

4. ElastiCache Redis: Blazing Fast Caching

To improve application performance and reduce load on our database, we'll implement Amazon ElastiCache for Redis. Redis is an incredibly fast, in-memory data store often used for caching, session management, and real-time use cases. Terraform will provision a Redis cluster, configured with the appropriate node type and number of nodes. For production, we'll enable automatic failover and multi-AZ deployments to ensure high availability for our cache. The Redis cluster will be placed within our private network and secured with a dedicated security group. This module allows us to easily add a caching layer to our architecture.

module "redis" {
  source = "./modules/elasticache"
  
  cluster_id           = "astro-redis-${var.environment}"
  engine_version       = "7.0"
  node_type            = var.environment == "production" ? "cache.t4g.medium" : "cache.t4g.micro"
  num_cache_nodes      = var.environment == "production" ? 2 : 1
  
  parameter_group_family = "redis7"
  port                   = 6379
  
  subnet_group_name      = module.vpc.elasticache_subnet_group_name
  security_group_ids     = [module.vpc.redis_security_group_id]
  
  automatic_failover_enabled = var.environment == "production"
  multi_az_enabled           = var.environment == "production"
}

5. ECS Fargate Cluster: Serverless Containers

This is where our applications will actually run! Amazon Elastic Container Service (ECS) with Fargate is our chosen platform. Fargate is a serverless compute engine for containers, meaning we don't have to manage underlying EC2 instances. Terraform will set up the ECS Cluster and define our services (api, web, celery-worker). For each service, we'll specify the Docker image from ECR, CPU and memory requirements, desired number of tasks, and importantly, environment variables and secrets. We'll integrate with AWS Secrets Manager to securely inject sensitive data like database credentials into our containers. Autoscaling is configured here, defining minimum and maximum capacities and target utilization metrics (CPU and memory) to automatically adjust the number of running tasks. This module is the heart of our containerized deployment.

module "ecs" {
  source = "./modules/ecs"
  
  cluster_name = "astro-cluster-${var.environment}"
  
  services = {
    api = {
      name             = "astro-api"
      image            = "${module.ecr.repository_urls["astro/api"]}:latest"
      cpu              = var.environment == "production" ? 512 : 256
      memory           = var.environment == "production" ? 1024 : 512
      desired_count    = var.environment == "production" ? 2 : 1
      container_port   = 8000
      health_check_path = "/health"
      
      environment_variables = {
        ENVIRONMENT = var.environment
      }
      
      secrets = {
        DATABASE_URL    = "${module.secrets.secret_arns["database_url"]}"
        SECRET_KEY      = "${module.secrets.secret_arns["secret_key"]}"
        REDIS_URL       = "${module.secrets.secret_arns["redis_url"]}"
        GOOGLE_CLIENT_ID = "${module.secrets.secret_arns["google_client_id"]}"
        # ... other secrets
      }
      
      autoscaling = {
        min_capacity = var.environment == "production" ? 2 : 1
        max_capacity = var.environment == "production" ? 10 : 2
        target_cpu_utilization    = 70
        target_memory_utilization = 80
      }
    }
    
    web = {
      name             = "astro-web"
      image            = "${module.ecr.repository_urls["astro/web"]}:latest"
      cpu              = 256
      memory           = 512
      desired_count    = var.environment == "production" ? 2 : 1
      container_port   = 80
      health_check_path = "/"
    }
    
    celery = {
      name             = "astro-celery-worker"
      image            = "${module.ecr.repository_urls["astro/celery-worker"]}:latest"
      cpu              = 256
      memory           = 512
      desired_count    = 1
      # No load balancer (asynchronous worker)
    }
  }
  
  vpc_id              = module.vpc.vpc_id
  private_subnet_ids  = module.vpc.private_subnet_ids
  alb_target_group_arns = module.alb.target_group_arns
}

6. Application Load Balancer: Traffic Director

Our Application Load Balancer (ALB) acts as the single entry point for external traffic. Terraform will configure the ALB within our public subnets, associate it with our VPC, and set up listeners for HTTP and HTTPS. Crucially, we'll configure SSL/TLS termination using an ACM certificate for secure HTTPS connections. The ALB will have target groups pointing to our ECS services (API and Web). Listener rules will define how traffic is routed based on path patterns (e.g., /api/* goes to the API service, / goes to the web service). We'll also set up a rule to redirect HTTP traffic to HTTPS for enhanced security. Health checks are configured for each target group to ensure the ALB only sends traffic to healthy instances.

module "alb" {
  source = "./modules/alb"
  
  name               = "astro-alb-${var.environment}"
  vpc_id             = module.vpc.vpc_id
  public_subnet_ids  = module.vpc.public_subnet_ids
  
  # SSL certificate for HTTPS
  certificate_arn    = var.acm_certificate_arn
  
  target_groups = {
    api = {
      port              = 8000
      protocol          = "HTTP"
      health_check_path = "/health"
      health_check_interval = 30
      health_check_timeout  = 5
      healthy_threshold     = 2
      unhealthy_threshold   = 3
    }
    
    web = {
      port              = 80
      protocol          = "HTTP"
      health_check_path = "/"
    }
  }
  
  listeners = [
    {
      port     = 443
      protocol = "HTTPS"
      default_action = "forward to web"
      
      rules = [
        {
          path_pattern = "/api/*"
          target_group = "api"
        },
        {
          path_pattern = "/docs*"
          target_group = "api"
        }
      ]
    },
    {
      port     = 80
      protocol = "HTTP"
      default_action = "redirect to HTTPS"
    }
  ]
}

7. Secrets Manager: Secure Credential Handling

Storing sensitive information like database passwords, API keys, and secret keys directly in your Terraform code or environment variables is a big no-no. AWS Secrets Manager provides a secure and centralized way to manage these secrets. Terraform will create secrets within Secrets Manager and store them securely. Our ECS tasks will then reference these secrets via their ARNs, and AWS will inject them as environment variables into the running containers. This ensures that sensitive data is never exposed in code repositories or logs, significantly enhancing our security posture.

module "secrets" {
  source = "./modules/secrets"
  
  secrets = {
    database_url = {
      description = "PostgreSQL connection string"
      secret_string = "postgresql+asyncpg://${module.rds.master_username}:${var.db_password}@${module.rds.endpoint}/${module.rds.database_name}"
    }
    
    redis_url = {
      description = "Redis connection string"
      secret_string = "redis://${module.redis.primary_endpoint_address}:6379/0"
    }
    
    secret_key = {
      description = "JWT secret key"
      secret_string = var.secret_key
    }
    
    google_client_secret = {
      description = "Google OAuth client secret"
      secret_string = var.google_client_secret
    }
    
    # ... other secrets
  }
}

8. CloudWatch Monitoring: Keeping an Eye on Things

What good is a robust infrastructure if you don't know if it's healthy? AWS CloudWatch is our go-to for monitoring. Terraform will configure log groups for our ECS services, ensuring that application logs are captured and retained for a specified period (longer for production). We'll also define alarms based on key metrics like CPU utilization, memory usage, database connections, and ALB request counts (e.g., 5xx errors). These alarms can trigger actions, such as sending notifications to an SNS topic, allowing us to proactively respond to issues before they impact users.

module "cloudwatch" {
  source = "./modules/cloudwatch"
  
  log_groups = [
    "/ecs/astro-api",
    "/ecs/astro-web",
    "/ecs/astro-celery-worker"
  ]
  
  retention_in_days = var.environment == "production" ? 30 : 7
  
  alarms = {
    api_high_cpu = {
      metric_name         = "CPUUtilization"
      comparison_operator = "GreaterThanThreshold"
      threshold           = 80
      evaluation_periods  = 2
      alarm_actions       = [var.sns_topic_arn]
    }
    
    rds_high_connections = {
      metric_name         = "DatabaseConnections"
      comparison_operator = "GreaterThanThreshold"
      threshold           = 80
      evaluation_periods  = 2
    }
    
    alb_5xx_errors = {
      metric_name         = "HTTPCode_Target_5XX_Count"
      comparison_operator = "GreaterThanThreshold"
      threshold           = 10
      evaluation_periods  = 1
    }
  }
}

Automating Everything: GitHub Actions Workflows

To truly embrace IaC and DevOps, we need automation. GitHub Actions will be our CI/CD engine. We'll set up several workflows:

1. Terraform Plan Workflow (terraform-plan.yml)

This workflow runs automatically on pull requests targeting our Terraform code. It performs terraform init and terraform plan, generating an execution plan. The plan output is then commented directly on the pull request, allowing developers to review the proposed infrastructure changes before they are merged. This is a crucial step for preventing unintended changes and ensuring code quality.

name: Terraform Plan

on: 
  pull_request:
    paths:
      - 'terraform/**'

jobs:
  plan:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        environment: [dev, staging, production]
    steps:
      - uses: actions/checkout@v4
      
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.6.0
      
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1
      
      - name: Terraform Init
        run: |
          cd terraform/environments/${{ matrix.environment }}
          terraform init
        
      - name: Terraform Plan
        id: plan
        run: |
          cd terraform/environments/${{ matrix.environment }}
          terraform plan -no-color -out=tfplan
      
      - name: Comment PR
        uses: actions/github-script@v7
        env:
          PLAN: ${{ steps.plan.outputs.stdout }}
        with:
          script: |
            const output = `#### Terraform Plan 📖
            \`\`\`
            ${process.env.PLAN}
            \`\`\`
            `;
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: output
            })

2. Deploy Infrastructure Workflow (deploy-infrastructure.yml)

This workflow handles the actual deployment of our infrastructure. It can be triggered manually (workflow_dispatch) or on a push to the main branch. It performs terraform init and then terraform apply -auto-approve for the selected environment (dev, staging, or production). This ensures that our infrastructure is consistently provisioned and managed through code.

name: Deploy Infrastructure

on:
  push:
    branches: [main]
    paths:
      - 'terraform/**'
  workflow_dispatch:
    inputs:
      environment:
        description: 'Environment to deploy'
        required: true
        type: choice
        options:
          - dev
          - staging
          - production

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
      
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1
      
      - name: Terraform Apply
        run: |
          cd terraform/environments/${{ inputs.environment || 'dev' }}
          terraform init
          terraform apply -auto-approve

3. Deploy Application Workflow (deploy-application.yml)

This workflow focuses on deploying our application code. It consists of two main jobs:

  • build-and-push: Checks out the code, logs into ECR, builds the Docker images for our services (API, Web), tags them with the Git commit SHA and latest, and pushes them to ECR.
  • deploy: After the images are pushed, this job triggers an ECS service update using the AWS CLI. It forces a new deployment for the astro-api and astro-web services, causing ECS to pull the new images and deploy them. A wait step ensures the deployment completes successfully before the workflow finishes. This workflow automates the entire application deployment pipeline.

name: Deploy Application

on:
  push:
    branches: [main]
  workflow_dispatch:

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1
      
      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v2
      
      - name: Build and push API image
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
          ECR_REPOSITORY: astro/api
          IMAGE_TAG: ${{ github.sha }}
        run: |
          docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG -t $ECR_REGISTRY/$ECR_REPOSITORY:latest -f apps/api/Dockerfile apps/api
          docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG
          docker push $ECR_REGISTRY/$ECR_REPOSITORY:latest
      
      - name: Build and push Web image
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
          ECR_REPOSITORY: astro/web
          IMAGE_TAG: ${{ github.sha }}
        run: |
          docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG -t $ECR_REGISTRY/$ECR_REPOSITORY:latest -f apps/web/Dockerfile apps/web
          docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG
          docker push $ECR_REGISTRY/$ECR_REPOSITORY:latest
  
  deploy:
    needs: build-and-push
    runs-on: ubuntu-latest
    steps:
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1

      - name: Deploy to ECS
        run: |
          aws ecs update-service \
            --cluster astro-cluster-production \
            --service astro-api \
            --force-new-deployment
          
          aws ecs update-service \
            --cluster astro-cluster-production \
            --service astro-web \
            --force-new-deployment
      
      - name: Wait for deployment
        run: |
          aws ecs wait services-stable \
            --cluster astro-cluster-production \
            --services astro-api astro-web

Estimated Costs: Budgeting for Your Infrastructure

Let's talk about the elephant in the room: cost. It's essential to have a realistic estimate of what this new infrastructure will cost. We've broken down the estimated monthly costs for both production and development/staging environments.

Production (High Availability Focus)

For the production environment, where high availability and performance are critical, the costs are higher but justifiable. This includes:

  • ECS Fargate (API & Web): Running multiple tasks across different AZs for redundancy. ~$30/month for API (2 tasks @ 0.5 vCPU, 1GB) + ~$15/month for Web (2 tasks @ 0.25 vCPU, 0.5GB) = ~$45/month.
  • RDS PostgreSQL (db.t4g.medium Multi-AZ): A robust, highly available database instance. ~$85/month.
  • ElastiCache Redis (cache.t4g.medium): In-memory caching for performance. ~$35/month.
  • ALB: The load balancer itself. ~$20/month.
  • NAT Gateway (2x): Required for outbound internet access from private subnets in multiple AZs. ~$65/month.
  • Data Transfer: Estimated traffic costs. ~$10/month.
  • CloudWatch Logs & Metrics: For monitoring and logging. ~$5/month.
  • Secrets Manager: Securely storing secrets. ~$5/month.

Total Estimated Production Cost: ~$270/month
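
As a rough sanity check on the Fargate line items (assuming the published us-east-1 on-demand rates at the time of writing of about $0.04048 per vCPU-hour and $0.004445 per GB-hour, and roughly 730 hours in a month), a service's monthly cost works out to approximately:

tasks × (vCPU × $0.04048 + GB × $0.004445) × 730 hours

For the API service that gives 2 × (0.5 × 0.04048 + 1 × 0.004445) × 730 ≈ $36/month, in the same ballpark as the estimate above. Always verify against the current AWS pricing pages before committing to a budget.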

Dev/Staging (Cost-Optimized)

For development and staging environments, we can significantly reduce costs by using smaller instance types, fewer tasks, and single-AZ deployments where appropriate.

  • ECS Fargate (API & Web): Single tasks, smaller resources. ~$8/month for API (1 task @ 0.25 vCPU, 0.5GB) + ~$8/month for Web (1 task @ 0.25 vCPU, 0.5GB) = ~$16/month.
  • RDS PostgreSQL (db.t4g.micro Single-AZ): A smaller, non-HA database instance. ~$15/month.
  • ElastiCache Redis (cache.t4g.micro): A smaller cache instance. ~$12/month.
  • ALB: Still needed for traffic management. ~$20/month.
  • NAT Gateway (1x): A single NAT Gateway is usually sufficient for non-production. ~$32/month.

Total Estimated Dev/Staging Cost: ~$95/month

Note: These are estimates and actual costs may vary based on usage, specific configurations, and AWS pricing changes. Always check the latest AWS pricing.

EC2 vs. ECS Fargate: A Quick Comparison

To put things in perspective, let's compare our new approach with the old manual EC2 setup.

Feature                  | EC2 (Manual)                   | ECS Fargate (IaC)          | Winner (for most apps)
-------------------------|--------------------------------|----------------------------|-----------------------
Cost/Month               | $40-50 (can vary wildly)       | $95 (dev) / $270 (prod)    | EC2 (initially)
Management               | Manual (OS, patching, scaling) | Fully managed (serverless) | ECS Fargate
Scalability              | Manual, slow, error-prone      | Automatic, rapid, reliable | ECS Fargate
High Availability        | Manual setup required, complex | Built-in (Multi-AZ)        | ECS Fargate
Database Backup          | Manual, risky                  | Automatic, reliable        | ECS Fargate
Infrastructure as Code   | No (or difficult to implement) | Yes (Terraform)            | ECS Fargate
Zero-Downtime Deploy     | No (or very complex)           | Yes                        | ECS Fargate
Complexity               | Low (initially)                | Medium-High (setup phase)  | EC2 (initial setup)

While EC2 might seem cheaper initially for very simple setups, the operational overhead, lack of scalability, and manual effort quickly outweigh the cost savings. ECS Fargate, powered by IaC, offers a superior long-term solution for most modern applications.

Implementation Tasks: Your Roadmap to Success

Migrating to this new architecture involves several steps. We've outlined a phased approach to make it manageable:

Phase 1: Initial Setup

  • [ ] Create AWS Account (if needed)
  • [ ] Configure AWS CLI locally
  • [ ] Create S3 bucket for Terraform state
  • [ ] Create DynamoDB table for state locking (a sketch of both state resources follows this list)
  • [ ] Configure GitHub Secrets (AWS credentials)
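
Here's a minimal sketch of those state bootstrap resources (bucket and table names are placeholders; in practice you'd create these once, by hand or from a tiny separate bootstrap configuration, before pointing backend.tf at them):

resource "aws_s3_bucket" "terraform_state" {
  bucket = "astro-terraform-state"   # hypothetical bucket name

  lifecycle {
    prevent_destroy = true           # protect the state bucket from accidental deletion
  }
}

resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = "Enabled"               # keep old state versions for recovery
  }
}

resource "aws_dynamodb_table" "terraform_lock" {
  name         = "terraform-state-lock"   # hypothetical lock table name
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"                 # key the S3 backend uses for its lock items

  attribute {
    name = "LockID"
    type = "S"
  }
}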

Phase 2: Terraform Core Modules

  • [ ] Set up Terraform project structure
  • [ ] Implement the networking module (VPC, subnets, etc.)
  • [ ] Implement the rds module
  • [ ] Implement the elasticache module
  • [ ] Implement the ecr module
  • [ ] Implement the ecs module
  • [ ] Implement the alb module
  • [ ] Implement the secrets module
  • [ ] Implement the cloudwatch module

Phase 3: Environment Configurations

  • [ ] Create configuration for dev environment
  • [ ] Create configuration for staging environment
  • [ ] Create configuration for production environment
  • [ ] Test provisioning in the dev environment

Phase 4: CI/CD Pipelines

  • [ ] Create Terraform Plan workflow (for PRs)
  • [ ] Create Terraform Apply workflow (for deployments)
  • [ ] Create Build & Push ECR workflow
  • [ ] Create Deploy ECS workflow
  • [ ] Test the complete pipeline

Phase 5: Security & Compliance

  • [ ] Configure SSL/TLS (ACM Certificate)
  • [ ] Implement WAF (optional, for advanced security)
  • [ ] Configure restrictive Security Groups
  • [ ] Enable encryption at rest (RDS, S3)
  • [ ] Configure VPC Flow Logs
  • [ ] Implement AWS Config for compliance monitoring

Phase 6: Monitoring & Alerting

  • [ ] Set up CloudWatch Dashboards
  • [ ] Create critical alarms
  • [ ] Configure SNS for notifications
  • [ ] Implement X-Ray for distributed tracing (optional)
  • [ ] Integrate alerts with PagerDuty/Slack

Phase 7: Documentation

  • [ ] Document the architecture
  • [ ] Create operational runbooks
  • [ ] Document disaster recovery procedures
  • [ ] Create a troubleshooting guide

Priority and Dependencies

This migration is marked as Low-Medium priority. It's an enhancement that builds upon existing foundations. A key dependency is that Issue #28 (likely the current EC2 setup) should be completed or will be directly replaced by this IaC approach. We also need budget approval for the estimated costs (around $270/month for production) and a clear decision on whether to proceed with the robust ECS solution versus sticking with a simpler EC2 setup.

Labels

devops, infrastructure, terraform, iac, aws, enhancement

And there you have it, guys! A comprehensive plan to transform your AWS infrastructure using Terraform, ECS Fargate, and RDS. It's a significant undertaking, but the benefits in terms of automation, scalability, and reliability are immense. Happy coding and happy deploying!