Ephemeral environments for serverless applications in AWS using Terraform and GitHub Actions

Updated: Aug 29

Problem statement

Many organizations try to achieve higher development release velocity, especially in a competitive environment. Moreover, in a big team, where many developers work on various features in parallel, release process acceleration (even for a few minutes) saves a considerable amount of time.

One more important thing is the cost of the environment. Cloud allows us to provision virtually unlimited resources and have as many environments as we need, but we should be careful and terminate everything that is not required anymore to keep costs to a minimum.


Solution overview

The application is a "classic" serverless, containing many Lambda functions, API Gateway with various resources and methods, SQS queues, and S3 buckets.


The first requirement was to use the same tools as was before:

- Terraform

- Terragrunt

- GitHub Actions


The second requirement is to deploy as fast as possible. Here we need to think about the time that every AWS resource takes to be provisioned and cross-resource dependencies because some resources obviously can not be deployed in parallel. Amazon Aurora required existing VPC, Lambda functions, in this case, depend on VPC and Aurora. VPC (3-4 minutes) and Aurora (10-15 minutes) are the most time-consuming resources in provisioning via Terraform.


The third requirement is to provision a new "ephemeral environment" for every pull request into the "master" branch and terminate it once the pull request is closed.


In the initial design, we decided to split all resources logically into "shared" and "personal". The shared environment contains the most expensive and time-consuming resources such as VPC and Aurora, plus a common S3 bucket and Secrets Manager (storing Aurora secrets and more). The personal environment consists of Lambda functions because their code is continuously changing in every pull request, API Gateway, because every developer needs his/her personal endpoint for testing, and SQS/S3 that doesn't cost a lot and can be created and removed pretty fast. Lambdas will be deployed into shared VPC and connect to the shared Aurora cluster with credentials from Secrets Manager.

A structure of directories in the Github repository is as follows:

- .github/workflows/ contains pipelines definition for GitHub Actions.

- environments/shared-dev/ contains terraform files that call modules for VPC and Aurora.

- environments/dev/ contains terraform files that call modules for serverless applications.

- modules/ contains all terraform modules.

- src/ contains packages for serverless applications that appear there after CI steps.

- terragrunt.hlc file is the core of logic because it allows generating TF state dynamically and independently for every ephemeral environment. This file is included from shared-dev and dev.

Here is an example of pipelines for environments deploy and destroy:

name: Pull Request
on:
  pull_request:
    branches: [ master ]

permissions:  
  id-token: write  
  contents: read

jobs:
  pull_request:
    runs-on: ubuntu-latest

    env:
      AWS_DEFAULT_REGION: us-east-1

    steps:
    - name: Install Terraform
      uses: hashicorp/setup-terraform@v1
      with:
        terraform_version: 1.1.2

    - name: Install Terragrunt
      uses: supplypike/setup-bin@v1
      with:
        uri: 'https://github.com/gruntwork-io/terragrunt/releases/download/v0.35.16/terragrunt_linux_amd64'
        name: 'terragrunt'
        version: '0.35.16'

    - name: Configure aws credentials      
      uses: aws-actions/configure-aws-credentials@v1      
      with:        
        role-to-assume: arn:aws:iam::1**********0:role/deploy-role        
        role-session-name: deploysession        
        aws-region: ${{env.AWS_DEFAULT_REGION}}

    - name: Checkout
      uses: actions/checkout@v2

    - name: Build mini-environment
      env:
        ENV_NAME: ${{ github.head_ref }}
        USER_NAME: ${{ github.actor }}
        PR_NUMBER: ${{ github.event.number }}
        # COMMIT_MESSAGE: ${{ github.event.pull_request.title }}
      run: |
        cd environments/dev
        terragrunt apply -auto-approve

Steps for installation of Terraform and Terragrunt will be cached by Github actions and will not take time in every execution.


Configuring Github Actions with IAM roles is described in the previous post.


ENV_NAME is an important environment variable, that is used for the creation and identification of appropriate Terraform state for every ephemeral environment.

USER_NAME and PR_NUMBER (COMMIT_MESSAGE is also possible) will be added to AWS resource tags for better visibility.

name: Destroy mini-env when a branch was deleted
on:
  pull_request:
    types: [closed]

permissions:
 id-token: write
 contents: read

jobs:
  destroy_mini_env:
    runs-on: ubuntu-latest

    steps:
    - name: Install Terraform
      uses: hashicorp/setup-terraform@v1
      with:
        terraform_version: 1.1.2

    - name: Install Terragrunt
      uses: supplypike/setup-bin@v1
      with:
        uri: 'https://github.com/gruntwork-io/terragrunt/releases/download/v0.35.16/terragrunt_linux_amd64'
        name: 'terragrunt'
        version: '0.35.16'

    - name: Checkout
      uses: actions/checkout@v2
      
    - name: Configure aws credentials
      uses: aws-actions/configure-aws-credentials@v1
      with:
        role-to-assume: arn:aws:iam::1**********0:role/deploy-role
        role-session-name: destroysession
        aws-region: us-east-1

    - name: Destroy mini-environment
      env:
        ENV_NAME: ${{ github.head_ref }}
        USER_NAME: ${{ github.actor }}
        PR_NUMBER: ${{ github.run_number }}
      run: |
        cd environments/dev
        terragrunt destroy -auto-approve

Once the pull request into the "master" branch is created, the pipeline starts.

Once it is completed, we can check the results.

terragrunt.hcl file contains the section as follows:

### Remote state S3/DynamoDB configuration
remote_state {
  backend = "s3"
  config = {
    region  = local.tfstate_region
    bucket  = local.tfstate_bucket
    key     = "${path_relative_to_include()}/${get_env("ENV_NAME")}/terraform.tfstate"
    encrypt = true
    dynamodb_table = local.tfstate_table
  }
}

It takes "ENV_NAME" and puts a new Terraform state into a folder with such a name.

The new state file has been created:

The application has been deployed:


Every ephemeral environment gets required values from Tarrafrom state file of "Shared" environment:

### Data sources
# Shared env State
data "terraform_remote_state" "shared" {
  backend = "s3"
  config = {
    region = var.tfstate_region
    bucket = var.tfstate_bucket
    key    = "environments/shared-dev/shared/terraform.tfstate"
  }
}

The new Lambda has been deployed into shared VPC:

Once the pull request is merged (or discarded), the pipeline for environment removal is started:



Conclusion

AWS Cloud and Infrastructure as Code tool like Terraform allows us to provision and destroy ephemeral environments quickly and conveniently. Such an approach helps us increase a release velocity, allows many developers to work on different features in parallel, minimizes costs by using only resources that we need, and removes others. Terragrunt provides us with the ability to dynamically generate a dedicated TF state file for every environment and keep them independent. A CI/CD platform like Github actions organizes all steps into a unified and automated workflow according to the DevOps best practices.