
AWS EKS Fargate in the Video Conversion Project

Project overview


One of our long-term customers raised a requirement for a simple, robust, scalable, and cost-effective video transcoding solution to complement their existing services. Together with the customer's R&D team, we designed and built a serverless video transcoding platform that uses EKS on Fargate, SQS, and Lambda to deliver this service to end users.


All phases of the project were delivered to the customer as IaC (Infrastructure as Code, Terraform) and Jenkins pipeline code. All infrastructure-related artifacts are managed in Git.


The video transcoding project opened a new line of business for our customer, allowing them to provide a better, faster, and cheaper service to their platform's end users.


Solution design

Since a Lambda function has a maximum timeout of 15 minutes and the client already had its own Docker image for this service, our choice fell on EKS Fargate, a serverless solution based on Kubernetes.


We also chose AWS EKS Fargate for the transcoding platform because it makes it easy to build a serverless environment that does not need to be maintained or monitored: everything is managed by AWS. With this service we do not need to keep constantly running instances; Fargate capacity is provisioned automatically exactly when it is needed.


There are also some specifics of working with AWS EKS Fargate that we had to take into account during this project:


  • A separate EKS NodeGroup is required to run service pods;

  • Additional steps are required to use IAM roles;

  • There is no standard out-of-the-box solution for logging.


Here is an additional list of AWS EKS Fargate limitations from the AWS website:


  • There is a maximum of 4 vCPU and 30 GB of memory per pod;

  • Currently, there is no support for stateful workloads that require persistent volumes or file systems;

  • You cannot run DaemonSets, privileged pods, or pods that use HostNetwork or HostPort;

  • The only load balancer you can use is an Application Load Balancer.


In the end, our final high-level design scheme was as follows:

  1. The client calls the API Gateway and uploads the source video to the S3 bucket.

  2. The API Gateway creates a message in SQS with a parameter that contains the path to the uploaded video file.

  3. SQS triggers a Lambda function, which creates a Kubernetes job and creates a message in the second SQS queue.

  4. The application takes the message from SQS, reads the file path from the parameter, transcodes the video, uploads the converted file to the S3 bucket, and removes the message from the queue (a minimal sketch of this consumer pattern is shown below the diagram).

  5. If the job fails or an execution timeout occurs, the message from the "Running" queue enters the dead letter queue for further analysis.


Picture 1 - Solution Design Schematics
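
To make step 4 more concrete, here is a minimal sketch of the queue-consumption pattern on the application side. It is an illustration only, not the customer's transcoder (which ships as their own Docker image); the queue URL, region, and the transcoding step itself are placeholders.

# Illustrative consumer for the "Running" queue; not the real transcoder application.
import boto3

sqs = boto3.client("sqs", region_name="eu-west-1")  # placeholder region
queue_url = "https://sqs.eu-west-1.amazonaws.com/111111111111/transcoder-running"  # placeholder URL

# Take a message from the "Running" queue, read the file path from the
# "input" attribute, transcode the file (omitted), then delete the message
# so it does not end up in the dead letter queue.
response = sqs.receive_message(
    QueueUrl=queue_url,
    MessageAttributeNames=["All"],
    MaxNumberOfMessages=1,
    WaitTimeSeconds=20,
)

for msg in response.get("Messages", []):
    input_path = msg["MessageAttributes"]["input"]["StringValue"]
    print("Transcoding " + input_path)  # actual transcoding and S3 upload omitted
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])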



Lambda function


Python was chosen as the programming language for the Lambda function.

As mentioned earlier, the following tasks were assigned to our Lambda function:


  • read the parameter from the SQS message that triggered it;

  • create a message in the “Running” queue with the parameter that was received from the message trigger;

  • create a Kubernetes batch job.


We planned to make the Lambda function act as an analogue of kubectl, and for this we decided to use the Kubernetes Python Client. For working with SQS we used the boto3 pip package.
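
For context, here is a minimal sketch of the dependencies the snippets below assume. The package names are the ones published on PyPI; the exact module layout of our Lambda function is not shown in this article, and the Kubernetes client still has to be configured with credentials for the EKS API server before a job can actually be submitted.

# Assumed requirements: boto3, kubernetes (the Kubernetes Python Client)
import boto3                   # AWS SDK for Python, used for the SQS calls below
from kubernetes import client  # used to build the Kubernetes batch job objects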


First, we need to get the parameter from the SQS message, so we wrote the following function:


# Get input value from SQS message
def get_input_value(event):
    message = event["Records"][0]["messageAttributes"]["input"]
    key = message.get('stringValue')
    return key


Here "input" is the name of the message attribute created by the API Gateway, and its content is the path to the video file in the S3 bucket.
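
To illustrate the event shape this function expects, here is a trimmed, hypothetical example; only the fields that get_input_value reads are included, and the S3 key is made up.

# Trimmed Lambda SQS event with a made-up S3 key
sample_event = {
    "Records": [
        {
            "messageAttributes": {
                "input": {
                    "stringValue": "uploads/source-video.mp4",
                    "dataType": "String"
                }
            }
        }
    ]
}

print(get_input_value(sample_event))  # -> uploads/source-video.mp4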


Second, we need to create a message in the “Running” queue with the same parameter.


# Create message in SQS Running
def sqs_running_create_message(inputFile):
    sqs = boto3.client('sqs', region_name=awsRegion)
    message = sqs.send_message(
        QueueUrl=sqsRunningUrl,
        MessageBody="Transcoder",
        DelaySeconds=0,
        MessageAttributes={
            'input': {
                'StringValue': inputFile,
                'DataType': 'String'
            }
        }
    )
    print("Sent message to Running SQS")
    key = message["MessageId"]
    print("Message ID: " + key)
    return key


The values "awsRegion" and "sqsRunningUrl" were stored in Lambda environment variables.
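
As a minimal sketch, such values can be read with os.environ; the exact variable names below are an assumption, not taken from the production code.

import os

awsRegion = os.environ.get("AWS_REGION")           # set automatically by the Lambda runtime
sqsRunningUrl = os.environ.get("SQS_RUNNING_URL")  # assumed name of a custom variable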


After reading the parameter from the SQS message and creating the message in the "Running" queue, we are ready to create a Kubernetes job.


def create_job_object(event):
    jobName = "transcoder-" + randomString()
    namespace = "default"
    inputFile = get_input_value(event)
    messageId = sqs_running_create_message(inputFile)

    # Configure Pod template container
    container = client.V1Container(
        name="transcoder",
        image=ecrURL + ":" + ecrTag,
        env=[
            {
                'name': 'SQS_QUEUE_URL',
                'value': sqsRunningUrl
            },
            {
                'name': 'AWS_REGION',
                'value': awsRegion
            },
            {
                'name': 'AWS_S3_BUCKET',
                'value': s3bucket
            },
        ])

    # Create and configure a spec section
    template = client.V1PodTemplateSpec(
        metadata=client.V1ObjectMeta(labels={"app": "transcoder"}),
        spec=client.V1PodSpec(restart_policy="Never", containers=[container]))

    # Create the specification of the job
    spec = client.V1JobSpec(
        template=template,
        backoff_limit=4,
        ttl_seconds_after_finished=60)

    # Instantiate the job object
    job = client.V1Job(
        api_version="batch/v1",
        kind="Job",
        metadata=client.V1ObjectMeta(name=jobName),
        spec=spec)

    print("Creating job: " + str(jobName))
    print("Image Tag: " + str(ecrTag))