By: Tzahi Fridman, Senior Architect, Automat-IT, July 2020
Since the early days of business computing, scaling has always been a consideration, especially in growing companies. When computing is based on on-premises hardware, companies face the challenge of sizing purchased hardware accurately: how long it will serve current and upcoming loads, what the next steps will be and, of course, how to make the most of existing hardware for workloads that vary throughout the day.
Migrating to cloud computing provides solutions for scaling workloads. With proper design and implementation, the scaling challenges can be solved, providing business continuity and availability during high loads while saving costs by running a lightweight footprint during idle times.
At Automat-IT, we strive to use AWS infrastructure to give our customers the highest level of scalability and reliability, while making sure that only the computing resources actually needed are used.
For that to happen, an application should be able to run as several replicas side by side, so that scaling mainly means adding replicas of the same application under high load and removing them when the load decreases. The system architect should make sure such applications are ready for this kind of deployment: configuration kept separate from the code, preferably stateless, and with proper communication with other system components.
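To make "separated configuration" concrete, here is a minimal sketch of a replica that reads all of its settings from the environment at start-up and keeps nothing locally, so any number of identical copies can run side by side. The variable names `JOB_QUEUE_URL`, `POLL_WAIT_SECONDS` and `MAX_JOBS` and their defaults are hypothetical, chosen only for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class WorkerConfig:
    """Settings every replica reads at start-up; no state is kept on disk."""
    queue_url: str
    poll_wait_seconds: int
    max_jobs: int


def load_config(env: Optional[Mapping[str, str]] = None) -> WorkerConfig:
    # Hypothetical variable names. In Kubernetes these would typically be
    # injected from a ConfigMap or Secret, so all replicas get identical settings.
    env = os.environ if env is None else env
    return WorkerConfig(
        queue_url=env["JOB_QUEUE_URL"],  # required: fail fast (KeyError) if missing
        poll_wait_seconds=int(env.get("POLL_WAIT_SECONDS", "20")),
        max_jobs=int(env.get("MAX_JOBS", "1")),
    )
```

Because the configuration arrives from outside the image, the same container can be promoted between environments or scaled out without rebuilding it.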
One of the best practices for this kind of deployment is running batch-processing workloads on Kubernetes in AWS. AWS provides a managed Kubernetes service, Amazon EKS (Elastic Kubernetes Service). Kubernetes on AWS provides scaling based on:
AWS Auto Scaling groups - the ability to add more EC2 instances when the load increases
Horizontal Pod Autoscaling (HPA) - the ability to add more replica pods of the same application when the load increases
Cluster Autoscaler - makes sure the cluster adds and removes nodes to accommodate the scaling pods
But wait, what do you mean “the load is increasing?”
The basic metrics are, of course, CPU and memory consumption. When a service works harder, the HPA deploys more replicas, and if the existing Kubernetes nodes do not have enough capacity for them, the Cluster Autoscaler has the Auto Scaling group spin up a new instance. This scenario, however, does not cover more interesting use cases, such as working with queues. Let’s assume there is a queue of job requests, and a single replica consumes the jobs one by one, executing each while the queue keeps filling up. In this case, the HPA should scale on a queue-size metric, for example one exposed to Kubernetes through an external metrics adapter.
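The core of HPA's calculation is `desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue)`, with no change when the ratio is within a tolerance (0.1 by default) of 1.0, and the result clamped to the configured minimum and maximum. The sketch below applies that rule to a queue-length metric with a per-replica average target; it is a simplified model for intuition, not the controller's code, and it ignores stabilization windows, missing-metric handling and scaling from zero (it assumes at least one running replica). The function and parameter names are our own.

```python
def desired_replicas(current_replicas: int, queue_length: int,
                     target_per_replica: int, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Rough model of HPA scaling on an average-per-pod queue-length target.

    Assumes current_replicas >= 1; scale-from-zero is not modeled here.
    """
    # Usage ratio = (queue_length / current_replicas) / target_per_replica
    ratio = queue_length / (current_replicas * target_per_replica)
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: leave the replica count alone.
        desired = current_replicas
    else:
        # ceil(current * ratio) simplifies to ceil(queue / target);
        # integer ceiling division avoids floating-point rounding surprises.
        desired = -(-queue_length // target_per_replica)
    # Clamp to the minReplicas / maxReplicas bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For example, with a target of 5 queued jobs per replica, a single replica facing 50 queued jobs would be scaled to 10 replicas (or to `max_replicas`, if that is lower), and an empty queue brings the deployment back down to `min_replicas`.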
When the queue fills up, more replicas will start and each will consume from the queue.
Each processing job should do its work and shut down once finished.
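A worker that follows these two rules can be sketched as a loop that keeps taking jobs until the queue stays empty for a short while, then returns so the container exits and its resources are released. Here Python's in-memory `queue.Queue` stands in for the real job queue (SQS or similar), and `run_worker`, `process` and `idle_timeout` are hypothetical names used only for illustration.

```python
import queue
from typing import Callable


def run_worker(jobs: "queue.Queue[str]", process: Callable[[str], None],
               idle_timeout: float = 1.0) -> int:
    """Consume jobs until the queue stays empty, then return the job count."""
    handled = 0
    while True:
        try:
            # Wait briefly so a job that arrives right now is still picked up.
            job = jobs.get(timeout=idle_timeout)
        except queue.Empty:
            # Nothing left to do: return so the process (and pod) can terminate.
            return handled
        try:
            # If process() raises, the worker exits with the error; with a real
            # queue the unacknowledged message would later be redelivered.
            process(job)
            handled += 1
        finally:
            jobs.task_done()
```

In a container, the entry point would simply call `run_worker` and let the process end when it returns. A variant that handles exactly one job per replica is equally valid; draining until empty just avoids paying the start-up cost for every job.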
Using this approach, we can back the cluster with an Auto Scaling group of EC2 instances (if the workload can self-heal and recover from interruptions, Spot Instances can be used), and the result is a system that consumes computing resources only when there is a job to be done and stays fully idle when there are no jobs in the queue.
Another option is to use AWS Fargate instead of Kubernetes. With Fargate, workloads can be deployed without managing nodes or infrastructure scaling. Following the same approach, Docker-based workloads can be launched on Fargate according to load, and again we get a system that uses computing resources only on demand.
In the Fargate solution, a Lambda function listens to the job queue. For each new job:
The Lambda function transfers the job message to the running queue
The Lambda function spins up a job worker in Fargate
The worker fetches the job message from the running queue
Upon finishing, the worker shuts itself down
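The Lambda side of this flow can be sketched as follows, assuming the job queue is an Amazon SQS queue configured as the function's event source and the worker is an ECS task definition run with the Fargate launch type. `send_message` and `run_task` are the standard boto3 SQS and ECS client calls; the environment variable names (`RUNNING_QUEUE_URL`, `CLUSTER_NAME`, `TASK_DEFINITION`, `SUBNET_IDS`) and the private-subnet networking are hypothetical choices for illustration. The clients are passed into `dispatch_jobs` so the logic can be exercised with stand-ins.

```python
import os


def dispatch_jobs(records, sqs, ecs, running_queue_url, cluster,
                  task_definition, subnets):
    """For each job message: copy it to the running queue, start one Fargate task."""
    started = 0
    for record in records:
        # 1. Transfer the job message to the running queue.
        sqs.send_message(QueueUrl=running_queue_url, MessageBody=record["body"])
        # 2. Spin up one worker; it reads its job from the running queue itself.
        response = ecs.run_task(
            cluster=cluster,
            taskDefinition=task_definition,
            launchType="FARGATE",
            count=1,
            networkConfiguration={
                # Assumes private subnets with a route to ECR and SQS.
                "awsvpcConfiguration": {"subnets": subnets,
                                        "assignPublicIp": "DISABLED"}
            },
        )
        if response.get("failures"):
            # Failing the invocation lets the SQS event source retry the batch.
            raise RuntimeError(f"run_task failed: {response['failures']}")
        started += 1
    return started


def handler(event, context):
    """Lambda entry point for an SQS-triggered function."""
    import boto3  # bundled with the AWS Lambda Python runtime

    started = dispatch_jobs(
        event.get("Records", []),
        boto3.client("sqs"),
        boto3.client("ecs"),
        running_queue_url=os.environ["RUNNING_QUEUE_URL"],
        cluster=os.environ["CLUSTER_NAME"],
        task_definition=os.environ["TASK_DEFINITION"],
        subnets=os.environ["SUBNET_IDS"].split(","),
    )
    return {"started": started}
```

Because a retried batch can copy a message to the running queue twice, the workers should treat jobs as idempotent; each worker then fetches one message, processes it and exits, as in the steps above.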