Using Epsagon for tracing and monitoring in AWS

Problem statement

A modern microservices architecture splits a large application into smaller independent parts, each with its own functionality and responsibility. To serve a single user request, a microservices-based application may call many internal microservices to compose its response. The advantage is that different teams and developers can work on their own microservices and use the languages and frameworks that best suit each case. The drawback is that end-to-end visibility in daily operations can be weak. Many tools can help with monitoring, logging, and tracing, such as Amazon CloudWatch for metrics and logs, Container Insights, and AWS X-Ray. In this post, we will take a look at Epsagon, a microservices-native observability platform for container and serverless environments.

Solution overview

Epsagon is a solution that allows you to monitor and troubleshoot issues in microservice environments. It’s designed to make Dev and Ops teams more efficient by identifying problems, correlating data, and finding root causes. It was acquired by Cisco in 2021.

Epsagon makes it easy to monitor your cloud services, container orchestrators (Kubernetes and Amazon ECS), and serverless functions. You can monitor the CPU and memory utilization of your containers, as well as the duration and cold starts of your serverless functions.

Epsagon has several integrations that can be easily and securely applied to your AWS environment.

Getting started

The first step is to sign up for Epsagon and add the AWS integration.

Epsagon provides a CloudFormation template that will deploy the IAM role and other required AWS resources in the account.

The stack creates its own CloudTrail trail for “write-only” actions, an S3 bucket for the trail, a CloudWatch log group, and an IAM role for cross-account access.

The IAM role contains only the policies that are required for Epsagon monitoring.

Epsagon also needs to add a subscription filter to log groups, create event rules and add Lambda Layers.

Only the Epsagon account can assume the given role, and the role is additionally protected by an ExternalId.
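To make the ExternalId protection more concrete, here is a minimal sketch of what a cross-account trust policy of this kind generally looks like. It is not copied from Epsagon's CloudFormation template: the helper name `build_trust_policy`, the account ID, and the external ID are placeholders chosen for illustration.

```python
import json


def build_trust_policy(trusted_account_id: str, external_id: str) -> dict:
    """Build an IAM trust policy that lets one AWS account assume a role,
    but only when the caller presents the agreed external ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Only principals from this account may call sts:AssumeRole.
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                # The request is rejected unless the external ID matches.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Placeholder values: not Epsagon's real account ID or a real external ID.
policy = build_trust_policy("111122223333", "example-external-id")
print(json.dumps(policy, indent=2))
```

The external ID guards against the "confused deputy" problem: even if someone else asks the vendor to access your role, the call fails unless it carries the ID issued for your integration.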

ECS cluster monitoring

ECS does not require any configuration. Epsagon just uses the IAM role and gets all information from AWS. The clusters tab shows the status and utilization of the cluster and the number of running services.

EC2 and Fargate clusters are supported. The Services tab shows utilization, number of running tasks, task definition, and other details.

The instances tab shows details about every node.

The tasks tab also provides container details and logs:

Tracing requires extra development, but there are many frameworks and libraries for different programming languages.

Tracing looks as follows:

EKS cluster monitoring

Kubernetes integration requires a Helm chart installation.

You will be provided with a command for Helm installation.

Once the Helm chart is installed, you will see your Kubernetes cluster in the list.

Kubernetes control plane metrics are shown in the relevant tab.

You can see all nodes and their status,

configuration of every node,

and metrics for every node.

You can see controllers such as Deployments and DaemonSets,

manifests for every controller,

metrics

and events.

Information about all pods is also available:

the manifest of every pod,

metrics,

and events.

Every container can also be checked.

We can see ports, volumes, and other information about containers,

and metrics.

Metrics are useful for dashboards and alerting, which will be demonstrated below.

Graphs and tracing

You can use the Epsagon framework and libraries to add tracing to your containers or functions. Graphs show total requests and errors for a given period, as well as latency.

Every trace provides a graph, timeline, and sequence of requests.
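To illustrate what a tracing library records in order to produce such a timeline, here is a minimal, self-contained sketch. It is a conceptual model only and does not use Epsagon's API: the `Span` and `Tracer` classes and the span names are hypothetical.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Span:
    """One timed operation inside a trace."""
    trace_id: str
    name: str
    parent: Optional[str]
    start: float
    end: float = 0.0
    error: Optional[str] = None

    @property
    def duration_ms(self) -> float:
        return (self.end - self.start) * 1000.0


class Tracer:
    """Collects nested spans that share a single trace ID."""

    def __init__(self) -> None:
        self.trace_id = uuid.uuid4().hex
        self.spans: List[Span] = []
        self._stack: List[str] = []  # names of the currently open spans

    @contextmanager
    def span(self, name: str) -> Iterator[Span]:
        parent = self._stack[-1] if self._stack else None
        current = Span(self.trace_id, name, parent, time.perf_counter())
        self.spans.append(current)
        self._stack.append(name)
        try:
            yield current
        except Exception as exc:
            current.error = repr(exc)  # remember the failure, then re-raise
            raise
        finally:
            current.end = time.perf_counter()
            self._stack.pop()

    def timeline(self) -> List[str]:
        """Render spans in start order, with offsets from the first span."""
        ordered = sorted(self.spans, key=lambda s: s.start)
        t0 = ordered[0].start if ordered else 0.0
        lines = []
        for s in ordered:
            offset_ms = (s.start - t0) * 1000.0
            flag = " ERROR" if s.error else ""
            lines.append(
                f"+{offset_ms:.1f}ms {s.name} (parent={s.parent}) "
                f"{s.duration_ms:.1f}ms{flag}"
            )
        return lines


tracer = Tracer()
with tracer.span("GET /orders"):
    with tracer.span("dynamodb.query"):
        time.sleep(0.01)  # stand-in for a database call
    with tracer.span("http.call payment-service"):
        time.sleep(0.005)  # stand-in for a downstream HTTP call
print("\n".join(tracer.timeline()))
```

A real tracing library also propagates the trace ID across process boundaries (for example, in HTTP headers) and ships the spans to a backend. This sketch keeps everything in one process to show only the data model.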

Dashboards

First of all, we can build a high-level dashboard that shows, for example, the top 5 errors by application,

API Gateway endpoint throughput and latency,

and error codes, invocations, and exceptions.
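The idea behind a "top 5 errors by application" widget can be sketched in a few lines. This is only an illustration of the aggregation, not of how Epsagon computes it: the event shape (`app`, `error`) and the function name are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_errors_by_application(
    events: Iterable[dict], limit: int = 5
) -> List[Tuple[str, int]]:
    """Count failed events per application and return the worst offenders."""
    counts = Counter(e["app"] for e in events if e.get("error"))
    return counts.most_common(limit)


# Hypothetical sample events; a real platform would derive these from traces.
events = [
    {"app": "orders", "error": True},
    {"app": "orders", "error": False},
    {"app": "payments", "error": True},
    {"app": "orders", "error": True},
]
print(top_errors_by_application(events))  # [('orders', 2), ('payments', 1)]
```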

There are various predefined dashboards for serverless and containers:

The Kubernetes dashboard shows overall resource utilization and can be filtered by namespace.

Every pod and container can be monitored.

Lambda monitoring

We can see a lot of useful information about each Lambda function, such as the number of invocations and errors, duration, and estimated cost, in one place.
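As a rough illustration of where an estimated cost figure can come from, the sketch below applies the usual Lambda pricing model (a per-request charge plus a charge per GB-second of compute). It does not claim to reproduce Epsagon's calculation. The default prices are assumptions based on commonly published x86 list prices, they vary by region and architecture, and the free tier is ignored.

```python
def estimate_lambda_cost(
    invocations: int,
    avg_duration_ms: float,
    memory_mb: int,
    price_per_million_requests: float = 0.20,   # assumed list price, USD
    price_per_gb_second: float = 0.0000166667,  # assumed list price, USD
) -> float:
    """Estimate the cost of a Lambda function over some period, in USD."""
    # Compute usage is billed in GB-seconds: duration times configured memory.
    gb_seconds = invocations * (avg_duration_ms / 1000.0) * (memory_mb / 1024.0)
    request_cost = (invocations / 1_000_000) * price_per_million_requests
    compute_cost = gb_seconds * price_per_gb_second
    return request_cost + compute_cost


# Example: one million invocations, 200 ms on average, 512 MB of memory.
print(f"${estimate_lambda_cost(1_000_000, 200, 512):.2f}")
```

The formula makes the main levers visible: cost scales linearly with invocation count, average duration, and configured memory.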

Clicking on any function shows more details, with graphs and logs. There is also a direct link to the resource in the AWS console if you need to navigate there for further checks.

The Lambda logs are the same as those in the function's CloudWatch log group.

Service map

Epsagon automatically builds a service map.

You can see details about every connection

and identify a problem.
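Conceptually, a service map is a graph whose edges are aggregated from observed calls between components. The sketch below shows one way such edges could be derived. It is not Epsagon's implementation: the tuple format `(caller, callee, failed)` and the service names are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]  # (caller, callee)


def build_service_map(
    calls: Iterable[Tuple[str, str, bool]]
) -> Dict[Edge, Dict[str, float]]:
    """Aggregate observed calls into edges with request and error counts."""
    edges: Dict[Edge, Dict[str, float]] = defaultdict(
        lambda: {"requests": 0, "errors": 0}
    )
    for caller, callee, failed in calls:
        edge = edges[(caller, callee)]
        edge["requests"] += 1
        if failed:
            edge["errors"] += 1
    # Attach an error rate to every edge so problem connections stand out.
    return {
        key: {**stats, "error_rate": stats["errors"] / stats["requests"]}
        for key, stats in edges.items()
    }


# Hypothetical calls extracted from traces.
calls = [
    ("api-gateway", "orders", False),
    ("orders", "dynamodb", False),
    ("orders", "dynamodb", True),
]
for (caller, callee), stats in build_service_map(calls).items():
    print(f"{caller} -> {callee}: {stats}")
```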

Every component of a map provides the following graphs with success/error rate and duration for different types of requests.

Kubernetes applications require extra effort to add tracing. After that, you can visualize a service map like this:

Incidents and alerts

Epsagon can be integrated with Slack, PagerDuty, and other popular services:

Thresholds can be configured for any available metric.
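The logic behind a threshold alert can be sketched as follows. This is a generic illustration rather than Epsagon's alerting engine: the `ThresholdRule` class, the metric name, and the choice to average over a window of recent samples are all assumptions.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass(frozen=True)
class ThresholdRule:
    metric: str       # name of the metric being watched (illustrative)
    threshold: float  # alert when the windowed average exceeds this value
    window: int       # number of most recent samples to average


def is_breached(rule: ThresholdRule, samples: Sequence[float]) -> bool:
    """Return True when the average of the latest samples exceeds the threshold."""
    recent = list(samples)[-rule.window:] if rule.window > 0 else []
    if not recent:
        return False  # no data, no alert
    return mean(recent) > rule.threshold


rule = ThresholdRule(metric="cpu_utilization", threshold=80.0, window=3)
print(is_breached(rule, [50, 85, 90, 95]))  # True: the last three average 90
```

Averaging over a window instead of reacting to a single sample is a common way to reduce noisy alerts from short spikes.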

The alerts page shows issues, notification channels, and assignees, and lets you mute an alert.

Conclusion

Epsagon is an interesting tool that lets you visualize and analyze metrics, logs, and traces. It works with AWS services such as ECS, EKS, Lambda, Kinesis, API Gateway, S3, DynamoDB, and Step Functions, and it supports custom data collection. Epsagon offers a 14-day free trial during which you can test all features, start onboarding your team, and monitor your applications.