Scaling Amazon Elastic Kubernetes Service Workloads with KEDA, OTEL Collector, and Amazon CloudWatch
In our article, Scaling Amazon Elastic Kubernetes Service Workloads with KEDA and Amazon CloudWatch, we suggested using the OTEL Collector to send application metrics to CloudWatch. Today, we will show how to do this in two different ways:
Deploying the OTEL Collector as a sidecar, a container that runs next to our application in the same pod, keeps telemetry handling local to each workload and lets us configure and tune the collector per service; since every replica carries its own collector, it also scales along with the application. However, this method adds extra resource demands, such as CPU and memory, to every pod.
Deploying the OTEL Collector as an independent service (gateway) simplifies configuration management and ensures consistency across multiple services. However, it can become a single point of failure for collecting telemetry data.
Pre-requisites
Install Docker Desktop.
Enable Kubernetes (the standalone version included in Docker Desktop).
An Amazon EKS cluster with KEDA installed. We will use operator as the identityOwner for our AWS CloudWatch scaler, so the KEDA operator must be granted the IAM permissions needed to access CloudWatch. You can find an example of how to accomplish this here; a hedged sketch also follows this list.
Docker Desktop Kubernetes context configured to work with the Amazon EKS cluster.
An IAM User with programmatic access.
Install the AWS CLI.
Install Terraform CLI.
Install K6.
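If KEDA was installed with Helm, a common way to grant those permissions is IRSA: annotate the KEDA operator's service account with an IAM role whose policy allows cloudwatch:GetMetricData, then restart the operator so it picks up the credentials. A minimal sketch, assuming the default keda namespace and service account name, and a hypothetical <MY_KEDA_ROLE> created beforehand:
# Assumed names: the KEDA Helm chart installs the keda-operator service account and deployment into the keda namespace by default
kubectl annotate serviceaccount keda-operator --namespace keda eks.amazonaws.com/role-arn=arn:aws:iam::<MY_ACCOUNT_ID>:role/<MY_KEDA_ROLE> --overwrite
kubectl rollout restart deployment keda-operator --namespace keda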
The Application
Run the following commands to set up the solution:
dotnet new webapi -n MyWebApi
dotnet new sln -n MyWebApi
dotnet sln add --in-root MyWebApi
dotnet add MyWebApi package OpenTelemetry.Exporter.OpenTelemetryProtocol
dotnet add MyWebApi package OpenTelemetry.Instrumentation.AspNetCore
dotnet add MyWebApi package OpenTelemetry.Extensions.Hosting
Open the Program.cs
file and update the content as follows:
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;
var builder = WebApplication.CreateBuilder(args);
// Add services to the container.
// Learn more about configuring Swagger/OpenAPI at https://aka.ms/aspnetcore/swashbuckle
builder.Services.AddEndpointsApiExplorer();
builder.Services.AddSwaggerGen();
builder.Services.AddHealthChecks();
builder.Services.AddOpenTelemetry()
.ConfigureResource(resource => resource
.AddService(serviceName: builder.Environment.ApplicationName))
.WithMetrics(metrics => metrics
.AddMeter("Microsoft.AspNetCore.Hosting")
.AddOtlpExporter());
var app = builder.Build();
// Configure the HTTP request pipeline.
if (app.Environment.IsDevelopment())
{
app.UseSwagger();
app.UseSwaggerUI();
}
app.UseHttpsRedirection();
var summaries = new[]
{
"Freezing", "Bracing", "Chilly", "Cool", "Mild", "Warm", "Balmy", "Hot", "Sweltering", "Scorching"
};
app.MapGet("/weatherforecast", async () =>
{
var delay = Random.Shared.Next(0, 1500);
await Task.Delay(delay);
Console.WriteLine($"New request {DateTimeOffset.UtcNow} with delay of {delay} ms");
var forecast = Enumerable.Range(1, 5).Select(index =>
new WeatherForecast
(
DateOnly.FromDateTime(DateTime.Now.AddDays(index)),
Random.Shared.Next(-20, 55),
summaries[Random.Shared.Next(summaries.Length)]
))
.ToArray();
return forecast;
})
.WithName("GetWeatherForecast")
.WithOpenApi();
app.Run();
record WeatherForecast(DateOnly Date, int TemperatureC, string? Summary)
{
public int TemperatureF => 32 + (int)(TemperatureC / 0.5556);
}
The AddOpenTelemetry
method provides a builder to set up the OpenTelemetry SDK in our application. In this example, we use the new ASP.NET Core metrics exposed by the Microsoft.AspNetCore.Hosting
meter, specifically the http.server.active_requests
metric, which represents the number of HTTP requests currently being handled by the application. By tracking this metric, we can identify periods of high traffic and make informed decisions about resource allocation and scaling strategies. Note that AddOtlpExporter, with no endpoint configured, defaults to sending OTLP over gRPC to http://localhost:4317; that is exactly where the sidecar collector will listen, while the gateway approach overrides the endpoint through environment variables.
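Before containerizing, you can sanity-check the instrumentation locally; the curl call below is a hypothetical smoke test, with <PORT> standing in for the HTTP port printed at startup:
dotnet run --project MyWebApi
curl http://localhost:<PORT>/weatherforecast
Each request should produce a console line with the simulated delay, confirming the endpoint works.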
Infrastructure
We will use Terraform to create the necessary resources for running our application on the Kubernetes cluster. At the project level, create a main.tf
file with the following content:
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = "5.31.0"
}
}
backend "local" {}
}
provider "aws" {
region = "<MY_REGION>"
profile = "<MY_AWS_PROFILE>"
max_retries = 2
}
locals {
repository_name = "myrepository"
cluster_name = "<MY_K8S_CLUSTER_NAME>"
role_name = "myrole"
namespace = "<MY_K8S_NAMESPACE>"
policy_name = "mypolicy"
}
resource "aws_ecr_repository" "repository" {
name = local.repository_name
image_tag_mutability = "MUTABLE"
image_scanning_configuration {
scan_on_push = false
}
}
data "aws_iam_policy_document" "otel_policy_document" {
statement {
effect = "Allow"
actions = [
"s3:ListAllMyBuckets",
"s3:GetBucketLocation",
"xray:GetSamplingStatisticSummaries",
"logs:CreateLogStream",
"xray:PutTelemetryRecords",
"logs:DescribeLogGroups",
"logs:DescribeLogStreams",
"xray:GetSamplingRules",
"ssm:GetParameters",
"xray:GetSamplingTargets",
"logs:CreateLogGroup",
"logs:PutLogEvents",
"xray:PutTraceSegments"
]
resources = [
"*"
]
}
}
resource "aws_iam_policy" "otel_policy" {
name = local.policy_name
path = "/"
policy = data.aws_iam_policy_document.otel_policy_document.json
}
data "aws_eks_cluster" "cluster" {
name = local.cluster_name
}
module "iam_assumable_role_with_oidc" {
source = "terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc"
version = "4.14.0"
oidc_subjects_with_wildcards = ["system:serviceaccount:${local.namespace}:*"]
create_role = true
role_name = local.role_name
provider_url = data.aws_eks_cluster.cluster.identity[0].oidc[0].issuer
role_policy_arns = [
aws_iam_policy.otel_policy.arn
]
number_of_role_policy_arns = 1
}
output "role_arn" {
value = module.iam_assumable_role_with_oidc.iam_role_arn
}
output "repository_url" {
value = aws_ecr_repository.repository.repository_url
}
We are creating an Amazon ECR repository to store our application's image, and an IAM role with enough permissions for our pod. Run the following commands to create the resources in AWS:
terraform init
terraform plan -out app.tfplan
terraform apply 'app.tfplan'
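The two outputs declared above are used later: role_arn goes into the service account annotation in the Kubernetes manifests, and repository_url tells Docker where to push the image. You can read them back at any time:
terraform output role_arn
terraform output repository_url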
Docker Image
At project level, create a Dockerfile
with the following content:
# Runtime base image
FROM mcr.microsoft.com/dotnet/aspnet:8.0 AS base
WORKDIR /app

# Build stage: restore dependencies first to leverage Docker layer caching
FROM mcr.microsoft.com/dotnet/sdk:8.0 AS build
COPY ["MyWebApi/MyWebApi.csproj", "MyWebApi/"]
RUN dotnet restore "MyWebApi/MyWebApi.csproj"
COPY . .
WORKDIR "/MyWebApi"
RUN dotnet build "MyWebApi.csproj" -c Release -o /app/build

# Publish stage
FROM build AS publish
RUN dotnet publish "MyWebApi.csproj" -c Release -o /app/publish

# Final image: copy only the published output
FROM base AS final
WORKDIR /app
COPY --from=publish /app/publish .
ENTRYPOINT ["dotnet", "MyWebApi.dll"]
Run the following command at solution level to upload the image to the Amazon ECR repository:
aws ecr get-login-password --region <MY_REGION> --profile <MY_AWS_PROFILE> | docker login --username AWS --password-stdin <MY_ACCOUNT_ID>.dkr.ecr.<MY_REGION>.amazonaws.com
docker build -t <MY_ACCOUNT_ID>.dkr.ecr.<MY_REGION>.amazonaws.com/myrepository:1.0 -f .\MyWebApi\Dockerfile .
docker push <MY_ACCOUNT_ID>.dkr.ecr.<MY_REGION>.amazonaws.com/myrepository:1.0
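Optionally, verify that the tag arrived in the repository (same region and profile as before):
aws ecr describe-images --repository-name myrepository --region <MY_REGION> --profile <MY_AWS_PROFILE>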
Kubernetes
In this section, we will create all the resources needed in our Kubernetes cluster using either the sidecar or the gateway approach. Choose one method, and before trying the other, delete its resources with the following command:
kubectl delete -f <sidecar.yaml|gateway.yaml> --namespace=<MY_K8S_NAMESPACE>
Sidecar
Create a sidecar.yaml
file with the following content:
apiVersion: v1
kind: ServiceAccount
metadata:
name: mywebapi-sa
annotations:
eks.amazonaws.com/role-arn: arn:aws:iam::<MY_ACCOUNT_ID>:role/myrole
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: mywebapi-deployment
labels:
app: mywebapi
spec:
replicas: 1
selector:
matchLabels:
app: mywebapi
template:
metadata:
labels:
app: mywebapi
spec:
serviceAccountName: mywebapi-sa
containers:
- name: api-container
env:
- name: ASPNETCORE_ENVIRONMENT
value: Development
- name: ASPNETCORE_HTTP_PORTS
value: '80'
image: <MY_ACCOUNT_ID>.dkr.ecr.<MY_REGION>.amazonaws.com/myrepository:1.0
ports:
- name: http
containerPort: 80
protocol: TCP
- name: otel-container
image: amazon/aws-otel-collector:latest
env:
- name: AWS_REGION
value: <MY_REGION>
imagePullPolicy: Always
resources:
limits:
cpu: 500m
memory: 500Mi
requests:
cpu: 250m
memory: 250Mi
---
apiVersion: v1
kind: Service
metadata:
name: mywebapi-service
labels:
app: mywebapi
spec:
type: LoadBalancer
ports:
- port: 80
targetPort: http
protocol: TCP
name: http
selector:
app: mywebapi
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: mywebapi-scaledobject
spec:
scaleTargetRef:
name: mywebapi-deployment
kind: Deployment
minReplicaCount: 1
maxReplicaCount: 10
triggers:
- type: aws-cloudwatch
metadata:
namespace: MyWebApi
expression: SELECT AVG("http.server.active_requests") FROM MyWebApi
targetMetricValue: "2"
minMetricValue: "0"
awsRegion: "<MY_REGION>"
identityOwner: operator
ServiceAccount: To interact with AWS CloudWatch, our pods need to assume an AWS IAM role through a service account.
Deployment: Under this approach, each pod contains two containers: our application and the OTEL Collector. The aws-otel-collector image ships with a default configuration that receives OTLP data and forwards metrics to CloudWatch, which is why no custom collector configuration is mounted here.
Service: Our application is exposed through a service of type LoadBalancer.
ScaledObject: The AWS CloudWatch scaled object targets the deployment created earlier and scales it based on the http.server.active_requests metric mentioned previously.
Run the following command to deploy the application to the cluster:
kubectl apply -f sidecar.yaml --namespace=<MY_K8S_NAMESPACE>
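Before moving on, it is worth confirming that both containers are running and that the collector starts cleanly; the second command tails the sidecar's logs (-c selects the otel-container defined above):
kubectl get pods --namespace=<MY_K8S_NAMESPACE>
kubectl logs deployment/mywebapi-deployment -c otel-container --namespace=<MY_K8S_NAMESPACE>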
Gateway
Create a gateway.yaml
file with the following content:
apiVersion: v1
kind: ServiceAccount
metadata:
name: mywebapi-sa
annotations:
eks.amazonaws.com/role-arn: arn:aws:iam::<MY_ACCOUNT_ID>:role/myrole
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: otel-deployment
labels:
app: otel
spec:
replicas: 1
selector:
matchLabels:
app: otel
template:
metadata:
labels:
app: otel
spec:
serviceAccountName: mywebapi-sa
containers:
- name: container
image: amazon/aws-otel-collector:latest
ports:
- name: http
containerPort: 4318
protocol: TCP
- name: grpc
containerPort: 4317
protocol: TCP
env:
- name: AWS_REGION
value: <MY_REGION>
imagePullPolicy: Always
resources:
limits:
cpu: 500m
memory: 500Mi
requests:
cpu: 250m
memory: 250Mi
---
apiVersion: v1
kind: Service
metadata:
name: otel-service
labels:
app: otel
spec:
type: ClusterIP
ports:
- port: 4318
targetPort: 4318
protocol: TCP
name: http
- port: 4317
targetPort: 4317
protocol: TCP
name: grpc
selector:
app: otel
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: mywebapi-deployment
labels:
app: mywebapi
spec:
replicas: 1
selector:
matchLabels:
app: mywebapi
template:
metadata:
labels:
app: mywebapi
spec:
serviceAccountName: mywebapi-sa
containers:
- name: api-container
env:
- name: ASPNETCORE_ENVIRONMENT
value: Development
- name: ASPNETCORE_HTTP_PORTS
value: '80'
- name: OTEL_EXPORTER_OTLP_ENDPOINT
value: http://otel-service:4318
- name: OTEL_EXPORTER_OTLP_PROTOCOL
value: http/protobuf
image: <MY_ACCOUNT_ID>.dkr.ecr.<MY_REGION>.amazonaws.com/myrepository:1.0
ports:
- name: http
containerPort: 80
protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
name: mywebapi-service
labels:
app: mywebapi
spec:
type: LoadBalancer
ports:
- port: 80
targetPort: http
protocol: TCP
name: http
selector:
app: mywebapi
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: mywebapi-scaledobject
spec:
scaleTargetRef:
name: mywebapi-deployment
kind: Deployment
minReplicaCount: 1
maxReplicaCount: 10
triggers:
- type: aws-cloudwatch
metadata:
namespace: MyWebApi
expression: SELECT AVG("http.server.active_requests") FROM MyWebApi
targetMetricValue: "2"
minMetricValue: "0"
awsRegion: "<MY_REGION>"
identityOwner: operator
The main difference here is a dedicated deployment and service for the OTEL Collector, exposing port 4318
(HTTP) and port 4317
(gRPC). The deployment for our application includes two additional environment variables that point the exporter at the collector service and select the http/protobuf protocol. Run the following command to deploy the application to the cluster:
kubectl apply -f gateway.yaml --namespace=<MY_K8S_NAMESPACE>
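As with the sidecar, confirm that the collector deployment is up and started without errors before sending traffic:
kubectl get pods --namespace=<MY_K8S_NAMESPACE>
kubectl logs deployment/otel-deployment --namespace=<MY_K8S_NAMESPACE>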
Tests
Create a load.js
file with the following content:
import http from 'k6/http';
import { sleep } from 'k6';
export const options = {
vus: 50,
duration: '600s',
};
export default function () {
http.get('<MY_URL>/weatherforecast');
sleep(1);
}
The URL of our service can be obtained with the following command:
kubectl get services --namespace=<MY_K8S_NAMESPACE>
Execute the script using the command:
k6 run load.js
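While the test runs, you can check that the metric is reaching CloudWatch; once the first data points are exported, http.server.active_requests should appear under the MyWebApi namespace:
aws cloudwatch list-metrics --namespace MyWebApi --region <MY_REGION> --profile <MY_AWS_PROFILE>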
After a couple of minutes, check the number of pods for our deployments using the command kubectl get pods --namespace=<MY_K8S_NAMESPACE>
to see output like the following (READY shows 2/2 because each sidecar pod runs two containers; under the gateway approach it would be 1/1):
NAME READY STATUS RESTARTS AGE
mywebapi-deployment-754fc44b4b-6spfj 2/2 Running 0 23s
mywebapi-deployment-754fc44b4b-8bq7v 2/2 Running 0 39s
mywebapi-deployment-754fc44b4b-95vkg 2/2 Running 0 8m34s
mywebapi-deployment-754fc44b4b-nnwrx 2/2 Running 0 23s
mywebapi-deployment-754fc44b4b-qrvq9 2/2 Running 0 39s
mywebapi-deployment-754fc44b4b-xgds9 2/2 Running 0 39s
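Behind the scenes, KEDA creates a Horizontal Pod Autoscaler for the ScaledObject (named keda-hpa-mywebapi-scaledobject by KEDA's naming convention), so you can also watch the metric value and replica count it is acting on:
kubectl get hpa --namespace=<MY_K8S_NAMESPACE> --watch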
You can find the code and scripts here. Thank you, and happy coding.