Jenkins slowness on Amazon Linux 2 AMI while working properly on Debian AMI

I am updating Jenkins to version 2.426.3 and OpenJDK 17.0.9 in my Kubernetes cluster pod. After the update, Jenkins runs slowly on a node with an Amazon Linux 2 AMI, whereas it runs fine on a node with a Debian AMI. What might make Jenkins slow on Amazon Linux 2 but not on the Debian node?

I’m not sure what you mean by “slow”, but one recurring issue we see is the persistent volume getting ‘chown’ed when mounted. If you have a large number of files (several million) it can take a very long time before the volume is effectively mounted.

There are options to avoid the chown, but depending on the OS they may or may not work. The symptom is that when the pod is scheduled it takes a very long time before the service is up and running; after that the runtime performance should be similar. Another thing we do is delete the “workspace” folder, since we have a large number of pipelines pointing to a monorepo and these folders can contain many files. Further optimizations are to use sparse checkouts (we only use declarative pipelines and all our Jenkinsfiles are under a single folder) as well as reference clones.
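
Concretely, the Kubernetes-side option I have in mind is the pod-level fsGroupChangePolicy. A minimal sketch, assuming your chart lets you set the pod securityContext and your Kubernetes version supports the field; the uid/gid values are illustrative, not from your setup:

    # Pod-level securityContext (illustrative values, not from this thread).
    # With OnRootMismatch, Kubernetes skips the recursive chown/chmod of the
    # mounted volume when its root already has the expected owner and
    # permissions, which avoids the long startup on volumes with many files.
    securityContext:
      runAsUser: 1000
      fsGroup: 1000
      fsGroupChangePolicy: OnRootMismatch

Whether it actually takes effect also depends on the volume type and CSI driver, which is why I said it may or may not work.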

2.426.3 is several months old and no longer getting security updates; you should probably upgrade to the latest LTS, which is 2.440.3.

Hi Sodul,
I cannot delete the workspace because it is in a production cluster.
Slow means everything inside Jenkins is loading slowly and builds are taking more time, things like that. I also observed that we do have enough space inside the volume.
The main point is that Jenkins CI is running on one cluster and it runs fine on Amazon Linux 2 with the same Jenkins version.
Jenkins CD is on a different cluster with the same version of Jenkins, but it is slow on the Amazon Linux 2 node and works normally on the Debian node.
I have observed that the base image of Jenkins is Debian. But since it is a pod, how does it depend on the node?

Amazon Linux 2 has not been supported by the Jenkins project since November 2023. The end of support was announced in a May 2023 blog post.

Amazon Linux 2 is based on Red Hat Enterprise Linux 7. You should plan your upgrade to a newer operating system. Debian 12 is a newer operating system that is supported by Jenkins and is regularly tested by Jenkins developers.

Hi MarkEWaite,
But in my case Jenkins is running as a pod, and when I exec into that pod and check uname it shows Debian (meaning the base image used while building the image is Debian). This pod runs on the Docker runtime, and Docker is on an Amazon Linux 2 machine. According to your latest comment, do I need to update the operating system of the node the pod is running on?

In that case, my Jenkins CI on Amazon Linux 2 is running fine. I am facing the issue with Jenkins CD on the other cluster. Both cluster configurations are the same; the only difference is that the Jenkins CI node is Amazon Linux 2 and the Jenkins CD node is an Amazon Linux Debian system.

Are there any configurations that we can check/change in Jenkins so that it supports both Amazon Linux 2 as well as Ubuntu?

I think there is some confusion here @MarkEWaite and @mala.

What Mark is saying is that Jenkins is not tested to run directly on Amazon Linux 2, and he recommends preferring a recent Debian-based instance instead.

What Balaji is describing is that the controller does not run directly on a host machine but as a Kubernetes pod, where the nodes are EC2 instances and Amazon EKS runs the Kubernetes control plane.

There are 2 Jenkins instances, CI and CD, running on separate EKS clusters: one uses EC2 instances with Amazon Linux 2 and the other controller runs on EC2 instances running Debian. Note that, AFAIK, there is no such thing as Amazon Debian; all Linux versions provided by Amazon are RPM based, unfortunately.

We run Jenkins on EKS 1.29 on top of EC2 nodes that are managed through Karpenter (an Amazon project), which schedules EC2 nodes with either Amazon Linux 2, Bottlerocket, or Ubuntu AMIs. Which version of these AMIs gets used is pretty much decided by Karpenter, but we decide between Bottlerocket, Amazon Linux and Ubuntu.

Bottlerocket is the snappiest, but its SELinux implementation is causing issues with our controllers. As far as I know, right now both AL2 and Ubuntu are working fine.
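
For reference, the place where that AMI choice lives in our kind of setup looks roughly like the fragment below. This is a sketch assuming a v1beta1-era Karpenter install (newer releases changed the AMI selection fields); the name, IAM role and discovery tags are hypothetical:

    # Illustrative Karpenter EC2NodeClass; amiFamily is where we pick between
    # AL2, Bottlerocket and Ubuntu, and Karpenter then resolves the concrete
    # AMI version for the cluster.
    apiVersion: karpenter.k8s.aws/v1beta1
    kind: EC2NodeClass
    metadata:
      name: bottlerocket-nodes                  # hypothetical name
    spec:
      amiFamily: Bottlerocket                   # or AL2, or Ubuntu
      role: KarpenterNodeRole-my-cluster        # hypothetical IAM role
      subnetSelectorTerms:
        - tags:
            karpenter.sh/discovery: my-cluster  # hypothetical discovery tag
      securityGroupSelectorTerms:
        - tags:
            karpenter.sh/discovery: my-cluster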

@mala something that might help here is to give more details about your 2 clusters:

  • versions of EKS
  • versions of the AMIs that you are using (exact versions help)
  • how you provision nodes in your clusters: AutoScaling Groups, or Karpenter (strongly recommended).
  • EC2 instance types
  • Volumes used for the nodes and for the jenkins pods. For example you can attach the jenkins controller to a dedicated EBS volume for persistence, but you could attach an EFS volume instead.
  • how you deploy Jenkins to EKS: I suppose Helm, which version of the jenkins chart?
  • which container you use to run the controller, e.g.: jenkins/jenkins:2.440.3-lts-jdk17. We build our own but it is pretty much this with a few minor tweaks.
  • which container you use for the agents, e.g.: jenkins/inbound-agent:3206.vb_15dcf73f6a_9-8-alpine-jdk21.
  • which version of the kubernetes plugin for jenkins, e.g.: 4203.v1dd44f5b_1cf9
  • are you running your clusters in a single AZ or multi AZ?
  • how did you tune your Java memory settings; did you ensure the Java process leaves enough RAM in the container for other processes such as git?
  • are you working with larger git repositories? This is important since git can hoard a lot of CPU and memory resources and can even OOM kill your controller when pipelines are triggered.

I’m not the one directly managing our EKS cluster nor the helm chart. But here are the things I can share:

  • AL2 uses cgroups v1 while AL2023 uses cgroups v2. It might impact tools and scripts that depend on cgroups to tune their memory and CPU parameters. FYI memory is under a different path in v2, and there is no CPU information with v2.
  • Running EKS in a single AZ is faster and cheaper; multi-AZ adds no value here, does not make your Jenkins controller HA, and makes it harder to mount the Persistent Volume since EBS is AZ specific.
  • If you use EBS for the PV, check that it is not I/O starved: larger EBS volumes get more I/O. This can make the runtime very slow, especially considering Jenkins likes to read/write lots of small XML files while running pipelines (see the sketch below).
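
If the PV is backed by the EBS CSI driver, a gp3 StorageClass lets you set baseline IOPS and throughput explicitly instead of relying on gp2 burst credits. A minimal sketch; the name and numbers are illustrative, not taken from this thread:

    # Illustrative StorageClass for the aws-ebs-csi-driver. gp3 decouples
    # IOPS/throughput from volume size, so a small Jenkins home volume does
    # not run out of burst credits under heavy small-file I/O.
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: jenkins-gp3                 # hypothetical name
    provisioner: ebs.csi.aws.com
    allowVolumeExpansion: true
    volumeBindingMode: WaitForFirstConsumer
    parameters:
      type: gp3
      iops: "3000"                      # example baseline IOPS
      throughput: "125"                 # example baseline throughput in MiB/s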

Hi Stephane Odul,
Please find the details below.

  • versions of EKS
    I am not using EKS; my cluster is kops and the version is 1.22.17.

  • versions of the AMIs that you are using (exact versions help)
    NAME="Amazon Linux"
    VERSION="2"
    ID="amzn"
    ID_LIKE="centos rhel fedora"
    VERSION_ID="2"
    PRETTY_NAME="Amazon Linux 2"
    ANSI_COLOR="0;33"
    CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
    HOME_URL="https://amazonlinux.com/"
    SUPPORT_END="2025-06-30"

  • how you provision nodes in your clusters: AutoScaling Groups, or Karpenter (strongly recommended).
    AutoScaling

  • EC2 instance types
    m5.8xlarge

  • Volumes used for the nodes and for the jenkins pods. For example you can attach the jenkins controller to a dedicated EBS volume for persistence, but you could attach an EFS volume instead.
    EBS Volume

  • how you deploy Jenkins to EKS: I suppose Helm, which version of the jenkins chart?
    It's kops, not EKS.
    Initially deployed using Helm; as of now we are just updating the Jenkins version.
    chart version: 0.16.1
    Jenkins version: 2.426.3

  • which container you use to run the controller, e.g.: jenkins/jenkins:2.440.3-lts-jdk17. We build our own but it is pretty much this with a few minor tweaks.
    We are using the jenkins:2.426.3-lts-jdk17 container.

  • which container you use for the agents, e.g.: jenkins/inbound-agent:3206.vb_15dcf73f6a_9-8-alpine-jdk21.
    NA

  • which version of the kubernetes plugin for jenkins, e.g.: 4203.v1dd44f5b_1cf9
    KubernetesPlugin Version: 4029.v5712230ccb_f8

  • are you running your clusters in a single AZ or multi AZ?
    Clusters are in multi AZ.

  • how did you tune your Java memory settings; did you ensure the Java process leaves enough RAM in the container for other processes such as git?
    These are the settings we are using for Java in the deployment file:

    - name: JAVA_OPTS
      value: -Xlog:gc:/var/jenkins_home/gc-%t.log:time,tags -XX:InitialRAMPercentage=50.0
        -XX:MaxRAMPercentage=50.0 -XX:+UseG1GC -XX:+UseStringDeduplication -XX:+UnlockExperimentalVMOptions
        -XX:+UnlockDiagnosticVMOptions -XX:G1NewSizePercent=20 -XX:+PrintGC -XX:+PrintGCDetails
        -XX:+UseAdaptiveSizePolicy -XX:+UseG1GC -XX:+ExplicitGCInvokesConcurrent
        -XX:+ParallelRefProcEnabled -XX:+UseStringDeduplication -XX:+UnlockExperimentalVMOptions
        -XX:G1NewSizePercent=20 -XX:+UnlockDiagnosticVMOptions -XX:G1SummarizeRSetStatsPeriod=1
        -XX:NativeMemoryTracking=detail -Djenkins.model.Jenkins.logStartupPerformance=true
        -javaagent:/var/jenkins_home/newrelic/newrelic.jar
  • are you working with larger git repositories? This is important since git can hoard a lot of CPU and memory resources and can even OOM kill your controller when pipelines are triggered.
    We are using GitHub for git repos but we are not sure about the scale of them.

Hi @sodul ,
Good Day!!
If you don’t mind, can you please look into this issue? I have provided all the details that were requested.

Thanks!!
Balaji

Hi @sodul , @MarkEWaite ,

Can I get an update on this?

thanks!!
Balaji

I think it is a mistake to use Amazon Linux 2 when the Jenkins project no longer supports Amazon Linux 2. I believe it is unwise to use kops 1.22.17 when the most recent release is 1.28.4. I don’t have any other updates to offer.

Unfortunately I’m not a support engineer for Jenkins, just yet another user, so I’m of limited help.

I would strongly encourage you to keep your tech stack on current and maintained versions. Upgrade to the latest kops, Helm, Jenkins and plugin versions. If you need a reason, do it at the very least for the sake of security. The kops version you are using is several years old and as such was compiled with a version of Go that has many known High and Critical CVEs. This alone leaves your stack open to attacks from a bad actor, even if you run everything in a private VPC.

Next I would recommend switching from kops to EKS with Karpenter to manage scaling up and down. We used kops 5 years ago, and switching to EKS, with most AWS resources managed by Terraform, was a huge improvement. Plus it makes it easier to manage access through roles.
If you can, run in single AZ mode. Multi AZ is only useful if you make your cluster and services AZ aware; otherwise it is a waste of money. Check your networking costs between AZs in Cost Explorer, you might be surprised.

I noticed you use JDK 17 and 21. Try to use the same version of Java between the controller and agent. Mismatched versions should work, but having the same is better in my experience.

As I mentioned above I’m not a support person and this is the extent of how I can help.

Hi @MarkEWaite @sodul ,
I know the version of each one is old, but I cannot directly update to the latest versions as of now. At least tell me where I can debug this issue: is there any configuration that we need to change so that it will run on Amazon Linux 2, or please provide me steps to debug this issue.
thanks!!

@mala Sorry, but my ability to help here is limited; I’m not a Jenkins expert, just a power user.

I would look at your monitoring stack. If you have something like Datadog, it might help you track down the performance bottleneck.

I can only share some of my experiences that might give you pointers.

We had an issue several years ago which was caused by:

  • the console output of some pipelines was very large (try to stay under 2MB, 10MB can still be ok, but definitely avoid anything over 10MB).
  • the EKS control plane (managed by AWS) was undersized and choked.

The fix was to work with the pipeline owners to reduce the amount of debug output or redirect it to separate logs that could be archived as artifacts, and to have AWS support scale up our EKS backend. You do not use EKS, so you will have to use your own monitoring stack to see if you have containers that are under CPU/memory/I/O pressure.

Another issue we encountered was that our EBS volumes ran out of burst I/O because they were too small.

Take a look at the EBS volumes you use: both the PV that you probably use for the controller and the EBS volumes used by the EC2 worker nodes. There is a dashboard for each EBS volume under the EC2 console that will show you if the volume throughput is out of burst capacity. When that happens the I/O performance is extremely slow. To help fix that you will need to tune your EBS volume settings, and potentially use a larger EBS volume size (larger volumes get more I/O capacity). It is easy to change the EC2 node volumes since they are ephemeral (a new one each time an EC2 node launches), but for your Persistent Volume you will need to run a command on it so the filesystem is resized to use the increased capacity.
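
If the StorageClass behind the PV has volume expansion enabled, you can often let Kubernetes handle the grow instead of running filesystem commands yourself: bump the claim size and the CSI driver resizes both the EBS volume and the filesystem. A sketch, assuming the EBS CSI driver and hypothetical names:

    # Illustrative only: requires allowVolumeExpansion: true on the StorageClass.
    # Increasing spec.resources.requests.storage triggers the EBS modification
    # and, with the CSI driver, the filesystem resize on the node.
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: jenkins-home                # hypothetical claim name
    spec:
      accessModes:
        - ReadWriteOnce
      storageClassName: jenkins-gp3     # hypothetical class with expansion enabled
      resources:
        requests:
          storage: 200Gi                # raise this to grow the volume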

I would definitely look at upgrading to newer versions of the Jenkins stack ASAP. Even if you need to comply with FedRAMP requirements, you should be able to upgrade your Jenkins infrastructure to newer versions, if only to comply with security best practices. At the very least, make the controller and agents use the same version of Java; right now you are using 17 and 21. It is supposed to work, but I have experienced enough quirks in the past to not trust that the marshalling/unmarshalling will work transparently.
