Jenkins agent pods fail to run a job with a new Docker image

Jenkins and plugins versions report

AWS EKS Cluster: 1.27
Jenkins server version: 2.421
Agent Dockerfile:

FROM jenkins/agent
USER root
RUN \
        mkdir -p /var/jenkins_home/deployer && \
        apt-get update -y && \
        curl -kvL https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb -o /var/jenkins_home/deployer/google-chrome-stable_current_amd64.deb && \
        apt-get -f install -y /var/jenkins_home/deployer/google-chrome-stable_current_amd64.deb && \
        chown -R jenkins:jenkins /var/jenkins_home/deployer/ && \
        mkdir -pv /var/lib/jenkins && \
        chown -R jenkins:jenkins /var/lib/jenkins/

Jenkins agent version that has issues: 3148.v532a_7e715ee3
Jenkins agent version that runs successfully: 3107.v665000b_51092

Error output from the Jenkins server pod logs:

2023-09-05 07:17:59.486+0000 [id=44]	INFO	hudson.slaves.NodeProvisioner#update: cams-agent-4tb9l provisioning successfully completed. We have now 2 computer(s)
2023-09-05 07:17:59.509+0000 [id=7883]	INFO	o.c.j.p.k.KubernetesLauncher#launch: Created Pod: kubernetes-xxxxxx devops-tools/cams-agent-4tb9l
2023-09-05 07:18:00.698+0000 [id=7883]	INFO	o.c.j.p.k.KubernetesLauncher#launch: Pod is running: kubernetes-xxxx devops-tools/cams-agent-4tb9l
2023-09-05 07:18:01.700+0000 [id=7865]	INFO	o.c.j.p.k.p.r.Reaper$TerminateAgentOnContainerTerminated#lambda$onEvent$1: devops-tools/cams-agent-4tb9l Container jnlp was just terminated, so removing the corresponding Jenkins agent
2023-09-05 07:18:01.709+0000 [id=7883]	INFO	o.c.j.p.k.KubernetesLauncher#launch: Container is terminated cams-agent-4tb9l [jnlp]: ContainerStateTerminated(containerID=containerd://0ecc19730f551a0e2e44a13650382fc6772c2cd3e7c04e298497608f452d1422, exitCode=0, finishedAt=2023-09-05T07:18:00Z, message=null, reason=Completed, signal=null, startedAt=2023-09-05T07:18:00Z, additionalProperties={})
2023-09-05 07:18:01.716+0000 [id=7865]	INFO	o.c.j.p.k.KubernetesSlave#_terminate: Terminating Kubernetes instance for agent cams-agent-4tb9l
2023-09-05 07:18:01.725+0000 [id=7865]	INFO	o.c.j.p.k.KubernetesSlave#deleteSlavePod: Terminated Kubernetes instance for agent devops-tools/cams-agent-4tb9l
2023-09-05 07:18:01.725+0000 [id=7865]	INFO	o.c.j.p.k.KubernetesSlave#_terminate: Disconnected computer cams-agent-4tb9l
2023-09-05 07:18:01.732+0000 [id=7883]	WARNING	o.c.j.p.k.KubernetesLauncher#launch: Error in provisioning; agent=KubernetesSlave name: cams-agent-4tb9l, template=PodTemplate{id='92465835-bd71-4e63-adda-f795a4533b68', name='CAMS-AGENT', namespace='devops-tools', label='kubeagent', workspaceVolume='org.csanchez.jenkins.plugins.kubernetes.volumes.workspace.PersistentVolumeClaimWorkspaceVolume@9f44e74f', volumes=[org.csanchez.jenkins.plugins.kubernetes.volumes.PersistentVolumeClaim@67c57e05], containers=[ContainerTemplate{name='jnlp', image='xxxx.dkr.ecr.us-east-2.amazonaws.com/jenkins-cams:agent-2', workingDir='/home/jenkins', command='', args='', resourceRequestCpu='', resourceRequestMemory='', resourceRequestEphemeralStorage='', resourceLimitCpu='', resourceLimitMemory='', resourceLimitEphemeralStorage='', livenessProbe=ContainerLivenessProbe{execArgs='', timeoutSeconds=0, initialDelaySeconds=0, failureThreshold=0, periodSeconds=0, successThreshold=0}}], annotations=[PodAnnotation{key='splunk.com/index', value='jenkins-caas-nprod'}], nodeProperties=[hudson.tools.ToolLocationNodeProperty@3376dbd1]}
java.lang.IllegalStateException: Containers are terminated with exit codes: {jnlp=0}
  at org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.checkTerminatedContainers(KubernetesLauncher.java:274)
  at org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.launch(KubernetesLauncher.java:223)
  at hudson.slaves.SlaveComputer.lambda$_connect$0(SlaveComputer.java:297)
  at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
  at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:80)
  at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
  at java.base/java.lang.Thread.run(Thread.java:833)
2023-09-05 07:18:01.732+0000 [id=7883]	INFO	o.c.j.p.k.KubernetesSlave#_terminate: Terminating Kubernetes instance for agent cams-agent-4tb9l
2023-09-05 07:18:01.732+0000 [id=7883]	SEVERE	o.c.j.p.k.KubernetesSlave#_terminate: Computer for agent is null: cams-agent-4tb9l
2023-09-05 07:18:01.732+0000 [id=7883]	INFO	hudson.slaves.AbstractCloudSlave#terminate: FATAL: Computer for agent is null: cams-agent-4tb9l

I have an older version of the Docker image that works fine. However, that image shows critical vulnerabilities in our security scans, so we need to upgrade to the latest version.

Using Dive, I found the details below.

Cmp   Size  Command                                                                                drwxr-xr-x         0:0     5.3 MB  ├── bin
    124 MB  FROM 114a3630f68d0df                                                                   -rwxr-xr-x         0:0     1.2 MB  │   ├── bash
    7.8 kB  RUN |4 user=jenkins group=jenkins uid=1000 gid=1000 /bin/sh -c groupadd -g "${gid}" "$ -rwxr-xr-x         0:0      44 kB  │   ├── cat
    104 MB  RUN |5 user=jenkins group=jenkins uid=1000 gid=1000 AGENT_WORKDIR=/home/jenkins/agent  -rwxr-xr-x         0:0      73 kB  │   ├── chgrp
    1.4 MB  ADD https://repo.jenkins-ci.org/public/org/jenkins-ci/main/remoting/3107.v665000b_5109 -rwxr-xr-x         0:0      64 kB  │   ├── chmod
    1.4 MB  RUN |6 user=jenkins group=jenkins uid=1000 gid=1000 AGENT_WORKDIR=/home/jenkins/agent  -rwxr-xr-x         0:0      73 kB  │   ├── chown
     90 MB  COPY /javaruntime /opt/java/openjdk # buildkit                                         -rwxr-xr-x         0:0     151 kB  │   ├── cp
       0 B  RUN |6 user=jenkins group=jenkins uid=1000 gid=1000 AGENT_WORKDIR=/home/jenkins/agent  -rwxr-xr-x         0:0     126 kB  │   ├── dash
       0 B  WORKDIR /home/jenkins                                                                  -rwxr-xr-x         0:0     114 kB  │   ├── date
    5.1 kB  COPY ../../jenkins-agent /usr/local/bin/jenkins-agent # buildkit                       -rwxr-xr-x         0:0      81 kB  │   ├── dd
       0 B  RUN |2 version=3107.v665000b_51092-4 user=jenkins /bin/sh -c chmod +x /usr/local/bin/j -rwxr-xr-x         0:0      94 kB  │   ├── df
     46 MB  apt-get update && apt-get install -y wget apt-utils gnupg gnupg1 gnupg2                -rwxr-xr-x         0:0     147 kB  │   ├── dir
     10 kB  wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -     -rwxr-xr-x         0:0      84 kB  │   ├── dmesg
      55 B  echo "deb http://dl.google.com/linux/chrome/deb/ stable main" > /etc/apt/sources.list. -rwxrwxrwx         0:0        0 B  │   ├── dnsdomainname → hostname
    212 kB  apt-get update                              

This is the Jenkins agent pod YAML:

apiVersion: "v1"
kind: "Pod"
metadata:
  annotations:
    splunk.com/index: "jenkins-caas-nprod"
  labels:
    jenkins: "slave"
    jenkins/label: "kubeagent"
    jenkins/label-digest: "ffa3ba115a1a18165cef0867902fabef92179d38"
  name: "cams-agent-3hnk2"
  namespace: "devops-tools"
spec:
  containers:
  - env:
    - name: "JENKINS_SECRET"
      value: "********"
    - name: "JENKINS_AGENT_NAME"
      value: "cams-agent-3hnk2"
    - name: "JENKINS_NAME"
      value: "cams-agent-3hnk2"
    - name: "JENKINS_AGENT_WORKDIR"
      value: "/home/jenkins"
    - name: "JENKINS_URL"
      value: "http://jenkins-service/"
    image: "xxxxxxx.dkr.ecr.us-east-2.amazonaws.com/jenkins:jenkins-agent-1-00-001"
    imagePullPolicy: "IfNotPresent"
    name: "jnlp"
    resources: {}
    tty: false
    volumeMounts:
    - mountPath: "/home/jenkins/.m2"
      name: "volume-0"
      readOnly: false
    - mountPath: "/home/jenkins"
      name: "workspace-volume"
      readOnly: false
    workingDir: "/home/jenkins"
  hostNetwork: false
  nodeSelector:
    businessunit: "gbs"
    environment: "dev"
  restartPolicy: "Never"
  tolerations:
  - effect: "NoSchedule"
    key: "environment"
    operator: "Equal"
    value: "dev"
  volumes:
  - name: "volume-0"
    persistentVolumeClaim:
      claimName: "jenkins-efs-mount-1"
      readOnly: false
  - name: "workspace-volume"
    persistentVolumeClaim:
      claimName: "jenkins-efs-mount"
      readOnly: false

What Operating System are you using (both controller, and any agents involved in the problem)?

Debian

Reproduction steps

Use the Dockerfile provided in the description to build an agent image, then deploy it on an EKS cluster.

Expected Results

The job should be successful.

Actual Results

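The job fails. The jnlp container terminates with exit code 0 within a second of starting, and Jenkins removes the agent.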

Hello @akandimalla, and welcome to this community! :wave:

I noticed that you’re using FROM jenkins/agent, which currently provides Debian 11 with OpenJDK version 11.0.20.1. When you mention that an “older” version is working for you, could you specify which version that is?
Please have a look at the changelog to see if anything resonates with you.

While this might not directly resolve your issue, I recommend using a pinned version and then employing a tool like dependabot or updatecli to facilitate future upgrades to newer versions.

As they say, “Friends don’t let friends use ‘latest’” – or so I’ve heard. :person_shrugging:

Thanks for the reply, @poddingue.
The issue is in my Docker image. I was building it from the jenkins/agent image, which is what causes the failure. I have to use an inbound agent image instead.
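
For anyone hitting the same error, here is a minimal sketch of that fix. It assumes the public `jenkins/inbound-agent` image is the intended base. That image sets an entrypoint that runs the `/usr/local/bin/jenkins-agent` script (the script visible in the Dive output of the working image), whereas `jenkins/agent` at the failing version appears to define no such entrypoint. That would explain why the jnlp container exits straight away with code 0. The install steps are copied from the original Dockerfile. The changes to the `curl` flags and the final `USER jenkins` line are my own assumptions, not something confirmed in this thread.

```dockerfile
# Sketch only: the original customizations rebased onto the inbound agent image.
# In practice, pin a specific published tag instead of the implicit "latest".
FROM jenkins/inbound-agent

# Root is needed for the package installation below.
# Assumption: curl flags changed from -kvL to -fsSL so TLS verification stays on
# and HTTP errors fail the build.
USER root
RUN mkdir -p /var/jenkins_home/deployer && \
    apt-get update -y && \
    curl -fsSL https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb \
         -o /var/jenkins_home/deployer/google-chrome-stable_current_amd64.deb && \
    apt-get -f install -y /var/jenkins_home/deployer/google-chrome-stable_current_amd64.deb && \
    chown -R jenkins:jenkins /var/jenkins_home/deployer/ && \
    mkdir -pv /var/lib/jenkins && \
    chown -R jenkins:jenkins /var/lib/jenkins/

# Assumption: switch back to the unprivileged user so the agent does not run as root.
USER jenkins
```

With this base image, the pod template can keep the jnlp container's command and args empty, as in the template shown in the logs, so that the image's own entrypoint starts the agent.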

This reminds me of a Jira issue I saw last week, even if it was about the controller.
What is your Docker version?