Unknown host exception disconnects the Kubernetes agent on Jenkins 2.361.1

This issue started after upgrading Jenkins from 2.319.1 to 2.361.1.
It does not occur on Jenkins 2.319.1, only on Jenkins 2.361.1.
The error is intermittent: the Kubernetes JNLP container terminates with error code 255.

Console logs:

With Kubernetes plugin version 1.31 on Jenkins 2.319.1, pod creation is retried, while with plugin version 3704.va_08f0206b_95e on Jenkins 2.361.1 it is not.
Is there a configuration parameter for this in the plugin?
Also, is there a plugin parameter to configure how long to wait for the pod to start?

Jenkins version 2.319.1 with Kubernetes plugin version 1.31 does not have any timeout options configured in the arguments.

Jenkins version 2.361.1 with Kubernetes plugin version 3704.va_08f0206b_95e has these arguments configured:
-Dkubernetes.websocket.timeout=60000
-Dorg.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator.websocketConnectionTimeout=60000

Hello @juandyego1983 and welcome to this community :wave:

It would be handier next time if you could paste your anonymized logs in a text format. :pray:
Something probably changed between the two Kubernetes plugin versions (and Jenkins versions) that is causing the issue. Kubernetes plugin version 3704.va_08f0206b_95e may have different default behavior or configuration than version 1.31, so a pod creation problem that used to be retried now fails.

Regarding your question about the waiting time for the pod to start: the Kubernetes plugin has a pod template setting called Pod Retention, but it is a policy that decides whether the agent pod is kept or deleted after the build finishes; it is not a startup wait. The setting closer to what you describe is the pod template's connection timeout (`slaveConnectTimeout` in the `podTemplate` step), which is the time in seconds Jenkins waits for the agent to come online.

In addition, you can try increasing the timeout with the -Dkubernetes.websocket.timeout argument in the Jenkins JVM options. This argument sets the timeout, in milliseconds, for WebSocket connections between Jenkins and the Kubernetes cluster.

Thank you very much for the answer and for the welcome! Next time I will paste the text instead of an image :sweat_smile:
Would you know where that PodRetention setting is configured?
Thanks again.


You’re welcome. :hugs:

The PodRetention configuration can be set in the Kubernetes cloud configuration in the Jenkins Global Configuration. Here’s how to do it:

  1. Go to Jenkins home page.
  2. Click on “Manage Jenkins” link.
  3. Click on “Configure System” link.
  4. Scroll down to the “Cloud” section.
  5. Find your Kubernetes cloud configuration and click on the “Advanced” button.
  6. Scroll down to the “Pod Template” section.
  7. Set “Pod Retention” to the policy you want (for example Never, On Failure, or Always). It is a policy for keeping or deleting the pod after the build, not a duration in seconds.

Once you have made the changes, don’t forget to click the “Save” button at the bottom of the page to apply them.
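
For anyone who manages Jenkins with the Configuration as Code plugin instead of the UI, the same pod template settings can be sketched in YAML. This is a minimal, unverified sketch: it assumes JCasC is installed, the cloud and template names are hypothetical, and the key names (`podRetention`, `slaveConnectTimeoutStr`) are given as they usually appear in exported configurations, so compare them with an export from your own instance before relying on them. Pod Retention is a keep/delete policy applied after the build, while the connect timeout is the value that governs how long Jenkins waits for the agent to come online.

```yaml
# Hypothetical JCasC fragment; names and values are illustrative only.
jenkins:
  clouds:
    - kubernetes:
        name: "kubernetes"            # hypothetical cloud name
        namespace: "namespace"
        templates:
          - name: "template"          # hypothetical pod template name
            label: "template"
            # Policy for the pod after the build ends, not a duration.
            podRetention: "onFailure"
            # Seconds Jenkins waits for the agent to connect (assumed key name).
            slaveConnectTimeoutStr: "300"
```

Raising the connect timeout only gives a slow pod more time to connect; it does not bring back the pod creation retries described in the question.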

Thanks again. That setting is the same in both Jenkins versions.
For example, with Jenkins version 2.319.1 and Kubernetes plugin version 1.31, the log shows:

Created Pod: kubernetes namespace/pod-name-wb39h
[Normal][namespace/pod-name-wb39h][Scheduled] Successfully assigned namespace/pod-name-wb39h to aks-npworker-XXXXXXXXXXX
[Normal][namespace/pod-name-wb39h][Pulled] Container image "image1" already present on machine
[Normal][namespace/pod-name-wb39h][Created] Created container builder
[Normal][namespace/pod-name-wb39h][Started] Started container builder
[Normal][namespace/pod-name-wb39h][Pulled] Container image "image2" already present on machine
[Normal][namespace/pod-name-wb39h][Created] Created container azure-cli
[Normal][namespace/pod-name-wb39h][Started] Started container azure-cli
[Normal][namespace/pod-name-wb39h][Pulled] Container image "image3" already present on machine
[Normal][namespace/pod-name-wb39h][Created] Created container maven
[Normal][namespace/pod-name-wb39h][Started] Started container maven
[Normal][namespace/pod-name-wb39h][Pulled] Container image "image4" already present on machine
[Normal][namespace/pod-name-wb39h][Created] Created container security
[Normal][namespace/pod-name-wb39h][Started] Started container security


Created Pod: kubernetes namespace/pod-name-wb39h
Agent pod-name-wb39h is provisioned from template template-wb39h
---
apiVersion: "v1"
kind: "Pod"
metadata:
  annotations:
    buildUrl: "http://jenkins-master:8080/jenkins/job/name-job-jenkins/develop/7/"
    runUrl: "job/name-job-jenkins/develop/7/"
  labels:
    slave-pod: "namespace-jenkins-slave"
    aadpodidbinding: "jenkins-azure-identity-binding"
    jenkins/jenkins-master-jenkins-slave-maparc: "true"
    jenkins/label: "template"
    jenkins/label-digest: "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
  name: "pod-name-wb39h"
  namespace: "namespace"
spec:
  containers:
  - command:
    - "/busybox/cat"
    image: "image1"
    imagePullPolicy: "IfNotPresent"
    name: "builder"
    resources:
      limits:
        memory: "1Gi"
        cpu: "500m"
      requests:
        memory: "512Mi"
        cpu: "200m"
    tty: true
    volumeMounts:
    - mountPath: "/localrepo"
      name: "nfs-cache"
    - mountPath: "/home/jenkins/agent"
      name: "workspace-volume"
      readOnly: false
  - command:
    - "cat"
    image: "image2"
    imagePullPolicy: "IfNotPresent"
    name: "azure-cli"
    resources:
      limits:
        memory: "512Mi"
        cpu: "200m"
      requests:
        memory: "256Mi"
        cpu: "100m"
    tty: true
    volumeMounts:
    - mountPath: "/home/jenkins/agent"
      name: "workspace-volume"
      readOnly: false
  - command:
    - "cat"
    image: "image3"
    imagePullPolicy: "IfNotPresent"
    name: "maven"
    resources:
      limits:
        memory: "1Gi"
        cpu: "500m"
      requests:
        memory: "512Mi"
        cpu: "200m"
    tty: true
    volumeMounts:
    - mountPath: "/localrepo"
      name: "nfs-cache"
    - mountPath: "/root/.m2"
      name: "maven-settings"
    - mountPath: "/home/jenkins/agent"
      name: "workspace-volume"
      readOnly: false
  - command:
    - "cat"
    image: "image4"
    imagePullPolicy: "Always"
    name: "security"
    resources:
      limits:
        memory: "1Gi"
        cpu: "500m"
      requests:
        memory: "512Mi"
        cpu: "200m"
    tty: true
    volumeMounts:
    - mountPath: "/localrepo"
      name: "nfs-cache"
    - mountPath: "/root/.m2"
      name: "maven-settings"
    - mountPath: "/home/jenkins/agent"
      name: "workspace-volume"
      readOnly: false
  - env:
    - name: "JENKINS_SECRET"
      value: "********"
    - name: "JENKINS_TUNNEL"
      value: "jenkins-master-agent:50000"
    - name: "JENKINS_AGENT_NAME"
      value: "pod-name-wb39h"
    - name: "JENKINS_NAME"
      value: "pod-name-wb39h"
    - name: "JENKINS_AGENT_WORKDIR"
      value: "/home/jenkins/agent"
    - name: "JENKINS_URL"
      value: "http://jenkins-master:8080/jenkins/"
    image: "jenkins/inbound-agent:4.11-1-jdk11"
    name: "jnlp"
    resources:
      limits: {}
      requests:
        memory: "256Mi"
        cpu: "100m"
    volumeMounts:
    - mountPath: "/home/jenkins/agent"
      name: "workspace-volume"
      readOnly: false
  nodeSelector:
    agentpool: "npworkerspot"
  restartPolicy: "Never"
  tolerations:
  - effect: "NoSchedule"
    key: "jenkins"
    operator: "Equal"
    value: "worker"
  - effect: "NoSchedule"
    key: "kubernetes.azure.com/scalesetpriority"
    operator: "Equal"
    value: "spot"
  volumes:
  - name: "nfs-cache"
    persistentVolumeClaim:
      claimName: "nfs-cache"
      readOnly: false
  - emptyDir:
      medium: ""
    name: "workspace-volume"
  - configMap:
      items:
      - key: "settings.xml"
        path: "settings.xml"
      name: "settings-xml"
    name: "maven-settings"

Running on pod-name-wb39h in /home/jenkins/agent/workspace/xxxxxxxxxxxxx

And for Jenkins version 2.361.1 with Kubernetes plugin version 3704.va_08f0206b_95e:

Created Pod: kubernetes namespace/pod-name--c44pq
Agent pod-name--c44pq is provisioned from template template-nhq40
---
apiVersion: "v1"
kind: "Pod"
metadata:
  annotations:
    buildUrl: "http://jenkins-master:8080/jenkins/job/name-job-jenkins/develop/7/"
    runUrl: "job/name-job-jenkins/develop/7/"
  labels:
    slave-pod: "namespace-jenkins-slave"
    aadpodidbinding: "jenkins-azure-identity-binding"
    jenkins/jenkins-master-jenkins-slave-maparc: "true"
    jenkins/label: "template"
    jenkins/label-digest: "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
  name: "pod-name--c44pq"
  namespace: "namespace"
spec:
  containers:
  - command:
    - "/busybox/cat"
    image: "image1"
    imagePullPolicy: "IfNotPresent"
    name: "builder"
    resources:
      limits:
        memory: "1Gi"
        cpu: "500m"
      requests:
        memory: "512Mi"
        cpu: "200m"
    tty: true
    volumeMounts:
    - mountPath: "/localrepo"
      name: "nfs-cache"
    - mountPath: "/home/jenkins/agent"
      name: "workspace-volume"
      readOnly: false
  - command:
    - "cat"
    image: "image2"
    imagePullPolicy: "IfNotPresent"
    name: "azure-cli"
    resources:
      limits:
        memory: "512Mi"
        cpu: "200m"
      requests:
        memory: "256Mi"
        cpu: "100m"
    tty: true
    volumeMounts:
    - mountPath: "/home/jenkins/agent"
      name: "workspace-volume"
      readOnly: false
  - command:
    - "cat"
    image: "image3"
    imagePullPolicy: "IfNotPresent"
    name: "maven"
    resources:
      limits:
        memory: "1Gi"
        cpu: "500m"
      requests:
        memory: "512Mi"
        cpu: "200m"
    tty: true
    volumeMounts:
    - mountPath: "/localrepo"
      name: "nfs-cache"
    - mountPath: "/root/.m2"
      name: "maven-settings"
    - mountPath: "/home/jenkins/agent"
      name: "workspace-volume"
      readOnly: false
  - command:
    - "cat"
    image: "image4"
    imagePullPolicy: "Always"
    name: "security"
    resources:
      limits:
        memory: "1Gi"
        cpu: "500m"
      requests:
        memory: "512Mi"
        cpu: "200m"
    tty: true
    volumeMounts:
    - mountPath: "/localrepo"
      name: "nfs-cache"
    - mountPath: "/root/.m2"
      name: "maven-settings"
    - mountPath: "/home/jenkins/agent"
      name: "workspace-volume"
      readOnly: false
  - env:
    - name: "JENKINS_SECRET"
      value: "********"
    - name: "JENKINS_TUNNEL"
      value: "jenkins-master-agent:50000"
    - name: "JENKINS_AGENT_NAME"
      value: "pod-name--c44pq"
    - name: "JENKINS_NAME"
      value: "pod-name--c44pq"
    - name: "JENKINS_AGENT_WORKDIR"
      value: "/home/jenkins/agent"
    - name: "JENKINS_URL"
      value: "http://jenkins-master:8080/jenkins/"
    image: "jenkins/inbound-agent:4.11-1-jdk11"
    name: "jnlp"
    resources:
      limits: {}
      requests:
        memory: "256Mi"
        cpu: "100m"
    volumeMounts:
    - mountPath: "/home/jenkins/agent"
      name: "workspace-volume"
      readOnly: false
  nodeSelector:
    agentpool: "npworkerspot"
  restartPolicy: "Never"
  tolerations:
  - effect: "NoSchedule"
    key: "jenkins"
    operator: "Equal"
    value: "worker"
  - effect: "NoSchedule"
    key: "kubernetes.azure.com/scalesetpriority"
    operator: "Equal"
    value: "spot"
  volumes:
  - name: "nfs-cache"
    persistentVolumeClaim:
      claimName: "nfs-cache"
      readOnly: false
  - emptyDir:
      medium: ""
    name: "workspace-volume"
  - configMap:
      items:
      - key: "settings.xml"
        path: "settings.xml"
      name: "settings-xml"
    name: "maven-settings"

Running on pod-name--c44pq in /home/jenkins/agent/workspace/xxxxxxxxxxxxx

With Jenkins version 2.319.1 and Kubernetes plugin version 1.31, the log shows:

Created Pod: kubernetes namespace/pod-name-wb39h
[Normal][namespace/pod-name-wb39h][Scheduled] Successfully assigned namespace/pod-name-wb39h to aks-npworker-XXXXXXXXXXX
[Normal][namespace/pod-name-wb39h][Pulled] Container image "image1" already present on machine
[Normal][namespace/pod-name-wb39h][Created] Created container builder
[Normal][namespace/pod-name-wb39h][Started] Started container builder
[Normal][namespace/pod-name-wb39h][Pulled] Container image "image2" already present on machine
[Normal][namespace/pod-name-wb39h][Created] Created container azure-cli
[Normal][namespace/pod-name-wb39h][Started] Started container azure-cli
[Normal][namespace/pod-name-wb39h][Pulled] Container image "image3" already present on machine
[Normal][namespace/pod-name-wb39h][Created] Created container maven
[Normal][namespace/pod-name-wb39h][Started] Started container maven
[Normal][namespace/pod-name-wb39h][Pulled] Container image "image4" already present on machine
[Normal][namespace/pod-name-wb39h][Created] Created container security
[Normal][namespace/pod-name-wb39h][Started] Started container security

With that version, pod creation is retried if necessary: if it fails, it is retried until the pod is created. With the other version, those pod event lines do not appear in the log and there is no retry, so pod creation fails and the pipeline fails with it.

Does anyone know the answer to the question posed?

Did you find a fix for this issue? We have been facing the same issue since upgrading Jenkins.

Is your agent image using the right version of remoting (agent.jar) for your version of Jenkins?

Hi,
We are currently on Jenkins version 2.361.1, and the JNLP inbound-agent image we use for Kubernetes is windowsserver-ltsc2019.

Remoting version: 3107.v665000b_51092

Hi all. After a support ticket with Microsoft, we found that our problem is in pod-to-pod communication and is caused by the use of Pod Identity. Our cluster scales up and down heavily, and Pod Identity introduces delays when that happens. Pod Identity is already deprecated, and Microsoft recommends migrating to Workload Identity.
In previous versions of Jenkins and the plugin the error was not visible, because the plugin internally retried until the pod was fully up. With the version indicated above, those retries are not performed, which makes the problem evident. I hope this information is useful to you.

Hi @juandyego1983, so this issue is resolved on your side? Did you also make any changes to your pod template, or was the fix entirely on the infrastructure side?
I have an AWS environment and am still looking for a workaround.

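Hi, on Azure Kubernetes Service we had to modify the pod YAML to use Workload Identity instead of Pod Identity. To use Azure Workload Identity you must add the following to the pod metadata (the first is a label; the Azure documentation lists the second as an annotation):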

azure.workload.identity/use=true
azure.workload.identity/inject-proxy-sidecar=true

Ref: Migrate your Azure Kubernetes Service (AKS) pod to use workload identity - Azure Kubernetes Service | Microsoft Learn
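
As a rough illustration of where those two keys go in the agent pod YAML, here is a minimal fragment to merge into the pod template. It assumes the first key is a pod label and the second a pod annotation, which is how the Azure Workload Identity documentation lists them as far as I know. The service account name is hypothetical and must be one that is already federated with your managed identity.

```yaml
apiVersion: "v1"
kind: "Pod"
metadata:
  labels:
    # Opts the pod in to workload identity; takes over the role that the
    # aadpodidbinding label played with Pod Identity.
    azure.workload.identity/use: "true"
  annotations:
    # Assumed to be an annotation; asks for the migration proxy sidecar.
    azure.workload.identity/inject-proxy-sidecar: "true"
spec:
  # Hypothetical name; the service account must be configured for workload identity.
  serviceAccountName: "workload-identity-sa"
```

The containers, volumes, and tolerations from the existing pod template stay as they are; only the metadata and the service account change in this sketch.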
