This issue appeared after migrating Jenkins from version 2.319.1 to 2.361.1.
It does not occur on Jenkins 2.319.1; it only occurs on Jenkins 2.361.1.
The error occurs intermittently: the Kubernetes JNLP container is terminated with exit code 255.
With Kubernetes plugin version 1.31 on Jenkins 2.319.1, pod creation is retried, while with plugin version 3704.va_08f0206b_95e on Jenkins 2.361.1 it is not retried.
Is there any configuration parameter in the plugin that controls this behavior?
Also, is there a parameter in the plugin to configure how long Jenkins waits for the pod to come up?
Jenkins 2.319.1 with Kubernetes plugin version 1.31 has no timeout options configured in its JVM arguments.
Jenkins 2.361.1 with Kubernetes plugin version 3704.va_08f0206b_95e has these arguments configured: -Dkubernetes.websocket.timeout=60000 -Dorg.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator.websocketConnectionTimeout=60000
It would be handier next time if you could paste your anonymized logs in a text format.
It seems that there are some changes between the Kubernetes plugin versions and Jenkins versions that might be causing the issue. It’s possible that the Kubernetes plugin version 3704.va_08f0206b_95e has different default behavior or configuration compared to version 1.31 that is causing the pod creation to fail.
Regarding your question about the pod, there is a configuration parameter in the Kubernetes plugin called podRetention, set on the pod template (or at the cloud level). It controls whether a pod is retained after a build has finished (for example never, on failure, or always), which is mainly useful for inspecting pods after a failure.
In addition, you can try increasing the timeout value using the -Dkubernetes.websocket.timeout argument in the Jenkins JVM options. This argument increases the timeout value for WebSocket connections between Jenkins and the Kubernetes cluster.
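For reference, here is a minimal sketch of how such JVM options are typically passed to a Jenkins controller running in Kubernetes. The Deployment name, labels and image tag are placeholders, not taken from your setup:

```yaml
# Sketch only: a stripped-down Jenkins controller Deployment.
# The flags are passed through JAVA_OPTS, which the official
# jenkins/jenkins image appends to the JVM command line.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jenkins                  # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jenkins
  template:
    metadata:
      labels:
        app: jenkins
    spec:
      containers:
        - name: jenkins
          image: jenkins/jenkins:2.361.1-lts   # placeholder tag, use your own image
          env:
            - name: JAVA_OPTS
              value: >-
                -Dkubernetes.websocket.timeout=60000
                -Dorg.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator.websocketConnectionTimeout=60000
```

If the controller runs outside Kubernetes, the same flags go wherever your service definition sets the Jenkins JVM options.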
Thank you very much for the answer, and welcome to the community! Next time I'll paste the text instead of the image.
Would you know where that podRetention parameter is configured?
Thanks again.
With Jenkins version 2.319.1 and Kubernetes plugin version 1.31 we can see in the log:
Created Pod: kubernetes namespace/pod-name-wb39h
[Normal][namespace/pod-name-wb39h][Scheduled] Successfully assigned namespace/pod-name-wb39h to aks-npworker-XXXXXXXXXXX
[Normal][namespace/pod-name-wb39h][Pulled] Container image "image1" already present on machine
[Normal][namespace/pod-name-wb39h][Created] Created container builder
[Normal][namespace/pod-name-wb39h][Started] Started container builder
[Normal][namespace/pod-name-wb39h][Pulled] Container image "image2" already present on machine
[Normal][namespace/pod-name-wb39h][Created] Created container azure-cli
[Normal][namespace/pod-name-wb39h][Started] Started container azure-cli
[Normal][namespace/pod-name-wb39h][Pulled] Container image "image3" already present on machine
[Normal][namespace/pod-name-wb39h][Created] Created container maven
[Normal][namespace/pod-name-wb39h][Started] Started container maven
[Normal][namespace/pod-name-wb39h][Pulled] Container image "image4" already present on machine
[Normal][namespace/pod-name-wb39h][Created] Created container security
[Normal][namespace/pod-name-wb39h][Started] Started container security
With that version, pod creation is retried when necessary. With the other version this log does not appear and there is no retry, so pod creation and the pipeline fail. In other words, with the old version, if pod creation fails it is retried until the pod is created.
Hi all, after a support ticket with Microsoft, it turns out our problem is in pod-to-pod communication caused by the use of Pod Identity. Our cluster scales up and down heavily, which causes delays because of Pod Identity. That functionality is already deprecated and Microsoft recommends migrating to Workload Identity.
In previous versions of Jenkins and the plugin the error was not visible because the plugin internally retried until the pod was fully up. With the version indicated above those retries are not performed, which makes the problem evident. I hope this information is useful to you.
Hi @juandyego1983, so is this issue resolved on your side? Did you make any changes to your pod template as well, or was it entirely on the infrastructure side?
I have an AWS environment and am still looking for a workaround.
Hi, on Azure Kubernetes Service we had to modify the pod YAML to use Workload Identity instead of Pod Identity. To use Azure Workload Identity you have to add the workload identity labels, as in the sketch below.
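A minimal sketch of the standard Azure Workload Identity wiring; the client ID, names, namespace and image are placeholders, not the exact values from our setup:

```yaml
# Service account used by the Jenkins agent pods, annotated with the
# client ID of the Azure managed identity / Entra ID application.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: jenkins-agent            # placeholder name
  namespace: jenkins             # placeholder namespace
  annotations:
    azure.workload.identity/client-id: "<MANAGED_IDENTITY_CLIENT_ID>"
---
# Fragment of the agent pod: the label tells the AKS workload identity
# webhook to inject the federated token and environment variables.
apiVersion: v1
kind: Pod
metadata:
  labels:
    azure.workload.identity/use: "true"
spec:
  serviceAccountName: jenkins-agent
  containers:
    - name: jnlp
      image: jenkins/inbound-agent   # placeholder image
```

The azure.workload.identity/use label is what triggers the AKS webhook; the rest of the pod spec stays as it was.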