Jenkins agent disconnects randomly while running a job on it

Hello, friends of the community, could you please help with this issue?

Issue:
When we run a Jenkins job, one of the Windows 10 instances fails every time. The logs show that the connection between the control node and the work node is lost at a random point during the job: sometimes after less than 1 hour, sometimes after several hours.

Jenkins Cluster Information:
We have Jenkins instances running on different servers:
Jenkins control node: Jenkins in a Docker container hosted on Ubuntu Server 20.04 LTS

Jenkins agent nodes: 3 × Jenkins installed on Windows 10

Errors collected from the Jenkins agent node on Windows 10:

Inbound agent connected from 10.67.12.204
Remoting version: 3046.v38db_38a_b_7a_86
Launcher: JNLPLauncher
Communication Protocol: WebSocket
This is a Windows agent
Agent successfully connected and online
ERROR: Connection terminated
java.nio.channels.ClosedChannelException
        at jenkins.agents.WebSocketAgents$Session.closed(WebSocketAgents.java:153)
        at jenkins.websocket.WebSockets$1.onWebSocketClose(WebSockets.java:80)
        at jenkins.websocket.Jetty10Provider$2.onWebSocketClose(Jetty10Provider.java:146)
        at org.eclipse.jetty.websocket.common.JettyWebSocketFrameHandler.notifyOnClose(JettyWebSocketFrameHandler.java:308)
        at org.eclipse.jetty.websocket.common.JettyWebSocketFrameHandler.onClosed(JettyWebSocketFrameHandler.java:292)
        at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.lambda$closeConnection$0(WebSocketCoreSession.java:272)
        at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1445)
        at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1482)
        at org.eclipse.jetty.websocket.core.server.internal.AbstractHandshaker$1.handle(AbstractHandshaker.java:212)
        at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.lambda$closeConnection$1(WebSocketCoreSession.java:272)
        at org.eclipse.jetty.util.Callback$4.completed(Callback.java:184)
        at org.eclipse.jetty.util.Callback$Completing.succeeded(Callback.java:344)
        at org.eclipse.jetty.websocket.common.JettyWebSocketFrameHandler.onError(JettyWebSocketFrameHandler.java:268)
        at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.lambda$closeConnection$2(WebSocketCoreSession.java:284)
        at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1463)
        at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1482)
        at org.eclipse.jetty.websocket.core.server.internal.AbstractHandshaker$1.handle(AbstractHandshaker.java:212)
        at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.closeConnection(WebSocketCoreSession.java:284)
        at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.lambda$sendFrame$7(WebSocketCoreSession.java:519)
        at org.eclipse.jetty.util.Callback$3.succeeded(Callback.java:155)
        at org.eclipse.jetty.websocket.core.internal.TransformingFlusher.notifyCallbackSuccess(TransformingFlusher.java:197)
        at org.eclipse.jetty.websocket.core.internal.TransformingFlusher$Flusher.process(TransformingFlusher.java:154)
        at org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:232)
        at org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:214)
        at org.eclipse.jetty.websocket.core.internal.TransformingFlusher.sendFrame(TransformingFlusher.java:77)
        at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.sendFrame(WebSocketCoreSession.java:526)
        at org.eclipse.jetty.websocket.common.JettyWebSocketRemoteEndpoint.sendBlocking(JettyWebSocketRemoteEndpoint.java:223)
        at org.eclipse.jetty.websocket.common.JettyWebSocketRemoteEndpoint.sendPing(JettyWebSocketRemoteEndpoint.java:169)
        at jenkins.websocket.Jetty10Provider$1.sendPing(Jetty10Provider.java:92)
        at jenkins.websocket.WebSocketSession.lambda$startPings$0(WebSocketSession.java:71)
        at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:69)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)

Jenkins Environment Information:

Jenkins: 2.365
OS: Linux - 5.4.0-165-generic
Java: 11.0.16 - Eclipse Adoptium (OpenJDK 64-Bit Server VM)
---
Office-365-Connector:4.15.0
ace-editor:1.1
active-directory:2.25.1
ant:1.11
antisamy-markup-formatter:2.1
apache-httpcomponents-client-4-api:4.5.13-1.0
authentication-tokens:1.4
authorize-project:1.4.0
bootstrap4-api:4.6.0-3
bootstrap5-api:5.1.3-6
bouncycastle-api:2.26
branch-api:2.1044.v2c007e51b_87f
build-timeout:1.20
caffeine-api:2.9.2-29.v717aac953ff3
checks-api:1.7.4
cloudbees-folder:6.722.v8165b_a_cf25e9
command-launcher:1.6
credentials:1139.veb_9579fca_33b_
credentials-binding:1.27.1
display-url-api:2.3.6
docker-commons:1.19
docker-java-api:3.1.5.2
docker-plugin:1.2.3
durable-task:496.va67c6f9eefa7
echarts-api:5.3.2-1
email-ext:2.83
font-awesome-api:6.0.0-1
git:4.11.5
git-client:3.11.2
git-server:1.10
github:1.35.0
github-api:1.123
github-branch-source:2.11.2
gradle:1.37.1
handlebars:3.0.8
instance-identity:116.vf8f487400980
jackson2-api:2.13.3-285.vc03c0256d517
jakarta-activation-api:2.0.1-1
jakarta-mail-api:2.0.1-1
javax-activation-api:1.2.0-3
javax-mail-api:1.6.2-5
jaxb:2.3.6-1
jdk-tool:1.5
jjwt-api:0.11.2-9.c8b45b8bb173
jquery3-api:3.6.0-2
jsch:0.1.55.2
junit:1119.1121.vc43d0fc45561
ldap:2.7
lockable-resources:2.11
mailer:408.vd726a_1130320
mapdb-api:1.0.9.0
matrix-auth:3.1.2
matrix-project:1.20
mina-sshd-api-common:2.8.0-36.v8e25ce90d4b_1
mina-sshd-api-core:2.8.0-36.v8e25ce90d4b_1
momentjs:1.1.1
okhttp-api:4.9.3-108.v0feda04578cf
p4:1.11.5
pam-auth:1.6
pipeline-build-step:2.18
pipeline-github-lib:1.0
pipeline-graph-analysis:1.11
pipeline-groovy-lib:591.v3a_7f422b_d058
pipeline-input-step:449.v77f0e8b_845c4
pipeline-milestone-step:101.vd572fef9d926
pipeline-model-api:2.2086.v12b_420f036e5
pipeline-model-definition:2.2086.v12b_420f036e5
pipeline-model-extensions:2.2086.v12b_420f036e5
pipeline-rest-api:2.19
pipeline-stage-step:293.v200037eefcd5
pipeline-stage-tags-metadata:2.2086.v12b_420f036e5
pipeline-stage-view:2.19
plain-credentials:139.ved2b_9cf7587b
plugin-util-api:2.16.0
popper-api:1.16.1-2
popper2-api:2.11.2-1
powershell:1.5
resource-disposer:0.16
scm-api:621.vda_a_b_055e58f7
script-security:1175.v4b_d517d6db_f0
snakeyaml-api:1.29.1
ssh-agent:1.24.1
ssh-credentials:295.vced876c18eb_4
ssh-slaves:1.33.0
sshd:3.236.ved5e1b_cb_50b_2
structs:318.va_f3ccb_729b_71
subversion:2.15.5
timestamper:1.13
token-macro:293.v283932a_0a_b_49
trilead-api:1.67.vc3938a_35172f
variant:1.4
workflow-aggregator:581.v0c46fa_697ffd
workflow-api:1153.vb_912c0e47fb_a_
workflow-basic-steps:948.v2c72a_091b_b_68
workflow-cps:2706.v71dd22b_c5a_a_2
workflow-cps-global-lib:588.v576c103a_ff86
workflow-cps-global-lib-http:2.8.0
workflow-durable-task-step:1139.v252a_e12e8463
workflow-job:1182.v60a_e6279b_579
workflow-multibranch:716.vc692a_e52371b_
workflow-scm-step:2.13
workflow-step-api:625.vd896b_f445a_f8
workflow-support:818.v4eb_969241b_c7
ws-cleanup:0.39

Node Information
-----------------
os.name	Windows 10
os.version	10.0
java.version	11.0.15.1
java.version.date	2022-04-22
java.runtime.version	11.0.15.1+2-LTS-10

Jenkins Access and Setup
------------------
Jenkins is being run in a docker container.
Jenkins is accessed through the GUI, directly.

Hello @eman9527 and welcome to this community. :wave:

Is there any kind of network appliance between your Jenkins controller and your Windows agent? A proxy, a firewall, anything?

Hi @poddingue . Thank you for your reply.

Our Jenkins control node (Linux-hosted) and the Jenkins work node (Windows-hosted) are set up in 2 different VLANs, and a company Palo Alto firewall protects the company network. There are no reverse proxies between the 2 servers; the Jenkins work node connects to the Jenkins control node directly.


@eman9527 - I’m curious whether this behavior has continued for you. We are running Jenkins 2.2.11 and are seeing this behavior between our k8s-hosted controller and the Windows agents, which run as VMs. It is intermittent, though, and it is proving difficult to figure out.

Make sure you have the same JDK on each machine. Disconnecting is common when the Java versions differ.
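
As a rough way to check this, here is a minimal Python sketch that compares the two Java versions posted earlier in this thread (11.0.16 on the controller, 11.0.15.1 on the Windows node). The helper names `parse_java_version` and `describe_mismatch` are made up for illustration and are not part of Jenkins; the sketch also assumes the modern version format (11.x, 17.x), not the old 1.8.0_xxx style.

```python
import re


def parse_java_version(text):
    """Return the leading numeric components of a Java version string as ints."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def describe_mismatch(controller, agent):
    """Compare two version strings and say how far apart they are."""
    c = parse_java_version(controller)
    a = parse_java_version(agent)
    if c == a:
        return "identical"
    if c[0] != a[0]:
        return "different major version"
    return "same major version, different update"


if __name__ == "__main__":
    # Values copied from the environment details posted in this thread.
    print(describe_mismatch("11.0.16", "11.0.15.1"))
```

For the versions in this thread it reports the same major version with a different update level, so whether that small gap matters here is still an open question.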

On Linux, the easiest option is to download a JDK and unpack it to /var/jenkins/jdk/ (copy the archive to /var/jenkins, untar it, and rename jdk_something to jdk). This is the easiest and maybe the best option, as it keeps the JDK on a Jenkins agent completely separate from the system one.
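
The steps above could be scripted roughly like this. It is only a sketch: `install_jdk` is a hypothetical helper name, the archive path is whatever file you downloaded, and it assumes the archive is a .tar.gz with a single top-level folder. Only unpack archives you trust.

```python
import tarfile
from pathlib import Path, PurePosixPath


def install_jdk(archive, base_dir="/var/jenkins"):
    """Unpack a JDK .tar.gz under base_dir and rename its top-level folder to 'jdk'."""
    base = Path(base_dir)
    base.mkdir(parents=True, exist_ok=True)
    target = base / "jdk"
    if target.exists():
        raise FileExistsError(f"{target} already exists; remove it first")

    with tarfile.open(archive, "r:gz") as tar:
        # Collect the first path component of every entry in the archive.
        top_levels = set()
        for member in tar.getmembers():
            parts = PurePosixPath(member.name).parts
            if parts:
                top_levels.add(parts[0])
        # Assumption: the archive holds exactly one top-level directory.
        if len(top_levels) != 1:
            raise ValueError(f"expected one top-level directory, found {sorted(top_levels)}")
        tar.extractall(base)

    # Rename the extracted folder (whatever it is called) to plain 'jdk'.
    extracted = base / top_levels.pop()
    extracted.rename(target)
    return target
```

Calling `install_jdk("<your-download>.tar.gz")` should leave the JDK in /var/jenkins/jdk, which you can then point the agent's Java path at.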

You can download JDK from here:

Hi,

Did anyone ever get to the bottom of this? We are seeing the same issue and are struggling to resolve it.

Our Controllers run under AKS, and our build agents are on a different vnet. The Windows build agents are connected via websocket. All Controllers are running Jenkins 2.462.1.

It appears to happen mostly for long-running jobs, e.g. > 2 hours, but on occasion it can also occur for shorter jobs.

We have tried increasing our pod resources, the load balancer timeout, etc., but nothing seems to make any difference. Any help would be greatly appreciated, as we feel we have hit a dead end.

cc @eman9527 @shhintbw

Kind Regards,
Lee