Agents failing randomly with ClosedChannelException

Hi,

We are running a Jenkins instance on-prem, connected to around 20 agents, which are used to run UFT tests via hpe-application-automation-tools-plugin.

Everything worked fine until one day builds suddenly started failing, even though, as far as we know, we made no changes to the infrastructure.
Builds now fail randomly with errors like the following (from the console output):

20:35:46 Running test: C:\UFT Test Results\workspace\XXX
20:37:42 ERROR: Failed running HpToolsLauncher Backing channel 'uft-vdi-ic-010' is disconnected.
20:37:42 Build step 'Execute OpenText tests from file system' changed build result to FAILURE
20:37:42 FATAL: Channel "hudson.remoting.Channel@808624:uft-vdi-ic-010": Remote call on uft-vdi-ic-010 failed. The channel is closing down or has closed down
20:37:42 java.nio.channels.ClosedChannelException
20:37:42 	at jenkins.agents.WebSocketAgents$Session.closed(WebSocketAgents.java:157)
20:37:42 	at jenkins.websocket.WebSockets$1.onWebSocketClose(WebSockets.java:88)
20:37:42 	at jenkins.websocket.WebSockets$1.onWebSocketError(WebSockets.java:94)
20:37:42 	at jenkins.websocket.Jetty10Provider$2.onWebSocketError(Jetty10Provider.java:174)
20:37:42 	at org.eclipse.jetty.websocket.common.JettyWebSocketFrameHandler.onError(JettyWebSocketFrameHandler.java:261)
20:37:42 	at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.lambda$closeConnection$2(WebSocketCoreSession.java:284)
20:37:42 	at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1469)
20:37:42 	at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1488)
20:37:42 	at org.eclipse.jetty.websocket.core.server.internal.AbstractHandshaker$1.handle(AbstractHandshaker.java:212)
20:37:42 	at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.closeConnection(WebSocketCoreSession.java:284)
20:37:42 	at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.onEof(WebSocketCoreSession.java:254)
20:37:42 	at org.eclipse.jetty.websocket.core.internal.WebSocketConnection.fillAndParse(WebSocketConnection.java:491)
20:37:42 	at org.eclipse.jetty.websocket.core.internal.WebSocketConnection.onFillable(WebSocketConnection.java:349)
20:37:42 	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:314)
20:37:42 	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100)
20:37:42 	at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
20:37:42 	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:421)
20:37:42 	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:390)
20:37:42 	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:277)
20:37:42 	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.run(AdaptiveExecutionStrategy.java:199)
20:37:42 	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:411)
20:37:42 	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:969)
20:37:42 	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1194)
20:37:42 	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1149)
20:37:42 	at java.base/java.lang.Thread.run(Thread.java:829)
20:37:42 Caused: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@808624:uft-vdi-ic-010": Remote call on uft-vdi-ic-010 failed. The channel is closing down or has closed down
20:37:42 	at hudson.remoting.Channel.call(Channel.java:996)
20:37:42 	at hudson.Launcher$RemoteLauncher.kill(Launcher.java:1147)
20:37:42 	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:538)
20:37:42 	at hudson.model.Run.execute(Run.java:1895)
20:37:42 	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:44)
20:37:42 	at hudson.model.ResourceController.execute(ResourceController.java:101)
20:37:42 	at hudson.model.Executor.run(Executor.java:442)
20:37:42 ERROR: Step ‘Publish OpenText tests result’ failed: no workspace for UFT-test-runner-4001-omq91 #6186
20:37:42 Finished: FAILURE

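We enabled debug logging on the agents, and this is what it shows (the agents run with an Italian locale, so the log levels are localized: GRAVE = SEVERE, INFORMAZIONI = INFO, BUONO = FINE, OTTIMALE = FINEST):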

GRAVE: Connection error has occurred
java.io.IOException: Operazione di I/O terminata a causa dell'uscita dal thread oppure della richiesta di un'applicazione.

	at java.base/sun.nio.ch.Iocp.translateErrorToIOException(Unknown Source)
	at java.base/sun.nio.ch.Iocp$EventHandlerTask.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)

dic 21, 2023 9:57:34 AM hudson.remoting.jnlp.Main$CuiListener status
INFORMAZIONI: Read side closed
dic 21, 2023 9:57:34 AM io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusRemoteEndpoint close
BUONO: Close public void close(CloseReason cr): CloseReason[1000]
dic 21, 2023 9:57:34 AM hudson.remoting.jnlp.Main$CuiListener status
INFORMAZIONI: Terminated
dic 21, 2023 9:57:34 AM hudson.slaves.ChannelPinger$2 onClosed
BUONO: Terminating ping thread for uft-vdi-ic-006
dic 21, 2023 9:57:34 AM org.jvnet.winp.WinProcess sendCtrlC
BUONO: Attempting to send CTRL+C to pid=4860 ("C:\UFT Test Results\workspace\UFT-test-runner-2003-g1y94\HpToolsLauncher.exe" -paramfile props21122023090014107.txt)
dic 21, 2023 9:57:34 AM hudson.Launcher$RemoteLaunchCallable$1 join
INFORMAZIONI: Failed to synchronize IO streams on the channel hudson.remoting.Channel@11d6656e:uft-vdi-ic-006
hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@11d6656e:uft-vdi-ic-006": Remote call on uft-vdi-ic-006 failed. The channel is closing down or has closed down
	at hudson.remoting.Channel.call(Channel.java:996)
	at hudson.remoting.Channel.syncIO(Channel.java:1735)
	at hudson.Launcher$RemoteLaunchCallable$1.join(Launcher.java:1408)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
	at java.base/java.lang.reflect.Method.invoke(Unknown Source)
	at hudson.remoting.RemoteInvocationHandler$RPCRequest.perform(RemoteInvocationHandler.java:924)
	at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:902)
	at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:853)
	at hudson.remoting.UserRequest.perform(UserRequest.java:211)
	at hudson.remoting.UserRequest.perform(UserRequest.java:54)
	at hudson.remoting.Request$2.run(Request.java:377)
	at hudson.remoting.InterceptingExecutorService.lambda$wrap$0(InterceptingExecutorService.java:78)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:125)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@11d6656e:uft-vdi-ic-006": channel is already closed
	at hudson.remoting.Engine$1AgentEndpoint.lambda$onClose$1(Engine.java:637)
	... 6 more

dic 21, 2023 9:57:34 AM hudson.remoting.Request$2 run
INFORMAZIONI: Failed to send back a reply to the request UserRequest:UserRPCRequest:hudson.Launcher$RemoteProcess.join[](16): hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@11d6656e:uft-vdi-ic-006": channel is already closed
dic 21, 2023 9:57:44 AM sun.net.www.protocol.http.HttpURLConnection plainConnect0
OTTIMALE: ProxySelector Request for http://jenkins-uft.servizi.gr-u.it:9090/login
dic 21, 2023 9:57:44 AM sun.net.www.protocol.http.HttpURLConnection plainConnect0
OTTIMALE: Proxy used: DIRECT
dic 21, 2023 9:57:44 AM sun.net.www.protocol.http.HttpURLConnection writeRequests
BUONO: sun.net.www.MessageHeader@9543ee35 pairs: {GET /login HTTP/1.1: null}{User-Agent: Java/11.0.21}{Host: jenkins-uft.servizi.gr-u.it:9090}{Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2}{Connection: keep-alive}
dic 21, 2023 9:57:44 AM sun.net.www.http.HttpClient logFinest

We can’t figure out the root cause.
The message

IOException: Operazione di I/O terminata a causa dell’uscita dal thread oppure della richiesta di un’applicazione

(in English: “The I/O operation has been aborted because of either a thread exit or an application request”) is not really helpful: it’s not clear what is causing the connection to drop. Was it a connection timeout? Was the connection reset? How can we tell which one it was?
I have the feeling that there is a bug in the remoting library.
Talking about timeouts: is it possible to configure the connection timeout in the remoting library on the agents? I think not.
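
One way to narrow down the timeout-versus-reset question is to measure how long each build ran before the channel dropped: if the gaps cluster around a fixed value, an idle timeout somewhere on the network path becomes more likely than a random reset. The following is only a rough sketch in Python. The two regular expressions are modelled on the console lines quoted above and are an assumption about their wording in other builds; the function name `seconds_until_disconnect` is made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Patterns modelled on the console lines quoted in this thread (assumption:
# other plugin versions may word these lines differently).
START = re.compile(r"^(\d{2}:\d{2}:\d{2}) Running test:")
DROP = re.compile(r"^(\d{2}:\d{2}:\d{2}) .*Backing channel '([^']+)' is disconnected")


def seconds_until_disconnect(console_lines):
    """Return (agent, seconds) pairs: time from the last 'Running test'
    line to the following 'Backing channel ... is disconnected' line."""
    results = []
    started = None
    for line in console_lines:
        m = START.match(line)
        if m:
            started = datetime.strptime(m.group(1), "%H:%M:%S")
            continue
        m = DROP.match(line)
        if m and started is not None:
            dropped = datetime.strptime(m.group(1), "%H:%M:%S")
            if dropped < started:
                # The timestamps carry no date, so treat this as a build
                # that crossed midnight.
                dropped += timedelta(days=1)
            elapsed = int((dropped - started).total_seconds())
            results.append((m.group(2), elapsed))
            started = None
    return results


sample = [
    "20:35:46 Running test: C:\\UFT Test Results\\workspace\\XXX",
    "20:37:42 ERROR: Failed running HpToolsLauncher Backing channel "
    "'uft-vdi-ic-010' is disconnected.",
]
print(seconds_until_disconnect(sample))  # [('uft-vdi-ic-010', 116)]
```

Running this over the console output of many failed builds and comparing the elapsed values per agent would show whether the drops follow a pattern; it does not by itself identify the component that closes the connection.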

Should we try switching from WebSocket to a TCP port?

We tried to upgrade:

  • Jenkins (now it’s the latest LTS),
  • the plugins,
  • the JDK on the server,
  • the JDK on the agents,
  • the remoting version on the agents (agent.jar)

to no avail.

We also thought it could be due to server overload, so we gave Jenkins more memory (it now has 16 GB max), but we didn’t see any benefit.

If anyone could point us in the right direction, we would be really grateful.

Thank you very much in advance.

Jenkins setup:

Jenkins: 2.426.2
OS: Windows Server 2016 Standard - Version 1607 - Build 14393.6452
Java: 11.0.21 - Eclipse Adoptium (OpenJDK 64-Bit Server VM)
Running Jenkins directly or in a container like Tomcat? Directly
Is Jenkins accessed through a reverse proxy? No
How you installed Jenkins: Windows installer
How you’re launching any involved agents: WinSW

ace-editor:1.1
antisamy-markup-formatter:162.v0e6ec0fcfcf6
apache-httpcomponents-client-4-api:4.5.14-208.v438351942757
authentication-tokens:1.53.v1c90fd9191a_b_
bootstrap4-api:4.6.0-6
bootstrap5-api:5.3.2-3
bouncycastle-api:2.30.1.77-225.v26ea_c9455fd9
branch-api:2.1135.v8de8e7899051
build-timeout:1.31
caffeine-api:3.1.8-133.v17b_1ff2e0599
checks-api:2.0.2
cloudbees-folder:6.858.v898218f3609d
command-launcher:107.v773860566e2e
commons-lang3-api:3.13.0-62.v7d18e55f51e2
commons-text-api:1.11.0-95.v22a_d30ee5d36
configuration-as-code:1763.vb_fe9c1b_83f7b
credentials:1311.vcf0a_900b_37c2
credentials-binding:642.v737c34dea_6c2
cucumber-reports:5.8.0
data-tables-api:1.13.8-2
display-url-api:2.200.vb_9327d658781
docker-commons:439.va_3cb_0a_6a_fb_29
durable-task:523.va_a_22cf15d5e0
echarts-api:5.4.3-2
email-ext:2.102
font-awesome-api:6.5.1-1
git:4.12.1
git-client:3.12.4
git-server:99.va_0826a_b_cdfa_d
github-api:1.318-461.v7a_c09c9fa_d63
gradle:2.3
handlebars:3.0.8
hp-application-automation-tools-plugin:23.4
htmlpublisher:1.32
instance-identity:185.v303dc7c645f9
ionicons-api:56.v1b_1c8c49374e
jackson2-api:2.15.3-372.v309620682326
jakarta-activation-api:2.0.1-3
jakarta-mail-api:2.0.1-3
javadoc:243.vb_b_503b_b_45537
javax-activation-api:1.2.0-6
javax-mail-api:1.6.2-9
jaxb:2.3.9-1
jdk-tool:73.vddf737284550
jjwt-api:0.11.5-77.v646c772fddb_0
jnr-posix-api:3.1.18-1
jquery:1.12.4-1
jquery-detached:1.2.1
jquery3-api:3.7.1-1
jsch:0.2.8-65.v052c39de79b_2
json-api:20231013-3.v20f3c247f2fe
json-path-api:2.8.0-5.v07cb_a_1ca_738c
junit:1252.vfc2e5efa_294f
ldap:711.vb_d1a_491714dc
locale:314.v22ce953dfe9e
lockable-resources:1218.va_3dd45e2b_fa_7
mailer:463.vedf8358e006b_
matrix-auth:2.6.6
matrix-project:822.v01b_8c85d16d2
maven-plugin:3.23
mina-sshd-api-common:2.11.0-86.v836f585d47fa_
mina-sshd-api-core:2.11.0-86.v836f585d47fa_
momentjs:1.1.1
okhttp-api:4.11.0-157.v6852a_a_fa_ec11
pam-auth:1.10
pipeline-build-step:540.vb_e8849e1a_b_d8
pipeline-github-lib:38.v445716ea_edda_
pipeline-graph-analysis:202.va_d268e64deb_3
pipeline-groovy-lib:689.veec561a_dee13
pipeline-input-step:477.v339683a_8d55e
pipeline-milestone-step:111.v449306f708b_7
pipeline-model-api:2.2151.ve32c9d209a_3f
pipeline-model-extensions:2.2151.ve32c9d209a_3f
pipeline-rest-api:2.34
pipeline-stage-step:305.ve96d0205c1c6
pipeline-stage-tags-metadata:2.2151.ve32c9d209a_3f
pipeline-stage-view:2.34
plain-credentials:143.v1b_df8b_d3b_e48
plugin-util-api:3.6.0
popper-api:1.16.1-3
popper2-api:2.11.6-4
prism-api:1.29.0-8
resource-disposer:0.23
scm-api:683.vb_16722fb_b_80b_
script-security:1294.v99333c047434
snakeyaml-api:2.2-111.vc6598e30cc65
ssh-credentials:305.v8f4381501156
sshd:3.312.v1c601b_c83b_0e
structs:325.vcb_307d2a_2782
template-project:1.5.2
thinBackup:1.18
timestamper:1.26
token-macro:400.v35420b_922dcb_
trilead-api:1.67.vc3938a_35172f
variant:60.v7290fc0eb_b_cd
workflow-api:1283.v99c10937efcb_
workflow-basic-steps:2.23
workflow-cps:3826.v3b_5707fe44da_
workflow-cps-global-lib:609.vd95673f149b_b
workflow-durable-task-step:2.38
workflow-job:1385.vb_58b_86ea_fff1
workflow-multibranch:756.v891d88f2cd46
workflow-scm-step:415.v434365564324
workflow-step-api:639.v6eca_cd8c04a_a_
workflow-support:865.v43e78cc44e0d
ws-cleanup:0.45

Nodes setup:

OS: Windows 10 Enterprise - Version 21H2 - Build 19044.3570 - 64 bit
Java: 11.0.21 - Eclipse Adoptium (OpenJDK 64-Bit Server VM)
Remoting version: 3160.vd76b_9ddd10cc
Launcher: JNLPLauncher
Communication Protocol: WebSocket
This is a Windows agent

FYI, I think I solved it by switching from WebSocket to plain old JNLP4-connect.

That is:

  1. In the Security settings, activate ‘TCP port for inbound agents’ (e.g. select ‘Random port’)
  2. On the configuration page of each agent, untick the option ‘Use WebSocket’
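
After making the two changes above, it can be worth checking from an agent machine that the TCP port is actually reachable, since a random port may be blocked by a firewall. Below is a small Python sketch of such a check. It assumes that the controller advertises the inbound agent port in an `X-Jenkins-JNLP-Port` response header on the `/tcpSlaveAgentListener/` path; please verify that against your Jenkins version. The function names are hypothetical helpers, not part of Jenkins or remoting.

```python
import socket
import urllib.error
import urllib.request


def advertised_agent_port(headers):
    """Extract the inbound agent TCP port from a mapping of response headers.

    Assumption: the port is advertised as 'X-Jenkins-JNLP-Port'.
    Returns None if the header is missing or not a number."""
    for name, value in headers.items():
        if name.lower() == "x-jenkins-jnlp-port":
            try:
                return int(value)
            except ValueError:
                return None
    return None


def find_agent_port(controller_url, timeout=10.0):
    """Ask the controller which TCP port inbound agents should use.

    Returns None if the request fails (for example when the port is disabled)."""
    url = controller_url.rstrip("/") + "/tcpSlaveAgentListener/"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return advertised_agent_port(dict(resp.headers.items()))
    except (urllib.error.URLError, OSError):
        return None


def port_reachable(host, port, timeout=5.0):
    """Return True if a plain TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, on an agent one could call `find_agent_port("http://jenkins-uft.servizi.gr-u.it:9090")` (the controller URL that appears in the agent log above) and pass the result to `port_reachable` together with the controller host name. A `False` result would point to a firewall or routing problem rather than to the agent configuration.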

I’ll keep you updated.

Thanks a lot for your feedback.

I can confirm that switching to JNLP4-connect solved the issue.
In fact, we reverted a couple of agents to WebSocket, and two builds that ran on those agents failed with the ClosedChannelException again.

I believe there is a bug in the remoting library. I should open a bug report.