Hi,
We are seeing random disconnects on our Jenkins inbound agents using WebSocket.
Jenkins version: 2.541.3
Remoting version: 3352.v17a_fb_4b_2773f
Agent Java: OpenJDK 21.0.10
Controller Java: OpenJDK 21.0.9 Temurin
Our setup is:
agent → AWS Global Accelerator → ALB → Jenkins controller
What we are seeing:
- Agents disconnect randomly, sometimes even while a job is running
- On the agent side we see:
  Ping failed. Terminating the channel
  java.util.concurrent.TimeoutException: Ping started at ... hasn't completed by ...
      at hudson.remoting.PingThread.ping
      at hudson.remoting.PingThread.run
  INFO: Write side closed
  INFO: Read side closed
  INFO: Terminated
  INFO: Performing onReconnect operation
- Immediately after that, the agent reconnects and comes back online
- On the controller side we mostly only see:
  INFO j.s.DefaultJnlpSlaveReceiver#channelClosed: Jetty (winstone)-194376 for .... terminated: java.nio.channels.ClosedChannelException
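The two elided values in the TimeoutException can show how long the ping was outstanding before the channel was terminated, which hints at which timeout fired. A minimal sketch, assuming those values are epoch milliseconds; the function name and the sample line are illustrative only and are not real log data:

```python
import re

# Assumption: the two values in the message are epoch milliseconds.
PING_TIMEOUT_RE = re.compile(
    r"Ping started at (\d+) hasn't completed by (\d+)"
)


def ping_outstanding_seconds(log_text):
    """Return, for each ping timeout found, how long the ping was outstanding."""
    return [
        (int(end) - int(start)) / 1000.0
        for start, end in PING_TIMEOUT_RE.findall(log_text)
    ]


# Made-up sample line for illustration, not real log data.
sample = (
    "java.util.concurrent.TimeoutException: "
    "Ping started at 1700000000000 hasn't completed by 1700000240000"
)
print(ping_outstanding_seconds(sample))  # [240.0]
```

Running this over the full agent log gives one duration per disconnect, which can then be compared against the configured ping timeout and any idle timeouts along the network path.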
What we checked so far:
- Agent machine looks healthy
- ALB logs do not show clear errors like 460/504 around the disconnect window
- Connection is over WebSocket on HTTPS (443)
So this does not look like an ordinary node crash. It looks as if the connection silently stops passing traffic for a while, and the remoting ping thread then terminates the channel when its ping times out.
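To make that hypothesis concrete, here is a minimal Python sketch of a ping watchdog: a ping is sent, and if no reply arrives before the deadline the channel is treated as dead even though the connection never reported an error. This is not the Jenkins remoting code; `ping_or_terminate` and the stalled-ping stand-in are hypothetical names used only for illustration.

```python
import concurrent.futures
import time


def ping_or_terminate(send_ping, timeout_sec):
    """Run send_ping() and report whether it completed within timeout_sec."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(send_ping)
    try:
        future.result(timeout=timeout_sec)
        return "alive"
    except concurrent.futures.TimeoutError:
        # No reply in time: the watchdog gives up on the channel.
        return "terminated"
    finally:
        # Do not block on a ping that may never return.
        pool.shutdown(wait=False)


# A healthy connection answers immediately.
print(ping_or_terminate(lambda: None, 0.1))  # alive

# A stalled connection: the socket looks open, but the reply is late.
print(ping_or_terminate(lambda: time.sleep(0.5), 0.1))  # terminated
```

The point of the sketch is that the side running the watchdog sees only a timeout, while the other side sees only a closed channel, which would match the two log excerpts above.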
Questions:
- Has anyone seen WebSocket agents disconnect like this without clear errors?
- Could this be related to Jenkins remoting or WebSocket handling?
- Are there recommended settings for ping interval / timeout or WebSocket keepalive?
Thanks!