WebSocket Jenkins agent disconnects (java.util.concurrent.TimeoutException: Ping started at 1775776035301 hasn't completed by 1775776275302)

Hi,

We are seeing random disconnects on our Jenkins inbound agents using WebSocket.

Jenkins version: 2.541.3
Remoting version: 3352.v17a_fb_4b_2773f
Agent Java: OpenJDK 21.0.10
Controller Java: OpenJDK 21.0.9 Temurin

Our setup is:

agent → AWS Global Accelerator → ALB → Jenkins controller

What we are seeing:

  • Agents disconnect randomly, sometimes even while a job is running

  • On the agent side we see:

    Ping failed. Terminating the channel
    java.util.concurrent.TimeoutException: Ping started at ... hasn't completed by ...
        at hudson.remoting.PingThread.ping
        at hudson.remoting.PingThread.run
    
    INFO: Write side closed
    INFO: Read side closed
    INFO: Terminated
    INFO: Performing onReconnect operation
    
  • Immediately after that, the agent reconnects and comes back online

  • On the controller side we mostly only see:

    INFO	j.s.DefaultJnlpSlaveReceiver#channelClosed: Jetty (winstone)-194376 for .... terminated: java.nio.channels.ClosedChannelException
    

What we checked so far:

  • Agent machine looks healthy

  • ALB logs show no clear errors (for example HTTP 460 or 504) around the disconnect window

  • Connection is over WebSocket on HTTPS (443)
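
A rough sketch of how the ALB log check can be scripted, in case someone wants to repeat it. It assumes the standard space-separated ALB access log layout, where the first field is the connection type, the second is the timestamp, and the ninth is elb_status_code. Please verify those positions against the AWS documentation before relying on it. The function names and the 5-minute margin are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def parse_alb_time(field):
    # ALB timestamps are ISO 8601 with a trailing "Z"; normalise it so that
    # datetime.fromisoformat() accepts it on older Python versions too.
    return datetime.fromisoformat(field.replace("Z", "+00:00"))


def entries_near(lines, disconnect_utc, margin=timedelta(minutes=5),
                 statuses=("460", "504")):
    """Return (time, type, elb_status_code) for suspicious entries in the window.

    Assumed field positions: 0 = type, 1 = time, 8 = elb_status_code.
    """
    hits = []
    for line in lines:
        fields = line.split(" ", 9)
        if len(fields) < 10:
            continue  # not a complete access log entry
        try:
            when = parse_alb_time(fields[1])
        except ValueError:
            continue  # header or malformed line
        if abs(when - disconnect_utc) <= margin and fields[8] in statuses:
            hits.append((when, fields[0], fields[8]))
    return hits
```

Passing the disconnect time as a timezone-aware UTC datetime keeps the comparison unambiguous. An empty result only means that no 460/504 entries were logged in the window. It does not prove the load balancer was uninvolved.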

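So it does not look like a normal node crash. The two timestamps in the exception in the title are 240 seconds apart, so it looks like the connection silently stops passing traffic for several minutes, and the agent's PingThread then terminates the channel because the ping timed out.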
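
The timestamps in the exception message are epoch milliseconds, so they can be decoded to get the exact window to compare against load balancer and controller logs. A minimal sketch, assuming the message format shown in this post (the function name is just for illustration):

```python
import re
from datetime import datetime, timedelta, timezone

# Matches the TimeoutException message format quoted in this post.
PING_RE = re.compile(r"Ping started at (\d+) hasn't completed by (\d+)")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ping_window(message):
    """Return (start_utc, deadline_utc, elapsed_seconds) for a ping timeout message."""
    match = PING_RE.search(message)
    if match is None:
        raise ValueError("not a ping timeout message")
    start_ms, deadline_ms = int(match.group(1)), int(match.group(2))
    # Build the datetimes from integer milliseconds to avoid float rounding.
    start = EPOCH + timedelta(milliseconds=start_ms)
    deadline = EPOCH + timedelta(milliseconds=deadline_ms)
    return start, deadline, (deadline_ms - start_ms) / 1000


if __name__ == "__main__":
    msg = ("java.util.concurrent.TimeoutException: Ping started at 1775776035301 "
           "hasn't completed by 1775776275302")
    start, deadline, elapsed = ping_window(msg)
    print(start.isoformat(), deadline.isoformat(), elapsed)
```

For the message in the title, the elapsed time comes out at just over 240 seconds. That means the connection had already gone quiet at or before the start timestamp, not at the moment the agent logged the failure.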

Questions:

  1. Has anyone seen WebSocket agents disconnect like this without clear errors?

  2. Could this be related to Jenkins remoting or WebSocket handling?

  3. Are there recommended settings for ping interval / timeout or WebSocket keepalive?

Thanks!

We’re seeing the same problem in Jenkins with the same version, and it’s making builds unstable. Any suggestions would be appreciated.