Jenkins agents randomly disconnecting from controller

Hello all,
I’m hoping someone here can point me toward a fix for random build-agent disconnects in my Jenkins environment. For the past few months, agents have been intermittently disconnecting from the Jenkins controller. I have tried various fixes, such as tuning the SSH config on the controller and agents and making sure the controller and the agents run the same Java version. I have also rebuilt the controller and both agents, but I’m still seeing random disconnects when building Docker images. The environment configuration is listed below, along with some of the logs I see when these agents disconnect.

I was not able to attach logs with the Support plugin, but I do have the information if other logs need to be provided. I’d really appreciate any direction. :)

Environment and configuration
Jenkins Core Version: 2.319.2
Jenkins Controller OS: Amazon Linux 2
Jenkins Agents: Ubuntu 20.04
Java Version:
Controller - openjdk version “1.8.0_312”
Build Agent (ARM Instance) - openjdk version “1.8.0_312”
Build Agent (x86_64 instance) -
OpenSSH Version:
Controller - OpenSSH_7.4p1
Build Agents - OpenSSH_8.2p1

Agent Configs:

<slave>
  <name>ARM</name>
  <description>ARM Build Agent - Ubuntu 20.04</description>
  <remoteFS>/home/ubuntu</remoteFS>
  <numExecutors>6</numExecutors>
  <mode>NORMAL</mode>
  <retentionStrategy class="hudson.slaves.RetentionStrategy$Always"/>
  <launcher class="hudson.plugins.sshslaves.SSHLauncher" plugin="ssh-slaves@1.33.0">
    <host>REDACTED</host>
    <port>22</port>
    <credentialsId>build-agent-ed25519</credentialsId>
    <launchTimeoutSeconds>60</launchTimeoutSeconds>
    <maxNumRetries>10</maxNumRetries>
    <retryWaitTime>15</retryWaitTime>
    <sshHostKeyVerificationStrategy class="hudson.plugins.sshslaves.verifiers.KnownHostsFileKeyVerificationStrategy"/>
    <tcpNoDelay>true</tcpNoDelay>
  </launcher>
  <label>ARM</label>
  <nodeProperties>
    <hudson.slaves.EnvironmentVariablesNodeProperty>
      <envVars serialization="custom">
        <unserializable-parents/>
        <tree-map>
          <default>
            <comparator class="java.lang.String$CaseInsensitiveComparator"/>
          </default>
          <int>1</int>
          <string>Java</string>
          <string>/usr/lib/jvm/java-8-openjdk-arm64</string>
        </tree-map>
      </envVars>
    </hudson.slaves.EnvironmentVariablesNodeProperty>
    <hudson.tools.ToolLocationNodeProperty>
      <locations>
        <hudson.tools.ToolLocationNodeProperty_-ToolLocation>
          <type>hudson.plugins.git.GitTool$DescriptorImpl</type>
          <name>git</name>
          <home>/usr/bin/git</home>
        </hudson.tools.ToolLocationNodeProperty_-ToolLocation>
      </locations>
    </hudson.tools.ToolLocationNodeProperty>
  </nodeProperties>
</slave>

Second Agent

<slave>
  <name>x86</name>
  <description>x86 Build Server</description>
  <remoteFS>/home/ubuntu</remoteFS>
  <numExecutors>4</numExecutors>
  <mode>NORMAL</mode>
  <retentionStrategy class="hudson.slaves.RetentionStrategy$Always"/>
  <launcher class="hudson.plugins.sshslaves.SSHLauncher" plugin="ssh-slaves@1.33.0">
    <host>REDACTED</host>
    <port>22</port>
    <credentialsId>build-agent-ed25519</credentialsId>
    <launchTimeoutSeconds>60</launchTimeoutSeconds>
    <maxNumRetries>10</maxNumRetries>
    <retryWaitTime>15</retryWaitTime>
    <sshHostKeyVerificationStrategy class="hudson.plugins.sshslaves.verifiers.KnownHostsFileKeyVerificationStrategy"/>
    <tcpNoDelay>true</tcpNoDelay>
  </launcher>
  <label>x86</label>
  <nodeProperties>
    <hudson.slaves.EnvironmentVariablesNodeProperty>
      <envVars serialization="custom">
        <unserializable-parents/>
        <tree-map>
          <default>
            <comparator class="java.lang.String$CaseInsensitiveComparator"/>
          </default>
          <int>1</int>
          <string>Java</string>
          <string>/usr/lib/jvm/java-1.8.0-openjdk-amd64</string>
        </tree-map>
      </envVars>
    </hudson.slaves.EnvironmentVariablesNodeProperty>
    <hudson.tools.ToolLocationNodeProperty>
      <locations>
        <hudson.tools.ToolLocationNodeProperty_-ToolLocation>
          <type>hudson.plugins.git.GitTool$DescriptorImpl</type>
          <name>git</name>
          <home>/usr/bin/git</home>
        </hudson.tools.ToolLocationNodeProperty_-ToolLocation>
      </locations>
    </hudson.tools.ToolLocationNodeProperty>
  </nodeProperties>
</slave>

Agent Log when experiencing connectivity problems

Apr 05, 2022 3:35:03 PM hudson.remoting.SynchronousCommandTransport$ReaderThread run
INFO: I/O error in channel channel
java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:75)
Caused by: java.io.EOFException
	at java.base/java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2905)
	at java.base/java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3400)
	at java.base/java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:936)
	at java.base/java.io.ObjectInputStream.<init>(ObjectInputStream.java:379)
	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
	at hudson.remoting.Command.readFrom(Command.java:142)
	at hudson.remoting.Command.readFrom(Command.java:128)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:61)

Another Agent Log When Problem is Present

The kexTimeout (65000 ms) expired.
SSH Connection failed with IOException: "The kexTimeout (65000 ms) expired.", retrying in 15 seconds. There are 10 more retries left.
The kexTimeout (65000 ms) expired.
SSH Connection failed with IOException: "The kexTimeout (65000 ms) expired.", retrying in 15 seconds. There are 9 more retries left.
The kexTimeout (65000 ms) expired.
SSH Connection failed with IOException: "The kexTimeout (65000 ms) expired.", retrying in 15 seconds. There are 8 more retries left.
The kexTimeout (65000 ms) expired.
SSH Connection failed with IOException: "The kexTimeout (65000 ms) expired.", retrying in 15 seconds. There are 7 more retries left.
connect timed out

I don’t know what kex timeout is but it feels like the problem.

You could try using JNLP and having your agents connect to your controller instead. That should eliminate SSH-related issues?

Hey Halkeye, thanks for the response. :) I was thinking about trying JNLP; I’ll give that a shot tonight after hours.

Thanks again.

Johnny M

This can be related to your AWS configuration:

Hey Saper,
I wanted to reach out and say thank you for your response. Currently all traffic is allowed from the agent IP to the controller, and all traffic from the controller to the agent is allowed.

I have also tried using JNLP to connect from the agent to the controller and was able to establish a connection, but it appears to be unreliable as well. Whenever I close the shell to one of my build servers, the JNLP connection drops.

I’m wondering if you guys have any other suggestions? I’ve been banging my head on this one for a while now. It’s been somewhat difficult to troubleshoot with the problem being intermittent.

Thanks again.

This is interesting. Are you using an SSH tunnel to keep the connection open?

I am afraid your problem is not related to Jenkins at all - you have a networking problem between the controller and the agent. My way to troubleshoot this would be to:

  • Understand if there is a middlebox - a firewall / NAT / whatever between the agent and the controller - if yes, examine the settings such as NAT timeout or similar.
  • Sniff the traffic using tools like tcpdump on 1) controller 2) agent and 3) middlebox and see which packets are lost and where
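To go with the packet capture, a small probe run from the controller (or from the agent) can timestamp when plain TCP connections to the other side start failing, which makes it easier to find the matching window in the tcpdump output. This is only a minimal sketch and not part of Jenkins: the `probe` and `watch` names, the `rounds` parameter, and the 10-second interval are assumptions made for illustration.

```python
import socket
import time
from datetime import datetime


def probe(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def watch(host: str, port: int, interval: float = 10.0, rounds: int = 0) -> list:
    """Probe repeatedly and return the timestamps of failed attempts.

    rounds=0 means "run until interrupted" (a convention of this sketch only).
    """
    failures = []
    count = 0
    while rounds == 0 or count < rounds:
        if not probe(host, port):
            stamp = datetime.now().isoformat(timespec="seconds")
            failures.append(stamp)
            print(f"{stamp} connect to {host}:{port} FAILED")
        count += 1
        if rounds == 0 or count < rounds:
            time.sleep(interval)
    return failures
```

Something like `watch("<agent address>", 22)` left running on the controller would print one line per failed attempt. Note that a successful TCP connect only shows the path is open at that moment; it does not exercise the SSH key exchange or long-lived idle connections, so a NAT or firewall idle timeout could still go unnoticed.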

Hey Saper,
Thank you for the post. My environment is hosted in AWS, and generally speaking the disconnects weren’t persistent before an upgrade in January of 2022. I tried using JNLP agents as well, but I’m still experiencing the intermittent disconnects, from only one of the two build agents.

The latest logs from the disconnecting agent are listed below. I’m really hoping I can find a fix for this. I think I will take your advice and run tcpdump on the agent and controller. Thanks again,

Apr 27, 2022 11:07:55 PM hudson.remoting.Request$2 run
WARNING: Failed to send back a reply to the request UserRequest:hudson.FilePath$IsDirectory@77a0621
java.io.IOException: Broken pipe
	at java.io.FileOutputStream.writeBytes(Native Method)
	at java.io.FileOutputStream.write(FileOutputStream.java:313)
	at hudson.remoting.StandardOutputStream.write(StandardOutputStream.java:84)
	at hudson.remoting.ChunkedOutputStream.sendFrame(ChunkedOutputStream.java:92)
	at hudson.remoting.ChunkedOutputStream.sendBreak(ChunkedOutputStream.java:65)
	at hudson.remoting.ChunkedCommandTransport.writeBlock(ChunkedCommandTransport.java:46)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.write(AbstractSynchronousByteArrayCommandTransport.java:46)
	at hudson.remoting.Channel.send(Channel.java:766)
	at hudson.remoting.Request$2.run(Request.java:389)
	at hudson.remoting.InterceptingExecutorService.lambda$wrap$0(InterceptingExecutorService.java:78)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Apr 27, 2022 11:07:55 PM hudson.remoting.SynchronousCommandTransport$ReaderThread run
INFO: I/O error in channel channel
java.io.IOException: Unexpected EOF
	at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:100)
	at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:61)
Apr 27, 2022 11:07:55 PM hudson.remoting.Request$2 run
WARNING: Failed to send back a reply to the request UserRequest:hudson.FilePath$IsDirectory@51a4c24e
java.io.IOException: Broken pipe
	at java.io.FileOutputStream.writeBytes(Native Method)
	at java.io.FileOutputStream.write(FileOutputStream.java:313)
	at hudson.remoting.StandardOutputStream.write(StandardOutputStream.java:84)
	at hudson.remoting.ChunkedOutputStream.sendFrame(ChunkedOutputStream.java:92)
	at hudson.remoting.ChunkedOutputStream.drain(ChunkedOutputStream.java:88)
	at hudson.remoting.ChunkedOutputStream.write(ChunkedOutputStream.java:57)
	at java.io.OutputStream.write(OutputStream.java:75)
	at hudson.remoting.ChunkedCommandTransport.writeBlock(ChunkedCommandTransport.java:45)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.write(AbstractSynchronousByteArrayCommandTransport.java:46)
	at hudson.remoting.Channel.send(Channel.java:766)
	at hudson.remoting.Request$2.run(Request.java:389)
	at hudson.remoting.InterceptingExecutorService.lambda$wrap$0(InterceptingExecutorService.java:78)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Apr 27, 2022 10:23:13 PM hudson.remoting.RemoteInvocationHandler$Unexporter run
WARNING: Couldn't clean up oid=2 from null
java.io.IOException: Broken pipe
	at java.io.FileOutputStream.writeBytes(Native Method)
	at java.io.FileOutputStream.write(FileOutputStream.java:313)
	at hudson.remoting.StandardOutputStream.write(StandardOutputStream.java:84)
	at hudson.remoting.ChunkedOutputStream.sendFrame(ChunkedOutputStream.java:92)
	at hudson.remoting.ChunkedOutputStream.sendBreak(ChunkedOutputStream.java:65)
	at hudson.remoting.ChunkedCommandTransport.writeBlock(ChunkedCommandTransport.java:46)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.write(AbstractSynchronousByteArrayCommandTransport.java:46)
	at hudson.remoting.Channel.send(Channel.java:766)
	at hudson.remoting.RemoteInvocationHandler$PhantomReferenceImpl.cleanup(RemoteInvocationHandler.java:398)
	at hudson.remoting.RemoteInvocationHandler$PhantomReferenceImpl.access$1000(RemoteInvocationHandler.java:357)
	at hudson.remoting.RemoteInvocationHandler$Unexporter.run(RemoteInvocationHandler.java:615)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at hudson.remoting.AtmostOneThreadExecutor$Worker.run(AtmostOneThreadExecutor.java:121)
	at java.lang.Thread.run(Thread.java:748)
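
For lining these entries up with a tcpdump capture, it can help to boil the agent log down to one item per record. A rough sketch, assuming the log keeps the layout shown above (a timestamp header line, then a level line, then the exception line and stack frames); `parse_events` and `summarize` are names made up for this example:

```python
import re
from collections import Counter
from datetime import datetime

# Header line of each record, e.g.
# "Apr 27, 2022 11:07:55 PM hudson.remoting.Request$2 run"
HEADER = re.compile(
    r"^([A-Z][a-z]{2} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M) (\S+) (\S+)$"
)


def parse_events(log_text: str) -> list:
    """Return one dict per log record: time, source class, level line, exception line."""
    events = []
    current = None
    for line in log_text.splitlines():
        text = line.strip()
        match = HEADER.match(text)
        if match:
            current = {
                "time": datetime.strptime(match.group(1), "%b %d, %Y %I:%M:%S %p"),
                "source": match.group(2),
                "message": None,
                "exception": None,
            }
            events.append(current)
        elif current is not None and text:
            if current["message"] is None:
                # First line after the header, e.g. "WARNING: Failed to send ..."
                current["message"] = text
            elif (current["exception"] is None
                  and "Exception" in text
                  and not text.startswith("at ")):
                # First exception line, e.g. "java.io.IOException: Broken pipe"
                current["exception"] = text
    return events


def summarize(events: list) -> Counter:
    """Count records per (minute, exception) so bursts of failures stand out."""
    return Counter(
        (e["time"].strftime("%Y-%m-%d %H:%M"), e["exception"]) for e in events
    )
```

Feeding the saved agent log through `summarize(parse_events(text))` gives a count per minute and exception type, which narrows down the time window to look at in the capture.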