Debugging Unexpected termination of the channel

Which side of the connection throws this message that I see on the Jenkins controller log page?

Nov 24, 2022 9:28:07 AM INFO hudson.remoting.SynchronousCommandTransport$ReaderThread run
I/O error in channel slippery_host
java.io.EOFException
	at java.base/java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2894)
	at java.base/java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3389)
	at java.base/java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:931)
	at java.base/java.io.ObjectInputStream.<init>(ObjectInputStream.java:374)
	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
	at hudson.remoting.Command.readFrom(Command.java:142)
	at hudson.remoting.Command.readFrom(Command.java:128)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:61)
Caused: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:75)  

I use SSH, and I disabled the ping thread. Disabling the ping thread helped a little, but I am still seeing the same error.
Using Jenkins LTS 2.361.2.
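
For what it's worth, the trace above only says that the reader thread on the side that logged it hit end-of-stream while waiting for the next command. On its own it does not say which side closed the connection or why. Here is a minimal Python sketch of the same situation. It is not Jenkins code: the socket pair and the `read_command_header` helper are made up for illustration.

```python
import socket


def read_command_header(sock, size=2):
    """Read exactly `size` bytes, raising EOFError if the peer closes first."""
    data = b""
    while len(data) < size:
        chunk = sock.recv(size - len(data))
        if not chunk:  # recv() returns b"" once the peer has closed its end
            raise EOFError("Unexpected termination of the channel")
        data += chunk
    return data


def demo():
    reader, writer = socket.socketpair()
    writer.close()  # stand-in for the remote process dying or the link being cut
    try:
        read_command_header(reader)
    except EOFError as exc:
        return f"reader saw: {exc}"
    finally:
        reader.close()
    return "no error"


if __name__ == "__main__":
    print(demo())
```

The reader gets the same error whether the other end exited cleanly, was killed, or something in between dropped the connection, so the stack trace needs to be read together with the lines logged around it.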

It happened again. This time I was able to capture the following on the agent:

<===[JENKINS REMOTING CAPACITY]===>channel started
Remoting version: 3044.vb_940a_a_e4f72e
Launcher: SSHLauncher
Communication Protocol: Standard in/out
This is a Unix agent
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by jenkins.slaves.StandardOutputSwapper$ChannelSwapper to constructor java.io.FileDescriptor(int)
WARNING: Please consider reporting this to the maintainers of jenkins.slaves.StandardOutputSwapper$ChannelSwapper
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Agent successfully connected and online
ERROR: Connection terminated
java.io.EOFException
	at java.base/java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2894)
	at java.base/java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3389)
	at java.base/java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:931)
	at java.base/java.io.ObjectInputStream.<init>(ObjectInputStream.java:374)
	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
	at hudson.remoting.Command.readFrom(Command.java:142)
	at hudson.remoting.Command.readFrom(Command.java:128)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:61)
Caused: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:75)
Agent JVM has terminated. Exit signal=TERM

I am having the same problem: the connection terminates after about an hour. There’s a Stack Overflow question that suggests a variety of solutions (mostly updating the JDK of the client), but these haven’t worked.
The question: Jenkins slave jobs failing on "Unexpected termination of channel" - Stack Overflow

The frustrating thing is that I’d seen this issue before and fixed it with a JDK change, but now it’s failing again after another Jenkins update (2.361.4). I’ve tried half a dozen different JDKs now and still no luck.

The failing node is a 4GB Pi 4 running 64-bit Pi OS Lite with Java JDK 11.0.8-zulu. This was working fine, and it still works fine on my other node, a 1GB 3B+ running Raspbian Lite with the same JDK (running Java client instead of server, as client isn’t supported on 64-bit), so I’m scratching my head as to what’s up.

Welcome to this community @robinbloke :wave: .

When the connection terminates, is it while running a job?

Thanks @poddingue Hey there.
The connection terminates after about an hour of running a job; the connection to the client is stable if no jobs are running. I can’t identify any particular point in the job that is common to the terminations.

I see.
What are the network appliances (if any) between your controller and your agent?

There’s a switch and a DHCP server on the network, which has other things on it - but nothing that’s involved in this run.

Is your switch a managed one?
Do you have any logs?

I have logs from the server/client in Jenkins, but nothing from the switch.
The switch is not managed.

Logs from the server:

SSHLauncher{host='10.82.117.60', port=22, credentialsId='1b924920-8f2d-45f2-80ef-ecac597aa573', jvmOptions='', javaPath='/usr/bin/java', prefixStartSlaveCmd='', suffixStartSlaveCmd='', launchTimeoutSeconds=60, maxNumRetries=0, retryWaitTime=0, sshHostKeyVerificationStrategy=hudson.plugins.sshslaves.verifiers.NonVerifyingKeyVerificationStrategy, tcpNoDelay=true, trackCredentials=true}
[12/02/22 09:28:28] [SSH] Opening SSH connection to 10.82.117.60:22.
[12/02/22 09:28:28] [SSH] WARNING: SSH Host Keys are not being verified. Man-in-the-middle attacks may be possible against this connection.
[12/02/22 09:28:28] [SSH] Authentication successful.
[12/02/22 09:28:28] [SSH] The remote user's environment is:
BASH=/usr/bin/bash
BASHOPTS=checkwinsize:cmdhist:complete_fullquote:extquote:force_fignore:globasciiranges:hostcomplete:interactive_comments:progcomp:promptvars:sourcepath
BASH_ALIASES=()
BASH_ARGC=([0]="0")
BASH_ARGV=()
BASH_CMDS=()
BASH_EXECUTION_STRING=set
BASH_LINENO=()
BASH_SOURCE=()
BASH_VERSINFO=([0]="5" [1]="1" [2]="4" [3]="1" [4]="release" [5]="aarch64-unknown-linux-gnu")
BASH_VERSION='5.1.4(1)-release'
DIRSTACK=()
EUID=1000
GROUPS=()
HOME=/home/pit
HOSTNAME=pits-rack-dev-pen
HOSTTYPE=aarch64
IFS=$' \t\n'
LANG=en_GB.UTF-8
LOGNAME=pit
MACHTYPE=aarch64-unknown-linux-gnu
MOTD_SHOWN=pam
OPTERR=1
OPTIND=1
OSTYPE=linux-gnu
PATH=/usr/local/bin:/usr/bin:/bin:/usr/games
PIPESTATUS=([0]="0")
PPID=1465
PS4='+ '
PWD=/home/pit
SHELL=/bin/bash
SHELLOPTS=braceexpand:hashall:interactive-comments
SHLVL=1
SSH_CLIENT='10.82.117.41 56588 22'
SSH_CONNECTION='10.82.117.41 56588 10.82.117.60 22'
TERM=dumb
UID=1000
USER=pit
XDG_RUNTIME_DIR=/run/user/1000
XDG_SESSION_CLASS=user
XDG_SESSION_ID=3
XDG_SESSION_TYPE=tty
_=']'
[12/02/22 09:28:28] [SSH] Starting sftp client.
[12/02/22 09:28:28] [SSH] Copying latest remoting.jar...
Source agent hash is 8D575C4C8219E6AB2039295EC545C6C3. Installed agent hash is 8D575C4C8219E6AB2039295EC545C6C3
Verified agent jar. No update is necessary.
Expanded the channel window size to 4MB
[12/02/22 09:28:28] [SSH] Starting agent process: cd "/home/pit" && /usr/bin/java  -jar remoting.jar -workDir /home/pit -jar-cache /home/pit/remoting/jarCache
Dec 02, 2022 9:28:30 AM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
INFO: Using /home/pit/remoting as a remoting work directory
Dec 02, 2022 9:28:30 AM org.jenkinsci.remoting.engine.WorkDirManager setupLogging
INFO: Both error and output logs will be printed to /home/pit/remoting
<===[JENKINS REMOTING CAPACITY]===>\00\00\00channel started
Remoting version: 3044.vb_940a_a_e4f72e
Launcher: SSHLauncher
Communication Protocol: Standard in/out
This is a Unix agent
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by jenkins.slaves.StandardOutputSwapper$ChannelSwapper to constructor java.io.FileDescriptor(int)
WARNING: Please consider reporting this to the maintainers of jenkins.slaves.StandardOutputSwapper$ChannelSwapper
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Evacuated stdout
Agent successfully connected and online
ERROR: Connection terminated
java.io.EOFException
	at java.base/java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2911)
	at java.base/java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3406)
	at java.base/java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:932)
	at java.base/java.io.ObjectInputStream.<init>(ObjectInputStream.java:375)
	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
	at hudson.remoting.Command.readFrom(Command.java:142)
	at hudson.remoting.Command.readFrom(Command.java:128)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:61)
Caused: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:75)
ERROR: Socket connection to SSH server was lost
java.io.IOException: Cannot read full block, EOF reached.
	at com.trilead.ssh2.crypto.cipher.CipherInputStream.getBlock(CipherInputStream.java:81)
	at com.trilead.ssh2.crypto.cipher.CipherInputStream.read(CipherInputStream.java:108)
	at com.trilead.ssh2.transport.TransportConnection.receiveMessage(TransportConnection.java:232)
	at com.trilead.ssh2.transport.TransportManager.receiveLoop(TransportManager.java:706)
	at com.trilead.ssh2.transport.TransportManager$1.run(TransportManager.java:502)
	at java.base/java.lang.Thread.run(Thread.java:829)
Agent JVM has not reported exit code before the socket was lost
[12/02/22 10:05:43] [SSH] Connection closed.
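
As a side note, the launcher log above carries its own timing: the connection is opened at 09:28:28 and closed at 10:05:43. A small Python sketch to compute the session length, assuming the bracketed timestamps are month/day/year (which matches the "Dec 02, 2022" lines in the same log); `session_duration` is just an illustrative helper name:

```python
from datetime import datetime

# Assumed format of the "[12/02/22 09:28:28]" prefixes: MM/DD/YY HH:MM:SS
LOG_TIME_FORMAT = "%m/%d/%y %H:%M:%S"


def session_duration(opened, closed):
    """Return the time between two launcher log timestamps as a timedelta."""
    start = datetime.strptime(opened, LOG_TIME_FORMAT)
    end = datetime.strptime(closed, LOG_TIME_FORMAT)
    return end - start


if __name__ == "__main__":
    # Timestamps copied from "Opening SSH connection" and "Connection closed"
    print(session_duration("12/02/22 09:28:28", "12/02/22 10:05:43"))  # 0:37:15
```

So this particular session lasted a bit over 37 minutes. Comparing that figure across several failures might show whether the drop happens after a fixed interval or at varying times.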

I have a hard time determining where the TERM signal comes from… either from the controller or the agent, or could it be the agent sending it to itself to self-terminate?
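
One thing that may help narrow this down: the two captures in this thread end differently. One ends with "Agent JVM has terminated. Exit signal=TERM", the other with "Socket connection to SSH server was lost" and "Agent JVM has not reported exit code before the socket was lost". A rough Python sketch that sorts a saved agent log into those two cases. The marker strings are copied from the logs above, but the meanings attached to them are my own reading, not something taken from the plugin documentation:

```python
# Marker text is copied from the logs in this thread; the right-hand
# interpretations are assumptions for triage, not official definitions.
MARKERS = {
    "Agent JVM has terminated. Exit signal=": "agent process was signalled",
    "Socket connection to SSH server was lost": "SSH transport dropped",
    "Agent JVM has not reported exit code": "SSH transport dropped",
}


def classify(log_text):
    """Return the distinct interpretations found in the log, in order of appearance."""
    findings = []
    for line in log_text.splitlines():
        for marker, meaning in MARKERS.items():
            if marker in line and meaning not in findings:
                findings.append(meaning)
    return findings


if __name__ == "__main__":
    first_capture = "ERROR: Connection terminated\nAgent JVM has terminated. Exit signal=TERM"
    second_capture = (
        "ERROR: Socket connection to SSH server was lost\n"
        "Agent JVM has not reported exit code before the socket was lost"
    )
    print(classify(first_capture))
    print(classify(second_capture))
```

If most failures fall into the first case, the agent host is the place to look for whatever delivered the signal; if they fall into the second, the network path or the SSH session itself looks more likely.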

The two things that helped are:

  1. Disable the ping thread: a lot fewer disconnects after that
  2. Configure the ssh client side with:
    Host *
        ServerAliveInterval 300
        ServerAliveCountMax 30
    
    This further reduced the number of disconnects, but did not eliminate them entirely.
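
To put numbers on those two settings: with OpenSSH, `ServerAliveInterval` is the idle time in seconds before the client sends a keepalive probe, and `ServerAliveCountMax` is how many probes may go unanswered before the client disconnects. A quick Python sketch of the arithmetic (`keepalive_budget` is just an illustrative name):

```python
def keepalive_budget(interval_s, count_max):
    """Approximate seconds an unresponsive server is tolerated before ssh disconnects."""
    return interval_s * count_max


if __name__ == "__main__":
    budget = keepalive_budget(300, 30)  # values from the config above
    print(f"{budget} s = {budget / 60:.0f} min")  # 9000 s = 150 min
```

So with these values an idle connection gets a probe every 5 minutes, and the client only gives up after roughly 2.5 hours without any reply.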

I did read Oleg’s Troubleshooting Remoting issues, but I am unsure how easy or hard it currently is to diagnose these problems. I turned on all the debugging and logging I could, but I still can’t find the source or cause of the TERM signal.
