Build frequently fails with "Backing channel agent-name is disconnected"

I keep getting build failures roughly every two builds, which is tiring because a build can take an hour to complete (tests run against a real DB).

What are the ways to stabilize builds? Both nodes are on the same network. Both run on Proxmox and the machines are close to each other, so there should be no real issues with the network. Is there any way I can stabilize it?

There is not much info in the build console:

ERROR: Build step failed with exception
java.lang.NullPointerException: no workspace from node hudson.slaves.DumbSlave[builder02-molvm] which is computer hudson.slaves.SlaveComputer@538f39c and has channel null
	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:114)
	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:92)
	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:818)
	at hudson.maven.MavenModuleSetBuild$MavenModuleSetBuildExecution.build(MavenModuleSetBuild.java:944)
	at hudson.maven.MavenModuleSetBuild$MavenModuleSetBuildExecution.doRun(MavenModuleSetBuild.java:894)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:526)
	at hudson.model.Run.execute(Run.java:1895)
	at hudson.maven.MavenModuleSetBuild.run(MavenModuleSetBuild.java:543)
	at hudson.model.ResourceController.execute(ResourceController.java:101)
	at hudson.model.Executor.run(Executor.java:442)
Build step 'Execute shell' marked build as failure
ERROR: Failed to parse POMs
java.io.IOException: Backing channel 'builder02-molvm' is disconnected.
	at hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:215)
	at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:285)
	at jdk.proxy2/jdk.proxy2.$Proxy192.isAlive(Unknown Source)
	at hudson.Launcher$RemoteLauncher$ProcImpl.isAlive(Launcher.java:1212)
	at hudson.maven.ProcessCache$MavenProcess.call(ProcessCache.java:167)
	at hudson.maven.MavenModuleSetBuild$MavenModuleSetBuildExecution.doRun(MavenModuleSetBuild.java:877)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:526)
	at hudson.model.Run.execute(Run.java:1895)
	at hudson.maven.MavenModuleSetBuild.run(MavenModuleSetBuild.java:543)
	at hudson.model.ResourceController.execute(ResourceController.java:101)
	at hudson.model.Executor.run(Executor.java:442)
Caused by: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:75)
Caused by: java.io.EOFException
	at java.base/java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2915)
	at java.base/java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3410)
	at java.base/java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:954)
	at java.base/java.io.ObjectInputStream.<init>(ObjectInputStream.java:392)
	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:50)
	at hudson.remoting.Command.readFrom(Command.java:142)
	at hudson.remoting.Command.readFrom(Command.java:128)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:61)
ERROR: Step ‘Archive the artifacts’ failed: no workspace for MLNET--build-dev--main-full #18
ERROR: builder02-molvm is offline; cannot locate JDK17
ERROR: builder02-molvm is offline; cannot locate Maven39

Node console (builder02):

Checking Java version in the PATH
openjdk version "11.0.22" 2024-01-16
OpenJDK Runtime Environment (build 11.0.22+7-post-Ubuntu-0ubuntu220.04.1)
OpenJDK 64-Bit Server VM (build 11.0.22+7-post-Ubuntu-0ubuntu220.04.1, mixed mode, sharing)
[03/28/24 10:59:59] [SSH] Checking java version of /var/jenkins/jdk/bin/java
Couldn't figure out the Java version of /var/jenkins/jdk/bin/java
bash: /var/jenkins/jdk/bin/java: No such file or directory

[03/28/24 10:59:59] [SSH] Checking java version of java
[03/28/24 10:59:59] [SSH] java -version returned 11.0.22.
[03/28/24 10:59:59] [SSH] Starting sftp client.
[03/28/24 10:59:59] [SSH] Copying latest remoting.jar...

Jenkins setup:

Jenkins: 2.440.2
OS: Linux - 5.15.0-100-generic
Java: 17.0.10 - Private Build (OpenJDK 64-Bit Server VM)

adoptopenjdk:1.5
ansicolor:1.0.4
ant:497.v94e7d9fffa_b_9
antisamy-markup-formatter:162.v0e6ec0fcfcf6
apache-httpcomponents-client-4-api:4.5.14-208.v438351942757
asm-api:9.6-3.v2e1fa_b_338cd7
bootstrap5-api:5.3.3-1
bouncycastle-api:2.30.1.77-225.v26ea_c9455fd9
branch-api:2.1152.v6f101e97dd77
build-timeout:1.32
build-user-vars-plugin:1.9
build-with-parameters:76.v9382db_f78962
caffeine-api:3.1.8-133.v17b_1ff2e0599
checks-api:2.0.2
cloudbees-folder:6.901.vb_4c7a_da_75da_3
command-launcher:107.v773860566e2e
commons-lang3-api:3.13.0-62.v7d18e55f51e2
commons-text-api:1.11.0-95.v22a_d30ee5d36
compact-columns:1.185.vf3851b_4d31fe
conditional-buildstep:1.4.3
config-file-provider:968.ve1ca_eb_913f8c
console-column-plugin:252.v0b_8fa_0e33b_72
copyartifact:722.v0662a_9b_e22a_c
credentials:1337.v60b_d7b_c7b_c9f
credentials-binding:657.v2b_19db_7d6e6d
cron_column:1.7
description-setter:239.vd0a_6b_785f92d
display-url-api:2.200.vb_9327d658781
durable-task:550.v0930093c4b_a_6
echarts-api:5.5.0-1
email-ext:2.105
extended-choice-parameter:381.v360a_25ea_017c
external-monitor-job:215.v2e88e894db_f8
extra-columns:1.26
font-awesome-api:6.5.1-3
git:5.2.1
git-client:4.7.0
github:1.38.0
github-api:1.318-461.v7a_c09c9fa_d63
github-branch-source:1781.va_153cda_09d1b_
gradle:2.10
groovy:457.v99900cb_85593
gson-api:2.10.1-15.v0d99f670e0a_7
instance-identity:185.v303dc7c645f9
ionicons-api:56.v1b_1c8c49374e
jackson2-api:2.17.0-379.v02de8ec9f64c
jakarta-activation-api:2.1.3-1
jakarta-mail-api:2.1.3-1
javadoc:243.vb_b_503b_b_45537
javax-activation-api:1.2.0-6
javax-mail-api:1.6.2-9
jaxb:2.3.9-1
jdk-tool:73.vddf737284550
jjwt-api:0.11.5-77.v646c772fddb_0
joda-time-api:2.12.7-29.v5a_b_e3a_82269a_
jquery3-api:3.7.1-2
jsch:0.2.16-86.v42e010d9484b_
json-api:20240303-41.v94e11e6de726
json-path-api:2.9.0-52.v57de85cc4722
junit:1259.v65ffcef24a_88
ldap:719.vcb_d039b_77d0d
locale:431.v3435fa_8f8445
mailer:472.vf7c289a_4b_420
mapdb-api:1.0.9-28.vf251ce40855d
matrix-auth:3.2.2
matrix-project:822.824.v14451b_c0fd42
maven-plugin:3.23
mina-sshd-api-common:2.12.0-90.v9f7fb_9fa_3d3b_
mina-sshd-api-core:2.12.0-90.v9f7fb_9fa_3d3b_
next-executions:310.v52e770651319
nodejs:1.6.1
nodelabelparameter:1.12.0
okhttp-api:4.11.0-172.vda_da_1feeb_c6e
pam-auth:1.10
parameterized-trigger:787.v665fcf2a_830b_
permissive-script-security:0.7
pipeline-build-step:540.vb_e8849e1a_b_d8
pipeline-github-lib:42.v0739460cda_c4
pipeline-graph-analysis:216.vfd8b_ece330ca_
pipeline-groovy-lib:704.vc58b_8890a_384
pipeline-input-step:491.vb_07d21da_1a_fb_
pipeline-milestone-step:111.v449306f708b_7
pipeline-model-api:2.2184.v0b_358b_953e69
pipeline-model-definition:2.2184.v0b_358b_953e69
pipeline-model-extensions:2.2184.v0b_358b_953e69
pipeline-rest-api:2.34
pipeline-stage-step:305.ve96d0205c1c6
pipeline-stage-tags-metadata:2.2184.v0b_358b_953e69
pipeline-stage-view:2.34
plain-credentials:179.vc5cb_98f6db_38
plugin-util-api:4.1.0
postbuild-task:1.9
postbuildscript:3.2.0-550.v88192b_d3e922
publish-over:0.22
publish-over-ssh:1.25
rebuild:330.v645b_7df10e2a_
resource-disposer:0.23
rich-text-publisher-plugin:1.5
role-strategy:713.vb_3837801b_8cc
run-condition:1.7
saferestart:0.7
scm-api:689.v237b_6d3a_ef7f
script-security:1326.vdb_c154de8669
show-build-parameters:1.0
simple-theme-plugin:176.v39740c03a_a_f5
snakeyaml-api:2.2-111.vc6598e30cc65
ssh-credentials:326.v7fcb_a_ef6194b_
ssh-slaves:2.948.vb_8050d697fec
sshd:3.322.v159e91f6a_550
structs:337.v1b_04ea_4df7c8
subversion:2.17.3
text-finder:1.26
text-finder-run-condition:6.vdf94e6f8d2c3
throttle-concurrents:2.14
timestamper:1.26
token-macro:400.v35420b_922dcb_
trilead-api:2.142.v748523a_76693
variant:60.v7290fc0eb_b_cd
view-job-filters:369.ve0513a_a_f5524
workflow-aggregator:596.v8c21c963d92d
workflow-api:1291.v51fd2a_625da_7
workflow-basic-steps:1049.v257a_e6b_30fb_d
workflow-cps:3883.vb_3ff2a_e3eea_f
workflow-durable-task-step:1331.vc8c2fed35334
workflow-job:1400.v7fd111b_ec82f
workflow-multibranch:773.vc4fe1378f1d5
workflow-scm-step:427.v4ca_6512e7df1
workflow-step-api:657.v03b_e8115821b_
workflow-support:881.v7663695646cf
ws-cleanup:0.45

I see that you’re using the Maven job type (MavenModuleSetBuild in your stack trace). Like freestyle jobs, this job type cannot survive a connection loss.
If you run your builds as pipeline jobs, chances are good that a connection loss has no impact: once the agent reconnects, you will see the build still running.

Besides that, those connection problems might be an indicator of network problems.
You might also want to check the agent logs in JENKINS_HOME/logs/slaves/<agent_name>; older logs are there under the name agent.log.x.
Maybe something there indicates why the connection was lost.
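As a rough sketch of that log check, the Python snippet below scans every file in an agent's log directory for lines that often accompany a dropped channel. The function name find_disconnect_hints and the keyword list are assumptions made up for illustration, not anything Jenkins defines; adjust them to what your logs actually contain.

```python
from pathlib import Path

# Assumed keywords: some come from the stack trace in the question, the rest
# are common JVM/network failure strings. Illustrative, not exhaustive.
KEYWORDS = (
    "Unexpected termination",
    "EOFException",
    "Connection reset",
    "OutOfMemoryError",
    "is disconnected",
)


def find_disconnect_hints(log_dir, keywords=KEYWORDS):
    """Return (file name, line number, line) for each line containing a keyword."""
    hits = []
    for path in sorted(Path(log_dir).iterdir()):
        if not path.is_file():
            continue
        # Logs may contain odd bytes; replace them instead of failing.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if any(keyword in line for keyword in keywords):
                hits.append((path.name, number, line.strip()))
    return hits
```

Call it with the agent's log directory (the JENKINS_HOME/logs/slaves/<agent_name> path mentioned above) and print the returned tuples to see which file and line to look at first.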

You run on Linux, so another source of problems can be memory pressure. When the kernel runs out of memory, the OOM killer might kill processes.
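To check for that, one option is to search the agent's kernel log for OOM-killer reports. The sketch below assumes the usual "Killed process <pid> (<name>)" wording of the kernel message; the function name find_oom_kills is hypothetical.

```python
import re

# Matches the kernel's OOM-killer report, for example:
# "Out of memory: Killed process 4321 (java) total-vm:..."
OOM_PATTERN = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(kernel_log_text):
    """Return (pid, process name) for every OOM kill found in the text."""
    return [(int(pid), name) for pid, name in OOM_PATTERN.findall(kernel_log_text)]
```

Pass it the text of dmesg or journalctl -k captured on the agent. If a java process shows up around the time a build failed, memory pressure is a likely suspect.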

Another thing: you say your controller is running with Java 17, but it seems your agent is using Java 11. The recommendation is to use the same major Java version for controller and agent.
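A quick way to compare the two is to pull the major version out of the first line of java -version on each machine. This is a minimal sketch: java_major is a made-up helper, the agent line is copied from the node console above, and the controller line is an assumed example built from the 17.0.10 version reported in the setup.

```python
import re


def java_major(version_output):
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no version string found")
    first, second = int(match.group(1)), match.group(2)
    # Java 8 and older report "1.8.0_xxx"; there the major is the second field.
    if first == 1 and second is not None:
        return int(second)
    return first


agent = java_major('openjdk version "11.0.22" 2024-01-16')
# Assumed controller output, based on "Java: 17.0.10" in the setup section.
controller = java_major('openjdk version "17.0.10" 2024-01-16')
print(agent, controller, agent == controller)  # prints: 11 17 False
```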

Installing JDK 17 on the build agents seems to have fixed the problem.

For anyone having similar problems:

  • Get JDK 17 from Adoptium (Linux x64 in my case).
  • Unpack to /var/jenkins.
  • Rename unpacked “jdk[version]” to “jdk”.
  • Check: /var/jenkins/jdk/bin/java -version

The Jenkins agent will prefer /var/jenkins/jdk/bin/java over any other Java on the machine; the node console above shows it checking that path first.
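The unpack and rename steps above can be scripted if you have several agents. This is only a sketch: install_jdk is a hypothetical helper, it assumes you already downloaded the JDK 17 tarball from Adoptium, and it assumes the archive has a single top-level "jdk[version]" directory.

```python
import tarfile
from pathlib import Path


def install_jdk(tarball, target_dir="/var/jenkins"):
    """Unpack a JDK tarball into target_dir and rename its top folder to 'jdk'."""
    target = Path(target_dir)
    destination = target / "jdk"
    if destination.exists():
        raise FileExistsError(f"{destination} already exists; move it aside first")
    # Only unpack archives from a source you trust.
    with tarfile.open(tarball) as archive:
        # Assumption: one top-level directory, e.g. "jdk-17.x.y+z".
        top_levels = {name.split("/", 1)[0] for name in archive.getnames()}
        if len(top_levels) != 1:
            raise ValueError(f"expected one top-level directory, got {sorted(top_levels)}")
        archive.extractall(target)
    (target / top_levels.pop()).rename(destination)
    # Path the agent launcher looks for, as seen in the node console.
    return destination / "bin" / "java"
```

Afterwards, run the returned path with -version (the check step from the list) to confirm the agent will pick up Java 17.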