Agents connecting to controller are cleared after a Jenkins restart

Hello,

I run the Jenkins controller in a container, with a volume mounted to JENKINS_HOME=/var/jenkins_home. I also have agents running in a cloud with the inbound-agent image, with -webSocket communication.

When I restart one of the agents, it reconnects to the controller successfully. But when I restart the controller, the agent disappears completely from the “Node” page, and the agent logs show the WebSocket connection to the controller failing.

My assumption is that some folder on the controller side, which contains the value of “JENKINS_SECRET”, is being removed/destroyed when the controller is restarted. If so, is there a way to have it persist across restarts?

Thank you,
Joey

Jenkins setup:

Jenkins: 2.440.1
OS: Linux - 4.19.0-18-cloud-amd64
Java: 17.0.10 - Eclipse Adoptium (OpenJDK 64-Bit Server VM)
---
ace-editor:1.1
ant:497.v94e7d9fffa_b_9
antisamy-markup-formatter:162.v0e6ec0fcfcf6
apache-httpcomponents-client-4-api:4.5.14-208.v438351942757
asm-api:9.7-33.v4d23ef79fcc8
bootstrap4-api:4.6.0-6
bootstrap5-api:5.3.2-4
bouncycastle-api:2.30.1.77-225.v26ea_c9455fd9
branch-api:2.1152.v6f101e97dd77
build-timeout:1.32
caffeine-api:3.1.8-133.v17b_1ff2e0599
checks-api:2.0.2
cloudbees-disk-usage-simple:203.v3f46a_7462b_1a_
cloudbees-folder:6.901.vb_4c7a_da_75da_3
command-launcher:107.v773860566e2e
commons-lang3-api:3.13.0-62.v7d18e55f51e2
commons-text-api:1.11.0-95.v22a_d30ee5d36
configuration-as-code:1775.v810dc950b_514
credentials:1337.v60b_d7b_c7b_c9f
credentials-binding:677.vdc9d38cb_254d
data-tables-api:1.13.8-4
discard-old-build:1.07
display-url-api:2.200.vb_9327d658781
durable-task:550.v0930093c4b_a_6
echarts-api:5.4.3-4
email-ext:2.104
font-awesome-api:6.5.1-3
git:5.2.2
git-client:4.7.0
git-server:117.veb_68868fa_027
github:1.38.0
github-api:1.318-461.v7a_c09c9fa_d63
github-branch-source:1772.va_69eda_d018d4
gradle:2.10
gson-api:2.10.1-15.v0d99f670e0a_7
handlebars:3.0.8
instance-identity:185.v303dc7c645f9
ionicons-api:56.v1b_1c8c49374e
jackson2-api:2.16.1-373.ve709c6871598
jakarta-activation-api:2.1.3-1
jakarta-mail-api:2.1.3-1
javax-activation-api:1.2.0-6
javax-mail-api:1.6.2-9
jaxb:2.3.9-1
jdk-tool:73.vddf737284550
jjwt-api:0.11.5-77.v646c772fddb_0
jnr-posix-api:3.1.19-1
joda-time-api:2.12.7-29.v5a_b_e3a_82269a_
jquery3-api:3.7.1-2
jsch:0.2.16-86.v42e010d9484b_
json-api:20240205-27.va_007549e895c
json-path-api:2.9.0-33.v2527142f2e1d
junit:1259.v65ffcef24a_88
ldap:711.vb_d1a_491714dc
lockable-resources:1243.v346d600eea_24
mailer:470.vc91f60c5d8e2
matrix-auth:3.2.1
matrix-project:822.824.v14451b_c0fd42
metrics:4.2.21-449.v6960d7c54c69
mina-sshd-api-common:2.12.0-90.v9f7fb_9fa_3d3b_
mina-sshd-api-core:2.12.0-90.v9f7fb_9fa_3d3b_
momentjs:1.1.1
nomad:0.10.0
okhttp-api:4.11.0-172.vda_da_1feeb_c6e
pam-auth:1.10
pipeline-build-step:540.vb_e8849e1a_b_d8
pipeline-github-lib:42.v0739460cda_c4
pipeline-graph-analysis:202.va_d268e64deb_3
pipeline-groovy-lib:704.vc58b_8890a_384
pipeline-input-step:491.vb_07d21da_1a_fb_
pipeline-milestone-step:111.v449306f708b_7
pipeline-model-api:2.2175.v76a_fff0a_2618
pipeline-model-definition:2.2175.v76a_fff0a_2618
pipeline-model-extensions:2.2175.v76a_fff0a_2618
pipeline-rest-api:2.34
pipeline-stage-step:305.ve96d0205c1c6
pipeline-stage-tags-metadata:2.2175.v76a_fff0a_2618
pipeline-stage-view:2.34
plain-credentials:179.vc5cb_98f6db_38
plugin-util-api:4.1.0
popper-api:1.16.1-3
popper2-api:2.11.6-4
prism-api:1.29.0-13
prometheus:2.5.1
rebuild:330.v645b_7df10e2a_
resource-disposer:0.23
scm-api:690.vfc8b_54395023
script-security:1336.vf33a_a_9863911
snakeyaml-api:2.2-111.vc6598e30cc65
ssh-credentials:337.v395d2403ccd4
ssh-slaves:2.948.vb_8050d697fec
sshd:3.322.v159e91f6a_550
structs:337.v1b_04ea_4df7c8
timestamper:1.26
token-macro:400.v35420b_922dcb_
trilead-api:2.141.v284120fd0c46
variant:60.v7290fc0eb_b_cd
windows-slaves:1.8.1
workflow-aggregator:596.v8c21c963d92d
workflow-api:1291.v51fd2a_625da_7
workflow-basic-steps:1042.ve7b_140c4a_e0c
workflow-cps:3867.v535458ce43fd
workflow-cps-global-lib:612.v55f2f80781ef
workflow-durable-task-step:1331.vc8c2fed35334
workflow-job:1400.v7fd111b_ec82f
workflow-multibranch:773.vc4fe1378f1d5
workflow-scm-step:427.v4ca_6512e7df1
workflow-step-api:657.v03b_e8115821b_
workflow-support:865.v43e78cc44e0d
ws-cleanup:0.45

The nodes are persisted in the folder JENKINS_HOME/nodes. Each node has its own directory with a config.xml and potentially other files.
Do you for some reason lose this directory during a restart?
Jenkins doesn’t store any configuration outside of JENKINS_HOME.
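To check which nodes are actually persisted on disk, something like the following minimal sketch could be run inside the controller container before and after a restart. It only relies on the layout described above (one directory per node under JENKINS_HOME/nodes, each with a config.xml); the function name and the /var/jenkins_home fallback are just illustrative choices.

```python
import os
from pathlib import Path


def list_persisted_nodes(jenkins_home):
    """Return the sorted names of node directories that contain a config.xml."""
    nodes_dir = Path(jenkins_home) / "nodes"
    if not nodes_dir.is_dir():
        # No nodes directory at all: nothing is persisted.
        return []
    return sorted(
        entry.name
        for entry in nodes_dir.iterdir()
        if entry.is_dir() and (entry / "config.xml").is_file()
    )


if __name__ == "__main__":
    # Assumption: JENKINS_HOME is set in the environment, as in the official image.
    home = os.environ.get("JENKINS_HOME", "/var/jenkins_home")
    for name in list_persisted_nodes(home):
        print(name)
```

If an agent shows as online in the UI but never appears in this list, the controller is keeping it in memory only, so it cannot survive a restart.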

Hello @mawinter69!

I’ll have a look at what happens to that directory during a restart.

In the Jenkins controller startup log I see those events when the agent tries to connect to it:

2024-05-24 20:04:27.456+0000 [id=25]	WARNING	jenkins.agents.WebSocketAgents#doIndex: no such agent jenkins-worker-63da1efdebe376
2024-05-24 20:04:30.418+0000 [id=125]	WARNING	jenkins.agents.WebSocketAgents#doIndex: no such agent jenkins-worker-63da1efdebe376
2024-05-24 20:04:33.402+0000 [id=130]	WARNING	jenkins.agents.WebSocketAgents#doIndex: no such agent jenkins-worker-63da1efdebe376
2024-05-24 20:04:40.388+0000 [id=130]	WARNING	jenkins.agents.WebSocketAgents#doIndex: no such agent jenkins-worker-63da1efdebe376

So it means the agent with ID “jenkins-worker-63da1efdebe376” tries to connect back to the controller but the controller doesn’t know about it.
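For anyone wanting to see which agents are being rejected, here is a rough sketch that counts these warnings per agent name. It assumes the log format is exactly the one pasted above; the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches the "no such agent <name>" warnings logged by WebSocketAgents#doIndex.
NO_SUCH_AGENT = re.compile(
    r"WARNING\s+jenkins\.agents\.WebSocketAgents#doIndex: no such agent (\S+)"
)


def count_rejected_agents(log_lines):
    """Count how many times each agent name was rejected as unknown."""
    counts = Counter()
    for line in log_lines:
        match = NO_SUCH_AGENT.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "2024-05-24 20:04:27.456+0000 [id=25]\tWARNING\t"
        "jenkins.agents.WebSocketAgents#doIndex: no such agent jenkins-worker-63da1efdebe376",
        "2024-05-24 20:04:30.418+0000 [id=125]\tWARNING\t"
        "jenkins.agents.WebSocketAgents#doIndex: no such agent jenkins-worker-63da1efdebe376",
    ]
    for agent, hits in count_rejected_agents(sample).items():
        print(agent, hits)
```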

One thing is that I use JCasC (Jenkins Configuration as Code). Could that wipe out the JENKINS_HOME/nodes directory when I restart?

@mawinter69 :

The nodes directory seems to have the right permissions, but it’s always empty. Now I have one agent registered with the controller, but it’s not showing in the nodes directory:

jenkins@3370ac549e70:/$ ls -lart /var/jenkins_home/ | grep nodes
drwxrwxrwx   2 jenkins jenkins  4096 Feb 26 21:15 nodes


jenkins@3370ac549e70:/$ ls -lart /var/jenkins_home/nodes
total 8
drwxrwxrwx  2 jenkins jenkins 4096 Feb 26 21:15 .
drwxrwxrwx 21 jenkins jenkins 4096 May 24 20:11 ..

Hmm, you mentioned you use cloud agents, so you have also configured a cloud in Jenkins and the agents are provided on demand? If yes, what kind of cloud do you use?
Are the cloud agents busy running a job or idle when you restart?

Yes, I use a cloud of type “nomad”.

The first agent spins up when the client app triggers a build in Jenkins. Once it’s up, it processes the build and then stays idle until the next one. I use the “websocket” type of connection and I see CONNECTED in the agent logs, idle or not. In the Jenkins UI the node (agent) shows as online.

I’ve had restarts while the agent was processing a build and others while it was idle. No matter what, the agent disappears from the node screen after a restart. It does eventually recover by spinning up a new agent, but I think the “normal” process would be for the existing agent to reconnect to Jenkins using the same ID after a restart?

The nomad plugin probably needs to take care that agents are not forgotten during a restart. This might be a bug in the plugin, though I don’t know this plugin.

Oh okay. Thank you @mawinter69. I thought it was Jenkins’s job to “persist” the agent configuration to its file system, and not the plug-in’s. That specific plug-in doesn’t seem to have much activity in its code base, so I guess I’ll just make sure the Nomad job is recycled quickly after Jenkins is restarted.
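In case it helps someone with the same setup, here is a rough sketch of how the "should this agent be recycled?" check could look. It asks the controller for its node list through the standard computer/api/json endpoint and reports whether a given agent name is still known. The function names are hypothetical, the sketch assumes the endpoint is readable without authentication (add credentials otherwise), and actually stopping the Nomad job is left out because that depends on the individual setup.

```python
import json
import urllib.request


def registered_agents(payload):
    """Map each node's display name to its offline flag, from computer/api/json output."""
    return {
        computer["displayName"]: computer.get("offline", True)
        for computer in payload.get("computer", [])
    }


def needs_recycle(payload, agent_name):
    """True when the controller no longer has a node with this name."""
    return agent_name not in registered_agents(payload)


def fetch_computers(base_url, timeout=10):
    """Fetch the node list from the controller (assumes anonymous read access)."""
    url = f"{base_url.rstrip('/')}/computer/api/json"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For example, `needs_recycle(fetch_computers("http://jenkins.example:8080"), "jenkins-worker-63da1efdebe376")` would return True after the restart described above; the host name here is a placeholder.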