Nodes/config.xml not cleaned up after failed provisioning

arturrekawek · March 21, 2023, 3:58pm

Hi hi

Environment:
Plugin name: kubernetes-plugin
Plugin versions (tags): 3646.va_b_469a_7666b_7 and 3651.v908e7db_10d06
Jenkins version: 2.332.2

We observed the following behaviour:

1. Create a new job in Jenkins and specify an invalid pod template (e.g. one that requests more memory than is currently available in your resource quota).
2. Start a new build for this job.
3. Jenkins creates a new /var/jenkins_home/nodes/$pod/config.xml on its filesystem.
4. The Kubernetes API rejects the pod, as expected.
5. Jenkins retries creating the pod every time an executor becomes available (with 10 executors, it retries ten times per second).
6. Now trigger this job multiple times so that the build queue grows (e.g. 50 times).
7. Jenkins again creates the config.xml files, but now it no longer cleans up any of the failed ../nodes/$pod/config.xml files.
8. After a while, the number of config.xml files from failed builds grows to an unreasonable number (in our case: 138 thousand).
9. Once you restart Jenkins, it tries to load all nodes from disk, which does not finish in time (loading 138k obsolete configs takes a while).

We are severely impacted by this issue: our Jenkins controller fails to restart with an OOM error because it tries to load all of the undeleted node config.xml files from disk. Every time, we have to delete the node config.xml files manually to bring Jenkins back up and running.
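Until a fix lands, the manual cleanup described above could be scripted. The following is a minimal sketch, not an official tool, under these assumptions: the controller is stopped while it runs, JENKINS_HOME is /var/jenkins_home as above, and the leftover agents share a recognizable name prefix (for example the pod template name; check the actual directory names under nodes/ first). The script name cleanup_nodes.py and both helper functions are hypothetical. It only lists matches unless --delete is passed.

```python
"""Hypothetical cleanup helper for leftover agent node directories in JENKINS_HOME.

Run it only while the Jenkins controller is stopped.
Usage: python cleanup_nodes.py JENKINS_HOME NAME_PREFIX [--delete]
"""
import shutil
import sys
from pathlib import Path


def find_stale_node_dirs(nodes_dir: Path, prefix: str) -> list[Path]:
    """Return node directories under nodes_dir whose name starts with prefix
    and that contain a config.xml. The prefix is the only filter: this cannot
    tell a failed provisioning attempt apart from a healthy agent."""
    if not nodes_dir.is_dir():
        return []
    return sorted(
        d for d in nodes_dir.iterdir()
        if d.is_dir() and d.name.startswith(prefix) and (d / "config.xml").is_file()
    )


def remove_node_dirs(dirs: list[Path], dry_run: bool = True) -> int:
    """Delete each directory (or only report it when dry_run is True).
    Returns the number of directories handled."""
    for d in dirs:
        if dry_run:
            print(f"would delete {d}")
        else:
            shutil.rmtree(d)
    return len(dirs)


if __name__ == "__main__":
    if len(sys.argv) < 3:
        print(__doc__)
    else:
        home, prefix = Path(sys.argv[1]), sys.argv[2]
        if not prefix:
            # An empty prefix would match every node, including static agents.
            sys.exit("refusing to run with an empty prefix")
        delete = "--delete" in sys.argv[3:]
        stale = find_stale_node_dirs(home / "nodes", prefix)
        count = remove_node_dirs(stale, dry_run=not delete)
        print(f"{'deleted' if delete else 'found'} {count} node directories matching {prefix!r}")
```

For example, `python cleanup_nodes.py /var/jenkins_home my-pod-template-` prints what would be removed; rerun it with `--delete` once the list looks right. Because the name prefix is the only filter, it should never be run against a running controller.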

Does anyone know of a workaround until there is a permanent fix for this issue?

jglick · March 21, 2023, 5:35pm

Please report bugs in Jira, if at all possible with complete, self-contained instructions to reproduce the issue from scratch.

arturrekawek · March 22, 2023, 7:57pm

Thanks, Jesse.
Of course, I will do that.
