Noob - Builds Fail on Node, OSError: [Errno 28] No space left on device

Hi, I was recently asked to take over supporting Jenkins for our company and, to be honest, it’s a bit over my head, so any help would be very appreciated.

Jenkins Version: 2.361.1
Running 9 Nodes
Built-in node - disabled

We have one node (Node01) that has no free swap space, and our engineering team reports that they are getting OSError: [Errno 28] No space left on device

This is a Linux (Ubuntu) node set up with 5 executors. They have asked me to increase memory on the node so that builds can run and complete.

I’ve been searching and I’m very confused about the best way to increase memory, as only the one build node is affected. The team is building on Node02 at the moment, and that is working.

When I run “systemctl edit jenkins”, it’s a blank file with no environment variables.

When stopping Jenkins we do use systemctl stop jenkins, but then I need to kill the Java process afterwards to actually get it to stop, so something seems a bit wonky.

Viewing /etc/sysconfig/jenkins, I do see the JENKINS_JAVA_OPTIONS setting, which has the following settings.


Running the jps -lv command, I see the following output:

3374542 /usr/lib/jenkins/jenkins.war -Djava.awt.headless=true -DJENKINS_HOME=/var/lib/jenkins
2244301 -Dapplication.home=/usr/lib/jvm/java-1.8.0-openjdk- -Xms8m

Is /etc/sysconfig/jenkins the place to increase general memory? Our controller server has 16 GB of memory available.

Since it’s just the one node that is out of free swap space, do I need to set an environment variable on that specific node (Node01) to make it use a specific amount of memory?

Anything will be helpful, as I have no idea where to start since our setup seems different from the CloudBees documentation I was following.

First of all, you need to understand that a node in Jenkins normally runs on a different machine than the controller itself.

Increasing the memory of the controller will have no effect on the nodes.

When your node runs out of swap space it means it is running out of memory (swap space on Linux is like the pagefile on Windows). This can only be solved by increasing the swap space on that node; it can’t be changed by modifying anything in Jenkins. It must be done at the OS level (don’t ask me how that is done on Linux). If your node is a VM, it might be possible to increase the memory of the VM instead.
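For the record, on Ubuntu a swap file can typically be added like this. The 4G size and the /swapfile path are examples only, not values from this thread; this needs root and should be done by whoever administers the node:

```shell
# Inspect current memory and swap on the node
swapon --show
free -h

# Create and enable a 4 GiB swap file (size and path are examples)
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

# Make it survive reboots
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
```

If the filesystem doesn’t support fallocate, a dd from /dev/zero can create the file instead.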

You should be aware that when you have swapping, everything becomes very slow: the OS saves parts of memory to disk, and disk access is slow compared to direct memory access. So swapping should be avoided.

An alternative might be to reduce the number of executors on that node; then you have fewer processes running in parallel and most likely less memory consumption.

This is super helpful, so thank you for the response. I actually don’t have access to the build node in question, so I’ll need to get a member of our infrastructure team to review it as well, but I do see that we are also down to 2.38 GB of free disk space. So as a first step I will get that disk space increased and check whether we can go from 5 to 4 executors.

By the way, free swap space of 0 can also mean that the machine has no swap space configured at all (I have this on my VMs).
And with just 2.4 GiB of free disk space, it is more likely that this was the cause of the failures. You should also check what is consuming the disk space; you can probably clean up old stuff and won’t need to increase the disk at all.
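To see what is eating the disk, something like this works on most Linux nodes. The /var/lib/jenkins path here is just an example; point du at the node’s actual remote root or other suspect directories:

```shell
# Free space per filesystem
df -h

# Biggest directories up to two levels below a suspect path (path is an example),
# largest first; -x stays on one filesystem, -h prints human-readable sizes
du -xh --max-depth=2 /var/lib/jenkins 2>/dev/null | sort -rh | head -n 20
```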

We’re looking into some of the larger projects and will look to enable the option to remove old builds. Are there any common folders that tend to cache files that I should be reviewing? I just found out this node is running in a Docker container.

Look at the configuration of your node; there is a Remote root directory. Inside it you will find a directory called workspace: this is where you find all the directories used by the jobs running on that node. If you have deleted jobs that once ran on that node, the corresponding directories never get deleted.
Next to workspace there might be other folders containing cached jar files or tools that are used by builds.
I would take the node offline (or, even better, disconnect it) to make sure you don’t interfere with running builds, and then you can remove everything in the root.
The next builds might then take a bit longer as they need to fetch everything from scratch, but you’re in a clean state then.
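A sketch of that cleanup, assuming the node’s Remote root directory is /home/jenkins (a hypothetical path; substitute the value from your node’s configuration), run only after the node is offline or disconnected:

```shell
# Hypothetical remote root; read the real value from the node's configuration
REMOTE_ROOT=/home/jenkins

# See what is in there before deleting anything
du -sh "${REMOTE_ROOT}"/* 2>/dev/null

# Remove all per-job workspaces (the :? guard aborts if the variable is empty,
# so an unset variable can never expand to "rm -rf /workspace/*")
rm -rf "${REMOTE_ROOT:?}/workspace/"*
```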

Enabling the build discarder is a good idea, but this only affects the disk consumption of the controller, not the nodes.
