After rebooting the computer, Jenkins became unresponsive

On October 7, I upgraded our Jenkins installation to version 2.462.3. It is running on OpenJDK 21.0.4 on an Ubuntu-based system (20.04).

After this upgrade everything ran fine until October 22, when we had to shut down and restart the computer for maintenance on the power supply. After the reboot, Jenkins became unresponsive: the GUI would not load in the browser, the process was taking 1600% CPU, and the log was full of StackOverflowErrors and OutOfMemoryErrors.

After that, I cleaned up the entire configuration and started rebuilding it. That was Wednesday, October 23. At the end of the day, Jenkins still seemed to be running correctly with the same configuration it had before the reboot. Unfortunately, the next day Jenkins was completely unresponsive again.

After that I tried running Jenkins in Docker instances, but they also became unresponsive.

Is there something else I could try to resolve this issue? When Jenkins becomes unresponsive, there are no jobs running and Jenkins is effectively idle. It could be polling for source code changes, but that is the only activity I can think of right now.

When I start Jenkins and look at the log file, it first shows that Jenkins is up and running; the next thing logged is either an OutOfMemoryError or a StackOverflowError, so it is unclear what happens in between.
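Since the first error appears so soon after startup, it might help to let the JVM write a heap dump at the moment of failure; the standard HotSpot flags `-XX:+HeapDumpOnOutOfMemoryError` and `-XX:HeapDumpPath=/some/dir` do that, and the dump can then be opened in a tool like Eclipse MAT. To verify which arguments the running controller actually received, here is a minimal sketch using only standard java.lang.management APIs (the Script Console at <jenkinsurl>/script runs Java syntax as-is):

```java
// Sketch: list the JVM arguments the running controller actually received,
// to verify that heap settings and dump flags took effect.
// Note: System.out ends up in the controller's log, not on the console page.
import java.lang.management.ManagementFactory;

for (String arg : ManagementFactory.getRuntimeMXBean().getInputArguments()) {
    System.out.println(arg);
}
```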

I have also inspected and logged the Java threads, and as far as I can see, most threads are just waiting. That is expected, as the machine Jenkins runs on stays completely responsive; it is just Jenkins that is unresponsive (agent nodes are not found, the GUI cannot be loaded, etc.).
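For reference, a thread dump like this can be taken from outside the process with `jstack <pid>`, or from inside the JVM with a minimal sketch like the following (standard java.lang APIs only; runnable from the Script Console, and the five-frame limit is arbitrary):

```java
// Sketch: print every thread's name, state, and the top of its stack.
// Note: System.out goes to the controller's log rather than the console page.
import java.util.Map;

for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
    Thread t = e.getKey();
    System.out.println(t.getName() + " [" + t.getState() + "]");
    StackTraceElement[] frames = e.getValue();
    for (int i = 0; i < Math.min(5, frames.length); i++) {
        System.out.println("    at " + frames[i]);
    }
}
```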

Is there anything else that I could try?

Sincerely,
Marcel

Since you say you see OOMs: are you able to visit <jenkinsurl>/manage/systemInfo, and does the memory consumption look fine?
If that is the problem, you might need to give your VM more heap. When the heap is exhausted, Java spends all its time running garbage collection, and Jenkins becomes more or less unresponsive.
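(The maximum heap is set with the standard `-Xmx` JVM flag, e.g. `-Xmx4g`, in the service's JVM arguments.) You can also read the same numbers systemInfo reports straight from the Script Console; a quick sketch in Java syntax, which the Groovy console accepts as-is:

```java
// Sketch: the heap figures behind the systemInfo page, straight from the JVM.
// Note: System.out lands in the controller's log, not on the console page.
Runtime rt = Runtime.getRuntime();
long mb = 1024L * 1024L;
System.out.println("max heap:  " + rt.maxMemory() / mb + " MB");
System.out.println("committed: " + rt.totalMemory() / mb + " MB");
System.out.println("used:      " + (rt.totalMemory() - rt.freeMemory()) / mb + " MB");
```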

When Jenkins is unresponsive, I cannot load any page (I am not even able to log in). This happens very soon after restarting the Jenkins controller, so I hardly have time to navigate anywhere.

Using Linux “top”, I can see that in that situation (but also with a completely clean configuration) the Java process takes 17 GB of virtual memory (VIRT) and 2.5% of physical memory (%MEM).

I am currently rebuilding the job configuration again to see when the issue starts to occur, and so far (with a little less than half the jobs restored) Jenkins is still running fine. The systemInfo page now shows a little over 8 GB of memory available, with around 250 K in use.
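To catch the moment consumption starts to climb as the jobs come back, a small watcher could log heap usage once a minute; a minimal sketch in Java syntax for the Script Console (the timer name “heap-logger” is just an illustrative label):

```java
// Sketch: log heap usage once a minute so the onset of the problem is visible
// in the controller's log. The daemon flag keeps the timer from blocking shutdown.
import java.util.Timer;
import java.util.TimerTask;

Timer timer = new Timer("heap-logger", true);
timer.scheduleAtFixedRate(new TimerTask() {
    @Override
    public void run() {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;
        System.out.println("heap used=" + (rt.totalMemory() - rt.freeMemory()) / mb
                + " MB of max=" + rt.maxMemory() / mb + " MB");
    }
}, 0L, 60000L);
```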