Parallel stages are being killed

Hi, we have a Jenkins declarative script that schedules a set of stages across a set of agents using:

parallel p
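
For context, p is a map of stage names to closures, built roughly like this before the parallel call above (the stage names, agent label, and command here are placeholders for illustration, not our real build):

// Build a map of stage names to closures; "parallel p" then runs them
// concurrently, each closure grabbing its own executor via node().
def p = [:]
['unit', 'integration', 'system'].each { name ->
    p[name] = {
        node('build-agent') {          // hypothetical agent label
            stage(name) {
                sh "./run-tests.sh ${name}"
            }
        }
    }
}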

This has worked well, but since early August stages are being randomly killed.
The console log shows:

Killed

but there is no explanation or reason.

How can I determine the reason? Might it be caused by the Jenkins Process Tree Killer?

I have seen in the logs that the jobs are killed because of:

Cannot contact <server>: java.lang.InterruptedException

The system log shows:

org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep ... waiting for ... unresponsive for 3 min 0 sec
...
org.jenkinsci.plugins.workflow.support.concurrent.Timeout lambda$ping$0

Any suggestions on how to investigate or fix this, please?

I have been advised on Jira that this is likely “a configuration issue on the system that is running the agent. It may be killing the process due to out of memory issues or due to hardware failures or for other reasons that are outside the control of Jenkins.”

Does anyone have any advice on how to diagnose this, please?

The issue you’re experiencing with Jenkins stages being randomly killed could indeed be related to the Jenkins Process Tree Killer or other system-level issues. :person_shrugging:

Here are some steps that may help you diagnose and potentially resolve the issue: :crossed_fingers:

  1. Check System Resources:
    • Monitor the system resources (CPU, memory, disk I/O) on the agents to see if they are running out of resources.
    • Use tools like top, htop, or vmstat to monitor resource usage.
  2. Disable Process Tree Killer:
    • I’ve heard the Jenkins Process Tree Killer could sometimes kill processes that it shouldn’t. Or that was the case a long time ago. :person_shrugging: In any case, you could disable it by setting the hudson.util.ProcessTreeKiller.disable system property to true (if it still exists, I’m not so sure).
      Add the following to your Jenkins startup options:
      -Dhudson.util.ProcessTreeKiller.disable=true
  3. Increase Logging Level:
    • Increase the logging level for the relevant Jenkins components to get more detailed logs. You can do this in the Jenkins UI under Manage Jenkins → System Log → Add recorder (see the Script Console sketch after this list for a quick alternative).
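
For a quick look before setting up a recorder, you could also raise a logger level from the Script Console (Manage Jenkins → Script Console). This is only a sketch: the logger names below are simply the ones that appear in your system-log excerpt, and FINE is a guess at how much detail you need.

import java.util.logging.Level
import java.util.logging.Logger

// Temporarily raise verbosity for the classes seen in the "unresponsive for 3 min" messages.
// A UI log recorder is the persistent way to do this; Script Console changes are lost on restart.
Logger.getLogger('org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep').setLevel(Level.FINE)
Logger.getLogger('org.jenkinsci.plugins.workflow.support.concurrent.Timeout').setLevel(Level.FINE)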

@poddingue Thanks very much for your help. Do you have a suggestion for which logger(s) to use?

The Jenkins declarative script schedules multiple tasks onto an agent, one per available executor. We have a 48-core machine with 40 executors. When running multiple tasks, the machine becomes unresponsive; the Jenkins SSHLauncher times out and the jobs are killed.

What measures could we take to ensure that the machine remains responsive to SSH?

The JVM options for the agent do not specify heap memory limits using -Xms and -Xmx. The machine has 4GB memory. Should I set heap limits?
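
If so, I assume it would be something like this in the agent's JVM options field (the values here are placeholders for illustration, not a recommendation):

-Xms512m -Xmx2g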

When you have 48 cores and configure 40 executors, you probably assume that each executor only starts a single-threaded process. But many build tools read how many CPUs are available on the machine and then start that many compile jobs, so if you have such build jobs you can easily overload your machine.
On my Jenkins we build a bigger C/C++ project with cmake; the machine has 120 CPUs, but we only have one executor, as the build will consume all 120 CPUs and all of the 400 GiB of memory. If we used 2 executors, we would need to take extra action to ensure we don't overload the machine.
The whole machine has only 4 GiB of memory with 48 CPUs? That is not much.
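
If a build tool is auto-detecting the CPU count, one mitigation is to cap its parallelism explicitly in the pipeline. A minimal sketch (the label, tool, and job count are made-up examples, not your setup):

pipeline {
    agent { label 'linux-48core' }   // hypothetical agent label
    stages {
        stage('Build') {
            steps {
                // An explicit "-j 2" caps this build at 2 compile jobs; "make -j$(nproc)"
                // or a tool that auto-detects CPUs would start 48 jobs per executor,
                // i.e. up to 40 x 48 processes on a 48-core, 40-executor machine.
                sh 'make -j 2'
            }
        }
    }
}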

@mawinter69 Thanks for your reply. Our builds are single-threaded so each has only one process.

I made a stupid mistake checking the system memory - in fact we have 264GB.

Should I specify limits for the heap in the agent’s configuration page?