We have been having a lot of memory issues with Jenkins (restarting it works fine for a few days). We even allocated 32G of memory, but it still eventually runs out.
We have noticed that many builds appear on the built-in node even though we restricted those builds to run on agents. That restriction is probably why, in the screenshot, the 2 executors of the built-in node are not in use, yet the builds still appear on the built-in node as well as on the agent.
We are wondering whether this is causing the memory issue. Any help would be greatly appreciated.
Parts of the pipeline run on the controller node regardless of what you’ve specified in your Jenkinsfile. It may be worth looking into what the actual pipeline is here.
I have a similar question, but not quite the same. I use Kubernetes as the cloud provider, and I can always see some pipeline jobs being “executed” on the built-in node, but after a couple hundred milliseconds they are “delivered” to nodes (in other words, pods) during busy times on workdays.
The pipeline job is delegated to the built-in node
Then the new node (pod) takes over this pipeline job
Is it the same problem, namely that the built-in node is running parts of the pipeline script that we cannot control?
The controller runs the Jenkinsfile itself because part of the pipeline identifies what agents to use. Once the pipeline has specified a specific node and that node is available, then the remainder of the job is processed on the agent itself.
We have the same issue on our Jenkins (2.414.3). When the load is high (more than 200 agents connected, plus AWS instances and k8s pods), I can see that the controller holds a list of executors, although it is configured with 0 executors, so no pipeline should run on it …
In my experience, most memory issues boil down to not cleaning up old builds (Jenkins really tries to read all build results it can find into memory). So make sure each job is configured to discard old builds in a sane way.
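As a rough illustration of that advice, a scripted pipeline can set its own retention policy. This is a minimal sketch, assuming the usual `buildDiscarder`/`logRotator` job property is available on your installation; the numbers are placeholders rather than a recommendation, and `myagent` is only an example label.

```groovy
// Keep at most 20 builds and nothing older than 30 days (placeholder values).
properties([
    buildDiscarder(logRotator(numToKeepStr: '20', daysToKeepStr: '30'))
])

node('myagent') {
    echo 'build steps go here'
}
```

The same retention can also be set per job in the UI; putting it in the Jenkinsfile just keeps it under version control.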
I would say that over 90% of pipeline code runs on the controller; something happens on the agent only when a step actually has to execute things there.
Let’s look at a simple (scripted) pipeline:
node('myagent') {
    echo "start of node"
    dir('test') {
        echo "inside test directory"
        withEnv(['myvar=value']) {
            sh '''
                echo $myvar
            '''
        }
    }
}
The first time that code runs on the agent is the sh step. The node step alone does not yet create the workspace, and neither does the dir step.
Only the sh step creates a temporary file on the agent containing the script and then calls the script. Some code now runs decoupled from the agent process (to ensure that a restart doesn’t kill the process), and some code inside the agent process forwards log output to the controller and reaps the status once the sh step has finished.
All the echo steps are run on the controller. It would only be unnecessary overhead to first transfer something to the agent java process just to print something to the log.
This is no different from freestyle jobs, where most code also runs in the controller; code is executed on the agent only when it comes to actually running an external process or collecting files from the workspace.
The difference between freestyle and pipeline is that a freestyle job always consumes an executor, while a pipeline job does so only while inside a node step.
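To make the executor difference concrete, here is a sketch of a scripted pipeline that occupies an agent executor only in the middle. It assumes an agent labelled `myagent` and that the `input` step (Pipeline: Input Step plugin) is installed; the messages are made up for illustration.

```groovy
// No node block yet: this runs on the controller and holds no regular executor.
echo 'preparing'

node('myagent') {
    // An executor on myagent is occupied only for the duration of this block.
    sh 'echo building'
}

// Outside the node block again: waiting for approval ties up no agent executor.
input message: 'Continue?'
echo 'done'
```

A freestyle job doing the equivalent work would hold its executor from start to finish.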
Pipeline jobs create flyweight executors on the controller. These are sometimes listed on the controller, but they do not consume any of the controller’s explicit executors and appear without an executor number. This is normal and expected behaviour.
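If you want to see those flyweight entries for yourself, something like the following can be run from the script console (it needs administrator rights). This is only a sketch built on the core `Computer` and `Executor` getters; the output format is my own choice.

```groovy
import jenkins.model.Jenkins

// List configured executors and busy flyweight (one-off) executors per computer.
Jenkins.get().computers.each { computer ->
    def busyOneOffs = computer.oneOffExecutors.findAll { !it.idle }
    println "${computer.displayName}: ${computer.numExecutors} configured executors, " +
            "${busyOneOffs.size()} flyweight executors in use"
    busyOneOffs.each { executor ->
        // currentExecutable is the run (or other task) occupying the flyweight slot
        println "  ${executor.currentExecutable}"
    }
}
```

On a controller configured with 0 executors you would expect the first number to be 0 while the second can still be non-zero.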
We are facing the same issue. We don’t have much job history, and Jenkins is running on a physical machine. But suddenly, after 4-5 days, we notice that the CPU is completely utilized and the controller (built-in) is holding 300-400 pipelines that are either completed or scheduled.
Please help if anyone has found a solution.
I would suspect some misbehaving plugin that is not properly releasing the pipeline run so that it stays in a state where it is completed but not yet finalized.
That can only be analyzed with the help of a stacktrace.
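As a quick first look before pulling a full dump, the script console can summarise threads by state. This is a sketch that relies only on the standard `java.lang.Thread` API; it is not a replacement for a real dump taken with `jstack` or the `/threadDump` page.

```groovy
// Snapshot of all live threads in this JVM together with their stack traces.
def traces = Thread.getAllStackTraces()

// Count threads per state (RUNNABLE, WAITING, TIMED_WAITING, BLOCKED, ...).
traces.keySet().groupBy { it.state }.each { state, threads ->
    println "${state}: ${threads.size()}"
}

// Print only the threads that are actually BLOCKED on a monitor.
traces.findAll { thread, stack -> thread.state == Thread.State.BLOCKED }.each { thread, stack ->
    println "\"${thread.name}\""
    stack.each { frame -> println "    at ${frame}" }
}
```

Threads in WAITING or TIMED_WAITING that are parked inside a thread pool are usually just idle workers, so the BLOCKED list and any busy RUNNABLE threads tend to be the more interesting part.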
Thank you for the reply, Markus. We took a thread dump during that time.
We are using Jenkins controller - 2.452.2
JAVA - 21.0.3+9-Ubuntu-1ubuntu120.04.1
Server OS - Ubuntu:20.04
Following are a few blocked threads:
"Reference Handler" #118 [1601178] daemon prio=10 os_prio=0 cpu=2.75ms elapsed=5.42s tid=0x00007efe00132860 nid=1601178 waiting on condition [0x00007efa3330e000]
java.lang.Thread.State: RUNNABLE
at java.lang.ref.Reference.waitForReferencePendingList(java.base@21.0.3/Native Method)
at java.lang.ref.Reference.processPendingReferences(java.base@21.0.3/Reference.java:246)
at java.lang.ref.Reference$ReferenceHandler.run(java.base@21.0.3/Reference.java:208)
"Finalizer" #119 [1601179] daemon prio=8 os_prio=0 cpu=0.26ms elapsed=5.42s tid=0x00007efe00133ee0 nid=1601179 in Object.wait() [0x00007efa3320d000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait0(java.base@21.0.3/Native Method)
- waiting on
at java.lang.Object.wait(java.base@21.0.3/Object.java:366)
at java.lang.Object.wait(java.base@21.0.3/Object.java:339)
at java.lang.ref.NativeReferenceQueue.await(java.base@21.0.3/NativeReferenceQueue.java:48)
at java.lang.ref.ReferenceQueue.remove0(java.base@21.0.3/ReferenceQueue.java:158)
at java.lang.ref.NativeReferenceQueue.remove(java.base@21.0.3/NativeReferenceQueue.java:89)
- locked <0x00004000208a9c20> (a java.lang.ref.NativeReferenceQueue$Lock)
at java.lang.ref.Finalizer$FinalizerThread.run(java.base@21.0.3/Finalizer.java:173)
"Common-Cleaner" #144 [1601186] daemon prio=8 os_prio=0 cpu=1.68ms elapsed=5.41s tid=0x00007efe00168760 nid=1601186 waiting on condition [0x00007efa32a35000]
java.lang.Thread.State: TIMED_WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@21.0.3/Native Method)
- parking to wait for <0x000040002091b268> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(java.base@21.0.3/LockSupport.java:269)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@21.0.3/AbstractQueuedSynchronizer.java:1847)
at java.lang.ref.ReferenceQueue.await(java.base@21.0.3/ReferenceQueue.java:71)
at java.lang.ref.ReferenceQueue.remove0(java.base@21.0.3/ReferenceQueue.java:143)
"org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution [#38026]" #1417807 [3375223] daemon prio=5 os_prio=0 cpu=20.28ms elapsed=18.30s tid=0x00007f4ef414a9c0 nid=3375223 waiting on condition [0x00007f55dc5c4000]
java.lang.Thread.State: TIMED_WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@21.0.3/Native Method)
- parking to wait for <0x00004009dea05eb8> (a java.util.concurrent.SynchronousQueue$Transferer)
at java.util.concurrent.locks.LockSupport.parkNanos(java.base@21.0.3/LockSupport.java:410)
at java.util.concurrent.LinkedTransferQueue$DualNode.await(java.base@21.0.3/LinkedTransferQueue.java:452)
at java.util.concurrent.SynchronousQueue$Transferer.xferLifo(java.base@21.0.3/SynchronousQueue.java:194)
at java.util.concurrent.SynchronousQueue.xfer(java.base@21.0.3/SynchronousQueue.java:233)
at java.util.concurrent.SynchronousQueue.poll(java.base@21.0.3/SynchronousQueue.java:336)
at java.util.concurrent.ThreadPoolExecutor.getTask(java.base@21.0.3/ThreadPoolExecutor.java:1069)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@21.0.3/ThreadPoolExecutor.java:1130)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@21.0.3/ThreadPoolExecutor.java:642)
at java.lang.Thread.runWith(java.base@21.0.3/Thread.java:1596)
at java.lang.Thread.run(java.base@21.0.3/Thread.java:1583)
"org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution [#38025]" #1417806 [3375212] daemon prio=5 os_prio=0 cpu=65.43ms elapsed=18.39s tid=0x00007f5354e29310 nid=3375212 waiting on condition [0x00007f56de7e6000]
java.lang.Thread.State: TIMED_WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@21.0.3/Native Method)
- parking to wait for <0x00004009dea05eb8> (a java.util.concurrent.SynchronousQueue$Transferer)
at java.util.concurrent.locks.LockSupport.parkNanos(java.base@21.0.3/LockSupport.java:410)
at java.util.concurrent.LinkedTransferQueue$DualNode.await(java.base@21.0.3/LinkedTransferQueue.java:452)
at java.util.concurrent.SynchronousQueue$Transferer.xferLifo(java.base@21.0.3/SynchronousQueue.java:194)
at java.util.concurrent.SynchronousQueue.xfer(java.base@21.0.3/SynchronousQueue.java:233)
at java.util.concurrent.SynchronousQueue.poll(java.base@21.0.3/SynchronousQueue.java:336)
at java.util.concurrent.ThreadPoolExecutor.getTask(java.base@21.0.3/ThreadPoolExecutor.java:1069)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@21.0.3/ThreadPoolExecutor.java:1130)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@21.0.3/ThreadPoolExecutor.java:642)
at java.lang.Thread.runWith(java.base@21.0.3/Thread.java:1596)
at java.lang.Thread.run(java.base@21.0.3/Thread.java:1583)