Jenkins: huge CPU utilization causes slowness

Hi Guys,

I have noticed periodic stalls (slowness) on a single-node Jenkins instance. I believe the problem is related to inappropriate JVM parameters.
The instance is a bit of a legacy setup, managed through the GUI (Jenkins 2.303.2).
It runs on an on-prem server with 128 GB RAM and 64 CPUs, and hosts 1000+ projects.

openjdk version "1.8.0_161"
OpenJDK Runtime Environment (build 1.8.0_161-b14)
OpenJDK 64-Bit Server VM (build 25.161-b14, mixed mode)

Recently, "top" showed %CPU reaching about 6000% during peak hours. I have since raised -Xms to 32 GB, because the GC log showed "Full GC" and "Allocation Failure ... heap already fully expanded" while the settings were -Xms16g -Xmx32g.
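As a quick sanity check on that figure: top, in its default Irix mode, reports a process's %CPU relative to a single core, so 6000% means roughly 60 of the 64 cores were busy. A minimal Python sketch of the arithmetic (the function names are illustrative only):

```python
def cores_busy(top_cpu_percent: float) -> float:
    """top (Irix mode) counts 100% as one fully busy core."""
    return top_cpu_percent / 100.0


def machine_utilization(top_cpu_percent: float, ncpus: int) -> float:
    """Fraction of the whole host's CPU capacity used by the process."""
    return top_cpu_percent / (100.0 * ncpus)


if __name__ == "__main__":
    peak = 6000.0  # the "6k" %CPU seen in top during peak hours
    print(f"~{cores_busy(peak):.0f} cores busy, "
          f"{machine_utilization(peak, 64):.1%} of a 64-CPU host")
```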

I am also reviewing the current JVM options for possible changes:

JENKINS_JAVA_OPTIONS="
-DSoftKillWaitSeconds=0
-Djava.awt.headless=true
-Djenkins.install.runSetupWizard=false
-DoctaneAllowedStorage=/apps/jenkins/
-Djava.io.tmpdir=/apps/jenkins/bin/temp
-Dhudson.model.DirectoryBrowserSupport.CSP=""
-Xms16g -Xmx32g -Xss1m -server
-XX:+UseG1GC
-XX:+ExplicitGCInvokesConcurrent
-XX:+ParallelRefProcEnabled
-XX:+UseStringDeduplication
-XX:+UnlockExperimentalVMOptions
-XX:G1NewSizePercent=40
-XX:+UnlockDiagnosticVMOptions
-XX:G1SummarizeRSetStatsPeriod=1
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=9010
-Dcom.sun.management.jmxremote.local.only=false
-Dcom.sun.management.jmxremote.authenticate=true
-Dcom.sun.management.jmxremote.password.file=/xx/jmxremote.password
-Dcom.sun.management.jmxremote.access.file=/xx/jmxremote.access
-Dcom.sun.management.jmxremote.ssl=false
-Djava.rmi.server.hostname=xxx
-Xloggc:/xx/gc-%t.log
-XX:NumberOfGCLogFiles=5 -XX:+UseGCLogFileRotation
-XX:GCLogFileSize=20m
-XX:+PrintGC
-XX:+PrintGCDateStamps
-XX:+PrintGCDetails
-XX:+PrintHeapAtGC
-XX:+PrintGCCause
-XX:+PrintTenuringDistribution
-XX:+PrintReferenceGC
-XX:+PrintAdaptiveSizePolicy
-Dhudson.upstreamCulprits=true
-Ddevops_pipeline_jenkins"
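Two of these settings can be checked against the GC log further down with a little arithmetic. A minimal Python sketch, assuming JDK 8 HotSpot semantics: -XX:G1NewSizePercent=40 sets the minimum young generation to 40% of the current heap, and because -XX:ParallelGCThreads is not set, HotSpot derives the number of GC worker threads from the CPU count (one per CPU up to 8, then 5 per 8 additional CPUs). The helper names are hypothetical.

```python
GIB = 1024 ** 3


def g1_min_young_bytes(heap_bytes: int, new_size_percent: int) -> int:
    """Minimum young-generation size implied by -XX:G1NewSizePercent."""
    return heap_bytes * new_size_percent // 100


def default_parallel_gc_threads(ncpus: int) -> int:
    """JDK 8 HotSpot default for -XX:ParallelGCThreads when it is not set:
    one thread per CPU up to 8, then 5 threads per 8 additional CPUs."""
    if ncpus <= 8:
        return ncpus
    return 8 + (ncpus - 8) * 5 // 8


if __name__ == "__main__":
    heap = 32 * GIB  # -Xmx32g; the GC log shows the heap fully expanded
    young = g1_min_young_bytes(heap, 40)  # -XX:G1NewSizePercent=40
    print(f"min young gen: {young / GIB:.1f} GiB, "
          f"left for old gen: {(heap - young) / GIB:.1f} GiB")
    print(f"default GC worker threads on 64 CPUs: {default_parallel_gc_threads(64)}")
```

On this host that works out to roughly 12.8 GB of young generation (leaving about 19.2 GB for the old generation) and 43 GC worker threads, which lines up with the "Eden: ...(12.8G)" and "GC Workers: 43" entries in the log.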

Usually the recommended approach is to take a thread dump and have an expert look at it. Not many forum users have the knowledge and patience for this advanced topic.

I know people have had luck with CloudBees support for this, as they have tooling to help analyze thread dumps, but it's not something I've ever done personally.

jstack is not available to me, and the thread dump is not appearing either.
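Without jstack, a HotSpot JVM will still write a thread dump when it receives SIGQUIT (kill -3 <pid>), but the dump goes to the JVM's own standard output (wherever Jenkins' stdout is redirected, e.g. the Jenkins log file or the systemd journal), not to the terminal that sent the signal, which may be why it does not appear. Jenkins also exposes thread dumps to administrators at JENKINS_URL/threadDump. Once a dump is captured, a reasonable first pass is to count threads per state. The sketch below assumes the HotSpot text format written by kill -3 or jstack; the function name is illustrative.

```python
import re
from collections import Counter

# HotSpot dumps print one "java.lang.Thread.State: <STATE>" line per Java thread,
# optionally followed by detail such as "(on object monitor)".
STATE_RE = re.compile(r"java\.lang\.Thread\.State: ([A-Z_]+)")


def thread_state_histogram(dump_text: str) -> Counter:
    """Count Java threads by state (RUNNABLE, BLOCKED, WAITING, ...)."""
    return Counter(STATE_RE.findall(dump_text))
```

For example, thread_state_histogram(Path("dump.txt").read_text()).most_common() (with Path from pathlib) lists the most frequent states first; a large BLOCKED count is a hint to check which monitor those threads are waiting on.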

[GC pause (G1 Evacuation Pause) (young)
Desired survivor size 864026624 bytes, new threshold 15 (max 15)
5907.453: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 291934, predicted base time: 375.32 ms, remaining time: 0.00 ms, target pause time: 200.00 ms]
5907.453: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 1 regions, survivors: 0 regions, predicted young region time: 0.26 ms]
5907.453: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 1 regions, survivors: 0 regions, old: 0 regions, predicted pause time: 375.57 ms, target pause time: 200.00 ms]
5907.454: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: region allocation request failed, allocation request: 114064 bytes]
5907.454: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 114064 bytes, attempted expansion amount: 16777216 bytes]
5907.454: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap already fully expanded]
2022-05-17T10:48:04.264+0000: 5907.509: [SoftReference, 0 refs, 0.0069123 secs]2022-05-17T10:48:04.271+0000: 5907.516: [WeakReference, 0 refs, 0.0039032 secs]2022-05-17T10:48:04.274+0000: 5907.520: [FinalReference, 0 refs, 0.0043186 secs]2022-05-17T10:48:04.279+0000: 5907.524: [Phan
(to-space exhausted), 0.1000980 secs]
   [Parallel Time: 51.9 ms, GC Workers: 43]
      [GC Worker Start (ms): Min: 5907453.4, Avg: 5907454.1, Max: 5907454.9, Diff: 1.5]
      [Ext Root Scanning (ms): Min: 3.1, Avg: 5.3, Max: 48.5, Diff: 45.5, Sum: 228.3]
      [Update RS (ms): Min: 0.0, Avg: 42.9, Max: 44.8, Diff: 44.8, Sum: 1845.4]
         [Processed Buffers: Min: 0, Avg: 27.2, Max: 44, Diff: 44, Sum: 1168]
      [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Object Copy (ms): Min: 0.3, Avg: 1.0, Max: 1.6, Diff: 1.3, Sum: 44.8]
      [Termination (ms): Min: 0.0, Avg: 0.4, Max: 0.9, Diff: 0.9, Sum: 18.6]
         [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 43]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.4, Diff: 0.4, Sum: 5.5]
      [GC Worker Total (ms): Min: 49.0, Avg: 49.8, Max: 50.8, Diff: 1.7, Sum: 2143.5]
      [GC Worker End (ms): Min: 5907503.8, Avg: 5907503.9, Max: 5907504.2, Diff: 0.4]
   [Code Root Fixup: 0.3 ms]
   [Code Root Purge: 0.0 ms]
   [String Dedup Fixup: 11.5 ms, GC Workers: 43]
      [Queue Fixup (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Table Fixup (ms): Min: 9.8, Avg: 10.3, Max: 10.7, Diff: 0.9, Sum: 440.8]
   [Clear CT: 2.3 ms]
   [Other: 34.2 ms]
      [Evacuation Failure: 1.7 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 26.4 ms]
      [Ref Enq: 2.3 ms]
      [Redirty Cards: 1.8 ms]
      [Humongous Register: 0.3 ms]
      [Humongous Reclaim: 0.2 ms]
      [Free CSet: 0.1 ms]
   [Eden: 16.0M(12.8G)->0.0B(12.8G) Survivors: 0.0B->0.0B Heap: 31.6G(32.0G)->31.6G(32.0G)]
Heap after GC invocations=1366 (full 68):
garbage-first heap total 33554432K, used 33158031K [0x00007f99d8000000, 0x00007f99d9004000, 0x00007fa1d8000000)
region size 16384K, 0 young (0K), 0 survivors (0K)
Metaspace used 392628K, capacity 423348K, committed 480728K, reserved 481280K
}
[Times: user=2.67 sys=0.00, real=0.10 secs]
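In this excerpt the young pause reclaimed almost nothing: heap usage is 31.6G of 32.0G both before and after, the pause hit "to-space exhausted" (an evacuation failure), and the footer reports 68 full GCs out of 1366 collections. To see how often this happens across the rotated GC logs, a minimal Python sketch can count those markers. It assumes the JDK 8 -XX:+PrintGCDetails format shown above, and the names are illustrative.

```python
import re

_UNITS = {"B": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

# G1 per-pause summary, e.g. "Heap: 31.6G(32.0G)->31.6G(32.0G)"
_HEAP_RE = re.compile(
    r"Heap: ([\d.]+)([BKMG])\(([\d.]+)([BKMG])\)"
    r"->([\d.]+)([BKMG])\(([\d.]+)([BKMG])\)"
)


def heap_after_gc_ratio(line):
    """Fraction of heap capacity still used after a G1 pause, or None."""
    m = _HEAP_RE.search(line)
    if m is None:
        return None
    used = float(m.group(5)) * _UNITS[m.group(6)]
    capacity = float(m.group(7)) * _UNITS[m.group(8)]
    return used / capacity


def summarize_gc_log(lines, full_threshold=0.95):
    """Count evacuation failures, Full GCs and pauses that left the heap nearly full."""
    lines = list(lines)  # iterated several times below
    ratios = [heap_after_gc_ratio(line) for line in lines]
    return {
        "to_space_exhausted": sum("to-space exhausted" in line for line in lines),
        "full_gc": sum("Full GC" in line for line in lines),
        "heap_still_full": sum(
            1 for r in ratios if r is not None and r >= full_threshold
        ),
    }
```

For example, summarize_gc_log(Path("gc.log").read_text().splitlines()) (with Path from pathlib) gives one summary per file; with log rotation enabled, each rotated file can be summarized separately and compared across peak and off-peak hours.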