Hey Experts,
We are using
Jenkins: 2.426.3
Java: 11.0.23
Apache Tomcat: 9.0.85
We have seen our Jenkins randomly lose JNLP connections due to:
07-Aug-2024 07:58:20.453 SEVERE [TCP agent listener port=50000] hudson.TcpSlaveAgentListener.run Failed to accept TCP connections
java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
at java.base/java.lang.Thread.start0(Native Method)
at java.base/java.lang.Thread.start(Thread.java:803)
at hudson.TcpSlaveAgentListener.run(TcpSlaveAgentListener.java:188)
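For reference, these are the checks we plan to run the next time this happens, to see whether we are hitting an OS/process limit rather than heap exhaustion (the pgrep pattern is an assumption based on our Tomcat launch command; adjust as needed):

# Find the Tomcat/Jenkins JVM (pattern is an assumption, adjust to your launcher)
JENKINS_PID=$(pgrep -f 'org.apache.catalina.startup.Bootstrap' | head -n1)

# Effective limits of the running process (these can differ from limits.conf,
# e.g. when the service is started by systemd)
grep -E 'processes|open files' /proc/${JENKINS_PID}/limits

# Number of native threads currently in the JVM
ps -o nlwp= -p ${JENKINS_PID}

# Total threads owned by the same user (this is what the nproc limit counts)
ps -L -u "$(ps -o user= -p ${JENKINS_PID})" --no-headers | wc -l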
Our JVM options are currently set to:
export JAVA_OPTS="-DJENKINS_HOME=$JENKINS_HOME \
-XX:ReservedCodeCacheSize=1024m -XX:+UseCodeCacheFlushing -Xms$MIN_HEAP_SIZE -Xmx$MAX_HEAP_SIZE \
-XX:+UseG1GC -XX:G1ReservePercent=20 \
-Xlog:gc:/usr/local/tomcat/logs/jenkins.gc-%t.log \
-Xlog:gc \
-Xlog:age*=debug \
-Dhudson.ClassicPluginStrategy.useAntClassLoader=true \
-Dkubernetes.websocket.ping.interval=20000 \
-Dhudson.slaves.SlaveComputer.allowUnsupportedRemotingVersions=true \
-Djava.awt.headless=true -Dhudson.slaves.ChannelPinger.pingIntervalSeconds=300"
export JAVA_OPTS="$JAVA_OPTS -Dhudson.model.DirectoryBrowserSupport.CSP=\"default-src 'none' netdna.bootstrapcdn.com; img-src 'self' 'unsafe-inline' data:; style-src 'self' 'unsafe-inline' https://www.google.com ajax.googleapis.com netdna.bootstrapcdn.com; script-src 'self' 'unsafe-inline' 'unsafe-eval' https://www.google.com ajax.googleapis.com netdna.bootstrapcdn.com cdnjs.cloudflare.com; child-src 'self';\" -Dcom.cloudbees.hudson.plugins.folder.computed.ThrottleComputationQueueTaskDispatcher.LIMIT=100"
with MAX_HEAP_SIZE = 200 GB and MIN_HEAP_SIZE = ~12 GB.
We are using an OL8 VM with 80 OCPUs and 1280 GB of memory.
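One diagnostic we have not enabled yet (so the flag below is an addition, not part of our current options) is JVM native memory tracking, to see how much native memory the thread stacks, code cache, and GC structures consume outside the 200 GB heap:

# Add to JAVA_OPTS and restart; NMT adds a small overhead
export JAVA_OPTS="$JAVA_OPTS -XX:NativeMemoryTracking=summary"

# While the JVM is running, query the per-category breakdown (Thread, Code, GC, ...)
jcmd <tomcat-pid> VM.native_memory summary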
Our Jenkins ulimits are set to:
cat >>/etc/security/limits.conf <<EOF
<user> soft nofile 16384
<user> hard nofile 65536
<user> soft nproc 16384
<user> hard nproc 32768
EOF
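Since limits.conf only applies to PAM login sessions, we also want to confirm the running service actually inherits these values; a sketch assuming Tomcat is started as a systemd unit named tomcat (the unit name is an assumption):

# Limits systemd applies to the service (independent of /etc/security/limits.conf)
systemctl show tomcat -p LimitNOFILE -p LimitNPROC

# If needed, raise them with a drop-in instead of limits.conf
mkdir -p /etc/systemd/system/tomcat.service.d
cat >/etc/systemd/system/tomcat.service.d/limits.conf <<EOF
[Service]
LimitNOFILE=65536
LimitNPROC=65536
EOF
systemctl daemon-reload
systemctl restart tomcat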
Also attaching the thread dump collected:
jstack_2024-08-07_699666.log (1.2 MB)
From live monitoring we are seeing:
Threads on xxx-jenkins@<IP>: Number = 522, Maximum = 1,317, Total started = 860,454
A good number of threads are either in WAITING or TIMED_WAITING state.
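For reference, a rough way we are breaking the attached dump down by thread state and by thread name (plain grep/awk over the standard HotSpot jstack format; the trailing-index stripping is approximate):

# Threads per state in the attached dump
grep -o 'java.lang.Thread.State: [A-Z_]*' jstack_2024-08-07_699666.log | sort | uniq -c | sort -rn

# Most frequent thread names (trailing pool indices stripped) to spot a leaking pool
awk -F'"' '/^"/ {print $2}' jstack_2024-08-07_699666.log | sed 's/[][0-9#-]*$//' | sort | uniq -c | sort -rn | head -20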
Can someone assist in analysing this issue further, or suggest steps we can take to avoid it in the future?
Regards