Jenkins Performance Issues

Over the past several months, at least once a week and sometimes more often, we have been experiencing issues where performance just tanks and web requests to Jenkins take a very long time to complete. This seems to have started without any configuration changes or changes to anything installed on the server.
Our version information is below, and I have attached a thread dump:
threaddump.log (2.2 MB)

Jenkins: 2.297
OS: Windows Server 2016 - 10.0
Java: 11.0.4 - Amazon.com Inc. (OpenJDK 64-Bit Server VM)

Parameterized-Remote-Trigger:3.1.5.1
ace-editor:1.1
active-directory:2.24
analysis-model-api:10.2.5
ansicolor:1.0.0
ant:1.11
antisamy-markup-formatter:2.1
apache-httpcomponents-client-4-api:4.5.13-1.0
authentication-tokens:1.4
aws-credentials:1.29
aws-java-sdk:1.11.995
blueocean:1.24.7
blueocean-autofavorite:1.2.4
blueocean-bitbucket-pipeline:1.24.7
blueocean-commons:1.24.7
blueocean-config:1.24.7
blueocean-core-js:1.24.7
blueocean-dashboard:1.24.7
blueocean-display-url:2.4.1
blueocean-events:1.24.7
blueocean-git-pipeline:1.24.7
blueocean-github-pipeline:1.24.7
blueocean-i18n:1.24.7
blueocean-jira:1.24.7
blueocean-jwt:1.24.7
blueocean-personalization:1.24.7
blueocean-pipeline-api-impl:1.24.7
blueocean-pipeline-editor:1.24.7
blueocean-pipeline-scm-api:1.24.7
blueocean-rest:1.24.7
blueocean-rest-impl:1.24.7
blueocean-web:1.24.7
bootstrap4-api:4.6.0-3
bootstrap5-api:5.0.1-2
bouncycastle-api:2.20
branch-api:2.6.4
build-name-setter:2.2.0
build-timeout:1.20
build-token-root:1.7
build-user-vars-plugin:1.7
caffeine-api:2.9.1-23.v51c4e2c879c8
checks-api:1.7.0
cloudbees-bitbucket-branch-source:2.9.9
cloudbees-disk-usage-simple:0.10
cloudbees-folder:6.15
codedeploy:1.23
command-launcher:1.6
configuration-as-code:1.51
copyartifact:1.46.1
credentials:2.5
credentials-binding:1.25
data-tables-api:1.10.25-1
display-url-api:2.3.5
docker-commons:1.17
docker-workflow:1.26
doktor:0.4.1
durable-task:1.37
echarts-api:5.1.2-2
email-ext:2.83
emailext-template:1.2
extended-choice-parameter:0.82
extended-read-permission:3.2
external-monitor-job:1.7
favorite:2.3.3
font-awesome-api:5.15.3-3
forensics-api:1.1.0
git:4.7.2
git-client:3.7.2
git-server:1.9
github:1.33.1
github-api:1.123
github-branch-source:2.11.1
gradle:1.36
handlebars:3.0.8
handy-uri-templates-2-api:2.1.8-1.0
htmlpublisher:1.25
http_request:1.9.0
icon-shim:3.0.0
jackson2-api:2.12.3
javadoc:1.6
jaxb:2.3.0.1
jdk-tool:1.5
jenkins-design-language:1.24.7
jira:3.3
jira-steps:1.6.0
jjwt-api:0.11.2-9.c8b45b8bb173
job-dsl:1.77
job-import-plugin:3.4
jquery:1.12.4-1
jquery-detached:1.2.1
jquery3-api:3.6.0-1
jsch:0.1.55.2
junit:1.50
ldap:2.7
lockable-resources:2.11
mailer:1.34
mapdb-api:1.0.9.0
matrix-auth:2.6.7
matrix-project:1.19
maven-plugin:3.11
mercurial:2.15
metrics:4.0.2.8
momentjs:1.1.1
monitoring:1.87.0
okhttp-api:3.14.9
pam-auth:1.6
permissive-script-security:0.6
pipeline-aws:1.43
pipeline-build-step:2.13
pipeline-github-lib:1.0
pipeline-graph-analysis:1.11
pipeline-input-step:2.12
pipeline-milestone-step:1.3.2
pipeline-model-api:1.8.5
pipeline-model-definition:1.8.5
pipeline-model-extensions:1.8.5
pipeline-rest-api:2.19
pipeline-stage-step:2.5
pipeline-stage-tags-metadata:1.8.5
pipeline-stage-view:2.19
pipeline-timeline:1.0.3
pipeline-utility-steps:2.8.0
plain-credentials:1.7
plugin-util-api:2.3.0
popper-api:1.16.1-2
popper2-api:2.5.4-2
powershell:1.5
prometheus:2.0.10
promoted-builds:3.10
pubsub-light:1.15
rebuild:1.32
resource-disposer:0.16
role-strategy:3.1.1
s3:0.11.7
saml:2.0.6
schedule-build:0.5.1
scm-api:2.6.4
script-security:1.77
show-build-parameters:1.0
simple-theme-plugin:0.6
snakeyaml-api:1.29.1
sse-gateway:1.24
ssh-credentials:1.19
ssh-slaves:1.32.0
ssh-steps:2.0.0
sshd:3.0.3
structs:1.23
subversion:2.14.3
template-project:1.5.2
terraform:1.0.10
timestamper:1.13
token-macro:2.15
trilead-api:1.0.13
variant:1.4
versionnumber:1.9
warnings-ng:9.2.0
windows-slaves:1.8
workflow-aggregator:2.6
workflow-api:2.44
workflow-basic-steps:2.23
workflow-cps:2.92
workflow-cps-global-lib:2.19
workflow-durable-task-step:2.39
workflow-job:2.41
workflow-multibranch:2.24
workflow-scm-step:2.12
workflow-step-api:2.23
workflow-support:3.8
ws-cleanup:0.39

Java 11.0.4 is a very old patch level of Java 11. You want Java 11.0.18 (released 3 months ago) or Java 11.0.19 (due to be released this week, per the Java site). I don’t know if that will help with your performance issue, but it is a healthy improvement in any case.

Thank you for the suggestion. We have a few other Jenkins instances set up similarly to this one that are using the same Java version without this issue, so I am also not sure that would fix our problem. That being said, I will make a note of it for when we go through our next round of upgrades.

So I understand you’re on Windows.
How long does the “slowness” persist?
Do you have enough time to gather more info from the system during that time?

On Unix I would gather:

  • the load on the system
  • the top processes contributing to this load
  • the swapping/paging state of the system
  • memory usage, again by top “contributors”

From those top contributors you can distill a list of the main culprits.
(OK, if it is “java” you still might not know for sure who is using it.)
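If it is useful, here is a rough Java sketch of how that kind of snapshot could be taken from the JVM side. The output is only illustrative, and ProcessHandle reports accumulated CPU time rather than an instantaneous load, so treat it as a starting point; on Windows, Task Manager, Resource Monitor, or perfmon will give you the same picture.

```java
import java.lang.management.ManagementFactory;
import java.time.Duration;
import java.util.Comparator;

public class SystemSnapshot {
    public static void main(String[] args) {
        // System-wide CPU, memory and swap figures via the platform MXBean.
        // The com.sun.management cast is needed for the extended attributes.
        com.sun.management.OperatingSystemMXBean os =
                (com.sun.management.OperatingSystemMXBean)
                        ManagementFactory.getOperatingSystemMXBean();

        System.out.printf("System CPU load : %.0f%%%n", os.getSystemCpuLoad() * 100);
        System.out.printf("Physical memory : %d / %d MB free%n",
                os.getFreePhysicalMemorySize() / (1024 * 1024),
                os.getTotalPhysicalMemorySize() / (1024 * 1024));
        System.out.printf("Swap            : %d / %d MB free%n",
                os.getFreeSwapSpaceSize() / (1024 * 1024),
                os.getTotalSwapSpaceSize() / (1024 * 1024));

        // Top processes by accumulated CPU time (cumulative, not an instantaneous
        // load, but enough to see who the main contributors are).
        System.out.println("Top processes by total CPU time:");
        ProcessHandle.allProcesses()
                .filter(p -> p.info().totalCpuDuration().isPresent())
                .sorted(Comparator.comparing(
                        (ProcessHandle p) -> p.info().totalCpuDuration().orElse(Duration.ZERO))
                        .reversed())
                .limit(10)
                .forEach(p -> System.out.printf("  pid %-7d %-10s %s%n",
                        p.pid(),
                        p.info().totalCpuDuration().orElse(Duration.ZERO),
                        p.info().command().orElse("?")));
    }
}
```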

HTH

Martin

This is a production system that we do not want to leave in this state for too long, so we are usually fairly quick to restart Jenkins on it. That being said, the first couple of times we let it go a little longer, and it has never resolved itself without us restarting Jenkins.

We have also looked at other things running on the server and nothing has really stood out to us.

We attempted an upgrade of Java, Jenkins, and our plugins this evening. When running a couple of test jobs I almost immediately encountered timeouts on the new version and decided to back out of the upgrade. I did grab a couple of thread dumps from when we were running on the new version, which was 2.402.
threaddump3.log (154.7 KB)
threaddump4.log (115.2 KB)

We disabled the antivirus/security software on our server to rule that out and have since encountered the issue again. I have another thread dump as well as some data we were able to gather from JMX. It seems like something is causing threads to lock up, so the thread pool gets exhausted and new requests get stuck in the queue.
threaddump.log (2.3 MB)
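For anyone looking at something similar, a minimal sketch of pulling the same thread-state information over JMX is below. The host and port in the service URL are placeholders and assume remote JMX is enabled on the controller; a thread dump from jstack or the Jenkins monitoring page shows the same data.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class BlockedThreadReport {
    public static void main(String[] args) throws Exception {
        // Placeholder URL -- assumes the Jenkins JVM was started with remote JMX
        // enabled (e.g. -Dcom.sun.management.jmxremote.port=9010).
        JMXServiceURL url =
                new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:9010/jmxrmi");

        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            ThreadMXBean threads = ManagementFactory.newPlatformMXBeanProxy(
                    conn, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);

            // Report any threads stuck waiting on a monitor and who owns the lock.
            for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
                if (info.getThreadState() == Thread.State.BLOCKED) {
                    System.out.printf("BLOCKED %s waiting on %s held by %s%n",
                            info.getThreadName(), info.getLockName(), info.getLockOwnerName());
                }
            }

            // A non-null result here means the JVM has detected a true deadlock.
            long[] deadlocked = threads.findDeadlockedThreads();
            System.out.println("Deadlocked thread ids: "
                    + (deadlocked == null ? "none" : deadlocked.length));
        }
    }
}
```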

It turns out that we did not have a build discarder configured on an extremely active job that has been around for a few years, so we had ~273k builds sitting out there.

Since purging all the builds on disk for this job, we have been stable.

(We also set up a global build discarder on the controller)
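For anyone who wants to check for the same problem, a quick way to spot a job like this is to count the build records on disk. A minimal sketch is below; the JENKINS_HOME path and job name are placeholders for your own setup.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class BuildCount {
    public static void main(String[] args) throws IOException {
        // Placeholder path -- adjust JENKINS_HOME and the job name for your setup.
        Path buildsDir = Path.of("C:\\Jenkins\\jobs\\my-very-active-job\\builds");

        // Each numbered directory under builds/ is one retained build record.
        try (Stream<Path> entries = Files.list(buildsDir)) {
            long count = entries
                    .filter(Files::isDirectory)
                    .filter(p -> p.getFileName().toString().matches("\\d+"))
                    .count();
            System.out.println(buildsDir + " holds " + count + " build records");
        }
    }
}
```

For Pipeline jobs, the per-job equivalent of the discarder is the buildDiscarder(logRotator(...)) option in the Jenkinsfile, in addition to the global build discarder on the controller.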