Our Jenkins dev controller is slow on every page for users across the company. We’ve spent ~6 hours debugging and need fresh eyes.
Setup
- Jenkins 2.541.3 (Jetty 12.1.5, Java 21.0.9)
- Docker container, AWS EC2 r5a.2xlarge (8 vCPU / 64 GB), us-east-1
- JENKINS_HOME 177 GB on gp3 EBS (10k IOPS / 1000 MB/s)
- 136 agents (130 online — mostly k8s ephemeral pods + ~30 persistent)
- 130+ MultiBranch projects, 6,470 active branches, 436 top-level items
- Users mostly hit it from Israel (≈150 ms RTT to us-east-1)
Symptom
“Every page feels like 10–30 s to load.” Started ~3 days ago.
What we measured
- Server-side / TTFB: ~700 ms (fine)
- /api/json: 18–22 ms (fine)
- Dashboard / HTML payload: 4 MB uncompressed (~174 KB compressed)
- /computer/api/json: 6–8 s (130+ agents enumerated serially) — fires on every page (sidebar)
- iostat: load avg ~5; disk %util drops back down between bursts of I/O
- Thread dump: previously showed concurrent BranchIndexing flooding the controller with fsyncs
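
For anyone who wants to re-measure the figures above from a client machine, here is a minimal sketch. It assumes anonymous read access (authentication is omitted), and `time_request` and the base URL in the usage note are hypothetical names for illustration. TTFB is only approximated as the time until the response headers arrive.

```python
import time
import urllib.request


def time_request(url, open_fn=urllib.request.urlopen, timeout=60):
    """Return (ttfb_seconds, total_seconds, body_bytes) for one GET.

    ttfb is approximated as the time until open_fn returns, i.e. until the
    status line and headers have been received. total also includes reading
    the whole body. urllib does not ask for gzip by default, so body_bytes
    is the uncompressed size.
    """
    start = time.perf_counter()
    with open_fn(url, timeout=timeout) as resp:
        ttfb = time.perf_counter() - start
        body = resp.read()
    total = time.perf_counter() - start
    return ttfb, total, len(body)


def probe(base_url, paths, open_fn=urllib.request.urlopen):
    """Time each path once and return one result dict per path."""
    results = []
    for path in paths:
        ttfb, total, size = time_request(base_url.rstrip("/") + path, open_fn)
        results.append({"path": path, "ttfb": ttfb, "total": total, "bytes": size})
    return results
```

Usage would look like `probe("https://jenkins.example.com", ["/", "/api/json", "/computer/api/json"])`, run both from a host near the controller and from a user's network, so that server time and the ≈150 ms RTT can be told apart.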
What we tried (each verified, none solved it)
- Built-In Node numExecutors 8 → 2 (stopped BranchIndexing storm; confirmed in thread dump)
- Pipeline durability MAX_SURVIVABILITY → PERFORMANCE_OPTIMIZED (cut per-pipeline fsync)
- JVM heap -Xms8g → -Xms50g (matches -Xmx, no more G1 heap resizing)
- MAX_CONCURRENT_BRANCH_INDEXING=2 JVM property
- JavaMelody monitoring filter disabled (-Djavamelody.disabled=true)
- Datadog JMX disabled (was wired with no agent listening)
- GC log enabled (-Xlog:gc*)
- EBS gp3 throughput bumped 500 → 1000 MB/s
- Deleted 3 dead persistent agents (packer-builder, Permanent, etc.)
- Bulk-deleted some MultiBranch branches whose last build was >14 days ago
- Enabled HTTP/2 on Jetty (--httpsPort=443 → --http2Port=443); verified curl --http2 returns version=2
- Tested with the MCP-server plugin disabled (just to rule it out); this actually made /computer/api/json worse, so we re-enabled it
- 3 Jenkins restarts during the session
- Zero plugins installed in the last 5 days (ruled out recent plugin regression)
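
The "confirmed in thread dump" checks above can be made repeatable with a small counter like the sketch below. It assumes the dump is plain text with one stanza per thread, separated by blank lines, and with each stanza starting with the quoted thread name. The sample dump is synthetic, not taken from our controller, and the keyword strings are only examples.

```python
def count_threads_matching(dump_text, keywords):
    """Count thread stanzas whose text mentions each keyword.

    A stanza is a blank-line-separated block that starts with a double
    quote (the thread name). Other blocks, such as headers, are skipped.
    """
    counts = {k: 0 for k in keywords}
    stanzas = dump_text.replace("\r\n", "\n").split("\n\n")
    for stanza in stanzas:
        stanza = stanza.strip()
        if not stanza.startswith('"'):
            continue
        for k in keywords:
            if k in stanza:
                counts[k] += 1
    return counts


# Synthetic three-thread dump, for illustration only.
SAMPLE_DUMP = "\n".join([
    '"worker-1" Id=101 RUNNABLE',
    "\tat sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java)",
    "\tat example.BranchIndexing.run(BranchIndexing.java)",
    "",
    '"worker-2" Id=102 RUNNABLE',
    "\tat example.BranchIndexing.run(BranchIndexing.java)",
    "",
    '"worker-3" Id=103 WAITING',
    "\tat example.Idle.park(Idle.java)",
])

print(count_threads_matching(SAMPLE_DUMP, ["BranchIndexing", "FileChannelImpl.force"]))
```

Running this against dumps taken before and after each change gives a quick count of how many threads are in branch indexing or in a file sync at that moment.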
Where we suspect the remaining cost is
- 4 MB HTML payload from “All” view enumerating 436 top-level items → browser parse/paint time
- /computer/api/json 6–8 s blocks the executor-status sidebar AJAX on every nav
- Possible plugin filter overhead we haven’t isolated
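
One way to narrow down the /computer/api/json cost might be the remote API's `tree` query parameter, which limits the fields that get serialized. The sketch below only builds the probe URLs. The base URL is hypothetical, and the field names (`displayName`, `offline`, `monitorData`) are our understanding of what the agent list exposes, so treat them as assumptions.

```python
from urllib.parse import quote


def computer_api_url(base_url, fields=None):
    """Build a /computer/api/json URL, optionally restricted via ?tree=.

    fields is a list of per-agent field names; None means the full,
    unrestricted response.
    """
    url = base_url.rstrip("/") + "/computer/api/json"
    if fields:
        tree = "computer[" + ",".join(fields) + "]"
        # Percent-encode the brackets but keep commas readable.
        url += "?tree=" + quote(tree, safe=",")
    return url


BASE = "https://jenkins.example.com"  # hypothetical controller URL

PROBES = {
    "full": computer_api_url(BASE),
    "names_only": computer_api_url(BASE, ["displayName", "offline"]),
    "with_monitors": computer_api_url(BASE, ["displayName", "monitorData"]),
}

for label, url in PROBES.items():
    print(label, url)
```

If the names-only variant comes back quickly while the variant with monitor data stays at 6–8 s, that would point at per-agent monitor data rather than at agent enumeration itself. If all three are slow, the cost is more likely in walking the agent list.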
What we’d love help with
- Anyone seen /computer/api/json stuck at 6–8 s with ~130 mostly-k8s agents? Any way to speed it up beyond deleting agents?
- Is there a way to render Jenkins’s “All” view without enumerating every MultiBranch project’s children?
- Is the 4 MB HTML payload symptomatic of a known plugin (e.g. Blue Ocean, build-monitor) injecting per-item gadgets?
- Recommended next step — instance upsize, branch retention overhaul, or some other knob we missed?
Happy to share thread dumps, GC logs, or the URLs of specific slow requests if that helps.