Jenkins UI slow on every page — server fast, HTTP/2 on, 6,470 MultiBranch branches, 130+ k8s agents

Our Jenkins dev controller is slow on every page for users across the company. We’ve spent ~6 hours debugging and need fresh eyes.

Setup

  • Jenkins 2.541.3 (Jetty 12.1.5, Java 21.0.9)
  • Docker container, AWS EC2 r5a.2xlarge (8 vCPU / 64 GB), us-east-1
  • JENKINS_HOME 177 GB on gp3 EBS (10k IOPS / 1000 MB/s)
  • 136 agents (130 online — mostly k8s ephemeral pods + ~30 persistent)
  • 130+ MultiBranch projects, 6,470 active branches, 436 top-level items
  • Users mostly hit it from Israel (≈150 ms RTT to us-east-1)

Symptom

“Every page feels like 10–30 s to load.” Started ~3 days ago.

What we measured

  • Server-side / TTFB: ~700 ms (fine)
  • /api/json: 18–22 ms (fine)
  • Dashboard / HTML payload: 4 MB uncompressed (~174 KB compressed)
  • /computer/api/json: 6–8 s (130+ agents enumerated serially) — fires on every page (sidebar)
  • iostat: load avg ~5, disk %util drains between waves
  • Thread dump: previously showed concurrent BranchIndexing flooding the controller with fsyncs
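
For context, here is a rough back-of-envelope sketch (an estimate, not a measurement) of how long the dashboard transfer alone should take at ≈150 ms RTT. It assumes classic TCP slow start with an initial window of 10 segments, a 1460-byte MSS, window doubling each round trip, and no loss. All of these are assumptions about the network path, and the TLS handshake and receive-window limits are ignored.

```python
import math

def slow_start_round_trips(payload_bytes, mss=1460, init_cwnd=10):
    """Estimate round trips needed to deliver payload_bytes on a fresh connection.

    Assumes idealized slow start: cwnd starts at init_cwnd segments and
    doubles every round trip, with no packet loss (an assumption).
    """
    segments = math.ceil(payload_bytes / mss)
    cwnd, sent, rounds = init_cwnd, 0, 0
    while sent < segments:
        sent += cwnd
        cwnd *= 2
        rounds += 1
    return rounds

RTT_MS = 150  # approximate RTT from Israel to us-east-1, from the setup above

for label, size in [("compressed (174 KB)", 174 * 1024),
                    ("uncompressed (4 MB)", 4 * 1024 * 1024)]:
    rounds = slow_start_round_trips(size)
    print(f"{label}: {rounds} round trips, ~{rounds * RTT_MS} ms transfer")
```

Under these assumptions the compressed page needs about 4 round trips (~600 ms), so network transfer by itself does not look like it accounts for 10–30 s.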

What we tried (each verified, none solved it)

  • Built-In Node numExecutors 8 → 2 (stopped BranchIndexing storm; confirmed in thread dump)
  • Pipeline durability MAX_SURVIVABILITY → PERFORMANCE_OPTIMIZED (cut per-pipeline fsync)
  • JVM heap -Xms8g → -Xms50g (matches -Xmx, so no more G1 heap resizing)
  • MAX_CONCURRENT_BRANCH_INDEXING=2 JVM property
  • JavaMelody monitoring filter disabled (-Djavamelody.disabled=true)
  • Datadog JMX disabled (was wired with no agent listening)
  • GC log enabled (-Xlog:gc*)
  • EBS gp3 throughput bumped 500 → 1000 MB/s
  • Deleted 3 dead persistent agents (packer-builder, Permanent, etc.)
  • Bulk-deleted some MultiBranch branches with last build >14 days
  • Enabled HTTP/2 on Jetty (--httpsPort=443 → --http2Port=443); verified that curl --http2 returns version=2
  • Tested with the MCP-server plugin disabled (just to rule it out); this actually made /computer/api/json worse, so we re-enabled it
  • 3 Jenkins restarts during the session
  • Zero plugins installed in last 5 days (ruled out recent plugin regression)
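
For anyone who wants to repeat the thread-dump check, this is a minimal sketch of how threads matching a keyword can be tallied by state. It assumes a jstack-style text dump (thread blocks separated by blank lines, each starting with a quoted thread name). The keyword "BranchIndexing" and the file name in the comment are illustrative, not the exact strings from our dumps.

```python
import re
from collections import Counter

def count_threads(dump_text, keyword):
    """Count threads whose block mentions keyword, grouped by thread state.

    Assumes jstack-style formatting; blocks that do not start with a
    quoted thread name (headers, footers) are skipped.
    """
    states = Counter()
    for block in re.split(r"\n\s*\n", dump_text.strip()):
        block = block.lstrip()
        if not block.startswith('"') or keyword not in block:
            continue
        match = re.search(r"java\.lang\.Thread\.State: (\w+)", block)
        states[match.group(1) if match else "UNKNOWN"] += 1
    return states

# Example (hypothetical file name):
# print(count_threads(open("threaddump.txt").read(), "BranchIndexing"))
```

Comparing the tally before and after a change (for example the numExecutors reduction) gives a quick view of whether the indexing threads actually went away.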

Where we suspect the remaining cost is

  • 4 MB HTML payload from “All” view enumerating 436 top-level items → browser parse/paint time
  • /computer/api/json 6–8 s blocks the executor-status sidebar AJAX on every nav
  • Possible plugin filter overhead we haven’t isolated
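
One experiment that might help isolate the /computer/api/json cost is timing the endpoint with and without the remote API's tree parameter, which restricts the fields returned. The sketch below is only a timing harness: the base URL is a placeholder, authentication is omitted, and whether a narrower tree actually shortens the response is exactly what it would test, not something we have confirmed.

```python
import time
import urllib.parse
import urllib.request

def api_url(base, path, tree=None):
    """Build a Jenkins remote-API JSON URL, optionally with a tree filter."""
    url = base.rstrip("/") + "/" + path.strip("/") + "/api/json"
    if tree:
        url += "?" + urllib.parse.urlencode({"tree": tree})
    return url

def timed_get(url, fetch=None):
    """Return (elapsed_seconds, body_length) for one GET of url.

    fetch is injectable so the harness can be exercised without a network;
    the default uses urllib with no credentials (an assumption, since most
    controllers will need an Authorization header added).
    """
    if fetch is None:
        fetch = lambda u: urllib.request.urlopen(u, timeout=60).read()
    start = time.perf_counter()
    body = fetch(url)
    return time.perf_counter() - start, len(body)

# Example (placeholder host):
# base = "https://jenkins.example.com"
# for tree in (None, "computer[displayName,offline]"):
#     print(tree, timed_get(api_url(base, "computer", tree)))
```

If the filtered request is much faster, the time is probably going into per-agent fields (such as node monitors) rather than into enumerating the agents themselves.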

What we’d love help with

  • Anyone seen /computer/api/json stuck at 6–8 s with ~130 mostly-k8s agents? Any way to speed it up beyond deleting agents?
  • Is there a way to render Jenkins’s “All” view without enumerating every MultiBranch project’s children?
  • Is the 4 MB HTML payload symptomatic of a known plugin (e.g. Blue Ocean, build-monitor) injecting per-item gadgets?
  • Recommended next step — instance upsize, branch retention overhaul, or some other knob we missed?

Happy to share thread dumps, GC logs, or specific URLs of slow requests on request.
