Hi, we have a fairly large Jenkins instance, but we have an annoying issue: when we click to start a task, it just freezes for 10 minutes or throws a 504 error.
The problem is that we cannot find the root cause of this behaviour. There is nothing in the Jenkins logs, and there are plenty of resources available (CPU, memory, disk, IOPS); nothing hits its limits.
We have tried monitoring with the Jenkins Prometheus plugin, but whenever the lag occurs, the plugin also stops sending metrics.
The type of issue you’re experiencing, where Jenkins tasks freeze or throw 504 errors without any apparent resource bottleneck, can be challenging to debug. Could you let us know your operating system, Java version and vendor, and Jenkins version?
Here are a few areas to explore:
Thread Blocking or Deadlocks:
If Jenkins threads are blocked or waiting on locks, it can lead to UI freezes or task initiation delays.
Garbage Collection (GC) Issues:
Long GC pauses can halt Jenkins, especially if the JVM heap is not properly tuned.
Plugin Issues:
Misbehaving or outdated plugins can cause delays, particularly during task execution.
Network or Reverse Proxy Timeouts:
If Jenkins is operating behind a proxy (like NGINX, HAProxy, or Apache), misconfigured timeouts might cause 504 errors during prolonged requests.
Too Many Concurrent Jobs:
A large number of builds or excessive job queue processing can overload Jenkins’ internal task scheduler.
Controller-Agent Communication Issues:
Problems in communication between the controller and agents can cause tasks to hang.
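For the first item (blocked threads), the most direct evidence is usually a thread dump captured while the hang is happening. The Jenkins `/threadDump` page may itself be unresponsive at that moment, so running `jstack <pid>` or `jcmd <pid> Thread.print` on the controller host is likely more reliable. Below is a minimal Python sketch, assuming the dump is in the standard HotSpot text format, that counts thread states and groups threads waiting on the same monitor. The function name `summarize_thread_dump` and the sample thread and class names are hypothetical and only for illustration. It only looks at object monitors, so threads parked on `java.util.concurrent` locks will not show up as contended here.

```python
import re
from collections import Counter, defaultdict

# Patterns for the HotSpot thread dump text format (jstack / jcmd Thread.print).
THREAD_HEADER = re.compile(r'^"(?P<name>.+?)"')
THREAD_STATE = re.compile(r'java\.lang\.Thread\.State: (?P<state>[A-Z_]+)')
WAITING_TO_LOCK = re.compile(r'- waiting to lock <(?P<lock>0x[0-9a-f]+)>')
WAITING_ON = re.compile(r'- waiting on <(?P<lock>0x[0-9a-f]+)>')
LOCKED = re.compile(r'- locked <(?P<lock>0x[0-9a-f]+)>')


def summarize_thread_dump(text):
    """Return (state counts, contended monitors) for one thread dump."""
    states = Counter()
    waiters = defaultdict(list)   # monitor address -> names of blocked threads
    owners = {}                   # monitor address -> name of a thread holding it
    current = None                # name of the thread entry being parsed
    released = set()              # monitors the current thread gave up via Object.wait()

    for line in text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current = header.group("name")
            released = set()
            continue
        if current is None:
            continue  # preamble before the first thread entry

        match = THREAD_STATE.search(line)
        if match:
            states[match.group("state")] += 1
            continue
        match = WAITING_TO_LOCK.search(line)
        if match:
            waiters[match.group("lock")].append(current)
            continue
        match = WAITING_ON.search(line)
        if match:
            released.add(match.group("lock"))
            continue
        match = LOCKED.search(line)
        if match and match.group("lock") not in released:
            # A thread in Object.wait() still prints "locked" for the monitor
            # it released, so those entries are skipped above.
            owners.setdefault(match.group("lock"), current)

    contended = {
        lock: {"owner": owners.get(lock), "waiters": names}
        for lock, names in waiters.items()
    }
    return dict(states), contended


# Illustrative input only: made-up threads and classes, not real Jenkins output.
SAMPLE = '''\
"request-handler-1" #101 prio=5 os_prio=0 tid=0x00007f0001 nid=0x1a2b waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)
    at com.example.Scheduler.submit(Scheduler.java:42)
    - waiting to lock <0x00000000c0a1b2c3> (a com.example.Scheduler)

"request-handler-2" #102 prio=5 os_prio=0 tid=0x00007f0002 nid=0x1a2c waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)
    at com.example.Scheduler.submit(Scheduler.java:42)
    - waiting to lock <0x00000000c0a1b2c3> (a com.example.Scheduler)

"background-worker" #55 prio=5 os_prio=0 tid=0x00007f0003 nid=0x1a2d runnable
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at com.example.Scheduler.refresh(Scheduler.java:88)
    - locked <0x00000000c0a1b2c3> (a com.example.Scheduler)
'''

if __name__ == "__main__":
    state_counts, contended_locks = summarize_thread_dump(SAMPLE)
    print(state_counts)
    for lock, info in contended_locks.items():
        print(lock, "held by", info["owner"], "blocking", info["waiters"])
```

Taking two or three dumps a few seconds apart during a freeze and comparing them helps: if the same thread holds the same monitor in every dump while request threads pile up behind it, the stack of that owner thread is a good place to start looking.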
We upgrade Jenkins and its plugins regularly (we have over 200 plugins).
We are using an AWS ALB as the load balancer, but I'm not sure which settings could be tuned if it is causing the issue.
Also, we have a lot of agents running in different AWS accounts and regions, which could cause some delays between the controller and the agents, but all of them are in AWS, so the connections are quite reliable.
Basically, the rest of the Jenkins UI works fine, but it hangs when starting a new task/job: nothing happens for a few minutes, and then a new task number is created and the task starts…