Hi, we have a fairly large Jenkins instance, but we have an annoying issue: when we click to start a task, it just freezes for 10 minutes or throws a 504 error.
The problem is that we cannot find the root cause of this behaviour. There is nothing in the Jenkins logs, and CPU, memory, disk, and IOPS all have plenty of headroom; nothing hits its limits.
We have tried Jenkins-prometheus monitoring, but whenever the lag occurs, the plugin also stops sending metrics.
The type of issue you’re experiencing, where Jenkins tasks freeze or throw 504 errors without any apparent resource bottleneck, can be challenging to debug. Could you let us know your operating system, Java version and vendor, and Jenkins version?
Here are a few areas to explore:
Thread Blocking or Deadlocks:
If Jenkins threads are blocked or waiting on locks, it can lead to UI freezes or task initiation delays.
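If you can capture a thread dump while the freeze is happening (for example with `jstack <pid>` on the controller), a small script can help spot lock contention. The following is only a sketch: it assumes the HotSpot `jstack` text format, and the function name `summarize_thread_dump` is made up for illustration. It only sees intrinsic monitors ("waiting to lock" / "locked"); threads parked on `java.util.concurrent` locks show up as WAITING with "parking to wait for" and are not attributed to a holder here.

```python
import re
from collections import Counter

# Patterns for the HotSpot jstack text format (an assumption about your JVM).
THREAD_HEADER = re.compile(r'^"(?P<name>.*)"')
STATE_LINE = re.compile(r'java\.lang\.Thread\.State: (?P<state>[A-Z_]+)')
WAITING_LOCK = re.compile(r'- waiting to lock <(?P<addr>0x[0-9a-fA-F]+)>')
HOLDING_LOCK = re.compile(r'- locked <(?P<addr>0x[0-9a-fA-F]+)>')


def summarize_thread_dump(text):
    """Return (thread state counts, contended monitors) for a jstack dump."""
    states = Counter()
    waiters = {}  # monitor address -> names of threads waiting for it
    holders = {}  # monitor address -> name of the thread holding it
    current = None
    for line in text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current = header.group("name")
            continue
        if current is None:
            continue  # preamble before the first thread entry
        state = STATE_LINE.search(line)
        if state:
            states[state.group("state")] += 1
            continue
        waiting = WAITING_LOCK.search(line)
        if waiting:
            waiters.setdefault(waiting.group("addr"), []).append(current)
            continue
        holding = HOLDING_LOCK.search(line)
        if holding:
            holders.setdefault(holding.group("addr"), current)
    contended = {
        addr: {"holder": holders.get(addr, "unknown"), "waiters": names}
        for addr, names in waiters.items()
    }
    return states, contended
```

Taking two or three dumps a few seconds apart and comparing the results is usually more telling than a single dump: a monitor with the same holder and a growing list of waiters across dumps is a good candidate for the cause of the freeze.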
Garbage Collection (GC) Issues:
Long GC pauses can halt Jenkins, especially if the JVM heap is not properly tuned.
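If GC logging is enabled, you can scan the log for pauses long enough to explain a freeze. The sketch below assumes Java 9 or newer with unified logging (for example `-Xlog:gc*:file=gc.log`), where pause summary lines end in a duration such as `35.123ms`; Java 8 logs use a different format and would need a different pattern. The function name and the 1000 ms default threshold are arbitrary choices for illustration.

```python
import re

# Matches unified-logging pause lines such as:
#   [10.1s][info][gc] GC(7) Pause Young (Normal) (G1 Evacuation Pause) 2048M->512M(4096M) 35.123ms
PAUSE_LINE = re.compile(
    r'GC\((?P<id>\d+)\)\s+(?P<kind>Pause\b.*?)\s+(?:\S+->\S+\s+)?(?P<ms>\d+(?:\.\d+)?)ms\s*$'
)


def find_long_pauses(log_text, threshold_ms=1000.0):
    """Return (gc_id, pause kind, duration in ms) for pauses >= threshold_ms."""
    long_pauses = []
    for line in log_text.splitlines():
        match = PAUSE_LINE.search(line)
        if not match:
            continue  # not a pause summary line
        duration_ms = float(match.group("ms"))
        if duration_ms >= threshold_ms:
            long_pauses.append((int(match.group("id")), match.group("kind"), duration_ms))
    return long_pauses
```

If the timestamps of the long pauses line up with the times the UI froze, heap sizing or collector choice is worth a closer look; if the log shows no long pauses during a freeze, GC can probably be ruled out.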
Plugin Issues:
Misbehaving or outdated plugins can cause delays, particularly during task execution.
Network or Reverse Proxy Timeouts:
If Jenkins is operating behind a proxy (like NGINX, HAProxy, or Apache), misconfigured timeouts might cause 504 errors during prolonged requests.
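One way to check whether the proxy is the one giving up is to time the same request through the proxy and directly against the controller's HTTP port. If failures through the proxy consistently arrive after a round number of seconds, that suggests a configured timeout rather than Jenkins itself answering. This is a rough sketch using only the Python standard library; the URLs you pass in, the candidate timeout values, and the tolerance are all assumptions to adjust for your setup (60 seconds is included because it is the default `proxy_read_timeout` in NGINX).

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout_s=900.0):
    """Request url once; return (HTTP status or None, elapsed seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code  # e.g. 504 returned by the proxy
    except (urllib.error.URLError, OSError):
        status = None  # connection failure or client-side timeout
    return status, time.monotonic() - start


def near_timeout(elapsed_s, candidates=(30, 60, 120, 300, 600), tolerance_s=1.0):
    """Return the candidate timeout that elapsed_s is close to, or None."""
    for candidate in candidates:
        if abs(elapsed_s - candidate) <= tolerance_s:
            return candidate
    return None
```

For example, if `probe` against the proxy URL returns `(504, 60.02)` while the same path requested directly on the controller eventually succeeds after several minutes, the 504 is coming from the proxy timeout and the slow response from Jenkins is a separate problem to chase.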
Too Many Concurrent Jobs:
A large number of builds or excessive job queue processing can overload Jenkins’ internal task scheduler.
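To get a quick picture of the build queue, you can fetch `/queue/api/json` from the controller and summarize it. The sketch below works on the already-parsed JSON and assumes the usual fields on queue items (`inQueueSince` in epoch milliseconds, `why`, `stuck`); the function name is made up, and fetching and authentication are left out.

```python
from collections import Counter


def summarize_queue(queue_json, now_ms):
    """Summarize a parsed /queue/api/json response.

    now_ms is the current time in epoch milliseconds, passed in explicitly
    so the result does not depend on the local clock.
    """
    items = queue_json.get("items", [])
    waits_s = [
        (now_ms - item["inQueueSince"]) / 1000.0
        for item in items
        if "inQueueSince" in item
    ]
    return {
        "queued": len(items),
        "stuck": sum(1 for item in items if item.get("stuck")),
        "oldest_wait_s": max(waits_s, default=0.0),
        # Group items by the reason Jenkins gives for not starting them yet.
        "reasons": Counter(item.get("why") or "unknown" for item in items),
    }
```

A long queue dominated by a single reason (for instance, waiting for an executor on one label) points at capacity, whereas a short queue during a freeze suggests the scheduler is not the bottleneck.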
Controller-Agent Communication Issues:
Problems in communication between the controller and agents can cause tasks to hang.
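Agent status can be checked in a similar way from `/computer/api/json`. This sketch assumes the response contains a `computer` list whose entries carry `displayName`, `offline`, `temporarilyOffline`, and `offlineCauseReason`; the function name is hypothetical and the JSON is expected to be parsed already.

```python
def offline_agents(computer_json):
    """Return (name, reason, temporarily_offline) for each offline agent."""
    result = []
    for computer in computer_json.get("computer", []):
        if not computer.get("offline"):
            continue  # agent is connected
        result.append((
            computer.get("displayName", "unknown"),
            computer.get("offlineCauseReason") or "no reason reported",
            bool(computer.get("temporarilyOffline")),
        ))
    return result
```

Running this periodically and noting whether agents drop offline around the time of a freeze can help tell a controller-side stall apart from a connectivity problem.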