We are seeing Jenkins 504 errors

Hi, we have a fairly large Jenkins instance, but we have an annoying issue: when we click to start a task, it just freezes for 10 minutes or throws a 504 error.
The problem is that we cannot find the root cause of this behaviour. There is nothing in the Jenkins logs, and there are plenty of CPU, memory, disk, and IOPS resources; nothing hits its limits.
We have tried monitoring with the Jenkins Prometheus plugin, but whenever the lag occurs, the plugin also stops sending metrics.

Any advice?

Hello and welcome to the community, @Explas! :wave:

The type of issue you’re experiencing, where Jenkins tasks freeze or throw 504 errors without any apparent resource bottleneck, can be challenging to debug. :person_shrugging: Could you let us know your operating system, Java version and vendor, and Jenkins version?

Here are a few areas to explore:

  1. Thread Blocking or Deadlocks:

    • If Jenkins threads are blocked or waiting on locks, it can lead to UI freezes or task initiation delays.
  2. Garbage Collection (GC) Issues:

    • Long GC pauses can halt Jenkins, especially if the JVM heap is not properly tuned.
  3. Plugin Issues:

    • Misbehaving or outdated plugins can cause delays, particularly during task execution.
  4. Network or Reverse Proxy Timeouts:

    • If Jenkins is operating behind a proxy (like NGINX, HAProxy, or Apache), misconfigured timeouts might cause 504 errors during prolonged requests.
  5. Too Many Concurrent Jobs:

    • A large number of builds or excessive job queue processing can overload Jenkins’ internal task scheduler.
  6. Controller-Agent Communication Issues:

    • Problems in communication between the controller and agents can cause tasks to hang.
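To check whether the freezes line up with GC pauses, you could enable GC logging on the controller JVM and scan the log for long pauses around the time of a freeze. On Java 11 or later, this is something like `-Xlog:gc:file=/var/log/jenkins/gc.log:time,uptime`, where the path is only an example. The sketch below assumes the unified-logging format, in which each pause line ends with its duration (e.g. `35.123ms`). `long_pauses` and its 1000 ms default threshold are illustrative choices, not Jenkins settings.

```python
import re

# Unified JVM logging (-Xlog:gc) prints one line per pause ending in its duration, e.g.
# "... GC(42) Pause Young (Normal) (G1 Evacuation Pause) 1024M->256M(4096M) 35.123ms".
# Concurrent phases ("GC(43) Concurrent Mark Cycle ...") do not start with "Pause" and are skipped.
PAUSE_RE = re.compile(r"GC\((\d+)\) (Pause .*?) (\d+(?:\.\d+)?)ms$")


def long_pauses(log_lines, threshold_ms=1000.0):
    """Return (gc_id, description, duration_ms) for pauses at or above threshold_ms."""
    result = []
    for line in log_lines:
        match = PAUSE_RE.search(line.rstrip())
        if match:
            duration = float(match.group(3))
            if duration >= threshold_ms:
                result.append((int(match.group(1)), match.group(2), duration))
    return result
```

Since it accepts any iterable of lines, an open log file can be passed directly. Compare the timestamps of any hits with the times users saw the freeze.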
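For the plugin point, the installed plugins can be listed through the JSON API at `/pluginManager/api/json?depth=1`, for example fetched with `curl -u user:apitoken`. The sketch below only filters an already-fetched response. It assumes each entry carries `shortName`, `version`, `active` and `hasUpdate` fields, so please check these names against your own instance's output. `plugins_needing_attention` is a hypothetical helper meant to produce a shortlist to review, not a verdict on which plugin is at fault.

```python
import json


def plugins_needing_attention(plugin_manager_json: str):
    """List (shortName, version) for plugins that have an update available or are inactive.

    Assumes the shape of /pluginManager/api/json?depth=1:
    {"plugins": [{"shortName": ..., "version": ..., "active": ..., "hasUpdate": ...}, ...]}
    """
    data = json.loads(plugin_manager_json)
    flagged = []
    for plugin in data.get("plugins", []):
        if plugin.get("hasUpdate") or not plugin.get("active", True):
            flagged.append((plugin["shortName"], plugin["version"]))
    return sorted(flagged)
```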
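For the proxy point, a 504 normally means the proxy gave up waiting for Jenkins. It can therefore help to time the same request sent directly to Jenkins, bypassing the proxy, and compare that with the proxy's upstream timeout. NGINX's `proxy_read_timeout`, for instance, defaults to 60 seconds. `time_request` below is a hypothetical standard-library probe. The URL you give it (for example the controller's own host and port rather than the proxied hostname) is an assumption about your setup.

```python
import time
import urllib.error
import urllib.request


def time_request(url: str, timeout: float = 900.0):
    """Fetch url and return (status, elapsed_seconds), measured with a monotonic clock.

    The socket timeout is deliberately longer than a typical proxy timeout so the
    probe can observe how long Jenkins itself takes to answer.
    """
    # An empty ProxyHandler ignores any http_proxy/https_proxy environment
    # variables, so the request goes straight to the given host.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    start = time.monotonic()
    try:
        with opener.open(url, timeout=timeout) as response:
            response.read()
            status = response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (e.g. 403 without credentials) still show how long Jenkins took.
        status = err.code
    return status, time.monotonic() - start
```

If the direct request also takes around 10 minutes, the 504 is probably just the proxy reporting a slow Jenkins, and the cause lies inside Jenkins. If it returns quickly, the proxy configuration deserves a closer look.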
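To see whether the build queue is backing up at the moment of a freeze, the queue is available as JSON from `/queue/api/json`. The sketch below assumes each entry in `items` has `inQueueSince` (epoch milliseconds), `stuck` and `why` fields, plus a `task` with a `name`. Treat those names as assumptions to verify against your instance. `summarize_queue` is a hypothetical helper, not part of Jenkins.

```python
import json
import time


def summarize_queue(queue_json: str, now_ms=None):
    """Summarize a /queue/api/json response: item count, stuck count, and oldest item.

    Assumes {"items": [{"inQueueSince": <epoch ms>, "stuck": bool,
                        "why": str, "task": {"name": str}}, ...]}.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    items = json.loads(queue_json).get("items", [])
    # The oldest item is the one that entered the queue earliest.
    oldest = min(items, key=lambda item: item["inQueueSince"], default=None)
    return {
        "queued": len(items),
        "stuck": sum(1 for item in items if item.get("stuck")),
        "oldest_task": oldest["task"]["name"] if oldest else None,
        "oldest_wait_s": (now_ms - oldest["inQueueSince"]) / 1000 if oldest else 0.0,
        "oldest_why": oldest.get("why") if oldest else None,
    }
```

Sampling this every few seconds during a freeze would show whether the queue is growing or whether nothing is being scheduled at all.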
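For the controller-agent point, a quick first check is which agents the controller currently reports as offline. The sketch assumes the `/computer/api/json` response lists nodes under `computer`, each with `displayName`, `offline` and `offlineCauseReason`. Please check those field names against your own instance. `offline_agents` is a hypothetical helper. Agents that stay offline, or that keep dropping and reconnecting, would point towards the controller-agent connection.

```python
import json


def offline_agents(computer_json: str):
    """Return (name, reason) for each node reported offline by /computer/api/json.

    Assumes {"computer": [{"displayName": str, "offline": bool,
                           "offlineCauseReason": str}, ...]}.
    """
    nodes = json.loads(computer_json).get("computer", [])
    return [
        # An empty or missing reason is reported as "unknown".
        (node["displayName"], node.get("offlineCauseReason") or "unknown")
        for node in nodes
        if node.get("offline")
    ]
```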