Job hangs for no apparent reason

Hi there !

I’m out of ideas for understanding what is going on in my setup, so I’ve come here to ask for some help and maybe some clues.

My Jenkins setup is the following:

  • using Java jdk-17.0.4.1 on both machines
  • controller version Jenkins 2.426.1 installed on Windows Server 2019 hosted in Azure datacenter (enterprise network)
  • agent deployed on a desktop pc in the office, running Windows 10 Enterprise version 22H2, on a dedicated VLAN (route is opened between agent and controller), launch method is “Launch agent by connecting it to the controller”

We recently moved to this setup, so I only have one main job.
This job connects to Bitbucket to retrieve the repository content using SSH keys, and then simply runs a bunch of Python pytest tests.

For some reason, some recent builds hung in the middle of nowhere, or let’s say in the middle of a test (not always the same one, and at different kinds of steps).
I could see the console output of the build with the three moving dots showing that it was in progress, in the middle of a pytest test, and nothing else was moving… The steps where it hung did not involve any loop or anything similar that could explain it being stuck. The machine targeted by the test was also responsive (I could interact with it manually).

I went to the Jenkins admin console hoping to find some internal logs, but only found this, which is most probably the trace from when I manually cancelled the job:

Jan 05, 2024 8:58:28 AM WARNING hudson.Launcher$RemoteLauncher$ProcImpl join
Process hudson.Launcher$RemoteLauncher$ProcImpl@1be13565 has not really finished after the join() method completion
Jan 05, 2024 8:58:28 AM INFO hudson.model.Run execute
WF1/DOD 5940G Functional Tests #22 aborted
java.lang.InterruptedException
	at java.base/java.lang.Object.wait(Native Method)
	at hudson.remoting.Request.call(Request.java:177)
	at hudson.remoting.Channel.call(Channel.java:1002)
	at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:285)
	at jdk.proxy2/jdk.proxy2.$Proxy70.join(Unknown Source)
	at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:1198)
	at hudson.tasks.CommandInterpreter.join(CommandInterpreter.java:195)
	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:145)
	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:92)
	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:818)
	at hudson.model.Build$BuildExecution.build(Build.java:199)
	at hudson.model.Build$BuildExecution.doRun(Build.java:164)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:526)
	at hudson.model.Run.execute(Run.java:1895)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:44)
	at hudson.model.ResourceController.execute(ResourceController.java:101)
	at hudson.model.Executor.run(Executor.java:442)

I tried some Googling and found that it could be a matter of the stability of the connection between the agent and the controller.

Does anyone have any idea of what could be going on here?

Thanks in advance.
Brice

Was the Python process still running when you saw that the build was hanging?

Yes it was !
Side note: surprisingly there were two processes, maybe one for python itself and a second one for pytest ?

If the Python processes were still running, then it is not a Jenkins problem. Jenkins will wait until the processes finish, unless you use a timeout, which would kill the processes when it is reached.
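
To illustrate the timeout idea, here is a minimal sketch that assumes the tests are started through a small Python wrapper script rather than directly from the build step. The wrapper and the name run_with_timeout are hypothetical, not part of Jenkins or pytest; a timeout configured on the Jenkins side would serve the same purpose.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and wait for it to finish.

    Returns the exit code, or None if the process had to be killed
    because it was still running after timeout_s seconds.
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # kill() only terminates the direct child; any process that the
        # child started itself is not covered by this sketch.
        proc.kill()
        proc.wait()  # collect the exit status so nothing is left behind
        return None
```

A wrapper could call, for example, run_with_timeout([sys.executable, "-m", "pytest"], 1800) and fail the build when the result is None. This only stops the build from waiting forever; it does not explain why the test itself got stuck.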