Agents not able to connect to controller (already connected agents are fine)

Hello,

Suddenly, our agents are not able to connect to the controller, but agents that are already connected are working fine. I do not want to restart the controller before I find out where the issue is, because if a restart does not solve it, the currently connected agents will also be disconnected, and that would have a massive impact.

I have been googling for an answer for the last couple of days, but with no success. Any idea about where the problem might be is welcome. Here is the output I get when I try to start an agent:

C:\windows\system32>java -jar E:\Jenkins\agent.jar -jnlpUrl http://XXXXXXXXXXXXX/jenkins-agent.jnlp -secret XXXXXXXXXXXXX
Jun 02, 2023 4:21:12 PM hudson.remoting.jnlp.Main createEngine
INFO: Setting up agent: techprod-pv
Jun 02, 2023 4:21:12 PM hudson.remoting.Engine startEngine
INFO: Using Remoting version: 4.14
Jun 02, 2023 4:21:12 PM hudson.remoting.Engine startEngine
WARNING: No Working Directory. Using the legacy JAR Cache location: C:\Users\XXXXXXXXXXXXX.jenkins\cache\jars
Jun 02, 2023 4:21:13 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://XXXXXXXXXXXXX/]
Jun 02, 2023 4:21:13 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
INFO: Remoting server accepts the following protocols: [JNLP4-connect, Ping]
Jun 02, 2023 4:21:13 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
INFO: Remoting TCP connection tunneling is enabled. Skipping the TCP Agent Listener Port availability check
Jun 02, 2023 4:21:13 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Agent discovery successful
Agent address: XXXXXXXXXXXXX
Agent port: XXXXXXXXXXXXX
Identity: XXXXXXXXXXXXX
Jun 02, 2023 4:21:13 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Jun 02, 2023 4:21:13 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to XXXXXXXXXXXXX:XXXXXXXXXXXXX
Jun 02, 2023 4:21:25 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to XXXXXXXXXXXXX:XXXXXXXXXXXXX (retrying:2)
java.io.IOException: Failed to connect to XXXXXXXXXXXXX:XXXXXXXXXXXXX
at org.jenkinsci.remoting.engine.JnlpAgentEndpoint.open(JnlpAgentEndpoint.java:248)
at hudson.remoting.Engine.connectTcp(Engine.java:882)
at hudson.remoting.Engine.innerRun(Engine.java:766)
at hudson.remoting.Engine.run(Engine.java:539)
Caused by: java.net.ConnectException: Connection refused: connect
at sun.nio.ch.Net.connect0(Native Method)
at sun.nio.ch.Net.connect(Net.java:482)
at sun.nio.ch.Net.connect(Net.java:474)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:647)
at java.nio.channels.SocketChannel.open(SocketChannel.java:189)
at org.jenkinsci.remoting.engine.JnlpAgentEndpoint.open(JnlpAgentEndpoint.java:206)
... 3 more

You are certain the agents are all configured the same? Also, if you disconnect an agent that is currently connected, can it reconnect after?

Hello Alex,

Thank you for engaging in this thread.

You are certain the agents are all configured the same?
Do you mean the node configuration in the Jenkins web interface? We have different configurations for different agent groups (based on their purpose), but all of the configuration types seem to be affected.

If you disconnect an agent that is currently connected, can it reconnect after?
I can't test it on all agents, since some of them are running critical services/jobs, but I disconnected a few agents and they were not able to reconnect, returning the same output as the one I shared.


So I just tried to connect some agents, and it seems that the problem went away by itself. Or maybe one of my colleagues did something? Anyway, if I find out what was wrong, I will update this thread.

Ok, so it seems that a restart of the controller was the reason it started working again.
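
For anyone who hits the same symptom: in the log earlier in this thread, agent discovery over HTTP succeeds and only the follow-up TCP connection to the advertised agent address and port is refused. The log also says TCP connection tunneling is enabled, so the refused endpoint may be the tunnel address rather than the controller itself. Below is a minimal Python sketch of a plain TCP connect test that could be run from an agent machine to see whether anything accepts connections on that address and port. It is not part of Jenkins; the function name `check_tcp` and its return strings are made up for illustration.

```python
import socket


def check_tcp(host, port, timeout=5.0):
    """Try one TCP connection and classify the result.

    Returns "open", "refused", "timeout", or "error: <details>".
    This helper is hypothetical and not part of Jenkins or Remoting.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port
        # (or something in between actively rejected the connection).
        return "refused"
    except socket.timeout:
        # No answer at all, which often points to a filtering firewall.
        return "timeout"
    except OSError as exc:
        # DNS failures, unreachable networks, and similar problems.
        return f"error: {exc}"
```

Calling `check_tcp(host, port)` with the "Agent address" and "Agent port" values printed by the agent gives a rough indication: "refused" from several machines would suggest the listener (or tunnel) is down, while "open" would point to a problem later in the handshake.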