How to terminate orphan Kubernetes Agent Pods?

I’m using Jenkins 2.361.4 with kubernetes-plugin 3734.v562b_b_a_627ea_c and version 4.13 of the remoting agent jar in a Kubernetes cluster. So far everything works as expected: Jenkins can spin up pods, the pods can connect to Jenkins, and job execution works fine as well.

If I now restart Jenkins itself, the already existing agent pods can no longer connect to it. This is expected, since the restarted Jenkins did not create these pods. Previously (maybe when I was still using version 4.2 of the agent jar), these orphan pods failed after some time, and cleaning up the pods in error state was enough to get rid of them. But now those pods keep trying to reconnect forever.

This is part of the agent log output, which repeats every 10 seconds:

Dec 06, 2022 1:56:58 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://jenkins.namespace.svc:8080/]
Dec 06, 2022 1:56:58 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
INFO: Remoting server accepts the following protocols: [JNLP4-connect, Ping]
Dec 06, 2022 1:56:58 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
INFO: Remoting TCP connection tunneling is enabled. Skipping the TCP Agent Listener Port availability check
Dec 06, 2022 1:56:58 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Agent discovery successful
  Agent address: jenkins-jnlp.namespace.svc
  Agent port:    50000
  Identity:      xxxxxxxx  
Dec 06, 2022 1:56:58 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Dec 06, 2022 1:56:58 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to jenkins-jnlp.namespace.svc:50000
Dec 06, 2022 1:56:58 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP4-connect
Dec 06, 2022 1:56:58 PM org.jenkinsci.remoting.protocol.impl.BIONetworkLayer$Reader run
INFO: Waiting for ProtocolStack to start.
Dec 06, 2022 1:56:58 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Remote identity confirmed: xxxxxxx
Dec 06, 2022 1:56:58 PM org.jenkinsci.remoting.protocol.impl.ConnectionHeadersFilterLayer onRecv
INFO: [JNLP4-connect connection to jenkins-jnlp.namespace.svc/SOME_IP:50000] Local headers refused by remote: Unknown client name: jenkins-c54645fdd-h5xqh-agent-3rwn2
Dec 06, 2022 1:56:58 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Protocol JNLP4-connect encountered an unexpected exception    
java.util.concurrent.ExecutionException: org.jenkinsci.remoting.protocol.impl.ConnectionRefusalException: Unknown client name: jenkins-c54645fdd-h5xqh-agent-3rwn2
    at org.jenkinsci.remoting.util.SettableFuture.get(SettableFuture.java:223)
    at hudson.remoting.Engine.innerRun(Engine.java:787)
    at hudson.remoting.Engine.run(Engine.java:539)
Caused by: org.jenkinsci.remoting.protocol.impl.ConnectionRefusalException: Unknown client name: jenkins-c54645fdd-h5xqh-agent-3rwn2
    at org.jenkinsci.remoting.protocol.impl.ConnectionHeadersFilterLayer.newAbortCause(ConnectionHeadersFilterLayer.java:380)
    at org.jenkinsci.remoting.protocol.impl.ConnectionHeadersFilterLayer.onRecvClosed(ConnectionHeadersFilterLayer.java:435)
    at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:825)
    at org.jenkinsci.remoting.protocol.FilterLayer.onRecvClosed(FilterLayer.java:289)
    at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecvClosed(SSLEngineFilterLayer.java:168)
    at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:825)
    at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:155)
    at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer.access$1500(BIONetworkLayer.java:51)
    at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer$Reader.run(BIONetworkLayer.java:257)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:121) 
    at java.base/java.lang.Thread.run(Thread.java:829)
    Suppressed: java.nio.channels.ClosedChannelException
        ... 7 more

Dec 06, 2022 1:56:58 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: reconnect rejected, sleeping 10s:
java.lang.Exception: The server rejected the connection: None of the protocols were accepted
    at hudson.remoting.Engine.onConnectionRejected(Engine.java:866)
    at hudson.remoting.Engine.innerRun(Engine.java:813)
    at hudson.remoting.Engine.run(Engine.java:539)

Does anyone have any thoughts on this? How does one terminate orphan Kubernetes agent pods?

Thanks!


This seems to be even worse than what I have observed: after a Jenkins restart with some Kubernetes agents running, Jenkins re-creates lots of pods that cannot reconnect. I wrote a shell script to kill all Kubernetes computers via the Jenkins API, but it looks like in your case the cleanup would have to be done on the Kubernetes side?
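
For the Kubernetes-side cleanup, something along these lines might work. This is only a rough sketch, not a tested solution: it lists the agent pods, reads the tail of each pod's log, and treats a pod as orphaned if the log contains the "Unknown client name" rejection shown in the question. The namespace and the label selector `jenkins=slave` are assumptions on my part (check the real labels with `kubectl get pods --show-labels`), and the function names are made up for illustration.

```python
import json
import subprocess
from typing import List

# Assumptions, not taken from the original post: adjust both to your setup.
NAMESPACE = "namespace"
AGENT_SELECTOR = "jenkins=slave"

# Rejection message seen in the agent log when Jenkins no longer knows the agent.
REJECTION_MARKER = "Unknown client name"


def is_rejected(log_text: str) -> bool:
    """Return True if the log shows Jenkins refusing this agent's name."""
    return REJECTION_MARKER in log_text


def kubectl(*args: str) -> str:
    """Run kubectl against NAMESPACE and return its standard output."""
    result = subprocess.run(
        ["kubectl", "-n", NAMESPACE, *args],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def agent_pod_names() -> List[str]:
    """List the names of all pods matching the (assumed) agent label."""
    pods = json.loads(kubectl("get", "pods", "-l", AGENT_SELECTOR, "-o", "json"))
    return [item["metadata"]["name"] for item in pods["items"]]


def delete_orphans(dry_run: bool = True) -> List[str]:
    """Find agent pods whose recent log shows the rejection; delete unless dry_run."""
    orphans = []
    for name in agent_pod_names():
        log_tail = kubectl("logs", name, "--all-containers=true", "--tail=50")
        if is_rejected(log_tail):
            orphans.append(name)
            if not dry_run:
                kubectl("delete", "pod", name)
    return orphans
```

Calling `delete_orphans()` only reports the pods it would remove; `delete_orphans(dry_run=False)` actually deletes them. Matching on the log text is a heuristic, so I would run it in dry-run mode first and compare the result with the agents Jenkins still lists.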