Jenkins Master <----> Worker Docker Swarm Communication

Jenkins setup:

  • Current version: 2.440.2-lts
  • EC2 instances as part of ASG in AWS
  • Docker Swarm

Please help! We recently tried to upgrade the Jenkins controller to 2.479.1, but there were compatibility issues with the worker nodes (EC2 instances in the same VPC), which were only running JDK 11. We then tried to roll back the changes using a previous snapshot and accidentally killed our whole Docker service stack, although the EC2 workers do still exist. We also had to replace the old controller nodes with new ones while restoring from the snapshots.

We are now at the point where the Jenkins controller UI comes up and is accessible, but every job gets stuck in the build queue, and it seems the controller and worker nodes cannot communicate. The issue could be with the SSL certs, which are new. Has anybody ever had luck getting emergency support? Or has anybody experienced a similar issue before and have any tips? Thanks in advance!

For info - all of the jobs get stuck in the build queue, saying that:
*Docker swarm agent for building part of » st-web-portal » PR-69 #2* *This agent is offline because Jenkins failed to launch the agent process on it.*

and the logs seem to indicate an issue with the SSL handshake:
[4:06:21 PM] Creating Service with Name : agt-prod-1208
java.net.SocketException: Broken pipe
    at java.base/sun.nio.ch.NioSocketImpl.implWrite(Unknown Source)
    at java.base/sun.nio.ch.NioSocketImpl.write(Unknown Source)
    at java.base/sun.nio.ch.NioSocketImpl$2.write(Unknown Source)
    at java.base/java.net.Socket$SocketOutputStream.write(Unknown Source)
    at java.base/sun.security.ssl.SSLSocketOutputRecord.encodeChangeCipherSpec(Unknown Source)
    at java.base/sun.security.ssl.OutputRecord.changeWriteCiphers(Unknown Source)
    at java.base/sun.security.ssl.ChangeCipherSpec$T10ChangeCipherSpecProducer.produce(Unknown Source)
    at java.base/sun.security.ssl.Finished$T12FinishedProducer.onProduceFinished(Unknown Source)
    at java.base/sun.security.ssl.Finished$T12FinishedProducer.produce(Unknown Source)
    at java.base/sun.security.ssl.SSLHandshake.produce(Unknown Source)
    at java.base/sun.security.ssl.ServerHelloDone$ServerHelloDoneConsumer.consume(Unknown Source)
    at java.base/sun.security.ssl.SSLHandshake.consume(Unknown Source)
    at java.base/sun.security.ssl.HandshakeContext.dispatch(Unknown Source)
    at java.base/sun.security.ssl.HandshakeContext.dispatch(Unknown Source)
    at java.base/sun.security.ssl.TransportContext.dispatch(Unknown Source)
    at java.base/sun.security.ssl.SSLTransport.decode(Unknown Source)
    at java.base/sun.security.ssl.SSLSocketImpl.decode(Unknown Source)
    at java.base/sun.security.ssl.SSLSocketImpl.readHandshakeRecord(Unknown Source)

It looks like you’re dealing with some SSL/TLS handshake issues between your Jenkins controller and the worker nodes. :thinking:
Here are some steps you could try to sort things out:

  1. Check SSL Certificates: Double-check that the certificates on your new controller nodes are valid (not expired, issued for the right hostname) and that the worker nodes trust the CA that issued them. Since the certs are new, the workers may still only trust the old ones.
  2. Update Java Keystore: If the workers don't trust the new certificates, you might need to import them (or their issuing CA) into the Java truststore used on your worker nodes.
  3. Network Connectivity: Confirm that the controller and worker nodes can actually reach each other on the ports they use. The controller nodes were replaced, so it's worth checking that the security group rules still allow the new instances. Sometimes the fix is simpler than it seems!
  4. Jenkins Configuration: Last but not least, make sure the agent/cloud configuration on your new controller nodes still points at the right worker hosts and uses the right credentials. A small misconfiguration can throw a wrench in the works.
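
If it helps, steps 1 and 3 can be checked together with a small Python script run from the side that opens the failing connection. This is only a rough sketch, not a Jenkins or Docker tool: the function names `check_tls` and `classify_failure` are made up for illustration, and the host, port, and certificate paths are placeholders you would need to replace with your own values.

```python
import socket
import ssl


def classify_failure(exc):
    """Turn an exception from a connection attempt into a rough hint."""
    # Order matters: the SSL errors are subclasses of OSError.
    if isinstance(exc, ssl.SSLCertVerificationError):
        detail = getattr(exc, "verify_message", None) or str(exc)
        return "certificate not trusted: " + detail
    if isinstance(exc, ssl.SSLError):
        return "TLS handshake failed: " + str(exc)
    if isinstance(exc, (BrokenPipeError, ConnectionResetError)):
        # Same symptom as the "Broken pipe" in the trace: the peer hung up
        # mid-handshake, possibly because it rejected our certificate.
        return "peer closed the connection during the handshake"
    if isinstance(exc, ConnectionRefusedError):
        return "connection refused: nothing is listening on that port"
    if isinstance(exc, socket.timeout):
        return "timed out: check security groups and routing"
    return "network error: " + str(exc)


def check_tls(host, port, cafile=None, certfile=None, keyfile=None, timeout=5.0):
    """Try a TCP connect plus TLS handshake and describe the outcome."""
    # With cafile=None the system trust store is used.
    context = ssl.create_default_context(cafile=cafile)
    if certfile:
        # Optional client certificate, for endpoints that require mutual TLS.
        context.load_cert_chain(certfile, keyfile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                cert = tls.getpeercert()
                return "ok: {} handshake completed, certificate valid until {}".format(
                    tls.version(), cert.get("notAfter")
                )
    except OSError as exc:
        return classify_failure(exc)
```

For example, `print(check_tls("worker.example.internal", 2376, cafile="ca.pem"))` would report whether the handshake completes against that (placeholder) host and port. A "certificate not trusted" result points at steps 1 and 2, while "timed out" or "connection refused" points at step 3. One caveat: an "ok" only means the handshake finished from this side, so it does not rule out every certificate problem.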

Give these a go and see if they help clear up your SSL/TLS handshake troubles. I hope this helps you get everything running smoothly again!