We have a Jenkins controller installed and configured using the official Jenkins Helm chart on an EKS cluster, with all resource provisioning managed by Terraform.
Currently, it is functioning well and is accessible at https://jenkins-utility.example.com/ using our LDAP credentials. All agent nodes connect to the same Jenkins endpoint.
However, when the Jenkins controller is restarted, either via the safe-restart URL or because the pod is killed and recreated, I can still access Jenkins at the same URL. Despite this, all agent nodes get disconnected because the Jenkins Location URL resets to localhost:8080 instead of https://jenkins-utility.example.com/. Please see the attached image for reference.
We manage the controller with Jenkins Configuration as Code (CasC); the configuration file we are using is jcasc-default-config.yaml.
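The full file is not reproduced here; the section that matters for this problem is the standard JCasC location block. A minimal sketch of that section, with illustrative values rather than our exact contents, looks like this:

```yaml
# Illustrative sketch of the relevant JCasC section, not the full jcasc-default-config.yaml.
unclassified:
  location:
    # The Jenkins Location URL that keeps resetting after a restart.
    url: "https://jenkins-utility.example.com/"
    # Hypothetical admin address, shown for illustration only.
    adminAddress: "jenkins-admin@example.com"
```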
I have spent many hours troubleshooting this issue but have not found a solution yet. Any assistance or pointers to resolve it would be greatly appreciated.
My only experience with JCasC is through Docker, and I’ve been bitten several times when I forgot to put the JCasC file in /usr/share/jenkins/ref, but I don’t know whether that applies to “standard” Linux installations.
Adding a bit more detail in case it helps.
As shared previously, our Jenkins controller CasC configuration is located at /var/jenkins_home/casc_configs, in a file named jcasc-default-config.yaml (shared in the issue description). The /var/jenkins_home/ directory is mounted on a persistent volume (EBS gp3). Inspecting the pod shows that the CASC_JENKINS_CONFIG environment variable is correctly set to that path:
jenkins@jenkins-utility-prod-0:~/casc_configs$ env | grep -i casc_jenkins
CASC_JENKINS_CONFIG=/var/jenkins_home/casc_configs
jenkins@jenkins-utility-prod-0:~/casc_configs$ pwd
/var/jenkins_home/casc_configs
jenkins@jenkins-utility-prod-0:~/casc_configs$ ls -ltrh
total 4.0K
-rw-rw-r-- 1 jenkins jenkins 2.7K Jun 6 14:24 jcasc-default-config.yaml
jenkins@jenkins-utility-prod-0:~/casc_configs$ df -hT /var/jenkins_home/
Filesystem Type Size Used Avail Use% Mounted on
/dev/nvme2n1 ext4 492G 30G 463G 7% /var/jenkins_home
The issue I am facing is that whenever the Jenkins controller restarts, the Jenkins Location URL isn’t updated according to the CasC file. Surprisingly, if I navigate to https://jenkins-utility.example.com/manage/configuration-as-code/ and click Reload existing configuration under Actions, the CasC configuration reloads and is applied correctly. However, after a restart, the URL resets to http://localhost:8080 again.
I guess this qualifies as tinkering rather than a proper method, but whenever I change anything in the JCasC file and restart Jenkins, I trigger a JCasC reload through the REST API.
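As a sketch, assuming the controller has a CasC reload token configured (via the casc.reload.token system property or the CASC_RELOAD_TOKEN environment variable), the call looks something like this:

```bash
# Trigger a JCasC reload over HTTP; requires a reload token to be configured
# on the controller beforehand, per the configuration-as-code plugin's
# documented reload mechanism.
curl -X POST \
  "https://jenkins-utility.example.com/reload-configuration-as-code/?casc-reload-token=${CASC_RELOAD_TOKEN}"
```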
There must be a clean way of doing that.
Thank you @mawinter69 and @poddingue for your prompt responses and willingness to assist, much appreciated!
Although the initial suggestions didn’t resolve my issue, I continued investigating. While digging through JENKINS_HOME (/var/jenkins_home), I noticed the init.groovy.d directory and found a Groovy script named base.groovy that was resetting the Jenkins Location URL to localhost:8080. Since init.groovy.d scripts run late in startup, after CasC has applied its configuration, that script kept overriding the URL on every restart, while a manual CasC reload put it back. By modifying the script to set the correct base URL, everything now works as expected.
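I’m not reproducing our exact base.groovy, but the pattern is an init.groovy.d script that sets the Jenkins Location URL on every startup; a minimal sketch of the corrected version (the hard-coded URL is shown for illustration) looks like this:

```groovy
// init.groovy.d/base.groovy (sketch): Jenkins runs this on every controller startup.
// The migrated script hard-coded the old default URL; pointing it at the real
// endpoint (or removing the script and letting CasC own this setting) fixes
// the reset-to-localhost behaviour.
import jenkins.model.JenkinsLocationConfiguration

def location = JenkinsLocationConfiguration.get()
location.setUrl("https://jenkins-utility.example.com/")
location.save()
```

Since CasC already manages the location URL, an arguably cleaner fix is to drop that block from the init script entirely so the two mechanisms don’t fight on startup.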
Thank you so much for your feedback, @altif.
I’m delighted to hear that you found a solution.
Do you have any insight into why this script was written or generated in this particular manner?
@poddingue We have 5 to 6 Jenkins controllers currently running as Docker containers on EC2 instances. These init Groovy scripts are used for the initial setup of our Jenkins controllers, including configuring the admin email, base URL, LDAP setup, user access, and other settings. These scripts are an integral part of our Jenkins controller setup, allowing us to avoid manual changes and updates when provisioning a new Jenkins controller.
Recently, we began migrating all our container-based Jenkins controllers to EKS to take advantage of Kubernetes capabilities. During the data migration these scripts were copied over as well, which caused this issue; we didn’t notice it until yesterday.