I have deployed Jenkins in an Amazon EKS cluster using Helm. It created one StatefulSet pod for the master, named jenkins-latest-0. When I run a job, the master pod sometimes gets killed, and the job running in the slave pod fails as a result. I tried giving it requests of 5 CPU and 5Gi of memory to resolve this, but the master still fails occasionally, taking the underlying jobs on the slave pods down with it.
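For reference, this is roughly what I set in my values.yaml (a sketch; I'm assuming the official jenkins/jenkins chart layout here, where these keys live under controller: — older chart versions use master: instead):

```yaml
# values.yaml sketch -- key names assume the official jenkins/jenkins
# chart; older chart versions nest the same block under "master:".
controller:
  resources:
    requests:
      cpu: "5"
      memory: "5Gi"
    limits:
      cpu: "5"
      memory: "6Gi"
  # Assumption: the JVM heap has to stay below the memory limit, or the
  # pod can still be OOM-killed despite the generous request.
  javaOpts: "-Xms2g -Xmx4g"
```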
So, to make it highly available, I scaled the StatefulSet to 2 replicas. But now I get "HTTP ERROR 403 No valid crumb was included in the request", and if I refresh the page the job I created sometimes does not appear [I believe this is because the new pod jenkins-latest-1 is only eventually consistent and the NodePort Service is load-balancing between the two pods]. How can I make the master highly available without running into these issues?
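To illustrate what I think is happening (a rough sketch, not my actual manifests; the Service name and selector are assumptions based on the release name):

```yaml
# Illustrative only: the chart's Service selects every master replica,
# so with 2 replicas the NodePort spreads requests across
# jenkins-latest-0 and jenkins-latest-1. Each pod gets its own PVC from
# the StatefulSet's volumeClaimTemplates, hence its own JENKINS_HOME,
# so a crumb issued by one pod is rejected by the other and a job
# created on one pod does not exist on the other.
apiVersion: v1
kind: Service
metadata:
  name: jenkins                                   # assumed; derived from the release
spec:
  type: NodePort
  selector:
    app.kubernetes.io/instance: jenkins-latest    # matches both replicas
  ports:
    - name: http
      port: 8080
      targetPort: 8080
```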
I did some digging and found a few solutions, but in the end I preferred running the master on an EC2 instance and deploying the jobs on EKS agents. Since the master kept dying inside the cluster, moving it to a dedicated EC2 instance means it no longer breaks when resource consumption spikes. But if you are looking for other solutions, you can refer to some links I came across for the same.
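With that setup, the EC2-hosted master just points the Kubernetes plugin at the EKS cluster to provision agents on demand. A minimal Configuration-as-Code sketch (the names, URLs and credentials ID below are placeholders, and the exact fields depend on your kubernetes-plugin version):

```yaml
# JCasC sketch: EC2-hosted master provisioning agents in EKS.
# serverUrl, jenkinsUrl, namespace and credentialsId are placeholders.
jenkins:
  clouds:
    - kubernetes:
        name: "eks-agents"
        serverUrl: "https://<eks-api-server-endpoint>"
        credentialsId: "eks-service-account-token"
        namespace: "jenkins-agents"
        jenkinsUrl: "http://<ec2-master-private-ip>:8080"
        templates:
          - name: "default"
            label: "eks"
            containers:
              - name: "jnlp"
                image: "jenkins/inbound-agent:latest"
```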
Links: