Using Jenkins master/controller as active and standby

Hi,

I am a newbie to Jenkins. I am trying to make the Jenkins master highly available, i.e. run it in active/standby mode in a GKE cluster.

I have a load balancer in front of 2 Jenkins masters (run as a deployment), both connected to a persistent file system shared across nodes, and I am running a pipeline on a slave connected to one of the masters (say master1).

While the pipeline is running, I delete the master1 pod. The pipeline execution then pauses and eventually fails. Ideally, the pipeline should continue by re-establishing its connection with master2, i.e. the setup should be highly available.

When I deleted the master1 pod, the following error appeared in the console output of the pipeline running on the slave connected to master1:

Waiting to resume part of slave1-pipeline #5: ‘slave1-jvz6z’ is offline

Waiting to resume part of slave1-pipeline #5

Resuming build at Tue Jan 25 12:16:12 UTC 2022 after Jenkins restart

Waiting to resume part of slave1-pipeline #5: Finished waiting

Waiting to resume part of slave1-pipeline #5: ‘slave1-jvz6z’ is offline

Waiting to resume part of slave1-pipeline #5: ‘slave1-jvz6z’ is offline

Waiting to resume part of slave1-pipeline #5: ‘slave1-jvz6z’ is offline

ees.groovy.cps.impl.SequenceBlock$ContinuationImpl.exp2

in object com.cloudbees.groovy.cps.impl.SequenceBlock$ContinuationImpl@5c43a50 …………………

Caused: java.io.IOException: Stale file handle

at java.base/java.io.FileInputStream.available0(Native Method)

at java.base/java.io.FileInputStream.available(FileInputStream.java:330) ………………………

Caused: java.io.IOException: Failed to load build state

at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$3.onSuccess(CpsFlowExecution.java:865) …………………………….

Finished: FAILURE

So, my questions are:

Why did the pipeline fail? Ideally, I expect it to continue and finish. It looks like master1 (active) is not checkpointing/syncing the pipeline state to master2 (standby).

Can you please provide documentation/configuration for making one master active and the other standby? I didn’t find any documentation on this.

PS: Just to mention, we also experimented with one master and one slave, and below is our observation:
We ran a pipeline on the slave and killed the master pod during execution. The pipeline execution paused and then resumed once the master pod came back after the restart. This is a kind of high-availability behaviour.

Resuming build at Fri Jan 21 09:29:21 UTC 2022 after Jenkins restart

Waiting to resume part of pipeline-slave1 #3: ‘slave1-xdnjc’ is offline

Waiting to resume part of pipeline-slave1 #3: ‘slave1-xdnjc’ is offline

Waiting to resume part of pipeline-slave1 #3: ‘slave1-xdnjc’ is offline

Waiting to resume part of pipeline-slave1 #3: ‘slave1-xdnjc’ is offline

Waiting to resume part of pipeline-slave1 #3: ‘slave1-xdnjc’ is offline

Waiting to resume part of pipeline-slave1 #3: ‘slave1-xdnjc’ is offline

Waiting to resume part of pipeline-slave1 #3: ‘slave1-xdnjc’ is offline

Waiting to resume part of pipeline-slave1 #3: ‘Jenkins’ doesn’t have label ‘slave1-xdnjc’

‘slave1-xdnjc’ is offline

Ready to run at Fri Jan 21 09:30:16 UTC 2022

Ideally, we expect the same pipeline behaviour with 2 master pods.

Hi there,

As a reminder, the term “slave” to refer to an agent has been deprecated since 2016. Please refer to On Jenkins Terminology Updates for more details. We request you update your post.

Thanks,
Gavin Mogan

Hi guys,
Is there any response on this?
Thanks.

Thanks for pointing that out, but I don’t see any edit option.