Parallel pipeline on a defined list of nodes

Hi all,

I’m using a loop like this one to build the map of stages that I pass to parallel:

def generateTestStage(job_name, nodes_label) {
    return {
        stage(job_name) {
            node(nodes_label) {
                /* stage stuff */
            }
        }
    }
}

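// In the pipeline script:
def stagesMap = [:]

// Prepare stages, one per node in the label (generatePrepareStage is defined elsewhere, not shown here)
for (item in nodesByLabel(label: "myNodes", offline: true)) { // nodesByLabel requires the "Pipeline Utility Steps" plugin
    stagesMap[ "${item}" ] = generatePrepareStage("${item}")
}

// Test stages, one per entry of JOB_MAP (job name -> node label)
stagesMap += JOB_MAP.collectEntries() { key, val ->
    [ "$key" : generateTestStage(key, val) ]
}

parallel stagesMap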

The idea is to prepare all available nodes in a label before testing them. Our job is very long (4 hours on 8 nodes in parallel), and sometimes a node goes offline or is put into maintenance for one reason or another.

In this case, I have 2 scenarios:

  • nodesByLabel with offline: false: if a node is offline when the job starts and comes back online afterwards, generatePrepareStage is never executed on it, so later tests running on that node may not be set up correctly
  • nodesByLabel with offline: true: the job waits for offline nodes to come back online before it can finish, so it may take far longer than expected (possibly several days if nobody notices)

nodesByLabel documentation: Pipeline Utility Steps

To solve this, I would like to store a list of online nodes and use it in generateTestStage instead of node(nodes_label), but I haven’t found a way to do this.

Does anyone have an idea how to do this? Or should I perhaps structure my stage map differently?

We also have another reason to run a job on a list of nodes: a user may want to run a job on only 2 of the 8 nodes, to avoid locking all the resources for the whole day.

BR,
Alex.

Jenkins setup:
Jenkins: 2.440.2
OS: Linux - 4.18.0-477.27.1.el8_8.x86_64
Java: 17.0.9 - Red Hat, Inc. (OpenJDK 64-Bit Server VM)

allure-jenkins-plugin:2.31.1
ant:497.v94e7d9fffa_b_9
antisamy-markup-formatter:162.v0e6ec0fcfcf6
apache-httpcomponents-client-4-api:4.5.14-208.v438351942757
asm-api:9.6-3.v2e1fa_b_338cd7
authentication-tokens:1.53.v1c90fd9191a_b_
bootstrap5-api:5.3.3-1
bouncycastle-api:2.30.1.77-225.v26ea_c9455fd9
branch-api:2.1152.v6f101e97dd77
caffeine-api:3.1.8-133.v17b_1ff2e0599
checks-api:2.0.2
cloudbees-folder:6.901.vb_4c7a_da_75da_3
command-launcher:107.v773860566e2e
commons-lang3-api:3.13.0-62.v7d18e55f51e2
commons-text-api:1.11.0-95.v22a_d30ee5d36
credentials:1337.v60b_d7b_c7b_c9f
credentials-binding:657.v2b_19db_7d6e6d
delivery-pipeline-plugin:1.4.2
display-url-api:2.200.vb_9327d658781
durable-task:550.v0930093c4b_a_6
echarts-api:5.5.0-1
external-monitor-job:215.v2e88e894db_f8
favorite:2.208.v91d65b_7792a_c
font-awesome-api:6.5.1-3
git:5.2.1
git-client:4.7.0
gitea:1.4.7
github:1.38.0
github-api:1.318-461.v7a_c09c9fa_d63
github-branch-source:1781.va_153cda_09d1b_
gson-api:2.10.1-15.v0d99f670e0a_7
handy-uri-templates-2-api:2.1.8-30.v7e777411b_148
htmlpublisher:1.33
instance-identity:185.v303dc7c645f9
ionicons-api:56.v1b_1c8c49374e
jackson2-api:2.17.0-379.v02de8ec9f64c
jakarta-activation-api:2.1.3-1
jakarta-mail-api:2.1.3-1
javadoc:243.vb_b_503b_b_45537
javax-activation-api:1.2.0-6
javax-mail-api:1.6.2-9
jaxb:2.3.9-1
jdk-tool:73.vddf737284550
jjwt-api:0.11.5-77.v646c772fddb_0
joda-time-api:2.12.7-29.v5a_b_e3a_82269a_
jquery:1.12.4-1
jquery3-api:3.7.1-2
json-api:20240303-41.v94e11e6de726
json-path-api:2.9.0-58.v62e3e85b_a_655
junit:1259.v65ffcef24a_88
ldap:719.vcb_d039b_77d0d
mailer:472.vf7c289a_4b_420
matrix-auth:3.2.2
matrix-project:822.824.v14451b_c0fd42
metrics:4.2.21-449.v6960d7c54c69
mina-sshd-api-common:2.12.0-90.v9f7fb_9fa_3d3b_
mina-sshd-api-core:2.12.0-90.v9f7fb_9fa_3d3b_
nodelabelparameter:1.12.0
okhttp-api:4.11.0-172.vda_da_1feeb_c6e
pam-auth:1.10
parameterized-trigger:787.v665fcf2a_830b_
pipeline-agent-build-history:28.vc1153328e666
pipeline-build-step:540.vb_e8849e1a_b_d8
pipeline-graph-analysis:216.vfd8b_ece330ca_
pipeline-graph-view:232.vc7ca_8d934725
pipeline-groovy-lib:704.vc58b_8890a_384
pipeline-input-step:491.vb_07d21da_1a_fb_
pipeline-milestone-step:111.v449306f708b_7
pipeline-model-api:2.2184.v0b_358b_953e69
pipeline-model-definition:2.2184.v0b_358b_953e69
pipeline-model-extensions:2.2184.v0b_358b_953e69
pipeline-rest-api:2.34
pipeline-stage-step:305.ve96d0205c1c6
pipeline-stage-tags-metadata:2.2184.v0b_358b_953e69
pipeline-stage-view:2.35-bpk-001-SNAPSHOT (private-94c4112f-broadpeak)
pipeline-utility-steps:2.16.2
plain-credentials:179.vc5cb_98f6db_38
plugin-util-api:4.1.0
pubsub-light:1.18
resource-disposer:0.23
scm-api:689.v237b_6d3a_ef7f
script-security:1326.vdb_c154de8669
snakeyaml-api:2.2-111.vc6598e30cc65
sse-gateway:1.26
ssh-credentials:326.v7fcb_a_ef6194b_
ssh-slaves:2.948.vb_8050d697fec
sshd:3.322.v159e91f6a_550
structs:337.v1b_04ea_4df7c8
token-macro:400.v35420b_922dcb_
trilead-api:2.142.v748523a_76693
variant:60.v7290fc0eb_b_cd
workflow-aggregator:596.v8c21c963d92d
workflow-api:1291.v51fd2a_625da_7
workflow-basic-steps:1049.v257a_e6b_30fb_d
workflow-cps:3883.vb_3ff2a_e3eea_f
workflow-durable-task-step:1331.vc8c2fed35334
workflow-job:1400.v7fd111b_ec82f
workflow-multibranch:773.vc4fe1378f1d5
workflow-scm-step:427.v4ca_6512e7df1
workflow-step-api:657.v03b_e8115821b_
workflow-support:881.v7663695646cf
ws-cleanup:0.45

Hello @ABouin, and welcome to this community. :wave:

I think you could use the catchError step to handle the scenario when a node goes offline during the execution of the job.
This step should allow the pipeline to continue running even if there is an error in one of the stages.

You could use it in your generateTestStage function to catch any errors that occur when trying to run a stage on an offline node.

Here’s an untested proposal for your generateTestStage function:

def generateTestStage(job_name, nodes_label) {
    return {
        stage(job_name) {
            catchError(buildResult: 'SUCCESS', stageResult: 'FAILURE') {
                node(nodes_label) {
                    /* stage stuff */
                }
            }
        }
    }
}

In this modified function, if a node is offline when the stage tries to run, the catchError step should catch the error and mark the stage as a failure, but the overall build result will still be a success.
This means that the pipeline would continue running on the remaining online nodes. :person_shrugging:

As for running a job on a specific subset of nodes, I think you could modify your loop that generates the stagesMap to only include the nodes you want. :thinking:

You could do this by adding a condition inside the loop that checks if the current node is in the list of nodes you want to run the job on.

def nodesToRunOn = ['node1', 'node2'] // replace with your nodes

for(item in nodesByLabel(label: "myNodes", offline: true)) {
    if (item in nodesToRunOn) {
        stagesMap[ "${item}" ] = generatePrepareStage("${item}")
    }
}
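
A possible variation on the filter above, as a rough sketch only: the list of nodes could come from a build parameter rather than being hard-coded. The parameter name NODES_TO_RUN_ON and the helpers parseNodeList and selectNodes are made up for illustration; they are not part of Jenkins or of the original job. An empty parameter is treated here as "use every node", which is an assumption about the desired behaviour.

```groovy
// Hypothetical helper: turn "node1, node2" into ['node1', 'node2'].
// A null or empty string yields an empty list.
def parseNodeList(String raw) {
    return (raw ?: '').split(',').collect { it.trim() }.findAll { it }
}

// Hypothetical helper: keep only the available nodes that were requested.
// An empty request is treated as "no restriction" (assumption).
def selectNodes(List<String> available, List<String> requested) {
    return requested ? available.findAll { it in requested } : available
}

// Possible use in the pipeline (params.NODES_TO_RUN_ON is a hypothetical
// string parameter, not something the original job defines):
// def wanted = parseNodeList(params.NODES_TO_RUN_ON)
// for (item in selectNodes(nodesByLabel(label: "myNodes", offline: true), wanted)) {
//     stagesMap[ "${item}" ] = generatePrepareStage("${item}")
// }
```

Requested names that do not exist in the label are silently dropped in this sketch; logging them might be worth adding.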

Hi Bruno, thanks for your suggestions !
I was losing hope of getting help on this point ^^

Here’s an untested proposal for your generateTestStage function:
[…]
In this modified function, if a node is offline when the stage tries to run, the catchError step should catch the error and mark the stage as a failure, but the overall build result will still be a success.

With this suggestion, if a test tries to execute on an offline node, it will always fail, even though it would have succeeded on an online node. I fear the offline node would then drain all the remaining tests in a loop and mark them as failures. So I’m not sure it will solve our problem ^^

As for running a job on a specific subset of nodes, I think you could modify your loop that generates the stagesMap to only include the nodes you want. :thinking:
You could do this by adding a condition inside the loop that checks if the current node is in the list of nodes you want to run the job on.

OK, this idea looks pretty good. I still have a concern: I assumed a Jenkins pipeline works like a queue, dispatching each test in turn to whichever node is online and free. That way, the running time is fairly well balanced across all nodes. (Correct me if I’m wrong!)
With your idea, tests would be locked to specific nodes, so the running time may increase (unless we reorder the tests to balance them across all the nodes). I’m not sure Jenkins is meant to work this way.
At least there is the idea of a list, but how do I tell Jenkins to run on this list rather than on a label?
I’m pretty sure there is some dark magic somewhere, but I don’t know where to find documentation on it …

BR,
Alex.
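
One possible way to run on a list rather than on a label, as an untested sketch: as far as I know, every agent's own name also works as a label, and label expressions accept the || operator, so a list of node names can be joined into a single expression for node(). Jenkins should then still pick whichever matching node is online and free, which keeps the queue-like balancing. The helper labelExpressionFor is a made-up name, and the sketch assumes agent names without spaces or label operators (such names would need quoting).

```groovy
// Hypothetical helper: build a label expression matching any listed agent,
// e.g. ['node1', 'node2'] -> 'node1 || node2'.
// Assumes names contain no spaces or label operators.
def labelExpressionFor(List<String> nodeNames) {
    // An empty expression would let node() run anywhere, so refuse it.
    assert nodeNames : 'no nodes available to run on'
    return nodeNames.join(' || ')
}

// Variant of generateTestStage that takes a list of node names
// instead of a label.
def generateTestStage(job_name, List<String> nodeNames) {
    def expr = labelExpressionFor(nodeNames)
    return {
        stage(job_name) {
            node(expr) {
                /* stage stuff */
            }
        }
    }
}

// Possible use in the pipeline: snapshot the online nodes once at start
// (offline defaults to false), so nodes that are offline at that point
// are never waited for.
// stagesMap += JOB_MAP.collectEntries { key, val ->
//     [ "$key" : generateTestStage(key, nodesByLabel(label: val)) ]
// }
```

A node that goes offline after the snapshot stays in the expression, but the other listed nodes can still pick up the work; only if all of them go offline would the stage wait.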