Pipeline stages are not sharing the agent defined at the pipeline level

Jenkins setup:
Jenkins LTS 2.541.1
Red Hat Enterprise Linux release 8.10 (Ootpa)

We recently upgraded our Jenkins in early March to LTS 2.541.1 along with all of our plugins. Since the upgrade, the pipeline stages no longer seem to be sharing the agent that is defined at the pipeline level.

Sample jenkinsfile:

pipeline {
  agent { label 'linux' }

  stages {
    stage('build') {
      steps {
        echo 'build job'
      }
    }
  }
}

The linux label matches two linux nodes: agent1 and agent2

One sample scenario: the pipeline job runs on agent1, then the 'build' stage runs on the controller. We have not changed any configuration in our jobs since the upgrade, so it seems it may be related to the LTS upgrade or plugin upgrades.

agent1 and agent2 configuration
labels: linux
Usage: Use this node as much as possible
Availability: Keep this agent online as much as possible

Built-in Node configuration
labels: linux_master built-in
Usage: Use this node as much as possible

Below are the pipeline related plugins we upgraded to (although I don’t believe any of these should be related to the issue as the pipeline agent label is built-in):
pipeline-build-step:584.vdb_a_2cc3a_d07a_
pipeline-graph-analysis:245.v88f03631a_b_21
pipeline-groovy-lib:787.ve2fef0efdca_6
pipeline-input-step:540.v14b_100d754dd
pipeline-model-api:2.2277.v00573e73ddf1
pipeline-model-definition:2.2277.v00573e73ddf1
pipeline-model-extensions:2.2277.v00573e73ddf1
pipeline-npm:458.vfb_92e52f220a_
pipeline-rest-api:2.39
pipeline-stage-tags-metadata:2.2277.v00573e73ddf1
pipeline-stage-view:2.39
nodelabelparameter:851.vd94e5048d321

Please let me know if you need more info to assist in troubleshooting, thanks.

Okay, your sample Jenkinsfile will absolutely do what you’re claiming it did.

Some steps always run on the controller, regardless of whether you use the controller to run jobs or not; once you get into a large environment, you don't want jobs running on the controller. One of those steps is echo, though I forget the full list of what runs on the controller node.

Now, since you’re talking about a production Jenkins instance that was upgraded, you may need to build a different job that does actual work like with a sh step as those do run on agents.
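A quick way to check where steps actually land is a throwaway test pipeline (hypothetical job, reusing your 'linux' label) that prints the allocated node and the host an sh step runs on:

```groovy
// Throwaway test pipeline (hypothetical) to verify where steps actually run.
// Assumes the 'linux' label from your setup.
pipeline {
  agent { label 'linux' }

  stages {
    stage('where-am-i') {
      steps {
        // NODE_NAME is set by Jenkins to the node the stage was allocated on
        echo "Stage allocated on: ${env.NODE_NAME}"
        // sh executes on the agent's machine, so hostname shows the real host
        sh 'hostname'
      }
    }
  }
}
```

If the hostname printed by sh matches agent1/agent2 while only the echo appears to come from the controller, the agent allocation is working as designed.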

For your production instance, is anything new showing up in the system logs that might indicate why you’re seeing this behaviour?

@doofus_canadensis Thanks for the response.

Nothing stands out in the jenkins.log file regarding the label or node it is building on.

This issue only started occurring after our upgrade earlier this month. Is it unrelated and we have something misconfigured?

The sample I provided is quite minimal. To clarify, the build stage calls a job that builds the application, and that job has its label defined as "linux || linux_master".

Update: I just tested setting the build job configuration to 'linux' only and it now works as expected. Is there a reason linux_master takes precedence over the linux label, or is it simply not sharing the same label as the pipeline?

If the job can build on the controller, the controller probably takes priority when the agents are idle, because the repository is already cloned there for reading the pipeline file.

Usually Jenkins tends to stick to the previous agent for jobs (at least for freestyle; not sure if this applies to pipeline as well). So maybe in a previous run the agents with label linux had no free executors but the controller was available, and then all following runs preferred to run on the controller. There are plugins that allow you to influence the load balancing in Jenkins.

Do you mean you call another job in your pipeline with the build step?

@mawinter69 Yes, another job is called and that job has the labels configured as linux || linux_master where linux label is for the two agents, and linux_master is for the controller.

The labels from the calling pipeline have nothing to do with the labels of the called job. Those things are not inherited. Using the node label parameter you can probably pass that information along if you like.
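A sketch of passing the current node downstream via the NodeLabel Parameter plugin. This assumes other_job is configured with a Node parameter named NODE (the parameter name is an assumption, it must match whatever the downstream job defines):

```groovy
// Sketch: pass the agent this stage was allocated on to a downstream job.
// Requires the nodelabelparameter plugin; assumes 'other_job' defines a
// Node parameter named 'NODE'.
pipeline {
  agent { label 'linux' }

  stages {
    stage('build') {
      steps {
        build job: 'other_job', parameters: [
          // NodeParameterValue is provided by the NodeLabel Parameter plugin;
          // env.NODE_NAME is the agent this stage is currently running on
          [$class: 'NodeParameterValue',
           name: 'NODE',
           labels: [env.NODE_NAME],
           nodeEligibility: [$class: 'AllNodeEligibility']]
        ]
      }
    }
  }
}
```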

@mawinter69 Sorry, I could not figure out how to quote a specific post or question; I'd appreciate it if someone could guide me.

In response to your latest comment, this is the AI overview for the pipeline agent behavior:

“In a Jenkins pipeline, stages automatically share the same agent and workspace by default when the agent is defined at the top-level of the pipeline block.”

However I could not find a clear definition of the behavior in the official documentation.
The AI overview also mentions that using && instead of || in a label definition sets an order of precedence, e.g. linux && linux_master. I will give this a try in the Jenkinsfile, but this configuration did not work within the build job.
We do have the node label parameter plugin, but how can this be set so that it shares the same chosen agent that was used in the pipeline or previous stages? I just want to avoid hard-coding a specific agent. The goal is to use labels, but share the agent throughout all stages (unless specified otherwise) that is chosen when the pipeline runs.

Your job looks like this
Main job:

pipeline {
  agent { label 'linux' }

  stages {
    stage('build') {
      steps {
        build 'other_job'
      }
    }
  }
}

other_job:

pipeline {
  agent { label 'linux || linux_master' }

  stages {
    stage('build') {
      steps {
        echo 'do something'
      }
    }
  }
}

In this setup other_job has its own workspace and its own labels; the only relation is that the second job was triggered by the first. It can also be triggered by another job or manually (or, if configured, by an SCM change).
Label expressions are logical expressions and use the syntax of most programming languages: && is the and operator, so linux && linux_master means the agent must have both labels; || is the or operator, so any agent that has one of the two labels will match.
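For reference, the two operators in an agent directive look like this:

```groovy
// Label expression examples for the agent directive in a declarative pipeline
agent { label 'linux && linux_master' }  // agent must carry BOTH labels
agent { label 'linux || linux_master' }  // agent with EITHER label matches
```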

When you want to run other_job on the same agent as the calling job, this can lead to deadlocks when too many jobs run in parallel. E.g. the agent has 2 executors and 2 jobs start running on it that both want to trigger another job that should re-use the agent; then you have a deadlock, as the 2 triggered jobs will wait for an executor to become free on the agent, which will never happen because the 2 running jobs are waiting for completion of the triggered jobs.
In general I would try to avoid calling other jobs if possible. Better to inline the other job, or use a pipeline library that you call to avoid code duplication.

@mawinter69 Apologies if I did not explain clearly regarding the other job. What I meant by calling the other job, was that the ‘build’ stage is calling a separate build job (not pipeline) with its own label configuration.

e.g.
Pipeline job defines agent label as 'linux'
Build job (that is called from the pipeline) defines agent label as 'linux || linux_master'

Previously we were using the same label definition for both. It was just recently updated to help troubleshoot this issue. It seems from the last paragraph that the expectation of sharing the same agent throughout the whole pipeline for all stages is not realistic?
I’ll outline a bit more of the design for context.

Pipeline job contains:

  • Build stage (runs a separate job that loads code from a repository into the workspace and builds the application)
  • Scan stage (runs an inline NexusIQ scan sharing the same workspace as the build stage, which means it needs to share the same agent as well)
  • Deploy stage (runs a separate job that uses the same workspace as the build stage and deploys the archived artifact to the app server)

Note: The reason for the scan stage using an inline approach is because the NexusIQ job-dsl snippet was not working so we could not embed this into the build job itself.

Within one pipeline job all stages share the same agent (when defined globally and stages don’t define their own agent).
But as soon as you call another job using the build step, that other job can run on a different agent and normally has its own workspace.
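The default sharing (and the per-stage override) can be sketched like this, reusing your 'linux' label plus a hypothetical 'other' label:

```groovy
pipeline {
  agent { label 'linux' }       // allocated once, shared by the stages below

  stages {
    stage('build') {
      steps { sh 'hostname' }   // runs on the shared 'linux' agent
    }
    stage('scan') {
      steps { sh 'hostname' }   // same agent, same workspace as 'build'
    }
    stage('special') {
      agent { label 'other' }   // hypothetical: a stage-level agent overrides
      steps { sh 'hostname' }   // may run on a different node and workspace
    }
  }
}
```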
You will need to provide more details what exactly you mean with

runs a separate job that loads code from a repository into the workspace and builds the application

and how that is done in that separate job. Is that other job a freestyle job or also a pipeline job?

Also

Scan stage (runs an inline NexusIQ scan sharing the same workspace as the build stage, which means it needs to share the same agent as well)

That sounds dangerous, how do you guarantee that the build job is not running again in that same workspace?

and how that is done in that separate job. Is that other job a freestyle job or also a pipeline job?

Yes, the other job is a freestyle job with its own SCM configuration and label definition.

That sounds dangerous, how do you guarantee that the build job is not running again in that same workspace?

Sorry I’m not sure which job you’re referring to as build job, but the build job (for the 1st stage) is responsible for loading the code into its workspace, and the scan stage (inline, no separate job) shares the same workspace as the build job using the dir wrapper and runs the scan.

The job you call in the build stage

From the info you provided so far you have 3 jobs:
main_job (the pipeline job)
build_job (freestyle job that does the scm checkout and runs the build)
deploy_job (freestyle? deploys to the app server, uses the same workspace as build_job)
Each of the 3 jobs normally has its own workspace, which would be
/agent_root/workspace/<job_name>

So this is how I assume your pipeline job looks; let's call this job main_job:

pipeline {
  agent { label 'linux' }

  stages {
    stage('build') {
      steps {
        build 'build_job'
      }
    }
    stage('scan') {
      steps {
        dir('/agent_root/workspace/build_job') {
          sh 'echo execute nexusIQ scan'
        }
      }
    }
    stage('deploy') {
      steps {
        build 'deploy_job'
      }
    }
  }
}

Using the dir step in this way is dangerous. When main_job allows concurrent execution, or build_job can also be triggered by other jobs, you're in trouble.

And when build_job can run on more than one agent and was running on a different agent than main_job (you stated yourself that you have 2 agents with label linux), you would see an old state.

To solve that, best not to call another job to build, but do that directly in the pipeline itself:

pipeline {
  agent { label 'linux' }

  stages {
    stage('build') {
      steps {
        checkout ...
        sh '''
          #run build
          ...
        '''
      }
    }
    stage('scan') {
      steps {
        sh 'echo execute nexusIQ scan'
      }
    }
    stage('deploy') {
      steps {
        sh '''
          #deploy
          ...
        '''
      }
    }
  }
}

That ensures that everything runs on the same agent and uses the same workspace

The build stage and deploy stage work ok in terms of using the same agent, because we are using runOnSameNodeAs for the deploy stage to share the same agent as the build stage.

wrappers {
	runOnSameNodeAs("${app_job}-build", true)
}
stage('build') {
    steps {
	    build job: "${app_job}-build", parameters: [string(name: 'param1', value: "${param1_value}"), string(name: 'param2', value: "${param2_value}")]
    }
}

I guess that leaves the scan stage in question, which unfortunately had to be written inline (not through job-dsl), as the job-dsl nexusPolicyEvaluation method was not working (or not found - JOB-DSL configuration for nexusPolicyEvaluation). The pipeline behavior for this is working ok when we define both the pipeline agent label and the build stage label as linux, so I think we will just use this for now. We'd have to refactor quite a large code base of pipelines that are designed to call separate freestyle jobs. I will certainly keep the suggestions in mind. Thanks for the insight and feedback.

There is no guarantee that the pipeline job and the build job run on the same agent when there is more than one agent with label linux unless you use a node label parameter in app_job-build and pass that from the pipeline job.

This all sounds very fragile and can only work when there is no concurrent execution. You basically must ensure that app_job-build is not run another time between the build stage and the deploy stage. That at least requires the pipeline doesn’t allow concurrent execution.
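One way to at least rule out concurrent runs of the pipeline itself is the real declarative disableConcurrentBuilds option; a sketch applied to a main job (the downstream job name here is hypothetical):

```groovy
pipeline {
  agent { label 'linux' }

  options {
    // Prevents two builds of this pipeline from running at the same time;
    // queued builds wait until the running one finishes
    disableConcurrentBuilds()
  }

  stages {
    stage('build') {
      steps {
        build 'app_job-build'   // hypothetical downstream job name
      }
    }
  }
}
```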

And runOnSameNodeAs comes from the Job Node Stalker plugin, last released 10 years ago. It uses the custom-workspace approach, which has this in its help:

If you are in a distributed build environment, unless you tie a job to a specific node, Jenkins may still move around jobs to different agents. Sometimes this is desirable, sometimes this is not. Also, you can map multiple projects to have the same workspace, but if you do so, make sure concurrent executions of those jobs won’t have nasty interference with each other.