Clarification about what can run on agents vs controller?

A somewhat newbie question - I set up a Jenkins controller last year to run one set of jobs, and everything has always just run on this one machine. So the controller is also the agent, I guess. But now we have some extra machines and want to use them in the build process. So I am learning about how to distribute jobs to different agents.

But I dimly remember reading warnings about how some things can only run on the controller. What are those? If there are some types of code that only run on the controller, how do you deal with moving a job from the controller to an agent?

The general advice is don’t run anything on the controller, only run it on agents, even if the agent is on the controller itself (ideally with a different user account so a job can’t access the controller config files and stuff).

Jobs generally don’t care though: they run on an agent, and the controller by default provides a built-in agent, so they run there just fine.

If you are doing custom groovy code and using raw groovy classes and libraries (e.g. import java.io.File instead of writeFile/readFile in pipeline), that’ll run on the controller, not the agent.

tl;dr you are probably fine, but hard to say without more info.

Hm - my entire job is run as a bunch of groovy code functions within several shared libraries. I am not sure what exactly you mean by “custom groovy code” - would not any groovy code be custom?

I do use import java.io.FileOutputStream in some places. So what does it mean when you say it will run on the controller and not the agent? If I start a job on an agent and it uses that code, does that code just not run, or it calls back to the controller somehow to run that code as part of the job? Just trying to wrap my head around it. If an agent cannot run groovy code it’s hard to see how agents do much of anything…

thanks for the reply!

Yeah, standard Java classes, like java.io.*, don’t know about agents and will run on the controller directly. So if you create a file, it’ll be written to disk on the controller and not on the agent.

I recommend replacing them with any functions you can from Pipeline Utility Steps which are agent aware.

Pipeline code always runs on the controller, but pipeline steps/plugins (like writeJSON, or junit, etc.) do their work on the agent. If you are doing your own file management (again, java.io.*), it isn’t aware of agents.

(again hypothetical based on comments)
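To make the difference concrete, here is a minimal sketch. It assumes an agent with the made-up label 'agent' and a script that runs outside the sandbox (or with the needed approvals), since raw java.io access is normally blocked for sandboxed pipelines. The file names are only for illustration.

```groovy
node('agent') {  // 'agent' is a hypothetical label
    // Plain Java/Groovy IO knows nothing about agents: this file is created
    // on the controller's disk, relative to the controller process's
    // working directory, not in the agent workspace.
    new File('controller-side.txt').text = 'written by java.io'

    // writeFile is a Pipeline step, so the file lands in the agent's
    // workspace (the current directory of this node block).
    writeFile(file: 'agent-side.txt', text: 'written by writeFile')

    // sh starts a process on the agent, so the listing shows agent-side.txt;
    // controller-side.txt would not normally appear here.
    sh 'ls -l'
}
```

If the listing only shows agent-side.txt, that is the behavior described above: the java.io call ran where the pipeline code runs, on the controller.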

Man, I think my pipeline’s going to require a pretty extensive rework in order to support agents. Do you know of any articles or sites that talk about porting a controller-only pipeline to an agent-capable one?

It’s not something that comes up very often. How many files is your code writing? Can they be converted to writeFile, or sh/bat?

Most people seem to misunderstand how this works. Here is the real answer - All pipeline code runs on controllers. Always.

It is not actually as bad as it sounds. Agents are used for certain file IO - i.e. where special Jenkins steps (writeFile, etc) interact with filesystem, they will get their IO redirected to the node’s filesystem via the agent connector, and more importantly, any external commands by things like sh commands and many plugins will generate a process on the agent/node. In reality, those are the primary things you actually WANT to run on the agent. Your pipeline code should not be too heavy or involved, the things that are actually heavy are checking out the code and running external commands (and in most cases checking out code IS an external command anyway)

There is no way to actually change that; however, you can make the controller also be a node - so while it will still execute commands on the “node”, the “node” in that case will be the controller. (FWIW, that is a very bad idea and I advise you against it, but it is possible and iirc, the default)

As a side effect, as mentioned above, any IO code that is NOT using the Jenkins agent redirection code (for example Java/Groovy IO methods) will not be redirected and instead run on the controller.

HTH


Thanks - I think I need to find a book or something, I’m still pretty unclear about how this all works. If all pipeline code runs on the controller, how does anything run on agents? Just shell commands that are ported, or these special Jenkins steps?

One thing I do is issue a bunch of perforce calls to figure out what has been submitted since our last build, and then with groovy and apache.poi calls assemble an Excel file that lists those changes and publishes it as an artifact and posts it to Slack. I am not sure if any of that can be done on an agent or not…
Ideally our actual compile / builds will be done on agents. Checking stuff out from perforce is done by shell scripts, so I guess that can run on the appropriate agent? But the code that calls that script is in groovy, so that runs on the controller? At what point is it handed off to the agent, at the point I call “bat”? Still pretty confused…

I think you actually understand it better than you think. The code that says, “I need to run perforce command X” runs on the controller, but when it goes to start the command, it redirects that to the agent and starts the “perforce command x” on the agent. It collects the output of the command and redirects it back to the controller, which handles that output, and continues executions of the next statement in the pipeline.

Consider this simplistic pipeline (not intended to be super useful or best practices, just something to follow):

node('agent'){                                                       // <1>
  echo 'Checking out the repository'
  checkout scm                                                       // <2>
  dir('src'){                                                        // <3>
    echo 'Making a binary called "binary"'
    sh "make binary"                                                 // <4>
    def checkSum = sh(returnStdout: true,
                      script: "cat binary | md5sum").trim()          // <5>
    echo "Binary Checksum: ${checkSum}"

    echo 'Creating Manifest file'
    writeFile(file: 'manifest.txt', text:"""
File: binary
Version: ${env.BUILD_ID}
CheckSum: ${checkSum}
""")                                                                 // <6>
    sh "tar -czvf package.tgz binary manifest.txt"                   // <7>
    archiveArtifacts('package.tgz')                                  // <8>
  }
}

Let’s go through this one step at a time. First of all, this is a snippet of Groovy code - this groovy code will execute on Controller - but, where specified, will perform some actions on the agent:

  1. We request an agent with label agent - first one available will be used. This step will connect to the agent and lock a workspace directory (creating it or using an existing one for this job, if available). Remote working directory is now the root of the workspace.

  2. Controller will check out code based on values in the variable scm (this is usually auto-created by the job in case of SCM backed Pipeline, or Org or MultiBranch jobs; the truth is it is always the SCM backed Pipeline in the end). Depending on your configuration and scm configuration, one of two scenarios will happen:
    a. (not sure if this ever happens, but theoretically) In rare scenarios where SCM is configured to be native java code, it will run on the controller, but write the checked out repo files into the workspace over the Controller-to-Agent link into the current agent directory (workspace root at this point)
    b. In MOST cases the checkout will consist of one or more commands to the scm tool. Each of these commands will be run as an external process, ON THE AGENT, with CWD set to be workspace root (because that is our CWD) - Obviously since the scm tool is running on agent, it will checkout files on agent into the current directory

  3. We now change the agent’s working directory to “src” relative to the previous working directory (which was the workspace root). If the directory does not exist, it is created on the agent. Everything inside the curly braces (in groovy speak - “closure”) runs with the agent’s CWD set to that directory.

  4. We run a shell command. The sh step (and its cousin bat) will use the connection to the agent to run a shell interpreter relative to the agent’s cwd - in this case it will run the make command with target binary. Since the process is executed on the agent, the heavy lifting of “making” the build is all done on the agent using the Makefile. The “sh” step itself is running on the Controller and will consume the shell command’s stdout/stderr streams and show them in your log/console window.

  5. Here we get fancy, we want to compute a checksum of the generated file. Really, this should have been handled by the Makefile, but I added it here for sake of example. In this case Controller will execute the specified shell command on the agent, but instead of sending the logs to the console, it will store the captured stdout in a groovy variable checkSum in the pipeline script (i.e. on the Controller)

  6. Again, this should be part of the Makefile, I wanted a demonstration of using writeFile. In this case we run writeFile() step on the Controller. writeFile knows to redirect its IO and write to disk on agent, relative to CWD on agent (./src)

  7. We run another command on the agent to create a tarball. Once again, this really should have been part of the Makefile, because you want to build using build tools, and Jenkins is NOT a build tool. Not in this sense. By putting these commands into the Makefile, you are now allowing your local developer to build the package locally instead of relying on Jenkins. Now they can test and control the build workflow. (putting away the soap box) Anyway, we run tar command which is run on the agent and is Controlled by the Controller.

  8. Finally we run the archiveArtifacts step. This step runs on the Controller, but once again, the IO is redirected to the agent, so it reads the file from the workspace on the agent and adds it to the list of build artifacts on the Controller.
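Applied to the Perforce case from the question, the same hand-off might look roughly like the sketch below. The agent label and depot path are made up, and it assumes p4 is installed and logged in on the agent and that each changelist line in the output starts with "Change ".

```groovy
node('windows-build') {                       // hypothetical agent label
    // bat starts p4 as a process on the agent; the leading @ keeps the
    // command line itself out of the captured output.
    def raw = bat(returnStdout: true,
                  script: '@p4 changes -m 20 //depot/project/...')  // hypothetical depot path

    // From here on it is plain Groovy, so it runs on the controller,
    // working on the text that came back from the agent.
    def changes = raw.readLines().findAll { it.startsWith('Change ') }
    echo "Found ${changes.size()} changelists"

    // Persist the result in the agent workspace with an agent-aware step,
    // then copy it to the controller as a build artifact.
    writeFile(file: 'changes.txt', text: changes.join('\n'))
    archiveArtifacts(artifacts: 'changes.txt')
}
```

The hand-off happens exactly at the bat call: everything before and after it is ordinary Groovy evaluated on the controller.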

And there you have it. I wrote a book just to procrastinate and not do my work.


Thanks! That helps a lot. I won’t know how much trouble I am in until I get in there and start trying to port this to use agents. One thing we are doing differently, not sure if it matters, is that we don’t really use Jenkins to check out any workspaces or anything from source control. It seems like most people use Jenkins for that, but the workspaces we’re dealing with are huge, and already on the various machines. We run shell scripts to get latest on them, but the project workspaces are separate from jenkins “workspace.” It doesn’t sound like that will be an issue really…
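For a project tree that already lives outside the Jenkins workspace, one possible shape is sketched below. The label, path, and script names are all hypothetical; the only assumption about Jenkins itself is that the dir step accepts an absolute path on the agent.

```groovy
node('build-agent') {                  // hypothetical agent label
    // dir accepts an absolute path, so steps inside it run in the existing
    // project tree on the agent instead of the Jenkins workspace.
    dir('/data/projects/bigproject') { // hypothetical path on the agent
        sh './get_latest.sh'           // hypothetical sync script
        sh './build.sh'                // hypothetical build script
    }
}
```

Jenkins still allocates its own workspace for the node block, but nothing has to be checked out into it.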