We’re planning to implement our pipeline jobs using “Pipeline from SCM”, with scripts stored on our Perforce server alongside the code. I have a couple of questions:
The official Jenkins documentation recommends placing the pipeline script at the root of the branch. Could someone explain the reasoning behind this? We need multiple pipeline scripts for different purposes, and Jenkins allows us to configure the script path. Are there any downsides to not placing the scripts at the branch root? What benefits does placing them at the root provide?
When launching a job whose pipeline script is fetched from SCM:
Does storing the pipeline script in SCM (vs embedding it in the job config) impact the controller’s performance or ability to concurrently handle different projects?
Is anything synced to the Jenkins controller when a pipeline script is fetched from SCM? The pipelines are configured to run on slaves via their scripts.
Related to the previous point, are workspaces created on the controller specifically for the purpose of fetching the Pipeline scripts from the SCM?
Reasoning Behind Placing Pipeline Script at the Root of the Branch
Placing the pipeline script at the root of the branch is a common practice for several reasons:
Visibility and Accessibility: Having the pipeline script at the root makes it easily accessible and visible to all team members. It becomes straightforward to locate and update the script.
Standardization: It provides a standardized location for the pipeline script, making it easier to manage and maintain across multiple branches and repositories.
Simplicity: It simplifies the configuration of Jenkins jobs, as the default path does not need to be changed.
Downsides and Benefits of Not Placing Scripts at the Root
Downsides:
Complexity: Configuring different paths for multiple scripts can add complexity to job configurations.
Maintenance: It may become harder to maintain and locate scripts if they are scattered across different directories.
Benefits:
Organization: You can organize scripts based on their purpose, making it easier to manage multiple pipeline scripts.
Separation of Concerns: Different scripts for different purposes can be kept in separate directories, reducing the risk of conflicts.
Impact of Storing Pipeline Script in SCM vs. Embedding in Job Config
Performance and Concurrency:
As far as I know, storing the pipeline script in SCM does not significantly impact the Jenkins controller’s performance or its ability to handle concurrent projects, provided lightweight checkout is enabled. The script is fetched from SCM at the start of each build, and with lightweight checkout that is a cheap operation.
Syncing to Jenkins Controller:
With lightweight checkout, Jenkins fetches only the script file and any necessary metadata, so the entire repository is not synced to the controller. Without lightweight checkout, the controller performs a regular checkout in order to read the script.
Workspaces on the Controller:
With lightweight checkout, Jenkins should not need a workspace on the controller just to fetch the pipeline script. Without it, the checkout used to read the script lands in a directory on the controller. The workspaces for the actual build and test operations are created on the agents.
For our pipelines we use GitHub.com and have several repos but have some larger repos with multiple pipelines.
To simplify things we adopted this model:
All repos automatically get a CI pipeline for all PRs if they have a /Jenkinsfile at the root of the repository.
All other pipelines live under a /jenkins/ folder and are encouraged to follow a directory tree that matches the UI (we use the Folders plugin). For example, we would have /jenkins/qe/regression_tests.jenkins.
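A minimal sketch of that naming convention, in case it helps to check it mechanically. The function name and the two constants are hypothetical, and the rule that the job name is the file name minus the .jenkins suffix is my assumption based on the example path above.

```python
from pathlib import PurePosixPath

# Assumptions for illustration: non-CI pipelines live under "jenkins/" and
# use a ".jenkins" suffix, as in /jenkins/qe/regression_tests.jenkins.
PIPELINE_ROOT = "jenkins"
PIPELINE_SUFFIX = ".jenkins"


def job_path_for_script(script_path: str) -> str:
    """Map a repo-relative script path to the folder/job path it mirrors."""
    parts = PurePosixPath(script_path.lstrip("/")).parts
    if len(parts) < 2 or parts[0] != PIPELINE_ROOT:
        raise ValueError(f"{script_path!r} is not under /{PIPELINE_ROOT}/")
    file_name = parts[-1]
    if not file_name.endswith(PIPELINE_SUFFIX):
        raise ValueError(f"{script_path!r} does not end with {PIPELINE_SUFFIX}")
    job_name = file_name[: -len(PIPELINE_SUFFIX)]
    # Intermediate directories become folders; the file becomes the job.
    return "/".join(parts[1:-1] + (job_name,))
```

With this, /jenkins/qe/regression_tests.jenkins maps to a job named regression_tests inside a qe folder.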
Note that the official documentation is probably targeted toward the git model where microrepos are encouraged, while with Perforce you would have a monorepo with a directory tree providing subprojects, so the model would be significantly different.
I recommend keeping as little business logic as possible in the job config and moving it into the Jenkinsfile. This lets you change your pipeline workflow per branch without having to reconfigure the job.
Furthermore, avoid putting ‘options’ in the Jenkinsfile, as this can change the input parameters between branches: the configuration from the options is applied after the Jenkinsfile is loaded, which is after you start the pipeline, and that becomes a mess if you use Branch Parameters.
Finally, put as little business logic as possible in your declarative or scripted pipelines. Pipeline Groovy is not pure Groovy but a CPS-transformed version of it. In short, every single step causes one or more XML files to be written to disk on the controller, which is quite inefficient. So you want to put as much business logic as possible inside a shell/bat/python script instead.
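As a rough illustration of moving logic out of the pipeline, here is a sketch of a Python helper that decides which test suites to run from a list of changed paths. The prefix-to-suite mapping, the suite names, and the script itself are all hypothetical; the point is only that the decision happens in one process on the agent rather than as many pipeline steps.

```python
#!/usr/bin/env python3
"""Pick test suites to run from a list of changed paths (illustrative only)."""
import sys

# Hypothetical mapping from path prefix to test suite name.
SUITES_BY_PREFIX = {
    "server/": "server_tests",
    "client/": "client_tests",
    "docs/": "doc_checks",
}


def select_suites(changed_paths):
    """Return the sorted, de-duplicated suites touched by the changed paths."""
    suites = set()
    for path in changed_paths:
        for prefix, suite in SUITES_BY_PREFIX.items():
            if path.startswith(prefix):
                suites.add(suite)
    return sorted(suites)


if __name__ == "__main__":
    # Changed paths are passed as command-line arguments; one suite per line out.
    for suite in select_suites(sys.argv[1:]):
        print(suite)
```

The pipeline would then need only a single sh (or bat) step that calls the script and reads its output, instead of expressing the same loop and conditions in Groovy.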
To answer your other questions.
Launching a pipeline requires the controller to read the Jenkinsfile. To do that, the controller makes a clone of your target project under the controller’s home directory, under workspace. If the same pipeline is launched in parallel, multiple copies can occur. This could lead to your controller running out of disk space or inodes. You can mitigate that by checking the “Lightweight checkout” option for the pipeline, which copies only the targeted Jenkinsfile. However, this is incompatible with the branch parameter plugin, at least with Git repos; it might work with Perforce. This bug is very well known but unlikely to be fixed in the near future, so our workaround is to have a fake git wrapper to do sparse clones. For Git repos we also use reference clones on the controller to speed up this part. Only the controller benefits from this, but you can do something similar on the agents. I know Perforce has a similar feature (a local server), but I don't know if it is available through the Jenkins plugin for Perforce.
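If you want to keep an eye on how much space those controller-side checkouts take, something like the following sketch could be run periodically. It is not part of Jenkins; the function name is hypothetical, and the directory you point it at (for example the workspace directory under the controller’s home) is an assumption you would need to confirm on your own installation. The entry count is only a rough proxy for inode usage.

```python
import os
import sys


def usage_by_directory(workspace_root):
    """Return {name: (total_bytes, entry_count)} per top-level directory."""
    report = {}
    for entry in os.scandir(workspace_root):
        if not entry.is_dir(follow_symlinks=False):
            continue
        total_bytes = 0
        entry_count = 1  # count the top-level directory itself
        for dirpath, dirnames, filenames in os.walk(entry.path):
            entry_count += len(dirnames) + len(filenames)
            for name in filenames:
                try:
                    total_bytes += os.lstat(os.path.join(dirpath, name)).st_size
                except OSError:
                    pass  # file vanished or is unreadable; skip it
        report[entry.name] = (total_bytes, entry_count)
    return report


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print the largest directories first.
    rows = sorted(usage_by_directory(sys.argv[1]).items(),
                  key=lambda item: item[1][0], reverse=True)
    for name, (size, count) in rows:
        print(f"{size:>14} bytes {count:>8} entries  {name}")
```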