Speeding up SCM-Pipelines

Hey there!

I am pretty new to Jenkins but have been getting deeper and deeper into it ever since I took an existing setup over from a colleague who left our company.

My Jenkins setup finds all files of a certain type inside a specified folder and runs a third-party tool on each of them.

Since the number of files found, and therefore the number of jobs created, can run into the thousands, I have been working on minimising the execution time of each job. I’m currently stuck on one issue for which I can’t find a satisfying solution.

I am using GitHub to store all my pipeline scripts, and I am chaining three pipelines together:
Pipeline 1 finds all the files and stores their paths in a bunch of .txt files, then triggers Pipeline 2.
Pipeline 2 takes these .txt files, goes through them one by one and starts Pipeline 3 for each file path in the current file. It waits for the current Pipeline 3 job to finish, then moves on to the next file path, and so on.
Pipeline 3 takes the file path and starts the third-party tool. This is the part that runs many thousands of times.
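
For illustration only (this is not the actual script), the Pipeline 2 loop described above could look roughly like the sketch below. The job name `Pipeline3`, the parameter name `FILE_PATH` and the list file `paths_01.txt` are hypothetical placeholders.

```groovy
// Sketch of Pipeline 2. Job name, parameter name and list file are hypothetical.
pipeline {
    agent any
    stages {
        stage('Run Pipeline 3 per file') {
            steps {
                script {
                    // One file path per line, as written by Pipeline 1
                    def lines = readFile('paths_01.txt').split('\r?\n')
                    for (line in lines) {
                        def path = line.trim()
                        if (!path) { continue } // skip empty lines
                        // wait: true (the default) blocks until the downstream build finishes
                        build job: 'Pipeline3',
                              parameters: [string(name: 'FILE_PATH', value: path)],
                              wait: true
                    }
                }
            }
        }
    }
}
```

With this shape, every file path becomes a separate Pipeline 3 build, so any fixed per-build overhead is paid once per file.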

The problem is that all three pipelines are currently set up to clone from GitHub. For Pipelines 1 and 2 that’s not a big deal, because they run only once.
However, Pipeline 3 runs thousands of times, and each run contacts GitHub and clones the repo, so at least 7 seconds are wasted between the start of the pipeline and the actual work I want performed.

tl;dr:
What I would like to have: a way to clone the remote only once per start of Pipeline 1, so that Pipeline 3 does not need to fetch a current copy of the remote for every job it starts. This would save a lot of time, and since I don’t change the pipeline code that often, fetching the repo anew for every run of Pipeline 3 is unnecessary.

Any help on this one would be greatly appreciated!

Jenkins setup:
Jenkins: 2.452.2
OS: Windows 10 - 10.0
Java: 17.0.9 - Azul Systems, Inc. (OpenJDK 64-Bit Server VM)

ant:497.v94e7d9fffa_b_9
antisamy-markup-formatter:162.v0e6ec0fcfcf6
apache-httpcomponents-client-4-api:4.5.14-208.v438351942757
asm-api:9.7-33.v4d23ef79fcc8
bootstrap5-api:5.3.3-1
bouncycastle-api:2.30.1.78.1-233.vfdcdeb_0a_08a_a_
branch-api:2.1169.va_f810c56e895
build-timeout:1.32
caffeine-api:3.1.8-133.v17b_1ff2e0599
checks-api:2.2.0
cloudbees-folder:6.928.v7c780211d66e
command-launcher:107.v773860566e2e
commons-lang3-api:3.14.0-76.vda_5591261cfe
commons-text-api:1.12.0-119.v73ef73f2345d
credentials:1337.v60b_d7b_c7b_c9f
credentials-binding:677.vdc9d38cb_254d
dark-theme:439.vdef09f81f85e
display-url-api:2.204.vf6fddd8a_8b_e9
durable-task:555.v6802fe0f0b_82
echarts-api:5.5.0-1
eddsa-api:0.3.0-4.v84c6f0f4969e
email-ext:1814.v404722f34263
font-awesome-api:6.5.2-1
git:5.2.2
git-client:5.0.0
github:1.39.0
github-api:1.318-461.v7a_c09c9fa_d63
github-branch-source:1789.v5b_0c0cea_18c3
gradle:2.12
gson-api:2.11.0-41.v019fcf6125dc
http_request:1.18
instance-identity:185.v303dc7c645f9
ionicons-api:74.v93d5eb_813d5f
jackson2-api:2.17.0-379.v02de8ec9f64c
jakarta-activation-api:2.1.3-1
jakarta-mail-api:2.1.3-1
javax-activation-api:1.2.0-7
javax-mail-api:1.6.2-10
jaxb:2.3.9-1
jdk-tool:73.vddf737284550
jjwt-api:0.11.5-112.ve82dfb_224b_a_d
joda-time-api:2.12.7-29.v5a_b_e3a_82269a_
jquery:1.12.4-1
jquery3-api:3.7.1-2
json-api:20240303-41.v94e11e6de726
json-path-api:2.9.0-58.v62e3e85b_a_655
junit:1265.v65b_14fa_f12f0
ldap:725.v3cb_b_711b_1a_ef
locale:511.v212370760160
mailer:472.vf7c289a_4b_420
matrix-auth:3.2.2
matrix-project:832.va_66e270d2946
metrics:4.2.21-451.vd51df8df52ec
mina-sshd-api-common:2.12.1-113.v4d3ea_5eb_7f72
mina-sshd-api-core:2.12.1-113.v4d3ea_5eb_7f72
okhttp-api:4.11.0-172.vda_da_1feeb_c6e
pam-auth:1.11
pipeline-build-step:540.vb_e8849e1a_b_d8
pipeline-github-lib:61.v629f2cc41d83
pipeline-graph-analysis:216.vfd8b_ece330ca_
pipeline-graph-view:287.v3ef017b_780d5
pipeline-groovy-lib:710.v4b_94b_077a_808
pipeline-input-step:495.ve9c153f6067b_
pipeline-milestone-step:119.vdfdc43fc3b_9a_
pipeline-model-api:2.2198.v41dd8ef6dd56
pipeline-model-definition:2.2198.v41dd8ef6dd56
pipeline-model-extensions:2.2198.v41dd8ef6dd56
pipeline-stage-step:312.v8cd10304c27a_
pipeline-stage-tags-metadata:2.2198.v41dd8ef6dd56
plain-credentials:182.v468b_97b_9dcb_8
plugin-util-api:4.1.0
resource-disposer:0.23
scm-api:690.vfc8b_54395023
script-security:1341.va_2819b_414686
snakeyaml-api:2.2-111.vc6598e30cc65
ssh-credentials:337.v395d2403ccd4
ssh-slaves:2.968.v6f8823c91de4
sshd:3.330.vc866a_8389b_58
structs:337.v1b_04ea_4df7c8
theme-manager:262.vc57ee4a_eda_5d
timestamper:1.27
token-macro:400.v35420b_922dcb_
trilead-api:2.147.vb_73cc728a_32e
variant:60.v7290fc0eb_b_cd
workflow-aggregator:596.v8c21c963d92d
workflow-api:1316.v33eb_726c50b_a_
workflow-basic-steps:1058.vcb_fc1e3a_21a_9
workflow-cps:3894.3896.vca_2c931e7935
workflow-durable-task-step:1353.v1891a_b_01da_18
workflow-job:1400.v7fd111b_ec82f
workflow-multibranch:783.va_6eb_ef636fb_d
workflow-scm-step:427.v4ca_6512e7df1
workflow-step-api:657.v03b_e8115821b_
workflow-support:907.v6713a_ed8a_573
ws-cleanup:0.46

Merge the code from Pipeline 3 into Pipeline 2.
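
A minimal sketch of what that merge could look like, assuming a Windows agent and a tool that can be called from the command line. The list file `paths_01.txt` and the executable name `thirdparty.exe` are hypothetical.

```groovy
// Sketch: Pipeline 2 runs the tool itself instead of triggering Pipeline 3.
// File name and executable are hypothetical placeholders.
pipeline {
    agent any
    stages {
        stage('Process files') {
            steps {
                script {
                    def lines = readFile('paths_01.txt').split('\r?\n')
                    for (line in lines) {
                        def path = line.trim()
                        if (!path) { continue } // skip empty lines
                        // Call the third-party tool directly; no downstream build,
                        // so there is no per-file checkout
                        bat "thirdparty.exe \"${path}\""
                    }
                }
            }
        }
    }
}
```

The trade-off is that the individual files no longer get their own build record, so a failure has to be traced through the log of the single merged build.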

To my knowledge, there is no way of avoiding the git tax when using an SCM pipeline.
It would be great if the Jenkins controller could maintain some kind of local mirror of the GitHub pipeline source and refresh it in the background.
Agents could then pull their config from the controller instead of contacting the original repo directly, which would greatly reduce the git tax and make builds more resilient to communication errors with the real source.
Does anyone know if such a feature is planned for future Jenkins versions?

If your agents are persistent and not ephemeral, use a reference clone.

The way it works is that you have yet another pipeline that clones the git repo to an absolute path that your third pipeline can access.

Here is a blog post about this.

We run Jenkins on Kubernetes, so all our agents are ephemeral, but we plan on attaching an EFS volume to address this. Our controllers have persistent volumes, and they do use reference clones and sparse checkouts, since they only need to see the most recent Jenkinsfiles, which in our case all live under one folder.

If you cannot do a reference clone, try a shallow clone so that you do not pull the entire git history, and, if possible, do a sparse checkout as well.
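
As a rough sketch, the three options can be expressed with the Git plugin's `checkout scmGit(...)` step, as I understand its Pipeline syntax. The reference path, repository URL and sparse folder are placeholders, and the example combines all three options only for brevity; in practice you may want just one or two of them.

```groovy
// Sketch only: the reference path, URL and sparse folder are placeholders.
pipeline {
    agent any
    options {
        // Replace the automatic checkout with the tuned one below
        skipDefaultCheckout()
    }
    stages {
        stage('Checkout') {
            steps {
                checkout scmGit(
                    branches: [[name: 'master']],
                    extensions: [
                        // reference: reuse objects from an existing local clone
                        // shallow/depth/noTags: do not fetch the full history
                        cloneOption(reference: 'C:/git-cache/pipelines.git',
                                    shallow: true, depth: 1, noTags: true),
                        // Only populate the folder that holds the Jenkinsfiles
                        sparseCheckout(sparseCheckoutPaths: [[path: 'jenkinsfiles/']])
                    ],
                    userRemoteConfigs: [[
                        url: 'https://github.com/example-org/example-repo.git',
                        credentialsId: 'git_pat'
                    ]]
                )
            }
        }
    }
}
```

The reference path has to exist on the machine that performs the checkout, which is why this approach fits persistent agents better than ephemeral ones.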

Thanks for your replies, but none of them were the solution to my issue. However, the website linked by @sodul put me on the right path.

Here is what I did:

  1. Add some code to pipeline 1 so that it clones the remote repo once into a local folder in the workspace folder. I am using declarative pipelines, so here is the code for that:

     def repoDir = "${env.WORKSPACE}/../repository"

     stage("Clone Repo To Local Machine") {
         steps {
             // Clone the remote once so that the following jobs don't have to
             dir(repoDir) {
                 git branch: "master",
                     credentialsId: "git_pat",
                     url: "<remote-url>"
             }
         }
     }
    
  2. Set up Pipelines 2 and 3 so that they use the local repo freshly cloned by Pipeline 1.

  3. Prevent Pipelines 2 and 3 from performing the default checkout that happens at the start of an SCM pipeline by adding the following line to the pipeline code:

    skipDefaultCheckout()
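
For context, in a declarative pipeline this line belongs in the `options` block. A minimal sketch of how Pipeline 3 could look with it follows; the parameter name `FILE_PATH` and the executable `thirdparty.exe` are hypothetical and only there to make the example complete.

```groovy
// Sketch of Pipeline 3. Parameter name and executable are hypothetical.
pipeline {
    agent any
    options {
        // Skip the automatic "checkout scm" that declarative pipelines
        // otherwise run on the agent before the first stage
        skipDefaultCheckout()
    }
    parameters {
        string(name: 'FILE_PATH', defaultValue: '', description: 'File to process')
    }
    stages {
        stage('Run tool') {
            steps {
                // Windows agent assumed, hence bat instead of sh
                bat "thirdparty.exe \"${params.FILE_PATH}\""
            }
        }
    }
}
```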

And that’s it! Now pipelines 2 and 3 use the locally stored clone of the remote that has been freshly updated by pipeline 1 beforehand, instead of re-cloning every time they run.

Good to hear you found a way to make it work for you. Each situation can be a bit unique and can require a custom solution. The good news is that Jenkins is very flexible which makes most problems solvable.