Declarative SCM checkout is 60 times slower for a simple repo on a Windows ephemeral agent

A static agent takes 4 seconds, while an ephemeral Windows agent takes 4 minutes for the same repo.

14:58:32  The recommended git tool is: NONE
15:01:15  using credential token
15:01:15  Cloning the remote Git repository
15:01:16  Cloning repository https://github.com/Ranjith9/game-of-life.git
15:01:17   > git init C:\Jenkins\workspace\windows-ephem---384c7443 # timeout=10
15:01:19  Fetching upstream changes from https://github.com/Ranjith9/game-of-life.git
15:01:19   > git --version # timeout=10
15:01:19   > git --version # 'git version 2.47.1.windows.1'
15:01:19  using GIT_ASKPASS to set credentials ci user GitHub Token
15:01:19   > git fetch --tags --force --progress -- https://github.com/Ranjith9/game-of-life.git +refs/heads/*:refs/remotes/origin/* # timeout=10
15:01:30  Avoid second fetch
15:01:30  Checking out Revision 7e35e9b748c37c748c37c59695596957e35e9b (refs/remotes/origin/windows-ephemeral)
15:01:29   > git config remote.origin.url https://github.com/Ranjith9/game-of-life.git # timeout=10
15:01:29   > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
15:01:30   > git rev-parse "refs/remotes/origin/windows-ephemeral^{commit}" # timeout=10
15:01:31   > git config core.sparsecheckout # timeout=10
15:01:31   > git checkout -f 7e35e9b748c37c748c37c59695596957e35e9b # timeout=10
15:02:24  Commit message: "Update Jenkinsfile"
15:02:24   > git rev-list --no-walk 7e35e9b748c37c748c37c59695596957e35e9b # timeout=10

Some speculations first:

If the static agent has a locally attached SSD and the ephemeral Windows agent has a network attached hard drive, that result could be reasonable.

If the static agent has a fast network connection to the source repository and the ephemeral Windows agent has a slow network connection to the source repository, that result could be reasonable.

If the static agent has a fast network connection to the Jenkins controller and has already cached the jar files that it needs in order to perform the checkout, that would be faster than an ephemeral agent with a slower network connection that needs to copy the jar files that will be used to perform the checkout on the agent.

If the static agent has disabled virus scanning in the Jenkins directories and the ephemeral agent is performing virus scanning, that result could be reasonable.
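One rough way to compare file-operation cost (disk speed plus any on-access virus scanning) between the two agents is to time the creation and deletion of many small files on each. The sketch below is only an illustration; the function name `time_small_file_writes` and the default file count and size are assumptions chosen for the example, not anything Jenkins provides.

```python
import os
import tempfile
import time


def time_small_file_writes(base_dir=None, count=500, size=4096):
    """Create, then delete, `count` files of `size` bytes.

    Returns (write_seconds, delete_seconds). `base_dir` is the directory
    to test in; None means the system temp directory.
    """
    payload = b"x" * size
    with tempfile.TemporaryDirectory(dir=base_dir) as tmp:
        start = time.perf_counter()
        paths = []
        for i in range(count):
            path = os.path.join(tmp, f"file_{i}.txt")
            with open(path, "wb") as fh:
                fh.write(payload)
            paths.append(path)
        write_seconds = time.perf_counter() - start

        start = time.perf_counter()
        for path in paths:
            os.remove(path)
        delete_seconds = time.perf_counter() - start
    return write_seconds, delete_seconds


if __name__ == "__main__":
    write_s, delete_s = time_small_file_writes()
    print(f"write: {write_s:.3f}s  delete: {delete_s:.3f}s")
```

Running it with `base_dir` set to the agent's workspace root on both the static and the ephemeral agent gives a like-for-like number. A large difference would point at the disk or at scanning rather than at git itself.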

Some detailed observations second:

163 seconds (almost 3 minutes) spent between the selection of the preferred git tool and the first reference to a credential token might indicate that jar files are being downloaded to the agent.

11 seconds to perform the git fetch (15:01:19 to 15:01:30) may indicate that the local hard drives on the agent are slow or that the network connection to the source code repository is slow.

53 seconds to perform a git checkout may indicate that the disks on the agent are slow or that virus scanning is causing a serious performance penalty on file operations. It could also indicate that the git repository has large file support enabled and the checkout is downloading the large files through Git LFS.
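The gaps above were read off the console timestamps by hand. A small script can do the same for a longer log. This is a minimal sketch that assumes every line of interest starts with an `HH:MM:SS` timestamp, as in the console output above, and that the build does not cross midnight; the helper names `step_gaps` and `slowest` are made up for the example.

```python
from datetime import datetime


def step_gaps(log_lines):
    """Return (seconds, earlier_message, later_message) for each pair of
    consecutive timestamped lines. Lines without a leading HH:MM:SS are skipped."""
    stamped = []
    for line in log_lines:
        parts = line.split(None, 1)
        if not parts:
            continue
        try:
            ts = datetime.strptime(parts[0], "%H:%M:%S")
        except ValueError:
            continue  # not a timestamped line
        message = parts[1].strip() if len(parts) > 1 else ""
        stamped.append((ts, message))
    return [
        ((t1 - t0).total_seconds(), m0, m1)
        for (t0, m0), (t1, m1) in zip(stamped, stamped[1:])
    ]


def slowest(log_lines, top=3):
    """Return the `top` largest gaps, biggest first."""
    return sorted(step_gaps(log_lines), key=lambda g: g[0], reverse=True)[:top]


# A subset of the console lines quoted in the question.
log = [
    "14:58:32  The recommended git tool is: NONE",
    "15:01:15  using credential token",
    "15:01:31   > git checkout -f 7e35e9b748c37c748c37c59695596957e35e9b # timeout=10",
    '15:02:24  Commit message: "Update Jenkinsfile"',
]

for seconds, before, after in slowest(log, top=2):
    print(f"{seconds:.0f}s between '{before}' and '{after}'")
```

On these four lines the two largest gaps are the 163 seconds before the credential message and the 53 seconds of the checkout.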


Thank you @MarkEWaite for the response.

On the speculations:

  • We are using AWS machines with gp3 volumes, so they are on SSD.
  • We have the same network for Linux ephemeral agents too, where the checkout takes around 30-40 seconds (slow, but fine for now), but Windows ephemeral agents take much longer.
  • The static agents would already have the required jars, so they are faster.
  • We have not modified the virus scanning settings on either type of agent.

On the other hand,

  • We already have git installed in the ephemeral agent AMI. To verify, I disabled the automatic checkout, added a 10-minute sleep, logged into the machine once the agent was up, and cloned the same repo manually, which took 4 seconds.
  • The repo I am cloning is around 152K.
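To make the manual test more directly comparable with the console log, the init, fetch and checkout steps can be timed separately instead of as a single clone. The sketch below reuses the git commands shown in the console output; the helper names `time_command` and `time_git_phases` are hypothetical, and it assumes git on the machine already has whatever credentials it needs for the repository.

```python
import subprocess
import time


def time_command(args, cwd=None):
    """Run a command and return (elapsed_seconds, return_code)."""
    start = time.perf_counter()
    completed = subprocess.run(args, cwd=cwd)
    return time.perf_counter() - start, completed.returncode


def time_git_phases(url, workdir, revision):
    """Time init, fetch and checkout separately.

    Returns a dict of phase name -> seconds. Stops at the first failing
    phase, because later phases depend on earlier ones.
    """
    phases = [
        ("init", ["git", "init", workdir], None),
        ("fetch", ["git", "fetch", "--tags", "--force", "--progress", "--",
                   url, "+refs/heads/*:refs/remotes/origin/*"], workdir),
        ("checkout", ["git", "checkout", "-f", revision], workdir),
    ]
    results = {}
    for name, args, cwd in phases:
        seconds, code = time_command(args, cwd=cwd)
        results[name] = seconds
        if code != 0:
            break
    return results
```

For example, `time_git_phases("https://github.com/Ranjith9/game-of-life.git", r"C:\Jenkins\workspace\manual-test", "origin/windows-ephemeral")` would time the same three steps in a fresh directory (the `manual-test` directory name is made up). If all three phases are fast when run by hand, the delay is more likely in what Jenkins does around git than in git itself.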

Hello Ranjith,

Some of this also seems related to JENKINS-72226 and the prefetch of Git plugin / Git Client classes over remoting. A slow network or disk can cause the initialization to take a lot longer. Once the agent has the cached jars, it will initialize the Git Client quickly. After an agent restart, though, the remote class loading will kick off again. Ephemeral agents would always go through the remote class loading and pay that toll on the first use of the Git client.

Now, as Mark pointed out, the slow git checkout is possibly caused by a slow file system, so file operations may be the more dominant factor here.