I inherited a rather complex Jenkins setup, and now they’re telling me I’m the expert! I have a lot of CI/CD experience with other platforms, but only the most basic knowledge of Jenkins itself.
We’re using Jenkins 2.222.4 and I’d love to upgrade, but a test run of that upgrade path ended in tears, so for now we’ve stayed on this old version.
We launch build nodes through an AWS ASG (EC2 Fleet), which might be through a plugin; I’m not sure which one. We build an AMI with most of what we need, then run Chef Zero on each node again after it launches, at which point it becomes available to Jenkins.
The new problem we’re seeing on one type of our nodes is that builds seem to start a few minutes before the node is actually ready.
Here is an example error:
11:05:34 Building remotely on minion-2xl i-abcdef12345678 (main mysql bazel-worker frontend) in workspace /mnt/jenkins/workspace/main/master/bazel-build
11:05:34 [WS-CLEANUP] Deleting project workspace...
11:05:34 [WS-CLEANUP] Deferred wipeout is used...
11:05:34 Running Prebuild steps
11:05:34 [bazel-build] $ /bin/bash /tmp/jenkins1234567890.sh
11:05:34 fatal: not a git repository: '/mnt/jenkins/main.git'
11:05:34 Failed build for hudson.tasks.Shell@3b2ac32e
...
If we wait 3-4 more minutes and run again, it runs fine. The failure returns whenever we spin up new nodes, and lasts until they have also been up for several minutes. Since we launch our nodes through an ASG, we might only run a few builds on a node before it shuts down. So a high percentage of our builds land on “fresh” nodes and fail due to overfreshness.
As far as I understand it (and I don’t really), the job is running a git command against “/mnt/jenkins/main.git” before that path is ready. /mnt/jenkins is set up by the Chef process before we build the AMI; it’s also verified/corrected when Chef runs again after launching the node. The Chef code doesn’t do anything with git in that directory. Forgive my ignorance, but is a link to that git repo something that Jenkins sets up when it connects to a new node? Or is it something being set up in this particular job? My team uses Bazel and Groovy, which are also new to me, so I’m not too sure what steps the build process is actually executing.
The easiest thing would probably be to delay the node presenting as “ready” to the Jenkins server. Is there somewhere to set that? Is there a check that marks the node as ready but is missing something?
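To make the idea concrete, here is a minimal sketch in Python of the kind of readiness gate I have in mind. It assumes that “ready” means /mnt/jenkins/main.git has the basic layout of a bare git repository (a HEAD file plus objects and refs directories), which is my guess from the error above rather than something I have confirmed. The function names, the timeout, and the polling interval are all hypothetical, and nothing here is a Jenkins or plugin API.

```python
import os
import time


def looks_like_bare_repo(path):
    """Return True if path has the basic layout of a bare git repository.

    Assumption: HEAD, objects/ and refs/ existing is a good enough signal
    that the repo is usable. It does not prove the repo is fully populated.
    """
    return (
        os.path.isfile(os.path.join(path, "HEAD"))
        and os.path.isdir(os.path.join(path, "objects"))
        and os.path.isdir(os.path.join(path, "refs"))
    )


def wait_until_ready(path, timeout=300.0, interval=5.0):
    """Poll until path looks like a bare repo, or give up after timeout seconds.

    Returns True if the repo appeared in time, False otherwise.
    The default timeout and interval are hypothetical values.
    """
    deadline = time.monotonic() + timeout
    while True:
        if looks_like_bare_repo(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Something like wait_until_ready("/mnt/jenkins/main.git") could run as the last step of node bootstrap, or at the top of the prebuild script, so the build blocks instead of failing. That would only paper over the problem, though; I would still like to know what actually creates that repo.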
Any other advice is welcome. My googles, they have done nothing.
Thanks in advance.