Our Jenkins pipelines implement “Send an email on jenkins pipeline failure”, based on the main Stack Overflow answers, which are fairly straightforward.
The pipelines process GitHub pull requests. Pull requests tend to get merged, and often the source branch is deleted as well.
When those events happen there are spurious failure notifications.
The “Send an email on jenkins pipeline failure” feature sends a failure email because the PR was closed and/or the PR branch is not reachable any more.
If the PR was merged it’s no longer a concern to run CI tests on that PR.
It’s not a failure to be investigated.
Should the Jenkins pipeline have built-in support for this situation?
Or, also helpful, is there boilerplate copy/paste code to handle this case?
Or just advice: “you need to query the GitHub API”?
BTW - it is a race condition. But a super slow tortoise race condition. The job starts. After it has run 15-30 minutes the PR is closed. The job finishes later on, but with a failure.
I assume by “The pipelines are processing github pull requests” you mean that you are querying all your PRs from the Jenkins side?
If so, the solution to your problem is:
a) use GitHub webhooks to initiate Jenkins jobs,
b) make sure PR’s cannot be merged until the CI job has finished (that’s configurable on the GitHub side).
Hi Vilius,
It may be these suggestions are indeed one way to solve the problem.
We have a very distributed organizational structure, with dozens of repositories controlled by an equally large number of people. For this application it’s convenient to avoid installing webhooks, which requires access, permission, and coordination. By simply polling every 5 minutes, the jobs can be triggered without any setup.
I believe what happens is the PR exists at the time the job starts. If the job takes 30 minutes, by the time it is finished, the PR might be gone. Or, the source branch gone. Therefore, the job fails in the middle, someplace. There are different scenarios one can imagine. Even with a webhook, the same thing could occur: at the time the job begins the PR exists. But by the end, it’s gone.
“make sure PR’s cannot be merged”. Yes, that would fix these notifications. But it would replace a trivial problem (spurious emails) with a larger one.
You can create a GitHub App and install it on all repositories at once + it will be automatically installed on all new repositories. GitHub Apps can send webhooks and you don’t need any complicated setup.
Well… yes, but what do you suggest? There is no way for Jenkins to know the status of the PR in real time. And without webhooks, there is no way to signal a Jenkins job to stop gracefully. Unless you check for errors between every action and come up with an error classification of “this error is due to the PR actually failing” and “this error is due to the PR not existing”, there is no way to fix that. Also, such a pipeline would still be prone to other kinds of errors, such as branches not existing, as you said yourself.
What larger problem? I don’t know what exactly you are running in your pipelines, but most of the time you want pipelines to actually finish and produce an actionable result, be it a failure or a success. Why are you running these pipelines if you cannot ensure that they have completed? To me it looks like the “spurious emails” are just a symptom of the problem, not the problem itself.
Interesting. So a pipeline job that is part way through can be signaled by a webhook to exit gracefully? I wasn’t thinking of that feature. In another scenario it might indeed be helpful. That is, if we installed webhooks everywhere…
For our case, these jobs are applied to PRs. Which are essentially less important than when you merge something into the master branch, which means it’s “real code” now, to be deployed someplace. A PR may never be merged. What’s running are tests and previews. If someone ignored the results, and merged the PR anyway, that is their decision. Every organization is different, certainly sometimes such tests are more critical, and ought to block a merge. But not in every imaginable scenario.
“what do you suggest?”
I guess what I will do is compose code that checks the status of the pull request (was it closed?), before sending a notification. The job failed, but I don’t need to be alerted.
“What larger problem?”
That if there is any problem with the pipelines themselves, it would prevent merging. You are imagining the frequent case where the pipeline is critical, and it should prevent a merge. But again, applications of Jenkins vary, a “preview” job for us might not be critical.
AFAIK, there is no built-in functionality to interrupt the job via a webhook. My observation was more a theoretical one, to illustrate how the communication happens. You could make another job which starts and terminates your original pipeline, but I’m not sure how error-prone that would be.
The real question, then, is why are those jobs failing? Are you using the GitHub API to perform some task with the PR or the branch itself just after the job is finished, like writing GitHub comments? Usually, after the pipeline is started, the build process doesn’t directly depend on the state of the PR.
Another question: if these pipelines are not critical, why do you want to get an email when they fail? I assume you are using these emails as monitoring? If so, the better way would be to monitor the trends of such pipelines and alert only when, let’s say, build stability reaches the worst level.
Exactly. Writing a comment.
Imagine though… just because a PR is closed, why would a comment fail? Comments can still be added. So, it’s not 100% certain that would be the cause of the issue. It is odd.
Right. It’s helpful to know, and even to know immediately, that a job is failing in order to debug it, presuming the notification is correct and not spurious.
In order to fix your email problem you need to debug where exactly it fails. If it’s a GitHub API issue, you could use catchError with the desired stageResult value as a workaround. Or simply contact GitHub to fix it at the source.
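A minimal sketch of the catchError workaround mentioned above, not a tested fix. It assumes the failing step is the `pullRequest.comment(...)` call from the pipeline-github plugin; the stage name, comment text, and the choice of `UNSTABLE` for the stage are placeholders. The idea is that an exception inside the block marks only that stage, while the overall build result is left alone, so a `post { failure { ... } }` block would not fire because of it.

```groovy
pipeline {
    agent any
    stages {
        stage('Comment on PR') {
            steps {
                // If anything inside throws, mark only this stage as UNSTABLE
                // and do not let this step fail the overall build.
                catchError(buildResult: 'SUCCESS', stageResult: 'UNSTABLE',
                           message: 'Posting the PR comment failed') {
                    script {
                        // Assumption: pipeline-github plugin; the text is a placeholder.
                        pullRequest.comment('Preview build finished.')
                    }
                }
            }
        }
    }
}
```

Note that this hides every failure of the wrapped step, not only the ones caused by a closed PR, so it is probably best kept narrow, around the single call that is suspected.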
You may say this is not the best option for some reason or another.
But I am going to try it.
In the post processing always section, determine the state of the pull request:
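A minimal sketch of what such an `always` block could look like, under stated assumptions: the job is a multibranch pull request build (so `env.CHANGE_ID` is set), and the pipeline-github plugin provides the `pullRequest` global, whose `state` property is expected to be `open` or `closed`. The `Test` stage and the `unknown` fallback value are placeholders. The lookup is wrapped in try/catch because `pullRequest` can itself throw, in which case the failure is treated as a real one.

```groovy
pipeline {
    agent any
    stages {
        stage('Test') {
            steps {
                echo 'Placeholder for the real tests and previews.'
            }
        }
    }
    post {
        always {
            script {
                // Default: if the state cannot be determined, treat a failure as real.
                env.PRSTATE = 'unknown'
                // CHANGE_ID is only set for pull request builds in multibranch jobs.
                if (env.CHANGE_ID) {
                    try {
                        // Assumption: pipeline-github plugin; state is "open" or "closed".
                        env.PRSTATE = pullRequest.state
                    } catch (Exception err) {
                        echo "Could not determine PR state: ${err}"
                    }
                }
                echo "PR state: ${env.PRSTATE}"
            }
        }
    }
}
```

As far as I know, declarative pipelines evaluate `always` before `failure`, so a value stored in `env.PRSTATE` here should be visible to a later `failure` block.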
In the post processing failure section don’t send an error notification about a closed PR:
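    failure {
        echo 'This will run only if failure'
        script {
            // PRSTATE is set earlier, in the always section.
            if (env.PRSTATE == "closed") {
                // Double quotes, so the apostrophe does not end the string.
                echo "Don't send an email about a closed PR."
            }
            else {
                mail ...
            }
        }
    }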
There are two possible places where it executes something like post_comment_to_github_api():
1. When we literally post a comment to GitHub, which this Jenkins job does.
and
2. After the job finishes, maybe Jenkins tries to automatically post a status report, a success/failure type message, every time.
or
3. Some yet unknown error.
At this moment, I don’t know which it is, and would need to retrofit the script with debugging, because there is just some small stack trace, but I don’t understand what caused that error.
However, logically, why should it be possible to “post comment to github api” when the PR is open, but then as soon as it closes, we can’t “post comment to github api”? Almost certainly that particular step isn’t to blame. It doesn’t make sense, right? Posting a comment is posting a comment is posting a comment. It’s the same in both cases.
So the problem should be (2) or (3), above. But I could be wrong.
Although, it still might be the line pullRequest.comment(commenttext). I will try your idea.
Right around the pullRequest.comment call are these errors:
Also: org.jenkinsci.plugins.workflow.actions.ErrorAction$ErrorId: e9a26e8f-4a88-42eb-9661-29e6e2cd64e2
java.lang.IllegalArgumentException: Job's SCM is not GitHub.
at PluginClassLoader for pipeline-github//org.jenkinsci.plugins.pipeline.github.GitHubHelper.getGitHubClient(GitHubHelper.java:75)
at PluginClassLoader for pipeline-github//org.jenkinsci.plugins.pipeline.github.PullRequestGroovyObject.getGitHubClient(PullRequestGroovyObject.java:97)
Regarding earlier items #2 and #3:
Jenkins doesn’t do anything with PRs or comments automatically. All it does is update the status of the built commit, but that’s a completely different GitHub API endpoint, and it is tied to the commit ID, so it is most definitely not affected by the status of the PR.
That’s why I suggested debugging it further. If the error is just a plugin bug or is related to other GitHub APIs, your proposed workaround won’t help.
Very interesting!
Presuming it was a bug in the plugin, I upgraded all plugins, which hadn’t been done in a few months. Since then, no error.
That isn’t conclusive. The issue only occurred when a pull request closed during a job, so maybe it hasn’t happened yet.
It’s indeterminate, but maybe this fixed the problem.