Send an email on jenkins pipeline failure

Hi,

Our Jenkins pipelines implement “Send an email on jenkins pipeline failure” based on these two Stack Overflow answers, which are fairly straightforward:

https://stackoverflow.com/questions/39720225/send-an-email-on-jenkins-pipeline-failure
https://stackoverflow.com/questions/43721207/jenkins-pipeline-email-not-sent-on-build-failure

PROBLEM:

The pipelines process GitHub pull requests. It’s the nature of pull requests to get merged, and often the source branch is deleted along with the merge.

When those events happen, we get spurious failure notifications.

The “Send an email on jenkins pipeline failure” feature sends a failure email because the PR was closed and/or the PR branch is not reachable any more.

If the PR was merged it’s no longer a concern to run CI tests on that PR.

It’s not a failure to be investigated.

  • Should Jenkins pipelines have built-in support for this situation?
  • Or, also helpful, is there boilerplate copy/paste code to handle this case?
  • Or just advice: “you need to query the GitHub API”

BTW - it is a race condition. But a super slow tortoise race condition. The job starts. After it has run for 15-30 minutes, the PR is closed. The job finishes later on, but with a failure.

I assume by “The pipelines are processing github pull requests” you mean that you are querying all your PRs from the Jenkins side?

If so, the solution to your problem is:
a) use GitHub webhooks to initiate Jenkins jobs,
b) make sure PRs cannot be merged until the CI job has finished (that’s configurable on the GitHub side).

Hi Vilius,
It may be these suggestions are indeed one way to solve the problem.
We have a very distributed organizational structure, with dozens of repositories controlled by an equally large number of people. For this application it’s convenient to avoid installing webhooks, which requires access, permission, and coordination. By simply polling every 5 minutes, the jobs can be triggered without any setup.
I believe what happens is the PR exists at the time the job starts. If the job takes 30 minutes, by the time it is finished, the PR might be gone. Or, the source branch gone. Therefore, the job fails in the middle, someplace. There are different scenarios one can imagine. Even with a webhook, the same thing could occur: at the time the job begins the PR exists. But by the end, it’s gone.
“make sure PR’s cannot be merged”. Yes, that would fix these notifications. But it would replace a trivial problem (spurious emails) with a larger one.

You can create a GitHub App and install it on all repositories at once; it will also be installed automatically on all new repositories. GitHub Apps can send webhooks, and you don’t need any complicated setup.

Well… yes, but what do you suggest? There is no way for Jenkins to know the status of the PR in real time. And without webhooks, there is no way to signal a Jenkins job to stop gracefully. Unless you check for errors between every action and come up with an error-classification scheme that separates “this error is due to the PR actually failing” from “this error is due to the PR not existing”, there is no way to fix that. Also, such a pipeline would still be prone to other kinds of errors, as you said yourself, such as branches not existing.

What larger problem? I don’t know what exactly you are running in your pipelines, but most of the time you want pipelines to actually finish and produce an actionable result, be it a failure or a success. Why are you running these pipelines if you cannot ensure that they complete? To me it looks like the “spurious emails” are just a symptom of the problem, not the problem itself.

Interesting. So a pipeline job that is part way through can be signaled by a webhook to exit gracefully? I wasn’t thinking of that feature. In another scenario it might indeed be helpful. That is, if we installed webhooks everywhere…

For our case, these jobs are applied to PRs. Which are essentially less important than when you merge something into the master branch, which means it’s “real code” now, to be deployed someplace. A PR may never be merged. What’s running are tests and previews. If someone ignored the results, and merged the PR anyway, that is their decision. Every organization is different, certainly sometimes such tests are more critical, and ought to block a merge. But not in every imaginable scenario.

“what do you suggest?”

I guess what I will do is compose code that checks the status of the pull request (was it closed?), before sending a notification. The job failed, but I don’t need to be alerted.
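A minimal sketch of such a check, written in Python rather than pipeline Groovy, assuming the job knows the repository owner, repository name, and PR number. The function names and the optional token parameter are hypothetical. The `state` field (`open` or `closed`) is what the GitHub REST API returns for a pull request, and `FAILURE` is the usual Jenkins result string. The decision is kept separate from the network call so it can be exercised without GitHub access.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def fetch_pull_request(owner, repo, number, token=None):
    """Fetch one pull request as a dict from the GitHub REST API.

    owner, repo, number and token are placeholders supplied by the caller.
    """
    url = f"{GITHUB_API}/repos/{owner}/{repo}/pulls/{number}"
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def should_notify(build_result, pull_request):
    """Return True only for a failed build whose PR is still open.

    A closed (merged or abandoned) PR means the failure is not worth an email.
    """
    if build_result != "FAILURE":
        return False
    return pull_request.get("state") == "open"
```

If the lookup itself fails (network error, rate limit), it is probably safer to send the email anyway than to suppress a real failure; that choice is left to the caller here.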

“What larger problem?”

That if there is any problem with the pipelines themselves, it would prevent merging. You are imagining the frequent case where the pipeline is critical, and it should prevent a merge. But again, applications of Jenkins vary, a “preview” job for us might not be critical.

AFAIK, there is no built-in functionality to interrupt the job via a webhook. My observation was more a theoretical one to illustrate how communication happens. You could make another job which starts and terminates your original pipeline, but I’m not sure how error-prone that would be.

The real question is then: why are those jobs failing? Are you using the GitHub API to perform some task on the PR or the branch itself just after the job is finished, like writing GitHub comments? Usually, after the pipeline is started, the build process doesn’t directly depend on the state of the PR.
Another question: if these pipelines are not critical, why do you want to get an email if they fail? I assume you are using these emails as monitoring? If so, the better way would be to monitor the trends of such pipelines and alert only when, let’s say, build stability reaches its worst level.
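As a rough illustration of alerting on a trend rather than on every failure, here is a small Python sketch. The window of five builds and the threshold of four failures are assumptions chosen for the example, not Jenkins settings, and `should_alert` is a hypothetical helper; the result strings follow the usual Jenkins names.

```python
def count_failures(results):
    """Count builds whose result is FAILURE."""
    return sum(1 for result in results if result == "FAILURE")


def should_alert(results, window=5, min_failures=4):
    """Alert only when the most recent `window` builds are mostly failures.

    `results` is ordered oldest to newest. No alert is raised until a full
    window of builds is available.
    """
    recent = results[-window:]
    if len(recent) < window:
        return False
    return count_failures(recent) >= min_failures
```

With these defaults a single spurious failure among otherwise passing builds stays silent, while four failures in the last five builds trigger an alert.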

Exactly. Writing a comment.
Imagine though… just because a PR is closed, why would a comment fail? Comments can still be added. So, it’s not 100% certain that would be the cause of the issue. It is odd.

Right. It’s helpful to know, and even to know immediately, that a job is failing in order to debug it, presuming the notification is correct and not spurious.

In order to fix your email problem, you need to debug where exactly it fails. If it’s a GitHub API issue, you could use catchError with the desired stageResult value as a workaround. Or simply contact GitHub to fix it at the source.