Jenkins build jobs are occasionally interrupted abnormally

When I run the same Jenkins job repeatedly, it sometimes fails, and the failures are irregular.
This is my job's pipeline script:

pipeline{
    agent{
        node{
            // kubernetes slave pod template label
            label 'jdk8'
        }
    }
    stages{
        stage("clone"){
            steps{
                git credentialsId: 'xxx', url: 'xxx', branch: 'master'
            }
        }
        stage("build"){
            steps{
                echo "[package]"
                sh """
                    mvn clean package
                """
            }
        }
    }
}

The following is the error output. As we can see, the problem occurs while executing the sh step in the pipeline script.

[package]
process apparently never started in /home/jenkins/workspace/jenkins-build-app@tmp/durable-5d574913
(running Jenkins temporarily with -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true might make the problem clearer)

My Jenkins controller runs on Kubernetes, so I added JAVA_OPTS to the controller's deployment YAML as the log suggested.

      containers:
        - name: jenkins
          image: jenkins/jenkins:2.387.3-lts
          env:
            - name: JAVA_OPTS
              value: "-Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true"

Then I ran the job on a schedule to reproduce the failure, and the log is indeed clearer:

nohup: cannot run command 'sh': Input/output error
process apparently never started in /home/jenkins/workspace/jenkins-build-app@tmp/durable-5d5g43d3

In addition, I noticed that the failing stage aborted every time after exactly five minutes and six seconds of execution.

Does anyone know what is causing this error? How should I fix this problem? :cold_sweat:

You might want to try adding

-Dorg.jenkinsci.plugins.durabletask.BourneShellScript.USE_BINARY_WRAPPER

to the JAVA_OPTS. This avoids the nohup call by using a binary instead of a shell script to decouple the build process from the Java agent process.
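Applied to the deployment snippet above, the combined options would look roughly like this (a sketch; I believe the binary-wrapper property is read as a boolean, so it likely needs an explicit `=true`):

```yaml
      containers:
        - name: jenkins
          image: jenkins/jenkins:2.387.3-lts
          env:
            - name: JAVA_OPTS
              # Keep the diagnostics flag while debugging; drop it once resolved.
              value: >-
                -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true
                -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.USE_BINARY_WRAPPER=true
```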

Thank you very much! I will try adding this option. But what is the root cause? I have checked my controller configuration and my Kubernetes nodes, and their status is normal.

I’ve seen the input/output error on my instances when the XFS file system got corrupted. I don’t know whether it can be caused by other things as well.

My experience with the binary wrapper is that it is much more stable than the default shell-script-based wrapper, with which I’ve seen problems when the machine was under high load or memory pressure.

I have solved the problem!
After controlled testing and analysis, I found that the root cause was likely poor disk I/O performance.
After I replaced the storage device that the build reads from and writes to during compilation with a new NAS, the problem disappeared! :smile:
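For anyone debugging the same symptom, a quick way to sanity-check write throughput on the volume the agents build on is a synchronous `dd` write (a rough sketch; the default path is an assumption, pass your actual workspace mount instead):

```shell
#!/bin/sh
# Rough write-throughput probe for the volume Jenkins agents build on.
# The default path is an assumption -- pass your workspace mount as $1.
WORKDIR="${1:-/tmp}"
TESTFILE="$WORKDIR/io-probe.$$"

# Write 64 MiB and fsync at the end, so the figure reflects the device
# rather than the page cache; dd prints elapsed time and throughput.
dd if=/dev/zero of="$TESTFILE" bs=1M count=64 conv=fsync 2>&1 | tail -n 1

rm -f "$TESTFILE"
```

Unusually low throughput, or the write stalling for many seconds, would line up with the durable-task wrapper's control files in the `@tmp` directory not being written in time.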