Jenkins slows down when multiple builds are triggered (controller and agents are all Kubernetes pods)

Hi All,

Our Jenkins is deployed on Google Kubernetes Engine. We use dynamic build agents, created via the pipeline, to run jobs. Our concurrency limit is 10.

Some of our jobs take about 3 minutes to build, others average around 5 minutes. However, when we run more than 2 or 3 jobs in parallel, Jenkins slows down a lot and the same build takes around 9 minutes, sometimes much longer.

As we have a lot of services, this has become a bottleneck in our delivery. Can someone please help?

We tried increasing resources on the controller, increasing the default JNLP agent resources (from 256Mi memory / 100m CPU to 500Mi / 200m CPU), and also increasing the cluster resources. None of it helped.
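Since bumping resources didn't help, it may be worth confirming whether the slowdown is resource contention at all before tuning further. A few diagnostics to run while several builds are in flight (the `jenkins` namespace is an assumption; adjust to your deployment):

```shell
# Assumption: Jenkins controller and agents run in the "jenkins" namespace.
# Check live CPU/memory usage of the controller and agent pods
# (requires metrics-server in the cluster):
kubectl top pods -n jenkins

# Look for agent pods that could not be scheduled while builds were
# queued, which would point at cluster capacity rather than Jenkins:
kubectl get events -n jenkins --field-selector reason=FailedScheduling

# Check whether nodes are reporting CPU/memory/disk pressure:
kubectl describe nodes | grep -A5 "Conditions:"
```

If the pods and nodes look idle while builds crawl, the bottleneck is more likely on the controller side (e.g. the JVM or pod provisioning) than raw cluster capacity.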

Here is what our pipeline looks like: three stages (build, scan, deploy). Each stage creates a pod agent with the specified container.
Additionally, we override the resources of the jnlp container in the Build stage pod.

   stage('CheckoutAndBuild') {

    agent {
        kubernetes {
            yaml """

apiVersion: v1
kind: Pod
metadata:
  name: kaniko
spec:
  containers:
  - name: kaniko-build
    image: gcr.io/kaniko-project/executor:v1.9.1-debug
    env:
    - name: GOOGLE_APPLICATION_CREDENTIALS
      value: /secret/${SECRETFILENAME}.json
    command:
    - /busybox/cat
    tty: true
    volumeMounts:
    - name: ${SECRETFILENAME}
      mountPath: /secret
  - name: jnlp
    image: jenkins/inbound-agent:4.11-1-jdk11
    resources:
      limits:
        cpu: "1"
        memory: 700Mi
      requests:
        cpu: "200m"
        memory: 500Mi    
    env:
    - name: GOOGLE_APPLICATION_CREDENTIALS
      value: /secret/kaniko-secret.json
    tty: true
    volumeMounts:
    - name: kaniko-secret
      mountPath: /secret      
  restartPolicy: Never
  volumes:
  - name: kaniko-secret
    secret:
      secretName: kaniko-secret
  - name: kaniko-prod-secret
    secret:
      secretName: kaniko-prod-secret
"""
    }
}
        when { anyOf { branch 'demo'; branch 'develop'; branch 'preprod'; branch 'master' } }
        steps {
            container(name: "kaniko-build", shell: '/busybox/sh') {
                checkout scm
                sh 'pwd'
                sh """
                #!/busybox/sh
                /kaniko/executor --dockerfile Dockerfile --build-arg NODEVERSION=${NODEVERSION} --build-arg REDIS_TLSCERT_IMAGEKEY=${REDIS_TLSCERT_IMAGEKEY} --build-arg BRANCH_NAME=${BRANCH_NAME} --context `pwd`/ --verbosity debug --insecure --single-snapshot --skip-tls-verify --destination ${DOCKER_REPOSITORY}:${env.IMAGE_TAG}
                """
            }
        }
    }
    stage ('Scan') {

    agent {
        kubernetes {
            yaml """

apiVersion: v1
kind: Pod
metadata:
  name: kaniko
spec:
  containers:
  - name: vuln-scanner
    image: <image link>
    command:
    - cat
    env:
    - name: GOOGLE_APPLICATION_CREDENTIALS
      value: /secret/kaniko-secret.json
    tty: true
    volumeMounts:
    - name: kaniko-secret
      mountPath: /secret      
  restartPolicy: Never
  volumes:
  - name: kaniko-secret
    secret:
      secretName: kaniko-secret
  - name: kaniko-prod-secret
    secret:
      secretName: kaniko-prod-secret
"""
    }
}
            when { anyOf { branch 'demo'; branch 'develop' } }
            steps {
              container(name: "vuln-scanner"){

              sh 'pwd'
                sh """
                #!/busybox/sh
                gcloud auth activate-service-account --key-file=/secret/${SECRETFILENAME}.json
                """
                sh "trivy image ${DOCKER_REPOSITORY}:${env.IMAGE_TAG}"

            }
        }
    }

   stage ('Deploy') {


       agent {
        kubernetes {
            yaml """

apiVersion: v1
kind: Pod
metadata:
  name: kaniko
spec:
  containers:
  - name: helmfile-deploy
    image: gcr.io/leap-metrics-dev/leap-helmfile-tools:v1.1
    command:
    - cat
    env:
    - name: GOOGLE_APPLICATION_CREDENTIALS
      value: /secret/${SECRETFILENAME}.json
    tty: true
    volumeMounts:
    - name: ${SECRETFILENAME}
      mountPath: /secret
  restartPolicy: Never
  volumes:
  - name: kaniko-secret
    secret:
      secretName: kaniko-secret
  - name: kaniko-prod-secret
    secret:
      secretName: kaniko-prod-secret
"""
    }
}
            when { anyOf { branch 'demo'; branch 'develop'; branch 'preprod'; branch 'master' } }
            steps {
              container(name: "helmfile-deploy"){

              sh 'pwd'
                sh """
                #!/busybox/sh
                gcloud auth activate-service-account --key-file=/secret/${SECRETFILENAME}.json
                gcloud container clusters get-credentials $GKE_CLUSTER --zone $ZONE --project $PROJECT
                cat ~/.kube/config
                helmfile -e $ENV diff
                helmfile -e $ENV sync
                """

            }
        }
    }

The general recommendation is to grab a thread dump and see what's going on. In general, though, Jenkins community volunteers don't have the time to help debug thread dumps, so you may want to look at a vendor for help.
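For reference, two common ways to grab a thread dump from a containerized controller (the pod name, namespace, URL, and credentials below are placeholders, not values from this thread):

```shell
# Option 1: Jenkins exposes a thread dump of the controller at the
# /threadDump URL (requires an administrator account or API token):
curl -u admin:API_TOKEN "https://jenkins.example.com/threadDump"

# Option 2: exec into the controller pod and ask the JVM directly.
# Assumes the controller pod is named "jenkins-0" in namespace "jenkins"
# and that the Jenkins JVM is PID 1 inside the container:
kubectl exec -n jenkins jenkins-0 -- jcmd 1 Thread.print > controller-threaddump.txt
```

Capture a few dumps a few seconds apart while the slowdown is happening; a single dump rarely shows where threads are actually stuck.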

I will say (and @dduportal can probably back me up) that switching containers inside of pods is very expensive (though I never fully understood why), so I recommend against using them when possible.
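One way to reduce the per-stage overhead, assuming the cost really is scheduling a fresh pod for every stage (an assumption, not something confirmed in this thread), is to declare a single top-level agent whose pod contains all the needed containers, so each build schedules one pod instead of three. A rough sketch, reusing the images from the pipeline above and omitting the volumes, env, and when-conditions (it is not a drop-in replacement):

```groovy
pipeline {
    // One pod per build instead of one pod per stage: all containers
    // are declared up front, and each stage just selects one of them.
    agent {
        kubernetes {
            yaml """
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: kaniko-build
    image: gcr.io/kaniko-project/executor:v1.9.1-debug
    command: ["/busybox/cat"]
    tty: true
  - name: helmfile-deploy
    image: gcr.io/leap-metrics-dev/leap-helmfile-tools:v1.1
    command: ["cat"]
    tty: true
"""
        }
    }
    stages {
        stage('CheckoutAndBuild') {
            steps {
                container('kaniko-build') {
                    checkout scm
                    // kaniko build step as in the original pipeline
                }
            }
        }
        stage('Deploy') {
            steps {
                container('helmfile-deploy') {
                    // helmfile diff/sync steps as in the original pipeline
                }
            }
        }
    }
}
```

This still uses the `container` step within the pod, but it removes the pod-provisioning and scheduling round-trip between stages, which is often the dominant cost when many builds run in parallel.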


Sure, I will take a look at the dump.

By "switching containers inside of pods is very expensive", did you mean using containers for builds instead of VM agents?

If Docker-in-Docker isn't a good approach, we might have to spin up VMs as agents instead. Do you think that'll help?