Participants
Damien Duportal (@dduportal), Hervé Le Meur (@hlemeur), Stephane Merle (@smerle), Mark Waite (@MarkEWaite), Tim Jacomb (@timja)
Official minutes on GitHub.
Announcement
- Core Security release last week: Jenkins 2.334 + Jenkins 2.319.3
- Plugins Security release today (in progress)
- Weekly 2.335 delayed until tomorrow (16th of February)
- Changes Linux installer from System V init to systemd
- Mark wants to write a blog post about the change
Notes
ci.jenkins.io outage (Linux container agents failing to allocate) last Friday, 10th of Feb.
- https://status.jenkins.io/issues/2022-02-10-agents-not-allocating-ci/
- Root cause:
- AWS required an additional IAM permission to allow autoscaling the EKS cluster nodes. No notification, no documentation: not sure how it happened.
- Symptom: the autoscaler component in EKS was logging “Go panic” traces with an “Unauthorized 403” error from the AWS API
- Multiple fixes:
- EKS Terraform module upgraded (with a LOT of breaking changes) hoping that it would help. Spoiler: it did not fix the outage, but the upgrade itself was a success.
- Faster autoscaling, less resources to pay for (no public IP, etc.)
- Major version change of the plugin, disruptive shift
- We found the missing permission and fixed it (thanks @smerle @lemeurherve !!!)
- TODO:
- Write a Post Mortem
- PR to the Terraform EKS module examples/doc to add the new permission
- Docker Hub credential (see below)
- Contact AWS support to improve their autoscaler documentation
- The AWS documentation lists fewer permissions than the Terraform module does
- The Terraform module lists fewer permissions than are actually required
DockerHub API Rate Limit (again)
- By switching the EKS node pool to private IPs, we were rate-limited (all pulls now come from a single egress IP)
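A minimal sketch for checking how much anonymous pull quota is left for a given egress IP. It follows the token and `ratelimitpreview/test` endpoints and the `ratelimit-limit` / `ratelimit-remaining` headers described in Docker's rate-limit documentation. The function names are illustrative and not part of any jenkins-infra tooling.

```python
import json
import urllib.request

# Endpoints taken from Docker Hub's rate-limit documentation.
TOKEN_URL = ("https://auth.docker.io/token"
             "?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull")
MANIFEST_URL = "https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest"


def parse_ratelimit(value):
    """Parse a header value such as '100;w=21600' into (count, window_seconds)."""
    count, _, window = value.partition(";w=")
    return int(count), (int(window) if window else None)


def fetch_anonymous_limits():
    """Ask Docker Hub which limits apply to the caller's egress IP (needs network)."""
    with urllib.request.urlopen(TOKEN_URL) as resp:
        token = json.load(resp)["token"]
    req = urllib.request.Request(MANIFEST_URL, method="HEAD",
                                 headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return {name: parse_ratelimit(resp.headers[name])
                for name in ("ratelimit-limit", "ratelimit-remaining")
                if resp.headers.get(name)}
```

Running `fetch_anonymous_limits()` from a pod behind the shared egress IP would show the quota that all nodes now share.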
ci.jenkins.io Kubernetes workload:
- Short term fix: add a docker-registry secret in the workspace + specify it
- Helm chart: feat(jenkins-kubernetes-agents) allow inserting a dockerhub secret credential for imagepullpolicy by dduportal · Pull Request #94 · jenkins-infra/helm-charts · GitHub
- Deployment to cik8s: fix(autoscaler) use the updated node labels + bump jenkins-agent helm charts by dduportal · Pull Request #2013 · jenkins-infra/kubernetes-management · GitHub
- ci.jenkins.io updated configuration for pod templates:
- Consider using the account on the Docker builds
- Long Term fix proposals:
- use Docker images in another repository (mirror?)
- add a Docker image proxy in the cluster (better perf, avoid storing the credential in a “jenkins-agent” namespace, more resilient when Docker Hub is down)
- switch back to public IP for worker nodes (but we pay in $$, configuration, and in allocation time)
- WiP: Datadog was in the same situation (because we specify custom images hosted on Docker Hub)
- Short term fix (learning path for @smerle): helm chart that would copy the docker-registry credential to specified namespaces
- Specify account in one location to be used in all namespaces
- Long term fix for datadog: stop using DockerHub since datadog provides different sources without rate limit (AWS, GHCR, GCR, etc.)
- Switch to standard Datadog images
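The "one credential, copied to specified namespaces" idea can be sketched as follows. This is only an illustration of the Secret format Kubernetes expects (`kubernetes.io/dockerconfigjson`), not the Helm chart discussed in the meeting. The secret name, namespaces and account values are hypothetical.

```python
import base64
import json


def docker_registry_secret(name, namespace, username, password,
                           server="https://index.docker.io/v1/"):
    """Build a kubernetes.io/dockerconfigjson Secret manifest as a plain dict."""
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    config = {"auths": {server: {"username": username,
                                 "password": password,
                                 "auth": auth}}}
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "kubernetes.io/dockerconfigjson",
        "metadata": {"name": name, "namespace": namespace},
        # Secret data values must be base64-encoded.
        "data": {".dockerconfigjson":
                 base64.b64encode(json.dumps(config).encode()).decode()},
    }


def secrets_for_namespaces(namespaces, **kwargs):
    """Render the same credential once per target namespace."""
    return [docker_registry_secret(namespace=ns, **kwargs) for ns in namespaces]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for manifest in secrets_for_namespaces(["jenkins-agents", "datadog"],
                                           name="dockerhub-credential",
                                           username="example-user",
                                           password="example-token"):
        print(json.dumps(manifest, indent=2))
```

Pods in each namespace would then reference the secret by name under `imagePullSecrets`.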
Digital Ocean
- Work in progress
- Cluster is working with a local Jenkins instance
- Created with Terraform on Digital Ocean (GitHub repository jenkins-infra/digitalocean: documentation, tooling and other resources related to the parts of the Jenkins Infrastructure Project hosted in Digital Ocean)
- Need to adjust Puppet configuration of ci.jenkins.io to match
- Running two Kubernetes clusters in Jenkins may bring new surprises
- May consider a single control plane with autoscaling to clouds
Updatecli for Terraform
- @smerle added more tracking with updatecli for Terraform components
- infra.ci and release.ci have virtual machine templates updated
- Using same environment on infra.ci and release.ci
- Moving towards the retirement of trusted.ci
- Running packer image build
- Still need to deploy images to a remote registry
- Moving toward a single image for multiple uses (Docker, VM)
ci.jenkins.io agent upgrade
- VM templates 0.15.x: JDK 17.0.2 and JDK 8u322-b06
- Release 0.15.1 · jenkins-infra/packer-images · GitHub
- Release 0.15.0 · jenkins-infra/packer-images · GitHub
- Java release naming pattern changes happen
- JDK 11.0.14.1 released, needs updates to packer images
- Need to handle the 4-digit pattern in tooling
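A rough sketch of version parsing that copes with the 3-part, 4-part and legacy JDK 8 patterns mentioned above. It assumes the JEP 322 layout (FEATURE.INTERIM.UPDATE.PATCH, optional `+BUILD`) and the `8uNNN-bNN` form. The function name is hypothetical and this is not the code used in packer-images or updatecli.

```python
import re

# Modern scheme (JEP 322): FEATURE.INTERIM.UPDATE.PATCH, trailing parts optional,
# optionally followed by "+BUILD".
_MODERN = re.compile(r"^(\d+(?:\.\d+)*)(?:\+(\d+))?$")
# Legacy JDK 8 scheme: 8uUPDATE-bBUILD, for example 8u322-b06.
_LEGACY = re.compile(r"^(\d+)u(\d+)(?:-b(\d+))?$")


def parse_jdk_version(text):
    """Return (feature, interim, update, patch, build) as integers."""
    legacy = _LEGACY.match(text)
    if legacy:
        feature, update, build = legacy.groups()
        return (int(feature), 0, int(update), 0, int(build or 0))
    modern = _MODERN.match(text)
    if not modern:
        raise ValueError(f"unrecognised JDK version: {text!r}")
    numbers = [int(part) for part in modern.group(1).split(".")]
    if len(numbers) > 4:
        raise ValueError(f"too many version components: {text!r}")
    numbers += [0] * (4 - len(numbers))  # pad 17.0.2 to 17.0.2.0
    return (*numbers, int(modern.group(2) or 0))
```

Because every version becomes a 5-tuple, ordinary tuple comparison orders `11.0.14.1` after `11.0.14`.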
Building Docker Images on infra.ci/release.ci
- release.ci and infra.ci have the full VM Docker capability
- TODO: update docker build shared library to use it instead of img
Security of infra.ci
- Lots of jobs and credentials: need to start “separating” things
- Deploy web site previews
- Terraform
- Puppet
- TODO: split jobs into folders and move each credential into the matching folder
- Scope credentials by folder
- Find the jobDSL directive to load the credential
- Should we move the Netlify preview deployment credential to ci.j.io?
- Could split the Jenkins instance
- Could use a remote vault system
- Still accessible by API call to the vault from any job
- More complicated than we’re ready to do now
- TODO: do not use shared library changelog as in ci.j
- TODO: disable GitHub Checks feedback for sensitive jobs (such as Kubernetes or Terraform)
Alibaba mirror
- TODO: add the mirror (it is not yet in the list of mirrors)
ci.jenkins.io as code
- Done: shared library
- TODO: finish milestone in helpdesk
- @dduportal is working on that
- TODO: matrix 3.0 with casc (instead of groovy “lockbox” init.d) - [INFRA-3167] Move security settings to configuration-as-code for puppet managed instances · Issue #2708 · jenkins-infra/helpdesk · GitHub
- TODO: fix JDK tools update (s390x + powerPC are always far behind arm) - s390x and ppc64le JDK 11.0.14 tool download not available · Issue #2760 · jenkins-infra/helpdesk · GitHub
census.jenkins.io: Damien still has to ask Tyler/Olivier what it does