Attendees
- @dduportal (Damien Duportal)
- @hlemeur (Hervé Le Meur)
- @MarkEWaite (Mark Waite)
- @smerle33 (Stéphane Merle)
- @poddingue (Bruno Verachten)
Announcements
- Weekly: 2.359 is published in artifactory + artifactory. Checklist not (yet) complete but no issues detected
- Further adoption of JDK/JS/dependency updates that might introduce instabilities. Be warned for infra.ci.jenkins.io
- Next weekly will have a Jetty 9 → 10 upgrade. Major update on a key component.
- Start using JDK17 on our weekly instances (infra.ci and weekly.ci)
- No known issues
- Challenge for agents: we need to update the Docker images to have both JDK11 and JDK17 (the VMs already have both)
Notes
Done:
- Update analysis-model and warnings-ng on ci.j
  - ci.jenkins.io is not managed as “infra as code with plugins.txt”, so we want such requests in the helpdesk for as long as updates are done manually
- Remove my account
- Remote access API on every non-ci.j.io instance
  - Discovered while working on removal of the embeddable build status plugin
  - No need to block the API on private instances: request from the security team, to allow them to do more automation
- Infra meeting notes as helpdesk releases
  - Thanks Herve! We now have a GitHub action to generate the meeting notes
- Docker-compose in JenkinsCI.io
  - A plugin maintainer needed to use testcontainer, which requires Docker. The build was using VM agents (label `linux` by default), but it was randomly getting an ARM or an Intel agent, which caused obvious issues (Intel binary on ARM64…)
  - Hotfix: the `linux` label means “Linux Ubuntu 20.04 AMD64” for now. Labels should be rationalized, but we need to update pipeline-library, the ci.j config AND the ci.j documentation.
- Add kmartens27 to jenkins.io triage team
- Weekly release: 2.358 packaging step fails due to AKS CSI Persistent Volume issue (after Kubernetes 1.22 upgrade)
  - Fixed with the help of Herve and Stephane. Same issue as for LDAP/get.jenkins.io during the Kubernetes 1.22 upgrade.
  - We checked ALL the other azurefile persistent volumes: none left to fix
- Remove groovy tool configuration from cert.ci JCasC
  - Remove the tool config, then the plugin.
- Grant permission to update-center
  - Not done: Tim guided the requester to the correct documentation
- 502 proxy error when accessing PR view for jenkinsci/jenkins
  - Tricky issue. Thanks Daniel and Alex for diagnosing
  - Fixed in the culprit plugin by Uli, many thanks!
  - Applied to ci.jenkins once available: immediate fix!
- [ci.jenkins.io] Provides both `powershell` and `pwsh` on all agent templates
  - VM agent templates: check
  - Windows Docker image: won’t do (no blocker)
  - Real value of this issue: we were able to update the public doc. for Jenkins pipeline
- DockerHub rate limiting
  - We’ve been granted a team plan on `jenkins4eval` and `jenkinsciinfra`: no more rate limit seen on ci.j after enabling authentication again
- [INFRA-1633] Stop building PR merges
  - Old issue. Some “heavy” jobs should not rebuild PRs when the destination branch is updated, to avoid wasting precious time.
  - Tim set up the aforementioned job. Thanks!
  - We have to open an issue to manage ci.jenkins.io jobs as code.
- ci.jenkins.io agents are very flaky
  - Initial problem: issues on container agents for BOM builds (a lot of executors requested but unable to scale up efficiently, leading to a build queue that empties slowly)
  - Correlated to a lot of ATH builds at the same time, spawning a lot of AWS EC2 instances of type “highmem”, which are the same size as the EKS worker nodes
  - Root cause: we hit our limit for EC2 spot instances of this size in this region (`us-east-2`), leading to a lot of spot reclaims that shut down agents abruptly during builds
  - Solved by switching from spot to on-demand instances (both AWS and Azure, to be safe for ATH). The extra cost is low and will be compensated by fewer builds
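The `linux` label ambiguity described in the Docker-compose item can be illustrated with a small Python model. This is only a sketch under assumptions: the template names and label sets are hypothetical, and the random pick is a simplified stand-in for agent scheduling, not how Jenkins actually assigns agents.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentTemplate:
    name: str
    labels: frozenset


# Hypothetical templates: names and label sets are illustrative only,
# not the real ci.jenkins.io configuration.
BEFORE_HOTFIX = [
    AgentTemplate("ubuntu-20.04-amd64", frozenset({"linux", "amd64", "docker"})),
    AgentTemplate("ubuntu-20.04-arm64", frozenset({"linux", "arm64", "docker"})),
]

# After the hotfix, only the AMD64 template still carries the bare "linux" label.
AFTER_HOTFIX = [
    AgentTemplate("ubuntu-20.04-amd64", frozenset({"linux", "amd64", "docker"})),
    AgentTemplate("ubuntu-20.04-arm64", frozenset({"arm64", "docker"})),
]


def candidates(requested, templates):
    """Return every template that carries all of the requested labels."""
    wanted = frozenset(requested)
    return [t for t in templates if wanted <= t.labels]


def pick(requested, templates, rng=random):
    """Pick one matching template at random (simplified stand-in for scheduling)."""
    matches = candidates(requested, templates)
    if not matches:
        raise LookupError(f"no agent template matches {sorted(requested)}")
    return rng.choice(matches)


if __name__ == "__main__":
    # Before the hotfix, "linux" alone matches both architectures.
    print([t.name for t in candidates({"linux"}, BEFORE_HOTFIX)])
    # After the hotfix, "linux" can only resolve to the AMD64 template.
    print([t.name for t in candidates({"linux"}, AFTER_HOTFIX)])
```

In this model, a build that requests only `linux` can land on either architecture before the hotfix, which is why an Intel binary could end up on an ARM64 agent.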
- Downloads /latest directory out of date
  - Issue on the azurefile bucket storing the “reference” files for the mirror system (get.jenkins.io): it does not support symlinks the way we currently use it (blobxfer with an old version)
  - Fixed manually for LTS (initial request of this issue)
  - Needs more work, with feedback from Olivier V., Tim and Daniel if they recall how/why
- Consider removing embeddable-build-status plugin
  - Still to be removed from ci.jenkins.io
  - Batch PR ready to go (prepared and tested by Herve, after a lot of nice tips from Tim, Joseph and Alex)
- enable Development integration in Jira
  - Nothing done, still to do
- [JENKINS-49707] Evaluate `retry` conditions to improve the stability of the builds
  - Almost closable: kubernetes plugin to be released and deployed (Jesse needs 1 or 2 last minor changes in the code)
  - We are happy with the outcome, really valuable
- Replace s390x Ubuntu 18.04 agent with s390x Ubuntu 20.04 agent
  - Mark has not had time yet: nothing done, still to do
  - Request to share SSH access, to test if the machine can run a puppet agent v6 (or v7)
- Upgrade to Kubernetes 1.22
  - Almost closable. Last steps: merge documentation PR + create issue for Kubernetes 1.23
- [INFRA-3100] Migrate updates.jenkins.io to another Cloud
  - Huge work by Stephane: Oracle infra is set up with Terraform and the machine has a puppet agent connected to our puppetmaster
  - Next steps: create a Puppet role for this machine, make it work and start copying JSON files to it (in addition to the current machine)
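The idea behind the `retry` conditions item (JENKINS-49707) can be sketched in Python: retry a step only when the failure looks infrastructure-related, and fail fast otherwise. This is an illustration under assumptions, not the Jenkins implementation; `AgentLostError`, `StepFailure`, `retry` and `is_infra_failure` are hypothetical names.

```python
class AgentLostError(Exception):
    """Hypothetical marker for an infrastructure failure (the agent went away)."""


class StepFailure(Exception):
    """Hypothetical marker for a genuine build or test failure."""


def is_infra_failure(exc):
    """Retry condition: only infrastructure failures are worth retrying."""
    return isinstance(exc, AgentLostError)


def retry(step, attempts, should_retry):
    """Run step() up to `attempts` times, retrying only when should_retry(exc) is true."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except Exception as exc:
            # Give up on the last attempt, or when the failure is not retryable.
            if attempt == attempts or not should_retry(exc):
                raise


def make_flaky_step(failures, error):
    """Build a step that raises `error` for the first `failures` calls, then succeeds."""
    state = {"calls": 0}

    def step():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise error
        return "ok"

    return step, state


if __name__ == "__main__":
    step, state = make_flaky_step(2, AgentLostError("agent went away"))
    print(retry(step, 3, is_infra_failure), state["calls"])
```

With such a condition, an agent reclaimed mid-build triggers a new attempt, while a real test failure is reported immediately instead of being re-run.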
New issues:
- Broken taglib docs on reports.jenkins.io => Assigned to Mark, no action required from the infra team because it is a javadoc (i.e. dev. scope) thing
- [Hosted javadocs for LTS versions of Jenkins] => No action expected from the infra team: it is a dev. question and should be raised on the contributors’ usual channels
-
- [terraform:AWS] manage EKS modules as code
- Ubuntu 22.04 upgrade campaign
- [Azure Terraform] Import existing resources
- [ci.jenkins.io] collect Datadog metrics for ephemeral VMs
- temp-privatek8s cluster backup
- [pkg.jenkins.io,releases] Finish cleanup of mirrorbrain
- Migrate Blue Ocean remaining jobs from ci.blueocean.io to the OSS Infra
- [Documentation] add a public page with the “add a jenkins mirror” procedure
- Weekly release build does not resume
- Separate terraform backends & repositories: “Azure Net” and “Azure”
- Keycloak performance horrific when looking up / modifying users
- GC AWS Old Images (from packer)
- AKS: add cluster `privatek8s`
- Monitor builds on our private instances (trusted.ci.jenkins.io / infra.ci.jenkins.io / release.ci.jenkins.io)
- [minor] infra.ci logs are mentioning an expired datadog API key
- Updatecli: Use separated pipelines + organization scanning for all updatecli processes in jenkins-infra
- Add observability for the build agents
- [INFRA-3137] Terraform: Import unmanaged Oracle Cloud resources
- [INFRA-3135] Terraform 2021 Winter (/ 2022 Summer )
- [INFRA-3125] Migrate jenkins-infra repositories from branch “master” to “main”
- [INFRA-3080] Migrate Windows Server from 2019 to 2022
ToDo (next milestone) (https://github.com/jenkins-infra/helpdesk/milestone/[ID+1])
- Self-hosted shields.io instance => related to the removal of the embeddable-build-status plugin. Herve is working on it, to evaluate whether we can host a shields.io service on the VM for ci.jenkins.io to avoid “breaking” the badge feature.
- Create issues to track:
- JDK17 for infra.ci + Agent with both JDK11 and JDK17
- ci.jenkins.io “as code”: Docker image with a plugins.txt
- ci.jenkins.io jobs “as code” (Job-DSL)
- Datadog update for monitors (to avoid false positives)