Participants
Damien Duportal (@dduportal), Hervé Le Meur (@hlemeur), Stephane Merle (@smerle), Mark Waite (@MarkEWaite)
Official minutes on GitHub.
Announcement
- Weekly 2.337 release
- Release is available, Docker images not yet all visible
- Release checklist still to be run
Notes
- Issues on ci.jenkins.io:
- Major: Maven container agents: https://github.com/jenkins-infra/helpdesk/issues/2802
- upgraded packages and plugins and restarted yesterday
- this morning it appeared that no containers had been spawned (race condition)
- Screenshot of stack trace to be added to the helpdesk ticket
- Missing classes, classes provided by plugins
- As though plugins were not loaded during startup
- Build queue processed quickly
- May need to test rebooting the container, then rebooting the virtual machine
- Minor: Puppet run failed on azure.ci.jenkins.io (`Could not back up <...> Error 500 on SERVER: Server Error: Permission denied`) (jenkins-infra/helpdesk#2800)
- Puppet run was failing
- Agent was asking puppet master to perform a backup / snapshot
- Space limit on the puppet master
- Snapshot backup failed because the destination had incorrect permissions
- Directories with incorrect permissions were 4 years old (unclear why this failed now)
- Entered a portion of the tree that had incorrect permissions
- See the helpdesk issue
- Issue on VPN: "Cannot connect to VPN: server certificate for vpn.jenkins.io expired" (jenkins-infra/helpdesk#2798)
- Expiring certificate: thanks to @olblak we have been able to add documentation on how to easily regenerate the certificate
- Server side certificate expired Feb 26, 2022
- Needed more documentation on how to generate the server-side certificate
- Generated certificates with @smerle, but they were missing some specific attributes
- The location was pointed out and is now documented
- @lemeurherve and @smerle both have access
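
To avoid being surprised by the next expiry, the remaining validity of a certificate can be computed from its `notAfter` date. The following is a minimal sketch, not the team's actual tooling: the function name `days_until_expiry` is hypothetical, and the time of day in the example is an assumption (the notes only give the date, Feb 26, 2022).

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Return the number of whole days between `now` and a certificate's notAfter date.

    `not_after` uses the OpenSSL text format, e.g. "Feb 26 00:00:00 2022 GMT".
    A negative result means the certificate has already expired.
    """
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).days


# Example with an assumed midnight expiry on the date mentioned in the notes.
remaining = days_until_expiry(
    "Feb 26 00:00:00 2022 GMT",
    now=datetime(2022, 2, 1, tzinfo=timezone.utc),
)
print(remaining)
```

The `notAfter` string itself would have to be read from the certificate first, for example with the `openssl x509 -enddate` command.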
- Issues on trusted.ci.jenkins.io:
- Major: Unable to spawn Azure VM agents: "Credential Expired for Jenkins Controllers" (jenkins-infra/helpdesk#2801), causing RPU + update-center delays ("Missing CD credentials", jenkins-infra/helpdesk#2799)
- Delay running update center and repository permissions updater
- Two days of delayed jobs, half were the GitHub reports
- User reported issue (Alex Brandes)
- The credentials expired on Monday (end of month)
- Credential rotated on trusted.ci, rebooted the virtual machine, and user issue resolved
- Post-incident follow-up: calendar updated with credential expiration routines (Azure SP secrets + VPN certificates)
- No more expired credentials in Azure SPs as of today
- infra.ci’s Azure packer credential to be rotated - 2 weeks left
- All of the `codevalet-*`, `rtyler*` and `olblak*` apps have been removed
- All apps with credentials expired for 2+ years were removed
- Removed expired credentials for service principal applications
- Still a few that need more detailed review
- @smerle and @lemeurherve should have enough permissions to rotate credentials
- The credential to be rotated expires within the next two weeks
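
The calendar routine above could be complemented by a small check that lists credentials close to their expiration date. This is only a sketch under stated assumptions: the credential names and dates below are invented for illustration, and the 14-day window simply mirrors the "two weeks" mentioned in the notes. Fetching the real expiration dates from Azure is out of scope here.

```python
from __future__ import annotations

from datetime import date, timedelta


def expiring_soon(credentials: dict[str, date], today: date, window_days: int = 14) -> list[str]:
    """Return the sorted names of credentials that are expired or expire within `window_days`."""
    deadline = today + timedelta(days=window_days)
    return sorted(name for name, expires in credentials.items() if expires <= deadline)


# Hypothetical inventory: names and dates are placeholders, not real credentials.
inventory = {
    "example-packer-sp": date(2022, 3, 14),
    "example-other-sp": date(2022, 12, 31),
}
print(expiring_soon(inventory, today=date(2022, 3, 1)))
```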
- Azure AD permissions
- Stephane + Herve have the same rights as Damien
- We could not assign “privileged roles” to groups (nor create “custom groups”) with our current Azure plan (requires a premium account)
- Not needed, so let’s manage it “manually” (fewer than 15 people); it also seems that Terraform might be able to manage this part
- Enforced MFA for everyone on the Azure Portal / API
- Had to enforce per-user for now
- DigitalOcean :party:
- Added to ci.jenkins.io 12 days ago
- Agents are added and operating as expected
- TODO:
- Measure costs consumed (no visible access to the billing page)
- Updating ci.jenkins.io documentation for agents
- DigitalOcean sponsorship
- Add DigitalOcean on the sponsor section of home page
- We have a blog post to start
- Our cluster will be updated in 6 days with their Kubernetes updates / patches
- Similar policy to Azure, changes are applied on their version, visible to us
- Request from security team to add Windows agent on cert-ci
- Done, thanks Stephane! VPN routes are
- Weird issue: upgrading LTS from 2.319.1 to 2.319.3 deleted the cloud config!
- No explanation for the removal
- Manually managed configuration was deleted during the upgrade
- Should regularly store copies of the configuration
- Recreated templates
- Tried new label pattern:
- Agents are labeled only with “kernel”-related dimensions (OS, CPU, Docker, Size). For cert.ci: `azure vm linux` / `azure vm windows`, for instance
- Using tools, with the method shown by Mark, with “fallbacks” (e.g. using a shell script to define the local path, otherwise falling back to the default installer), for “easy to get” tools: Git, JDK, Maven, etc.
- Manage them with the global tools system inside Jenkins
- Use locally installed JDK from the agent
- Puppet templating allows us to provide “improved” naming: `jdk8` and `jdk-8` for tools, to handle user typos for instance
- Same definition for the two labels
- Could use the implied labels plugin rather than duplicating the definition
- The Implied Labels plugin does not support configuration as code (it keeps an XML file)
- Automated labels for agents: Platform Labeler plugin
- See Jessie’s comment at the end of “fix(buildPlugin) handle container nodes with JDK > 11” by dduportal (jenkins-infra/pipeline-library#302)
- Maybe we should retire `maven` and make it `maven-8`?
- Container OK, but not VM: todo
- pipeline-library change then
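
The “fallback” idea discussed for tools (prefer a path already present on the agent, otherwise use the default installer) and the two spellings `jdk8` / `jdk-8` can be illustrated with a short sketch. This is not how Jenkins or the Puppet templates implement it: the function `resolve_tool_home`, the alias table, and the example paths are all hypothetical, chosen only to show the lookup order.

```python
import os

# Hypothetical alias table: both spellings resolve to the same tool definition.
TOOL_ALIASES = {"jdk8": "jdk-8", "jdk-8": "jdk-8"}


def resolve_tool_home(name, local_candidates, fallback):
    """Return the first existing local directory for a tool, else `fallback`.

    `local_candidates` maps a canonical tool name to directories that may
    already exist on the agent; `fallback` stands in for the default installer.
    """
    canonical = TOOL_ALIASES.get(name, name)
    for path in local_candidates.get(canonical, []):
        if os.path.isdir(path):
            return path
    return fallback


# Example: the candidate paths are assumptions and may not exist on a given agent,
# in which case the fallback marker is returned.
candidates = {"jdk-8": ["/opt/jdk-8", "/usr/lib/jvm/java-8-openjdk-amd64"]}
print(resolve_tool_home("jdk8", candidates, fallback="use-default-installer"))
```

Keeping the alias table separate from the candidate paths means both labels share a single definition, which is the same goal the notes attribute to the Implied Labels plugin.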
- infra-report to be migrated out from trusted.ci into infra.ci
- Initial helpdesk issue: “https://www.jenkins.io/doc/developer/publishing/source-code-hosting/ misses repositories” (jenkins-infra/helpdesk#2788)
- Migrate to infra.ci (from trusted)
- “Migrate infra-report from trusted.ci to infra.ci” (jenkins-infra/helpdesk#2789)
- Bumps and bruises around the Docker image to use (Alpine Linux 3.14+, used by JNLP k8s agents, requires a recent container engine, and our current setup with `img` blocks it) => WiP on building with a fully-fledged Docker Engine on ephemeral VM agents (thanks to Stephane’s work earlier this year). This allows Windows container builds + ARM64 builds + full `docker buildx bake` support.
- Switch from GitHub bot user to GitHub App
- GitHub App created:
- Need to tune required permissions
- Might need to use a language other than Ruby (NodeJS?) as the GraphQL library doesn’t seem to return a full error log
- iptables on ci.jenkins.io after spam: closed (iptables rules cleared by reboot + no more spam)
- IRC notifs: the new IRC channel is
- We had to run the puppet agent on the puppet master itself + reboot to apply the changes