Attendees
- @dduportal (Damien Duportal)
- @MarkEWaite (Mark Waite)
- @smerle33 (Stéphane Merle)
- @kmartens27 (Kevin Martens)
Announcements
- Weekly: 2.402 release OK, package and Docker image incoming
- ci.jenkins.io agent outage today: switched to DigitalOcean for container agents
- DigitalOcean reported an SSH brute-force attack from one of our IPs
  - issue with details to be opened this week
  - no impact as far as we can tell (might be a false positive)
  - incentive to use VMs instead of Kubernetes in DO
Upcoming Calendar
- Next Weekly: 2.403, May 2
- Next LTS: 2.387.3, May 3
  - Kris Stern is release lead
- Next Security Release as per jenkinsci-advisories: None
- Next major event:
  - cdCon, May 8-9, Vancouver
Notes
-
Done:
- [ci.jenkins.io] Remove the plugin “JobConfigHistory”
- Archive `branch-source-aged-refs-traits-plugin` Jira component
- [ci.jenkins.io] Use Artifact Manager to store archived artifacts (and stashes)
- Install gitpod on `jenkins-infra/jenkins.io`
- [ci.jenkins.io] Define a default build discarder policy for all jobs
- Cleanup of lettuce disk
- [ci.jenkins.io] Switch system disk to SSD
- [datadog] Change the “Disk space is below 1GB free” monitor to a “80% disk usage”
- Not able to create an account for jenkins
- Jobs failing due to `repo.maven.apache.org: Name or service not known`
- [infra.ci.jenkins.io, ci.jenkins.io] ARM64 VM agent unavailable
-
- accounts.jenkins.io and email errors:
  - can't create account
  - Password reset email not coming through
- Create an Azure Sendgrid account
  - WIP: tried with Mailgun but hit a DMARC issue with Gmail
  - Reverted to the previous Sendgrid account for now
  - Need to diagnose the DMARC and (legacy blocking) DNS records to unblock this subject, tune the setup, and test individually before retrying the Sendgrid → Mailgun switch
- Migrate trusted.ci.jenkins.io from AWS to Azure
  - Started with the first of the 3 VMs (bounce, the SSH bastion) to check the Terraform setup on a narrower scope than “all 3 VMs at once”
  - First review done, ready for 2nd
  - Next steps:
    - Network (firewall rules)
    - Then controller VM
    - Then permanent agent VM
    - (new) Change region to US East 2 for the (currently used) ephemeral agents
- [ci.jenkins.io] separate container agent resources between `bom` and other builds
  - 1 node pool, identical to the current one, to deliver the “split”
  - Experiments with a bigger node pool were not good enough, so not delivering it for now
  - Blocked by the ci.jenkins.io cluster lock-out
- Ubuntu 22.04 upgrade campaign
  - docker-openvpn image (had to fix minor issues because openssl went from 1.1.1 to 3.x)
  - Next steps:
    - trusted.ci
    - ci.j
    - eventually the AKS node pools
- Spring 2023: Decrease AWS costs
- Make “Environment” and “Description” fields mandatory for “Bug” type issues
  - Might need help from @en3hD3iMRx6_6IXLNY0Rag to apply this setup, gotta check and ask in the issue if needed
- Renew update center certificate (crawler and update-center)
  - Waiting for new cert. from Olivier (and next board for the key to Damien)
- [ci.jenkins.io] Use a new VM instance type
  - New data disk (smaller), in the new network, new instance (size and generation 2), Ubuntu 22.04, full SSD from scratch, backup policy for data disk, fully managed by Terraform (with audit trail)
  - Let’s start ASAP
  - Will “test” match puppet profile `jenkins:controller` on Ubuntu 22.04 before @smerle33 has to do it on trusted.ci
- migrate google analytics to v4
  - Waiting for Olivier
- [pkg.origin.jenkins.io] puppet agent keeps updating the GPG
  - Back to backlog until it’s too annoying
- Add Launchable to agents
  - need to add to ACI (Windows container) agents to unblock Basil
  - Already on all other agents on ci.jenkins (Linux all and Windows VMs)
- Artifact caching proxy is unreliable
  - DO errors almost gone with the “BOM node pool”
  - Azure “connection error” errors
  - Issue related to networks/regions
  - Tim Jacomb published a new feature in the Azure VM Agent plugin: INBOUND => that should help us migrate agent workloads closer to ACP
- Feat(Infra.ci): add Azure ARM64 VMs
  - Packer build successful \o/
  - Hit a bump regarding version overwrite on PRs and staging (main) branch builds
  - Fixed by @lemeurherve (a unique version is generated by the pipeline on each run)
- Past Release sites are taking long time to load
  - First set of configuration changes fixed the problem for `/war-stable` (LTS list), but not `/war` (Weekly history)
  - No more errors: only slow for weekly listing
  - Need to observe: datadog integration
  - We suspect that the CIFS protocol used for the underlying azurefile htdocs causes this slowness when listing directories
  - Eventually use NFS for full POSIX, but would require a premium storage
  - Remediation: using nginx ingress caching for these pages
-
New issues:
- Datadog error about disk full: systempool disk needs to be increased: “Increase disk space for systempool on privatek8s” (jenkins-infra/helpdesk issue #3539)
- Some applications are running on the system pool: “Migrate applications from systempool to linuxpool on privatek8s” (jenkins-infra/helpdesk issue #3540)
- Backup of the new ci.jenkins.io data disk: “[ci.jenkins.io] Enable disk backup for datadisk” (jenkins-infra/helpdesk issue #3527), to be merged into “[ci.jenkins.io] Use a new VM instance type”
- “AKS: add cluster `publick8s`” (jenkins-infra/helpdesk issue #3351) => ldap, then get.jenkins.io
-
ToDo (next milestone) (infra-team-sync-2023-05-02 milestone)