Attendees
- @dduportal (Damien Duportal)
- @MarkEWaite (Mark Waite)
- @smerle33 (Stéphane Merle)
- @poddingue (Bruno Verachten)
- @kmartens27 (Kevin Martens)
- @hlemeur (Hervé Le Meur)
Announcements
- Weekly release: not today, tomorrow
- Tomorrow: Core release (LTS, Weekly)
- Damien unavailable next meeting
- FOSDEM (Brussels) next week
- Let’s cancel the weekly meeting on the 6th of February. Next: 13th of February
- ci.jenkins.io unavailable (for migration) on Thursday and/or Friday
- After the 2.426.3 core upgrade, as it depends on the security release
Upcoming Calendar
- Next Weekly: tomorrow, 2.442
- Next LTS: tomorrow, 2.426.3
- Next Security Release as per jenkinsci-advisories: tomorrow (https://groups.google.com/g/jenkinsci-advisories/c/QZiecB2ArMs)
- Next major event:
- FOSDEM next week (Brussels)
- SCaLE: 13/14th of March 2024
Notes
Done:
- Lost permission gitlab-branch-source-plugin
- Wadeck’s security browser extension dedicated to plugin maintainers: Wadeck/extension-jenkins-security (cross-browser extension to ease maintainers’ life with advisory preparation)
- Jira license on issues.jenkins.io expires in 30 days
- Thanks LF and Mark!
- Crowdin for next-executions-plugin
- ci.jenkins.io jobs on Windows agents are much slower than 21 days ago
- Main problem (not completely gone): network issues (SNAT exhaustion)
- Azure incidents (also network-related)
- jenkins/jenkins:lts-jdk11 missing arm64 and s390x
- Removed all “faulty” tags to get rid of this issue entirely
- Split docker-jenkins-weekly and docker-jenkins-weekly.ci for infra.ci and weekly.ci
- Different lifecycles for weekly.ci and infra.ci (same core version but different plugin sets)
- Matter of release lifecycle and distinct updates
[get.jenkins.io/mirrors/mirrorbits - Azure] High costs due to usage of Azure File Storage
- Migration to premium storage (still shared storage)
- Let’s create an empty SA (storage account) of kind premium (from updates.jenkins.io.tf) => @smerle (see the az sketch below)
- (related question: could LDAP benefit from the same upgrade?)
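For reference, a minimal az sketch of the premium SA creation; all names are hypothetical, and the real definition would live in updates.jenkins.io.tf as Terraform:

```bash
# Hypothetical names; premium file shares require kind FileStorage + Premium_LRS
az storage account create \
  --name getjenkinsiopremium \
  --resource-group get-jenkins-io \
  --location eastus2 \
  --kind FileStorage \
  --sku Premium_LRS
```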
- Migrate ci.jenkins.io to the sponsored subscription
- WiP: created a new empty VM in the subscription
- Then: copy the data once
- Once the security release is done, we can start the real migration (and run rsync on the diff for data; sketch below)
- Expected gain: ~$500 monthly
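A minimal sketch of the two-pass copy described above (paths and host are hypothetical; assumes the Jenkins home is on local disk and SSH access between the VMs):

```bash
# Pass 1: bulk copy while the old controller keeps running (can take hours)
rsync -az --info=progress2 /var/jenkins_home/ new-ci-vm:/var/jenkins_home/

# Pass 2 (cutover, after the security release): old controller stopped,
# only the diff is transferred, so the downtime window stays short
rsync -az --delete /var/jenkins_home/ new-ci-vm:/var/jenkins_home/
```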
- Azure Kubernetes publick8s suffers from SNAT port exhaustion: network slowness
- Decreased TCP timeout from 30min to 4min
- Adding more ports statically (instead of dynamically) => did not have any effect, and made operations worse
- Added more public IPs, but they are a paid and scarce resource
- WiP: using a NAT gateway in addition (it takes precedence)
- privatek8s is now using NAT gateway
- Don’t forget to allow NAT gateway on the control plane of AKS!
- Todo: publick8s (see the az sketch below)
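A minimal az sketch of the NAT gateway setup (resource names are hypothetical; the 4-minute idle timeout mirrors the TCP timeout change above):

```bash
# Dedicated public IP + NAT gateway for egress: SNAT ports no longer
# come from the load balancer's limited pool
az network public-ip create -g publick8s-rg -n publick8s-natgw-ip --sku Standard
az network nat gateway create -g publick8s-rg -n publick8s-natgw \
  --public-ip-addresses publick8s-natgw-ip --idle-timeout 4

# Attach the gateway to the node subnet: it takes precedence for outbound traffic
az network vnet subnet update -g publick8s-rg \
  --vnet-name publick8s-vnet -n publick8s-nodes --nat-gateway publick8s-natgw

# AKS itself must also be switched to the new outbound path
# (flag availability depends on the az CLI / AKS API version)
az aks update -g publick8s-rg -n publick8s --outbound-type userAssignedNATGateway
```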
- Agents aren’t spawning on infra.ci
- Fixed manually: autoscaling on arm64 fails when the minimum number of nodes is zero; as soon as there is at least 1 node, it works (workaround sketch below)
- Support case opened with AKS. Related to spot instances?
- Migrate to the new subscription?
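The manual fix boils down to raising the autoscaler floor on the arm64 pool (cluster and pool names are hypothetical):

```bash
# Keep at least one arm64 node alive so the autoscaler can scale the pool
az aks nodepool update -g infra-ci-rg --cluster-name infra-ci-cluster \
  --name arm64pool --update-cluster-autoscaler --min-count 1 --max-count 5
```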
- Unexpected delays building small plugins on Linux agents
- Issue related to the DigitalOcean ACP service
- No problems with the Azure and AWS ACP services
- Challenge: we don’t want to consume too much data from JFrog this month, so let’s not clean it up nor disable it
- Given priorities, let’s live with the “slow” DigitalOcean builds for now
- Long term solutions:
- Kube 1.27 upgrade could help, but “later”
- Switch to VM instead of Kube for ACP
- Short term:
- Disable ACP on DO? => would consume more bandwidth
- Disable DO agents temporarily instead => let’s go with this
- infra.ci.jenkins.io on arm64 (controller and agents)
- 2 container images already moved to “all in one” with arm64: docker-helmfile and docker-hashicorp (multi-arch build sketch below)
- Archived these image resources
- Jobs migrated
- Terraform 1.6 is now used (instead of 1.1, which had been in place for months)
- Fewer PRs to review \o/
- WiP: container image docker-builder (aka “webbuilder”)
- Not an easy one: used on both ci.jenkins.io and infra.ci.jenkins.io
- Jobs such as jenkins.io (websites), contributors, etc. depend on it
- Also: infra-reports
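For reference, an “all in one” multi-arch image is a single tag whose manifest covers both architectures. A minimal buildx sketch (the tag is hypothetical; the real builds run in each repository’s CI):

```bash
# One build, one tag, two platforms; the registry serves the right
# architecture automatically (requires a configured buildx builder)
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --tag example/docker-hashicorp:latest \
  --push .
```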
- Check if we could replace blobxfer by azcopy
- Fact: we cannot revoke the current SAS tokens
- WiP: change operations on storage account file shares to use short-lived SAS tokens (see the sketch below)
- Adding a “policy” to the SA to allow large-scale revocation
- Need to generate 1 “Service Principal” per controller to allow them to write to the file shares (websites, mirrors, etc.)
- Then the az command line will use the Service Principal + policy to generate 1-hour-lived SAS tokens to authenticate to the SAs
- Target file share for contributors.jenkins.io and javadoc first, then the others
- Puppet: install azcopy on the pkg VM
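A sketch of the intended flow (account and share names are hypothetical; assumes AZURE_STORAGE_KEY is available after the Service Principal login). The key property: deleting or updating the stored access policy invalidates every SAS issued against it, which is the large-scale revocation we lack today:

```bash
# Stored access policy on the file share: permissions live here, not in the SAS
az storage share policy create \
  --account-name jenkinswebsites --share-name contributors \
  --name short-lived-writer --permissions rwl

# Generate a 1-hour SAS bound to that policy
expiry=$(date -u -d '+1 hour' '+%Y-%m-%dT%H:%MZ')
sas=$(az storage share generate-sas \
  --account-name jenkinswebsites --name contributors \
  --policy-name short-lived-writer --expiry "$expiry" -o tsv)

# azcopy (instead of blobxfer) uses the SAS to push the content
azcopy sync ./build \
  "https://jenkinswebsites.file.core.windows.net/contributors?${sas}"
```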
- [INFRA-3100] Migrate updates.jenkins.io to another Cloud
- Good candidate for the “azcopy” short-lived SAS approach above
- One corrupted record deleted from the table
- WiP: Searching for another corrupted record
- [Jenkins Agents] Clean up deprecated JNLP arguments
- Delay for later (after FOSDEM)
- Revoke an OpenVPN cert for NotMyFault
- Delay for later (after FOSDEM)
- Host versioned jenkins.io docs on docs.jenkins.io
- No progress last week (other priorities)
- Given our GSoC candidate is in exams right now and Kevin is not blocked immediately, let’s delay until after FOSDEM
- Intermittent out-of-memory failures for Java 21 builds of Jenkins core on ci.jenkins.io
- No progress last week (other priorities)
- Delay for later (after FOSDEM)
- Migration leftovers from publick8s to arm64
- LDAP:
- Image update (was an old one) thanks to Docker Bake
- arm64 requires running in a different availability zone: current storage is in a different zone than the arm64 nodes (zone check sketch below)
- Delay for later (after FOSDEM)
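To see the mismatch (real Kubernetes labels; the PV name is a placeholder): Azure disks are zonal, so a pod can only follow its volume if a node exists in the same zone:

```bash
# Which zone each arm64 node lives in
kubectl get nodes -l kubernetes.io/arch=arm64 -L topology.kubernetes.io/zone

# Which zone the LDAP volume is pinned to (shown under "Node Affinity")
kubectl describe pv <ldap-pv-name> | grep -A 3 'Node Affinity'
```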
- Export download mirrors list to a textual representation
- Goal is now clearer and scoped
- No progress last week (other priorities)
- (new) infra.ci GitHub API rate limiting
- We could use different GitHub Apps to spread the API rate limits and avoid being blocked from delivering from infra.ci (quota check sketch below)
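Each GitHub App installation gets its own API quota, which is why splitting across apps helps. The current quota can be checked against the real /rate_limit endpoint:

```bash
# Remaining API quota for whichever app/token infra.ci is currently using
curl -s -H "Authorization: Bearer $GITHUB_TOKEN" \
  https://api.github.com/rate_limit | jq '.resources.core'
```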
- Docker Hub:
- Last 3 releases of the Docker inbound agent images failed due to “HTTP/429 Rate Limit” on Docker Hub. Was an issue on Docker Hub’s side
- Looks like we’ve been removed and added back (confirmed by them): our systems were removed from the Open Source program (hence the API rate limit), but it is fixed
- Thanks Docker for continuing to sponsor us!
- And new features are available to us: Docker Scout!
ToDo (next milestone): infra-team-sync-2024-01-30 Milestone · GitHub