Infrastructure Team Meeting - Apr 26, 2022

Attendees

Announcements :loudspeaker:

  1. Weekly
  • 2.345 Build is complete, most of checklist complete
  • LTS baseline not yet selected, discussion on developer list

Notes :book:

  • Done (infra-team-sync-2022-04-26 Milestone 路 GitHub)

    • Packer images
      • Fix GPG issue
      • Git / GIT-LFS CVE (Windows and Linux images updated)
      • JDK17 CVE fixed in 17.0.3 (does not affect JDK 11 or JDK 8)
    • [vpn.jenkins.io]
      • CRL expiration
      • Wadeck鈥檚 Cert. Renewal
    • [issues.jenkins.io] JIRA upgrade
      • Friday upgrade happened (Apr 22), resolved the warning
      • 10 minutes of downtime - Admin UI update completed as well
    • [digitalocean] back to business
      • Credentials rotated + restricted
      • Cluster re-created to Kube 1.21
      • Funds remain for ~2 weeks of use before budget is consumed
    • [pipeline-library] Docker Build/Publish
      • Annotated tags (thanks to Daniel Beck for that refinement)
      • Groovy string interpolation fixes
    • infra-reports is fully migrated from trusted.ci to infra.ci
      • Ongoing project to move jobs from trusted.ci to infra.ci cluster
    • Crowdin Enterprise CNAME for easier Jenkins translation
  • Work in Progress (infra-team-sync-2022-04-26 Milestone 路 GitHub)

    • [ci.jenkins.io] Incident - outage Thursday Apr 23, 2022
      • TODO: post mortem
      • Identify short term, medium term, and long term improvements
      • Wave of builds (1000+ in the build queue)
        • Reached the Dockerhub rate limit while pulling containers
          • Six hour window for the rate limit
          • Docker discussions to have all our organizations be part of the Docker open source program
            • Could use other registries or a proxy
        • UI and logs were not being updated even though builds were proceeding
          • No errors, no warnings related to issue
        • Enlisting help from Basil to diagnose more deeply
        • Agent allocation failures, not agent disconnect
          • Both issues exist, but not necessarily related to each other
        • Basil thinks there were at least 5 different issues exposed during the outage
          • Willing to share that before our meeting to retrospect on the
          • Attach it to the help desk issue
    • Non Intel Docker Images for Jenkins controller are unpublished: `jenkins/jenkins:lts` images are missing `arm64` & `s390x` 路 Issue #2890 路 jenkins-infra/helpdesk 路 GitHub
      • WiP
      • Root cause was a branch being built in production that should not have been built
        • Was using outdated base code
        • Root cause has been resolved, now check for published versions of platform and all tags
        • Holding Docker publication of weekly today
      • Expect to be ready and working for next weekly and thus for LTS the day after
    • Kubernetes 1.21 Upgrade
      • EKS + DOKS => Done
      • AKS in progress. ETA 27th of April
    • DockerHub credentials (VM agents, Kubernetes, Accounts)
    • Migrating ratings.jenkins.io to Kubernetes
      • Delayed to this milestone
    • JDK 17 campaign upgrade
      • Other JDK to plan?
        • JDK 11.0.15 builds from Eclipse Temurin are only available for Windows today
        • When JDK 11.0.15 builds available for other platforms, update pull requests will happen
        • JDK 8u332 builds from Eclipse Temurin not available for any platform yet
      • Packer + Tools OK
      • TODO: container agents
        • Community based containers may need update
        • Infra based containers need an update
    • [Add a email alias for press dns email mail] => blocked, waiting for LF
    • Memory analysis of core and PR builds
      • Improved observability of JVM options and their impacts
        • How much memory are we using in our builds
        • May be areas we can improve by adjusting memory configuration
        • Size our containers to better match our actual usage
          • If too large, save money by reducing size
          • If too small, increase reliability by increasing size
        • How is it being measured?
          • JVM prints settings with a command line flag, then collect data
          • Look at heap sizes of the JVM鈥檚 as they are running
    • Improvements to the pipeline shared library
      • Allow use of the Maven wrapper call mvnw instead of mvn
        • Allows to download the expected Maven version (many benefits including allowing failures)
      • Two primary consumers - buildPlugin and Jenkins core
        • Allow consumers to adopt Maven build wrapper for consistent maven version in source code
        • Will it use a local maven if it finds the matching version locally (download avoidance)?
      • Basil can check the download
  • New/Importants infra-team-sync-next

    • Add access for Basil to ci.jenkins.io (admin UI + VM) and datadog (JVM)
      • +1 from Mark Waite, Damien Duportal, Herve, Stephane, and Bruno
    • Crowdin work: SSO, DNS record, might be more requests to help this (new) translation system in the upcoming
    • Mirror in Singapore
    • Ping Alibaba about the status of their mirror in China
    • Change BlueOcean link
    • Build our own Windows Image
    • Sunset
  • ToDo (next milestone) (infra-team-sync-2022-05-03 Milestone 路 GitHub)