Hi Jenkins Community!
I’m Abhijeet, and I’ve been contributing to Jenkins for a couple of weeks now. Here’s what I’ve been working on:
Recent Contributions:
Jenkins Docker:
• PR #2166 (Merged), jenkinsci/docker: Read initial admin password from file instead of logs. This fixed initial admin password extraction in tests.
Jenkins Core:
• PR #26056 (Under Review), jenkinsci/jenkins: Fix category headers showing as raw HTML on New Item page.
Kubernetes Plugin:
• PR #2787 (Deployed to Incrementals), jenkinsci/kubernetes-plugin: Fix counter leak when Jenkins restarts with ephemeral templates.
• PR #2785 (In Progress), jenkinsci/kubernetes-plugin: Fix Reaper not detecting ImagePullBackOff from reason field.
What I’ve Been Researching:
While working on these kubernetes-plugin fixes, I noticed a pattern of related issues around agent cleanup and lifecycle management:
• Thread leak issue (JENKINS-76095): PR #1747 fixed it in Sept 2025, but it caused a memory leak and was reverted in Nov 2025. PR #2788, submitted today, is the third attempt.
• Orphaned agents (#2737): Agents not being cleaned up properly
• Failed pod cleanup (#1942): 7+ years old - pods that fail to start never get cleaned up
• Termination issues (#2746): CRITICAL priority - unable to automatically terminate agent pods
These feel like symptoms of a deeper architectural gap: state drift between Jenkins and Kubernetes, cleanup that happens reactively instead of proactively, and edge cases that are not handled systematically.
My Question:
I’m wondering whether this problem space would benefit from a more systematic approach:
• Adding reconciliation loops (like Kubernetes operators) to detect and fix state drift
• Improving state persistence across restarts (like my PR #2787 did for counters)
• Better resource lifecycle tracking
• Comprehensive testing for edge cases (evictions, restarts, network failures)
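To make the reconciliation idea a bit more concrete, here is a rough sketch of what one pass of such a loop might compute. It is written in Python purely for illustration; it is not the plugin's Java code, and every name in it (`PodState`, `Plan`, `reconcile`, `FAILED_REASONS`) is hypothetical rather than an existing kubernetes-plugin or Kubernetes client API. It assumes we can list the agent names Jenkins knows about and the agent pods the cluster reports, and it only plans actions instead of performing them.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Assumption: waiting reasons treated as unrecoverable. The real list would need review.
FAILED_REASONS = frozenset({"ImagePullBackOff", "ErrImagePull"})


@dataclass(frozen=True)
class PodState:
    """Simplified, illustrative view of an agent pod as reported by the cluster."""
    name: str
    waiting_reason: Optional[str] = None  # e.g. "ImagePullBackOff"


@dataclass(frozen=True)
class Plan:
    """Actions a single reconciliation pass would take."""
    delete_pods: frozenset
    remove_agents: frozenset


def reconcile(jenkins_agents: set, pods: list) -> Plan:
    """Compare Jenkins' view with the cluster's view and plan the cleanup."""
    pod_names = {p.name for p in pods}

    # Pods with no matching Jenkins agent (orphaned pods).
    orphaned = pod_names - jenkins_agents
    # Pods stuck in a waiting reason we consider unrecoverable.
    failed = {p.name for p in pods if p.waiting_reason in FAILED_REASONS}
    # Jenkins agents whose pod no longer exists.
    dangling = jenkins_agents - pod_names

    return Plan(
        delete_pods=frozenset(orphaned | failed),
        remove_agents=frozenset(dangling | (failed & jenkins_agents)),
    )


if __name__ == "__main__":
    agents = {"agent-a", "agent-b", "agent-c"}
    pods = [
        PodState("agent-a"),
        PodState("agent-b", "ImagePullBackOff"),
        PodState("agent-x"),
    ]
    print(reconcile(agents, pods))
```

The point of the sketch is that the pass is a pure comparison of two snapshots, so running it again after the plan has been applied yields an empty plan. That is the property that would let it run periodically and after restarts without depending on having seen every event.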
Has anyone looked at this holistically? Would this be worth exploring as a larger project, or should I keep fixing individual issues as they come up?
I’ve started documenting my research and would love feedback from kubernetes-plugin maintainers or anyone familiar with this area.
Thanks!
GitHub: Abhijeet212004