Hi everyone,
I am exploring the ‘Use OpenTelemetry for Jenkins Jobs’ project for GSoC 2026. Following a discussion with maintainers on Gitter, I am opening this thread to document the architectural design and trade-offs for ci.jenkins.io.

The Challenge

Enabling the OTel plugin at the scale of the Jenkins infrastructure is not just a matter of installation; it requires an engineering strategy to handle high cardinality and storage costs.

Proposed Architecture

I have been researching a backend-agnostic OTLP pipeline in which the Collector acts as a gateway. The key feature is tail-based sampling. With the Collector as a gateway, this allows us to:
- Retain 100% of failed build traces (for debugging).
- Aggressively sample successful builds (to save storage).
- Switch backends (Jaeger, Tempo, Prometheus) without changing Jenkins configuration.
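
To make the sampling idea concrete, here is a minimal sketch of the keep/drop decision that a tail-based sampler makes once all spans of a trace have arrived. It is plain Python, not Collector configuration, and the names `Span`, `keep_trace`, and `success_sample_rate` are hypothetical, chosen only for illustration. In the Collector itself, my understanding is that the contrib `tail_sampling` processor is the component that makes this kind of whole-trace decision.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical, simplified span: only the fields the decision needs."""
    trace_id: str
    name: str
    is_error: bool


def keep_trace(spans: list[Span], success_sample_rate: float) -> bool:
    """Decide whether to keep a whole trace after all of its spans are known."""
    if not spans:
        return False
    # Policy 1: always keep a trace that contains at least one failed span.
    if any(span.is_error for span in spans):
        return True
    # Policy 2: keep a deterministic fraction of successful traces.
    # Hashing the trace ID means the same trace always gets the same answer.
    digest = hashlib.sha256(spans[0].trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")        # integer in [0, 2**64)
    threshold = int(success_sample_rate * 2**64)      # rate 1.0 keeps everything
    return bucket < threshold


# Usage: a failed build is kept even when successful builds are sampled at 0%.
failed_build = [Span("trace-a", "checkout", False), Span("trace-a", "test", True)]
passed_build = [Span("trace-b", "checkout", False), Span("trace-b", "test", False)]
print(keep_trace(failed_build, success_sample_rate=0.0))
print(keep_trace(passed_build, success_sample_rate=0.0))
```

The point of deciding at the end of the trace is that the build result is only known once the last span has been seen, so a sampler that decides at the first span cannot guarantee that failed builds are retained.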

Proof of Concept & Trade-off Analysis

I have set up a local lab (Jenkins → Collector → Jaeger) to validate the probabilistic_sampler and load generation. I have compiled my findings, including a detailed analysis of cardinality vs. query speed, in the draft RFC: "Architecture Design: Scalable OpenTelemetry for Jenkins" (Google Docs).

I would appreciate any feedback from the Infra team, specifically regarding the retention policies for successful builds.
Amanraz Thakur
