Help and input needed: repeatable k8s-based CI for Apache Cassandra

In the Apache Cassandra community we have a significant project underway to improve our CI, with the aim of making it more accessible to every contributor. The scope is large: Cassandra’s CI currently runs ~50k tests of different types, and we have a backlog of tests still to add. Our current Jenkins installation is at ci-cassandra.apache.org.

I’m reaching out for guidance and expertise in creating an initial top-level script that can clone the ci-cassandra.apache.org Jenkins installation.

My questions to the list are…

  • Do you think the following can be done?
  • Do you know of other valuable examples to look at?
  • What Jenkins components and approach would you use?
  • What challenges and concerns come to mind?
  • Are you interested in joining us?

The Goal…

From the command line, we want to be able to set up a full ci-cassandra-like installation on any k8s cluster, run the Jenkinsfile CI pipeline, pull the results, and then (optionally) tear it down.

We envision using the Jenkins Operator (or the Helm chart?), the Configuration as Code (JCasC) plugin, the Jenkins Kubernetes plugin, and some seeding Groovy DSL, along with our .jenkins/Jenkinsfile located in the Cassandra source repo (new WIP version here).
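To make the "from the command line" goal concrete, here is a minimal sketch of what a top-level driver could look like. It assumes the Helm chart route (chosen only for illustration, not as a recommendation over the Operator), that `helm` and `kubectl` are on PATH, and that the namespace, release name and `values.yaml` (where JCasC and agent pod templates would live) are hypothetical placeholders. Triggering the pipeline and pulling results are deliberately left as a comment, since they depend on how the seed job is defined.

```python
"""Sketch of a top-level driver: install Jenkins on k8s, optionally tear it down.

Assumptions: the official Jenkins Helm chart (https://charts.jenkins.io) is used,
`helm` and `kubectl` are on PATH, and namespace, release name and values file
are hypothetical placeholders. Commands are only printed unless --apply is given.
"""
import argparse
import shlex
import subprocess
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cluster:
    namespace: str = "ci-cassandra"   # hypothetical namespace
    release: str = "jenkins"          # hypothetical Helm release name
    values_file: str = "values.yaml"  # hypothetical values (JCasC, agent pod templates, ...)


def setup_commands(c: Cluster) -> list[list[str]]:
    """Commands that install Jenkins, or upgrade an existing install in place."""
    return [
        ["helm", "repo", "add", "jenkins", "https://charts.jenkins.io"],
        ["helm", "repo", "update"],
        ["helm", "upgrade", "--install", c.release, "jenkins/jenkins",
         "--namespace", c.namespace, "--create-namespace",
         "-f", c.values_file, "--wait"],
    ]


def teardown_commands(c: Cluster) -> list[list[str]]:
    """Commands that remove the Helm release and then its namespace."""
    return [
        ["helm", "uninstall", c.release, "--namespace", c.namespace],
        ["kubectl", "delete", "namespace", c.namespace],
    ]


def run(cmds: list[list[str]], apply: bool) -> None:
    """Print each command; execute it (failing fast) only when apply is True."""
    for cmd in cmds:
        print("+", shlex.join(cmd))
        if apply:
            subprocess.run(cmd, check=True)


def main(argv: Optional[list[str]] = None) -> None:
    p = argparse.ArgumentParser(
        description="Stand up (and optionally tear down) a Jenkins CI install on k8s.")
    p.add_argument("--namespace", default="ci-cassandra")
    p.add_argument("--teardown", action="store_true",
                   help="remove the install after the run")
    p.add_argument("--apply", action="store_true",
                   help="execute commands instead of only printing them")
    args = p.parse_args(argv)
    c = Cluster(namespace=args.namespace)
    run(setup_commands(c), args.apply)
    # Triggering the .jenkins/Jenkinsfile pipeline and pulling its results would
    # go here (for example via the Jenkins REST API); omitted because it depends
    # on how the seed job defines the pipeline.
    if args.teardown:
        run(teardown_commands(c), args.apply)


if __name__ == "__main__":
    main()
```

Using `helm upgrade --install` means re-running the script against an existing install reuses it rather than failing, which fits the wish to keep an already built setup between testing cycles; omitting `--teardown` leaves everything in place.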

Brief Background…

The CI pipeline’s ~50k tests are split across 850+ container runs. We also use CircleCI as a separate CI system, at least those of us who can afford a premium plan, and it has proven to be a valuable reference point: it is very fast, stable and intuitive to use. Unqueued, the pipeline takes ~40 minutes on CircleCI. On our current Jenkins environment, ci-cassandra.a.o, which has 100 containers (so runs must queue), it takes ~7 hours.
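A rough back-of-envelope check of why queueing dominates: with 850+ runs and 100 containers, at least ceil(850 / 100) = 9 sequential "waves" are needed. The sketch below additionally assumes, purely as a simplification and not as a measurement, that each wave lasts about as long as the whole ~40-minute unqueued CircleCI run.

```python
import math


def min_waves(container_runs: int, executors: int) -> int:
    """Minimum number of sequential batches when runs exceed available executors."""
    return math.ceil(container_runs / executors)


def rough_wall_clock_minutes(container_runs: int, executors: int,
                             wave_minutes: float) -> float:
    """Rough estimate only: assumes every wave takes wave_minutes (here taken,
    as an assumption, from the ~40-minute unqueued CircleCI time)."""
    return min_waves(container_runs, executors) * wave_minutes


waves = min_waves(850, 100)
estimate = rough_wall_clock_minutes(850, 100, 40)
print(waves, estimate / 60)  # 9 6.0
```

That lands at about 6 hours, in the same ballpark as the observed ~7 hours; the remaining gap is not modelled here. With 850 executors the same model gives a single wave, which is roughly the CircleCI situation.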

The Goal in more detail…

Our full objective is to create a reproducible, k8s-based Jenkins CI environment that contributors, as well as collaborating and downstream companies, can easily clone. We see this as important to building an inclusive open-source culture, and to leading in OSS by example: showing how a complex distributed technology in the early-majority stage of adoption can do QA to a high professional standard.

In the spirit of open source, I believe this is an exciting project, one that will also be a great example of what Jenkins is capable of. I am not aware of other projects like it; if you are, please let me know.

A number of committers in the Cassandra project will be working on this. As with most OSS projects, we rely a lot on volunteers, so any and all help is welcome and highly appreciated.

It is unusual to send a request of such significant scope to a volunteer-based mailing list, but given that the project is entirely OSS, could be quite prestigious for Jenkins, and has room for volunteers in any shape or form, we are making this unusual call-out nonetheless. You are also welcome to reach out to me 1:1 if you have private questions or information to share.

A bit more technical info…

  • Further goals of the project are as follows:
  1. Faster turnaround times, to match those of our CircleCI setup.
  2. Ensure the CI implementation is intuitive and accessible to new contributors.
  3. Establish an accepted “test result output” format that can certify a commit irrespective of the CI environment, and be permanently archived.
  • We also wish to address several challenges that we currently face:
    A. Make the setup easier to debug and tune, feeding improvements back into a more stable ci-cassandra.a.o platform for our post-commit CI, which remains our canonical CI and runs on donated heterogeneous hardware around the world.
    B. Make it easier to identify and debug our very rare flaky tests.
    C. Be able to scale pre-commit testing using only OSS solutions.

  • Tearing down the Jenkins setup in k8s should be optional, making it easy for developers involved in repeated testing cycles to re-use an already built setup.
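For goal 3 and challenge B, one possible shape is sketched below, assuming test results are available as JUnit-style XML reports. It folds repeated runs of the same commit into one summary record and flags any test that both passed and failed on that commit as flaky. The record's fields (`commit`, `tests`, `failed`, `flaky`) and the test names in the example are hypothetical, not an agreed format.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def outcomes(junit_xml: str) -> dict[str, str]:
    """Map 'classname.name' -> 'pass' | 'fail' | 'skip' for one JUnit XML report.

    Works for both <testsuite> and <testsuites> roots, since iter() walks the tree.
    """
    root = ET.fromstring(junit_xml)
    result = {}
    for case in root.iter("testcase"):
        key = f"{case.get('classname')}.{case.get('name')}"
        if case.find("failure") is not None or case.find("error") is not None:
            result[key] = "fail"
        elif case.find("skipped") is not None:
            result[key] = "skip"
        else:
            result[key] = "pass"
    return result


def summarize(commit: str, reports: list[str]) -> dict:
    """Combine repeated runs of one commit into a hypothetical summary record.

    'flaky'  = tests that both passed and failed across the runs.
    'failed' = tests that failed and never passed.
    """
    seen = defaultdict(set)
    for report in reports:
        for test, outcome in outcomes(report).items():
            seen[test].add(outcome)
    flaky = sorted(t for t, o in seen.items() if {"pass", "fail"} <= o)
    failed = sorted(t for t, o in seen.items() if "fail" in o and "pass" not in o)
    return {"commit": commit, "tests": len(seen), "failed": failed, "flaky": flaky}


# Example with two hypothetical runs of the same commit.
run1 = """<testsuite name="s">
  <testcase classname="o.a.c.FooTest" name="testA"/>
  <testcase classname="o.a.c.FooTest" name="testB"><failure message="boom"/></testcase>
</testsuite>"""
run2 = """<testsuite name="s">
  <testcase classname="o.a.c.FooTest" name="testA"/>
  <testcase classname="o.a.c.FooTest" name="testB"/>
</testsuite>"""
print(summarize("abc123", [run1, run2]))
# {'commit': 'abc123', 'tests': 2, 'failed': [], 'flaky': ['o.a.c.FooTest.testB']}
```

Keying the record by commit rather than by CI system is what would let the same summary certify a commit regardless of where it ran, and archiving these small records is cheaper than archiving raw logs.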

More information can be found under the epic: CASSANDRA-18137

regards,
Mick, on behalf of the Apache Cassandra PMC