Help and Input needed, repeatable k8s based CI for Apache Cassandra

In the Apache Cassandra community we have a significant project underway to improve our CI, with the aim of making it more accessible to every contributor. Cassandra’s CI currently runs ~50k tests of different types, with a backlog of tests still to add. Our current Jenkins installation is at ci-cassandra.apache.org

I’m reaching out for guidance and expertise in creating an initial top-level script that can clone the ci-cassandra.apache.org Jenkins installation.

My questions to the list are…

  • Do you think the following can be done?
  • Do you know of other valuable examples to look at?
  • What Jenkins components/approach would you be using?
  • What challenges and concerns come to mind?
  • Are you interested in joining us?

The Goal…

From the command line, we want to be able to set up a full ci-cassandra-like installation on any k8s cluster, run the Jenkinsfile CI pipeline, pull the results, and then (optionally) tear it down.

We envision using the Jenkins-Operator (or possibly the Helm chart), the JCasC (Configuration as Code) plugin, the jenkins-k8s-plugin, and some seeding Groovy DSL, along with our .jenkins/Jenkinsfile located in the Cassandra source repo (new WIP version here).
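
To make the shape of that top-level script concrete, here is a minimal sketch of the set-up and optional tear-down steps, assuming the Helm chart route. The release name, namespace and values file are hypothetical placeholders, and the steps that trigger the Jenkinsfile pipeline and pull the results are left out because they depend on choices not yet made.

```python
import shlex
from typing import List

# Public repository for the Jenkins community Helm chart.
CHART_REPO_URL = "https://charts.jenkins.io"


def plan_commands(release: str, namespace: str, values_file: str,
                  teardown: bool = False) -> List[List[str]]:
    """Return the ordered helm commands for one set-up (and optional tear-down).

    The commands are only planned here, not executed, so the plan can be
    reviewed or printed as a dry run before touching a cluster.
    """
    steps = [
        ["helm", "repo", "add", "jenkins", CHART_REPO_URL],
        ["helm", "repo", "update"],
        ["helm", "upgrade", "--install", release, "jenkins/jenkins",
         "--namespace", namespace, "--create-namespace",
         "--values", values_file, "--wait"],
        # Triggering the pipeline and collecting results would go here;
        # intentionally omitted in this sketch.
    ]
    if teardown:
        # Tear-down is optional so an already built setup can be re-used.
        steps.append(["helm", "uninstall", release, "--namespace", namespace])
    return steps


def render(steps: List[List[str]]) -> str:
    """Format the plan as shell-quoted lines, one command per line."""
    return "\n".join(shlex.join(step) for step in steps)


if __name__ == "__main__":
    # "cassandra-ci" and "values.yaml" are illustrative names only.
    print(render(plan_commands("cassandra-ci", "cassandra-ci", "values.yaml",
                               teardown=True)))
```

Keeping the plan as data rather than running commands directly makes it easy to add a dry-run mode, which seems useful when the same script has to work against arbitrary clusters.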

Brief Background…

The CI pipeline’s ~50k tests are split across up to 850+ container runs. We also use CircleCI as a separate CI system, at least for those who can afford a premium plan, and it has proven a valuable reference point for us, being very fast, stable and intuitive to use. In CircleCI, unqueued, the pipeline takes ~40 minutes. In our current Jenkins environment, ci-cassandra.a.o, which has 100 containers (so runs must queue), it takes ~7 hours.
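
As a rough back-of-envelope illustration of why queueing matters, the sketch below treats the container runs as arriving in "waves" the size of the container pool. It is a deliberately crude model: it assumes each wave takes as long as the unqueued pipeline (~40 minutes), which is an assumption for illustration and not a measurement, and it ignores hardware differences and scheduling details.

```python
import math


def waves(container_runs: int, containers: int) -> int:
    """Number of rounds needed if at most `containers` runs execute at once."""
    return math.ceil(container_runs / containers)


def crude_wall_clock_minutes(container_runs: int, containers: int,
                             minutes_per_wave: int) -> int:
    """Crude estimate: every wave is assumed to take `minutes_per_wave`."""
    return waves(container_runs, containers) * minutes_per_wave


if __name__ == "__main__":
    n_waves = waves(850, 100)                            # 9 waves
    minutes = crude_wall_clock_minutes(850, 100, 40)     # 360 minutes
    print(f"{n_waves} waves, ~{minutes / 60:.0f} hours")
```

Under these assumptions, 850 runs on 100 containers need 9 waves, or about 6 hours, which is in the same range as the ~7 hours observed.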

The Goal in more detail…

Our full objective is to create a reproducible jenkins k8s based CI environment that contributors, collaborating and downstream companies can easily clone. We see this as important to building an inclusive open-source culture and leading in OSS as an example of how an early-majority complex distributed technology can do QA to a professionally high standard.

In the spirit of open source, I believe it is an exciting project: one that will also be a great example of what Jenkins is capable of. I am unaware of other such projects; if you know of any, please let me know.

A number of committers in the Cassandra project will be working on this. As with most OSS projects, we rely a lot on volunteers, so all and any help is welcome and highly appreciated.

It is unusual to send a request of such significant scope to a volunteer-based mailing list, but given it’s entirely OSS and could be quite prestigious for Jenkins, with room for volunteers in any shape or form, this unusual callout is being made nonetheless. You’re also welcome to reach out to me 1:1 if you have private questions/info.

A bit more technical info…

  • Further goals of the project are as follows:
  1. Faster turnaround times, to match those of our CircleCI setup.
  2. Ensure the CI implementation is intuitive and accessible to new contributors.
  3. Establish an accepted “test result output” format that can certify a commit irrespective of the CI environment, and be permanently archived.
  • We also wish to address several challenges that we currently face:
    A. Make the setup easier to debug and tune, feeding back into a more stable ci-cassandra.a.o platform for our post-commit CI, which remains our canonical CI and runs on donated heterogeneous hardware around the world.
    B. Make it easier to identify and debug our very rare flaky tests.
    C. Be able to scale pre-commit testing using only OSS solutions.

  • Tearing down the Jenkins setup in k8s should be optional, making it easy for devs involved in testing cycles (i.e. re-using an already built setup).
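
To illustrate goal 3 above, here is one possible shape for a CI-independent "test result output": a small summary record keyed by commit, built from JUnit-style XML reports. This is only a sketch under assumptions. The JUnit-style input, the summary fields and the placeholder commit id are all hypothetical, and no result format has been agreed on.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample report in the common JUnit XML layout.
SAMPLE_REPORT = """<testsuites>
  <testsuite name="unit" tests="3" failures="1" errors="0" skipped="1"/>
  <testsuite name="dtest" tests="2" failures="0" errors="0" skipped="0"/>
</testsuites>"""


def summarize_junit(xml_text: str, commit: str) -> dict:
    """Sum the per-suite counters into one archivable record for a commit."""
    root = ET.fromstring(xml_text)
    # A report may have a single <testsuite> root or a <testsuites> wrapper.
    suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
    totals = {"tests": 0, "failures": 0, "errors": 0, "skipped": 0}
    for suite in suites:
        for key in totals:
            totals[key] += int(suite.get(key, 0))
    passed = totals["failures"] == 0 and totals["errors"] == 0
    return {"commit": commit, **totals, "passed": passed}


if __name__ == "__main__":
    # "0000000" is a placeholder commit id, not a real one.
    print(json.dumps(summarize_junit(SAMPLE_REPORT, "0000000"), sort_keys=True))
```

A record like this says nothing about which CI system produced it, which is the property the goal asks for; what a real certification format would need beyond counts (per-test detail, environment metadata, signing) is left open.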

More information can be found under the epic: CASSANDRA-18137

regards,
Mick, on behalf of the Apache Cassandra PMC

Looping back… We pulled it off!!

The pipeline is now ~200k tests, and we can run it in under 3 hours. Not only is this the same time as CircleCI, but running on spot instances in AWS/EKS it is proving to cost under one-tenth as much as CircleCI.

We took the approach of using jenkinsci/helm-charts with auto-scaling k8s agents inside it. One aspect worth calling out is that the project’s Jenkinsfile (and all the test scripts and commands) still had to work in other (non-k8s) environments. On top of that we added a command line interface, so you can use this system from the project command line (and not even know or care that it’s Jenkins doing the work in k8s for you).

The announcement for the launch of our public pre-commit CI system is at LinkedIn – Pre-commit CI for Apache Cassandra is now available

Big thanks go to InfraCloud, who did some of the crucial work involved. ( infracloud.io )

The final commit in all the work was K8s immutable provisioning of ci-cassandra.apache.org jenkins instances · apache/cassandra@05f0e51 · GitHub