Jenkins job to clone multiple GitHub repos

assifiqbal68 · September 18, 2025, 5:28pm

Jenkins setup: Here’s the scenario. The overall goal is to reduce build time.

Clone only a specific list of files from a GitHub repository, not the entire repo.
Repeat this for multiple repositories (say 200).
Transfer these files to a remote server simultaneously.

What we are planning to try in order to achieve this:

Parallel builds: split the work into chunked jobs, each cloning a few repos (say 25), and run those jobs in parallel.
Instead of performing a full clone of each GitHub repo, call the GitHub API to retrieve only the files needed for this build. Is invoking the API against these repos simultaneously time-consuming, and what’s the typical error rate?

Are there any other options that Jenkins recommends?

Thanks…

poddingue · September 18, 2025, 7:40pm

Hello and welcome to the community, @assifiqbal68!

Let’s summarize and see what we can do there. Unfortunately, I don’t see a good option in pure Jenkins, so let’s go with my usual tinkering.

Goal

Process many GitHub repositories from Jenkins, but only fetch specific files from each, and do so as fast as possible.

Key Challenges

Git cannot clone individual files; a plain clone fetches the whole repository.
Fetching individual files via the GitHub API is possible, but:
- Can be slow if there are many files.
- Is subject to API rate limits (especially for 200+ repos).
Parallelization is needed to make the process fast enough.
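
To see how close a job is getting to the rate limit mentioned above, the response headers can be inspected. This is a minimal sketch, assuming the headers were saved to a file with curl's -D option. GitHub reports the remaining quota in the x-ratelimit-remaining header; the function name rate_limit_remaining and the file name headers.txt are made up for illustration.

```bash
#!/usr/bin/env bash
# Print the value of the x-ratelimit-remaining header from a saved header dump.
# Header names are matched case-insensitively; CR line endings are stripped.
rate_limit_remaining() {
  local headers_file="$1"
  awk -F': *' 'tolower($1) == "x-ratelimit-remaining" { gsub(/\r/, "", $2); print $2 }' "$headers_file"
}

# Hypothetical usage (needs network access and a token):
#   curl -sS -D headers.txt -o /dev/null \
#     -H "Authorization: token $GITHUB_TOKEN" \
#     "https://api.github.com/repos/yourorg/repo1"
#   rate_limit_remaining headers.txt
```

A job could log this number before and after a batch to estimate how many requests a full run consumes.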

Suggested Approach

1. Use the GitHub API to fetch individual files

Use curl or similar to call the /repos/{org}/{repo}/contents/{file} endpoint with the header Accept: application/vnd.github.v3.raw, which returns the raw file content instead of a JSON description.
Requires a GITHUB_TOKEN with read access.
Add retry/backoff logic if scaling up to hundreds of files.
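
The retry/backoff idea can be sketched as a small wrapper around any command. This is only an illustration, not something Jenkins or GitHub ships: the function name retry_with_backoff and its arguments are assumptions, and the delay simply doubles after each failed attempt.

```bash
#!/usr/bin/env bash
# Run a command up to max_attempts times, doubling the wait after each failure.
# Usage: retry_with_backoff <max_attempts> <initial_delay_seconds> <command...>
retry_with_backoff() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  until "$@"; do
    if (( attempt >= max_attempts )); then
      echo "Giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "Attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$(( delay * 2 ))
    attempt=$(( attempt + 1 ))
  done
}

# Hypothetical usage (repo and file names are placeholders):
#   retry_with_backoff 5 2 curl -sSL --fail \
#     -H "Authorization: token $GITHUB_TOKEN" \
#     -H "Accept: application/vnd.github.v3.raw" \
#     "https://api.github.com/repos/yourorg/repo1/contents/file1.txt" \
#     -o repo1_file1.txt
```

The --fail flag matters here: without it curl exits 0 on an HTTP error, so the wrapper would never see a failure to retry.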

2. Parallelize using GNU parallel

Write a small shell script to build a list of curl commands and run them concurrently:

#!/usr/bin/env bash
set -euo pipefail

if ! command -v parallel >/dev/null; then
  echo "Please install GNU parallel" >&2
  exit 10
fi

GITHUB_TOKEN="${GITHUB_TOKEN:?GITHUB_TOKEN not set}"

declare -A REPOS
REPOS["repo1"]="file1.txt file2.txt"
REPOS["repo2"]="file3.txt"

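# Build one curl command per file. --fail makes curl exit non-zero on HTTP
# errors (404, rate limiting) instead of saving the error body as the file.
commands=()
for REPO in "${!REPOS[@]}"; do
  for FILE in ${REPOS[$REPO]}; do
    commands+=("curl -sSL --fail -H \"Authorization: token $GITHUB_TOKEN\" \
      -H \"Accept: application/vnd.github.v3.raw\" \
      \"https://api.github.com/repos/yourorg/$REPO/contents/$FILE\" \
      -o \"${REPO}_${FILE}\"")
  done
done

# Run up to 8 downloads at a time.
export GITHUB_TOKEN
printf "%s\n" "${commands[@]}" | parallel -j 8 bash -c '{}'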
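
With around 200 repositories, a hard-coded associative array gets unwieldy. One possible variation, sketched under assumptions: a hypothetical manifest file with one "repo path" pair per line, turned into URL and output-name pairs that can be fed to parallel. The function name manifest_to_targets and the manifest format are not part of the script above.

```bash
#!/usr/bin/env bash
# Turn a manifest of "repo path" lines into "url<TAB>output-name" lines.
# Blank lines and lines starting with # are skipped.
# Slashes in the path become underscores in the output name, so files in
# subdirectories do not need a matching local directory.
manifest_to_targets() {
  local manifest="$1" org="${2:-yourorg}"
  local repo path
  # The "|| [[ -n ... ]]" part handles a last line without a trailing newline.
  while read -r repo path || [[ -n "$repo" ]]; do
    [[ -z "$repo" || "$repo" == \#* ]] && continue
    printf 'https://api.github.com/repos/%s/%s/contents/%s\t%s_%s\n' \
      "$org" "$repo" "$path" "$repo" "${path//\//_}"
  done < "$manifest"
}
```

The output could then be piped into something like parallel -j 8 --colsep '\t' 'curl -sSL --fail -H "Authorization: token $GITHUB_TOKEN" -H "Accept: application/vnd.github.v3.raw" {1} -o {2}', with GITHUB_TOKEN exported. The whole curl command is single-quoted so that the shell started by parallel handles the header quoting and expands the token.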

3. Run the script from a Jenkins Pipeline

pipeline {
  agent any
  environment {
    GITHUB_TOKEN = credentials('your-token-id')
  }
  stages {
    stage('Fetch Files') {
      steps {
        sh './scripts/my_script.sh'
      }
    }
  }
}

Optional Enhancements

Transfer results using scp, rsync, or similar after download.
Rate limit handling: add retries and exponential backoff for large-scale jobs.
Sparse-checkout: if files are grouped in folders, you can use git sparse-checkout instead of the API.
Separate artifact repository: if you control the repos, consider storing these files in a central small repo or as release artifacts.
Use GitHub Actions runners to fetch files close to the source and then send results to Jenkins.
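
For the sparse-checkout option, a rough sketch of what a per-repo fetch could look like, assuming Git 2.25 or newer. The function name sparse_clone and the example directory names are hypothetical.

```bash
#!/usr/bin/env bash
# Clone a repository but check out only the listed directories.
# Usage: sparse_clone <repo-url> <destination> <dir> [<dir>...]
# Requires Git 2.25 or newer for "clone --sparse" and "sparse-checkout".
sparse_clone() {
  local url="$1" dest="$2"
  shift 2
  # --depth 1 skips history, --filter=blob:none defers file contents,
  # --sparse starts with only top-level files checked out.
  git clone --quiet --depth 1 --filter=blob:none --sparse "$url" "$dest"
  # Add the requested directories to the working tree.
  git -C "$dest" sparse-checkout set "$@"
}

# Hypothetical usage:
#   sparse_clone "https://github.com/yourorg/repo1.git" repo1 config docs
```

This still costs one clone per repository, so it is most attractive when several files are needed from the same folder.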

Summary

Fastest option: GitHub API + GNU parallel + Jenkins Pipeline.
Caveats: API rate limits and network latency.
Future optimization: Group files into a dedicated repo or use sparse-checkout if possible.

assifiqbal68 · September 22, 2025, 3:25pm

Thanks Bruno Verachten!

This recommendation is in alignment with what we were planning to implement.
