Jenkins setup: here's the scenario. The overall goal is to reduce build time.
Clone only a specific list of files from a GitHub repository, not the entire repo.
Repeat this for multiple repositories (say 200).
Transfer these files to a remote server simultaneously.
What we are planning to try in order to achieve this:
Parallel builds: split the work into chunked jobs that each handle a few repos (say 25) and run in parallel.
Instead of performing a full clone of each GitHub repo, call the GitHub API to retrieve only the files needed for this build. Is calling the API for these repos simultaneously time-consuming, and what's the typical error rate?
Are there any other options that Jenkins recommends?
Thanks…
poddingue (Bruno Verachten), September 18, 2025, 7:40pm
Hello and welcome to the community, @assifiqbal68!
Let's summarize and see what we can do there. Unfortunately, I don't see any good option using pure Jenkins, so let's go with my usual tinkering.
Goal
Process many GitHub repositories from Jenkins, but only fetch specific files from each, and do so as fast as possible.
Key Challenges
Git cannot clone individual files; a regular git clone fetches the whole repository.
Fetching individual files via the GitHub API is possible, but:
Can be slow if there are many files.
Is subject to API rate limits (especially for 200+ repos).
Parallelization is needed to make the process fast enough.
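The rate-limit concern can be measured rather than guessed: GitHub's REST API reports the remaining quota in the `x-ratelimit-remaining` response header (for authenticated requests the quota is typically 5,000 requests per hour at the time of writing, but check the current GitHub documentation). A minimal sketch, assuming the response headers were saved to a file with `curl -D`; the `rate_limit_remaining` helper and the `headers.txt` file name are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read saved HTTP response headers on stdin and
# print the value of the x-ratelimit-remaining header (first match only).
rate_limit_remaining() {
  # HTTP header lines end in CRLF, so drop the carriage returns first.
  # Header names are case-insensitive, hence the tolower() comparison.
  tr -d '\r' | awk -F': *' '
    !seen && tolower($1) == "x-ratelimit-remaining" { print $2; seen = 1 }
  '
}

# Example (yourorg, repo1 and file1.txt are placeholders):
# curl -sSL -D headers.txt -o /dev/null \
#   -H "Authorization: token $GITHUB_TOKEN" \
#   "https://api.github.com/repos/yourorg/repo1/contents/file1.txt"
# rate_limit_remaining < headers.txt
```

Logging this number at the start and end of a run shows how much headroom is left before adding more parallel jobs.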
Suggested Approach
1. Use the GitHub API to fetch individual files
Use curl or similar to call the /repos/{org}/{repo}/contents/{file} endpoint with Accept: application/vnd.github.v3.raw.
Requires a GITHUB_TOKEN with read access.
Add retry/backoff logic if scaling up to hundreds of files.
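The retry/backoff logic mentioned above could look roughly like this. It is a minimal sketch, not a definitive implementation: the `with_backoff` helper, its argument order, and the doubling delay are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, retrying with exponential backoff.
# Usage: with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS command [args...]
with_backoff() {
  local max_attempts=$1 delay=$2 attempt=1
  shift 2
  # Keep running the command until it succeeds or attempts run out.
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed, retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$(( delay * 2 ))       # double the wait before the next try
    attempt=$(( attempt + 1 ))
  done
}

# Example (yourorg, repo1 and file1.txt are placeholders):
# with_backoff 5 1 curl -sSL --fail \
#   -H "Authorization: token $GITHUB_TOKEN" \
#   -H "Accept: application/vnd.github.v3.raw" \
#   "https://api.github.com/repos/yourorg/repo1/contents/file1.txt" \
#   -o repo1_file1.txt
```

The `--fail` flag matters here: without it curl exits 0 on an HTTP error and the wrapper never retries. curl's built-in `--retry` option is a simpler alternative when only transient errors need covering.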
2. Parallelize using GNU parallel
Write a small shell script to build a list of curl commands and run them concurrently:
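#!/usr/bin/env bash
# Download selected files from several repositories concurrently via the GitHub contents API.
set -euo pipefail

if ! command -v parallel >/dev/null; then
  echo "Please install GNU parallel" >&2
  exit 10
fi

# Abort early if the token is missing.
GITHUB_TOKEN="${GITHUB_TOKEN:?GITHUB_TOKEN not set}"

# Map each repository to a space-separated list of files (file names must not contain spaces).
declare -A REPOS
REPOS["repo1"]="file1.txt file2.txt"
REPOS["repo2"]="file3.txt"

# Build one curl command per file. --fail makes curl exit non-zero on an HTTP error
# instead of saving the JSON error body as if it were the requested file.
commands=()
for REPO in "${!REPOS[@]}"; do
  for FILE in ${REPOS[$REPO]}; do
    commands+=("curl -sSL --fail -H \"Authorization: token $GITHUB_TOKEN\" \
      -H \"Accept: application/vnd.github.v3.raw\" \
      \"https://api.github.com/repos/yourorg/$REPO/contents/$FILE\" \
      -o \"${REPO}_${FILE}\"")
  done
done

export GITHUB_TOKEN
# Run up to 8 downloads at a time.
printf "%s\n" "${commands[@]}" | parallel -j 8 bash -c '{}'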
3. Run the script from a Jenkins Pipeline
pipeline {
    agent any
    environment {
        GITHUB_TOKEN = credentials('your-token-id')
    }
    stages {
        stage('Fetch Files') {
            steps {
                sh './scripts/my_script.sh'
            }
        }
    }
}
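On the chunking idea from the original question: if the repository list is split across several parallel jobs, each job needs to know which slice is its own. A minimal sketch, assuming each job is handed a chunk size and a zero-based chunk index (the `chunk_of` helper and both parameters are hypothetical; the thread does not say how the jobs would be parameterized):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the repositories that belong to one chunk.
# Usage: chunk_of CHUNK_SIZE CHUNK_INDEX repo [repo...]   (CHUNK_INDEX starts at 0)
chunk_of() {
  local size=$1 index=$2 start n=0 repo
  shift 2
  start=$(( index * size ))
  # A chunk index past the end of the list yields no output.
  [ "$start" -lt "$#" ] || return 0
  # Skip the repositories that belong to earlier chunks.
  shift "$start"
  for repo in "$@"; do
    [ "$n" -lt "$size" ] || break
    printf '%s\n' "$repo"
    n=$(( n + 1 ))
  done
}

# Example: with 200 repositories and a chunk size of 25, the job with
# index 3 handles repositories 76 through 100 of the list:
# chunk_of 25 3 "${ALL_REPOS[@]}"
```

With 200 repositories and chunks of 25, that gives eight jobs, each of which can run the download script over its own slice.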
Optional Enhancements
Transfer results using scp, rsync, or similar after download.
Rate limit handling: add retries and exponential backoff for large-scale jobs.
Sparse-checkout: if the files are grouped in folders, you can use git sparse-checkout instead of the API.
Separate artifact repository: if you control the repos, consider storing these files in a small central repo or as release artifacts.
Use GitHub Actions runners to fetch files close to the source and then send results to Jenkins.
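For the sparse-checkout option, a rough sketch of what a per-repository fetch might look like. It assumes Git 2.25 or newer (where `git sparse-checkout` and `git clone --sparse` are available) and that the wanted files live in a few known directories; the `sparse_fetch` helper name and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: clone only the given directories of a repository.
# Usage: sparse_fetch REPO_URL DEST_DIR dir [dir...]
sparse_fetch() {
  local url=$1 dest=$2
  shift 2
  # --depth 1           fetch only the latest commit
  # --filter=blob:none  ask the server to send file contents on demand
  # --sparse            start with only the top-level files checked out
  git clone --quiet --depth 1 --filter=blob:none --sparse "$url" "$dest"
  # Restrict the working tree to the requested directories.
  git -C "$dest" sparse-checkout set "$@"
}

# Example (yourorg/repo1 and the directory names are placeholders):
# sparse_fetch "https://github.com/yourorg/repo1.git" repo1 config docs
```

Depending on the Git version and sparse-checkout mode, files at the top level of the repository may also be checked out, so treat the result as "these directories at least" rather than "only these files".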
Summary
Fastest option: GitHub API + GNU parallel + Jenkins Pipeline.
Caveats: API rate limits and network latency.
Future optimization: group files into a dedicated repo or use sparse-checkout if possible.
Thanks Bruno Verachten!
This recommendation is in alignment with what we were planning to implement.