@timja and I have been chatting in #jenkins-hosting for a while now about finding a way to get permissions for repos managed by code instead of issuing commands to bots.
I did a quick bit of prototyping last weekend and found that GitHub's GraphQL API makes it very easy to pull down team membership (see getAllTeamsAndMembers.graphql on GitHub). That gives us the existing data, which makes it easy to see who we'd need to invite.
name: "acceptance-test-harness"
github: "jenkinsci/acceptance-test-harness"
paths:
  - "org/Jenkins-ci/acceptance-test-harness"
maintainers: # intentionally renamed from developers to possibly make it easier to adapt between old and new format, may not be needed
  - jenkins_id: "jglick"
    github: "jglick"
  - jenkins_id: "olivergondza"
    github: "ogondza"
  - group: cloudbees-developers # or team maybe
---
name: "cloudbees-developers"
maintainers:
  - jenkins_id: "teilo"
    github: "jtnord"
  # ....
This has the advantage of creating a mapping between Jenkins LDAP IDs and GitHub accounts. It might even be something we can use or map in Keycloak/beta.accounts.jenkins.io.
The downside is currently how to populate that. I think the mapping would have to be done by hand. I think for now I can get away with having one row for jenkins_id, and one for GitHub, and not merge them yet.
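To make the `group` idea concrete, here is a minimal sketch of how a tool might flatten group references into a single Jenkins ID to GitHub login mapping. It assumes the two YAML documents above have already been parsed into plain dicts keyed by `name`; the `PERMISSIONS` literal and the `expand_maintainers` helper are hypothetical names for illustration, not part of any existing tooling.

```python
# Hypothetical in-memory form of the two YAML documents sketched above.
PERMISSIONS = {
    "acceptance-test-harness": {
        "github": "jenkinsci/acceptance-test-harness",
        "maintainers": [
            {"jenkins_id": "jglick", "github": "jglick"},
            {"jenkins_id": "olivergondza", "github": "ogondza"},
            {"group": "cloudbees-developers"},
        ],
    },
    "cloudbees-developers": {
        "maintainers": [{"jenkins_id": "teilo", "github": "jtnord"}],
    },
}


def expand_maintainers(name, permissions, _seen=None):
    """Return {jenkins_id: github_login} for `name`, following `group` entries."""
    seen = _seen if _seen is not None else set()
    if name in seen:  # guard against groups that reference each other
        return {}
    seen.add(name)

    mapping = {}
    for entry in permissions[name]["maintainers"]:
        if "group" in entry:
            # Recurse into the referenced group and merge its members.
            mapping.update(expand_maintainers(entry["group"], permissions, seen))
        else:
            mapping[entry["jenkins_id"]] = entry["github"]
    return mapping


print(expand_maintainers("acceptance-test-harness", PERMISSIONS))
```

The cycle guard is only there so that two groups referencing each other cannot recurse forever; whether nested groups should be allowed at all is an open design question.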
developers:
  - jenkins_id: "jglick"
    release: false # don't give them publish permissions, just commit permissions.
                   # Not really needed when things are split up.
  - github: "jglick"
  - jenkins_id: "olivergondza"
  - github: "ogondza"
  - team: cloudbees-developers
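One possible reading of the split layout above, sketched in Python. The interpretation is an assumption on my part: a `github` row grants commit access, a `jenkins_id` row grants release (publish) access unless it carries `release: false`, and `team` rows are passed through untouched. The `split_permissions` helper is hypothetical.

```python
def split_permissions(developers):
    """Split a flat `developers` list into commit, release, and team grants."""
    result = {"committers": [], "releasers": [], "teams": []}
    for entry in developers:
        if "github" in entry:
            # GitHub login: commit access on the repository.
            result["committers"].append(entry["github"])
        elif "jenkins_id" in entry:
            # Jenkins ID: release access unless explicitly switched off.
            if entry.get("release", True):
                result["releasers"].append(entry["jenkins_id"])
        elif "team" in entry:
            result["teams"].append(entry["team"])
    return result


developers = [
    {"jenkins_id": "jglick", "release": False},
    {"github": "jglick"},
    {"jenkins_id": "olivergondza"},
    {"github": "ogondza"},
    {"team": "cloudbees-developers"},
]
print(split_permissions(developers))
```

With rows kept separate like this, `release: false` is indeed redundant: leaving out the `jenkins_id` row has the same effect.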
So while I prototype it a bit, I figured I would ask others if they had any ideas for layouts or other feedback.
which doesn’t actually read the component name, just for keeping the data or just to delete it.
Edit: Looks like the # componentname convention was only for the initial import, so it's probably not a big deal if it gets deleted. It just depends on how I want to populate existing data.
The RPU data structure is trash. It’s (justifiably, for historical reasons) oriented at Maven artifacts, but we keep adding stuff that’s more applicable to GitHub repos. Multi-module permissions are already error-prone, and JEP-229 cannot handle multi-module projects at all, IIUC. This needs to be overhauled to support more repo-focused content, like this proposal.
GitHub permission management is far from trivial, and the way we’ve set them up (and let maintainers change them) is a giant mess: Team names do not always match repo names. Teams grant access to additional (or just different) repos than their name indicates. Tons of maintainers use “external collaborators” to grant access.
This won’t get external collaborators, of which we have 500 (or ~20% of contributors). I’ve struggled with the shitty GraphQL API for a long time to get something that’s mostly working, but that’s only useful for reporting, not for assignment, and it takes forever to scan the entire org with about 4.6M results.
So my quick import script runs in 9 seconds, but I can only see the publicly visible teams. I’m hoping @timja or another GitHub org admin can run it and see what it says for the entire org. I’m leaning towards just doing teams because they don’t allow outside contributors, and it’s easier to get data about them (org admins show up as a contributor on every repo, but not on every team).
import os
import json

import requests
from ruamel.yaml import YAML

# Map "org/repo" -> the permission file that declares it and its parsed YAML
repositories = {}

yaml = YAML(typ='rt')  # default, if not specified, is 'rt' (round-trip)
yaml.preserve_quotes = True

for entry in os.scandir("permissions/"):
    if not entry.is_file():
        continue
    if entry.path.endswith((".yml", ".yaml")):
        with open(entry.path) as stream:
            permission = yaml.load(stream)
        # Skip empty files and files that do not declare a GitHub repo
        if permission and "github" in permission:
            repositories[permission['github']] = {
                'filename': entry.path,
                'yaml': permission
            }
# Provide a GraphQL query
query = """
query getTeamsAndMembers($after: String) {
  organization(login: "jenkinsci") {
    teams(first: 100, after: $after, query: "Developers") {
      edges {
        node {
          name
          combinedSlug
          privacy
          invitations(first: 100) {
            nodes {
              invitee {
                login
              }
            }
          }
          members(first: 100) {
            edges {
              node {
                login
              }
            }
          }
        }
      }
      pageInfo {
        endCursor
        hasNextPage
      }
    }
  }
}
"""
after = None
while True:
    # Fetch one page of teams
    r = requests.post(
        'https://api.github.com/graphql',
        json={'query': query, 'variables': {'after': after}},
        headers={'Authorization': f"Bearer {os.environ['GITHUB_PASSWORD']}"}
    )
    r.raise_for_status()
    teams = json.loads(r.text)['data']['organization']['teams']

    for edge in teams['edges']:
        # combinedSlug looks like "jenkinsci/<repo>-developers"
        repo = edge['node']['combinedSlug'].replace('-developers', '')
        if repo not in repositories:
            print("Skipping", repo)
            continue

        filename = repositories[repo]['filename']
        parsed = repositories[repo]['yaml']

        if 'githubteam' not in parsed:
            parsed['githubteam'] = {}
        parsed['githubteam']['visible'] = edge['node']['privacy'] == 'VISIBLE'
        parsed['githubteam']['members'] = []
        # The query asks for invitations as `nodes`, not `edges`
        for invitation in edge['node']['invitations']['nodes']:
            if invitation['invitee']:  # skip invitations without a user
                parsed['githubteam']['members'].append(
                    invitation['invitee']['login'])
        for memberEdge in edge['node']['members']['edges']:
            parsed['githubteam']['members'].append(
                memberEdge['node']['login'])

        with open(filename, 'w', encoding='utf8') as outfile:
            yaml.dump(parsed, outfile)

    # Check for more pages only after processing this one,
    # otherwise the last page would be skipped
    if not teams['pageInfo']['hasNextPage']:
        break
    after = teams['pageInfo']['endCursor']
The next step for me is to figure out a way to validate any new github logins mentioned, without looking up every user in every team every run, which I think is going to be expensive.
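One cheap way to do that might be to keep a small cache of logins that have already been checked, and only look up logins that are new since the last run. A minimal sketch, assuming a JSON cache file and a caller-supplied `exists` callable that answers whether a login is real (both are hypothetical; nothing like this exists in the repo today):

```python
import json
from pathlib import Path


def validate_new_logins(mentioned, cache_path, exists):
    """Check only logins not seen on a previous run; return the invalid ones.

    `mentioned` is an iterable of GitHub logins found in the permission files,
    `cache_path` is a pathlib.Path to a JSON list of already-validated logins,
    and `exists` is a callable taking a login and returning True or False.
    """
    cache = set(json.loads(cache_path.read_text())) if cache_path.exists() else set()

    # GitHub logins are case-insensitive, so normalise before comparing.
    wanted = {login.lower() for login in mentioned}

    invalid = set()
    for login in sorted(wanted - cache):
        if exists(login):
            cache.add(login)  # remember it so later runs skip the lookup
        else:
            invalid.add(login)

    cache_path.write_text(json.dumps(sorted(cache)))
    return invalid
```

In practice `exists` could wrap a single request per new login, so a typical run that adds one maintainer costs one lookup instead of rescanning every team. Invalid logins are deliberately not cached, so a typo keeps failing until it is fixed.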
I wholeheartedly support this work, thanks for pushing this, Gavin.
I think we should have a separate location to store the mapping github <=> Jenkins. In the current proposal IIUC we’d put both in every single repo’s declarations, which seems like a gigantic duplication (and a recipe for mistakes)?
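As a rough illustration of that separation: per-repo declarations would list only Jenkins IDs, and a single shared table would translate them to GitHub logins. The `ACCOUNTS` table and `resolve_github_logins` helper below are hypothetical names, and the data is just the example accounts already mentioned in this thread.

```python
# Hypothetical central mapping, stored once instead of in every repo file.
ACCOUNTS = {
    "jglick": "jglick",
    "olivergondza": "ogondza",
    "teilo": "jtnord",
}


def resolve_github_logins(jenkins_ids, accounts):
    """Translate Jenkins IDs to GitHub logins; report IDs with no mapping."""
    resolved, missing = [], []
    for jenkins_id in jenkins_ids:
        if jenkins_id in accounts:
            resolved.append(accounts[jenkins_id])
        else:
            missing.append(jenkins_id)
    return resolved, missing


logins, unmapped = resolve_github_logins(["jglick", "olivergondza", "nobody"], ACCOUNTS)
print(logins, unmapped)
```

Reporting unmapped IDs separately, instead of failing, would let a migration proceed while the table is still being filled in by hand.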
While I think the mapping is useful, I don’t think we can assume a 1:1 mapping of committers and releasers.
But there’s no reason the same team logic can’t be applied for sharing users between plugins.
My vote is keep it simple. Get it to work, then refactor.
Edit:
For example, anything with cd: true might want no releasers but a bot user.
I think beta.accounts.jenkins.io, if it ever takes over (or even earlier), could handle the GitHub mapping via OAuth. Then we could use that and just have two lists of Jenkins IDs.
Ideally this would get an entire design with objectives and explanations etc. (nothing huge, just something like my old Gist on issues metadata). It is really difficult for me to see what problems you’re addressing, how this will interact with existing stuff like the bot, what to do about migration of current data, and how you plan to address certain edge cases (collaborators!).