[GSoC 2023 Project Idea] exponential backoff and jitter for agent reconnections

(Copied from a private channel conversation with @basil)

Wanted to get people’s feedback on a potential GSoC project idea: exponential backoff and jitter for agent reconnections.

When I was a Jenkins administrator for a medium-sized company, I faced a problem that many others seem to face: when restarting a large controller there was a thundering herd of agent reconnections. Remoting and Swarm (a Remoting wrapper) have (separate) retry implementations. Swarm’s supports exponential backoff but not jitter. Remoting’s supports neither. Users have requested both in the form of bug reports and pull requests, but initial pull requests need a lot of improvement before they can be merged. Additionally, it would be desirable to unify the two implementations rather than to continue to maintain separate logic.
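For anyone unfamiliar with the terms, here is a minimal sketch of exponential backoff with "full jitter", one of the variants discussed on the AWS Architecture Blog. It is illustrative only: the class name, method names, and the 1 s base / 60 s cap are hypothetical and are not taken from Remoting or Swarm. The idea is that each agent waits a random time between zero and an exponentially growing ceiling, so reconnection attempts spread out instead of arriving at the controller all at once.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch; not the retry code from Remoting or Swarm.
final class BackoffSketch {

    /** Upper bound of the delay window: min(cap, base * 2^attempt). */
    static long ceilingMillis(long baseMillis, long capMillis, int attempt) {
        long delay = Math.min(baseMillis, capMillis);
        for (int i = 0; i < attempt && delay < capMillis; i++) {
            // Double the delay, clamping to the cap to avoid overflow.
            delay = (delay > capMillis / 2) ? capMillis : delay * 2;
        }
        return delay;
    }

    /** Full jitter: a uniformly random delay in [0, ceiling]. */
    static long fullJitterMillis(long baseMillis, long capMillis, int attempt) {
        long ceiling = ceilingMillis(baseMillis, capMillis, attempt);
        return ThreadLocalRandom.current().nextLong(ceiling + 1);
    }

    public static void main(String[] args) {
        // Assumed values for illustration: 1 s base delay, 60 s cap.
        for (int attempt = 0; attempt < 8; attempt++) {
            System.out.printf("attempt %d: ceiling=%d ms, sleep=%d ms%n",
                    attempt,
                    ceilingMillis(1_000, 60_000, attempt),
                    fullJitterMillis(1_000, 60_000, attempt));
        }
    }
}
```

Without the random step, every agent that lost its connection at the same moment would retry on the same schedule, which is the thundering herd described above. Backoff alone only makes the bursts less frequent.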

I feel this would be a good intern project for the following reasons:

  • Intellectually stimulating and therefore a high “brag factor”: who could resist spending a summer working on a topic that the AWS Architecture Blog discusses?
  • High demand: Companies like Netflix need this feature and would likely be willing to test it.
  • Feasible within three months: while Remoting is a complex subsystem, retrying logic is at the outer edges of the subsystem. One does not need to concern oneself with what happens inside of the guts of Remoting, just how to restart the connection process at a high level.
  • Stretchable and shrinkable scope: If the intern is struggling, scope can be reduced to unification of the existing implementations, or merely implementing exponential backoff rather than exponential backoff and jitter. If the intern is doing better than expected, the scope can be expanded to writing a stress testing framework for the end result.
  • Not fatal if the project fails: Even if all the intern manages to complete is a unification of the existing two implementations, that would be a net benefit. If the project fails, no harm: the status quo would simply remain.

How do people feel about this project idea? Is it desirable in general? What about as an intern project specifically? If the answer to both of these questions is “yes” I would be happy to mentor it.

Sounds great in any case, be it for an intern or not.