Hi everyone,
I am researching the “AI Chatbot to Guide User Workflow” project for GSoC 2026. My vision is to build a Local-First Agent that doesn’t just answer documentation questions but actively assists users by diagnosing build failures and suggesting configuration fixes.
To ensure this is viable without cloud APIs (OpenAI/Claude), I have drafted an architecture based on a Java Plugin + Python Sidecar pattern running a quantized local model (e.g., Llama-3-8B or Phi-3).
Before finalizing my proposal, I would love feedback from the mentors on a few architectural assumptions:
1. Deployment & Architecture
- Question: To run the Python-based AI stack (`llama-cpp-python`), would you prefer a Managed Process approach (where the plugin manages a local `venv` and subprocess) or a Docker Sidecar approach?
- My Preference: Managed Process, as it allows the plugin to work on instances where Docker might not be available or the Docker socket is restricted.
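To make the Managed Process option concrete, here is a rough Java sketch of how the plugin could bootstrap the `venv` and supervise the sidecar. All names (`SidecarLauncher`, the directory layout, the port) are my own placeholders, not an existing Jenkins API:

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

public class SidecarLauncher {
    private final File workDir;
    private Process sidecar;

    public SidecarLauncher(File workDir) { this.workDir = workDir; }

    // Paths inside the managed venv (POSIX layout; Windows would use Scripts\).
    File venvDir()      { return new File(workDir, "venv"); }
    String pipPath()    { return new File(venvDir(), "bin/pip").getPath(); }
    String pythonPath() { return new File(venvDir(), "bin/python").getPath(); }

    /** Create the venv once and install the sidecar's dependencies into it. */
    public void bootstrap() throws IOException, InterruptedException {
        if (!venvDir().exists()) {
            run(List.of("python3", "-m", "venv", venvDir().getAbsolutePath()));
            run(List.of(pipPath(), "install", "llama-cpp-python[server]"));
        }
    }

    /** Launch the OpenAI-compatible local server, bound to loopback only. */
    public void start(File modelFile) throws IOException {
        sidecar = new ProcessBuilder(List.of(
                pythonPath(), "-m", "llama_cpp.server",
                "--model", modelFile.getAbsolutePath(),
                "--host", "127.0.0.1", "--port", "8000"))
            .redirectErrorStream(true)
            .start();
    }

    /** Terminate the sidecar when the plugin is disabled or Jenkins shuts down. */
    public void stop() {
        if (sidecar != null) sidecar.destroy();
    }

    private static void run(List<String> cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        if (p.waitFor() != 0) throw new IOException("command failed: " + cmd);
    }
}
```

Binding the server to 127.0.0.1 keeps the AI endpoint off the network, which seems safer on a shared controller.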
2. Hardware Constraints
- Question: What is the “Minimum Viable Hardware” I should target for a standard Jenkins Controller?
- Assumption: I am designing for instances with at least 4 GB of spare RAM available for the AI. If the detected free RAM is lower, I plan to disable the feature or fall back to a tiny model (e.g., Qwen-1.5B).
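This is roughly the tiering logic I have in mind. The thresholds, the 2 GB cut-off for the tiny tier, and the `ModelSelector` name are all assumptions on my part:

```java
import java.lang.management.ManagementFactory;

public class ModelSelector {
    public enum Tier { DISABLED, TINY, FULL }

    static final long GB = 1024L * 1024 * 1024;

    /** Pick a model tier from the spare physical RAM, in bytes. */
    public static Tier select(long freeBytes) {
        if (freeBytes >= 4 * GB) return Tier.FULL; // e.g. quantized Llama-3-8B / Phi-3
        if (freeBytes >= 2 * GB) return Tier.TINY; // e.g. Qwen-1.5B
        return Tier.DISABLED;                      // too little RAM: feature off
    }

    /** Read free physical memory via the HotSpot OS MXBean. */
    @SuppressWarnings("deprecation")
    public static long detectFreeBytes() {
        com.sun.management.OperatingSystemMXBean os =
                (com.sun.management.OperatingSystemMXBean)
                        ManagementFactory.getOperatingSystemMXBean();
        // getFreePhysicalMemorySize() is deprecated since JDK 14 in favour of
        // getFreeMemorySize(), but the older name compiles on more JDKs.
        return os.getFreePhysicalMemorySize();
    }
}
```

One open point: free physical RAM fluctuates during builds, so the check probably has to run at sidecar start rather than once at plugin load.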
3. Agent Autonomy Level
- Question: For “Workflow Guidance,” do we want the agent to be purely advisory (Read-Only), or can it propose actions (e.g., “Trigger build with clean parameters”)?
- Proposal: “Human-in-the-Loop” execution. The agent proposes a tool call, but the UI requires explicit user confirmation to execute it.
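The confirmation gate could be as small as this sketch: the model may only *propose* a tool call, and nothing executes until the UI flips the flag. Class and method names are illustrative, not an existing API:

```java
import java.util.Map;

public class ProposedAction {
    public final String tool;               // e.g. "triggerBuild"
    public final Map<String, String> args;  // e.g. {"clean": "true"}
    private boolean confirmed = false;
    private boolean executed = false;

    public ProposedAction(String tool, Map<String, String> args) {
        this.tool = tool;
        this.args = args;
    }

    /** Called only from the UI after the user explicitly approves the action. */
    public void confirm() { confirmed = true; }

    /** Refuses to run unless approved, and runs at most once. */
    public boolean execute() {
        if (!confirmed || executed) return false;
        executed = true;
        // ...dispatch to the real tool implementation here...
        return true;
    }
}
```

Making the gate single-shot also prevents a stale proposal from being replayed after the build state has changed.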
4. Binary Distribution
- Question: Since LLM models are large (multiple GBs), I plan to make the plugin a “Loader” that downloads the GGUF model from HuggingFace on first launch, rather than bundling it in the `.hpi` file. Is this acceptable?
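Sketch of the loader flow I am picturing: download to a temp file, verify a SHA-256 pinned in plugin config, then atomically move it into place. The URL, cache path, and digest would come from configuration; everything here is illustrative:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;

public class ModelLoader {
    /** Ensure the GGUF model exists locally and matches the pinned digest. */
    public static Path ensureModel(Path cacheDir, String url, String expectedSha256)
            throws Exception {
        Path model = cacheDir.resolve("model.gguf");
        if (Files.exists(model) && sha256(model).equals(expectedSha256)) {
            return model; // already downloaded and intact
        }
        Files.createDirectories(cacheDir);
        Path tmp = cacheDir.resolve("model.gguf.part");
        try (InputStream in = URI.create(url).toURL().openStream()) {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        if (!sha256(tmp).equals(expectedSha256)) {
            Files.delete(tmp);
            throw new IOException("checksum mismatch, refusing to load model");
        }
        Files.move(tmp, model, StandardCopyOption.ATOMIC_MOVE);
        return model;
    }

    static String sha256(Path file) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        md.update(Files.readAllBytes(file)); // fine for a sketch; stream in production
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) hex.append(String.format("%02x", b));
        return hex.toString();
    }
}
```

Pinning the digest matters here: a controller silently pulling unverified GBs from HuggingFace at startup seems like something mentors would (rightly) object to.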
Any guidance would be greatly appreciated!