[GSoC 2026] Contribution & Strategy: Srinivasan V | Hardening RAG Pipelines & Agentic Workflows

Hi Jenkins Community,

I’m Srinivasan, a 2nd-year CSE student focused on the intersection of RAG (Retrieval-Augmented Generation) and autonomous agentic systems. Over the past week, I audited the resources-ai-chatbot-plugin repository to identify bottlenecks in developer onboarding and in the data ingestion pipeline.

Current Contributions: I believe in “Code First, Proposal Second.” I’ve already submitted PR #283, which passed Jenkins CI. It fixes issues in the Windows development environment and hardens the FAISS indexing logic:

  • Schema Resiliency: Patched extract_chunk_docs.py to handle nested dictionary structures from the scraper.

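  • Deterministic Encoding: Enforced explicit UTF-8 encoding on file reads and writes to prevent UnicodeDecodeError across platforms (on Windows, Python’s default file encoding is often a legacy code page such as cp1252).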

  • Dynamic Fallback: Added a check that falls back to IndexFlatL2 when the number of training vectors is below the IVF index’s nlist (cluster count), preventing crashes when the index is built on small development datasets.
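The fallback in the last bullet can be sketched roughly as follows. This is a minimal illustration of the idea, not the code from PR #283: `use_flat_fallback`, `build_index`, and the `nlist=100` default are hypothetical, and it assumes an IndexIVFFlat index as the normal (non-fallback) path.

```python
def use_flat_fallback(num_vectors: int, nlist: int) -> bool:
    """Return True when there are too few vectors to train an IVF index.

    FAISS k-means refuses to train when there are fewer training points
    than clusters, so an IVF index with `nlist` cells needs >= nlist vectors.
    """
    return num_vectors < nlist


def build_index(embeddings, nlist: int = 100):
    """Build an IVF index, falling back to exact IndexFlatL2 on small data.

    Hypothetical helper; `embeddings` is assumed to be a 2-D array-like
    of shape (num_vectors, dim).
    """
    # Imported inside the function so the decision helper above can be
    # used (and tested) without FAISS or NumPy installed.
    import faiss
    import numpy as np

    # FAISS expects a C-contiguous float32 matrix.
    x = np.ascontiguousarray(embeddings, dtype="float32")
    n, d = x.shape
    if use_flat_fallback(n, nlist):
        index = faiss.IndexFlatL2(d)  # exact brute-force search, no training
    else:
        quantizer = faiss.IndexFlatL2(d)  # coarse quantizer for the IVF cells
        index = faiss.IndexIVFFlat(quantizer, d, nlist)
        index.train(x)  # k-means over the vectors to learn nlist centroids
    index.add(x)
    return index
```

An exact IndexFlatL2 scans every vector per query, which is negligible at development-scale corpus sizes, so the fallback trades nothing meaningful for robustness there.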

My Vision for GSoC 2026: I aim to move the chatbot from a “Simple Retriever” to an “Autonomous Jenkins Guide.” My proposal focuses on:

  1. Hybrid Retrieval: Merging BM25 lexical search with FAISS dense-vector search so that exact Jenkins syntax (e.g., Pipeline DSL keywords) is matched by term, not only by semantic similarity.

  2. Contextual Re-ranking: Adding a Cross-Encoder step that rescores the retrieved passages against the query, so the most relevant documentation is ranked first before generation.

  3. Agentic Tool-Use: Allowing the chatbot to validate user-provided Jenkinsfiles against the official documentation in real time.
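Item 1 does not specify how the BM25 and FAISS results would be merged. A minimal, dependency-free sketch, assuming Reciprocal Rank Fusion (RRF) with the conventional constant k = 60 as the merge strategy and a dense ranking already obtained from a FAISS search; `bm25_rank` and `reciprocal_rank_fusion` are hypothetical names, and whitespace tokenization stands in for a real tokenizer.

```python
import math
from collections import Counter


def bm25_rank(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[int]:
    """Rank document indices by Okapi BM25 score, highest first."""
    tokenized = [d.lower().split() for d in docs]  # naive whitespace tokenizer
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in set(query.lower().split()):
            if term not in tf:
                continue
            # Lucene-style IDF (the +1 keeps it non-negative).
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term-frequency saturation (k1) and length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Fuse ranked lists of doc ids: score(d) = sum over lists of 1 / (k + rank)."""
    fused: dict[int, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


docs = [
    "use the sh step to run shell commands",
    "withCredentials binds credentials to variables",
    "configure agents and executors",
]
lexical = bm25_rank("withCredentials", docs)  # exact keyword match ranks doc 1 first
dense = [2, 1, 0]  # hypothetical ranking from a FAISS similarity search
print(reciprocal_rank_fusion([lexical, dense]))
```

RRF only uses ranks, so it sidesteps the fact that BM25 scores and FAISS L2 distances live on incompatible scales. The fused top-k list would then be the candidate set for the Cross-Encoder re-ranking in item 2.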

I’m looking forward to refining this roadmap with the mentors!

GitHub PR: #283, “fix: resolve windows encoding and harden FAISS indexing logic” by srinivasan-ai-dev (jenkinsci/resources-ai-chatbot-plugin)