Hey Devs,
So I've been digging into the chatbot core lately, working on the retrieve_context() blocking issue (#333) and the plugin name lookup performance issue (#352), and I ran into something that's been bugging me.
The codebase has a weird mix where some endpoints are sync and some are async. The normal chat endpoint (POST /sessions/{id}/message) is fully sync, but the file upload endpoint right below it wraps the exact same get_chatbot_reply() call in asyncio.to_thread(). The websocket streaming path is async too, but it was calling the sync retrieval function internally until #333.
So the same operation (a user asking the chatbot a question) can either block a FastAPI worker or not, depending on which endpoint it hits. Right now it's probably fine because traffic is low, but if this gets deployed with real Jenkins instances handling multiple users, it could become a bottleneck pretty quickly.
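To make the difference concrete, here's a minimal stdlib-only sketch (no FastAPI; blocking_reply() is a hypothetical stand-in for get_chatbot_reply(), not the real function) comparing a blocking call made directly inside a coroutine with the same call offloaded via asyncio.to_thread():

```python
import asyncio
import time


def blocking_reply(prompt: str) -> str:
    # Hypothetical stand-in for get_chatbot_reply(): simulates ~0.1 s
    # of blocking work (retrieval + model call).
    time.sleep(0.1)
    return f"reply:{prompt}"


async def handle_blocking(prompt: str) -> str:
    # Blocking call made directly on the event loop thread: while it
    # runs, no other coroutine can make progress.
    return blocking_reply(prompt)


async def handle_offloaded(prompt: str) -> str:
    # Same call pushed to a worker thread, so concurrent requests
    # overlap instead of queueing behind each other.
    return await asyncio.to_thread(blocking_reply, prompt)


async def timed(handler, n: int) -> float:
    # Fire n "requests" concurrently and measure wall-clock time.
    start = time.perf_counter()
    await asyncio.gather(*(handler(f"q{i}") for i in range(n)))
    return time.perf_counter() - start


async def main() -> tuple[float, float]:
    return await timed(handle_blocking, 5), await timed(handle_offloaded, 5)


serial, overlapped = asyncio.run(main())
# serial comes out around 5 x 0.1 s (calls run back to back);
# overlapped stays near 0.1 s (calls run in parallel threads).
```

Five concurrent "users" take roughly five times longer on the blocking path, which is the worker-starvation pattern I'm worried about.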
I wanted to ask two things:
- @berviantoleo @krisstern @cnu1812, do you think it's worth standardizing this now (making everything consistently async), or should we keep patching individual blocking calls as they come up? I don't want to open a big refactor PR if the plan is to keep things simple.
- This one I genuinely don't understand yet: on the Jenkins side, when the plugin makes a request to the chatbot backend, does it go through Jenkins's own HTTP client? Does Jenkins have some kind of request queue or thread pool for plugin HTTP calls, or does each user's chat request just fire off independently? I'm asking because if Jenkins itself is throttling on its end, fixing async on the Python side alone might not matter as much.
thanks,
sharma-sugurthi