Problem (one or two sentences)
RouterProvider subclasses (OpencodeGoHandler, VercelAiGatewayHandler, LiteLLMHandler) each re-implement both the OpenAI streaming chunk processing loop and completePrompt inline, duplicating logic that already exists — in a more complete form — in BaseOpenAiCompatibleProvider. Each copy is a subset of the original and diverges in subtle ways.
Context (who is affected and when)
Affects contributors adding or maintaining dynamic-model-list providers (those that extend RouterProvider). The two class hierarchies — RouterProvider for dynamic model fetching, BaseOpenAiCompatibleProvider for static model lists — have no common base for stream processing, so every new RouterProvider subclass is forced to reinvent it.
createMessage divergences
| Feature | BaseOpenAiCompatibleProvider | OpencodeGoHandler | VercelAiGatewayHandler |
| --- | --- | --- | --- |
| <think> tag support via TagMatcher | ✅ | ❌ | ❌ |
| Checks both reasoning_content and reasoning keys | ✅ | ❌ (only reasoning_content) | ❌ |
| tool_call_end on finish_reason === "tool_calls" | ✅ | ❌ | ❌ |
| Cost calculation in usage chunk | ✅ | ❌ | ❌ |
| handleOpenAIError wrapper | ✅ | ❌ | ❌ |
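To make one of these divergences concrete, here is a minimal sketch of the "checks both keys" behaviour. The `ReasoningDelta` type and the `extractReasoning` name are hypothetical and exist only for this illustration; the actual code in BaseOpenAiCompatibleProvider may be structured differently.

```typescript
// Hypothetical minimal shape of a streaming delta; real deltas carry more fields.
interface ReasoningDelta {
  content?: string | null;
  reasoning_content?: string | null;
  reasoning?: string | null;
}

// Returns the reasoning text from whichever key the upstream provider used,
// trying reasoning_content first and falling back to reasoning.
function extractReasoning(delta: ReasoningDelta): string | undefined {
  for (const key of ["reasoning_content", "reasoning"] as const) {
    const value = delta[key];
    if (typeof value === "string" && value.length > 0) {
      return value;
    }
  }
  return undefined;
}
```

A copy that only reads reasoning_content would return nothing for a delta shaped like `{ reasoning: "..." }`, which is the kind of silent gap the table describes.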
completePrompt divergences
All three subclasses also re-implement completePrompt with minor variations. LiteLLMHandler has a legitimate divergence (max_tokens vs max_completion_tokens depending on GPT-5 model detection) that would need to be preserved in any extraction.
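The LiteLLM branching could be isolated as a small pure function so that a shared completePrompt can accept it as input. This is a sketch under two assumptions: that GPT-5 models are the ones that take max_completion_tokens, and that a simple pattern match on the model ID is good enough for illustration. `looksLikeGpt5` and `tokenLimitParam` are hypothetical names, and the real detection rule in LiteLLMHandler may differ.

```typescript
type TokenLimitParam = { max_tokens: number } | { max_completion_tokens: number };

// Hypothetical detection rule for illustration only.
function looksLikeGpt5(modelId: string): boolean {
  return /gpt-?5/i.test(modelId);
}

// Picks the request field that carries the output token limit.
// Assumption: GPT-5 models use max_completion_tokens, all others use max_tokens.
function tokenLimitParam(
  modelId: string,
  maxTokens: number,
  isGpt5: (id: string) => boolean = looksLikeGpt5,
): TokenLimitParam {
  return isGpt5(modelId)
    ? { max_completion_tokens: maxTokens }
    : { max_tokens: maxTokens };
}
```

Keeping the predicate injectable means the existing detection logic can be passed in unchanged during the extraction.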
Desired behavior (conceptual, not technical)
The chunk-processing and completePrompt logic each live in one place. All OpenAI-compatible providers — whether they use static or dynamic model lists — process streaming chunks and non-streaming completions consistently.
Constraints / preferences
- RouterProvider subclasses need fetchModel() (dynamic model loading), which BaseOpenAiCompatibleProvider doesn't have, so a simple inheritance change won't work.
- The solution should not require duplicating model-fetching logic into BaseOpenAiCompatibleProvider.
- LiteLLMHandler's GPT-5 max_tokens/max_completion_tokens branching must be preserved.
- Behaviour should be unchanged for all existing providers.
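One way to satisfy these constraints together is a free function that either class hierarchy can call, with a hook for provider-specific request fields. The sketch below is illustrative only: `CompletionClient`, `CompletePromptOptions`, `completePromptShared`, and the `extraParams` hook are hypothetical names, and the client interface is a local stand-in rather than the real SDK type.

```typescript
// Minimal client surface the helper needs (a local stand-in, not the SDK type).
interface CompletionClient {
  create(params: Record<string, unknown>): Promise<{
    choices: Array<{ message?: { content?: string | null } }>;
  }>;
}

interface CompletePromptOptions {
  modelId: string;
  prompt: string;
  // Hypothetical hook: lets a provider add or override request fields,
  // e.g. LiteLLM choosing between max_tokens and max_completion_tokens.
  extraParams?: Record<string, unknown>;
}

// Sends a single-message, non-streaming request and returns the text of the
// first choice, or an empty string when the response has no content.
async function completePromptShared(
  client: CompletionClient,
  options: CompletePromptOptions,
): Promise<string> {
  const response = await client.create({
    model: options.modelId,
    messages: [{ role: "user", content: options.prompt }],
    ...options.extraParams,
  });
  return response.choices[0]?.message?.content ?? "";
}
```

Because the helper takes the model ID as an argument, a RouterProvider subclass can still call fetchModel() first and pass the result in, and no model-fetching logic has to move.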
Proposed approach
Extract the streaming loop and completePrompt from BaseOpenAiCompatibleProvider into standalone utility functions, e.g.:
    // src/api/providers/utils/openai-stream.ts
    export async function* processOpenAIStream(
        stream: AsyncIterable<OpenAI.Chat.ChatCompletionChunk>,
        modelInfo?: ModelInfo,
    ): ApiStream { ... }
BaseOpenAiCompatibleProvider and all RouterProvider subclasses then delegate to these utilities. The three affected files are:
- src/api/providers/opencode-go.ts
- src/api/providers/vercel-ai-gateway.ts
- src/api/providers/lite-llm.ts
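As a rough illustration of the delegation, the following self-contained sketch processes a chunk stream in one place and has a handler forward to it. `ChunkLike` and `StreamEvent` are simplified local stand-ins for the SDK chunk type and ApiStream, and `processOpenAIStreamSketch` and `ExampleRouterHandler` are hypothetical names. It covers only text, reasoning, and token counts; the real extraction would also need <think> tags, tool calls, cost calculation, and error wrapping.

```typescript
// Simplified stand-in for the SDK's streaming chunk type (assumption).
interface ChunkLike {
  choices: Array<{
    delta?: {
      content?: string | null;
      reasoning_content?: string | null;
      reasoning?: string | null;
    };
    finish_reason?: string | null;
  }>;
  usage?: { prompt_tokens?: number; completion_tokens?: number } | null;
}

// Simplified stand-in for the events carried by ApiStream (assumption).
type StreamEvent =
  | { type: "text"; text: string }
  | { type: "reasoning"; text: string }
  | { type: "usage"; inputTokens: number; outputTokens: number };

// Single place where chunks are turned into events.
async function* processOpenAIStreamSketch(
  stream: AsyncIterable<ChunkLike>,
): AsyncGenerator<StreamEvent> {
  for await (const chunk of stream) {
    // Usage-only chunks may arrive with an empty choices array.
    const delta = chunk.choices[0]?.delta;
    if (delta?.content) {
      yield { type: "text", text: delta.content };
    }
    const reasoning = delta?.reasoning_content || delta?.reasoning;
    if (reasoning) {
      yield { type: "reasoning", text: reasoning };
    }
    if (chunk.usage) {
      yield {
        type: "usage",
        inputTokens: chunk.usage.prompt_tokens ?? 0,
        outputTokens: chunk.usage.completion_tokens ?? 0,
      };
    }
  }
}

// A handler in either hierarchy keeps its own setup (model fetching,
// prompt caching, request building) and delegates only the loop.
class ExampleRouterHandler {
  async *createMessage(stream: AsyncIterable<ChunkLike>): AsyncGenerator<StreamEvent> {
    yield* processOpenAIStreamSketch(stream);
  }
}
```

Provider-specific steps that run before the loop stay in the handler, which keeps the shared function free of per-provider branches.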
Trade-offs / risks
- Small risk of behavioural change if the extraction misses any provider-specific nuance (e.g. Vercel's prompt-caching logic that runs before the loop, LiteLLM's GPT-5 token parameter branching).
- Best done with the existing provider test suites as a safety net — all three affected providers have unit tests covering streaming output.