[ENHANCEMENT] Extract shared OpenAI-compatible chunk processing from RouterProvider subclasses #338

@edelauna

Problem (one or two sentences)

RouterProvider subclasses (OpencodeGoHandler, VercelAiGatewayHandler, LiteLLMHandler) each re-implement both the OpenAI streaming chunk processing loop and completePrompt inline, duplicating logic that already exists — in a more complete form — in BaseOpenAiCompatibleProvider. Each copy is a subset of the original and diverges in subtle ways.

Context (who is affected and when)

Affects contributors adding or maintaining dynamic-model-list providers (those that extend RouterProvider). The two class hierarchies — RouterProvider for dynamic model fetching, BaseOpenAiCompatibleProvider for static model lists — have no common base for stream processing, so every new RouterProvider subclass is forced to reinvent it.

createMessage divergences

Feature | BaseOpenAiCompatibleProvider | OpencodeGoHandler | VercelAiGatewayHandler
<think> tag support via TagMatcher
Checks both reasoning_content and reasoning keys ❌ (only reasoning_content)
tool_call_end on finish_reason === "tool_calls"
Cost calculation in usage chunk
handleOpenAIError wrapper
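
To make a couple of the features listed above concrete, here is a minimal sketch of per-chunk handling that checks both reasoning keys and emits tool_call_end when finish_reason is "tool_calls". The DeltaLike and StreamEvent types and both function names are hypothetical stand-ins rather than the project's real types, and <think> tag matching, cost calculation and error wrapping are left out.

```typescript
// Hypothetical, simplified shapes. The real code would use the OpenAI SDK
// chunk types and the project's ApiStream chunk types instead.
interface DeltaLike {
  content?: string | null
  reasoning_content?: string | null
  reasoning?: string | null
}

type StreamEvent =
  | { type: "text"; text: string }
  | { type: "reasoning"; text: string }
  | { type: "tool_call_end" }

// Look at both keys so providers that send `reasoning` instead of
// `reasoning_content` are handled the same way.
function reasoningFromDelta(delta: DeltaLike): string | undefined {
  for (const key of ["reasoning_content", "reasoning"] as const) {
    const value = delta[key]
    if (typeof value === "string" && value.length > 0) return value
  }
  return undefined
}

// Map one choice (delta plus finish_reason) to zero or more stream events.
function eventsForChoice(delta: DeltaLike, finishReason: string | null): StreamEvent[] {
  const events: StreamEvent[] = []
  const reasoning = reasoningFromDelta(delta)
  if (reasoning !== undefined) events.push({ type: "reasoning", text: reasoning })
  if (delta.content) events.push({ type: "text", text: delta.content })
  if (finishReason === "tool_calls") events.push({ type: "tool_call_end" })
  return events
}
```

Keeping this mapping in a pure function means every provider gets the same key handling, and it can be unit-tested without a live stream.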

completePrompt divergences

All three subclasses also re-implement completePrompt with minor variations. LiteLLMHandler has a legitimate divergence (max_tokens vs max_completion_tokens depending on GPT-5 model detection) that would need to be preserved in any extraction.
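
One way the LiteLLM divergence could survive an extraction is to let the shared completePrompt helper accept a small hook that chooses the token-limit parameter. The sketch below is illustrative only: buildCompletePromptRequest, TokenLimitBuilder and looksLikeGpt5 are hypothetical names, the detection rule is a placeholder, and it assumes GPT-5 models take max_completion_tokens while everything else takes max_tokens.

```typescript
// Hypothetical request shape; the real one is the OpenAI SDK's
// chat completion params type.
interface TokenLimit {
  max_tokens?: number
  max_completion_tokens?: number
}

interface CompletionRequest extends TokenLimit {
  model: string
  messages: { role: "user"; content: string }[]
}

// Hook that decides which token-limit parameter a provider sends.
type TokenLimitBuilder = (modelId: string, limit: number) => TokenLimit

// Default used by providers with no special handling.
const defaultTokenLimit: TokenLimitBuilder = (_modelId, limit) => ({ max_tokens: limit })

// Shared request builder; provider-specific behaviour comes in via the hook.
function buildCompletePromptRequest(
  modelId: string,
  prompt: string,
  limit: number,
  tokenLimit: TokenLimitBuilder = defaultTokenLimit,
): CompletionRequest {
  return {
    model: modelId,
    messages: [{ role: "user", content: prompt }],
    ...tokenLimit(modelId, limit),
  }
}

// Placeholder detection rule (assumption). The real LiteLLMHandler check may differ.
const looksLikeGpt5 = (modelId: string): boolean => modelId.toLowerCase().includes("gpt-5")

// LiteLLM-style hook: assumed to switch parameters for GPT-5 models.
const liteLlmTokenLimit: TokenLimitBuilder = (modelId, limit) =>
  looksLikeGpt5(modelId) ? { max_completion_tokens: limit } : { max_tokens: limit }
```

With this shape, OpencodeGoHandler and VercelAiGatewayHandler would call the helper with the default, and only LiteLLMHandler would pass its own hook.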

Desired behavior (conceptual, not technical)

The chunk-processing and completePrompt logic each live in one place. All OpenAI-compatible providers — whether they use static or dynamic model lists — process streaming chunks and non-streaming completions consistently.

Constraints / preferences

  • RouterProvider subclasses need fetchModel() (dynamic model loading) which BaseOpenAiCompatibleProvider doesn't have, so a simple inheritance change won't work.
  • The solution should not require duplicating model-fetching logic into BaseOpenAiCompatibleProvider.
  • LiteLLMHandler's GPT-5 max_tokens/max_completion_tokens branching must be preserved.
  • Behaviour should be unchanged for all existing providers.

Proposed approach

Extract the streaming loop and completePrompt from BaseOpenAiCompatibleProvider into standalone utility functions, e.g.:

// src/api/providers/utils/openai-stream.ts
export async function* processOpenAIStream(
  stream: AsyncIterable<OpenAI.Chat.ChatCompletionChunk>,
  modelInfo?: ModelInfo,
): ApiStream { ... }

BaseOpenAiCompatibleProvider and all RouterProvider subclasses then delegate to these utilities. The three affected files are:

  • src/api/providers/opencode-go.ts
  • src/api/providers/vercel-ai-gateway.ts
  • src/api/providers/lite-llm.ts
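
As a rough picture of the delegation, the sketch below shows a handler that keeps its own setup step and hands the stream to a shared generator. ChunkLike, StreamChunk and DynamicModelHandler are hypothetical, simplified stand-ins (the real signature would take OpenAI.Chat.ChatCompletionChunk and return ApiStream), and the modelInfo parameter is omitted for brevity.

```typescript
// Hypothetical minimal shapes standing in for the SDK chunk type and ApiStream.
interface ChunkLike {
  choices: { delta: { content?: string | null } }[]
  usage?: { prompt_tokens: number; completion_tokens: number } | null
}

type StreamChunk =
  | { type: "text"; text: string }
  | { type: "usage"; inputTokens: number; outputTokens: number }

// Shared utility: the only place that knows how to walk the stream.
async function* processOpenAIStream(
  stream: AsyncIterable<ChunkLike>,
): AsyncGenerator<StreamChunk> {
  for await (const chunk of stream) {
    const text = chunk.choices[0]?.delta.content
    if (text) yield { type: "text", text }
    if (chunk.usage) {
      yield {
        type: "usage",
        inputTokens: chunk.usage.prompt_tokens,
        outputTokens: chunk.usage.completion_tokens,
      }
    }
  }
}

// Stand-in for a RouterProvider subclass. It owns how the stream is opened
// and delegates everything after that point.
class DynamicModelHandler {
  private readonly openStream: () => AsyncIterable<ChunkLike>

  constructor(openStream: () => AsyncIterable<ChunkLike>) {
    this.openStream = openStream
  }

  async *createMessage(): AsyncGenerator<StreamChunk> {
    // Provider-specific work (fetchModel(), prompt caching, ...) would run here.
    yield* processOpenAIStream(this.openStream())
  }
}
```

Because the utility is a free function rather than a base-class method, neither class hierarchy has to change its parent, which fits the constraint that a simple inheritance change won't work.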

Trade-offs / risks

  • Small risk of behavioural change if the extraction misses any provider-specific nuance (e.g. Vercel's prompt-caching logic that runs before the loop, LiteLLM's GPT-5 token parameter branching).
  • Best done with the existing provider test suites as a safety net — all three affected providers have unit tests covering streaming output.
