Fix fallback model leak; add configurable fallback model per provider by atomantic · Pull Request #588 · atomantic/PortOS

atomantic · 2026-06-01T02:46:54Z

Summary

Editorial Review with Codex CLI was failing with 400 Bad Request, logging 🤖 AI run [pipeline-manuscript-completeness]: LM Studio/codex-configured-default. Root cause: when a stage's primary provider was unavailable (Codex had been benched after an OpenAI content-safety refusal), PortOS fell back to another provider but carried over the model id it had already resolved against the primary (codex-configured-default). LM Studio has no such model, so it returned 400; Claude Code rejected it for the same reason ("issue with the selected model (codex-configured-default)"). As a result, a perfectly healthy fallback (Claude Code) was being knocked out by a leaked model name.

This PR fixes the leak and adds the ability to pin both a fallback provider and a fallback model.

Fix: no more model leak across fallback

stageRunner.runStagedLLM now re-resolves the model against the fallback provider instead of forwarding the primary's already-resolved concrete model.
The toolkit createRun does the same for its pre-flight provider swap, so the run record and the first 🤖 AI run … log line show the correct model.
runPromptThroughProvider's runtime-retry path honors the configured fallback model instead of always sending model: undefined.
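
The difference between the leaky and the fixed behavior can be sketched as follows. This is a minimal illustration, not the actual stageRunner code: the provider records, the field names (defaultModel, models), the model id local-default-model, and the helpers resolveModel, pickLeaky, and pickFixed are all hypothetical.

```javascript
// Hypothetical provider records; field names and the LM Studio model id are assumptions.
const providers = {
  codex: {
    id: 'codex',
    defaultModel: 'codex-configured-default',
    models: ['codex-configured-default'],
  },
  lmStudio: {
    id: 'lm-studio',
    defaultModel: 'local-default-model',
    models: ['local-default-model'],
  },
};

// Resolve a model id against ONE provider: keep the requested id only if
// that provider actually lists it, otherwise use the provider's default.
function resolveModel(provider, requested) {
  if (requested && provider.models.includes(requested)) return requested;
  return provider.defaultModel;
}

// Leaky shape: the model is resolved against the primary first, then
// forwarded unchanged to whichever provider ends up running.
function pickLeaky(primary, fallback, primaryAvailable, requested) {
  const model = resolveModel(primary, requested);
  const provider = primaryAvailable ? primary : fallback;
  return { provider: provider.id, model };
}

// Fixed shape: choose the provider first, then resolve against that provider.
function pickFixed(primary, fallback, primaryAvailable, requested) {
  const provider = primaryAvailable ? primary : fallback;
  return { provider: provider.id, model: resolveModel(provider, requested) };
}

console.log(pickLeaky(providers.codex, providers.lmStudio, false));
console.log(pickFixed(providers.codex, providers.lmStudio, false));
```

With the primary unavailable, the leaky variant pairs lm-studio with codex-configured-default (the combination seen in the failing log line), while the fixed variant pairs it with the fallback's own default.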

Feature: choose a fallback model, not just a provider

New fallbackModel field on providers (createProvider + Zod providerSchema; validated on POST and PUT, returned via sanitizeProvider).
providerStatus.getFallbackProvider() now returns { provider, source, model } — the configured fallbackModel for a provider-level fallback, a task-level model when supplied, or null (use the fallback's own default). It is never the primary's model.
AIProviders editor gains a Fallback Model selector beside Fallback Provider, populated from the chosen fallback provider's model list (blank = that provider's default). Selecting a new fallback provider clears a stale model. The provider card shows Fallback: <name> (<model>).
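
A rough sketch of the { provider, source, model } contract described above. The real providerStatus.getFallbackProvider() signature is not shown in this PR, so the arguments, the source labels ('task', 'provider', 'system'), the enabled flag, and the precedence of task-level over provider-level settings are assumptions made for illustration.

```javascript
// Hypothetical sketch of the return contract; not the actual implementation.
function getFallbackProvider(primary, providersById, task = {}) {
  // Assumed: a task-level fallback, when supplied, takes precedence.
  const taskFallback = providersById[task.fallbackProvider];
  if (taskFallback) {
    return { provider: taskFallback, source: 'task', model: task.fallbackModel || null };
  }

  // Provider-level fallback: carry the configured fallbackModel.
  // A blank value means "use the fallback provider's own default", i.e. null.
  const configured = providersById[primary.fallbackProvider];
  if (configured) {
    return { provider: configured, source: 'provider', model: primary.fallbackModel || null };
  }

  // System-priority pick: any other enabled provider, with no pinned model.
  const next = Object.values(providersById).find(
    (p) => p.id !== primary.id && p.enabled
  );
  return next ? { provider: next, source: 'system', model: null } : null;
}

// Usage with made-up provider ids and model names.
const byId = {
  a: { id: 'a', enabled: true, fallbackProvider: 'b', fallbackModel: 'b-large' },
  b: { id: 'b', enabled: true },
  c: { id: 'c', enabled: true },
};
console.log(getFallbackProvider(byId.a, byId).model);
```

Note that the primary's own model never appears in the result: the model field is either an explicit pin or null.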

Test plan

server: full suite green — 8813 passed, 7 skipped. New coverage:
- providerStatus.test.js — asserts the configured fallbackModel (and task-level model) ride along on the returned object; system-priority picks return null.
- promptRunner.test.js — asserts a pinned fallbackModel reaches the fallback run and is neither the primary's model (the leak) nor the fallback's own default (the pin must win).
client: full suite green — 717 passed.
Verified POST (providerSchema) and PUT (providerSchema.partial()) both validate fallbackModel, and sanitizeProvider returns it to the client.

When a stage's primary provider was unavailable, the model id resolved against the primary (e.g. codex's 'codex-configured-default') was carried verbatim onto the fallback provider — so the run logged 'LM Studio/codex-configured-default' and 400'd because the fallback has no such model. stageRunner and the toolkit createRun now re-resolve the model against the fallback provider instead of forwarding the primary's. Adds a 'fallbackModel' field on providers (schema + createProvider + UI selector beside Fallback Provider) so a fallback can pin both provider and model; getFallbackProvider returns the configured model and both the pre-flight and runtime fallback paths run it.

…RunOnce; honor pin in agent lifecycle

The createRun swap inside executeProviderRunOnce (the common path for callers that don't pre-create a runId) still re-resolved the model against the primary, leaking codex-configured-default onto the fallback. Re-resolve against the fallback using the surfaced fallbackModel, mirroring stageRunner. Also forward the task-level fallback model through the PortOS providerStatus wrapper and let a provider-/task-level fallbackModel pin override per-task model selection in agentLifecycle, so the feature applies to CoS agent runs too.

…don't pin onto user-override provider

POST /api/runs executed API/TUI fallbacks with the original request model (resolved against the benched primary) and ignored the pin for CLI fallbacks. Derive runModel from createRun's usedFallback/fallbackModel so a fallback swap runs the fallback's model across all three provider types; non-fallback runs are unchanged. In agentLifecycle, a task-metadata provider override could replace the fallback provider while leaving fallbackModelPin set, applying the fallback's pinned model to the user-chosen provider. Clear the pin on that override.
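
The two rules in this last commit might look roughly like the sketch below. The names usedFallback, fallbackModel, runModel, and fallbackModelPin come from the commit message; the helper functions, their arguments, and the shape of the selection object are hypothetical.

```javascript
// Hypothetical helper: derive the model to run from a createRun-like result.
function deriveRunModel(run, requestModel) {
  // Non-fallback runs are unchanged: use the model from the request.
  if (!run.usedFallback) return requestModel;
  // Fallback swap: run the pinned fallback model, or leave the model
  // undefined so the fallback provider applies its own default.
  return run.fallbackModel || undefined;
}

// Hypothetical helper: apply a task-metadata provider override.
function applyProviderOverride(selection, overrideProviderId) {
  if (!overrideProviderId) return selection;
  // The pin was chosen for the fallback provider, so it must not follow
  // the run onto a provider the user picked explicitly.
  return { ...selection, providerId: overrideProviderId, fallbackModelPin: null };
}

// Usage with made-up ids.
console.log(deriveRunModel({ usedFallback: true, fallbackModel: 'pinned-model' }, 'codex-configured-default'));
console.log(applyProviderOverride({ providerId: 'fallback-a', fallbackModelPin: 'pinned-model' }, 'user-choice'));
```

Returning a new object from applyProviderOverride rather than mutating the selection keeps the cleared pin from affecting other code that still holds the original selection.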

atomantic added 3 commits May 31, 2026 19:44

atomantic merged commit 93ec350 into main Jun 1, 2026
2 checks passed

atomantic deleted the provider-fallback-model branch June 1, 2026 03:08
