Fix fallback model leak; add configurable fallback model per provider #588

Merged
atomantic merged 3 commits into main from provider-fallback-model on Jun 1, 2026

@atomantic
Owner

Summary

Editorial Review with Codex CLI was failing with 400 Bad Request, logging 🤖 AI run [pipeline-manuscript-completeness]: LM Studio/codex-configured-default. Root cause: when a stage's primary provider was unavailable (Codex had been benched after an OpenAI content-safety refusal), PortOS fell back to another provider but carried the model id it had resolved against the primary (codex-configured-default) onto the fallback. LM Studio has no such model → 400; Claude Code rejected it the same way ("issue with the selected model (codex-configured-default)"). So a perfectly healthy fallback (Claude Code) was being knocked out by a leaked model name.

This PR fixes the leak and adds the ability to pin both a fallback provider and a fallback model.

Fix: no more model leak across fallback

  • stageRunner.runStagedLLM now re-resolves the model against the fallback provider instead of forwarding the primary's already-resolved concrete model.
  • The toolkit createRun does the same for its pre-flight provider swap, so the run record and the first 🤖 AI run … log line show the correct model.
  • runPromptThroughProvider's runtime-retry path honors the configured fallback model instead of always sending model: undefined.
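The leak and the fix described above can be sketched roughly as follows. This is a minimal illustration, not the PortOS code: the provider record shape, the `resolveModel`, `pickRunLeaky`, and `pickRunFixed` helpers, the `lm-studio` id, and the `local-*` model names are all assumptions. Whether PortOS validates a pinned model against the provider's model list is also assumed here.

```javascript
// Hypothetical provider records; field names and the LM Studio model ids are illustrative.
const providers = {
  codex: {
    id: "codex",
    defaultModel: "codex-configured-default",
    models: ["codex-configured-default"],
  },
  "lm-studio": {
    id: "lm-studio",
    defaultModel: "local-default",
    models: ["local-default", "local-large"],
  },
};

// Resolve a model id for one specific provider. A requested model is honored
// only when that provider lists it; otherwise the provider's default is used.
function resolveModel(provider, requestedModel) {
  if (requestedModel && provider.models.includes(requestedModel)) {
    return requestedModel;
  }
  return provider.defaultModel;
}

// Leaky shape: the model is resolved once against the primary, then the
// provider is swapped while the already-resolved model id is kept.
function pickRunLeaky(primary, fallback, primaryAvailable) {
  const model = resolveModel(primary, undefined);
  const provider = primaryAvailable ? primary : fallback;
  return { provider: provider.id, model };
}

// Fixed shape: choose the provider first, then resolve the model against
// whichever provider will actually run, honoring an optional pinned model.
function pickRunFixed(primary, fallback, primaryAvailable, fallbackModel) {
  if (primaryAvailable) {
    return { provider: primary.id, model: resolveModel(primary, undefined) };
  }
  return { provider: fallback.id, model: resolveModel(fallback, fallbackModel) };
}

const leaky = pickRunLeaky(providers.codex, providers["lm-studio"], false);
const fixed = pickRunFixed(providers.codex, providers["lm-studio"], false);
const pinned = pickRunFixed(providers.codex, providers["lm-studio"], false, "local-large");
console.log(leaky, fixed, pinned);
```

In this sketch the leaky variant pairs the fallback provider with the primary's model id, which is the combination that produced the 400. The fixed variant never sees the primary's resolved model once the swap has happened.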

Feature: choose a fallback model, not just a provider

  • New fallbackModel field on providers (createProvider + Zod providerSchema; validated on POST and PUT, returned via sanitizeProvider).
  • providerStatus.getFallbackProvider() now returns { provider, source, model } — the configured fallbackModel for a provider-level fallback, a task-level model when supplied, or null (use the fallback's own default). It is never the primary's model.
  • AIProviders editor gains a Fallback Model selector beside Fallback Provider, populated from the chosen fallback provider's model list (blank = that provider's default). Selecting a new fallback provider clears a stale model. The provider card shows Fallback: <name> (<model>).
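As a rough sketch of the `{ provider, source, model }` return shape described above: the `source` strings, the `fallbackProvider` and `available` field names, the `taskFallback` argument, the sample model names, and the order in which task-level and provider-level fallbacks are checked are all assumptions made for illustration. Only the `fallbackModel` field and the three-part return shape come from this PR.

```javascript
// Hypothetical lookup, not the PortOS implementation.
function getFallbackProvider(primary, allProviders, taskFallback = null) {
  const findAvailable = (id) =>
    allProviders.find((p) => p.id === id && p.available);

  // 1. Task-level fallback (assumed here to be checked first).
  if (taskFallback && taskFallback.providerId) {
    const provider = findAvailable(taskFallback.providerId);
    if (provider) {
      return { provider, source: "task", model: taskFallback.model ?? null };
    }
  }

  // 2. Provider-level fallback configured on the primary.
  if (primary.fallbackProvider) {
    const provider = findAvailable(primary.fallbackProvider);
    if (provider) {
      return { provider, source: "provider", model: primary.fallbackModel ?? null };
    }
  }

  // 3. System pick: any other available provider. The model is null, so the
  //    caller uses that provider's own default, never the primary's model.
  const provider = allProviders.find((p) => p.available && p.id !== primary.id);
  return provider ? { provider, source: "system", model: null } : null;
}

const codex = {
  id: "codex",
  available: false,
  fallbackProvider: "claude-code",
  fallbackModel: "pinned-model", // illustrative model name
};
const claudeCode = { id: "claude-code", available: true };
const lmStudio = { id: "lm-studio", available: true };
const all = [codex, claudeCode, lmStudio];

const providerLevel = getFallbackProvider(codex, all);
const taskLevel = getFallbackProvider(codex, all, {
  providerId: "lm-studio",
  model: "task-model",
});
const systemLevel = getFallbackProvider({ id: "other", available: false }, all);
console.log(providerLevel.source, taskLevel.source, systemLevel.source);
```

Returning `null` for the model, rather than echoing any model the caller already holds, is what keeps the primary's model id from travelling with the fallback.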

Test plan

  • server: full suite green — 8813 passed, 7 skipped. New coverage:
    • providerStatus.test.js — asserts the configured fallbackModel (and task-level model) ride along on the returned object; system-priority picks return null.
    • promptRunner.test.js — asserts a pinned fallbackModel reaches the fallback run and is neither the primary's model (the leak) nor the fallback's own default (the pin must win).
  • client: full suite green — 717 passed.
  • Verified POST (providerSchema) and PUT (providerSchema.partial()) both validate fallbackModel, and sanitizeProvider returns it to the client.

atomantic added 3 commits May 31, 2026 19:44
When a stage's primary provider was unavailable, the model id resolved
against the primary (e.g. codex's 'codex-configured-default') was carried
verbatim onto the fallback provider — so the run logged
'LM Studio/codex-configured-default' and 400'd because the fallback has no
such model. stageRunner and the toolkit createRun now re-resolve the model
against the fallback provider instead of forwarding the primary's.

Adds a 'fallbackModel' field on providers (schema + createProvider + UI
selector beside Fallback Provider) so a fallback can pin both provider and
model; getFallbackProvider returns the configured model and both the
pre-flight and runtime fallback paths run it.
…RunOnce; honor pin in agent lifecycle

The createRun swap inside executeProviderRunOnce (the common path for callers
that don't pre-create a runId) still re-resolved the model against the primary,
leaking codex-configured-default onto the fallback. Re-resolve against the
fallback using the surfaced fallbackModel, mirroring stageRunner.

Also forward the task-level fallback model through the PortOS providerStatus
wrapper and let a provider-/task-level fallbackModel pin override per-task model
selection in agentLifecycle, so the feature applies to CoS agent runs too.
…don't pin onto user-override provider

POST /api/runs executed API/TUI fallbacks with the original request model
(resolved against the benched primary) and ignored the pin for CLI fallbacks.
Derive runModel from createRun's usedFallback/fallbackModel so a fallback swap
runs the fallback's model across all three provider types; non-fallback runs
are unchanged.

In agentLifecycle, a task-metadata provider override could replace the fallback
provider while leaving fallbackModelPin set, applying the fallback's pinned
model to the user-chosen provider. Clear the pin on that override.
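
The pin-clearing rule from the last commit can be illustrated with a small sketch. The `selectAgentRun` function, its argument shape, and the provider and model names are hypothetical; only the `fallbackModelPin` name and the rule itself (a task-metadata provider override drops the pin) come from the commit message.

```javascript
// Hypothetical selection step, not the agentLifecycle code.
// A null model means "use the chosen provider's own default".
function selectAgentRun({ fallback, taskMetadata = {}, perTaskModel = null }) {
  let provider = fallback.provider;
  let fallbackModelPin = fallback.model ?? null;

  // A user-chosen provider replaces the fallback provider, so the pin that
  // was configured for the fallback no longer applies and is cleared.
  if (taskMetadata.provider) {
    provider = taskMetadata.provider;
    fallbackModelPin = null;
  }

  // The pin, when still set, overrides per-task model selection.
  return { provider, model: fallbackModelPin ?? perTaskModel };
}

const pinnedRun = selectAgentRun({
  fallback: { provider: "claude-code", model: "pinned-model" },
  perTaskModel: "task-model",
});
const overriddenRun = selectAgentRun({
  fallback: { provider: "claude-code", model: "pinned-model" },
  taskMetadata: { provider: "lm-studio" },
  perTaskModel: "task-model",
});
console.log(pinnedRun, overriddenRun);
```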
@atomantic atomantic merged commit 93ec350 into main Jun 1, 2026
2 checks passed
@atomantic atomantic deleted the provider-fallback-model branch June 1, 2026 03:08