fix(verifier+ui): prevent false partials from housekeeping steps; show summary for partial runs #3

Open
maxbeech wants to merge 2 commits into main from fix/partial-run-verifier-and-ui-terminal

Conversation

@maxbeech
Owner

Summary

  • Verifier prompt: Added a HOUSEKEEPING-STEPS EXCLUSION rule so agent-initiated bonus steps (logging to internal data tables, metrics dashboards, etc.) that weren't in the job prompt are never counted as missed deliverables. This was the root cause of run f891f7b7 being falsely marked partial: the agent tried to upsert to a Content Log table as a courtesy step, the openhelm-data CLI wasn't available, and the verifier incorrectly scored the failed upsert as 1 of 5 required deliverables.
  • UI: Added partially_succeeded to isTerminal in run-detail-view and dashboard-system-section so the summary panel, terminal button, and action controls render correctly for partial runs. The summary was present in the DB but hasSummary was always false because partially_succeeded was missing from the terminal status list.
  • App.tsx: Added partially_succeeded to terminalStatuses so completion notifications fire for partial runs.
  • Migration: Added 20260526120000_runs_partial_status.sql to sync the runs.status CHECK constraint with the TypeScript type (adds partially_succeeded).
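
The UI and App.tsx changes above amount to treating partially_succeeded as a terminal status everywhere the summary and notifications are gated. A minimal sketch of that idea follows. Only `partially_succeeded`, `succeeded`, `isTerminal`, `hasSummary`, and `terminalStatuses` are names from this PR; the other status values, the `Run` shape, and the shared set are assumptions made for illustration and may not match the actual components.

```typescript
// Hypothetical status union: only "succeeded" and "partially_succeeded"
// are confirmed by the PR text; the rest are illustrative.
type RunStatus =
  | "queued"
  | "running"
  | "succeeded"
  | "partially_succeeded"
  | "failed"
  | "cancelled";

// One shared list of terminal statuses (an assumption: the PR patches
// separate checks in run-detail-view, dashboard-system-section, and App.tsx).
const terminalStatuses: ReadonlySet<RunStatus> = new Set<RunStatus>([
  "succeeded",
  "partially_succeeded", // the status that was previously missing
  "failed",
  "cancelled",
]);

function isTerminal(status: RunStatus): boolean {
  return terminalStatuses.has(status);
}

// Hypothetical run shape, reduced to the fields the check needs.
interface Run {
  status: RunStatus;
  summary: string | null;
}

// The summary panel renders only for terminal runs with a non-empty summary.
function hasSummary(run: Run): boolean {
  return (
    isTerminal(run.status) &&
    run.summary !== null &&
    run.summary.trim() !== ""
  );
}
```

Keeping the list in one place is a design choice of this sketch, not something the PR states; it would avoid the three call sites drifting apart again when a new status is added.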

DB fix applied directly

Run f891f7b7-623c-4e5f-8a7d-31199c373066 was updated to succeeded via REST API — the blog post was fully published (post written, committed, pushed, Vercel Ready, live HTTP 200 verified).

Test plan

  • Trigger a job that produces a genuinely partial result (some deliverables missing) — verify it still shows Partial badge
  • Trigger a job where the agent attempts to log to a missing data table as a bonus step — verify it scores succeeded not partial
  • View an existing partial run in the UI — verify the summary section renders
  • Complete a run that ends as partially_succeeded — verify the desktop notification fires

maxbeech added 2 commits May 26, 2026 11:26
…ine_exceeded failures

Cold E2B template boots can take 15-25 s. Commands fired immediately after
Sandbox.create() (Xvfb launch, mkdir -p /tmp/workspace) were hitting their
short timeoutMs ceilings and throwing deadline_exceeded. The Xvfb failure
was caught and logged as a warning, but the mkdir in prepareWorkspace was
not caught, cascading into a fatal run failure after only ~19 s.

Fix: gate on a 60 s sandbox readiness check (echo ready) before any other
command; also increase mkdir timeout from 5 s to 30 s as belt-and-suspenders,
and Xvfb background-launch timeout from 10 s to 30 s.
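
The readiness gate described in this commit message could look roughly like the sketch below. `SandboxLike` and `waitForSandboxReady` are hypothetical names, and the `run(cmd, { timeoutMs })` signature is an assumption standing in for whatever the sandbox SDK exposes. Only `prepareWorkspace`, the `echo ready` probe, the `mkdir -p /tmp/workspace` command, and the 60 s and 30 s values come from the commit text.

```typescript
// Hypothetical minimal view of a sandbox: one method that runs a shell
// command with a per-command timeout. Not the real SDK surface.
interface CommandResult {
  stdout: string;
  exitCode: number;
}

interface SandboxLike {
  run(cmd: string, opts: { timeoutMs: number }): Promise<CommandResult>;
}

// Block until the sandbox answers a trivial command. The generous timeout
// absorbs cold template boots before any short-timeout command is issued.
async function waitForSandboxReady(
  sandbox: SandboxLike,
  timeoutMs = 60_000,
): Promise<void> {
  const result = await sandbox.run("echo ready", { timeoutMs });
  if (result.exitCode !== 0 || result.stdout.trim() !== "ready") {
    throw new Error(`sandbox not ready (exit code ${result.exitCode})`);
  }
}

// Gate first, then create the workspace with the raised 30 s timeout.
async function prepareWorkspace(sandbox: SandboxLike): Promise<void> {
  await waitForSandboxReady(sandbox);
  await sandbox.run("mkdir -p /tmp/workspace", { timeoutMs: 30_000 });
}
```

With the gate in place, the longer mkdir timeout is only a fallback: the first command to absorb the cold-boot delay is the probe, not a step whose failure is fatal to the run.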
…w summary for partial runs

Three related fixes:

1. Verifier prompt: add HOUSEKEEPING-STEPS EXCLUSION rule so agent-initiated bonus steps
   (logging to internal data tables, metrics dashboards, etc.) that weren't in the job prompt
   are never counted as missed deliverables. This was the root cause of run f891f7b7 being
   marked partial — the agent tried to upsert to a Content Log table as a bonus, the CLI
   wasn't available, and the verifier incorrectly counted it as 1 of 5 required deliverables.

2. UI: Add partially_succeeded to isTerminal in run-detail-view and dashboard-system-section
   so the summary panel and action buttons render correctly for partial runs. Previously the
   summary was in the DB but hasSummary was always false for partial status.

3. App.tsx: Include partially_succeeded in terminalStatuses so completion notifications fire
   for partial runs too.
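
The actual fix in item 1 is a rule in the verifier prompt, not code. As an illustration of what the rule asks the verifier to do, the sketch below scores a run only over steps that the job prompt requested. Every name here (`StepOutcome`, `requestedInPrompt`, `scoreRun`, the `failed` verdict) is hypothetical, and the step list in the usage example is illustrative rather than the real breakdown of run f891f7b7.

```typescript
// Hypothetical record of one step the agent attempted.
interface StepOutcome {
  name: string;
  requestedInPrompt: boolean; // false for agent-initiated housekeeping
  completed: boolean;
}

type Verdict = "succeeded" | "partially_succeeded" | "failed";

// Score only the deliverables the job prompt asked for. Housekeeping steps
// are excluded before counting, so their failure cannot cause a partial.
function scoreRun(steps: StepOutcome[]): Verdict {
  const required = steps.filter((s) => s.requestedInPrompt);
  const done = required.filter((s) => s.completed).length;
  // Assumption: a run with no required deliverables counts as succeeded.
  if (required.length === 0 || done === required.length) return "succeeded";
  return done === 0 ? "failed" : "partially_succeeded";
}

// Illustrative usage: every requested step passes, the bonus upsert fails.
const exampleVerdict = scoreRun([
  { name: "publish blog post", requestedInPrompt: true, completed: true },
  { name: "verify live URL", requestedInPrompt: true, completed: true },
  { name: "upsert Content Log row", requestedInPrompt: false, completed: false },
]);
```

Under these assumptions `exampleVerdict` is `succeeded`, while a run that misses a requested step still comes out as `partially_succeeded`.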