fix(verifier+ui): prevent false partials from housekeeping steps; show summary for partial runs #3
Open
maxbeech wants to merge 2 commits into
…ine_exceeded failures

Cold E2B template boots can take 15-25 s. Commands issued immediately after Sandbox.create() (the Xvfb launch, mkdir -p /tmp/workspace) were hitting their short timeoutMs ceilings and throwing deadline_exceeded. The Xvfb failure was caught and logged as a warning, but the mkdir in prepareWorkspace was not caught, so it escalated into a fatal run failure after only ~19 s.

Fix: gate on a 60 s sandbox readiness check (echo ready) before running any other command. As belt-and-suspenders, also raise the mkdir timeout from 5 s to 30 s and the Xvfb background-launch timeout from 10 s to 30 s.
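A minimal sketch of that startup ordering, assuming a hypothetical `CommandRunner` interface in place of the real sandbox client (this is not the E2B SDK's API). The `echo ready` probe runs first with a 60 s ceiling, the Xvfb launch is treated as non-fatal (it was already caught and logged), and `mkdir` stays fatal but gets 30 s. The helper names `buildStartupPlan`/`runStartupPlan` and the Xvfb arguments are illustrative, not taken from the PR.

```typescript
// Hypothetical stand-in for the sandbox command API; not the E2B SDK itself.
interface CommandRunner {
  run(cmd: string, opts: { timeoutMs: number }): Promise<unknown>;
}

interface StartupStep {
  cmd: string;
  timeoutMs: number;
  // Non-fatal steps are logged and skipped; fatal steps abort the run.
  fatal: boolean;
}

// Timeouts follow the commit message: 60 s readiness gate, 30 s for Xvfb and mkdir.
// The Xvfb display and screen arguments are illustrative assumptions.
function buildStartupPlan(): StartupStep[] {
  return [
    { cmd: "echo ready", timeoutMs: 60_000, fatal: true },
    { cmd: "Xvfb :99 -screen 0 1280x1024x24 &", timeoutMs: 30_000, fatal: false },
    { cmd: "mkdir -p /tmp/workspace", timeoutMs: 30_000, fatal: true },
  ];
}

// Runs steps strictly in order, so nothing else starts until the readiness probe succeeds.
async function runStartupPlan(
  runner: CommandRunner,
  plan: StartupStep[],
  warn: (msg: string) => void = console.warn,
): Promise<void> {
  for (const step of plan) {
    try {
      await runner.run(step.cmd, { timeoutMs: step.timeoutMs });
    } catch (err) {
      if (step.fatal) throw err;
      warn(`non-fatal startup step failed: ${step.cmd}: ${String(err)}`);
    }
  }
}
```

Putting the long timeout on a trivial probe means a slow cold boot is absorbed once, up front, instead of surfacing as a deadline_exceeded on whichever real command happens to run first.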
…w summary for partial runs

Three related fixes:

1. Verifier prompt: add a HOUSEKEEPING-STEPS EXCLUSION rule so agent-initiated bonus steps (logging to internal data tables, metrics dashboards, etc.) that weren't in the job prompt are never counted as missed deliverables. This was the root cause of run f891f7b7 being marked partial: the agent tried to upsert to a Content Log table as a bonus, the CLI wasn't available, and the verifier incorrectly counted it as 1 of 5 required deliverables.
2. UI: add partially_succeeded to isTerminal in run-detail-view and dashboard-system-section so the summary panel and action buttons render correctly for partial runs. Previously the summary was in the DB, but hasSummary was always false for partial status.
3. App.tsx: include partially_succeeded in terminalStatuses so completion notifications also fire for partial runs.
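The PR encodes the housekeeping exclusion as a rule in the verifier prompt. Purely as an illustration, the same rule expressed as deterministic scoring might look like this: only steps named in the job prompt count toward the verdict, and anything else the agent chose to do is ignored. `StepOutcome`, `scoreRun`, and the choice to return `failed` when no required deliverable is met are assumptions, not part of the PR.

```typescript
// Hypothetical shapes for illustration; the PR implements this rule in the verifier prompt.
type StepOutcome = { name: string; ok: boolean };
type Verdict = "succeeded" | "partially_succeeded" | "failed";

function scoreRun(requiredDeliverables: string[], outcomes: StepOutcome[]): Verdict {
  const required = new Set(requiredDeliverables);
  // HOUSEKEEPING-STEPS EXCLUSION: drop every step the job prompt did not ask for,
  // so a failed bonus step (e.g. logging to an internal table) cannot lower the score.
  const scored = outcomes.filter((o) => required.has(o.name));
  const met = new Set(scored.filter((o) => o.ok).map((o) => o.name));
  if (met.size === required.size) return "succeeded";
  // Assumed policy: nothing delivered is a failure, some delivered is partial.
  return met.size === 0 ? "failed" : "partially_succeeded";
}
```

With the filter removed, a single failed bonus step would pull an otherwise complete run down to `partially_succeeded`, which is the false partial the rule prevents.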
Summary
- Verifier prompt: add a `HOUSEKEEPING-STEPS EXCLUSION` rule so agent-initiated bonus steps (logging to internal data tables, metrics dashboards, etc.) that weren't in the job prompt are never counted as missed deliverables. This was the root cause of run `f891f7b7` being falsely marked partial: the agent tried to upsert to a Content Log table as a courtesy step, the `openhelm-data` CLI wasn't available, and the verifier incorrectly scored it as 1 of 5 required deliverables.
- UI: add `partially_succeeded` to `isTerminal` in `run-detail-view` and `dashboard-system-section` so the summary panel, terminal button, and action controls render correctly for partial runs. The summary was present in the DB, but `hasSummary` was always `false` because `partially_succeeded` was missing from the terminal status list.
- `App.tsx`: add `partially_succeeded` to `terminalStatuses` so completion notifications fire for partial runs.
- Migration: add `20260526120000_runs_partial_status.sql` to sync the `runs.status` CHECK constraint with the TypeScript type (adds `partially_succeeded`).

DB fix applied directly
Run `f891f7b7-623c-4e5f-8a7d-31199c373066` was updated to `succeeded` via the REST API: the blog post was fully published (post written, committed, pushed, Vercel Ready, live HTTP 200 verified).

Test plan
- … `Partial` badge
- … `succeeded`, not `partial`
- … `partially_succeeded`: verify the desktop notification fires
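The UI and notification fixes all come down to treating `partially_succeeded` as a terminal status. A minimal sketch of those checks, assuming a hypothetical `RunStatus` union (only `succeeded` and `partially_succeeded` come from the PR; the other members are placeholders) and illustrative helper names `hasSummary`/`shouldNotify` modeled on the behavior the PR describes:

```typescript
// Assumed status union; only "succeeded" and "partially_succeeded" appear in the PR.
type RunStatus = "queued" | "running" | "succeeded" | "partially_succeeded" | "failed" | "cancelled";

// One shared set, so the detail view, the dashboard, and App.tsx cannot drift apart.
const TERMINAL_STATUSES: ReadonlySet<RunStatus> = new Set<RunStatus>([
  "succeeded",
  "partially_succeeded", // the status that was missing before this PR
  "failed",
  "cancelled",
]);

function isTerminal(status: RunStatus): boolean {
  return TERMINAL_STATUSES.has(status);
}

// A stored summary is shown only once the run is terminal; without
// "partially_succeeded" in the set, this stayed false for partial runs.
function hasSummary(status: RunStatus, summary: string | null): boolean {
  return isTerminal(status) && summary !== null && summary.length > 0;
}

// Fire a completion notification only on the transition into a terminal status.
function shouldNotify(prev: RunStatus, next: RunStatus): boolean {
  return !isTerminal(prev) && isTerminal(next);
}
```

Centralizing the set is a design suggestion: as described, the PR edits `isTerminal` in two components and `terminalStatuses` in `App.tsx` separately, which is how the partial status was missed in the first place.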