Skip to content

[Chore] Unskip VS Code e2e replay for subtasks#94

Merged
edelauna merged 8 commits into
mainfrom
chore-unskip-e2e-subtasks
Jun 1, 2026
Merged

[Chore] Unskip VS Code e2e replay for subtasks#94
edelauna merged 8 commits into
mainfrom
chore-unskip-e2e-subtasks

Conversation

@roomote
Copy link
Copy Markdown
Contributor

@roomote roomote Bot commented May 13, 2026

Opened on behalf of Elliott de Launay. View the task or mention @roomote for follow-up asks.

Related GitHub Issue

#355 — Parallelizable Tasks implementation roadmap — broader architectural follow-up. The immediate bugs in this PR were discovered while unskipping existing e2e coverage.

Description

Unskips the subtasks VS Code e2e replay suite and fixes three delegation lifecycle bugs it exposed:

  1. Cancelled child could reopen parent on completion (AttemptCompletionTool): A cancelled-then-resumed delegated child would unconditionally delegate back to the parent via attempt_completion, even if the parent had already detached. Added an idempotency guard that checks parentHistory.awaitingChildId === task.taskId before delegating; otherwise falls through to the normal completion-ask flow.

  2. Parent not detached on child cancellation (ClineProvider.cancelTask): Cancelling a delegated child left the parent in delegated status with awaitingChildId still pointing at the cancelled child. Now cancelTask transitions the parent back to active and clears awaitingChildId before rehydrating the child, while preserving the persisted parentTaskId/rootTaskId lineage for history grouping.

  3. Child started despite parent metadata failure (ClineProvider.delegateParentAndOpenChild): If updateTaskHistory failed when persisting the parent's delegation metadata, the child was started anyway with corrupted state. Now createTask accepts startTask: false so the child is created but not started until metadata is persisted. On failure, the paused child is removed from the stack, its history entry is deleted, and the parent is restored from history before the error is surfaced.

The e2e test was rewritten to exercise the actual cancellation-resumption flow: it cancels the delegated child mid-followup, resumes the same child (rather than starting a fresh task), validates the follow-up tool result content, and asserts clean completion without parent revival.

Why other VS Code e2e suites changed

This PR distinguishes two cleanup operations:

  • cancelCurrentTask() simulates a user cancellation and may rehydrate the current task so it can be resumed.
  • clearCurrentTask() removes the current task without entering the cancellation-resumption flow.

The existing tool e2e suites used cancelCurrentTask() as teardown. Once the subtasks cancellation lifecycle was exercised correctly, that teardown behavior could leave rehydrated tasks active between tests. The affected suites now use clearCurrentTask() for cleanup because they are not testing cancellation behavior.

Only subtasks.test.ts intentionally uses cancelCurrentTask().

Interim mitigation

These lifecycle fixes are intentionally bounded mitigations for the current single-task execution model. They preserve recoverability and add regression coverage for races exposed by the unskipped e2e replay.

The broader architectural fix is tracked in #355. That work introduces explicit task lifecycle ownership, atomic history transitions, startup reconciliation, and TaskScheduler at maxConcurrency = 1 before enabling parallel execution.

The abort path also avoids rejecting from fire-and-forget assistant-message presentation. The awaited task loop now exits its pWaitFor when aborted or abandoned and throws from that owned control-flow boundary.

Test Procedure

  • TEST_FILE=subtasks.test pnpm --dir apps/vscode-e2e test:ci:mock — e2e replay covering the full cancel → resume → complete → parent-stays-dormant flow
  • pnpm --dir src test core/tools/__tests__/attemptCompletionTool.spec.ts — unit tests for the parent-awaiting-child guard (delegates when matched, falls through when detached)
  • pnpm --dir src test core/webview/__tests__/ClineProvider.flicker-free-cancel.spec.ts — unit tests for runtime parent-link detachment on cancel and graceful fallback when detach fails
  • pnpm --dir src test src/__tests__/provider-delegation.spec.ts — unit test for child cleanup when parent metadata persistence fails

Pre-Submission Checklist

  • Issue Linked: #355 tracks the broader task-scheduler and lifecycle-ownership refactor.
  • Scope: Changes are focused on delegated subtask cancellation lifecycle and its test coverage. The other VS Code e2e edits only migrate teardown from cancelCurrentTask() to clearCurrentTask().
  • Self-Review: Done.
  • Testing: New unit tests and rewritten e2e test cover all three fixes.
  • Documentation Impact: No documentation updates required.
  • Contribution Guidelines: Read and agreed.

Screenshots / Videos

N/A — no UI changes. This is delegated-task lifecycle and e2e replay coverage work.

Documentation Updates

  • No documentation updates are required.

Additional Notes

  • Codecov patch coverage: 95.65% (2 lines uncovered in error-handling catch paths).
  • All CI checks pass: unit tests (ubuntu + windows), e2e-mock, CodeQL, knip, compile.

Related PRs

@roomote roomote Bot added the roomote:auto-resolve-conflicts Allow Roomote to auto-resolve merge conflicts for this PR label May 13, 2026
@roomote
Copy link
Copy Markdown
Contributor Author

roomote Bot commented May 13, 2026

No actionable issues found. See task

  • apps/vscode-e2e/src/suite/subtasks.test.ts is unskipped again on the current head.
  • The spawned-child wait now follows the active task from the end of api.getCurrentTaskStack(), so the replay reaches the delegated cancellation path instead of timing out on the parent task.
  • apps/vscode-e2e/src/suite/subtasks.test.ts: the cancellation probe now resumes cancelledChildTaskId, so the replay exercises the cancelled-child completion path this suite is meant to cover.
  • apps/vscode-e2e/src/fixtures/subtasks.ts: the follow-up fixtures now validate the actual tool result payload.
  • src/core/webview/ClineProvider.ts: clearing runtime parent references on the rehydrated cancelled child no longer detaches its persisted lineage, so resumed cancelled subtasks keep their back-to-parent affordance and history grouping.
  • src/core/webview/ClineProvider.ts: if parent metadata persistence fails after creating the paused child, the cleanup path now removes that child from task history instead of leaving a phantom active no_messages subtask behind.
  • src/core/webview/ClineProvider.ts: if parent metadata persistence fails after creating the paused child, the rollback now reopens the disposed parent and restores its mode before surfacing the failure.
  • e2e-mock passed on the current head.
  • platform-unit-test (ubuntu-latest) passed on the current head.
  • platform-unit-test (windows-latest) passed on the current head.
  • codecov/patch passed on the current head.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 13, 2026

Codecov Report

❌ Patch coverage is 85.38961% with 45 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/core/webview/ClineProvider.ts 86.14% 35 Missing and 2 partials ⚠️
src/core/task/Task.ts 0.00% 6 Missing ⚠️
.../core/assistant-message/presentAssistantMessage.ts 0.00% 1 Missing ⚠️
src/core/tools/AttemptCompletionTool.ts 96.00% 1 Missing ⚠️

📢 Thoughts on this report? Let us know!

@roomote roomote Bot force-pushed the chore-unskip-e2e-use-mcp-tool branch from 9f0ddd1 to c31a59a Compare May 17, 2026 15:11
@roomote
Copy link
Copy Markdown
Contributor Author

roomote Bot commented May 17, 2026

I see some merge conflicts here. Working on them now...

@edelauna edelauna force-pushed the chore-unskip-e2e-use-mcp-tool branch from 879e0e8 to 2a54c07 Compare May 17, 2026 23:02
@roomote
Copy link
Copy Markdown
Contributor Author

roomote Bot commented May 17, 2026

I see some merge conflicts here. Working on them now...

@edelauna edelauna force-pushed the chore-unskip-e2e-use-mcp-tool branch from 77052fc to fa293b2 Compare May 19, 2026 02:19
@roomote
Copy link
Copy Markdown
Contributor Author

roomote Bot commented May 19, 2026

I see some merge conflicts here. Working on them now...

@edelauna edelauna force-pushed the chore-unskip-e2e-use-mcp-tool branch from fa293b2 to c946c38 Compare May 19, 2026 02:36
@roomote
Copy link
Copy Markdown
Contributor Author

roomote Bot commented May 19, 2026

I see some merge conflicts here. Working on them now...

Base automatically changed from chore-unskip-e2e-use-mcp-tool to main May 19, 2026 12:48
@roomote
Copy link
Copy Markdown
Contributor Author

roomote Bot commented May 19, 2026

I see some merge conflicts here. Working on them now...

@roomote roomote Bot force-pushed the chore-unskip-e2e-subtasks branch from 895516b to ca06a86 Compare May 19, 2026 12:57
@roomote roomote Bot closed this May 19, 2026
@roomote roomote Bot force-pushed the chore-unskip-e2e-subtasks branch from ca06a86 to 7de61e6 Compare May 19, 2026 12:58
@roomote roomote Bot reopened this May 19, 2026
@roomote roomote Bot force-pushed the chore-unskip-e2e-subtasks branch from f62a72d to 9aedf31 Compare May 19, 2026 16:21
@roomote
Copy link
Copy Markdown
Contributor Author

roomote Bot commented May 19, 2026

Addressed the three open review threads and pushed the follow-up in d5d5ae3.

The shipped diff now keeps the subtasks probe on the same cancelled child task, captures that resumed child's final completion_result and error messages, and asserts the task stack still ends on the cancelled child instead of spinning up a fresh anotherTaskId. subtasks.ts now validates the actual follow-up answer with toolResultContains(...) and adds a resumed-child replay fixture so mock mode only completes after the resumed follow-up path really answers with 9. ClineProvider.cancelTask() also clears the delegation link before rehydrating a user-cancelled delegated child, with focused unit coverage in ClineProvider.flicker-free-cancel.spec.ts, so finishing that cancelled child no longer reopens the parent it was cancelled from.

Validation on the pushed head is clean: TEST_FILE=subtasks.test pnpm --dir apps/vscode-e2e test:ci:mock, pnpm --dir src test core/webview/__tests__/ClineProvider.flicker-free-cancel.spec.ts, pnpm --dir apps/web-roo-code test, and http://127.0.0.1:3000 all passed. Browser proof was not applicable because this is delegated-task lifecycle and e2e replay coverage work, not a rendered UI change.

Comment thread apps/vscode-e2e/src/suite/subtasks.test.ts Outdated
Comment thread apps/vscode-e2e/src/fixtures/subtasks.ts Outdated
Comment thread apps/vscode-e2e/src/suite/subtasks.test.ts Outdated
Comment thread src/core/webview/ClineProvider.ts Outdated
@edelauna edelauna force-pushed the chore-unskip-e2e-subtasks branch 4 times, most recently from 9f2e99a to da52b4e Compare May 23, 2026 21:25
Comment thread apps/vscode-e2e/src/suite/subtasks.test.ts Outdated
@edelauna edelauna force-pushed the chore-unskip-e2e-subtasks branch from 7cb1196 to 620fe7c Compare May 25, 2026 12:57
taltas
taltas previously approved these changes May 26, 2026
@edelauna
Copy link
Copy Markdown
Contributor

Reminder: This PR is better served by the parallelizable tasks refactor

After a deeper look at the subtask architecture, the whack-a-mole edge cases here are symptoms of an architectural constraint rather than gaps in the test coverage.

The core issue: delegateParentAndOpenChild() and reopenParentFromDelegation() have 7–9 carefully ordered steps specifically to work around three race conditions caused by parent and child sharing a single globalState write path (saveClineMessages → updateTaskHistory). E2e tests expose the timing windows that the step ordering doesn't fully close.

What to do with this PR:
Narrow it to regression tests that pin the existing race mitigations in place (so they don't silently break during the refactor). Avoid expanding the e2e surface here.

The real fix — parallelizable tasks refactor (incremental, no feature flag needed for most of it):

  1. De-staticize Task.lastGlobalApiRequestTime (safe, no behavior change) — unblocks concurrent task instances
  2. Add TaskScheduler with proper task lifecycle ownership — replaces the hand-sequenced delegation with atomic state transitions
  3. Add delegated as a first-class state in agent-state.ts (currently invisible to the CLI state machine)
  4. Only gate parallel tool execution itself behind an experiment flag (PARALLEL_TOOL_EXECUTION) — everything else is additive

With proper task isolation, the state transitions become deterministic and the e2e tests become straightforward.

@edelauna edelauna dismissed taltas’s stale review May 27, 2026 20:50

After review, I'm going to implement / re-enable these e2e tests differantly.

@edelauna edelauna force-pushed the chore-unskip-e2e-subtasks branch 5 times, most recently from e65249b to be02bf6 Compare May 31, 2026 02:10
@edelauna edelauna force-pushed the chore-unskip-e2e-subtasks branch from be02bf6 to 1305342 Compare June 1, 2026 00:42
Copy link
Copy Markdown
Contributor

@edelauna edelauna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prod changes reviewed as correctness changes revealed by e2e tests, not new features

@edelauna edelauna added this pull request to the merge queue Jun 1, 2026
Merged via the queue into main with commit 3df406e Jun 1, 2026
11 checks passed
@edelauna edelauna deleted the chore-unskip-e2e-subtasks branch June 1, 2026 01:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

roomote:auto-resolve-conflicts Allow Roomote to auto-resolve merge conflicts for this PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants