[ENHANCEMENT] Track todo progress step by step instead of all at once #342

@melck

Problem (one or two sentences)

The current update_todo_list tool has three recurring issues: the full list is resent on every status change (costly), LLMs tend to batch all updates at the end instead of updating incrementally, and the markdown string format causes parsing failures when models emit literal \n instead of real newlines.

Context (who is affected and when)

Any user relying on the todo panel to track multi-step task progress. The batching issue makes the panel useless during execution; the newline issue makes the list unreadable (all items on one line); and the token cost grows with every update call for long tasks.

Desired behavior (conceptual, not technical)

  1. The todo list is created once at the start of a task with all planned steps.
  2. Each status change (start / complete / fail a step) is a lightweight atomic call — not a full list replacement.
  3. The panel reflects real-time progress after each individual step.
  4. A step that fails can be marked as such, not silently left as in_progress.

Constraints / preferences (optional)

  • Must not increase token consumption vs. the current implementation.
  • The current update_todo_list could be kept for backward compatibility or bulk resets.
  • Invalid status transitions should return an explicit error to the model instead of silently returning false.

Acceptance criteria (optional)

Given a multi-step task is planned
When the model calls create_todo_list(["Step A", "Step B", "Step C"])
Then the server returns short stable IDs: [{ id: "t0", content: "Step A" }, ...]

Given a step is starting
When the model calls update_todo("t1", "in_progress")
Then only that item is updated, no full list is resent

Given a step fails
When the model calls update_todo("t1", "failed")
Then the item is marked failed and the model can continue or stop

Given the model passes an invalid transition
When update_todo is called with an illegal status
Then an explicit error is returned to the model (not a silent false)

But the model does NOT send the full list on every status transition

Proposed approach (optional)

Replace the single tool with two:

create_todo_list(todos: string[]) → TodoItem[]
  Returns: [{ id: "t0", content: "Step A", status: "pending" }, ...]

update_todo(id: string, status: "in_progress" | "completed" | "failed")
  Returns: updated item, or error if transition is invalid
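The two signatures above could be backed by something like the following minimal sketch. It is an illustration only: the `TodoStore` class, its in-memory array, and the camelCase method names are hypothetical and not taken from the existing codebase, and transition checking is left out here to keep the focus on ID generation and single-item updates.

```typescript
type TodoStatus = "pending" | "in_progress" | "completed" | "failed";

interface TodoItem {
  id: string;
  content: string;
  status: TodoStatus;
}

// Hypothetical in-memory store; a real implementation would keep this per task.
class TodoStore {
  private items: TodoItem[] = [];

  // Creates the list once and returns short sequential IDs (t0, t1, ...).
  createTodoList(todos: string[]): TodoItem[] {
    this.items = todos.map(
      (content, index): TodoItem => ({ id: `t${index}`, content, status: "pending" })
    );
    // Return copies so callers cannot mutate the stored items.
    return this.items.map((item) => ({ ...item }));
  }

  // Updates one item by ID and returns only that item, never the full list.
  updateTodo(id: string, status: Exclude<TodoStatus, "pending">): TodoItem {
    const item = this.items.filter((t) => t.id === id)[0];
    if (!item) {
      // Surface an explicit error rather than a silent false.
      throw new Error(`Unknown todo id: ${id}`);
    }
    item.status = status;
    return { ...item };
  }
}

const store = new TodoStore();
const created = store.createTodoList(["Step A", "Step B", "Step C"]);
const updated = store.updateTodo("t1", "in_progress");
```

Here `created` carries the IDs the model would reuse, and `updated` is the single-item payload for one status change.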

Key design decisions:

  • todos is a plain string[] (not objects) — compact at creation
  • IDs are short sequential strings (t0, t1...) generated server-side and returned at creation — the model reuses them, no calculation needed
  • Add "failed" to todoStatusSchema — currently missing, leaving models stuck when a step errors
  • Invalid transitions return an explicit error instead of silently returning false (current bug in updateTodoStatusForTask)
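One way to make invalid transitions explicit is a small transition table plus a result type that always carries either the new status or an error message. The table below is an assumption: the issue does not define which transitions are legal, and allowing a retry from `failed` is a guess. `checkTransition` and `TransitionResult` are hypothetical names.

```typescript
type TodoStatus = "pending" | "in_progress" | "completed" | "failed";

// Assumed transition table; the issue does not specify the legal transitions.
const ALLOWED: Record<TodoStatus, readonly TodoStatus[]> = {
  pending: ["in_progress"],
  in_progress: ["completed", "failed"],
  completed: [],
  failed: ["in_progress"], // assumption: a failed step may be retried
};

// Either the accepted status or an error the model can read and act on.
type TransitionResult =
  | { ok: true; status: TodoStatus }
  | { ok: false; error: string };

function checkTransition(from: TodoStatus, to: TodoStatus): TransitionResult {
  if (ALLOWED[from].indexOf(to) !== -1) {
    return { ok: true, status: to };
  }
  const allowed = ALLOWED[from].join(", ") || "none";
  return {
    ok: false,
    error: `Invalid transition: ${from} -> ${to}. Allowed from ${from}: ${allowed}`,
  };
}

const accepted = checkTransition("pending", "in_progress");
const rejected = checkTransition("pending", "completed");
```

Returning the allowed targets in the error message gives the model enough information to correct its next call without resending the list.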

Token cost comparison (5-item task, 10 status transitions):

| Approach | Schema | Create | 10 × updates | Total |
|---|---|---|---|---|
| Current (markdown, full list) | ~200 | incl. | ~180 | ~380 |
| JSON full list replacement | ~200 | incl. | ~450 | ~650 |
| Proposed split (shortId) | ~300 | ~40 | ~70 | ~410 |

The split approach is roughly token-neutral vs. current for short tasks (~410 vs. ~380 tokens in this example) and becomes increasingly efficient for longer tasks (10+ items). A JSON full-list replacement would be about 70% more expensive (~650 vs. ~380 tokens) and is not recommended.
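To see how the comparison might scale with list length, the figures above can be turned into a rough cost model. The scaling rules are assumptions, not measurements from this issue: a full-list update is taken to cost ~3.6 tokens per item (180 / (10 × 5)), an atomic update a flat ~7 tokens, creation ~8 tokens per item, and every item goes through two transitions (start, then complete or fail).

```typescript
// All constants are derived from the 5-item, 10-transition example; they are estimates.
const CURRENT_SCHEMA = 200;
const SPLIT_SCHEMA = 300;
const FULL_LIST_TOKENS_PER_ITEM = 180 / (10 * 5); // ~3.6 per item, per update
const ATOMIC_UPDATE_TOKENS = 70 / 10;             // ~7 per update, independent of list size
const CREATE_TOKENS_PER_ITEM = 40 / 5;            // ~8 per item, paid once
const TRANSITIONS_PER_ITEM = 2;                   // assumption: start + finish

// Current tool: every transition resends the whole list.
function currentTotal(items: number): number {
  const transitions = items * TRANSITIONS_PER_ITEM;
  return Math.round(CURRENT_SCHEMA + transitions * (FULL_LIST_TOKENS_PER_ITEM * items));
}

// Proposed split: one creation call, then flat-cost atomic updates.
function splitTotal(items: number): number {
  const transitions = items * TRANSITIONS_PER_ITEM;
  return Math.round(
    SPLIT_SCHEMA + CREATE_TOKENS_PER_ITEM * items + ATOMIC_UPDATE_TOKENS * transitions
  );
}

for (const n of [5, 10, 20]) {
  console.log(`${n} items: current ~${currentTotal(n)}, split ~${splitTotal(n)}`);
}
```

Under these assumptions the model reproduces the ~380 vs. ~410 totals at 5 items, and the split approach pulls ahead as the list grows because the full-list cost rises with both the number of items and the number of transitions.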

Trade-offs / risks (optional)

  • Two tool schemas cost ~100 more tokens in the system prompt than one — break-even is around 8 status transitions.
  • The current update_todo_list (full list replace) remains useful for bulk resets or reordering mid-task; keeping it alongside the new tools avoids a breaking change.
  • If IDs are lost from context (e.g., after compaction), the model cannot call update_todo. Mitigation: create_todo_list could be idempotent and re-callable to retrieve IDs.
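The idempotent mitigation could look roughly like this: calling creation again with the same step list returns the existing items, with their IDs and current statuses, instead of resetting them. This is a sketch under assumptions; `IdempotentTodoStore` and `setStatus` are hypothetical names, and matching lists by exact content equality is one possible rule among several.

```typescript
type TodoStatus = "pending" | "in_progress" | "completed" | "failed";

interface TodoItem {
  id: string;
  content: string;
  status: TodoStatus;
}

class IdempotentTodoStore {
  private items: TodoItem[] = [];

  // Re-calling with an identical list returns existing IDs and statuses unchanged.
  createTodoList(todos: string[]): TodoItem[] {
    const sameList =
      this.items.length === todos.length &&
      this.items.every((item, i) => item.content === todos[i]);
    if (!sameList) {
      // A different list is treated as a new plan and replaces the old one.
      this.items = todos.map(
        (content, i): TodoItem => ({ id: `t${i}`, content, status: "pending" })
      );
    }
    return this.items.map((item) => ({ ...item }));
  }

  // Minimal status setter, included only to demonstrate that state survives a re-call.
  setStatus(id: string, status: TodoStatus): void {
    const item = this.items.filter((t) => t.id === id)[0];
    if (!item) {
      throw new Error(`Unknown todo id: ${id}`);
    }
    item.status = status;
  }
}

const recoveryStore = new IdempotentTodoStore();
const steps = ["Step A", "Step B", "Step C"];
recoveryStore.createTodoList(steps);
recoveryStore.setStatus("t1", "in_progress");

// After context compaction, the model re-sends the same steps to recover the IDs.
const recovered = recoveryStore.createTodoList(steps);
```

The trade-off is that recovery costs one full list in the request, so it is only worth calling when the IDs have actually been lost.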
