
beta v1.17.0 #98

Open
SihaoLiu wants to merge 36 commits into dev from beta

Conversation

Contributor

@SihaoLiu SihaoLiu commented Apr 19, 2026

Summary

  • Promotes the accumulated beta work (viz monitor web dashboard and follow-up fixes) to the dev branch.
  • Establishes the three-tier branching flow: main (production) <- dev (stable next release) <- beta (next month's dev).

Note on beta lifecycle

The beta branch only exists during the window after dev has been frozen but before it has been merged to main. Once dev is merged, beta is automatically promoted to become the new dev.

Test plan

zevorn added 30 commits April 18, 2026 18:07
Introduces the `humanize monitor web` command and its Flask + SPA
dashboard for RLCR sessions, plus the accumulated review-phase
fixes and UX polish.

Feature
- Per-project session discovery and summary (state.md, goal tracker,
  round summaries, review results, methodology reports).
- Pipeline canvas with snake-path node layout, SVG connectors, pan /
  zoom, and per-round flyout details.
- Session-detail page carries a below-canvas live-log panel with a
  three-state toggle (collapsed / normal / expanded) driven by
  --log-h and a single grid-template-rows declaration. Clicking a
  pipeline node auto-collapses the log; closing the flyout restores
  the prior state.
- Incremental pipeline update (_updatePipelineIncremental) appends
  or replaces individual nodes and redraws only the SVG connectors,
  preserving zoom / pan across round additions.
- Diff-based home refresh: only cards whose rendered HTML changed
  are replaced; unchanged cards stay untouched.
- Per-session SSE live-log streaming from ~/.cache/humanize with
  snapshot / append / resync / eof / retained-replay protocol.
- Active-log selection mirrors the CLI monitor: pick the top
  round's codex-run log, with codex-review / gemini-run /
  gemini-review fallbacks.
- REST + WebSocket + SSE transports. Localhost binds skip auth;
  remote binds require a bearer token (Authorization header for
  fetch, ?token= query for SSE) plus origin-matched CSRF gating on
  every mutating route.
- Ops-menu Preview Issue flow: calls local Claude CLI to generate
  a sanitized methodology report (humanize issue taxonomy), shows
  a preview modal, and submits via `gh issue create --repo
  humania-org/humanize`. Falls back to a copy-to-clipboard path
  when gh is unauthenticated.
- Active-session dynamic progress visualization, methodology report
  generation via local Claude CLI, and a comprehensive frontend test
  suite.

Correctness and safety
- broadcast_message uses set.difference_update to avoid the
  UnboundLocalError from Python treating `-=` as a local rebind.
- Session path validation rejects `..`, dotfiles, symlink escapes;
  plan-file reads are constrained to the project / session dir.
- Remote-mode auth fail-closed when no token is configured.
- CSRF layer enforced on every mutating route, including wildcard
  binds and IPv6 literals.
- Parser accepts table-form acceptance criteria and legacy /
  single-letter criterion ids; multi-id table cells are split
  correctly; COMPLETE verdict only fires on terminal lines.
- Finalize-phase classification scoped to the live round, and
  skip-impl sessions correctly classify round 0 as review-only.

Transport and watcher
- watcher.py starts a cache-log observer per active session that
  broadcasts `round_added` when a new round-N-*.log file appears,
  so the dashboard switches the live-log pane to the new round
  without waiting for the next state.md write.
- SessionWatcher.on_session_created hook primes cache-log observers
  for brand-new sessions; every event retries the hook so the
  state-dir-before-cache-dir race eventually heals.
- Non-destructive toggleTheme (no DOM rebuild). The analytics route
  is the only surface whose charts bake CSS vars into SVG at render
  time, so that route alone re-renders on theme change.
- Remote-mode slow polling (10s) uses the same targeted refresh
  helpers as the WS path so card counters and pipeline nodes catch
  up without a full-page reload.
- Session-detail refresh race guard: after awaiting
  /api/sessions/<id>, re-check the active route + layout skeleton
  before mutating DOM.

Install and CLI
- Preflight codex_hooks and idempotent hook merge when paths contain
  spaces.
- Relaxed runtime detection and tolerate unwritable default shim
  paths.
- `humanize monitor web` foreground mode passes --host explicitly,
  brackets IPv6 literals in the printed URL, and keeps the readiness
  probe fail-closed.

Parser and API
- /api/sessions summary carries every field the terminal
  `humanize monitor rlcr` status bar needs (push_every_round,
  ask_codex_question, agent_teams, drift, stall count,
  ultimate_goal, tasks_active / deferred, active_log_path,
  git_status, etc.).
- cache_logs entries carry round / tool / role / path / basename
  for deterministic client-side selection.

Version: 1.17.0.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
`scripts/humanize.sh` is meant to be sourced into the user's
interactive shell (the README shows `source humanize.sh` followed
by `humanize monitor web`). Using `exec` to hand control to the
Flask process replaced that shell, so pressing Ctrl+C — or any
server exit — took the whole interactive session down with it.

Replace the `exec` with a plain subprocess invocation so the
function returns normally when the server stops and the user's
shell prompt stays alive. Daemon mode still delegates to
`viz/scripts/viz-start.sh` and is unaffected.

Fixes Codex review P1 on PR #63
(#63 (comment)).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
viz-restart.sh advertised `viz-restart.sh [--project <path>]` in
its usage string but ran `PROJECT_DIR="${1:-.}"`, which treated
the literal flag name `--project` as a directory. Calling the
documented form would fail at `cd --project` before the restart
could happen.

Mirror the flag-parsing loop used by viz-start.sh and viz-stop.sh
(positional path plus `--project <path>` named form), and forward
`--project` to both helpers so the whole chain uses the same
resolved absolute path.

Fixes Codex review P2 on PR #63
(#63 (comment)).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
The six Chart.js analytics panels on /#/analytics (rounds per
session, avg round duration, verdict distribution, P0-P9 issues,
first COMPLETE round, BitLesson growth) were dropped in the
home-analytics refactor; the home page now carries the same
summary numbers as inline stat cards plus a lightweight inline-
SVG sparkline, and the analytics route surfaces the verdict
timeline + session comparison table without any Chart.js panels.

Clean up the remaining dead references that Codex flagged:
- Delete viz/static/js/charts.js (no longer imported).
- Drop the Chart.js CDN <script> from index.html.
- Drop the /js/charts.js <script> tag from index.html.
- Remove the six chart-panel i18n keys from i18n.js — no other
  caller references them.
- Update tests/test-viz.sh's static-file presence check to match.

Fixes Codex review P2 on PR #63
(#63 (comment)).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
The forbidden-token pattern for imports was
`\b(?:import|require|from)\s+\w+`, which matches ordinary English
prose like "drifted from the original plan structure". That one
phrase appears verbatim in the built-in `plan_execution`
methodology observation text, so every sanitized issue payload
that picked `plan_execution` as a dominant category was marked as
having warnings, and `/api/sessions/<id>/github-issue` rejected the
submission with a 400 even after the outbound body had already
been assembled from the constrained taxonomy.

Replace the single pattern with three code-anchored variants:
  - `^\s*import\s+[\w.]+`                (Python `import x.y` at line start)
  - `^\s*from\s+[\w.]+\s+import\b`       (Python `from x import y` at line start)
  - `\brequire\s*\(`                     (JS / Node `require(` call syntax)

These still catch real import fragments that leak into a
methodology report (bare `import os`, indented `  import sys`,
`from viz.server import app`, `require("fs")`) but leave natural
prose using the preposition "from" alone.
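
The behaviour of the three replacement patterns can be checked directly with Python's `re` module; the sample strings below are illustrative, and applying the anchors per-line via `re.MULTILINE` is an assumption about how the sanitizer scans the report:

```python
import re

# The three code-anchored variants from the commit message.
PATTERNS = [
    re.compile(r'^\s*import\s+[\w.]+', re.MULTILINE),           # Python `import x.y`
    re.compile(r'^\s*from\s+[\w.]+\s+import\b', re.MULTILINE),  # Python `from x import y`
    re.compile(r'\brequire\s*\('),                              # JS/Node `require(`
]

def has_forbidden_import(text):
    return any(p.search(text) for p in PATTERNS)

# Real import fragments are still caught...
assert has_forbidden_import("  import sys")
assert has_forbidden_import("from viz.server import app")
assert has_forbidden_import('require("fs")')
# ...while ordinary prose using the preposition "from" passes clean.
assert not has_forbidden_import("drifted from the original plan structure")
```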

Fixes Codex review P1 on PR #63
(#63 (comment)).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
`read_plan_file()` gated its open() on `os.path.exists(candidate_real)`,
which is True for directories as well as files. A state.md that
carried `plan_file: .` (or any directory under the project / session
that happened to resolve inside the allowed prefix) would therefore
pass validation, skip the backup fallback, and drop into
`open(candidate_real, 'r')` which raises IsADirectoryError. The
exception propagated out through /api/sessions/<id>/plan as an
uncaught 500 instead of surfacing the controlled fallback to the
session-local plan.md (or the intended 404 when no backup exists).

Swap the existence check for `os.path.isfile`, which returns False
for directories and for broken symlinks. Directories and dangling
symlinks now fall through to the existing backup branch.
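
The difference is easy to reproduce in isolation (the temp paths below are illustrative, not the real session layout):

```python
import os
import tempfile

d = tempfile.mkdtemp()
f = os.path.join(d, "plan.md")
with open(f, "w") as fh:
    fh.write("goal")

# exists() is True for directories too -- the root cause of the bug.
assert os.path.exists(d) and os.path.exists(f)
# isfile() rejects the directory while still accepting the file.
assert os.path.isfile(f) and not os.path.isfile(d)

# A dangling symlink: both checks say False, so it also falls
# through to the backup branch instead of reaching open().
link = os.path.join(d, "dangling")
os.symlink(os.path.join(d, "missing"), link)
assert not os.path.exists(link) and not os.path.isfile(link)
```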

Fixes Codex review P2 on PR #63
(#63 (comment)).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
The live-log pane decoded every SSE chunk with
`_utf8Decoder.decode(bytes)`, which is non-streaming: each call
finalises the decoder and replaces any trailing buffered bytes
with U+FFFD. A multibyte UTF-8 codepoint split across the 64 KiB
SSE chunk boundary (any CJK char or emoji that happens to straddle
a chunk) was therefore corrupted even though the source bytes were
valid.

Pass `{ stream: true }` to the decoder on incremental chunks so it
retains leading bytes of an incomplete multibyte sequence and
reassembles on the next append. Explicitly flush the decoder in
the places where the underlying stream actually ends:
  - resync with a `truncated` / `rotated` / `recreated` /
    `overflow` reason: the server is telling us the file is gone
    or restarted from offset 0, so the previous buffer must not
    bleed into whatever arrives next.
  - eof: the stream is terminally closed; emit any trailing
    incomplete sequence as U+FFFD rather than silently dropping it.

Fixes Codex review P2 on PR #63
(#63 (comment)).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
`SessionWatcher.start()` used to spin up an independent watchdog
Observer for every session directory under .humanize/rlcr/ and
never tore any of them down. On projects that have accumulated
dozens of completed sessions this burned inotify slots and
watcher threads for no reason — completed sessions never write
new round-*.log files, so a dedicated per-session cache observer
for them is pure overhead. On heavy hosts it could also prevent
the watcher from starting at all (ENOSPC on inotify watches),
which silently disables the broadcast path.

Two tightening changes:

- Boot-time pass only primes cache observers for sessions whose
  state.md exists and has no terminal *-state.md marker alongside
  (see _TERMINAL_STATE_SUFFIXES). Finished sessions stay
  observer-free.
- The terminal-state file listener now tears down the matching
  observer via a new `on_session_finished` hook on
  RLCREventHandler, so an active session that transitions to
  complete / cancel / stop / maxiter / unexpected / finalize /
  methodology-analysis releases its watcher as soon as the marker
  lands.

`_start_cache_observer()` also re-checks `_session_is_active()`
before allocating the observer — covers the race where a start
callback fires for a session that finished between the event and
the scheduler tick. The existing retry in `_schedule_event()` still
recovers the observer if a cache-log file appears after the
state dir is created but before the cache dir materialises.

Fixes Codex review P1 on PR #63
(#63 cache-observer
resource exhaustion).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
The frontend renders session ids into inline onclick template
literals in many places:
  onclick="navigate('#/session/${s.id}')"
  onclick="opsPreviewIssue(window._currentSession?.id)"
  onclick="openFlyout(this, ${r.number})"
and so on. Any session directory whose name is not well-formed
would therefore let disk state break out of the surrounding JS
string literal when the card / detail / analysis pages render —
a stored-XSS vector if an operator ever creates a hostile session
dir by hand or restores one from an untrusted archive.

Backend already rejected some metacharacters in _get_session_dir()
(`/`, `\`, `..`, leading `.`), but that still left quotes, angle
brackets, backticks, Unicode, etc. Tighten the validator to the
exact shape generated by `setup-rlcr-loop.sh`
(`YYYY-MM-DD_HH-MM-SS`) via _is_safe_session_id(), and reuse it in
two places:
  - _get_session_dir() now rejects every id that doesn't match.
  - /api/sessions filters out any session whose id fails the
    check, so hostile ids never flow to the home-page renderer in
    the first place.

Defense-in-depth: the generator has always produced ids that match
this shape, so legitimate sessions stay visible. The full-fix
front-end migration (inline handlers -> data-attr + event
delegation) is intentionally out of scope here to keep the diff
focused; the regex check already closes the attack surface.

Fixes Codex review P1 on PR #63 (session-id XSS via inline
onclick handlers).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
The SSE log stream carries its bearer token as a `?token=` query
parameter because browsers can't attach custom headers to
EventSource (DEC-4). That is safe on the localhost bind — the URL
never leaves the host — but plain-HTTP Flask on a non-loopback
bind transmits the token in cleartext and deposits it into every
proxy / access-log / browser-history buffer on the way. The
previous guard only required a token to be configured, not that
a TLS-terminating reverse proxy actually sits in front.

Add an explicit operator acknowledgement gate:
  - New `--trust-proxy` flag on `viz/server/app.py`, also readable
    from `HUMANIZE_VIZ_TRUST_PROXY=1`.
  - `main()` refuses to start when `--host` is non-loopback and
    neither the flag nor the env var is set, printing a clear
    "requires a TLS-terminating reverse proxy" message.
  - `scripts/humanize.sh::_humanize_monitor_web` accepts the same
    `--trust-proxy` flag and forwards it in both foreground and
    `--daemon` invocations (through `viz-start.sh`).
  - `viz/scripts/viz-start.sh` parses `--trust-proxy` and appends
    it to the Python command line.

Deployment model stays unchanged for localhost users and for
remote operators who already run behind nginx / caddy — they just
add one flag to acknowledge the topology. Operators who accidentally
expose plain HTTP get an explicit refusal instead of a silent
cleartext-token foot-gun.

Fixes Codex review P1 on PR #63 (token leakage over plain HTTP
on non-loopback binds).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
`safeMd()` renders every piece of session-authored markdown that
flows into the dashboard (plan files, round summaries, review
results, methodology reports, Preview Issue modal). When the
DOMPurify CDN dep wasn't loaded — offline deployment, blocked
CDN, or a CSP that refused unpkg.com — the old implementation
returned the raw `marked.parse()` output, which re-opens every
script-injection vector the sanitizer was supposed to close.

Switch to fail-closed: when either `DOMPurify` or `marked` is
undefined, wrap the escaped plain text in a `<pre>` block so the
degradation is visible (mono text with no markdown formatting)
rather than silently permissive. When both libs are present the
existing parse-then-sanitize path is unchanged.

Fixes Codex review P2 on PR #63 (DOMPurify fail-open when the CDN
script doesn't load).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
The per-session SSE generator decides whether to emit EOF by
calling `_get_session(session_id, force_refresh=True)` every
`_SSE_POLL_INTERVAL_SECONDS` (250 ms). That path runs the full
parse_session pipeline — scans every round-*-summary.md and
review-result.md, parses goal-tracker.md, re-reads the methodology
report, and shells out to `git` twice for the git-status summary.
On long sessions and with multiple live SSE clients, parse and
fork overhead stacks up and eats into the latency budget that the
streaming protocol doc reserves for append emission.

Replace that cache-invalidating fetch with `_session_is_terminal_cheap()`:
an `os.path.isfile` check against the seven known terminal-state
marker filenames plus the existing `_get_session_dir` path guard.
Terminal state is a disk-level signal, so the cheap path is
authoritative — no parser output is needed to decide whether the
loop is still writing logs. A missing session dir is treated as
terminal so the SSE generator exits cleanly instead of spinning.

Full session data is still available to frontend callers via
`/api/sessions/<id>`; this change only narrows the hot-loop
probe. The `_is_terminal_status()` helper remains for callers that
already have a parsed session dict.

Fixes Codex review P2 on PR #63 (force_refresh=True on every SSE
poll tick).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
`/api/sessions/<id>/generate-report` short-circuited whenever
`methodology-analysis-report.md` already existed with non-zero
size. For a session that was still running, that meant every
subsequent Preview Issue click returned the analysis of the
*first* round the user previewed — not a fresh read of the
additional round summaries / review results that had landed
since. The GitHub issue payload built from the cached report
therefore described a session snapshot that no longer existed.

Add `_report_is_stale(session_dir, report_path)` that compares the
report's mtime against every round-*-summary.md and
round-*-review-result.md in the session directory. A newer source
file invalidates the cache and forces a fresh Claude CLI run.
Callers can also pass `?force=1` to skip the cache unconditionally.

Report is served as-is when none of the source files have landed
after its mtime — quick wins for sessions that finished an hour
ago still hit the fast path.

Fixes Codex review P2 on PR #63 (generate-report cache never
invalidates on active sessions).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
viz-restart.sh only captured --project and handed that to
viz-start.sh; --host / --port / --auth-token / --trust-proxy were
silently dropped. A daemon started with `humanize monitor web
--host 0.0.0.0 --port 18000 --auth-token TOK --trust-proxy
--daemon` would therefore come back in loopback mode on restart,
with a different auto-picked port and no auth — the previous
access URL broke without warning.

Parse the full flag set that viz-start.sh accepts, re-assemble
an argv in a deterministic order, and forward it verbatim. The
usage string documents that callers are responsible for repeating
the original flags; we don't attempt to read them back from
`viz.url` / env because those are deliberately immutable runtime
artifacts.

Fixes Codex review P2 on PR #63 (viz-restart drops --host /
--port / --auth-token).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
The `/ws` handler enforced "localhost bind only" but otherwise
accepted every incoming handshake without checking the Origin
header. Browsers deliberately allow cross-origin WebSocket
handshakes — the Same-Origin Policy does not apply there — so a
malicious page open in the same browser could connect to
ws://localhost:<port>/ws and immediately send a `cancel_session`
message, killing an active RLCR loop with zero auth prompt.

Reuse the existing `_origin_matches_request()` matcher (the same
logic that gates mutating HTTP routes via CSRF) before adding the
socket to `_ws_clients`. Connections without an Origin header are
treated as same-origin (curl, server-to-server callers): the
localhost bind already refuses non-loopback clients and the
Origin header is effectively mandatory from browsers on the
WebSocket handshake.

Fixes Codex review P1 on PR #63 (cross-origin WebSocket
`cancel_session` vector).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
The initial strict regex
(`^[0-9]{4}-[0-9]{2}-[0-9]{2}_[0-9]{2}-[0-9]{2}-[0-9]{2}$`)
rejected every test fixture that uses a short slug like
`2026-04-17_CL`, breaking 26 assertions in
tests/test-app-routes-live.sh even though the underlying id is
benign. Widen the accepted set to ASCII letters / digits /
underscore / dash / period (the union of characters that the
on-disk session generator has ever produced plus what the CI
fixtures rely on), but keep the extra rules that reject `..`,
leading-dot, and path separators.

Quote, backtick, angle-bracket, backslash, newline, and every
other JS-string metacharacter are still refused up-front, which
is the property the original defense-in-depth was after: hostile
disk state cannot break out of the frontend's inline onclick
template literals.
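
A sketch of the widened validator, combining the character class with the extra structural rules the commit keeps (function and pattern names here are illustrative):

```python
import re

# ASCII letters / digits / underscore / dash / period only; quotes,
# backticks, angle brackets, backslashes, newlines, and all other
# JS-string metacharacters fail the class outright.
_SAFE_ID = re.compile(r'^[A-Za-z0-9._-]+$')

def is_safe_session_id(sid):
    if not _SAFE_ID.match(sid):
        return False
    # Extra rules retained from the strict version: no `..` segments,
    # no leading dot (path separators already fail the class above).
    if '..' in sid or sid.startswith('.'):
        return False
    return True

assert is_safe_session_id("2026-04-17_12-00-00")    # generator shape
assert is_safe_session_id("2026-04-17_CL")          # CI fixture slug
assert not is_safe_session_id("x'); alert(1);//")   # JS metacharacters
assert not is_safe_session_id("../../etc")
assert not is_safe_session_id(".hidden")
```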

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Behind a TLS-terminating reverse proxy (nginx / caddy / the
--trust-proxy deployment mode added in 948ea7c), the Flask
back-channel always sees plain HTTP. `request.scheme` is
therefore `http`, so `_default_port_for_scheme()` collapsed the
browser-facing port to 80. A browser Origin of
`https://example.com` parses as port 443, and
`_origin_matches_request()` rejected legitimate same-origin POSTs
with 403 — cancel, generate-report, and the Preview-Issue GitHub
submission all broke on the standard HTTPS reverse-proxy setup.

Add `_effective_request_scheme()`. When TRUST_PROXY is True (the
operator has acknowledged the reverse-proxy topology) it trusts
the first hop of `X-Forwarded-Proto` for scheme resolution, so a
browser-facing `https://` flow resolves to port 443 and matches
the incoming Origin. When TRUST_PROXY is False (default, direct-
connect localhost dev) the header is ignored — attacker-supplied
`X-Forwarded-Proto: https` cannot trick a plain-HTTP dashboard
into computing the wrong port for the CSRF matcher.

The existing `--trust-proxy` / HUMANIZE_VIZ_TRUST_PROXY gate now
also promotes the resolved flag into a module-level `TRUST_PROXY`
global so the CSRF helpers can read it; the earlier main()
local is folded into the same initialisation block.
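
The resolution rule can be sketched standalone (names are illustrative; the real helper reads Flask's request object and the module-level `TRUST_PROXY`):

```python
def effective_request_scheme(request_scheme, forwarded_proto, trust_proxy):
    """Resolve the browser-facing scheme for the CSRF port check.

    forwarded_proto is the raw X-Forwarded-Proto value, which may be
    a comma-separated hop list; only the first hop is trusted.
    """
    if trust_proxy and forwarded_proto:
        return forwarded_proto.split(",")[0].strip().lower()
    # Default (direct-connect localhost dev): ignore the header so an
    # attacker-supplied X-Forwarded-Proto cannot skew the port math.
    return request_scheme

def default_port_for_scheme(scheme):
    return 443 if scheme == "https" else 80

# Behind a trusted TLS proxy, http back-channel resolves to 443...
assert default_port_for_scheme(
    effective_request_scheme("http", "https", True)) == 443
# ...while an untrusted direct connection stays on the raw scheme.
assert default_port_for_scheme(
    effective_request_scheme("http", "https", False)) == 80
```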

Fixes Codex review P1 on PR #63 (HTTPS reverse-proxy same-origin
mismatch).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Round 20 Codex P2 (PRRT_kwDOQ4a3IM57_U2A). `_determine_phase` returned
`code_review` unconditionally when `round_num > build_round`, so the
live finalize round — which by definition sits past `build_finish_round`
— was classified as `code_review` and the later finalize-specific
branch never ran. Phase timeline and duration metrics therefore hid the
finalize step for normal sessions holding a `.review-phase-started`
marker.

The finalize classification now runs as a priority gate: when the
session is finalizing and the round equals the current round, return
`finalize` before the code_review fall-through. Applies to both normal
and skip-impl paths so the live finalize round surfaces correctly in
either mode.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Round 20 Codex P2 (PRRT_kwDOQ4a3IM57_U2B). `_ensure_cache_watcher`
started one `CacheLogWatcher` per unique cache directory and never
stopped it, so long-running dashboard processes leaked one observer
thread and one inotify handle per session a user ever browsed. On
hosts with a small `fs.inotify.max_user_watches` budget that
eventually exhausts the pool.

Replace the start-only helper with an acquire/release pair:
- `_acquire_cache_watcher(cache_dir)` starts the watcher on first use
  and increments a per-directory refcount.
- `_release_cache_watcher(cache_dir)` decrements, stops the observer
  on the final release.

The SSE generator in `stream_session_log` now wraps its body in a
try/finally that acquires before the first yield and releases in the
finally block. Normal EOF, `GeneratorExit` from a client disconnect,
and exception paths all balance the refcount, so watchers are torn
down as soon as their last live stream goes away while concurrent
clients on the same session keep sharing one observer.
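
The acquire/release pattern is a plain refcount; a minimal pool sketch (hypothetical names, with start/stop callbacks standing in for the watchdog observer lifecycle) looks like:

```python
import threading

class WatcherPool:
    """Refcounted watcher pool mirroring the acquire/release pair."""

    def __init__(self, start, stop):
        self._start, self._stop = start, stop
        self._refs = {}               # cache_dir -> live consumer count
        self._lock = threading.Lock()

    def acquire(self, cache_dir):
        with self._lock:
            if self._refs.get(cache_dir, 0) == 0:
                self._start(cache_dir)        # first consumer starts it
            self._refs[cache_dir] = self._refs.get(cache_dir, 0) + 1

    def release(self, cache_dir):
        with self._lock:
            self._refs[cache_dir] -= 1
            if self._refs[cache_dir] == 0:
                del self._refs[cache_dir]
                self._stop(cache_dir)         # last consumer stops it
```

The generator-side discipline is the important half: acquire before the first yield, release in a `finally`, so EOF, `GeneratorExit`, and exceptions all balance the count.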

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
CI route-backed truncation test (tests/test-app-routes-live.sh Group 6)
is prone to a watcher/writer race: when the writer does
``with open(path, 'wb') as f: f.write(data)`` the file is truncated
to 0 bytes before the second write lands. The per-cache watcher's
callback can fire on that first IN_MODIFY event and invoke
``_poll_locked`` while the file is still empty, so the resync path's
inline ``_snapshot_locked`` emits zero snapshot events and the content
subsequently arrives as ``append`` — violating the protocol contract's
resync -> snapshot sequencing that downstream clients rely on.

Track a ``_resync_pending`` flag that every resync path sets whenever
its immediate snapshot attempt returned empty (file was 0 bytes at
the time). The next poll that observes ``size > self._offset`` while
the flag is set emits the new content via ``_snapshot_locked`` instead
of ``_append``, and clears the flag once caught up. Normal write paths
are unaffected: the flag stays False when the resync snapshot sees
content immediately, so the append fast-path still covers steady-state
streaming.

Fixes the GitHub Actions ``run-all-tests`` failure on Round 20:
  FAIL: route-backed truncation event stream incomplete:
        snapshots=1 resync_truncated=True eof=True

The three existing resync sites (recreated, rotated, truncated) share
the same pattern so the fix is uniform across all of them.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Round 21 Codex P1 (PRRT_kwDOQ4a3IM57_efh). `_TERMINAL_STATE_FILES`
listed `methodology-analysis-state.md` and `finalize-state.md` among
the markers that tell the SSE generator to emit `eof` and close the
stream. Both markers are actually *active* states — `_is_terminal_status`
excludes `analyzing` / `finalizing`, and the dashboard still treats
those sessions as cancellable — so `_session_is_terminal_cheap` was
mis-classifying live sessions as finished.

The observable impact was that as soon as a session entered finalize
or methodology-analysis, every live SSE client saw an immediate
`eof` event and the log pane stopped updating, even though the
backing codex-run / codex-review log files were still being written.

Trim the marker list to the five truly-terminal states (complete,
cancel, stop, maxiter, unexpected) so SSE now keeps delivering
append events throughout the analyzing/finalizing phases. A short
comment pins this list to `_is_terminal_status` so future markers
don't drift out of sync again.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Round 21 Codex P2 (PRRT_kwDOQ4a3IM57_efi). `parse_git_status` in
`viz/server/parser.py` checked `x == 'M' or y == 'M'` before the
index-side `A` branch. That made a porcelain status like `AM`
(new file added to the index, then modified in the worktree —
the common "stage a new file, then tweak it" workflow) count as
`modified` and never as `added`, so the dashboard git summary
drifted out of parity with `humanize_parse_git_status` in
`scripts/humanize.sh`, which explicitly maps `AM` to `added`.

Reorder the branches so any index-side `A` wins first (covers
`A `, `AM`, `AD`), followed by `R` (rename → modified) then
`D` (deletion), and only then the `M` fallback. Matches the
shell helper's explicit priority exactly and stops the dashboard
header from undercounting added files whenever the user has a
staged-new-file with an in-progress edit.
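
The reordered priority is compact enough to sketch directly; `xy` below is the two-character XY field from `git status --porcelain` output, and the function is an illustration of the branch order, not the shipped parser:

```python
def classify_porcelain(xy):
    """Classify one porcelain XY status pair by the fixed priority."""
    x, y = xy[0], xy[1]
    if x == 'A':                  # index-side add wins: 'A ', 'AM', 'AD'
        return 'added'
    if x == 'R':                  # rename counts as modified
        return 'modified'
    if x == 'D' or y == 'D':
        return 'deleted'
    if x == 'M' or y == 'M':      # M fallback only after A / R / D
        return 'modified'
    return 'other'

assert classify_porcelain('AM') == 'added'     # the bug case
assert classify_porcelain(' M') == 'modified'
assert classify_porcelain('R ') == 'modified'
```

With the old order (`M` checked first), `AM` fell into the `modified` branch before the `A` check could run.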

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Round 22 Codex P1 (PRRT_kwDOQ4a3IM57_u12).
``LogStreamRegistry.get_or_create`` had no matching delete path, so
every ``(session_id, basename)`` ever opened lived until the dashboard
process exited. Each ``LogStream`` keeps a 256-event retention deque
(often large base64 snapshot/append chunks), so browsing many
sessions/round logs grew memory without bound on long-running viz
servers.

Extend the registry with explicit ``acquire`` / ``release`` helpers
that ref-count active SSE consumers, and expose ``LogStream.eof_emitted``
for the release path to query. Eviction policy:
- refcount > 0: keep the stream (clients still reading).
- refcount == 0 AND stream has emitted EOF: drop from the registry
  and free the retention deque.
- refcount == 0 AND session still active (no EOF yet): keep the
  stream so future reconnects still land inside the 256-event
  ``Last-Event-Id`` replay window the streaming contract mandates.

Wire the SSE route to ``_acquire_log_stream`` / ``_release_log_stream``:
the release now sits in the same generator ``finally`` that already
releases the cache watcher, so normal EOF, client disconnect, and
exception paths all balance the refcount. ``get_or_create`` stays
callable (tests exercise it directly without refcount semantics).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Round 22 Codex P2 (PRRT_kwDOQ4a3IM57_u13). The SPA fallback route
probed ``os.path.isfile(os.path.join(STATIC_DIR, path))`` before
handing off to ``send_from_directory``. ``os.path.join`` does not
reject traversal segments, so a request like
``GET /..%2f..%2fetc%2fpasswd`` differentiated its response based on
whether the traversal target existed on disk:
- existing target → ``send_from_directory`` branch → 404
- missing target → SPA fallback → 200 ``index.html``

Because the static route is intentionally unauthenticated, that
turned the dashboard into an unauthenticated filesystem-existence
oracle for any path readable by the server process.

Swap ``os.path.join`` for ``werkzeug.utils.safe_join``: it returns
``None`` whenever the resolved path would escape ``STATIC_DIR``,
so traversal attempts skip the ``os.path.isfile`` probe entirely
and fall straight through to the SPA fallback. Legitimate static
assets keep working (they resolve to an absolute path inside
STATIC_DIR and pass both checks). No new CDN dep is introduced —
``werkzeug.utils`` is already imported transitively via Flask.
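
`safe_join` does the heavy lifting here; its core property can be illustrated with a stdlib-only stand-in (a simplified sketch — werkzeug's real implementation has more rules, e.g. null bytes and Windows drive letters):

```python
import os

def safe_join_sketch(directory, path):
    """Return the joined path, or None if it escapes `directory`."""
    base = os.path.realpath(directory)
    candidate = os.path.realpath(os.path.join(base, path))
    # Anything resolving outside the base directory is rejected, so
    # traversal attempts never reach the os.path.isfile probe.
    if candidate != base and not candidate.startswith(base + os.sep):
        return None
    return candidate

assert safe_join_sketch("/srv/static", "js/app.js") == "/srv/static/js/app.js"
assert safe_join_sketch("/srv/static", "../../etc/passwd") is None
```

In the route, a `None` result skips the existence probe entirely and falls straight through to the SPA fallback, so the response no longer differentiates on whether the traversal target exists.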

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Round 23 Codex P2 (PRRT_kwDOQ4a3IM57_zC9). The earlier eviction path
only dropped a stream when ``refcount`` reached zero AND
``eof_emitted`` was already true. A client that disconnected before
its session reached a terminal state therefore left the stream (and
its 256-event retention deque, often full of base64 snapshot/append
chunks) resident for the process lifetime — no later code path marks
EOF on an unpolled stream whose session quietly ended. On long-lived
dashboard instances where users briefly browse many active sessions,
that accumulated memory without bound.

Add a second eviction axis: an idle-TTL timer recorded whenever a
release drops the refcount to zero without EOF. Every ``release``
doubles as an opportunistic sweep — entries whose idle timestamp is
older than ``IDLE_STREAM_TTL_SECONDS`` (5 min) and still have
refcount=0 are dropped regardless of EOF status. ``acquire`` clears
the idle timestamp so normal reconnect-within-window flows keep
retention (the streaming contract's 256-event ``Last-Event-Id``
replay window), while late reconnects fall through to the existing
``resync(overflow)`` + fresh-snapshot path.

The constructor accepts an override so tests can drive eviction
with a deterministic short TTL without waiting 5 minutes of
wall-clock time.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Round 23 Codex P1 (PRRT_kwDOQ4a3IM57_zC8). ``/api/analytics`` fed the
raw output of ``list_sessions(PROJECT_DIR)`` into ``compute_analytics``
without the ``_is_safe_session_id`` filter that ``/api/sessions``
already applies. A crafted on-disk session directory whose name
contained quote / JS metacharacters could therefore reach the
Analytics comparison table, where ``buildCmpTable`` interpolated
``s.session_id`` directly into an inline
``onclick="navigate('#/session/${id}')"`` handler plus cell HTML —
the same XSS vector the rest of the API had already closed.

Two layers of defence:

1. Backend: drop sessions whose id fails ``_is_safe_session_id``
   before handing them to the analyzer, so the Analytics payload
   cannot carry an unsafe id at all.
2. Frontend: stop interpolating the id through an inline
   ``onclick`` string literal. ``buildCmpTable`` now emits a
   ``data-session-id`` attribute (escaped via ``_esc``) and binds
   navigation through a delegated click listener; every other cell
   value is also escaped for consistency. Even if a future backend
   regression lets an unsafe id slip past the filter, the value
   flows only through ``dataset`` (a DOM-level string that is never
   re-parsed as JS) and ``window.location`` — neither evaluates
   markup.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Round 24 Codex P2 (PRRT_kwDOQ4a3IM57_5AV). ``compute_analytics`` read
``s['current_round']`` as if it were a round count, but the field is
a 0-based *index* — a session that finished round 0 reports
``current_round=0`` even though its ``rounds`` list has one entry.
Two knock-on effects on the Analytics UI:

- ``overview.average_rounds`` undercounted every non-empty session
  by one. The prior filter ``current_round > 0`` also excluded
  single-round sessions entirely, so the average silently skipped
  the shortest runs instead of representing them.
- ``session_stats[*].rounds`` (the comparison table's ``Rounds``
  column) reported ``0`` for sessions whose only round was round 0
  and ``N-1`` for sessions with N rounds completed.

Switch both sites to ``len(s['rounds'])``. The parser already
constructs ``rounds`` from ``range(max_disk_round + 1)`` so its
length is the authoritative completed-round count, and it naturally
covers sessions where on-disk review files exceed ``current_round``
(the parser's own ``max_disk_round`` rule). Drop the ``> 0`` filter
in favour of ``n > 0`` on the derived length so single-round
sessions now participate in the average.
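The index-versus-count distinction can be sketched as follows (function names here are illustrative, not the repo's):

```python
def round_total(session):
    """Completed rounds: the length of the parsed 'rounds' list, never the
    0-based current_round index (which lags the real count by one)."""
    return len(session.get("rounds") or [])

def average_rounds(sessions):
    """Average over sessions with at least one completed round, so a
    single-round session participates instead of being filtered out."""
    counts = [n for s in sessions if (n := round_total(s)) > 0]
    return sum(counts) / len(counts) if counts else 0.0
```

A session with `current_round=0` and one parsed round thus contributes `1`, not `0`, and is no longer excluded from the average.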

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Round 25 Codex P2 (PRRT_kwDOQ4a3IM58A-pN). The GitHub-issue generator
rendered ``A {current_round}-round RLCR session ...`` where
``current_round`` is a 0-based *index*. A session whose only round
was round 0 reported ``0-round`` in the outbound issue text, and
every other session lost one round in the summary.

Swap to ``len(s['rounds'])``, matching the Round 24 fix in
``compute_analytics``. The parsed ``rounds`` list is the authoritative
count: the parser builds it from ``range(max_disk_round + 1)`` so it
covers both the normal ``current_round + 1`` case and the edge case
where on-disk review files exceed ``current_round``. The sanitized
taxonomy-only issue body now reflects the real session size.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Round 25 Codex P2 (PRRT_kwDOQ4a3IM58A-pP). The Markdown exporter put
``session['current_round']`` into the ``Rounds`` overview cell, which
is a 0-based round index. A session whose only round was round 0
therefore exported ``| Rounds | 0 |`` and every other session lost
one round in the generated report. Downstream consumers (users
archiving session summaries, anything grepping the export) saw
incorrect session totals.

Use ``len(session.get('rounds') or [])`` to match the analytics and
sanitized-issue fixes from Round 24/25: the parsed rounds list is
the authoritative count regardless of how ``current_round`` relates
to ``max_disk_round`` (they can diverge when on-disk review files
exceed ``current_round``). The guard against a missing ``rounds``
key also hardens the exporter so a partially-parsed session cannot
raise AttributeError mid-render.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Round 26 Codex P2 (PRRT_kwDOQ4a3IM58BEtU). The sanitized GitHub issue
body's ``Quantitative Summary`` table still read ``| Total rounds |
{s["current_round"]} |``. ``current_round`` is a 0-based round
index, so the table under-reported every session by one (``0`` for
a single-round session, ``N-1`` for an N-round session) even though
the Context line above already switched to ``round_total`` in
Round 25.

Reuse the ``round_total = len(s.get('rounds') or [])`` value already
computed earlier in the same function so the Quantitative Summary
and the Context paragraph agree on the session size. Downstream
issue readers no longer see conflicting counts between the two
sections of the generated report.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
feat: add humanize monitor web dashboard for RLCR sessions
@SihaoLiu SihaoLiu changed the title Promote beta to dev: viz monitor web dashboard beta v1.17.0 Apr 19, 2026
@SihaoLiu SihaoLiu mentioned this pull request Apr 19, 2026
@chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d67da3bace

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread viz/server/app.py Outdated
origin_host = (parsed.hostname or '').lower()
if not origin_host:
    return False
origin_port = parsed.port or _default_port_for_scheme(parsed.scheme)

P2: Handle invalid Origin ports instead of throwing 500

_origin_matches_request() reads parsed.port directly, but urllib.parse raises ValueError for malformed Origin values such as http://host:bad or out-of-range ports. That exception currently bubbles through _enforce_csrf_protection() and turns mutating requests into 500s instead of a controlled 403 reject, so a malformed client header can break cancel/report/issue POSTs. Treat invalid origin ports as non-matching origins and return False here.


Comment thread viz/server/parser.py Outdated
if len(rounds) >= 2:
    first_mtime = None
    last_mtime = None
    for rn in range(current_round + 1):

P2: Compute session duration from full on-disk round range

parse_session() correctly expands parsed rounds up to max_disk_round, but session_duration_minutes is still computed with for rn in range(current_round + 1). In sessions where state.current_round lags behind summaries already present on disk (a case this parser explicitly supports), duration is undercounted or left None, which propagates to /api/sessions and analytics displays. The duration loop should use the same max parsed round range rather than current_round only.


urlparse() accepts malformed port forms like host:bad or host:999999
without raising, but accessing parsed.port on the result raises
ValueError. Previously that exception bubbled through
_enforce_csrf_protection(), turning mutating requests into 500s for
any client sending a malformed Origin header instead of the intended
controlled 403 reject.

Wrap the parsed.port access in try/except ValueError and treat the
value as a non-matching origin so cancel / report / issue POSTs stay
available when a malformed Origin arrives. Add a live Flask
test_client regression for bad / out-of-range / non-integer Origin
ports.
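The guarded access can be sketched like this (the `_default_port_for_scheme` stand-in and function name are assumptions, not the app's exact code):

```python
from urllib.parse import urlparse

# Assumed stand-in for the app's _default_port_for_scheme helper.
_DEFAULT_PORTS = {"http": 80, "https": 443}

def origin_port_or_none(origin):
    """Effective port of an Origin header, or None when malformed.

    urlparse() happily accepts 'http://host:bad'; the ValueError only
    fires when .port is read, so that access is what must be guarded."""
    parsed = urlparse(origin)
    try:
        port = parsed.port             # may raise for ':bad' or ':999999'
    except ValueError:
        return None                    # caller treats as non-matching -> 403
    if port is None:
        return _DEFAULT_PORTS.get(parsed.scheme)
    return port
```

Returning `None` lets the origin check fall through to the controlled 403 path instead of bubbling a 500 out of the CSRF enforcement.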
parse_session() correctly expands rounds up to max_disk_round so the
/api/sessions payload surfaces every summary on disk even when
state.current_round lags behind. The session_duration_minutes loop
was still range(current_round + 1), so sessions whose current_round
lagged behind real progress reported an undercounted or None duration
that propagated to /api/sessions and analytics displays.

Iterate over max_disk_round instead so the duration loop matches the
round-expansion logic, and add a parser regression whose
current_round=0 session contains three summary files with staggered
mtimes spanning four minutes.
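The fixed loop can be sketched roughly as below; the `round-{n}-summary.md` filename pattern is an assumption for illustration, not necessarily the parser's real layout.

```python
import os

def session_duration_minutes(session_dir, max_disk_round):
    """Sketch of the fixed duration loop: scan range(max_disk_round + 1)
    so summaries already on disk count even when state.current_round lags.
    The 'round-{n}-summary.md' filename is an assumption here."""
    first = last = None
    for rn in range(max_disk_round + 1):
        path = os.path.join(session_dir, f"round-{rn}-summary.md")
        if not os.path.exists(path):
            continue
        mtime = os.path.getmtime(path)
        first = mtime if first is None else min(first, mtime)
        last = mtime if last is None else max(last, mtime)
    if first is None:
        return None                    # no summaries on disk yet
    return (last - first) / 60.0
```

Iterating to `max_disk_round` keeps the duration consistent with the round-expansion logic rather than with a possibly stale `current_round`.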
cancel-rlcr-session.sh interpolates --session-id directly into
SESSION_DIR, so values like ../foo or /absolute/path previously
escaped the per-project .humanize/rlcr tree and could rename state
files outside the intended session. Any caller forwarding an
unvalidated id would therefore mutate unrelated project state.

Validate --session-id before building the target path: reject path
separators, backslashes, leading dots, and anything outside the
alphanumeric / dot / underscore / dash character class that the
date-based session ids produced by setup-rlcr-loop.sh fit into. Emit
exit code 3 with an explanatory message for every rejected form.
Extend tests/test-cancel-session.sh with sibling-directory fixtures
proving traversal ids do not mutate files outside the session tree.
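The validation rule can be illustrated in Python (the real check lives in cancel-rlcr-session.sh as a shell pattern and exits with code 3 on reject; the regex below is an approximation of the described character class):

```python
import re

# Approximation of the character class described above; the authoritative
# check is the shell pattern in cancel-rlcr-session.sh.
_SESSION_ID_RE = re.compile(r"[A-Za-z0-9_][A-Za-z0-9._-]*")

def validate_session_id(session_id):
    """True iff the id is safe to join onto .humanize/rlcr/<id>."""
    if not session_id or session_id.startswith("."):
        return False                   # rejects '.', '..', '.hidden'
    if "/" in session_id or "\\" in session_id:
        return False                   # rejects path separators
    return bool(_SESSION_ID_RE.fullmatch(session_id))
```

The date-based ids produced by setup-rlcr-loop.sh fit this class, while traversal forms like `../foo` or absolute paths are rejected before any path is built.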
LogStreamRegistry only ran _sweep_idle_streams_locked() from inside
release(), so a stream whose refcount dropped to zero without EOF
stayed resident for the process lifetime whenever no subsequent
release() fired after the idle TTL. Long-lived dashboards with
low-churn traffic leaked every LogStream that ever had a one-off
disconnect, together with its retained event buffer.

Piggyback the sweep onto acquire() and streams_in_cache_dir() too.
New SSE connections anywhere on the dashboard reclaim stale
retention deques from unrelated sessions, and the cache watcher
callback (which calls streams_in_cache_dir on every observed write)
keeps eviction driven by ongoing activity instead of only by
balanced refcount transitions. Add a streaming regression that
rewinds _idle_since and verifies both call-sites evict idle streams.
@SihaoLiu
Contributor Author

@codex review this PR as entirety target branch dev

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Can't wait for the next one!


@SihaoLiu
Contributor Author

Codex review cycle complete: no further findings.

