
beta v1.17.0 #98

Open
SihaoLiu wants to merge 36 commits into dev from beta

Conversation

Contributor

@SihaoLiu SihaoLiu commented Apr 19, 2026

Summary

  • Promotes the accumulated beta work (viz monitor web dashboard and follow-up fixes) to the dev branch.
  • Establishes the three-tier branching flow: main (production) <- dev (stable next release) <- beta (next month's dev).

Note on beta lifecycle

The beta branch only exists during the window after dev has been frozen but before it has been merged to main. Once dev is merged, beta is automatically promoted to become the new dev.

Test plan

zevorn added 30 commits April 18, 2026 18:07
Introduces the `humanize monitor web` command and its Flask + SPA
dashboard for RLCR sessions, plus the accumulated review-phase
fixes and UX polish.

Feature
- Per-project session discovery and summary (state.md, goal tracker,
  round summaries, review results, methodology reports).
- Pipeline canvas with snake-path node layout, SVG connectors, pan /
  zoom, and per-round flyout details.
- Session-detail page carries a below-canvas live-log panel with a
  three-state toggle (collapsed / normal / expanded) driven by
  --log-h and a single grid-template-rows declaration. Clicking a
  pipeline node auto-collapses the log; closing the flyout restores
  the prior state.
- Incremental pipeline update (_updatePipelineIncremental) appends
  or replaces individual nodes and redraws only the SVG connectors,
  preserving zoom / pan across round additions.
- Diff-based home refresh: only cards whose rendered HTML changed
  are replaced; unchanged cards stay untouched.
- Per-session SSE live-log streaming from ~/.cache/humanize with
  snapshot / append / resync / eof / retained-replay protocol.
- Active-log selection mirrors the CLI monitor: pick the top
  round's codex-run log, with codex-review / gemini-run /
  gemini-review fallbacks.
- REST + WebSocket + SSE transports. Localhost binds skip auth;
  remote binds require a bearer token (Authorization header for
  fetch, ?token= query for SSE) plus origin-matched CSRF gating on
  every mutating route.
- Ops-menu Preview Issue flow: calls local Claude CLI to generate
  a sanitized methodology report (humanize issue taxonomy), shows
  a preview modal, and submits via `gh issue create --repo
  humania-org/humanize`. Falls back to a copy-to-clipboard path
  when gh is unauthenticated.
- Active-session dynamic progress visualization, methodology report
  generation via local Claude CLI, and a comprehensive frontend test
  suite.

Correctness and safety
- broadcast_message uses set.difference_update to avoid the
  UnboundLocalError from Python treating `-=` as a local rebind.
- Session path validation rejects `..`, dotfiles, symlink escapes;
  plan-file reads are constrained to the project / session dir.
- Remote-mode auth fail-closed when no token is configured.
- CSRF layer enforced on every mutating route, including wildcard
  binds and IPv6 literals.
- Parser accepts table-form acceptance criteria and legacy /
  single-letter criterion ids; multi-id table cells are split
  correctly; COMPLETE verdict only fires on terminal lines.
- Finalize-phase classification scoped to the live round, and
  skip-impl sessions correctly classify round 0 as review-only.

Transport and watcher
- watcher.py starts a cache-log observer per active session that
  broadcasts `round_added` when a new round-N-*.log file appears,
  so the dashboard switches the live-log pane to the new round
  without waiting for the next state.md write.
- SessionWatcher.on_session_created hook primes cache-log observers
  for brand-new sessions; every event retries the hook so the
  state-dir-before-cache-dir race eventually heals.
- Non-destructive toggleTheme (no DOM rebuild). The analytics route
  is the only surface whose charts bake CSS vars into SVG at render
  time, so that route alone re-renders on theme change.
- Remote-mode slow polling (10s) uses the same targeted refresh
  helpers as the WS path so card counters and pipeline nodes catch
  up without a full-page reload.
- Session-detail refresh race guard: after awaiting
  /api/sessions/<id>, re-check the active route + layout skeleton
  before mutating DOM.

Install and CLI
- Preflight codex_hooks and idempotent hook merge when paths contain
  spaces.
- Relaxed runtime detection and tolerate unwritable default shim
  paths.
- `humanize monitor web` foreground mode passes --host explicitly,
  brackets IPv6 literals in the printed URL, and keeps the readiness
  probe fail-closed.

Parser and API
- /api/sessions summary carries every field the terminal
  `humanize monitor rlcr` status bar needs (push_every_round,
  ask_codex_question, agent_teams, drift, stall count,
  ultimate_goal, tasks_active / deferred, active_log_path,
  git_status, etc.).
- cache_logs entries carry round / tool / role / path / basename
  for deterministic client-side selection.

Version: 1.17.0.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
`scripts/humanize.sh` is meant to be sourced into the user's
interactive shell (the README shows `source humanize.sh` followed
by `humanize monitor web`). Using `exec` to hand control to the
Flask process replaced that shell, so pressing Ctrl+C — or any
server exit — took the whole interactive session down with it.

Replace the `exec` with a plain subprocess invocation so the
function returns normally when the server stops and the user's
shell prompt stays alive. Daemon mode still delegates to
`viz/scripts/viz-start.sh` and is unaffected.

Fixes Codex review P1 on PR #63
(#63 (comment)).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
viz-restart.sh advertised `viz-restart.sh [--project <path>]` in
its usage string but ran `PROJECT_DIR="${1:-.}"`, which treated
the literal flag name `--project` as a directory. Calling the
documented form would fail at `cd --project` before the restart
could happen.

Mirror the flag-parsing loop used by viz-start.sh and viz-stop.sh
(positional path plus `--project <path>` named form), and forward
`--project` to both helpers so the whole chain uses the same
resolved absolute path.

Fixes Codex review P2 on PR #63
(#63 (comment)).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
The six Chart.js analytics panels on /#/analytics (rounds per
session, avg round duration, verdict distribution, P0-P9 issues,
first COMPLETE round, BitLesson growth) were dropped in the
home-analytics refactor; the home page now carries the same
summary numbers as inline stat cards plus a lightweight inline-
SVG sparkline, and the analytics route surfaces the verdict
timeline + session comparison table without any Chart.js panels.

Clean up the remaining dead references that Codex flagged:
- Delete viz/static/js/charts.js (no longer imported).
- Drop the Chart.js CDN <script> from index.html.
- Drop the /js/charts.js <script> tag from index.html.
- Remove the six chart-panel i18n keys from i18n.js — no other
  caller references them.
- Update tests/test-viz.sh's static-file presence check to match.

Fixes Codex review P2 on PR #63
(#63 (comment)).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
The forbidden-token pattern for imports was
`\b(?:import|require|from)\s+\w+`, which matches ordinary English
prose like "drifted from the original plan structure". That one
phrase appears verbatim in the built-in `plan_execution`
methodology observation text, so every sanitized issue payload
that picked `plan_execution` as a dominant category was marked as
having warnings, and `/api/sessions/<id>/github-issue` rejected the
submission with a 400 even after the outbound body had already
been assembled from the constrained taxonomy.

Replace the single pattern with three code-anchored variants:
  - `^\s*import\s+[\w.]+`                (Python `import x.y` at line start)
  - `^\s*from\s+[\w.]+\s+import\b`       (Python `from x import y` at line start)
  - `\brequire\s*\(`                     (JS / Node `require(` call syntax)

These still catch real import fragments that leak into a
methodology report (bare `import os`, indented `  import sys`,
`from viz.server import app`, `require("fs")`) but leave natural
prose using the preposition "from" alone.
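
The behaviour of the three replacement patterns can be checked directly with Python's `re` module; the sample strings below are illustrative, and applying the anchors per-line via `re.MULTILINE` is an assumption about how the sanitizer scans the report:

```python
import re

# The three code-anchored variants from the commit message.
PATTERNS = [
    re.compile(r'^\s*import\s+[\w.]+', re.MULTILINE),           # Python `import x.y`
    re.compile(r'^\s*from\s+[\w.]+\s+import\b', re.MULTILINE),  # Python `from x import y`
    re.compile(r'\brequire\s*\('),                              # JS/Node `require(`
]

def has_forbidden_import(text):
    return any(p.search(text) for p in PATTERNS)

# Real import fragments are still caught...
assert has_forbidden_import("  import sys")
assert has_forbidden_import("from viz.server import app")
assert has_forbidden_import('require("fs")')
# ...while ordinary prose using the preposition "from" passes clean.
assert not has_forbidden_import("drifted from the original plan structure")
```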

Fixes Codex review P1 on PR #63
(#63 (comment)).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
`read_plan_file()` gated its open() on `os.path.exists(candidate_real)`,
which is True for directories as well as files. A state.md that
carried `plan_file: .` (or any directory under the project / session
that happened to resolve inside the allowed prefix) would therefore
pass validation, skip the backup fallback, and drop into
`open(candidate_real, 'r')` which raises IsADirectoryError. The
exception propagated out through /api/sessions/<id>/plan as an
uncaught 500 instead of surfacing the controlled fallback to the
session-local plan.md (or the intended 404 when no backup exists).

Swap the existence check for `os.path.isfile`, which returns False
for directories and for broken symlinks. Directories and dangling
symlinks now fall through to the existing backup branch.
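
The difference is easy to reproduce in isolation (the temp paths below are illustrative, not the real session layout):

```python
import os
import tempfile

d = tempfile.mkdtemp()
f = os.path.join(d, "plan.md")
with open(f, "w") as fh:
    fh.write("goal")

# exists() is True for directories too -- the root cause of the bug.
assert os.path.exists(d) and os.path.exists(f)
# isfile() rejects the directory while still accepting the file.
assert os.path.isfile(f) and not os.path.isfile(d)

# A dangling symlink: both checks say False, so it also falls
# through to the backup branch instead of reaching open().
link = os.path.join(d, "dangling")
os.symlink(os.path.join(d, "missing"), link)
assert not os.path.exists(link) and not os.path.isfile(link)
```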

Fixes Codex review P2 on PR #63
(#63 (comment)).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
The live-log pane decoded every SSE chunk with
`_utf8Decoder.decode(bytes)`, which is non-streaming: each call
finalises the decoder and replaces any trailing buffered bytes
with U+FFFD. A multibyte UTF-8 codepoint split across the 64 KiB
SSE chunk boundary (any CJK char or emoji that happens to straddle
a chunk) was therefore corrupted even though the source bytes were
valid.

Pass `{ stream: true }` to the decoder on incremental chunks so it
retains leading bytes of an incomplete multibyte sequence and
reassembles on the next append. Explicitly flush the decoder in
the places where the underlying stream actually ends:
  - resync with a `truncated` / `rotated` / `recreated` /
    `overflow` reason: the server is telling us the file is gone
    or restarted from offset 0, so the previous buffer must not
    bleed into whatever arrives next.
  - eof: the stream is terminally closed; emit any trailing
    incomplete sequence as U+FFFD rather than silently dropping it.

Fixes Codex review P2 on PR #63
(#63 (comment)).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
`SessionWatcher.start()` used to spin up an independent watchdog
Observer for every session directory under .humanize/rlcr/ and
never tore any of them down. On projects that have accumulated
dozens of completed sessions this burned inotify slots and
watcher threads for no reason — completed sessions never write
new round-*.log files, so a dedicated per-session cache observer
for them is pure overhead. On heavy hosts it could also prevent
the watcher from starting at all (ENOSPC on inotify watches),
which silently disables the broadcast path.

Two tightening changes:

- Boot-time pass only primes cache observers for sessions whose
  state.md exists and has no terminal *-state.md marker alongside
  (see _TERMINAL_STATE_SUFFIXES). Finished sessions stay
  observer-free.
- The terminal-state file listener now tears down the matching
  observer via a new `on_session_finished` hook on
  RLCREventHandler, so an active session that transitions to
  complete / cancel / stop / maxiter / unexpected / finalize /
  methodology-analysis releases its watcher as soon as the marker
  lands.

`_start_cache_observer()` also re-checks `_session_is_active()`
before allocating the observer — covers the race where a start
callback fires for a session that finished between the event and
the scheduler tick. The existing retry in `_schedule_event()` still
recovers the observer if a cache-log file appears after the
state dir is created but before the cache dir materialises.

Fixes Codex review P1 on PR #63
(#63 cache-observer
resource exhaustion).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
The frontend renders session ids into inline onclick template
literals in many places:
  onclick="navigate('#/session/${s.id}')"
  onclick="opsPreviewIssue(window._currentSession?.id)"
  onclick="openFlyout(this, ${r.number})"
and so on. Any session directory whose name is not well-formed
would therefore let disk state break out of the surrounding JS
string literal when the card / detail / analysis pages render —
a stored-XSS vector if an operator ever creates a hostile session
dir by hand or restores one from an untrusted archive.

Backend already rejected some metacharacters in _get_session_dir()
(`/`, `\`, `..`, leading `.`), but that still left quotes, angle
brackets, backticks, Unicode, etc. Tighten the validator to the
exact shape generated by `setup-rlcr-loop.sh`
(`YYYY-MM-DD_HH-MM-SS`) via _is_safe_session_id(), and reuse it in
two places:
  - _get_session_dir() now rejects every id that doesn't match.
  - /api/sessions filters out any session whose id fails the
    check, so hostile ids never flow to the home-page renderer in
    the first place.

Defense-in-depth: the generator has always produced ids that match
this shape, so legitimate sessions stay visible. The full-fix
front-end migration (inline handlers -> data-attr + event
delegation) is intentionally out of scope here to keep the diff
focused; the regex check already closes the attack surface.

Fixes Codex review P1 on PR #63 (session-id XSS via inline
onclick handlers).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
The SSE log stream carries its bearer token as a `?token=` query
parameter because browsers can't attach custom headers to
EventSource (DEC-4). That is safe on the localhost bind — the URL
never leaves the host — but plain-HTTP Flask on a non-loopback
bind transmits the token in cleartext and deposits it into every
proxy / access-log / browser-history buffer on the way. The
previous guard only required a token to be configured, not that
a TLS-terminating reverse proxy actually sits in front.

Add an explicit operator acknowledgement gate:
  - New `--trust-proxy` flag on `viz/server/app.py`, also readable
    from `HUMANIZE_VIZ_TRUST_PROXY=1`.
  - `main()` refuses to start when `--host` is non-loopback and
    neither the flag nor the env var is set, printing a clear
    "requires a TLS-terminating reverse proxy" message.
  - `scripts/humanize.sh::_humanize_monitor_web` accepts the same
    `--trust-proxy` flag and forwards it in both foreground and
    `--daemon` invocations (through `viz-start.sh`).
  - `viz/scripts/viz-start.sh` parses `--trust-proxy` and appends
    it to the Python command line.

Deployment model stays unchanged for localhost users and for
remote operators who already run behind nginx / caddy — they just
add one flag to acknowledge the topology. Operators who accidentally
expose plain HTTP get an explicit refusal instead of a silent
cleartext-token foot-gun.

Fixes Codex review P1 on PR #63 (token leakage over plain HTTP
on non-loopback binds).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
`safeMd()` renders every piece of session-authored markdown that
flows into the dashboard (plan files, round summaries, review
results, methodology reports, Preview Issue modal). When the
DOMPurify CDN dep wasn't loaded — offline deployment, blocked
CDN, or a CSP that refused unpkg.com — the old implementation
returned the raw `marked.parse()` output, which re-opens every
script-injection vector the sanitizer was supposed to close.

Switch to fail-closed: when either `DOMPurify` or `marked` is
undefined, wrap the escaped plain text in a `<pre>` block so the
degradation is visible (mono text with no markdown formatting)
rather than silently permissive. When both libs are present the
existing parse-then-sanitize path is unchanged.

Fixes Codex review P2 on PR #63 (DOMPurify fail-open when the CDN
script doesn't load).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
The per-session SSE generator decides whether to emit EOF by
calling `_get_session(session_id, force_refresh=True)` every
`_SSE_POLL_INTERVAL_SECONDS` (250 ms). That path runs the full
parse_session pipeline — scans every round-*-summary.md and
review-result.md, parses goal-tracker.md, re-reads the methodology
report, and shells out to `git` twice for the git-status summary.
On long sessions and with multiple live SSE clients, parse and
fork overhead stacks up and eats into the latency budget that the
streaming protocol doc reserves for append emission.

Replace that cache-invalidating fetch with `_session_is_terminal_cheap()`:
an `os.path.isfile` check against the seven known terminal-state
marker filenames plus the existing `_get_session_dir` path guard.
Terminal state is a disk-level signal, so the cheap path is
authoritative — no parser output is needed to decide whether the
loop is still writing logs. A missing session dir is treated as
terminal so the SSE generator exits cleanly instead of spinning.

Full session data is still available to frontend callers via
`/api/sessions/<id>`; this change only narrows the hot-loop
probe. The `_is_terminal_status()` helper remains for callers that
already have a parsed session dict.

Fixes Codex review P2 on PR #63 (force_refresh=True on every SSE
poll tick).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
`/api/sessions/<id>/generate-report` short-circuited whenever
`methodology-analysis-report.md` already existed with non-zero
size. For a session that was still running, that meant every
subsequent Preview Issue click returned the analysis of the
*first* round the user previewed — not a fresh read of the
additional round summaries / review results that had landed
since. The GitHub issue payload built from the cached report
therefore described a session snapshot that no longer existed.

Add `_report_is_stale(session_dir, report_path)` that compares the
report's mtime against every round-*-summary.md and
round-*-review-result.md in the session directory. A newer source
file invalidates the cache and forces a fresh Claude CLI run.
Callers can also pass `?force=1` to skip the cache unconditionally.

Report is served as-is when none of the source files have landed
after its mtime — quick wins for sessions that finished an hour
ago still hit the fast path.

Fixes Codex review P2 on PR #63 (generate-report cache never
invalidates on active sessions).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
viz-restart.sh only captured --project and handed that to
viz-start.sh; --host / --port / --auth-token / --trust-proxy were
silently dropped. A daemon started with `humanize monitor web
--host 0.0.0.0 --port 18000 --auth-token TOK --trust-proxy
--daemon` would therefore come back in loopback mode on restart,
with a different auto-picked port and no auth — the previous
access URL broke without warning.

Parse the full flag set that viz-start.sh accepts, re-assemble
an argv in a deterministic order, and forward it verbatim. The
usage string documents that callers are responsible for repeating
the original flags; we don't attempt to read them back from
`viz.url` / env because those are deliberately immutable runtime
artifacts.

Fixes Codex review P2 on PR #63 (viz-restart drops --host /
--port / --auth-token).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
The `/ws` handler enforced "localhost bind only" but otherwise
accepted every incoming handshake without checking the Origin
header. Browsers deliberately allow cross-origin WebSocket
handshakes — the Same-Origin Policy does not apply there — so a
malicious page open in the same browser could connect to
ws://localhost:<port>/ws and immediately send a `cancel_session`
message, killing an active RLCR loop with zero auth prompt.

Reuse the existing `_origin_matches_request()` matcher (the same
logic that gates mutating HTTP routes via CSRF) before adding the
socket to `_ws_clients`. Connections without an Origin header are
treated as same-origin (curl, server-to-server callers): the
localhost bind already refuses non-loopback clients and the
Origin header is effectively mandatory from browsers on the
WebSocket handshake.

Fixes Codex review P1 on PR #63 (cross-origin WebSocket
`cancel_session` vector).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
The initial strict regex
(`^[0-9]{4}-[0-9]{2}-[0-9]{2}_[0-9]{2}-[0-9]{2}-[0-9]{2}$`)
rejected every test fixture that uses a short slug like
`2026-04-17_CL`, breaking 26 assertions in
tests/test-app-routes-live.sh even though the underlying id is
benign. Widen the accepted set to ASCII letters / digits /
underscore / dash / period (the union of characters that the
on-disk session generator has ever produced plus what the CI
fixtures rely on), but keep the extra rules that reject `..`,
leading-dot, and path separators.

Quote, backtick, angle-bracket, backslash, newline, and every
other JS-string metacharacter are still refused up-front, which
is the property the original defense-in-depth was after: hostile
disk state cannot break out of the frontend's inline onclick
template literals.
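
A sketch of the widened validator, combining the character class with the extra structural rules the commit keeps (function and pattern names here are illustrative):

```python
import re

# ASCII letters / digits / underscore / dash / period only; quotes,
# backticks, angle brackets, backslashes, newlines, and all other
# JS-string metacharacters fail the class outright.
_SAFE_ID = re.compile(r'^[A-Za-z0-9._-]+$')

def is_safe_session_id(sid):
    if not _SAFE_ID.match(sid):
        return False
    # Extra rules retained from the strict version: no `..` segments,
    # no leading dot (path separators already fail the class above).
    if '..' in sid or sid.startswith('.'):
        return False
    return True

assert is_safe_session_id("2026-04-17_12-00-00")    # generator shape
assert is_safe_session_id("2026-04-17_CL")          # CI fixture slug
assert not is_safe_session_id("x'); alert(1);//")   # JS metacharacters
assert not is_safe_session_id("../../etc")
assert not is_safe_session_id(".hidden")
```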

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Behind a TLS-terminating reverse proxy (nginx / caddy / the
--trust-proxy deployment mode added in 948ea7c), the Flask
back-channel always sees plain HTTP. `request.scheme` is
therefore `http`, so `_default_port_for_scheme()` collapsed the
browser-facing port to 80. A browser Origin of
`https://example.com` parses as port 443, and
`_origin_matches_request()` rejected legitimate same-origin POSTs
with 403 — cancel, generate-report, and the Preview-Issue GitHub
submission all broke on the standard HTTPS reverse-proxy setup.

Add `_effective_request_scheme()`. When TRUST_PROXY is True (the
operator has acknowledged the reverse-proxy topology) it trusts
the first hop of `X-Forwarded-Proto` for scheme resolution, so a
browser-facing `https://` flow resolves to port 443 and matches
the incoming Origin. When TRUST_PROXY is False (default, direct-
connect localhost dev) the header is ignored — attacker-supplied
`X-Forwarded-Proto: https` cannot trick a plain-HTTP dashboard
into computing the wrong port for the CSRF matcher.

The existing `--trust-proxy` / HUMANIZE_VIZ_TRUST_PROXY gate now
also promotes the resolved flag into a module-level `TRUST_PROXY`
global so the CSRF helpers can read it; the earlier main()
local is folded into the same initialisation block.
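
The resolution rule can be sketched standalone (names are illustrative; the real helper reads Flask's request object and the module-level `TRUST_PROXY`):

```python
def effective_request_scheme(request_scheme, forwarded_proto, trust_proxy):
    """Resolve the browser-facing scheme for the CSRF port check.

    forwarded_proto is the raw X-Forwarded-Proto value, which may be
    a comma-separated hop list; only the first hop is trusted.
    """
    if trust_proxy and forwarded_proto:
        return forwarded_proto.split(",")[0].strip().lower()
    # Default (direct-connect localhost dev): ignore the header so an
    # attacker-supplied X-Forwarded-Proto cannot skew the port math.
    return request_scheme

def default_port_for_scheme(scheme):
    return 443 if scheme == "https" else 80

# Behind a trusted TLS proxy, http back-channel resolves to 443...
assert default_port_for_scheme(
    effective_request_scheme("http", "https", True)) == 443
# ...while an untrusted direct connection stays on the raw scheme.
assert default_port_for_scheme(
    effective_request_scheme("http", "https", False)) == 80
```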

Fixes Codex review P1 on PR #63 (HTTPS reverse-proxy same-origin
mismatch).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Round 20 Codex P2 (PRRT_kwDOQ4a3IM57_U2A). `_determine_phase` returned
`code_review` unconditionally when `round_num > build_round`, so the
live finalize round — which by definition sits past `build_finish_round`
— was classified as `code_review` and the later finalize-specific
branch never ran. Phase timeline and duration metrics therefore hid the
finalize step for normal sessions holding a `.review-phase-started`
marker.

The finalize classification now runs as a priority gate: when the
session is finalizing and the round equals the current round, return
`finalize` before the code_review fall-through. Applies to both normal
and skip-impl paths so the live finalize round surfaces correctly in
either mode.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Round 20 Codex P2 (PRRT_kwDOQ4a3IM57_U2B). `_ensure_cache_watcher`
started one `CacheLogWatcher` per unique cache directory and never
stopped it, so long-running dashboard processes leaked one observer
thread and one inotify handle per session a user ever browsed. On
hosts with a small `fs.inotify.max_user_watches` budget that
eventually exhausts the pool.

Replace the start-only helper with an acquire/release pair:
- `_acquire_cache_watcher(cache_dir)` starts the watcher on first use
  and increments a per-directory refcount.
- `_release_cache_watcher(cache_dir)` decrements, stops the observer
  on the final release.

The SSE generator in `stream_session_log` now wraps its body in a
try/finally that acquires before the first yield and releases in the
finally block. Normal EOF, `GeneratorExit` from a client disconnect,
and exception paths all balance the refcount, so watchers are torn
down as soon as their last live stream goes away while concurrent
clients on the same session keep sharing one observer.
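
The acquire/release pattern is a plain refcount; a minimal pool sketch (hypothetical names, with start/stop callbacks standing in for the watchdog observer lifecycle) looks like:

```python
import threading

class WatcherPool:
    """Refcounted watcher pool mirroring the acquire/release pair."""

    def __init__(self, start, stop):
        self._start, self._stop = start, stop
        self._refs = {}               # cache_dir -> live consumer count
        self._lock = threading.Lock()

    def acquire(self, cache_dir):
        with self._lock:
            if self._refs.get(cache_dir, 0) == 0:
                self._start(cache_dir)        # first consumer starts it
            self._refs[cache_dir] = self._refs.get(cache_dir, 0) + 1

    def release(self, cache_dir):
        with self._lock:
            self._refs[cache_dir] -= 1
            if self._refs[cache_dir] == 0:
                del self._refs[cache_dir]
                self._stop(cache_dir)         # last consumer stops it
```

The generator-side discipline is the important half: acquire before the first yield, release in a `finally`, so EOF, `GeneratorExit`, and exceptions all balance the count.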

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
CI route-backed truncation test (tests/test-app-routes-live.sh Group 6)
is prone to a watcher/writer race: when the writer does
``with open(path, 'wb') as f: f.write(data)`` the file is truncated
to 0 bytes before the second write lands. The per-cache watcher's
callback can fire on that first IN_MODIFY event and invoke
``_poll_locked`` while the file is still empty, so the resync path's
inline ``_snapshot_locked`` emits zero snapshot events and the content
subsequently arrives as ``append`` — violating the protocol contract's
resync -> snapshot sequencing that downstream clients rely on.

Track a ``_resync_pending`` flag that every resync path sets whenever
its immediate snapshot attempt returned empty (file was 0 bytes at
the time). The next poll that observes ``size > self._offset`` while
the flag is set emits the new content via ``_snapshot_locked`` instead
of ``_append``, and clears the flag once caught up. Normal write paths
are unaffected: the flag stays False when the resync snapshot sees
content immediately, so the append fast-path still covers steady-state
streaming.

Fixes the GitHub Actions ``run-all-tests`` failure on Round 20:
  FAIL: route-backed truncation event stream incomplete:
        snapshots=1 resync_truncated=True eof=True

The three existing resync sites (recreated, rotated, truncated) share
the same pattern so the fix is uniform across all of them.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Round 21 Codex P1 (PRRT_kwDOQ4a3IM57_efh). `_TERMINAL_STATE_FILES`
listed `methodology-analysis-state.md` and `finalize-state.md` among
the markers that tell the SSE generator to emit `eof` and close the
stream. Both markers are actually *active* states — `_is_terminal_status`
excludes `analyzing` / `finalizing`, and the dashboard still treats
those sessions as cancellable — so `_session_is_terminal_cheap` was
mis-classifying live sessions as finished.

The observable impact was that as soon as a session entered finalize
or methodology-analysis, every live SSE client saw an immediate
`eof` event and the log pane stopped updating, even though the
backing codex-run / codex-review log files were still being written.

Trim the marker list to the five truly-terminal states (complete,
cancel, stop, maxiter, unexpected) so SSE now keeps delivering
append events throughout the analyzing/finalizing phases. A short
comment pins this list to `_is_terminal_status` so future markers
don't drift out of sync again.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Round 21 Codex P2 (PRRT_kwDOQ4a3IM57_efi). `parse_git_status` in
`viz/server/parser.py` checked `x == 'M' or y == 'M'` before the
index-side `A` branch. That made a porcelain status like `AM`
(new file added to the index, then modified in the worktree —
the common "stage a new file, then tweak it" workflow) count as
`modified` and never as `added`, so the dashboard git summary
drifted out of parity with `humanize_parse_git_status` in
`scripts/humanize.sh`, which explicitly maps `AM` to `added`.

Reorder the branches so any index-side `A` wins first (covers
`A `, `AM`, `AD`), followed by `R` (rename → modified) then
`D` (deletion), and only then the `M` fallback. Matches the
shell helper's explicit priority exactly and stops the dashboard
header from undercounting added files whenever the user has a
staged-new-file with an in-progress edit.
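
The reordered priority is compact enough to sketch directly; `xy` below is the two-character XY field from `git status --porcelain` output, and the function is an illustration of the branch order, not the shipped parser:

```python
def classify_porcelain(xy):
    """Classify one porcelain XY status pair by the fixed priority."""
    x, y = xy[0], xy[1]
    if x == 'A':                  # index-side add wins: 'A ', 'AM', 'AD'
        return 'added'
    if x == 'R':                  # rename counts as modified
        return 'modified'
    if x == 'D' or y == 'D':
        return 'deleted'
    if x == 'M' or y == 'M':      # M fallback only after A / R / D
        return 'modified'
    return 'other'

assert classify_porcelain('AM') == 'added'     # the bug case
assert classify_porcelain(' M') == 'modified'
assert classify_porcelain('R ') == 'modified'
```

With the old order (`M` checked first), `AM` fell into the `modified` branch before the `A` check could run.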

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Round 22 Codex P1 (PRRT_kwDOQ4a3IM57_u12).
``LogStreamRegistry.get_or_create`` had no matching delete path, so
every ``(session_id, basename)`` ever opened lived until the dashboard
process exited. Each ``LogStream`` keeps a 256-event retention deque
(often large base64 snapshot/append chunks), so browsing many
sessions/round logs grew memory without bound on long-running viz
servers.

Extend the registry with explicit ``acquire`` / ``release`` helpers
that ref-count active SSE consumers, and expose ``LogStream.eof_emitted``
for the release path to query. Eviction policy:
- refcount > 0: keep the stream (clients still reading).
- refcount == 0 AND stream has emitted EOF: drop from the registry
  and free the retention deque.
- refcount == 0 AND session still active (no EOF yet): keep the
  stream so future reconnects still land inside the 256-event
  ``Last-Event-Id`` replay window the streaming contract mandates.

Wire the SSE route to ``_acquire_log_stream`` / ``_release_log_stream``:
the release now sits in the same generator ``finally`` that already
releases the cache watcher, so normal EOF, client disconnect, and
exception paths all balance the refcount. ``get_or_create`` stays
callable (tests exercise it directly without refcount semantics).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Round 22 Codex P2 (PRRT_kwDOQ4a3IM57_u13). The SPA fallback route
probed ``os.path.isfile(os.path.join(STATIC_DIR, path))`` before
handing off to ``send_from_directory``. ``os.path.join`` does not
reject traversal segments, so a request like
``GET /..%2f..%2fetc%2fpasswd`` differentiated its response based on
whether the traversal target existed on disk:
- existing target → ``send_from_directory`` branch → 404
- missing target → SPA fallback → 200 ``index.html``

Because the static route is intentionally unauthenticated, that
turned the dashboard into an unauthenticated filesystem-existence
oracle for any path readable by the server process.

Swap ``os.path.join`` for ``werkzeug.utils.safe_join``: it returns
``None`` whenever the resolved path would escape ``STATIC_DIR``,
so traversal attempts skip the ``os.path.isfile`` probe entirely
and fall straight through to the SPA fallback. Legitimate static
assets keep working (they resolve to an absolute path inside
STATIC_DIR and pass both checks). No new CDN dep is introduced —
``werkzeug.utils`` is already imported transitively via Flask.
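
`safe_join` does the heavy lifting here; its core property can be illustrated with a stdlib-only stand-in (a simplified sketch — werkzeug's real implementation has more rules, e.g. null bytes and Windows drive letters):

```python
import os

def safe_join_sketch(directory, path):
    """Return the joined path, or None if it escapes `directory`."""
    base = os.path.realpath(directory)
    candidate = os.path.realpath(os.path.join(base, path))
    # Anything resolving outside the base directory is rejected, so
    # traversal attempts never reach the os.path.isfile probe.
    if candidate != base and not candidate.startswith(base + os.sep):
        return None
    return candidate

assert safe_join_sketch("/srv/static", "js/app.js") == "/srv/static/js/app.js"
assert safe_join_sketch("/srv/static", "../../etc/passwd") is None
```

In the route, a `None` result skips the existence probe entirely and falls straight through to the SPA fallback, so the response no longer differentiates on whether the traversal target exists.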

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Round 23 Codex P2 (PRRT_kwDOQ4a3IM57_zC9). The earlier eviction path
only dropped a stream when ``refcount`` reached zero AND
``eof_emitted`` was already true. A client that disconnected before
its session reached a terminal state therefore left the stream (and
its 256-event retention deque, often full of base64 snapshot/append
chunks) resident for the process lifetime — no later code path marks
EOF on an unpolled stream whose session quietly ended. On long-lived
dashboard instances where users briefly browse many active sessions,
that accumulated memory without bound.

Add a second eviction axis: an idle-TTL timer recorded whenever a
release drops the refcount to zero without EOF. Every ``release``
doubles as an opportunistic sweep — entries whose idle timestamp is
older than ``IDLE_STREAM_TTL_SECONDS`` (5 min) and still have
refcount=0 are dropped regardless of EOF status. ``acquire`` clears
the idle timestamp so normal reconnect-within-window flows keep
retention (the streaming contract's 256-event ``Last-Event-Id``
replay window), while late reconnects fall through to the existing
``resync(overflow)`` + fresh-snapshot path.

The constructor accepts an override so tests can drive eviction
with a deterministic short TTL without waiting 5 minutes of
wall-clock time.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Round 23 Codex P1 (PRRT_kwDOQ4a3IM57_zC8). ``/api/analytics`` fed the
raw output of ``list_sessions(PROJECT_DIR)`` into ``compute_analytics``
without the ``_is_safe_session_id`` filter that ``/api/sessions``
already applies. A crafted on-disk session directory whose name
contained quote / JS metacharacters could therefore reach the
Analytics comparison table, where ``buildCmpTable`` interpolated
``s.session_id`` directly into an inline
``onclick="navigate('#/session/${id}')"`` handler plus cell HTML —
the same XSS vector the rest of the API had already closed.

Two layers of defence:

1. Backend: drop sessions whose id fails ``_is_safe_session_id``
   before handing them to the analyzer, so the Analytics payload
   cannot carry an unsafe id at all.
2. Frontend: stop interpolating the id through an inline
   ``onclick`` string literal. ``buildCmpTable`` now emits a
   ``data-session-id`` attribute (escaped via ``_esc``) and binds
   navigation through a delegated click listener; every other cell
   value is also escaped for consistency. Even if a future backend
   regression lets an unsafe id slip past the filter, the value
   flows only through ``dataset`` (a DOM-level string that is never
   re-parsed as JS) and ``window.location`` — neither evaluates
   markup.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Round 24 Codex P2 (PRRT_kwDOQ4a3IM57_5AV). ``compute_analytics`` read
``s['current_round']`` as if it were a round count, but the field is
a 0-based *index* — a session that finished round 0 reports
``current_round=0`` even though its ``rounds`` list has one entry.
Two knock-on effects on the Analytics UI:

- ``overview.average_rounds`` undercounted every non-empty session
  by one. The prior filter ``current_round > 0`` also excluded
  single-round sessions entirely, so the average silently skipped
  the shortest runs instead of representing them.
- ``session_stats[*].rounds`` (the comparison table's ``Rounds``
  column) reported ``0`` for sessions whose only round was round 0
  and ``N-1`` for sessions with N rounds completed.

Switch both sites to ``len(s['rounds'])``. The parser already
constructs ``rounds`` from ``range(max_disk_round + 1)`` so its
length is the authoritative completed-round count, and it naturally
covers sessions where on-disk review files exceed ``current_round``
(the parser's own ``max_disk_round`` rule). Drop the ``> 0`` filter
in favour of ``n > 0`` on the derived length so single-round
sessions now participate in the average.
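The index-versus-count distinction can be sketched as follows (function names here are illustrative, not the repo's):

```python
def round_total(session):
    """Completed rounds: the length of the parsed 'rounds' list, never the
    0-based current_round index (which lags the real count by one)."""
    return len(session.get("rounds") or [])

def average_rounds(sessions):
    """Average over sessions with at least one completed round, so a
    single-round session participates instead of being filtered out."""
    counts = [n for s in sessions if (n := round_total(s)) > 0]
    return sum(counts) / len(counts) if counts else 0.0
```

A session with `current_round=0` and one parsed round thus contributes `1`, not `0`, and is no longer excluded from the average.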

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Round 25 Codex P2 (PRRT_kwDOQ4a3IM58A-pN). The GitHub-issue generator
rendered ``A {current_round}-round RLCR session ...`` where
``current_round`` is a 0-based *index*. A session whose only round
was round 0 reported ``0-round`` in the outbound issue text, and
every other session lost one round in the summary.

Swap to ``len(s['rounds'])``, matching the Round 24 fix in
``compute_analytics``. The parsed ``rounds`` list is the authoritative
count: the parser builds it from ``range(max_disk_round + 1)`` so it
covers both the normal ``current_round + 1`` case and the edge case
where on-disk review files exceed ``current_round``. The sanitized
taxonomy-only issue body now reflects the real session size.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Round 25 Codex P2 (PRRT_kwDOQ4a3IM58A-pP). The Markdown exporter put
``session['current_round']`` into the ``Rounds`` overview cell, which
is a 0-based round index. A session whose only round was round 0
therefore exported ``| Rounds | 0 |`` and every other session lost
one round in the generated report. Downstream consumers (users
archiving session summaries, anything grepping the export) saw
incorrect session totals.

Use ``len(session.get('rounds') or [])`` to match the analytics and
sanitized-issue fixes from Round 24/25: the parsed rounds list is
the authoritative count regardless of how ``current_round`` relates
to ``max_disk_round`` (they can diverge when on-disk review files
exceed ``current_round``). The guard against a missing ``rounds``
key also hardens the exporter so a partially-parsed session cannot
raise AttributeError mid-render.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Round 26 Codex P2 (PRRT_kwDOQ4a3IM58BEtU). The sanitized GitHub issue
body's ``Quantitative Summary`` table still read ``| Total rounds |
{s["current_round"]} |``. ``current_round`` is a 0-based round
index, so the table under-reported every session by one (``0`` for
a single-round session, ``N-1`` for an N-round session) even though
the Context line above already switched to ``round_total`` in
Round 25.

Reuse the ``round_total = len(s.get('rounds') or [])`` value already
computed earlier in the same function so the Quantitative Summary
and the Context paragraph agree on the session size. Downstream
issue readers no longer see conflicting counts between the two
sections of the generated report.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
feat: add humanize monitor web dashboard for RLCR sessions
@SihaoLiu SihaoLiu changed the title Promote beta to dev: viz monitor web dashboard beta v1.17.0 Apr 19, 2026
@SihaoLiu SihaoLiu mentioned this pull request Apr 19, 2026
@chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d67da3bace

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread viz/server/app.py Outdated
origin_host = (parsed.hostname or '').lower()
if not origin_host:
    return False
origin_port = parsed.port or _default_port_for_scheme(parsed.scheme)

P2: Handle invalid Origin ports instead of throwing 500

_origin_matches_request() reads parsed.port directly, but urllib.parse raises ValueError for malformed Origin values such as http://host:bad or out-of-range ports. That exception currently bubbles through _enforce_csrf_protection() and turns mutating requests into 500s instead of a controlled 403 reject, so a malformed client header can break cancel/report/issue POSTs. Treat invalid origin ports as non-matching origins and return False here.


Comment thread viz/server/parser.py Outdated
if len(rounds) >= 2:
    first_mtime = None
    last_mtime = None
    for rn in range(current_round + 1):

P2: Compute session duration from full on-disk round range

parse_session() correctly expands parsed rounds up to max_disk_round, but session_duration_minutes is still computed with for rn in range(current_round + 1). In sessions where state.current_round lags behind summaries already present on disk (a case this parser explicitly supports), duration is undercounted or left None, which propagates to /api/sessions and analytics displays. The duration loop should use the same max parsed round range rather than current_round only.


urlparse() accepts malformed port forms like host:bad or host:999999
without raising, but accessing parsed.port on the result raises
ValueError. Previously that exception bubbled through
_enforce_csrf_protection(), turning mutating requests into 500s for
any client sending a malformed Origin header instead of the intended
controlled 403 reject.

Wrap the parsed.port access in try/except ValueError and treat the
value as a non-matching origin so cancel / report / issue POSTs stay
available when a malformed Origin arrives. Add a live Flask
test_client regression for bad / out-of-range / non-integer Origin
ports.
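The guarded access can be sketched like this (the `_default_port_for_scheme` stand-in and function name are assumptions, not the app's exact code):

```python
from urllib.parse import urlparse

# Assumed stand-in for the app's _default_port_for_scheme helper.
_DEFAULT_PORTS = {"http": 80, "https": 443}

def origin_port_or_none(origin):
    """Effective port of an Origin header, or None when malformed.

    urlparse() happily accepts 'http://host:bad'; the ValueError only
    fires when .port is read, so that access is what must be guarded."""
    parsed = urlparse(origin)
    try:
        port = parsed.port             # may raise for ':bad' or ':999999'
    except ValueError:
        return None                    # caller treats as non-matching -> 403
    if port is None:
        return _DEFAULT_PORTS.get(parsed.scheme)
    return port
```

Returning `None` lets the origin check fall through to the controlled 403 path instead of bubbling a 500 out of the CSRF enforcement.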
parse_session() correctly expands rounds up to max_disk_round so the
/api/sessions payload surfaces every summary on disk even when
state.current_round lags behind. The session_duration_minutes loop
was still range(current_round + 1), so sessions whose current_round
lagged behind real progress reported an undercounted or None duration
that propagated to /api/sessions and analytics displays.

Iterate over max_disk_round instead so the duration loop matches the
round-expansion logic, and add a parser regression whose
current_round=0 session contains three summary files with staggered
mtimes spanning four minutes.
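The fixed loop can be sketched roughly as below; the `round-{n}-summary.md` filename pattern is an assumption for illustration, not necessarily the parser's real layout.

```python
import os

def session_duration_minutes(session_dir, max_disk_round):
    """Sketch of the fixed duration loop: scan range(max_disk_round + 1)
    so summaries already on disk count even when state.current_round lags.
    The 'round-{n}-summary.md' filename is an assumption here."""
    first = last = None
    for rn in range(max_disk_round + 1):
        path = os.path.join(session_dir, f"round-{rn}-summary.md")
        if not os.path.exists(path):
            continue
        mtime = os.path.getmtime(path)
        first = mtime if first is None else min(first, mtime)
        last = mtime if last is None else max(last, mtime)
    if first is None:
        return None                    # no summaries on disk yet
    return (last - first) / 60.0
```

Iterating to `max_disk_round` keeps the duration consistent with the round-expansion logic rather than with a possibly stale `current_round`.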
cancel-rlcr-session.sh interpolates --session-id directly into
SESSION_DIR, so values like ../foo or /absolute/path previously
escaped the per-project .humanize/rlcr tree and could rename state
files outside the intended session. Any caller forwarding an
unvalidated id would therefore mutate unrelated project state.

Validate --session-id before building the target path: reject path
separators, backslashes, leading dots, and anything outside the
alphanumeric / dot / underscore / dash character class that the
date-based session ids produced by setup-rlcr-loop.sh fit into. Emit
exit code 3 with an explanatory message for every rejected form.
Extend tests/test-cancel-session.sh with sibling-directory fixtures
proving traversal ids do not mutate files outside the session tree.
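The validation rule can be illustrated in Python (the real check lives in cancel-rlcr-session.sh as a shell pattern and exits with code 3 on reject; the regex below is an approximation of the described character class):

```python
import re

# Approximation of the character class described above; the authoritative
# check is the shell pattern in cancel-rlcr-session.sh.
_SESSION_ID_RE = re.compile(r"[A-Za-z0-9_][A-Za-z0-9._-]*")

def validate_session_id(session_id):
    """True iff the id is safe to join onto .humanize/rlcr/<id>."""
    if not session_id or session_id.startswith("."):
        return False                   # rejects '.', '..', '.hidden'
    if "/" in session_id or "\\" in session_id:
        return False                   # rejects path separators
    return bool(_SESSION_ID_RE.fullmatch(session_id))
```

The date-based ids produced by setup-rlcr-loop.sh fit this class, while traversal forms like `../foo` or absolute paths are rejected before any path is built.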
LogStreamRegistry only ran _sweep_idle_streams_locked() from inside
release(), so a stream whose refcount dropped to zero without EOF
stayed resident for the process lifetime whenever no subsequent
release() fired after the idle TTL. Long-lived dashboards with
low-churn traffic leaked every LogStream that ever had a one-off
disconnect, together with its retained event buffer.

Piggyback the sweep onto acquire() and streams_in_cache_dir() too.
New SSE connections anywhere on the dashboard reclaim stale
retention deques from unrelated sessions, and the cache watcher
callback (which calls streams_in_cache_dir on every observed write)
keeps eviction driven by ongoing activity instead of only by
balanced refcount transitions. Add a streaming regression that
rewinds _idle_since and verifies both call-sites evict idle streams.
@SihaoLiu
Contributor Author

@codex review this PR as entirety target branch dev

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Can't wait for the next one!


@SihaoLiu
Contributor Author

Codex review cycle complete: no further findings.

