perf(engine): extraction-phase instrumentation#444
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
miguel-heygen
left a comment
There was a problem hiding this comment.
Review: perf(engine): extraction-phase instrumentation
Clean additive-only change — zero deletions, zero regression risk. The ExtractionPhaseBreakdown type is well-typed, the forward-looking cache placeholder fields (cacheHits, cacheMisses) avoid a type-breaking change in #446, and the sampleDirectoryBytes helper is defensively coded (iterative stack, error-swallowing).
P2 — Should fix
vfrPreflightMs subtraction creates asymmetric semantics.
breakdown.vfrPreflightMs = Date.now() - vfrPreflightStart - breakdown.vfrProbeMs;hdrPreflightMs wraps the entire HDR preflight block (probe + analysis + conversion), but vfrPreflightMs deliberately excludes probe time. A consumer looking at both *PreflightMs fields would expect analogous scopes. Either:
- (a) Rename to
vfrTranscodeMsto match what it actually measures, or - (b) Include probe time in
vfrPreflightMs(matchinghdrPreflightMs) — the separatevfrProbeMsalready provides the finer decomposition.
Option (b) is cleaner.
JSDoc references phase3ExtractMs but the field is named extractMs.
* - phase3ExtractMs wraps the parallel `extractVideoFramesRange` calls
P3 — Suggestions
- Test coverage is VFR-only. HDR-path fields (
hdrProbeMs,hdrPreflightMs,hdrPreflightCount) are never asserted — even a simpleexpect(...).toBeGreaterThanOrEqual(0)per field would catch initialization bugs. sampleDirectoryBytesfollows symlinks viastatSync. UsinglstatSyncand skipping symlinks would be more accurate for "bytes on disk in workDir." Minor since the workDir is engine-controlled.tmpPeakBytesreturns0when workDir is missing, but the type isnumber | undefined. Usingundefinedfor the missing-directory case would be more semantically precise.
Verdict: Approve with minor fixes
The instrumentation overhead is sub-ms (a handful of Date.now() calls). The design cleanly enables #445 and #446. Fix the vfrPreflightMs semantics and the JSDoc typo before merge.
Adds per-phase timings and counters to `extractAllVideoFrames`: resolve, HDR probe/preflight, VFR probe/preflight, per-video extract, plus cacheHits/cacheMisses placeholders for the downstream cache PR. Surfaces them on `RenderPerfSummary.videoExtractBreakdown` and samples the render's `workDir` size into `tmpPeakBytes` right before cleanup. These are foundational for measuring every other perf PR in the stack. Overhead is zero (ten Date.now() calls across a multi-second phase). A VFR-fixture breakdown is asserted in videoFrameExtractor.test.ts.
Address Miguel's review feedback on #444: vfrPreflightMs was subtracting vfrProbeMs from the wall-clock measurement, creating asymmetric semantics vs hdrPreflightMs (which includes hdrProbeMs). A consumer looking at both *PreflightMs fields expects analogous scopes — the separate *ProbeMs fields already provide the finer decomposition. Also fix the JSDoc reference to phase3ExtractMs (the field is extractMs).
76948c9 to
5cfabff
Compare
|
Pushed
On the P3s, kept as-is for this PR since they're orthogonal to the semantic fix:
All three would land clean in a single small follow-up PR; happy to do that after this stack merges so the downstream PRs aren't re-validated against moving semantics. |
The CLI already ships `render_complete` events to PostHog but only carries a subset of `RenderPerfSummary` (duration_ms, capture_avg_ms, etc.). With #444 adding per-phase breakdown and #446 adding cache counters, the data is sitting in `job.perfSummary.videoExtractBreakdown` and `job.perfSummary.tmpPeakBytes` but never reaching PostHog. This forwards the new fields as flat properties (flat is easier to query with PostHog insights than nested objects): - tmp_peak_bytes - stage_compile_ms, stage_video_extract_ms, stage_audio_process_ms, stage_capture_ms, stage_encode_ms, stage_assemble_ms - extract_resolve_ms, extract_hdr_probe_ms, extract_hdr_preflight_ms, extract_hdr_preflight_count, extract_vfr_probe_ms, extract_vfr_preflight_ms, extract_vfr_preflight_count, extract_phase3_ms, extract_cache_hits, extract_cache_misses `extract_phase3_ms` disambiguates from `stage_video_extract_ms`: the former is just the parallel ffmpeg extract inside Phase 3, the latter is the full stage (resolve + probe + preflight + extract). All fields optional — the Docker subprocess call site (no perfSummary available) still compiles and ships events without them.
The CLI already ships `render_complete` events to PostHog but only carries a subset of `RenderPerfSummary` (duration_ms, capture_avg_ms, etc.). With #444 adding per-phase breakdown and #446 adding cache counters, the data is sitting in `job.perfSummary.videoExtractBreakdown` and `job.perfSummary.tmpPeakBytes` but never reaching PostHog. This forwards the new fields as flat properties (flat is easier to query with PostHog insights than nested objects): - tmp_peak_bytes - stage_compile_ms, stage_video_extract_ms, stage_audio_process_ms, stage_capture_ms, stage_encode_ms, stage_assemble_ms - extract_resolve_ms, extract_hdr_probe_ms, extract_hdr_preflight_ms, extract_hdr_preflight_count, extract_vfr_probe_ms, extract_vfr_preflight_ms, extract_vfr_preflight_count, extract_phase3_ms, extract_cache_hits, extract_cache_misses `extract_phase3_ms` disambiguates from `stage_video_extract_ms`: the former is just the parallel ffmpeg extract inside Phase 3, the latter is the full stage (resolve + probe + preflight + extract). All fields optional — the Docker subprocess call site (no perfSummary available) still compiles and ships events without them.
miguel-heygen
left a comment
There was a problem hiding this comment.
Re-checked the live head. The vfrPreflightMs semantics and extractMs JSDoc fixes are in place, the latest CI is green, and I found no remaining blockers.
The CLI already ships `render_complete` events to PostHog but only carries a subset of `RenderPerfSummary` (duration_ms, capture_avg_ms, etc.). With #444 adding per-phase breakdown and #446 adding cache counters, the data is sitting in `job.perfSummary.videoExtractBreakdown` and `job.perfSummary.tmpPeakBytes` but never reaching PostHog. This forwards the new fields as flat properties (flat is easier to query with PostHog insights than nested objects): - tmp_peak_bytes - stage_compile_ms, stage_video_extract_ms, stage_audio_process_ms, stage_capture_ms, stage_encode_ms, stage_assemble_ms - extract_resolve_ms, extract_hdr_probe_ms, extract_hdr_preflight_ms, extract_hdr_preflight_count, extract_vfr_probe_ms, extract_vfr_preflight_ms, extract_vfr_preflight_count, extract_phase3_ms, extract_cache_hits, extract_cache_misses `extract_phase3_ms` disambiguates from `stage_video_extract_ms`: the former is just the parallel ffmpeg extract inside Phase 3, the latter is the full stage (resolve + probe + preflight + extract). All fields optional — the Docker subprocess call site (no perfSummary available) still compiles and ships events without them.
## What Scopes the SDR→HDR preflight re-encode to the segment the composition actually uses, mirroring the existing VFR→CFR segment-scope fix. ## Why `convertSdrToHdr` was re-encoding entire source files, so a 30-minute SDR screen recording contributing a 2-second clip in a mixed HDR/SDR composition ate multi-second preflight time that produced frames no one would ever read. Validated on a mixed 30s-SDR + 2s-HDR fixture: `hdrPreflightMs` drops **87%** (1162→148ms), `videoExtractMs` drops **82%** (1272→231ms), `tmpPeakBytes` drops **45%** (8.2MB→4.5MB). Depends on #444 (phase-level instrumentation) for the measurement surface. ## How - `convertSdrToHdr` gains `startTime` and `duration` parameters ahead of the upstream `targetTransfer` arg added by #370. New signature: `convertSdrToHdr(input, output, startTime, duration, targetTransfer, signal, config)`. `-ss $start -t $duration` is added to the ffmpeg args. - Phase 2 now captures the full `VideoMetadata` per `resolvedVideos` entry (previously just `colorSpace`) so the caller can compute `segDuration` from `video.end - video.start` with a fallback to `metadata.durationSeconds - video.mediaStart` for unbounded (Infinity) clips — without firing another ffprobe. - After a successful convert, `entry.video.mediaStart` is zeroed out via shallow-copy (doesn't mutate the caller's `VideoElement`) so downstream extraction seeks from 0 instead of the original offset. Mirrors what the VFR→CFR path already does. ## Test plan Validation on `/tmp/hf-fixtures/hdr-sdr-mixed-scope`: ``` hdrPreflightMs: >1000 → 150 (gate: <300) ✓ videoExtractMs: 1272 → 237 (-82%) tmpPeakBytes: 8.2MB → 4.5MB (-45%) ``` - [x] Unit test: new regression test synthesizes 10s SDR + 2s HDR fixture inline and asserts the converted file's duration matches the 2s used segment (pre-fix matched the 10s source) - [x] Lint + format - [x] Typecheck - [x] Manual perf validation against synthesized fixture
The CLI already ships `render_complete` events to PostHog but only carries a subset of `RenderPerfSummary` (duration_ms, capture_avg_ms, etc.). With #444 adding per-phase breakdown and #446 adding cache counters, the data is sitting in `job.perfSummary.videoExtractBreakdown` and `job.perfSummary.tmpPeakBytes` but never reaching PostHog. This forwards the new fields as flat properties (flat is easier to query with PostHog insights than nested objects): - tmp_peak_bytes - stage_compile_ms, stage_video_extract_ms, stage_audio_process_ms, stage_capture_ms, stage_encode_ms, stage_assemble_ms - extract_resolve_ms, extract_hdr_probe_ms, extract_hdr_preflight_ms, extract_hdr_preflight_count, extract_vfr_probe_ms, extract_vfr_preflight_ms, extract_vfr_preflight_count, extract_phase3_ms, extract_cache_hits, extract_cache_misses `extract_phase3_ms` disambiguates from `stage_video_extract_ms`: the former is just the parallel ffmpeg extract inside Phase 3, the latter is the full stage (resolve + probe + preflight + extract). All fields optional — the Docker subprocess call site (no perfSummary available) still compiles and ships events without them.
## What Adds a content-addressed cache for extracted video frames, keyed on the tuple `(path, mtime, size, mediaStart, duration, fps, format)`. Repeat renders of the same composition (studio edit → re-render, preview → final) skip the ffmpeg extraction entirely. ## Why Video frame extraction is the dominant non-capture phase for video-heavy compositions. Studio iteration workflows extract the same frames over and over — each render burns ffmpeg time that adds no value. Validated on `/tmp/hf-fixtures/cfr-sdr-cache`: ``` Cold (miss): extractMs=69, videoExtractMs=70, totalElapsedMs=2052 Warm (hit): extractMs=1, videoExtractMs=2, totalElapsedMs=1964 cacheHits: 0→1, cacheMisses: 1→0 ``` The fixture is tiny (3s CFR SDR @ 30fps), so the wall-clock delta is small; the extraction-time delta (69→1ms, 98%) scales linearly with source length. For heavy-iteration workflows (a user rendering the same composition while tuning encoding params), extraction time goes to zero on every repeat render. Depends on #444 (instrumentation surface) and #445 (segment-scope HDR preflight — otherwise cache keys would be unstable across renders on mixed-HDR compositions). ## How - New `packages/engine/src/services/extractionCache.ts`: - SHA-256 key over a stable JSON encoding of `(path, mtime_ms, size, mediaStart, duration, fps, format)`. Infinity duration is normalized to `-1` so unresolved natural-duration sources still produce stable keys. - Truncates to 16 hex chars in the entry directory name — 64 bits of entropy is plenty at cache scale and keeps `ls` output short. - `hfcache-v2-` schema prefix — bumping it invalidates old entries (callers own gc policy; the cache owns keys). - `.hf-complete` dotfile sentinel. An entry dir without the sentinel is treated as a miss (covers crash-mid-extract and abandoned writes); the next render re-extracts over the partial frames with `-y`. - `FRAME_FILENAME_PREFIX = "frame_"` shared with the extractor — future refactors only need to touch one place to rename frames. - `EngineConfig.extractCacheDir` (env: `HYPERFRAMES_EXTRACT_CACHE_DIR`) gates the feature. Undefined disables caching — extraction runs into the render's workDir and cleanup removes it on render end, preserving the prior behaviour exactly. No default root is chosen by the engine; the caller (CLI, app, studio) owns the location policy. - `ExtractedFrames.ownedByLookup` flag prevents `FrameLookupTable.cleanup` from rm'ing a shared cache dir at render end. Set to `true` on both hits and misses (misses own the directory they wrote into, but hand it over to the cache rather than deleting it). - Phase 3 extractor flow: 1. Snapshot `(videoPath, mediaStart, start, end)` per resolved video BEFORE Phase 2a/2b preflight mutates them — so cache keys are stable across renders that use workDir-local normalized files (those files have fresh mtimes every render). 2. Compute key, `lookupCacheEntry`. 3. On hit: rebuild `ExtractedFrames` from the cache dir plus the Phase 2-probed `VideoMetadata` — no re-ffprobe. 4. On miss: `ensureCacheEntryDir`, extract with `extractVideoFramesRange(..., outputDirOverride)`, then `markCacheEntryComplete` (the sentinel write is the last step so a crash leaves the dir un-sentineled). - `extractVideoFramesRange` gains an `outputDirOverride` parameter so cache-miss writes land directly in the keyed dir (no `join(outputDir, videoId)` wrapping). ## Test plan - [x] 19 unit tests in `extractionCache.test.ts` covering key determinism, mtime/size invalidation, format/fps/mediaStart/duration invalidation, Infinity normalization, sentinel semantics, missing-file tolerance - [x] 2 integration tests in `videoFrameExtractor.test.ts`: - "reuses extracted frames on a warm cache hit" — asserts `cacheHits=1`, `extractMs<50ms` on second call against a CFR SDR fixture - "invalidates the cache when fps changes" — different fps on second call forces a new miss - [x] End-to-end validation with `HYPERFRAMES_EXTRACT_CACHE_DIR` set, two runs of the same fixture - [x] Lint + format (oxlint + oxfmt) - [x] Typecheck (engine + producer)
The CLI already ships `render_complete` events to PostHog but only carries a subset of `RenderPerfSummary` (duration_ms, capture_avg_ms, etc.). With #444 adding per-phase breakdown and #446 adding cache counters, the data is sitting in `job.perfSummary.videoExtractBreakdown` and `job.perfSummary.tmpPeakBytes` but never reaching PostHog. This forwards the new fields as flat properties (flat is easier to query with PostHog insights than nested objects): - tmp_peak_bytes - stage_compile_ms, stage_video_extract_ms, stage_audio_process_ms, stage_capture_ms, stage_encode_ms, stage_assemble_ms - extract_resolve_ms, extract_hdr_probe_ms, extract_hdr_preflight_ms, extract_hdr_preflight_count, extract_vfr_probe_ms, extract_vfr_preflight_ms, extract_vfr_preflight_count, extract_phase3_ms, extract_cache_hits, extract_cache_misses `extract_phase3_ms` disambiguates from `stage_video_extract_ms`: the former is just the parallel ffmpeg extract inside Phase 3, the latter is the full stage (resolve + probe + preflight + extract). All fields optional — the Docker subprocess call site (no perfSummary available) still compiles and ships events without them.
The CLI already ships `render_complete` events to PostHog but only carries a subset of `RenderPerfSummary` (duration_ms, capture_avg_ms, etc.). With #444 adding per-phase breakdown and #446 adding cache counters, the data is sitting in `job.perfSummary.videoExtractBreakdown` and `job.perfSummary.tmpPeakBytes` but never reaching PostHog. This forwards the new fields as flat properties (flat is easier to query with PostHog insights than nested objects): - tmp_peak_bytes - stage_compile_ms, stage_video_extract_ms, stage_audio_process_ms, stage_capture_ms, stage_encode_ms, stage_assemble_ms - extract_resolve_ms, extract_hdr_probe_ms, extract_hdr_preflight_ms, extract_hdr_preflight_count, extract_vfr_probe_ms, extract_vfr_preflight_ms, extract_vfr_preflight_count, extract_phase3_ms, extract_cache_hits, extract_cache_misses `extract_phase3_ms` disambiguates from `stage_video_extract_ms`: the former is just the parallel ffmpeg extract inside Phase 3, the latter is the full stage (resolve + probe + preflight + extract). All fields optional — the Docker subprocess call site (no perfSummary available) still compiles and ships events without them.
…#454) ## What Forwards the new per-phase extraction breakdown and `tmpPeakBytes` fields from `RenderPerfSummary` (added in #444 and #446) to PostHog via the CLI's existing `render_complete` telemetry event. ## Why The CLI already ships `render_complete` events to PostHog (`packages/cli/src/telemetry/client.ts`), but `events.ts:trackRenderComplete` only carried a subset of `RenderPerfSummary` — top-level timings, composition dims, and memory snapshots. After #444 added per-phase extraction breakdown (`videoExtractBreakdown`) and #446 added cache hit/miss counters, the data lives on `job.perfSummary` at render-complete but never reaches PostHog dashboards. Without this, any PostHog insight built around "how often are we hitting the cache?", "what's the median HDR preflight cost?", or "where in the extract phase do compositions spend time?" has to be answered by Datadog log scraping instead. ## How - **`packages/cli/src/telemetry/events.ts`** — extend `trackRenderComplete` props with 17 new optional fields: `tmpPeakBytes`, the six named stage timings, and the ten `videoExtractBreakdown` fields. All sent as flat properties (`extract_cache_hits`, `stage_capture_ms`, etc.) — PostHog insights query flat keys more ergonomically than nested objects. - **`packages/cli/src/commands/render.ts`** — wire `job.perfSummary.videoExtractBreakdown` / `stages` / `tmpPeakBytes` into the `trackRenderMetrics` → `trackRenderComplete` hand-off. - Naming: `extract_phase3_ms` deliberately disambiguates from `stage_video_extract_ms` — the former is just the parallel ffmpeg extract inside Phase 3; the latter is the full stage (resolve + probe + preflight + extract). - All new fields are optional. The Docker-subprocess branch of `render.ts` that doesn't have a local `perfSummary` still compiles and ships events without them. ## Test plan - [x] `bun run --cwd packages/cli test` — 161/161 pass - [x] `bunx tsc -p packages/cli/tsconfig.json --noEmit` — no errors - [x] `bunx oxlint` + `bunx oxfmt` — clean - [ ] Once merged, verify PostHog receives the new properties on a real render event (run `hyperframes render` against a fixture and watch PostHog ingestion — telemetry auto-disables in CI, so this requires a local dev render with `HYPERFRAMES_NO_TELEMETRY` unset). ## Stack Depends on #444 (adds the `videoExtractBreakdown` + `tmpPeakBytes` fields to `RenderPerfSummary`) and transitively on #445 → #446. ## Future work (not in this PR) - The HeyGen internal producer server (`hyperframes-internal/packages/producer/src/server.ts`) logs `perfSummary` to Datadog via `log.info` but has no PostHog integration. Production renders are the bulk of the traffic — separate PR to either ship perfSummary to PostHog from the internal server, or materialize Datadog log-based metrics for per-phase timings.

What
Adds per-phase timings and counters to
extractAllVideoFramesand surfaces them on the producer'sRenderPerfSummaryasvideoExtractBreakdownalongside a newtmpPeakBytesworkDir size sample.Why
Phase 2 video extraction has five distinct sub-phases (resolve, HDR probe, HDR preflight, VFR probe, VFR preflight, per-video extract) and today they collapse into a single
videoExtractMsstage timing. That makes every subsequent perf PR in this stack immeasurable — you can't tell whether a win came from cache hits, preflight scope reduction, or pure extraction speed.This PR is foundational for PR #445 (segment-scope HDR preflight) and PR #446 (content-addressed extraction cache).
How
ExtractionPhaseBreakdowntype withresolveMs,hdrProbeMs,hdrPreflightMs/Count,vfrProbeMs,vfrPreflightMs/Count,extractMs,cacheHits,cacheMisses. Populated inline withDate.now()wrappers — overhead is sub-millisecond on every phase.ExtractionResult.phaseBreakdown.RenderPerfSummarywithvideoExtractBreakdown?: ExtractionPhaseBreakdownandtmpPeakBytes?: number.tmpPeakBytesis sampled from the workDir right before cleanup via a new recursive-size helper that swallows errors (purely observational — a missing workDir must never fail the render).No changes to the capture-lifecycle resource tracking — earlier versions of this instrumentation plumbed injector LRU stats through
RenderOrchestrator, which conflicted hard with upstream #371 (buildHdrCaptureOptionsrefactor). Dropped that piece for a marginal observability loss.Test plan
Validation on
packages/producer/tests/vfr-screen-recording:Total elapsed within noise of pre-PR baseline (2665 → 2673 → 3228ms across hosts).
videoFrameExtractor.test.ts