test(pt_expt): shrink change-bias water dataset to 5 frames#5467
test(pt_expt): shrink change-bias water dataset to 5 frames#5467wanghan-iapcm wants to merge 2 commits into
Conversation
``TestChangeBias`` was the dominant memory hog in the ``Test Python`` shard ``(10, 3.13)`` of the CI matrix — by itself it peaked at ~5 GB RSS, leaving so little headroom under the 7 GB GitHub-hosted runner ceiling that the shard sporadically lost communication with the GitHub Actions server (intermittent ``runner lost communication`` flake observed across many recent PRs). Profile finding: peak RSS scales **linearly at ~50 MB per frame** during ``dp change-bias``'s in-process ``main(cmds)`` call. The forward over ``compute_output_stats`` enumerates ``nbatches = min( data.get_nbatches()) = 80`` frames of the water example, and each frame leaks ~50 MB into torch's caching allocator (not autograd — the wrapper is already in ``torch.no_grad()``; the leak is in ``forward_common_atomic`` somewhere and is a separate bug). Constraint: we **must** keep ``nbatches == total dataset frames`` to preserve determinism for ``test_change_bias_pt2_pte_consistency`` which compares two .pte and .pt2 invocations with ``atol=1e-10``. ``_load_batch_set`` shuffles the loaded set, so a value of ``nbatches < total_frames`` would sample a random subset and the two calls (running in the same Python process with an advancing ``dp_random`` state) would see different frames and produce different biases. Full enumeration sees every frame and so the aggregate bias is invariant under shuffle. Solution: build a 5-frame subset of ``examples/water/data/data_0`` in ``TestChangeBias.setUpClass`` and point both the trainer config and the change-bias ``-s`` argument at it. ``nbatches`` then resolves to 5 (= the new dataset size, = full enumeration), peak RSS drops to ~1.7 GB for the whole class, and all 9 tests in the class (including the strict atol=1e-10 consistency check) still pass. Class wall time also improves (~3:55 → less data-loop work in each change-bias invocation).
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
💤 Files with no reviewable changes (1)
📝 WalkthroughWalkthroughAdds ChangesTest Dataset Subsetting
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
source/tests/pt_expt/test_change_bias.py (1)
153-154: 💤 Low valueConsider validating array shape before truncation.
The function truncates arrays to
n_frameswithout checking if they contain at least that many frames. While this works for the current use case (80 frames → 5 frames), it could silently create inconsistent datasets if some arrays have fewer frames than requested.🛡️ Optional defensive check
for fname in os.listdir(src_set): if not fname.endswith(".npy"): continue arr = np.load(os.path.join(src_set, fname)) + if len(arr) < n_frames: + raise ValueError( + f"{fname} has only {len(arr)} frames, need at least {n_frames}" + ) np.save(os.path.join(dst_set, fname), arr[:n_frames])🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@source/tests/pt_expt/test_change_bias.py` around lines 153 - 154, Before truncating and saving, check the loaded array's frame dimension (arr) to ensure arr.shape[0] >= n_frames; if it's smaller, either raise a clear error or handle it (e.g., skip the file or pad to n_frames) rather than silently slicing. Update the block that loads and writes arrays (referencing arr, fname, n_frames, src_set, dst_set) to perform this validation and take the chosen action so datasets remain consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@source/tests/pt_expt/test_change_bias.py`:
- Line 141: Remove the redundant in-function import of numpy inside the
_make_subset_dataset function: delete the line "import numpy as np" from within
the function and rely on the existing top-level "import numpy as np" already
present in the module so all uses of np inside _make_subset_dataset reference
the module-level symbol.
---
Nitpick comments:
In `@source/tests/pt_expt/test_change_bias.py`:
- Around line 153-154: Before truncating and saving, check the loaded array's
frame dimension (arr) to ensure arr.shape[0] >= n_frames; if it's smaller,
either raise a clear error or handle it (e.g., skip the file or pad to n_frames)
rather than silently slicing. Update the block that loads and writes arrays
(referencing arr, fname, n_frames, src_set, dst_set) to perform this validation
and take the chosen action so datasets remain consistent.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: a02ef6fa-f542-4bfd-8cf7-4d4ba6e23201
📒 Files selected for processing (1)
source/tests/pt_expt/test_change_bias.py
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #5467 +/- ##
=======================================
Coverage 82.46% 82.46%
=======================================
Files 829 829
Lines 88763 88762 -1
Branches 4225 4225
=======================================
+ Hits 73197 73198 +1
+ Misses 14274 14273 -1
+ Partials 1292 1291 -1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
numpy is already imported at module level; the in-function import in _make_subset_dataset triggers ruff PLC0415.
Summary
TestChangeBiasis the dominant memory hog inTest Pythonshard(10, 3.13)of the CI matrix — by itself it peaks at ~5 GB RSS, leaving so little headroom under the 7 GB GitHub-hosted runner that the shard intermittently loses communication with the GitHub Actions server. This causes the recurringrunner lost communicationfailure that has affected many recent PRs (#5446, #5448, #5450, #5455, #5456, …).This PR shrinks the change-bias test dataset from 80 frames to 5 frames, dropping the class's peak RSS to ~1.7 GB while keeping all 9 tests passing — including the strict
atol=1e-10pt2_pte_consistencycheck.How I located it
Local reproduction of shard
(10, 3.13)(using the same.test_durationscache CI uses, so identical test partitioning):test_change_bias_frozen_pteTestDeepEvalEnerPt2classTestDeepEvalEnerAparamPt2classTestSpinInference::test_get_use_spintest_finetune_from_pt2_use_pretrain_scripttest_training_loop_compiledtest_export_pipelinetest_descriptor_shape_dpa1Then phase-by-phase RSS profiling inside
dp change-biasshowed the 4.3 GB jump happens entirely insidecompute_output_stats→_compute_model_predict. Scaling experiment confirms it: peak grows linearly at ~50 MB per frame of input data.That's a leak in the
torch.no_grad()-wrappedforward_common_atomicsomewhere — separate from autograd. The water example has 80 frames at batch_size=1, so the CLI defaultnbatches = min(data.get_nbatches()) = 80triggers all 80 forwards in one go.Why I can't just pass
-n 5_load_batch_setshuffles when it loads the set. Ifnbatches < total_frames, the loop samples a random subset — and the two calls intest_change_bias_pt2_pte_consistency(running in the same Python process viamain(cmds), withdp_random's state advancing between calls) would see different subsets → different biases → theatol=1e-10assertion fails.nbatches == total_framesmakes the forward enumerate every frame regardless of shuffle order, so the aggregate bias is invariant under shuffle. Determinism is preserved.The fix
Build a 5-frame subset of
examples/water/data/data_0inTestChangeBias.setUpClassand point both the trainer config and the change-bias-sargument at it.nbatchesthen resolves to 5 (= the new dataset size = full enumeration), and all 9 tests pass with peak RSS at ~1.7 GB.Test plan
TestChangeBiaspass locally (CPU fp64):test_change_bias_with_datatest_change_bias_with_data_sys_filetest_change_bias_with_user_definedtest_change_bias_frozen_ptetest_change_bias_frozen_pt2test_change_bias_frozen_pt2_user_definedtest_change_bias_pt2_pte_consistency(atol=1e-10 — the determinism-sensitive one)test_change_bias_pte_preserves_model_def_scripttest_change_bias_pt2_preserves_model_def_scriptru_maxrss):(10, 3.13)confirms no morerunner lost communicationon this branch (pending CI run)Known limitations
forward_common_atomicremains as a production bug. Users runningdp change-biason real datasets with thousands of frames will see multi-GB RSS growth. Worth a separate follow-up to find and patch the leak._make_subset_datasetonly handlesset.000; multi-set datasets would need extension. Not needed for water/data_0.Summary by CodeRabbit