Skip to content

test(pt_expt): shrink change-bias water dataset to 5 frames#5467

Queued
wanghan-iapcm wants to merge 2 commits into
deepmodeling:masterfrom
wanghan-iapcm:fix-test-change-bias-mem
Queued

test(pt_expt): shrink change-bias water dataset to 5 frames#5467
wanghan-iapcm wants to merge 2 commits into
deepmodeling:masterfrom
wanghan-iapcm:fix-test-change-bias-mem

Conversation

@wanghan-iapcm
Copy link
Copy Markdown
Collaborator

@wanghan-iapcm wanghan-iapcm commented May 27, 2026

Summary

TestChangeBias is the dominant memory hog in Test Python shard (10, 3.13) of the CI matrix — by itself it peaks at ~5 GB RSS, leaving so little headroom under the 7 GB GitHub-hosted runner that the shard intermittently loses communication with the GitHub Actions server. This causes the recurring runner lost communication failure that has affected many recent PRs (#5446, #5448, #5450, #5455, #5456, …).

This PR shrinks the change-bias test dataset from 80 frames to 5 frames, dropping the class's peak RSS to ~1.7 GB while keeping all 9 tests passing — including the strict atol=1e-10 pt2_pte_consistency check.

How I located it

Local reproduction of shard (10, 3.13) (using the same .test_durations cache CI uses, so identical test partitioning):

Test profiled in isolation Peak RSS
test_change_bias_frozen_pte 5.04 GB ← outlier
TestDeepEvalEnerPt2 class 1.43 GB
TestDeepEvalEnerAparamPt2 class 1.44 GB
TestSpinInference::test_get_use_spin 1.41 GB
test_finetune_from_pt2_use_pretrain_script 1.41 GB
test_training_loop_compiled 1.32 GB
test_export_pipeline 1.57 GB
test_descriptor_shape_dpa1 1.34 GB

Then phase-by-phase RSS profiling inside dp change-bias showed the 4.3 GB jump happens entirely inside compute_output_stats_compute_model_predict. Scaling experiment confirms it: peak grows linearly at ~50 MB per frame of input data.

nbatches Peak RSS per-frame
1 567 MB
5 781 MB +43 MB
20 1583 MB +53 MB
80 4797 MB +53 MB

That's a leak in the torch.no_grad()-wrapped forward_common_atomic somewhere — separate from autograd. The water example has 80 frames at batch_size=1, so the CLI default nbatches = min(data.get_nbatches()) = 80 triggers all 80 forwards in one go.

Why I can't just pass -n 5

_load_batch_set shuffles when it loads the set. If nbatches < total_frames, the loop samples a random subset — and the two calls in test_change_bias_pt2_pte_consistency (running in the same Python process via main(cmds), with dp_random's state advancing between calls) would see different subsets → different biases → the atol=1e-10 assertion fails.

nbatches == total_frames makes the forward enumerate every frame regardless of shuffle order, so the aggregate bias is invariant under shuffle. Determinism is preserved.

The fix

Build a 5-frame subset of examples/water/data/data_0 in TestChangeBias.setUpClass and point both the trainer config and the change-bias -s argument at it. nbatches then resolves to 5 (= the new dataset size = full enumeration), and all 9 tests pass with peak RSS at ~1.7 GB.

Test plan

  • All 9 tests in TestChangeBias pass locally (CPU fp64):
    • test_change_bias_with_data
    • test_change_bias_with_data_sys_file
    • test_change_bias_with_user_defined
    • test_change_bias_frozen_pte
    • test_change_bias_frozen_pt2
    • test_change_bias_frozen_pt2_user_defined
    • test_change_bias_pt2_pte_consistency (atol=1e-10 — the determinism-sensitive one)
    • test_change_bias_pte_preserves_model_def_script
    • test_change_bias_pt2_preserves_model_def_script
  • Peak RSS measurement (kernel ru_maxrss):
    • Before: 5.66 GB
    • After: 1.62 GB (single test) / 1.75 GB (whole class)
  • CI shard (10, 3.13) confirms no more runner lost communication on this branch (pending CI run)

Known limitations

  • The underlying ~50 MB/frame leak in forward_common_atomic remains as a production bug. Users running dp change-bias on real datasets with thousands of frames will see multi-GB RSS growth. Worth a separate follow-up to find and patch the leak.
  • The 5-frame number is somewhat arbitrary. It's chosen as the smallest value that (a) keeps the assertion logic working, (b) leaves room for the least-squares regression (need ≥ ntypes = 2 frames).
  • _make_subset_dataset only handles set.000; multi-set datasets would need extension. Not needed for water/data_0.

Summary by CodeRabbit

  • Tests
    • Reduced the dataset used by a change-bias test to a small, fixed subset (5 frames) so test runs use far less disk space and complete faster.
    • Test setup now builds and points to the truncated dataset for all related invocations, lowering resource overhead during CI and local testing.

Review Change Stack

``TestChangeBias`` was the dominant memory hog in the ``Test Python``
shard ``(10, 3.13)`` of the CI matrix — by itself it peaked at ~5 GB
RSS, leaving so little headroom under the 7 GB GitHub-hosted runner
ceiling that the shard sporadically lost communication with the
GitHub Actions server (intermittent ``runner lost communication``
flake observed across many recent PRs).

Profile finding: peak RSS scales **linearly at ~50 MB per frame**
during ``dp change-bias``'s in-process ``main(cmds)`` call.  The
forward over ``compute_output_stats`` enumerates ``nbatches = min(
data.get_nbatches()) = 80`` frames of the water example, and each
frame leaks ~50 MB into torch's caching allocator (not autograd —
the wrapper is already in ``torch.no_grad()``; the leak is in
``forward_common_atomic`` somewhere and is a separate bug).

Constraint: we **must** keep ``nbatches == total dataset frames``
to preserve determinism for ``test_change_bias_pt2_pte_consistency``
which compares two .pte and .pt2 invocations with ``atol=1e-10``.
``_load_batch_set`` shuffles the loaded set, so a value of
``nbatches < total_frames`` would sample a random subset and the
two calls (running in the same Python process with an advancing
``dp_random`` state) would see different frames and produce
different biases.  Full enumeration sees every frame and so the
aggregate bias is invariant under shuffle.

Solution: build a 5-frame subset of ``examples/water/data/data_0``
in ``TestChangeBias.setUpClass`` and point both the trainer config
and the change-bias ``-s`` argument at it.  ``nbatches`` then
resolves to 5 (= the new dataset size, = full enumeration), peak
RSS drops to ~1.7 GB for the whole class, and all 9 tests in the
class (including the strict atol=1e-10 consistency check) still
pass.  Class wall time also improves (~3:55 → less data-loop work
in each change-bias invocation).
@wanghan-iapcm wanghan-iapcm requested a review from njzjz May 27, 2026 08:22
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 27, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 1f2b0b40-7913-4151-9276-f30ebe926538

📥 Commits

Reviewing files that changed from the base of the PR and between dacf606 and 9706ede.

📒 Files selected for processing (1)
  • source/tests/pt_expt/test_change_bias.py
💤 Files with no reviewable changes (1)
  • source/tests/pt_expt/test_change_bias.py

📝 Walkthrough

Walkthrough

Adds _make_subset_dataset and updates TestChangeBias.setUpClass to build and use a truncated example dataset containing the first 5 frames instead of the full example data.

Changes

Test Dataset Subsetting

Layer / File(s) Summary
Dataset subset helper
source/tests/pt_expt/test_change_bias.py
New _make_subset_dataset(src_system, dst_system, n_frames) helper copies type.raw/type_map.raw when present and truncates .npy arrays under set.000 to the first n_frames frames into the destination directory.
Test setup integration
source/tests/pt_expt/test_change_bias.py
TestChangeBias.setUpClass now calls the helper to create cls.tmpdir/data/data_0 with 5 frames and sets cls.data_dir/cls.data_file to that truncated dataset for training and dp change-bias invocations.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • iProzd
  • njzjz-bot
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely describes the main change: shrinking the change-bias water dataset to 5 frames for TestChangeBias tests.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
source/tests/pt_expt/test_change_bias.py (1)

153-154: 💤 Low value

Consider validating array shape before truncation.

The function truncates arrays to n_frames without checking if they contain at least that many frames. While this works for the current use case (80 frames → 5 frames), it could silently create inconsistent datasets if some arrays have fewer frames than requested.

🛡️ Optional defensive check
     for fname in os.listdir(src_set):
         if not fname.endswith(".npy"):
             continue
         arr = np.load(os.path.join(src_set, fname))
+        if len(arr) < n_frames:
+            raise ValueError(
+                f"{fname} has only {len(arr)} frames, need at least {n_frames}"
+            )
         np.save(os.path.join(dst_set, fname), arr[:n_frames])
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@source/tests/pt_expt/test_change_bias.py` around lines 153 - 154, Before
truncating and saving, check the loaded array's frame dimension (arr) to ensure
arr.shape[0] >= n_frames; if it's smaller, either raise a clear error or handle
it (e.g., skip the file or pad to n_frames) rather than silently slicing. Update
the block that loads and writes arrays (referencing arr, fname, n_frames,
src_set, dst_set) to perform this validation and take the chosen action so
datasets remain consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@source/tests/pt_expt/test_change_bias.py`:
- Line 141: Remove the redundant in-function import of numpy inside the
_make_subset_dataset function: delete the line "import numpy as np" from within
the function and rely on the existing top-level "import numpy as np" already
present in the module so all uses of np inside _make_subset_dataset reference
the module-level symbol.

---

Nitpick comments:
In `@source/tests/pt_expt/test_change_bias.py`:
- Around line 153-154: Before truncating and saving, check the loaded array's
frame dimension (arr) to ensure arr.shape[0] >= n_frames; if it's smaller,
either raise a clear error or handle it (e.g., skip the file or pad to n_frames)
rather than silently slicing. Update the block that loads and writes arrays
(referencing arr, fname, n_frames, src_set, dst_set) to perform this validation
and take the chosen action so datasets remain consistent.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: a02ef6fa-f542-4bfd-8cf7-4d4ba6e23201

📥 Commits

Reviewing files that changed from the base of the PR and between 4e64f8b and dacf606.

📒 Files selected for processing (1)
  • source/tests/pt_expt/test_change_bias.py

Comment thread source/tests/pt_expt/test_change_bias.py Outdated
@codecov
Copy link
Copy Markdown

codecov Bot commented May 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.46%. Comparing base (4e64f8b) to head (9706ede).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #5467   +/-   ##
=======================================
  Coverage   82.46%   82.46%           
=======================================
  Files         829      829           
  Lines       88763    88762    -1     
  Branches     4225     4225           
=======================================
+ Hits        73197    73198    +1     
+ Misses      14274    14273    -1     
+ Partials     1292     1291    -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

numpy is already imported at module level; the in-function import in
_make_subset_dataset triggers ruff PLC0415.
@wanghan-iapcm wanghan-iapcm enabled auto-merge May 27, 2026 10:08
@wanghan-iapcm wanghan-iapcm added this pull request to the merge queue May 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants