test(pt_expt): shrink change-bias water dataset to 5 frames by wanghan-iapcm · Pull Request #5467 · deepmodeling/deepmd-kit

wanghan-iapcm · 2026-05-27T08:19:01Z

Summary

TestChangeBias is the dominant memory hog in Test Python shard (10, 3.13) of the CI matrix — by itself it peaks at ~5 GB RSS, leaving so little headroom under the 7 GB GitHub-hosted runner that the shard intermittently loses communication with the GitHub Actions server. This causes the recurring runner lost communication failure that has affected many recent PRs (#5446, #5448, #5450, #5455, #5456, …).

This PR shrinks the change-bias test dataset from 80 frames to 5 frames, dropping the class's peak RSS to ~1.7 GB while keeping all 9 tests passing — including the strict atol=1e-10 pt2_pte_consistency check.

How I located it

Local reproduction of shard (10, 3.13) (using the same .test_durations cache CI uses, so identical test partitioning):

Test profiled in isolation	Peak RSS
`test_change_bias_frozen_pte`	5.04 GB ← outlier
`TestDeepEvalEnerPt2` class	1.43 GB
`TestDeepEvalEnerAparamPt2` class	1.44 GB
`TestSpinInference::test_get_use_spin`	1.41 GB
`test_finetune_from_pt2_use_pretrain_script`	1.41 GB
`test_training_loop_compiled`	1.32 GB
`test_export_pipeline`	1.57 GB
`test_descriptor_shape_dpa1`	1.34 GB

Then phase-by-phase RSS profiling inside dp change-bias showed the 4.3 GB jump happens entirely inside compute_output_stats → _compute_model_predict. Scaling experiment confirms it: peak grows linearly at ~50 MB per frame of input data.

nbatches	Peak RSS	per-frame
1	567 MB	—
5	781 MB	+43 MB
20	1583 MB	+53 MB
80	4797 MB	+53 MB

That's a leak in the torch.no_grad()-wrapped forward_common_atomic somewhere — separate from autograd. The water example has 80 frames at batch_size=1, so the CLI default nbatches = min(data.get_nbatches()) = 80 triggers all 80 forwards in one go.

Why I can't just pass `-n 5`

_load_batch_set shuffles when it loads the set. If nbatches < total_frames, the loop samples a random subset — and the two calls in test_change_bias_pt2_pte_consistency (running in the same Python process via main(cmds), with dp_random's state advancing between calls) would see different subsets → different biases → the atol=1e-10 assertion fails.

nbatches == total_frames makes the forward enumerate every frame regardless of shuffle order, so the aggregate bias is invariant under shuffle. Determinism is preserved.

The fix

Build a 5-frame subset of examples/water/data/data_0 in TestChangeBias.setUpClass and point both the trainer config and the change-bias -s argument at it. nbatches then resolves to 5 (= the new dataset size = full enumeration), and all 9 tests pass with peak RSS at ~1.7 GB.

Test plan

All 9 tests in TestChangeBias pass locally (CPU fp64):
- test_change_bias_with_data
- test_change_bias_with_data_sys_file
- test_change_bias_with_user_defined
- test_change_bias_frozen_pte
- test_change_bias_frozen_pt2
- test_change_bias_frozen_pt2_user_defined
- test_change_bias_pt2_pte_consistency (atol=1e-10 — the determinism-sensitive one)
- test_change_bias_pte_preserves_model_def_script
- test_change_bias_pt2_preserves_model_def_script
Peak RSS measurement (kernel ru_maxrss):
- Before: 5.66 GB
- After: 1.62 GB (single test) / 1.75 GB (whole class)
CI shard (10, 3.13) confirms no more runner lost communication on this branch (pending CI run)

Known limitations

The underlying ~50 MB/frame leak in forward_common_atomic remains as a production bug. Users running dp change-bias on real datasets with thousands of frames will see multi-GB RSS growth. Worth a separate follow-up to find and patch the leak.
The 5-frame number is somewhat arbitrary. It's chosen as the smallest value that (a) keeps the assertion logic working, (b) leaves room for the least-squares regression (need ≥ ntypes = 2 frames).
_make_subset_dataset only handles set.000; multi-set datasets would need extension. Not needed for water/data_0.

Summary by CodeRabbit

Tests
- Reduced the dataset used by a change-bias test to a small, fixed subset (5 frames) so test runs use far less disk space and complete faster.
- Test setup now builds and points to the truncated dataset for all related invocations, lowering resource overhead during CI and local testing.

``TestChangeBias`` was the dominant memory hog in the ``Test Python`` shard ``(10, 3.13)`` of the CI matrix — by itself it peaked at ~5 GB RSS, leaving so little headroom under the 7 GB GitHub-hosted runner ceiling that the shard sporadically lost communication with the GitHub Actions server (intermittent ``runner lost communication`` flake observed across many recent PRs). Profile finding: peak RSS scales **linearly at ~50 MB per frame** during ``dp change-bias``'s in-process ``main(cmds)`` call. The forward over ``compute_output_stats`` enumerates ``nbatches = min( data.get_nbatches()) = 80`` frames of the water example, and each frame leaks ~50 MB into torch's caching allocator (not autograd — the wrapper is already in ``torch.no_grad()``; the leak is in ``forward_common_atomic`` somewhere and is a separate bug). Constraint: we **must** keep ``nbatches == total dataset frames`` to preserve determinism for ``test_change_bias_pt2_pte_consistency`` which compares two .pte and .pt2 invocations with ``atol=1e-10``. ``_load_batch_set`` shuffles the loaded set, so a value of ``nbatches < total_frames`` would sample a random subset and the two calls (running in the same Python process with an advancing ``dp_random`` state) would see different frames and produce different biases. Full enumeration sees every frame and so the aggregate bias is invariant under shuffle. Solution: build a 5-frame subset of ``examples/water/data/data_0`` in ``TestChangeBias.setUpClass`` and point both the trainer config and the change-bias ``-s`` argument at it. ``nbatches`` then resolves to 5 (= the new dataset size, = full enumeration), peak RSS drops to ~1.7 GB for the whole class, and all 9 tests in the class (including the strict atol=1e-10 consistency check) still pass. Class wall time also improves (~3:55 → less data-loop work in each change-bias invocation).

coderabbitai · 2026-05-27T08:23:16Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 1f2b0b40-7913-4151-9276-f30ebe926538

📥 Commits

Reviewing files that changed from the base of the PR and between dacf606 and 9706ede.

📒 Files selected for processing (1)

source/tests/pt_expt/test_change_bias.py

💤 Files with no reviewable changes (1)

source/tests/pt_expt/test_change_bias.py

📝 Walkthrough

Walkthrough

Adds _make_subset_dataset and updates TestChangeBias.setUpClass to build and use a truncated example dataset containing the first 5 frames instead of the full example data.

Changes

Test Dataset Subsetting

Layer / File(s)	Summary
Dataset subset helper `source/tests/pt_expt/test_change_bias.py`	New `_make_subset_dataset(src_system, dst_system, n_frames)` helper copies `type.raw`/`type_map.raw` when present and truncates `.npy` arrays under `set.000` to the first `n_frames` frames into the destination directory.
Test setup integration `source/tests/pt_expt/test_change_bias.py`	`TestChangeBias.setUpClass` now calls the helper to create `cls.tmpdir/data/data_0` with 5 frames and sets `cls.data_dir`/`cls.data_file` to that truncated dataset for training and `dp change-bias` invocations.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

iProzd
njzjz-bot

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and concisely describes the main change: shrinking the change-bias water dataset to 5 frames for TestChangeBias tests.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

source/tests/pt_expt/test_change_bias.py (1)

153-154: 💤 Low value

Consider validating array shape before truncation.

The function truncates arrays to n_frames without checking if they contain at least that many frames. While this works for the current use case (80 frames → 5 frames), it could silently create inconsistent datasets if some arrays have fewer frames than requested.

🛡️ Optional defensive check

     for fname in os.listdir(src_set):
         if not fname.endswith(".npy"):
             continue
         arr = np.load(os.path.join(src_set, fname))
+        if len(arr) < n_frames:
+            raise ValueError(
+                f"{fname} has only {len(arr)} frames, need at least {n_frames}"
+            )
         np.save(os.path.join(dst_set, fname), arr[:n_frames])

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@source/tests/pt_expt/test_change_bias.py` around lines 153 - 154, Before
truncating and saving, check the loaded array's frame dimension (arr) to ensure
arr.shape[0] >= n_frames; if it's smaller, either raise a clear error or handle
it (e.g., skip the file or pad to n_frames) rather than silently slicing. Update
the block that loads and writes arrays (referencing arr, fname, n_frames,
src_set, dst_set) to perform this validation and take the chosen action so
datasets remain consistent.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@source/tests/pt_expt/test_change_bias.py`:
- Line 141: Remove the redundant in-function import of numpy inside the
_make_subset_dataset function: delete the line "import numpy as np" from within
the function and rely on the existing top-level "import numpy as np" already
present in the module so all uses of np inside _make_subset_dataset reference
the module-level symbol.

---

Nitpick comments:
In `@source/tests/pt_expt/test_change_bias.py`:
- Around line 153-154: Before truncating and saving, check the loaded array's
frame dimension (arr) to ensure arr.shape[0] >= n_frames; if it's smaller,
either raise a clear error or handle it (e.g., skip the file or pad to n_frames)
rather than silently slicing. Update the block that loads and writes arrays
(referencing arr, fname, n_frames, src_set, dst_set) to perform this validation
and take the chosen action so datasets remain consistent.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: a02ef6fa-f542-4bfd-8cf7-4d4ba6e23201

📥 Commits

Reviewing files that changed from the base of the PR and between 4e64f8b and dacf606.

📒 Files selected for processing (1)

source/tests/pt_expt/test_change_bias.py

codecov · 2026-05-27T09:16:20Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.46%. Comparing base (4e64f8b) to head (9706ede).

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #5467   +/-   ##
=======================================
  Coverage   82.46%   82.46%           
=======================================
  Files         829      829           
  Lines       88763    88762    -1     
  Branches     4225     4225           
=======================================
+ Hits        73197    73198    +1     
+ Misses      14274    14273    -1     
+ Partials     1292     1291    -1

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

numpy is already imported at module level; the in-function import in _make_subset_dataset triggers ruff PLC0415.

github-actions Bot added the Python label May 27, 2026

wanghan-iapcm requested a review from njzjz May 27, 2026 08:22

coderabbitai Bot reviewed May 27, 2026

View reviewed changes

Comment thread source/tests/pt_expt/test_change_bias.py Outdated

njzjz approved these changes May 27, 2026

View reviewed changes

test(pt_expt): drop redundant in-function numpy import

9706ede

numpy is already imported at module level; the in-function import in _make_subset_dataset triggers ruff PLC0415.

wanghan-iapcm enabled auto-merge May 27, 2026 10:08

wanghan-iapcm added this pull request to the merge queue May 27, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(pt_expt): shrink change-bias water dataset to 5 frames#5467

test(pt_expt): shrink change-bias water dataset to 5 frames#5467
wanghan-iapcm wants to merge 2 commits into
deepmodeling:masterfrom
wanghan-iapcm:fix-test-change-bias-mem

wanghan-iapcm commented May 27, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 27, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

codecov Bot commented May 27, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

wanghan-iapcm commented May 27, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

How I located it

Why I can't just pass -n 5

The fix

Test plan

Known limitations

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

codecov Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

wanghan-iapcm commented May 27, 2026 •

edited by coderabbitai Bot

Loading

Why I can't just pass `-n 5`

coderabbitai Bot commented May 27, 2026 •

edited

Loading

codecov Bot commented May 27, 2026 •

edited

Loading