Parallel mesh-adaptation fixes: MUMPS heap-corruption, patch volumes, collective remesh, field remap#213
Open
lmoresi wants to merge 4 commits into
Open
Parallel mesh-adaptation fixes: MUMPS heap-corruption, patch volumes, collective remesh, field remap#213lmoresi wants to merge 4 commits into
lmoresi wants to merge 4 commits into
Conversation
The OT/MA mesh-motion movers (meshing/smoothing.py) crashed the heap (probabilistic SEGV/SIGBUS/MPI-deadlock) in parallel at np>=3. Root cause: MUMPS (parallel LU) re-solved repeatedly corrupts the heap — NOT the singular pure-Neumann solve, n_components>dim, or DMClone-shared caches (all disproved; the default GMRES+GAMG never touches MUMPS and is clean). Every mover sub-solve was wired through _use_direct_solver (lagged MUMPS LU), and _use_iterative_solver reintroduced MUMPS as the GAMG coarse solver. Fixes (mover-local; serial bit-identical, MUMPS kept as the serial 10x lever): - _use_direct_solver: fall back to the iterative path when uw.mpi.size > 1. - _use_iterative_solver: GAMG coarse uses redundant+svd (not MUMPS) under MPI. - _wire: forward `elliptic` so the parallel fallback picks GAMG vs CG+Jacobi. - _patch_volumes: in parallel, assemble the per-vertex dual area (= lumped P1 mass diagonal) via the FE mass matrix (M.1) so PETSc does the cross-rank localToGlobal(ADD); the hand-rolled local sum under-counted partition-boundary vertices (under-refined parallel grading). Serial keeps the numpy path. Annotated to use DMCreateMassMatrixLumped once petsc4py binds it. Verified at np=5: OT/MA movers + OT_adapt(field) crash/hang-free; all solver types converge and match serial to ~1e-10; _patch_volumes serial bit-identical (maxdiff 2.6e-18) and parallel area-conserving; tier-A 177 passed / 0 failed. Underworld development team with AI support from Claude Code
Underworld development team with AI support from Claude Code
Contributor
There was a problem hiding this comment.
Pull request overview
This PR addresses a reproducible parallel crash/hang in the adaptive mesh movers by avoiding repeated parallel MUMPS usage (direct LU) and by fixing a parallel correctness issue in per-vertex patch-volume computation used by equidistribution.
Changes:
- Disable the MUMPS-based “direct” solver path under MPI by falling back to the iterative configuration; also remove MUMPS from the GAMG coarse solve in parallel (use
redundant+svd). - Make
_patch_volumesparallel-correct by computing the lumped P1 mass diagonal via FE assembly (M·1) when running under MPI. - Add a design note documenting the investigation and fix.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
src/underworld3/meshing/smoothing.py |
Gates MUMPS usage under MPI, adjusts GAMG coarse options, and fixes parallel patch-volume assembly for equidistribution movers. |
docs/developer/design/parallel-repeated-solve-corruption.md |
Documents the root-cause investigation and the mover-local mitigation strategy. |
Comments suppressed due to low confidence (2)
src/underworld3/meshing/smoothing.py:1414
- In parallel (
uw.mpi.size > 1), normalisingpatchwith the localnp.mean(patch)makes the scaling partition-dependent (each rank will use a different normalisation constant). Sincepatchis then used to definevol_fieldand to formb = rho_t * patch, this can introduce MPI-layout-dependent results. Prefer a partition-invariant normalisation (global mean over the MeshVariable’s global Vec) when running under MPI, while keeping the current numpy mean in serial for bit-identical behaviour.
patch = _patch_volumes(tris, old_coords, n_verts, vol_field)
patch /= float(np.mean(patch))
else:
patch = np.ones(n_verts, dtype=np.double)
_va = vol_field.array
_va[...] = patch.reshape(_va.shape)
src/underworld3/meshing/smoothing.py:1761
patch_meanis currently computed as an average of per-rank means (allreduce(mean)/size), which is not the mean “over the domain” unless every rank owns the same number of vertices (and it can also be sensitive to ghost duplication depending on the local array layout). Sincepatch_meanis used to normalise bothvol_fieldand the source term, this can make results depend on the partitioning. Prefer using the MeshVariable’s global mean under MPI (viavol_field.mean()), while keeping the numpy mean in serial for bit-identical behaviour.
patch = _patch_volumes(tris, old_coords, n_verts, vol_field)
# Normalise so the mean over the domain is the cell mean.
patch_mean = float(np.mean(patch))
if uw.mpi.size > 1:
patch_mean = uw.mpi.comm.allreduce(patch_mean) / uw.mpi.size
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+895
to
+897
| if uw.mpi.size > 1: | ||
| _use_iterative_solver(solver, singular=singular, elliptic=elliptic) | ||
| return |
Comment on lines
+5
to
+7
| (lagged MUMPS LU)** path the movers wire in, *not* the Poisson solve, the DM, or | ||
| singularity. The UW3 **default GMRES+GAMG solver is clean at np=5** (10/10, even | ||
| for the singular `constant_nullspace` case). Fix is mover-local + low-risk. |
| @@ -0,0 +1,281 @@ | |||
| # Parallel repeated-FE-solve heap corruption (np ≥ 5) | |||
Running the adaptive convection harness at np=4 deadlocked: mesh_metric_mismatch computed the mesh/metric misalignment with np.corrcoef over rank-LOCAL cells, so each rank got a different value straddling skip_threshold (0.34/0.91/0.77/0.45) → ranks disagreed on whether to remesh → the collective mover deadlocked (some ranks entered it, others skipped). Fix (meshing/smoothing.py): - mesh_metric_mismatch: compute the Pearson alignment from globally-reduced moment sums (Sx,Sy,Sxx,Syy,Sxy,n allreduced) so every rank agrees. Serial is bit-identical to np.corrcoef (the 1/n normalisation cancels in the ratio). - smooth_mesh_interior: OR-reduce the skip/adapt decision so that if ANY rank needs to remesh, ALL ranks remesh (and all skip together otherwise). The mover is collective, so the decision must be unanimous. Rank-0-only logging. Verified: np=4 adaptive convection (8 steps) runs clean — unanimous adapt at step 2, unanimous skip at 4/6/8, valid non-tangled adapted mesh; tier-A 177 passed / 0 failed; serial path unchanged. Underworld development team with AI support from Claude Code
After the mover moves nodes, the driving field is FE-remapped by evaluating the OLD field at the NEW DOF coordinates. This used the LOCAL uw.function.evaluate, but a DOF that moves near a rank-partition boundary lands in a neighbouring rank's subdomain where a local evaluate returns stale/garbage (it does not raise, so the bad value persists and convection amplifies it -> a growing field anomaly localised to the partition seams). Fix: use uw.function.global_evaluate (serial-identical drop-in, maxdiff 0.0; it gathers/resolves off-rank points across ranks): - meshing/_ot_adapt.py: the reference-mesh remap and the adapted-position remap (affects every mesh.OT_adapt(..., fields_to_remap=...) user). - scripts/stagnant_lid_adapt_loop.py: the harness's hand-rolled T remap. Verified: parallel OT_adapt(field) np=4 clean; np=4 convection T smoother at the seams (peak ring jump 0.0113 -> 0.0091) and recovers the correct hotter boundary-layer T; tier-A 177 passed / 0 failed; serial bit-identical. Underworld development team with AI support from Claude Code
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Makes parallel mesh adaptation (np>=3) correct for the OT/MA mesh-motion movers. Four parallel-correctness fixes, uncovered in one investigation (the last three surfaced by running the adaptive convection harness at np=4). Serial behaviour is unchanged/bit-identical throughout; tier-A is green (177 passed / 0 failed) at every stage.
1. MUMPS-in-parallel — the heap corruption (root cause)
MUMPS (parallel LU) re-solved in a loop at np>=3 corrupts the heap (probabilistic SEGV/SIGBUS/MPI-deadlock). NOT the singular pure-Neumann solve,
n_components>dim, or DMClone-shared caches (all disproved). The default GMRES+GAMG never touches MUMPS and is clean. The movers crashed because every sub-solve was wired through_use_direct_solver(lagged MUMPS LU), and_use_iterative_solverreintroduced MUMPS as the GAMG coarse solver._use_direct_solver: iterative fallback under MPI; serial keeps lagged MUMPS._use_iterative_solver: GAMG coarse usesredundant+svd(not MUMPS) under MPI._wire: forwardselliptic.2. Parallel-correct
_patch_volumesThe per-vertex dual area (= lumped P1 mass diagonal) is assembled via the FE mass matrix (
DM.createMassMatrix+M.1) in parallel so PETSc does the cross-ranklocalToGlobal(ADD); the hand-rolled local sum under-counted partition-boundary vertices.TODOto switch toDMCreateMassMatrixLumpedonce petsc4py binds it.3. Collective remesh decision
mesh_metric_mismatchcomputed misalignment withnp.corrcoefon rank-local cells → ranks disagreed acrossskip_thresholdand the collective mover deadlocked. Now: alignment from globally-reduced moment sums (serial bit-identical) + the skip/adapt decision OR-reduced — if any rank remeshes, all do.4.
global_evaluatefor the post-adapt field remapThe field remap (evaluate old field at new DOF coords) used the local
uw.function.evaluate; DOFs that move across a rank-partition seam land off-rank, where local evaluate returns stale/garbage → a growing field anomaly at the seams. Fixed touw.function.global_evaluate(serial-identical) in core_ot_adapt.py(reference-mesh and adapted-position remaps — affects allmesh.OT_adaptusers) and the harness.Verification
mesh.OT_adapt(field)crash/hang-free; solvers converge and match serial to ~1e-10.Follow-ups (not in this PR)
pc_factor_mat_solver_type=mumps/pc_type=lure-solved at np>=3 (UW3-wide; this PR covers the movers).Design doc:
docs/developer/design/parallel-repeated-solve-corruption.md.Underworld development team with AI support from Claude Code