fglock added a commit that referenced this pull request on Apr 30, 2026
…me -p`
Adds a new mandatory rule for investigative agents:
- ALL `jperl` / `jcpan` / `prove` invocations that could hang must be
wrapped in `timeout N`, never `/usr/bin/time -p` (which only measures)
and never bare `./jperl …`.
- Explains why: `./jperl` ends with `exec java …`, so when the agent's
bash exits, hung JVMs get reparented to PID 1 and keep running at 100%
CPU forever — there is no SIGHUP propagation and no JVM-side watchdog.
A handful of these orphans silently starves the whole machine.
- Includes WRONG/RIGHT examples and the post-investigation cleanup-check
command (`ps aux | awk '$3>20 {...}'` + `pkill -f "perlonjava-.*\.t"`).
Adds an Incident Log entry for today's PR-#635 work, where this exact
trap caused phantom `t/76joins.t` / `t/96_is_deteministic_value.t`
SIGKILLs in `./jcpan -t DBIx::Class` runs — the symptom looked like a
real DBIx::Class perf regression, but was actually CPU starvation from
~14 orphan JVMs left behind by an earlier investigative agent.
Generated with [Devin](https://devin.ai)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
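The rule in this commit depends on `timeout N` killing a hung child rather than merely timing it. To illustrate the same watchdog idea (this is not the project's tooling; the rule itself is about the shell `timeout` command), a minimal Java sketch could look like the following. `TimeoutRunner`, `runWithTimeout`, and the `sleep 30` stand-in for a hung `./jperl` run are hypothetical names chosen for this example; the status 124 mirrors what GNU `timeout` reports when the deadline expires.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

final class TimeoutRunner {
    /** Status returned when the child had to be killed (same convention as GNU timeout). */
    static final int TIMED_OUT = 124;

    /** Runs the command, waits at most `seconds`, and kills it (and its descendants) on expiry. */
    static int runWithTimeout(List<String> command, long seconds) {
        Process child;
        try {
            child = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            if (child.waitFor(seconds, TimeUnit.SECONDS)) {
                return child.exitValue();   // finished on its own
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // fall through and kill the child
        }
        // Deadline passed (or we were interrupted): never leave the child running.
        child.descendants().forEach(ProcessHandle::destroyForcibly);
        child.destroyForcibly();
        return TIMED_OUT;
    }

    public static void main(String[] args) {
        // "sleep 30" stands in for an invocation that hangs.
        int status = runWithTimeout(List.of("sleep", "30"), 1);
        System.out.println(status == TIMED_OUT ? "killed after timeout" : "exit " + status);
    }
}
```

The point of the sketch is the last three lines of `runWithTimeout`: a measurement-only wrapper such as `/usr/bin/time -p` has no equivalent of that kill step, which is why it cannot prevent orphans.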
718b1d4 to 2b4576b
Adds an Investigation Plan section to dev/modules/dbix_class.md for the
NEW failure mode observed today under `./jcpan -t DBIx::Class`:
DBIx::Class::ResultSource::schema(): Unable to perform
storage-dependent operations with a detached result source
(source 'Artist' is not associated with a schema).
at t/52leaks.t line 430
This is distinct from the existing "tests 12-18 leak detection at
line 526" entry: that's a leak (objects not getting destroyed), this
is the opposite (a schema getting destroyed too eagerly while a child
resultset still expects it).
Test passes standalone (11/11 in 46s); only fails when ~20+ prior DBIC
tests have run through the same harness JVM.
Suspected cause: the walker-gate property fix in PR #618 (commit
ce8186e) widened DESTROY gating to every storedInPackageGlobal object.
Under cumulative state pressure, the gate fails to rescue a
Schema/ResultSource pair, causing the weak ref from RS → Schema to
read as undef.
The plan section includes:
- exact symptom + reproducer
- code path that triggers it
- hypothesis
- 4-step diagnostic plan (bisect prefix, instrument Java side,
reachability check, c4db69e-baseline verification)
- what's NOT the cause (parent harness JVM is 99.7% idle in select
polling)
- "why we can't ship": DBIx::Class is published as PASS in the CPAN
compatibility report
Generated with [Devin](https://devin.ai)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Confirmed via experiment: the failure is a timing-dependent walker
blind spot in `MortalList.maybeAutoSweep()`.
Diagnostic table added to dev/modules/dbix_class.md:
Mode | t/52leaks.t under harness
------------------------------|---------------------------
default (auto-GC every 5 s) | crashes mid-test:
| "detached result source" at line 430
JPERL_NO_AUTO_GC=1 | runs to completion;
| 14/23 subtests fail at leak-detection
So:
- WITH auto-sweep: walker incorrectly decides the Schema is
unreachable (it isn't: `my $schema = DBICTest->init_schema()`
in the test's top-level scope holds a strong ref). Auto-sweep
clears each ResultSource's weak ref to the Schema; a row's
`->result_source->resultset` call then dereferences the now-undef
weak back-ref → "detached result source" exception.
- WITHOUT auto-sweep: schema stays alive (so no crash), but the
underlying t/52leaks.t tests 12-18 leak-detection failures
surface — those are the documented "deep refcount inflation"
blockers from the existing plan.
Fix path is narrower than disabling the sweep: fix
ReachabilityWalker so it correctly seeds JVM-stack lexicals as
roots. Currently it only walks from global symbol tables; following
closure captures works, but lexicals themselves aren't seeded.
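To make the hypothesis stated in this commit message concrete, here is a toy model of a reachability sweep. It is only a sketch under stated assumptions, not the real `ReachabilityWalker`: the `ToyWalker` class, its `Obj` nodes, and the split into "globals" and "lexicals" root lists are all invented for illustration. It shows how a weak back-ref to a live object gets cleared when the object's only strong holder (a lexical) is not seeded as a root.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

final class ToyWalker {
    /** A heap object with strong edges and at most one weak back-ref (hypothetical model). */
    static final class Obj {
        final String name;
        final List<Obj> strong = new ArrayList<>();
        Obj weakBackRef; // set to null when the sweep clears it
        Obj(String name) { this.name = name; }
    }

    /** Marks everything reachable from the roots through strong edges only. */
    static Set<Obj> reachable(Collection<Obj> roots) {
        Set<Obj> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Obj> todo = new ArrayDeque<>(roots);
        while (!todo.isEmpty()) {
            Obj o = todo.pop();
            if (seen.add(o)) todo.addAll(o.strong);
        }
        return seen;
    }

    /** Clears weak back-refs whose target was not marked; returns how many were cleared. */
    static int sweepWeakRefs(Collection<Obj> roots, Collection<Obj> weakHolders) {
        Set<Obj> live = reachable(roots);
        int cleared = 0;
        for (Obj holder : weakHolders) {
            if (holder.weakBackRef != null && !live.contains(holder.weakBackRef)) {
                holder.weakBackRef = null;
                cleared++;
            }
        }
        return cleared;
    }

    public static void main(String[] args) {
        Obj schema = new Obj("schema"), source = new Obj("source"), row = new Obj("row");
        schema.strong.add(source);   // schema owns its result source
        row.strong.add(source);      // a row keeps the source alive
        source.weakBackRef = schema; // weakened back-ref, source → schema

        List<Obj> globals = List.of(row);     // reachable from a symbol table
        List<Obj> lexicals = List.of(schema); // held only by a `my` variable
        List<Obj> both = new ArrayList<>(globals);
        both.addAll(lexicals);

        System.out.println("lexicals seeded: cleared=" + sweepWeakRefs(both, List.of(source)));
        System.out.println("globals only:    cleared=" + sweepWeakRefs(globals, List.of(source)));
    }
}
```

In this model the second sweep nulls the back-ref even though the schema is still in use, which is the shape of the "detached result source" symptom.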
Plan section now includes:
- exact symptom + experiment confirming the timer dependency
- ref-graph diagram of the schema/RS/row chain
- 3-step audit checklist for ReachabilityWalker (lexical seeding,
capture-following, identity matching)
- explicit "don't disable the sweep" note (breaks leak detection)
Generated with [Devin](https://devin.ai)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Adds dev/sandbox/walker_blind_spot/ with:
- README.md explaining the bug (linking to the full plan in
dev/modules/dbix_class.md), what we tried, and concrete next steps
for the next investigator.
- simple_lexical_repro.t: minimal Schema/ResultSource pair with one
weakened back-ref, exercises auto-sweep over 7s.
Status of the simple reproducer: passes in both modes (with and
without JPERL_NO_AUTO_GC=1). The DBIC failure must depend on a more
complex pattern (closure captures, JVM-stack temporaries during DBIC's
accessor chain, etc.) that the walker's seeding gates incorrectly
exclude.
The next investigator needs to either:
1. Add `ReachabilityWalker.sweepWeakRefs()` diagnostic logging to
pinpoint which gate drops the schema, or
2. Mirror DBIC's accessor-chain pattern more precisely in the
reproducer.
Generated with [Devin](https://devin.ai)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…roducible
Today's testing of the schema-detached bug is flaky:
- Different victim test on every full DBIC run.
- Simple reproducers don't fail (walker handles trivial my-lexicals
fine).
- Even with explicit Internals::jperl_gc() x 50 the bug doesn't trip.
This is intrinsic: the bug only fires when the auto-sweep 5-s timer
expires at a precise moment relative to Perl's statement boundaries
inside DBIC's accessor chain. Naive standalone reproducers are either
too short (no sweep) or too simple (lexical too easy for the walker).
Adds a "How to make this reliably reproducible" section to the plan
with four pieces of infrastructure:
1. JPERL_FORCE_SWEEP_EVERY_FLUSH=1: debug env var that fires the
auto-sweep on every MortalList.flush() call, bypassing the 5-s
throttle and the weakRefsExist gate. Converts the stochastic race
into deterministic "sweep here → next access dies".
2. JPERL_WALKER_TRACE=1: structured log of every weak-ref the sweep
clears: target classname + identity, findPathTo() output, snapshot
of seeding sources active. The first cleared Schema in the
transcript is the bug.
3. Tiered reproducers T1..T6: graduate from "1 schema + 1 weakened
ref" (current simple_lexical_repro.t, passes) up to a DBIC-shape
pattern (closures + @_ temporaries + overloaded "" + thousands of
unrelated weakened scalars + interleaved dclone). Smallest tier
that fails under (1) becomes the unit test.
4. Prefix bisection on the full DBIC suite: find the shortest
sequence of test files that triggers a failure under (1)+(2).
That sequence is the deterministic harness reproducer.
Plan ordering: implement (1)+(2) first (~30 min), then (4) prefix
bisection (~1 h), then inspect transcripts to identify the failing
seeding gate, fix in ReachabilityWalker, promote smallest failing
reproducer to src/test/resources/unit/refcount/walker_blind_spot.t.
This gets us off the flaky-repro treadmill we've been stuck on today.
Generated with [Devin](https://devin.ai)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
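The prefix bisection in item 4 of the commit above can be sketched as a binary search over prefix lengths. This is an illustrative sketch, not code from the PR: `PrefixBisect` and `shortestFailingPrefix` are hypothetical names, and the `fails` predicate stands in for "run these test files through one harness JVM with the debug knobs set and report whether the crash occurred". The search assumes the failure is monotonic (once a prefix fails, every longer prefix fails too), which only holds if the forced sweep really makes the bug deterministic.

```java
import java.util.List;
import java.util.function.Predicate;

final class PrefixBisect {
    /**
     * Returns the length of the shortest prefix of `items` for which `fails`
     * is true, or -1 if even the full list passes. Assumes monotonic failure.
     */
    static <T> int shortestFailingPrefix(List<T> items, Predicate<List<T>> fails) {
        if (!fails.test(items)) {
            return -1; // nothing to bisect
        }
        int lo = 0;            // every prefix shorter than lo is known to pass
        int hi = items.size(); // the prefix of length hi is known to fail
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (fails.test(items.subList(0, mid))) {
                hi = mid;      // failure already present in the shorter prefix
            } else {
                lo = mid + 1;  // need at least one more file
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        // Placeholder file names; the simulated failure needs both c.t and e.t to have run.
        List<String> files = List.of("a.t", "b.t", "c.t", "d.t", "e.t", "f.t");
        int n = shortestFailingPrefix(files, p -> p.contains("c.t") && p.contains("e.t"));
        System.out.println("shortest failing prefix length: " + n);
    }
}
```

Each probe of a real predicate would be a full harness run, so the logarithmic number of probes matters; each of those runs would also need to be wrapped in `timeout N` per the rule added earlier in this PR.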
…lker plan
Adds the deterministic-sweep debug knob that the
"How to make this reliably reproducible" section of
dev/modules/dbix_class.md called for:
if (System.getenv("JPERL_FORCE_SWEEP_EVERY_FLUSH") != null) {
// bypass weakRefsExist gate AND the 5-s throttle on every
// MortalList.flush() — every Perl statement boundary runs
// a full sweepWeakRefs walk
}
This converts timing-dependent walker bugs (like the DBIC
"detached result source" mid-test crash on t/52leaks.t line 430)
into deterministic "sweep here → next access dies" sequences for
diagnostic work.
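The gating described above (a `weakRefsExist` check plus a 5-s throttle, both bypassed when the env var is set) might be modelled as follows. This is a sketch under assumptions, not the actual `MortalList` code: the `SweepGate` class, its fields, and the `shouldSweep` signature are hypothetical, and time is passed in explicitly so the behaviour can be checked without waiting.

```java
final class SweepGate {
    /** Throttle interval taken from the "auto-GC every 5 s" description. */
    static final long THROTTLE_MILLIS = 5_000;

    private final boolean forceEveryFlush;
    private long lastSweepMillis;

    SweepGate(boolean forceEveryFlush, long startMillis) {
        this.forceEveryFlush = forceEveryFlush;
        this.lastSweepMillis = startMillis;
    }

    /** Reads the debug knob once; any value (even empty) enables it. */
    static SweepGate fromEnvironment() {
        boolean force = System.getenv("JPERL_FORCE_SWEEP_EVERY_FLUSH") != null;
        return new SweepGate(force, System.currentTimeMillis());
    }

    /** Called at each flush; returns true when a full sweep should run now. */
    boolean shouldSweep(long nowMillis, boolean weakRefsExist) {
        if (!forceEveryFlush) {
            if (!weakRefsExist) {
                return false; // nothing a sweep could clear
            }
            if (nowMillis - lastSweepMillis < THROTTLE_MILLIS) {
                return false; // throttled
            }
        }
        lastSweepMillis = nowMillis;
        return true;
    }

    public static void main(String[] args) {
        SweepGate normal = new SweepGate(false, 0);
        SweepGate forced = new SweepGate(true, 0);
        System.out.println("normal at 1 s: " + normal.shouldSweep(1_000, true));
        System.out.println("forced at 1 s: " + forced.shouldSweep(1_000, false));
    }
}
```

With the knob set, every flush sweeps, so a walker blind spot shows up at the first statement boundary after the object becomes "invisible" instead of whenever the timer happens to expire.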
Hypothesis testing under this knob disconfirms the earlier
"walker doesn't seed `my $scalar` lexicals" theory:
- `dev/sandbox/walker_blind_spot/lexical_scalar_root_PASSES.t` —
`my $obj = bless` + weakened back-ref + 20× Internals::jperl_gc()
→ PASSES under JPERL_FORCE_SWEEP_EVERY_FLUSH=1.
- `dev/sandbox/walker_blind_spot/dbic_real_pattern_PASSES.t` —
DBIC-shape with schema in global %REGISTRY and a chain replacing
$phantom each iteration → also PASSES.
So the walker DOES correctly seed both `my $scalar` lexicals and
globally-registered schemas. The actual DBIC blind spot is somewhere
else — Moo/MRO, accessor magic, Storable's seen-table, or some other
DBIC-specific structural cycle.
The fix path in dev/modules/dbix_class.md is updated: stop
speculating about which seeding gate; the next investigator should
add `JPERL_WALKER_TRACE=1` instrumentation to
`ReachabilityWalker.sweepWeakRefs()` and capture an actual
DBIC failure to identify the real gate.
Generated with [Devin](https://devin.ai)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Adds a "Next steps (concrete, in order)" section to
dev/modules/dbix_class.md so whoever picks up this PR can act
without re-reading the whole investigation history:
Step A — Add JPERL_WALKER_TRACE=1 instrumentation
(env-gated System.err.println in sweepWeakRefs that logs
cleared-target identity + refcount/state + findPathTo
output + seed-stats snapshot)
Step B — Run jcpan -t DBIx::Class with the new trace + the
JPERL_FORCE_SWEEP_EVERY_FLUSH knob already in this PR
Step C — Identify the failing seeding gate from the trace
(3 most-likely candidates listed)
Step D — Promote the smallest reproducer to a unit test
Step E — Verify on full DBIC suite
Step A is small (~20 lines in ReachabilityWalker), Step B is one
command, Step C is the actual diagnosis once we have the trace —
no more speculating about which seeding gate is at fault.
Generated with [Devin](https://devin.ai)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
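Steps A and C of the commit above could be prototyped roughly like this. Everything about the line layout is an assumption made for illustration: the PR description mentions `WALKER_CLEAR` lines, but the `class=`, `id=`, `refcount=`, and `path=` fields, the `WalkerTrace` class, and its method names are invented here. The sketch shows an env-gated emitter and a scan for the first cleared target whose class name starts with one of the three suspect prefixes.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

final class WalkerTrace {
    /** Trace is opt-in, mirroring the env-gated design described in Step A. */
    static final boolean ENABLED = System.getenv("JPERL_WALKER_TRACE") != null;

    /** Class-name prefixes worth stopping at when reading a transcript. */
    static final List<String> SUSPECTS = List.of(
            "DBIx::Class::Schema",
            "DBIx::Class::ResultSource",
            "DBIx::Class::Storage::DBI");

    /** Builds one trace line (hypothetical format) for a cleared weak-ref target. */
    static String formatClear(String className, int identity, int refCount, String pathToTarget) {
        return String.format(Locale.ROOT,
                "WALKER_CLEAR class=%s id=0x%x refcount=%d path=%s",
                className, identity, refCount,
                pathToTarget == null ? "<none>" : pathToTarget);
    }

    /** Emits the line on stderr only when the env var is set. */
    static void logClear(String className, int identity, int refCount, String pathToTarget) {
        if (ENABLED) {
            System.err.println(formatClear(className, identity, refCount, pathToTarget));
        }
    }

    /** Returns the first WALKER_CLEAR line whose target class matches a suspect prefix. */
    static Optional<String> firstSuspect(List<String> transcript) {
        for (String line : transcript) {
            if (!line.startsWith("WALKER_CLEAR ")) {
                continue; // unrelated harness output
            }
            for (String suspect : SUSPECTS) {
                if (line.contains(" class=" + suspect)) {
                    return Optional.of(line);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> transcript = List.of(
                "ok 1 - some test output",
                formatClear("Some::Other::Class", 1, 0, null),
                formatClear("DBIx::Class::Schema", 2, 1, null));
        System.out.println(firstSuspect(transcript).orElse("no suspect cleared"));
    }
}
```

Keeping the format function separate from the env-gated print makes the layout testable without setting the variable.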
3db93f5 to d036674
Summary
Investigation work on the DBIx::Class `t/52leaks.t` schema-detached intermittent failure that surfaces under `./jcpan -t DBIx::Class` once #644's storable fixes unblock visibility past the early `t/84serialize.t` crashes.
This PR is diagnostic-and-plan only, no fix yet. The actual code-change PR for the walker fix should land on top once Steps A–C of the plan have produced concrete evidence about which seeding gate is missing.
Contents
- docs(dbic): document new t/52leaks.t schema-detached harness regression
- docs(dbic): pinpoint root cause of schema-detached t/52leaks.t failure
- sandbox(walker): walker blind spot reproducer attempts + handoff doc
  … `*_PASSES.t`) that do NOT trigger the bug; proves the simple lexical-seeding case works correctly
- docs(dbic): plan to make t/52leaks.t schema-detached bug reliably reproducible
- fix(MortalList): `JPERL_FORCE_SWEEP_EVERY_FLUSH` debug knob + corrected walker plan
- docs(dbic): concrete next-steps plan for the walker investigation
What we learned today (and what we don't yet know)
- ✅ The bug is the auto-sweep walker prematurely clearing the weak ref from `ResultSource → Schema` while DBIC still expects to dereference it.
- ✅ `JPERL_NO_AUTO_GC=1` removes the crash but exposes 14/23 leak-tracer failures, so disabling the sweep is NOT the fix.
- ✅ The walker DOES seed `my $scalar = $ref` lexicals (verified: both `dev/sandbox/walker_blind_spot/lexical_scalar_root_PASSES.t` and `dbic_real_pattern_PASSES.t` pass under `JPERL_FORCE_SWEEP_EVERY_FLUSH=1`).
- ❌ We do NOT yet know which specific seeding gate the walker is missing in DBIC's actual code path; it is likely tied to Moo / Class::C3::XS / Sub::Quote / accessor-magic / Storable seen-table interaction. Need `JPERL_WALKER_TRACE` instrumentation under a real DBIC failure to find out.
Next steps
See `dev/modules/dbix_class.md` § Next steps (concrete, in order).
Summary: add `JPERL_WALKER_TRACE` to `ReachabilityWalker.sweepWeakRefs()`, run `JPERL_FORCE_SWEEP_EVERY_FLUSH=1 JPERL_WALKER_TRACE=1 ./jcpan -t DBIx::Class`, and find the first `WALKER_CLEAR` line with a `DBIx::Class::Schema` / `ResultSource` / `Storage::DBI` target; the seeding-state in that line tells us which gate to fix.
Test plan
- `make` (unit tests): green
- `JPERL_FORCE_SWEEP_EVERY_FLUSH=1` opt-in: doesn't change behaviour when env var unset (verified: full DBIC `t/52leaks.t` runs the same as without the knob).
Dependencies
Depends on / shares context with:
Generated with Devin
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Related issue: #646