Storable hook cookie + leaf seen-tag fix; AGENTS timeout rule; jperl orphan watchdog#644
Open
Storable hook cookie + leaf seen-tag fix; AGENTS timeout rule; jperl orphan watchdog#644
Conversation
Two related byte-vs-UTF-8 mismatches in the native Storable writer that corrupted any blob containing bytes 0x80–0xFF: 1. StorableWriter.tryEmitHook() was calling `cookieSv.toString().getBytes(UTF_8)` on the cookie returned by STORABLE_freeze. The cookie is a binary Storable stream, not text, and re-encoding turned every 0x80..0xFF byte into a 2-byte UTF-8 sequence — breaking the embedded SX_HOOK length prefix and producing wild-looking errors like `SX_HASH keylen 2164260864 out of range` on the receiving side. 2. Storable.freezeImpl() created the result scalar with the default `new RuntimeScalar(String)` constructor (type STRING). The encoded stream is bytes 0..255 — it must be tagged BYTE_STRING so any downstream consumer that converts back to bytes (file I/O, hook re-pack, etc.) doesn't UTF-8-encode the high bytes. Together these fix `jcpan -t DBIx::Class` t/84serialize.t failures in the freeze/thaw_method, freeze/thaw_func and dclone code paths (both go through nested STORABLE_freeze hooks for ResultSet, ResultSource and ResultSourceHandle). Generated with [Devin](https://devin.ai) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
The native binary writer was missing recordWriteSeen() calls for leaf
scalars (SX_SCALAR / SX_BYTE / SX_INTEGER / SX_DOUBLE / SX_UTF8STR /
SX_LSCALAR / SX_LUTF8STR / SX_UNDEF / SX_SV_*), but the reader's
Scalars.java already calls recordSeen() for every one of them (matching
upstream's SEEN_NN macro in Storable.xs). The mismatch caused write-side
tag IDs to drift behind the reader after the first leaf scalar, so any
subsequent SX_OBJECT backref resolved to the wrong object on thaw.
Surfaced as `jcpan -t DBIx::Class` failures in t/84serialize.t — for
example, after `nfreeze → thaw` of a DBIC resultset the inner
`_attrs->{order_by}->[0]` would resolve to the outer `_attrs` hash,
producing the cryptic `hash passed to _order_by must have exactly one
key (-desc or -asc)` error from SQL::Abstract::Classic.
Fix: emit a fresh seen-tag at the top of writeScalar() so write/read
counters stay in lockstep.
After this fix:
- DBIC t/84serialize.t: 0/115 → 115/115
- DBIC t/52leaks.t: still 11/11
Combined with the prior commit (storable hook-cookie UTF-8 byte-string
fix) this restores `jcpan -t DBIx::Class` to a passing state.
Generated with [Devin](https://devin.ai)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…me -p`
Adds a new mandatory rule for investigative agents:
- ALL `jperl` / `jcpan` / `prove` invocations that could hang must be
wrapped in `timeout N`, never `/usr/bin/time -p` (which only measures)
and never bare `./jperl …`.
- Explains why: `./jperl` ends with `exec java …`, so when the agent's
bash exits, hung JVMs get reparented to PID 1 and keep running at 100%
CPU forever — there is no SIGHUP propagation and no JVM-side watchdog.
A handful of these orphans silently starves the whole machine.
- Includes WRONG/RIGHT examples and the post-investigation cleanup-check
command (`ps aux | awk '$3>20 {...}'` + `pkill -f "perlonjava-.*\.t"`).
Adds an Incident Log entry for today's PR-#635 work, where this exact
trap caused phantom `t/76joins.t` / `t/96_is_deteministic_value.t`
SIGKILLs in `./jcpan -t DBIx::Class` runs — the symptom looked like a
real DBIx::Class perf regression, but was actually CPU starvation from
~14 orphan JVMs left behind by an earlier investigative agent.
Generated with [Devin](https://devin.ai)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Adds a self-watchdog daemon thread in `org.perlonjava.app.cli.Main`
that runs ONLY when env var JPERL_ORPHAN_EXIT is set (auto-set by
`./jcpan` and `./jprove`). The thread caches the initial parent PID
at startup and polls `ProcessHandle.of(...)` every 2s; after two
consecutive misses (~4s) it calls `Runtime.halt(143)`.
This fixes the runaway-orphan trap that has burned us repeatedly
today: when the user `kill -9`s an aborted `./jcpan` or
`./jprove` run, every in-flight sub-jperl test JVM gets reparented
to PID 1 and keeps spinning at 100% CPU, since SIGKILL bypasses
all user-space cleanup. The accumulated orphans then starve any
subsequent `./jcpan -t Module` run, manifesting as random tests
SIGKILL'd at the 300s "no output" deadline. AGENTS.md documents
this as today's incident.
Direct `./jperl your_script.pl` does NOT set the env var, so
user programs are unaffected:
- Real-Perl `exec LIST` semantics ("child survives parent")
still apply: the watchdog only runs in jperl JVMs, never in
sh / non-jperl children spawned via `open(... "$cmd|")` or
`exec`. So a user who calls exec from a long-running script
still gets the standard nohup-style behavior.
- No shutdown hook is installed in the runtime — only the
opt-in watchdog. The earlier draft of this fix (shutdown hook
in RuntimeIO) was removed because it would have killed
pipe-open children on parent System.exit, breaking exec
semantics under the harness.
Verified manually:
$ JPERL_ORPHAN_EXIT=1 ./jperl ... & # parent
parent gets `kill -9`
sub-jperl prints `[jperl] orphaned: parent PID NNN is gone — exiting`
and dies in ~4s.
$ JPERL_ORPHAN_EXIT=1 ./jperl -e \
'exec "sh", "-c", "echo started; sleep 4; echo finished"'
Output: started ... finished # exec'd child runs to completion ✓
Generated with [Devin](https://devin.ai)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
3 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Three independent, ready-to-merge fixes carved out of #635 so they can land without waiting on the still-open DBIx::Class walker investigation.
1. Storable hook cookie + leaf seen-tag drift (commits 1, 2)
PR #621's native Storable encoder had two correctness bugs:
StorableWriter.tryEmitHook(). The cookie is a binary blob, not text — re-encoding turned every 0x80–0xFF byte into a 2-byte sequence. Surface symptom:thaw failed: SX_HASH keylen 2164260864 out of rangewhen a class with aSTORABLE_freezehook (e.g. DBIC'sResultSource) is frozen and thawed.Scalars.javacallsrecordSeen()for every leaf opcode (matching upstream'sSEEN_NN). Tag IDs drifted past the first leaf, so any laterSX_OBJECT(tag=N)resolved to the wrong object on thaw. Surface symptom in DBIC: cryptichash passed to _order_by must have exactly one key (-desc or -asc)—_attrs->{order_by}->[0]resolves to the outer_attrshash because the tag pointed at the wrong slot.After this PR:
t/84serialize.t: 0/115 → 115/115 passt/52leaks.t: was crashing ondclone→ now runs to the existing tests-12-18 leak-tracer state2. AGENTS.md timeout discipline (commit 3)
Adds a mandatory rule documenting today's incident: investigative agents must wrap
./jperl/./jcpan/./jprovecalls intimeout N, NEVER bare./jperl …or/usr/bin/time -p ./jperl …. Without it, hung JVMs survive the agent's exit (reparented to PID 1, no SIGHUP propagation, no JVM-side watchdog) and silently starve the box.3. Subprocess orphan watchdog (commit 4)
Adds an opt-in JVM-side daemon thread that polls the initial parent PID every 2s and self-exits when the parent is gone. Enabled automatically by
./jcpanand./jproveviaJPERL_ORPHAN_EXIT=1. Direct./jperl your_script.plis unaffected (preserves real-Perlexecsemantics — child survives parent).Verified manually:
JPERL_ORPHAN_EXIT=1 ./jperl ...+kill -9of parent: child JVM prints[jperl] orphaned: parent PID NNN is gone — exitingand dies in ~4 s.exec "sh -c 'echo started; sleep 4; echo finished'"from any context: child runs to completion (no premature kill).Test plan
make(unit tests) greent/84serialize.t: 115/115./jperlexec/system semantics preservedConfiguration.java(Gradle-managed, taken from master) and AGENTS.md Incident Log (merged both rows from master + this branch)The DBIx::Class harness
t/52leaks.tschema-detached intermittent that surfaces under this PR (after the storable fixes unblock visibility) remains under investigation and tracked in #635.Generated with Devin
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>