fix(flow-02): poll node-ready + eRPC /rpc instead of one-shot #479
Closed
bussyjd wants to merge 1 commit into
Two cold-start races surfaced on spark1's smoke run:

1. `obol stack up` returns once the k3s API responds, but the k3d node can take another few seconds to appear in `kubectl get nodes`. The old `run_step_grep "Nodes ready" "Ready" kubectl get nodes` raced that window and FAILed on "No resources found". Switched to `poll_step_grep` with a 12×5s = 60s ceiling. Also tightened the pattern to ` Ready ` (surrounding spaces) so the word "NotReady" in the status column does not satisfy the match.
2. eRPC's HTTP listener becomes reachable before its upstream pool has fully resolved every alias. A one-shot GET /rpc moments after pods report Running often returns a partial list missing `base-sepolia`, which then cascades into the chains-OK and JSON-RPC checks. Poll the first eRPC assertion until `base-sepolia` appears (or 60s elapses); the subsequent assertions then have a stable list to reason about.
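For readers without `flows/lib.sh` at hand, the polling idea and the tightened pattern can be sketched roughly as follows. This is only an illustration: `poll_grep`, its argument order, and the sample node row are hypothetical, and the real `poll_step_grep` helper may have a different signature and reporting behaviour.

```shell
# Hypothetical helper, NOT the real poll_step_grep from flows/lib.sh.
# Usage: poll_grep PATTERN ATTEMPTS INTERVAL CMD [ARGS...]
# Re-runs CMD until its output contains PATTERN (fixed string) or the
# attempts run out. Returns 0 on a match, 1 otherwise.
poll_grep() {
  pg_pattern=$1
  pg_attempts=$2
  pg_interval=$3
  shift 3
  pg_i=1
  while [ "$pg_i" -le "$pg_attempts" ]; do
    # Capture stdout and stderr; a failing command is just "not ready yet".
    pg_out=$("$@" 2>&1) || true
    if printf '%s\n' "$pg_out" | grep -qF -- "$pg_pattern"; then
      return 0
    fi
    # No need to wait after the final attempt.
    if [ "$pg_i" -lt "$pg_attempts" ]; then
      sleep "$pg_interval"
    fi
    pg_i=$((pg_i + 1))
  done
  return 1
}

# Made-up sample row in the column layout of `kubectl get nodes`.
not_ready_row='node-a   NotReady   control-plane   5s   v1.31.0'

# Compare the loose pattern with the space-delimited one on that row.
for pat in 'Ready' ' Ready '; do
  if printf '%s\n' "$not_ready_row" | grep -qF -- "$pat"; then
    echo "pattern '$pat' matches the NotReady row"
  else
    echo "pattern '$pat' rejects the NotReady row"
  fi
done
```

Under these assumptions, the node check would look like `poll_grep ' Ready ' 12 5 kubectl get nodes`: up to 12 attempts, 5 seconds apart, returning as soon as a node reports `Ready`.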
190c716 to b5f7628
bussyjd added a commit that referenced this pull request on May 13, 2026
…490)

Integration branch that takes the release-smoke gate from "broken at flow-11 step 43" to 13/13 PASS on spark1 against the production facilitator (Base Sepolia + x402.gcp.obol.tech).

Folds in the in-flight smoke fixes (#476 runner refactor, #477 ERE alternation, #478 wallet check + verifier readiness, #479 flow-02 cold-start polling, #483 sell-inference flag align, #484 frontend digest-pin v0.1.23) plus eight additional root-cause fixes uncovered while driving the gate green:

- internal/x402/setup.go EnsureVerifier rewrites image pins in-memory before kubectl apply so OBOL_DEVELOPMENT=true source changes actually reach the cluster
- internal/x402/chains.go ResolveChainInfo accepts both legacy aliases and CAIP-2 ids
- flows/flow-10-anvil-facilitator.sh drops --prune-history (which was enable-pruning, not retention) and adds --host 0.0.0.0 + cluster-reachability preflight
- internal/defaults/defaults.go combo-form image-pin regex now lists longest first
- flows/lib.sh paid-RPC support (BASE_SEPOLIA_RPC, ALCHEMY_BASE_SEPOLIA_API_KEY) + Bob top-up preflight + secret scrubbing collapsing paid-RPC URLs to TLD-only
- flows/flow-07-sell-verify.sh and flow-08-buy.sh wrap 402-body fetch in 12x5s retry to absorb first-request flake on freshly-deployed verifier
- cmd/obol/network.go redactRPCURL host-anchored against parsed URL (CodeQL fix, no unanchored regex)
- internal/x402/verifier.go drops debug log that leaked user-controlled path (CodeQL log-injection fix)
- .agents/skills/obol-stack-dev rebuilt: 1750 -> 882 lines, 8-row symptom->fix table indexed at the top of SKILL.md
- CLAUDE.md refreshed: stale CLI surface, added six release-smoke pitfalls, generalized personal-path Related Codebases

Validated: RELEASE_SMOKE_INCLUDE_OBOL=true RELEASE_SMOKE_INCLUDE_OBOL_FORK=true bash flows/release-smoke.sh on spark1 against commit 4082961 (and reverified on each subsequent commit) -> 13/13 PASS, RC=0, "Release smoke passed".
Full retrospective: plans/release-smoke-hardening-20260513.md. Closes #476 #477 #478 #479 #483 #484.
Test plan
- `bash -n flows/flow-02-stack-init-up.sh` clean