
fix(flows): stabilize wallet check + x402-verifier readiness #478

Closed
bussyjd wants to merge 1 commit into main from fix/release-smoke-wallet-and-verifier-readiness


@bussyjd
Collaborator

@bussyjd bussyjd commented May 12, 2026

Summary

Two pre-existing flow regressions surfaced by the spark1 smoke run:

flow-08 wallet invariant

Drop the hard "agent wallet == deterministic Bob" assertion. Bob pre-seeding is the flow-11/13/14 dual-stack pattern; single-stack flow-08 uses whatever wallet `obol agent init` generated, and every downstream funding/signing step already uses $AGENT_WALLET directly. Keep BOB_WALLET derivation when REMOTE_SIGNER_PRIVATE_KEY is set and report a match for transparency, but PASS either way.
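The relaxed check can be sketched roughly as below. This is an illustration only, not the code from flow-08: the function name `check_agent_wallet`, the message wording, the case-insensitive comparison, and the empty-wallet failure are all assumptions; only `AGENT_WALLET` and `BOB_WALLET` come from the description above.

```shell
# Hypothetical helper (not the actual flow-08 code): report whether the agent
# wallet matches the derived Bob wallet, but pass in both cases.
check_agent_wallet() {
  local agent_wallet="$1"
  local bob_wallet="${2:-}"
  local agent_lc bob_lc

  # Assumption: an empty agent wallet is still treated as a hard failure.
  if [ -z "$agent_wallet" ]; then
    echo "FAIL: agent wallet is empty"
    return 1
  fi

  # Assumption: compare case-insensitively, since hex addresses may be
  # checksummed (mixed case) on one side and lowercase on the other.
  agent_lc="$(printf '%s' "$agent_wallet" | tr 'A-F' 'a-f')"
  bob_lc="$(printf '%s' "$bob_wallet" | tr 'A-F' 'a-f')"

  if [ -z "$bob_wallet" ]; then
    echo "PASS: agent wallet $agent_wallet (no Bob wallet derived)"
  elif [ "$agent_lc" = "$bob_lc" ]; then
    echo "PASS: agent wallet $agent_wallet (matches Bob)"
  else
    echo "PASS: agent wallet $agent_wallet (differs from Bob $bob_wallet)"
  fi
}

# Example call inside a flow script (commented out so the sketch has no side effects):
# check_agent_wallet "$AGENT_WALLET" "${BOB_WALLET:-}"
```

The match is reported for transparency only; a mismatch no longer changes the exit status.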

flow-07 + flow-10 x402-verifier readiness

Replace the pod-counting loops with kubectl rollout status deployment/x402-verifier. The old loops counted every pod in the x402 namespace, so when stuck old ReplicaSets or the unrelated serviceoffer-controller sat in Pending (real condition observed on spark1 under host load), the loops never converged. rollout status is the authoritative readiness signal and only tracks the latest ReplicaSet. Bumped the timeout to 180s in both spots to absorb image-pull jitter on cold caches.
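A minimal sketch of the replacement wait, assuming the deployment lives in the `x402` namespace mentioned above. The wrapper name `wait_for_verifier` is hypothetical, and the real flow scripts may pass kubeconfig or context flags that are not shown here.

```shell
# Hypothetical wrapper (not the actual flow-07/flow-10 code): block until the
# x402-verifier Deployment reports a completed rollout, or fail on timeout.
wait_for_verifier() {
  # Default of 180s mirrors the timeout described in this PR.
  local timeout="${1:-180s}"

  # kubectl rollout status exits non-zero if the rollout does not complete
  # within --timeout, so the caller can treat the exit code as PASS/FAIL.
  kubectl rollout status deployment/x402-verifier \
    --namespace x402 --timeout="$timeout"
}

# Example call inside a flow script (commented out so the sketch has no side effects):
# wait_for_verifier 180s || { echo "FAIL: x402-verifier not ready"; exit 1; }
```

Because the command is scoped to one Deployment, Pending pods from other workloads in the namespace, such as serviceoffer-controller, do not affect the result.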

Test plan

  • bash -n clean on flow-07/08/10
  • Re-run release-smoke on spark1; expect flow-07 step 7 + flow-10 steps 13/14 to converge under load, and flow-08 step 6 to no longer require pre-seeded Bob

flow-08:
  Drop the hard "agent wallet == deterministic Bob" assertion. Bob
  pre-seeding is the flow-11/13/14 dual-stack pattern; single-stack
  flow-08 uses whatever wallet `obol agent init` generated, and every
  downstream funding/signing step already uses $AGENT_WALLET directly.
  Keep BOB_WALLET derivation when REMOTE_SIGNER_PRIVATE_KEY is set and
  report a match for transparency, but PASS either way.

flow-07 + flow-10 x402-verifier readiness:
  Replace pod-counting loops with `kubectl rollout status
  deployment/x402-verifier`. The old loops counted every pod in the
  x402 namespace, so when stuck old ReplicaSets or the unrelated
  serviceoffer-controller sat in Pending (real condition observed on
  spark1 under host load), the loops never converged. rollout status
  is the authoritative readiness signal and only tracks the latest
  ReplicaSet. Bumped the timeout to 180s in both spots to absorb
  image-pull jitter on cold caches.
@bussyjd bussyjd force-pushed the fix/release-smoke-wallet-and-verifier-readiness branch from 849d9b1 to 95dd3f2 Compare May 12, 2026 09:34
bussyjd added a commit that referenced this pull request May 13, 2026
…490)

Integration branch that takes the release-smoke gate from "broken at flow-11 step 43" to 13/13 PASS on spark1 against the production facilitator (Base Sepolia + x402.gcp.obol.tech).

Folds in the in-flight smoke fixes (#476 runner refactor, #477 ERE alternation, #478 wallet check + verifier readiness, #479 flow-02 cold-start polling, #483 sell-inference flag align, #484 frontend digest-pin v0.1.23) plus eight additional root-cause fixes uncovered while driving the gate green:

- internal/x402/setup.go EnsureVerifier rewrites image pins in-memory before kubectl apply so OBOL_DEVELOPMENT=true source changes actually reach the cluster
- internal/x402/chains.go ResolveChainInfo accepts both legacy aliases and CAIP-2 ids
- flows/flow-10-anvil-facilitator.sh drops --prune-history (which was enable-pruning, not retention) and adds --host 0.0.0.0 + cluster-reachability preflight
- internal/defaults/defaults.go combo-form image-pin regex now lists longest first
- flows/lib.sh paid-RPC support (BASE_SEPOLIA_RPC, ALCHEMY_BASE_SEPOLIA_API_KEY) + Bob top-up preflight + secret scrubbing collapsing paid-RPC URLs to TLD-only
- flows/flow-07-sell-verify.sh and flow-08-buy.sh wrap 402-body fetch in 12x5s retry to absorb first-request flake on freshly-deployed verifier
- cmd/obol/network.go redactRPCURL host-anchored against parsed URL (CodeQL fix, no unanchored regex)
- internal/x402/verifier.go drops debug log that leaked user-controlled path (CodeQL log-injection fix)
- .agents/skills/obol-stack-dev rebuilt: 1750 -> 882 lines, 8-row symptom->fix table indexed at the top of SKILL.md
- CLAUDE.md refreshed: stale CLI surface, added six release-smoke pitfalls, generalized personal-path Related Codebases
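The "12x5s retry" around the 402-body fetch in flow-07 and flow-08 could look something like the sketch below. The `retry` helper and the `fetch_402_body` name in the usage comment are hypothetical; the actual scripts may structure the loop differently.

```shell
# Hypothetical helper (not the actual flows/lib.sh code): run a command up to
# <attempts> times, sleeping <delay> seconds between failed attempts.
retry() {
  local attempts="$1"
  local delay="$2"
  shift 2
  local i=1

  while [ "$i" -le "$attempts" ]; do
    # Stop at the first successful attempt.
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example call (commented out; fetch_402_body is a placeholder for whatever
# command fetches the 402 response body from the verifier):
# retry 12 5 fetch_402_body || { echo "FAIL: no 402 body after 12 attempts"; exit 1; }
```

With 12 attempts and a 5-second delay, the wait between attempts adds up to at most 55 seconds, plus the time of the attempts themselves.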

Validated: RELEASE_SMOKE_INCLUDE_OBOL=true RELEASE_SMOKE_INCLUDE_OBOL_FORK=true bash flows/release-smoke.sh on spark1 against commit 4082961 (and reverified on each subsequent commit) -> 13/13 PASS, RC=0, "Release smoke passed".

Full retrospective: plans/release-smoke-hardening-20260513.md.

Closes #476 #477 #478 #479 #483 #484.
@bussyjd
Collaborator Author

bussyjd commented May 13, 2026

Superseded by #490 (merged in 6f1f9ed). The integration branch carried this change plus the rest of the in-flight smoke fixes and validated the bundle against a green release-smoke (13/13 PASS, RC=0) on spark1 against the production facilitator. Closing in favor of the merged integration.

@bussyjd bussyjd closed this May 13, 2026