Skip to content

fix: implement deep-audit findings for covenant signer#3938

Merged
piotr-roslaniec merged 40 commits into
feat/psbt-covenant-final-project-prfrom
fix/deep-audit-findings
May 23, 2026
Merged

fix: implement deep-audit findings for covenant signer#3938
piotr-roslaniec merged 40 commits into
feat/psbt-covenant-final-project-prfrom
fix/deep-audit-findings

Conversation

@piotr-roslaniec
Copy link
Copy Markdown
Collaborator

Summary

Implements 8 fixes from the deep audit of the covenant signer feature (20 agent passes across 5 analysis areas + cross-provider verification with Gemini and GPT models).

P1 Fixes

  • A31: Auto-derive DataDir from storage.Dir when unset and signer port is non-zero. Without this, the exclusive file lock was silently disabled, allowing two processes to corrupt the job store concurrently.
  • A32: Replace context.Background() with signal.NotifyContext(SIGINT, SIGTERM) in cmd/start.go. The shutdown goroutine (HTTP drain, cancel signing ops, release file lock) was dead code -- it blocked on ctx.Done() which never fired.
  • A33: Unconditionally evict stale byRequestID entries on route-key replacement in Put(), and add route-key holder verification in loadPollJob. Previously, a failed file delete left an orphan that a stale Poll could use to seize the route key from a newer job.

P2 Fixes

  • A34: Clear poisoned route keys on successful Put(). Poison was self-reinforcing at runtime because Submit hit GetByRouteRequest (which returns errPoisonedRouteKey) before reaching Put().
  • A36: Normalize migrationPlanQuoteTrustRoots KeyID at startup, matching the pattern for depositor/custodian roots. Config with accidental whitespace in KeyID caused immediate request rejection.
  • A38: Release file lock if trust root normalization fails after NewStore succeeds, using a deferred cleanup on named error return.

P3 Fixes

  • A35: Trim whitespace from destination.ReservationID in normalizeMigrationDestination, matching normalizeMigrationPlanQuote. Asymmetric trim caused spurious rejection for whitespace-padded IDs.
  • A37: Enforce low-S normalization on signer approval certificate threshold signature. Without this, a certificate holder could submit the high-S variant to create a duplicate job (doubled resource consumption, no fund loss).

Audit methodology

  • 20 agent runs (4 passes x 5 areas: coverage, contract verification, threat model, property testing, integration seams)
  • Cross-provider verification: Gemini 3 Pro Preview (security), Gemini 2.5 Pro (integration), GPT-5.2 (state layer)
  • Multi-agent consolidated review for final severity calibration
  • All 7 Bitcoin P2WSH spending paths verified correct via script execution traces
  • No P0 (fund-loss) vulnerabilities found

Test plan

  • go build ./cmd/... ./pkg/covenantsigner/... ./pkg/tbtc/... compiles cleanly
  • go test -race ./pkg/covenantsigner/... passes
  • Relevant tbtc tests pass (GetWallet, signer approval, payload hash)
  • CI green

@piotr-roslaniec piotr-roslaniec force-pushed the fix/deep-audit-findings branch from 3d55760 to 1b68176 Compare April 14, 2026 08:45
Base automatically changed from fix/audit-findings to fix/review-findings April 16, 2026 04:09
Copy link
Copy Markdown
Member

@lrsaturnino lrsaturnino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flagging a compile-breaking reference found during a deep review of this PR — see the inline comment on pkg/covenantsigner/store.go:358. Not requesting changes, just bringing it to attention.

Comment thread pkg/covenantsigner/store.go Outdated

s.byRequestID[job.RequestID] = cloned
s.byRouteKey[key] = job.RequestID
delete(s.poisonedRoutes, key)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s.poisonedRoutes isn't a declared field on Store — the struct definition (lines 18–25 of this file) declares only handle, mutex, lockFile, byRequestID, and byRouteKey, and no errPoisonedRouteKey sentinel or poison-tracking infrastructure exists anywhere in the PR. Repo-wide git grep -n 'poisonedRoutes\|errPoisonedRouteKey' on origin/fix/deep-audit-findings returns only this one line.

Empirically from the head ref:

  • go build ./pkg/covenantsigner/... → exits 1 with pkg/covenantsigner/store.go:358:11: s.poisonedRoutes undefined (type *Store has no field or method poisonedRoutes)
  • go build ./pkg/tbtc/... and go build ./cmd/... transitively fail (both import pkg/covenantsigner)
  • go test ./pkg/covenantsigner/... → build-failed
  • go vet ./pkg/covenantsigner/... → same undefined-field error

On the base ref (origin/fix/review-findings) all of the above succeed.

Commit f54d4e4d4 (the A34 fix) is a single-line insertion of this delete(...) call with no accompanying struct-field declaration or producer-side poison-setting code — looks like a rebase may have dropped the foundation commit.

Flagging only, not a change request.

Copy link
Copy Markdown
Member

@lrsaturnino lrsaturnino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a double-check: Is this PR intended to merge with fix/review-findings or main?

@piotr-roslaniec
Copy link
Copy Markdown
Collaborator Author

@lrsaturnino It should target main since the previous branch was already merged. This PR is probably out of date, I will look into updating it. Thanks for the review

@piotr-roslaniec piotr-roslaniec changed the base branch from fix/review-findings to main April 17, 2026 10:06
piotr-roslaniec added a commit that referenced this pull request Apr 17, 2026
Adds the poisonedRoutes map field that was referenced in A34 fix
(commit f54d4e4) but never declared in the struct definition.

Without this field, the store fails to compile:
  s.poisonedRoutes undefined (type *Store has no field or method poisonedRoutes)

Fixes compile error reported in PR #3938 review comment.
@piotr-roslaniec piotr-roslaniec force-pushed the fix/deep-audit-findings branch from 1b68176 to 45428c9 Compare April 17, 2026 10:24
@piotr-roslaniec piotr-roslaniec changed the base branch from main to feat/psbt-covenant-final-project-pr May 23, 2026 10:30
piotr-roslaniec and others added 20 commits May 23, 2026 10:39
When the wallet registry is unavailable during fault-isolated GetWallet
calls, signer approval verification now returns actionable errors that
distinguish "registry unavailable" from "genuinely zero values".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ancellation

Replace unbounded context.WithoutCancel in submitHandler with a
service-level context that is cancelled on process shutdown and has
a 5-minute timeout. This prevents threshold signing goroutines from
hanging indefinitely and ensures clean cancellation on container stop.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a guard that fails the CI step if the -run regex silently matches
zero tests, which would otherwise pass without actually testing anything.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… context

The test injected a value into the HTTP request context and expected it
to be visible in the engine, but the submit handler deliberately
derives its context from the service context (not the request context)
to survive HTTP disconnects. Fix the test to inject the value into the
service context, which matches the actual design contract.
…me key

A malformed file processed before its valid sibling would poison the
route key, then the valid job would load into byRouteKey but remain
inaccessible via GetByRouteRequest because the poison check ran first.
Clear the poison entry when a valid job is successfully indexed for that
route key.
…ersisted job files

The poison-route mechanism (A22) attempted graceful degradation by
skipping corrupt files and marking their route keys as poisoned. This
had two fundamental issues: (1) an ordering-dependent bug where a valid
job iterated before its malformed sibling left the route permanently
poisoned despite a valid job being loaded, and (2) when the corrupt
file's route key was unrecoverable (binary garbage), no poison was set
and dedupe protection was silently lost.

For a pre-mainnet Bitcoin signing service, fail-closed is the safer
default: ambiguous persisted state is a safety fault, not a recoverable
nuisance. A single corrupt file should block startup rather than risk
duplicate signing jobs.

Changes:
- Store.load() now returns a hard error on unreadable or malformed
  persisted job files instead of skipping them
- Removed poisonedRoutes map, errPoisonedRouteKey sentinel,
  poisonRouteFromPartialJob, SkippedJobFiles, and poison check in
  GetByRouteRequest
- Updated tests to expect hard failure on corrupt files
- Added superseded-job-removal assertions to dedup tests
…e operations

Add #nosec annotations for three findings in store.go:
- G304 on os.OpenFile: lockPath is built from operator config (dataDir)
  plus constants (jobsDirectory, lockFileName), not user input
- G104 on lockFile.Close(): best-effort cleanup after failed flock;
  the lock error is what gets returned
- G104 on store.Close(): best-effort lock release after failed load;
  the load error is what gets returned
…claration

gosec flags the `go func()` declaration line, not the
`context.Background()` call inside the body. Move the annotation to
the correct line so the client-scan CI job passes.
piotr-roslaniec and others added 20 commits May 23, 2026 10:41
Adds the poisonedRoutes map field that was referenced in A34 fix
(commit f54d4e4) but never declared in the struct definition.

Without this field, the store fails to compile:
  s.poisonedRoutes undefined (type *Store has no field or method poisonedRoutes)

Fixes compile error reported in PR #3938 review comment.
P0-1: Add signer approval expiration check via EndBlock field.
     validationOptions gains currentBlock field; at policyIndependentDigest
     true, checks EndBlock expiration and falls through to re-verification.

P0-2: Persist poisoned route markers to disk for recovery across restarts.
     store.go adds poisonedDirectory, MarkPoisoned writes marker files,
     and load() restores state on startup.

P1: Add concurrency limit for in-flight submit/poll operations.
     Service gains maxInFlight field and semaphore channel; slot acquisition
     added in both Submit() and Poll() paths.

P2: Check EndBlock expiration in loadPollJob before digest comparison.
     Uses currentBlockProvider to verify approval has not expired.

P3: Add poll rate limiting alongside existing submit rate limit.
     pollLimiter uses golang.org/x/time/rate at 60/min; handlers updated
     to check Allow() before processing.

Engine: Add CurrentBlockHeightProvider as separate optional interface;
     passiveEngine stub added; WithCurrentBlockProvider refactored to
     auto-detect interface from Engine; nil guard added in Poll().
Prior to this change, WithCurrentBlockProvider silently returned 0 when the
underlying provider returned an error. This caused signer approval expiration
checks to pass incorrectly during RPC failures, treating expired approvals
as valid.

After this change, the provider returns (uint64, error). Both call sites
(loadPollJob and Poll) now propagate errors instead of suppressing them.
Remove redundant inner if block in loadPollJob's expiration check.
The outer && already guarantees job.Request.SignerApproval != nil,
making the inner nil check on EndBlock unreachable after the
short-circuit. Also adds a doc comment to WithMaxInFlight explaining
that n <= 0 disables the concurrency limit entirely.
Remove the MarkPoisoned function, poisonedRoutes map, and related
poisonedDirectory constant. The feature was added in a prior commit
but is never called from any code path, leaving the scaffolding
unused.
…BlockErr field

Extends scriptedEngine to support the SignerApprovalVerifier interface
and inject currentBlockHeight provider errors for comprehensive testing
of error propagation paths.

- Add signerApprovalVerifier field to scriptedEngine
- Add currentBlockErr field to scriptedEngine
- Add VerifySignerApproval method delegating to inner verifier
- VerifySignerApproval returns nil when inner verifier is nil (no-op)
When a signer approval certificate is expired and no verifier is present
in the validation options (Poll flow), return the expiration error
directly instead of falling through to the re-verification path that
requires a signerApprovalVerifier.

This ensures TestServicePollRejectsExpiredCertificate correctly
returns 'signer approval certificate has expired' instead of
'request.signerApproval cannot be verified by this signer deployment'.
Use explicit engine variable and WithCurrentBlockProvider in tests
that need to verify block-height-dependent validation, for
clearer block height control.
…fier interface

scriptedEngine implicitly implemented SignerApprovalVerifier via its
signerApprovalVerifier field and VerifySignerApproval method. This caused
NewService to auto-wire a signerApprovalVerifier on every test service,
triggering the validation guard that requires SignerApproval in the
request whenever a verifier is configured. Tests using baseRequest (no
SignerApproval) then failed before calling the engine, hanging goroutine
synchronization channels indefinitely.

Remove VerifySignerApproval and the signerApprovalVerifier field from
scriptedEngine. Tests that need a signer approval verifier now pass it
explicitly via WithSignerApprovalVerifier, making the intent clear.
golang.org/x/time was marked indirect in go.mod even though
pkg/covenantsigner/server.go imports golang.org/x/time/rate directly for
submit-endpoint rate limiting. Running go mod tidy removes the stale
// indirect annotation so the dependency graph matches the actual import
surface.
…-project-pr

Two pieces dropped when skipping conflicting intermediate commits:

- cancelService() on listener bind error in covenantsigner.NewServer:
  prevents service-context goroutines from leaking when the HTTP listener
  fails to bind during startup. Originally added in the now-skipped
  ef45c15 (which also added a signal handler later removed by 73c8863).

- TestVerifySignerApprovalCertificateRejectsHighSSignature in
  signer_approval_certificate_test.go: defense-in-depth coverage proving
  the verifier rejects mathematically-equivalent high-S signatures (S' =
  N - S) that would otherwise enable signature malleability. Originally
  added by 54dae9b.
@piotr-roslaniec piotr-roslaniec force-pushed the fix/deep-audit-findings branch from 54dae9b to 6af268d Compare May 23, 2026 10:48
@piotr-roslaniec piotr-roslaniec merged commit 558f4bc into feat/psbt-covenant-final-project-pr May 23, 2026
17 checks passed
@piotr-roslaniec piotr-roslaniec deleted the fix/deep-audit-findings branch May 23, 2026 12:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants