fix: surface per-block execution timeouts instead of silently swallowing them by YaroShkvorets · Pull Request #790 · streamingfast/substreams

YaroShkvorets · 2026-05-27T20:30:41Z

Summary

Since 5b3f59b2 if block execution times out (like with non-deterministic eth_call call loop), the substreams engine quietly swallows that timeout error and skips the block as if it doesn't have any output.
This PR should fix it.

Symptom

Running in development mode against tier1 node with --substreams-block-execution-timeout=90s:

❯ substreams run https://github.com/pinax-network/substreams-evm/releases/download/evm-balances-v0.3.3/evm-clickhouse-balances-v0.3.3.spkg db_out -e hyperevm.substreams.pinax.network:443 -s 35129899 -t +2 --limit-processed-blocks 0 --noop-mode
2026-05-27T16:43:48.966-0400 WARN (substreams) noop-mode used without production-mode: server will execute in development mode without sending the data, this is probably not what you want
Connecting...


----------- BLOCK #35,129,899 (8c85da1b36f703305b32e4d5128b7582c749c3b22decac5b4e4359ddde13d8df) age=307h6m26.769813s ---------------
----------- BLOCK #35,129,900 (7fe17e72c0412fb100795cfc9de4567949755e085c635354ee3d873a758cff1d) age=307h8m1.02479s ---------------
📊 Usage Report
 • Egress Bytes (uncompressed): 638 B
 • Processed Blocks: 1 blocks
 • Received Blocks: 2 blocks
Completed successfully

Tier1 logs flood with retrying RPCCall on non-deterministic RPC call error for ~90s on a stuck eth_call.
After 90s, INFO (rpc-cache) stopping rpc calls here, context is canceled fires (rpccalls.go:467) — proving the per-block ctx hit its deadline.
1 second later, the request successfully ends with substreams request stats ... "error": "". The client sees Completed successfully. Egress stats show Processed Blocks: 1, Received Blocks: 2 — one block silently dropped.

Root cause

Two combined bugs in pipeline/process_block.go:

1. Dead code in recoverExecutionPanic:

// If recovering from context cancel, simply return the execution error as-is
if ctx.Err() != nil || errors.Is(recoveredErr, context.Canceled) {
    return executionError
}

// If the context deadline was exceeded, return a deadline exceeded error RPC error
if ctx.Err() == context.DeadlineExceeded {            // ← unreachable
    return connect.NewError(connect.CodeDeadlineExceeded, ...)
}

The catch-all ctx.Err() != nil matches DeadlineExceeded, so the deadline-specific branch never fires. Function returns executionError (typically nil) → executeModules returns nil → block silently skipped.

2. Broken error chain in Pipeline.execute's re-panic:

panic(fmt.Errorf("unknown error: %s", r))

Uses %s instead of %w, severing the wrap chain. Even if the dead code were fixed, downstream callers couldn't errors.Is(err, context.DeadlineExceeded) on the result.

Path through the code (for wasmtime)

Per-block ctx fires at 90s.
rpccalls.go:476 returns "timeout while doing eth_call ... cause: %w" wrapping context.DeadlineExceeded.
wasm/wasmtime/instance.go:49 panic(fmt.Errorf("running wasm extension ...: %w", err)).
wasmtime propagates the panic as a Go panic through entrypoint.Call() (no recover in wasm/wasmtime/module.go).
Propagates through wasmCall → mapexec.run → RunModule to Pipeline.execute's deferred recover.
Doesn't match ErrWasmDeterministicExec / ErrFoundationalStoreCanceled → re-panic with broken %s wrap.
Caught by executeModules's deferred recover → recoverExecutionPanic(ctx, nil, ...).
ctx.Err() != nil (DeadlineExceeded) → returns nil.
Pipeline marches on. Once block.Number > stopBlockNum, handleStepNew returns io.EOF → stream "succeeds".

Wazero is unaffected: it converts host-function panics to error returns at f.Call(), so the recover path isn't entered.

Fix

recoverExecutionPanic: check errors.Is(ctx.Err(), context.DeadlineExceeded) before the catch-all so deadline-exceeded gets the intended CodeDeadlineExceeded connect error.
Pipeline.execute's re-panic: wrap with %w so the chain is preserved for any callers that inspect it.

Bug introduced

Commit 5b3f59b (2025-10-21, "Improved panic handling and make clearer the intent when throwing errors in the WASM handlers") added ctx.Err() != nil || to a previously-correct condition. Present in v1.17.0 and every release since.

Test plan

go vet ./pipeline/... clean
go test -short ./pipeline/... passes
Manual repro: stuck eth_call on a non-deterministic error (e.g. Reth StackUnderflow) with --substreams-block-execution-timeout=90s. Before this PR: client sees Completed successfully with Processed: N-1 / Received: N. After: client sees DeadlineExceeded gRPC error with execution timed out at block ....

🤖 Generated with Claude Code

…ing them When a WASM host function (e.g. eth_call) panics in wasmtime because the per-block execution context's deadline has fired, the panic propagates up to Pipeline.execute's deferred recover, which re-panics with "unknown error: %s" (chain broken) up to executeModules's deferred recover, which calls recoverExecutionPanic. recoverExecutionPanic had two bugs that combined to silently drop the offending block: 1. The DeadlineExceeded handler was dead code: the catch-all `if ctx.Err() != nil || errors.Is(recoveredErr, context.Canceled)` above it matched DeadlineExceeded too, so the deadline-specific branch below was never reached. The function would return `executionError` (typically nil), making executeModules return nil and the pipeline move on to the next block as if nothing happened. 2. Pipeline.execute's re-panic used `%s` instead of `%w` when wrapping the recovered error, severing the wrap chain so callers couldn't `errors.Is(err, context.DeadlineExceeded)` even when the swallow condition was avoided. Fix: - Check `errors.Is(ctx.Err(), context.DeadlineExceeded)` first so the intended `CodeDeadlineExceeded` connect error is returned. The catch-all for Canceled remains below. - Wrap the recovered error with `%w` on re-panic so the deadline information survives all the way to the gRPC error mapper. Bug introduced in commit 5b3f59b (2025-10-21, "Improved panic handling and make clearer the intent when throwing errors in the WASM handlers"), present in v1.17.0 onward. Affects wasmtime (default runtime for Rust-built spkgs); wazero is unaffected because it converts host panics to error returns at f.Call(), bypassing the recover path. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Five focused unit tests covering each branch of recoverExecutionPanic. The deadline-exceeded test asserts: - result is non-nil (catches the silent-swallow regression directly) - result is a connect.Error with CodeDeadlineExceeded - the original context.DeadlineExceeded survives the wrap chain so callers can errors.Is() on it - the block ref is in the message Verified: this test fails on the pre-fix code with the exact assertion "deadline-exceeded panic must not be silently swallowed", and passes with the fix. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

sduchesneau

LGTM! thanks

YaroShkvorets and others added 2 commits May 27, 2026 16:29

YaroShkvorets marked this pull request as ready for review May 27, 2026 20:49

sduchesneau approved these changes May 28, 2026

View reviewed changes

sduchesneau merged commit 5a7f57b into streamingfast:develop May 28, 2026
6 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: surface per-block execution timeouts instead of silently swallowing them#790

fix: surface per-block execution timeouts instead of silently swallowing them#790
sduchesneau merged 2 commits into
streamingfast:developfrom
YaroShkvorets:fix/swallowed-deadline-exceeded

YaroShkvorets commented May 27, 2026 •

edited

Loading

Uh oh!

sduchesneau left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

YaroShkvorets commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Symptom

Root cause

Path through the code (for wasmtime)

Fix

Bug introduced

Test plan

Uh oh!

sduchesneau left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

YaroShkvorets commented May 27, 2026 •

edited

Loading