
fix(ci): raise AVM check-circuit per-tx timeout to 120s #23749

Draft
AztecBot wants to merge 1 commit into next from cb/avm-check-circuit-timeout-120s

Conversation

@AztecBot
Collaborator

Problem

The avm-check-circuit job in run 26703197886 failed on next with exit code 124 (timeout).

The job runs bb-avm avm_check_circuit on every dumped e2e AVM input in parallel, each under a fixed 30s per-tx timeout (yarn-project/end-to-end/bootstrap.sh). Every input passed in 4–6s except e2e_multiple_blobs tx 0x241c8baa…, which was killed at the 30s wall (ran 35s, code: 124), failing the whole job.
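The failure mode described above can be reproduced in miniature. The following is a minimal sketch, assuming GNU coreutils `timeout` is available; `sleep` stands in for the real check, and the input names, durations, and 1s limit are hypothetical and not taken from `bootstrap.sh`.

```shell
# Stand-in for one per-tx check: "sleep" replaces the real binary.
# Names, durations and the limit below are illustrative only.
check_one() {
  local name=$1 secs=$2 limit=$3 code=0
  timeout "$limit" sleep "$secs" || code=$?
  echo "$name code: $code"
  return "$code"
}

limit=1

# Run both inputs in parallel, as the job does for every dumped input.
check_one small_input 0 "$limit" & pid_small=$!
check_one heavy_input 3 "$limit" & pid_heavy=$!

wait "$pid_small" && status_small=0 || status_small=$?
wait "$pid_heavy" && status_heavy=0 || status_heavy=$?

# One timed-out input is enough to fail the job as a whole.
job_status=0
[ "$status_small" -eq 0 ] && [ "$status_heavy" -eq 0 ] || job_status=1
echo "job status: $job_status"
```

GNU `timeout` exits with 124 when it has to kill the command, which is the code reported for the heavy tx.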

Root cause

That tx produces a much larger circuit (~700,560 rows vs. tiny traces for the others). From the run log of the killed job:

04:57:43 Generating trace... (mem: 824 MiB)
04:58:06 Checking circuit... (mem: 3883 MiB)          <- trace generation alone took ~23s
04:58:06 Running check (with skippable) circuit over 700560 rows.
04:58:12 timeout: sending signal TERM to command 'bash'

Simulation + trace generation alone consumed ~23s on the 2-CPU isolation container, leaving the circuit check no room before the 30s deadline. This is exactly the situation the existing in-code WARNING comment anticipated ("transactions could need more CPU and MEM than we allocate by default … they might start timing out"). The 30s value has been unchanged since the feature was introduced (#18747), so this is a heavy tx finally crossing the threshold, not a regression.
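The ~23s figure follows directly from the log timestamps quoted above. As a small worked check (the helper `to_secs` is a hypothetical name introduced only for this sketch):

```shell
# Convert an HH:MM:SS timestamp to seconds since midnight.
to_secs() {
  h=${1%%:*}; rest=${1#*:}; m=${rest%%:*}; s=${rest#*:}
  # Strip one leading zero so values like "08" are not parsed as octal.
  h=${h#0}; m=${m#0}; s=${s#0}
  echo $(( $h * 3600 + $m * 60 + $s ))
}

trace_start=$(to_secs 04:57:43)   # "Generating trace..."
check_start=$(to_secs 04:58:06)   # "Checking circuit..."
killed_at=$(to_secs 04:58:12)     # "timeout: sending signal TERM"

trace_gen=$(( check_start - trace_start ))
check_ran=$(( killed_at - check_start ))
echo "trace generation: ${trace_gen}s, circuit check before TERM: ${check_ran}s"
```

Under these timestamps the circuit check had only about 6s to run over the 700560 rows before the TERM signal arrived.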

Fix

Raise the per-tx timeout from 30s to 120s — ample headroom over the ~35s observed for the heaviest tx while small txs still finish in seconds.
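To illustrate why only the deadline matters here, the sketch below runs the same stand-in workload under a short and a long limit. It assumes GNU coreutils `timeout`; `run_check`, the 2s workload, and the 1s and 4s limits are hypothetical scaled-down values, not the real 30s and 120s settings.

```shell
# Run a fixed amount of "work" (sleep) under a given limit and
# print the resulting exit code.
run_check() {
  local limit=$1 work_secs=$2 code=0
  timeout "$limit" sleep "$work_secs" || code=$?
  echo "$code"
}

short_limit_code=$(run_check 1 2)   # limit shorter than the work
long_limit_code=$(run_check 4 2)    # limit comfortably longer than the work
echo "short limit -> $short_limit_code, long limit -> $long_limit_code"
```

The workload is identical in both runs; only the kill deadline changes, so the second run completes normally.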

Resources are deliberately left at the default. With up to 64 jobs running in parallel on a 128-CPU host, the containers already use --cpus=2 (≈128 CPUs total); raising --cpus would oversubscribe the runner. A longer timeout is resource-neutral — it only changes the kill deadline, not how much CPU/MEM each run consumes.
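The CPU budget argument is simple arithmetic and can be checked as follows. The job count, per-container CPUs, and host size come from the paragraph above; the alternative of `--cpus=4` is a hypothetical value chosen only to show the oversubscription.

```shell
parallel_jobs=64
cpus_per_container=2
host_cpus=128

# Current allocation: every container at --cpus=2.
current_total=$(( parallel_jobs * cpus_per_container ))

# Hypothetical alternative: doubling to --cpus=4.
doubled_total=$(( parallel_jobs * 4 ))

echo "current: ${current_total}/${host_cpus} CPUs"
echo "with --cpus=4: ${doubled_total}/${host_cpus} CPUs"
```

At full parallelism the current setting already consumes the whole host, so any increase in `--cpus` would exceed it.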

The outdated warning comment is updated to describe the actual behavior.


Created by claudebox · group: slackbot

The avm-check-circuit job runs bb-avm avm_check_circuit on every dumped
e2e AVM input under a fixed 30s per-tx timeout. The e2e_multiple_blobs tx
produces a ~700k-row trace whose simulation + trace generation alone takes
~23s on the 2-CPU isolation container, and the subsequent circuit check
pushed the run past 30s (observed 35s, killed with code 124), failing the
whole job while every other input passed in 4-6s.

This is the scenario the existing in-code warning anticipated. Raise the
timeout to 120s to give ample headroom for the heaviest txs. Resources are
left unchanged: with up to 64 jobs in parallel on a 128-CPU host, bumping
--cpus would oversubscribe the runner, and a longer timeout is
resource-neutral: it only moves the kill deadline, and small txs still
finish in seconds.
@AztecBot added the claudebox label ("Owned by claudebox. it can push to this PR.") on May 31, 2026
