
feat: add benchmark baseline artifact #3277

Closed

tac0turtle wants to merge 1 commit into main from marko/issue-3185-benchmark-artifact

Conversation

@tac0turtle (Contributor) commented Apr 22, 2026

Overview

Closes #3185

Summary by CodeRabbit

  • Chores

    • Enhanced benchmark infrastructure to emit structured output and automatically upload results as named artifacts for improved tracking and historical record-keeping across multiple workload scenarios.
  • Tests

    • Established baseline benchmark snapshot containing comprehensive performance metrics and system configurations for future performance analysis and regression detection.

@github-actions (Contributor) commented:

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ passed | ⏩ skipped | ✅ passed | ✅ passed | Apr 22, 2026, 2:41 PM |

@tac0turtle tac0turtle requested a review from chatton April 22, 2026 14:41
@claude (Bot, Contributor) commented Apr 22, 2026

Claude encountered an error.


I'll analyze this and get back to you.

@coderabbitai (Bot, Contributor) commented Apr 22, 2026

📝 Walkthrough

The pull request updates the benchmark workflow to emit structured JSON output files for three benchmark types and automatically upload them as artifacts. A new baseline results JSON file is added containing benchmark schema, provenance metadata, and comprehensive benchmark performance data across multiple test categories.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Benchmark Workflow Configuration**<br>`.github/workflows/benchmark.yml` | Updated three benchmark jobs to output results to JSON files via `BENCH_RESULT_OUTPUT` environment variables, added `BENCH_OBJECTIVE` descriptions for each workload, and integrated artifact upload steps (`actions/upload-artifact@...`) to publish results as named artifacts (`erc20-benchmark-results`, `defi-benchmark-results`, `state-pressure-benchmark-results`). |
| **Baseline Benchmark Data**<br>`test/e2e/benchmark/baseline-results.json` | New JSON file establishing a benchmark baseline snapshot with schema versioning, provenance tracking (workflow metadata, run IDs, branch/SHA, timestamps), and comprehensive results for EVM contract roundtrip, block executor production, trace collection, and throughput benchmarks (ERC20, DeFi, state pressure) with per-operation and aggregated metrics. |
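The workflow-side pattern described above can be sketched as a pair of steps. The env var names (`BENCH_RESULT_OUTPUT`, `BENCH_OBJECTIVE`), artifact name, and JSON path come from the change summary; the step names and the benchmark invocation itself are hypothetical placeholders, not the actual workflow contents:

```yaml
# Hypothetical sketch of one benchmark job's steps. Env var names,
# artifact name, and JSON path follow the change summary; the run
# command and step names are illustrative only.
- name: Run ERC20 throughput benchmark
  env:
    BENCH_RESULT_OUTPUT: test/e2e/benchmark/erc20_run_result.json
    BENCH_OBJECTIVE: "sustained ERC20 transfer throughput"
  run: make bench-erc20   # placeholder invocation

- name: Upload benchmark results
  uses: actions/upload-artifact@v7
  with:
    name: erc20-benchmark-results
    path: test/e2e/benchmark/erc20_run_result.json
```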

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 Benchmarks bloom like carrots so bright,
JSON treasures stored with all their might,
Artifacts hopping from branch to vault,
Baselines set—no performance fault!
Structured results to compare with care,
A rabbit's speedtest beyond compare! 🏃✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ❓ Inconclusive | The description references the linked issue #3185 but lacks detailed explanation of the implementation, changes made, or why the structure was chosen. | Expand description to explain what was added (baseline JSON file structure, workflow changes), why this structure was chosen, and how it serves the regression baseline purpose. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'feat: add benchmark baseline artifact' clearly and concisely summarizes the main change, which is adding a benchmark baseline JSON file as an artifact. |
| Linked Issues check | ✅ Passed | The PR successfully adds a machine-readable JSON artifact (baseline-results.json) and updates workflow configuration to emit structured benchmark output, directly addressing issue #3185's requirements. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to adding benchmark baseline artifacts: workflow updates to emit JSON output and upload artifacts, plus the new baseline-results.json file with benchmark data. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@codecov (Bot) commented Apr 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 61.81%. Comparing base (77b39d4) to head (f02f9a3).
⚠️ Report is 11 commits behind head on main.

Additional details and impacted files
```
@@            Coverage Diff             @@
##             main    #3277      +/-   ##
==========================================
- Coverage   62.33%   61.81%   -0.52%
==========================================
  Files         122      122
  Lines       12877    16241    +3364
==========================================
+ Hits         8027    10040    +2013
- Misses       3969     5315    +1346
- Partials      881      886       +5
```

| Flag | Coverage Δ |
| --- | --- |
| combined | 61.81% <ø> (-0.52%) ⬇️ |

Flags with carried forward coverage won't be shown.
@coderabbitai (Bot) left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

```
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/benchmark.yml:
- Around line 103-107: The upload-artifact steps that send benchmark JSONs
currently allow missing files and should instead fail the workflow; for each
actions/upload-artifact step that uploads benchmark results (the "Upload
benchmark results" steps that set name: erc20-benchmark-results and the two
analogous benchmark artifact uploads), add the input if-no-files-found: error to
the step's with: block so the action errors when the expected JSON (e.g.,
test/e2e/benchmark/erc20_run_result.json) is missing.

In @test/e2e/benchmark/baseline-results.json:
- Around line 253-445: The baseline JSON uses a different summary shape (fields
like ev_node_overhead_pct and seconds_per_gigagas and nested summary) than the
generated runResult produced by test/e2e/benchmark/run_result.go (which emits
top-level metrics and block_range), causing schema validation to fail; update
these throughput entries (erc20_throughput, defi_simulation, state_pressure) to
emit the exact runResult shape: move/rename the current "summary" data into the
runResult "metrics" object, ensure "block_range" is a top-level sibling of
"metrics" (matching runResult struct names), and drop/rename fields to match
runResult field names exactly so regression tooling can validate without a
bespoke mapper.
```
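The restructuring this prompt asks for can be sanity-checked mechanically. A minimal Go sketch follows; the key names (`metrics`, `block_range`, `summary`) are taken from the review comments above, and the real `runResult` struct in `test/e2e/benchmark/run_result.go` may differ:

```go
package main

import "fmt"

// entryMatchesRunResult reports whether a decoded baseline entry exposes
// the top-level keys the review says the generated runResult emits
// ("metrics" and "block_range"). Key names come from the review comment;
// the actual runResult definition may differ.
func entryMatchesRunResult(entry map[string]any) bool {
	_, hasMetrics := entry["metrics"]
	_, hasRange := entry["block_range"]
	return hasMetrics && hasRange
}

func main() {
	// Current baseline shape: everything nested under "summary".
	current := map[string]any{"summary": map[string]any{}}
	// Shape the reviewer asks for: metrics and block_range as siblings.
	wanted := map[string]any{
		"metrics":     map[string]any{},
		"block_range": map[string]any{},
	}
	fmt.Println(entryMatchesRunResult(current)) // false
	fmt.Println(entryMatchesRunResult(wanted))  // true
}
```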

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 17bd3af1-1482-49e2-bbae-b67e7bcfaeb3

📥 Commits

Reviewing files that changed from the base of the PR and between 3efcf93 and f02f9a3.

📒 Files selected for processing (2)
  • .github/workflows/benchmark.yml
  • test/e2e/benchmark/baseline-results.json

Comment on lines +103 to +107
```yaml
      - name: Upload benchmark results
        uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
        with:
          name: erc20-benchmark-results
          path: test/e2e/benchmark/erc20_run_result.json
```
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Verify the benchmark artifact upload steps fail when expected JSON outputs are absent.
rg -n -C 4 'name: (erc20|defi|state-pressure)-benchmark-results|if-no-files-found' .github/workflows/benchmark.yml
```

Repository: evstack/ev-node

Length of output: 1339


🌐 Web query:

actions/upload-artifact v7.0.1 if-no-files-found parameter documentation

💡 Result:

The if-no-files-found parameter for actions/upload-artifact v7.0.1 controls the behavior when no files match the provided path glob. Description from action.yml: "The desired behavior if no files are found using the provided path. Available Options: warn: Output a warning but do not fail the action, error: Fail the action with an error message, ignore: Do not output any warnings or errors, the action does not fail" Default: 'warn' Example usage: - uses: actions/upload-artifact@v7 with: name: my-artifact path: path/to/artifact/ if-no-files-found: error This parameter has remained unchanged since its introduction (around v2), through v7.0.0 and the patch release v7.0.1 (which only updated README examples and dependencies, no action.yml changes).


Add if-no-files-found: error to make benchmark artifact uploads fail when expected JSON files are missing.

Lines 103-107, 131-135, and 160-164 upload benchmark JSON artifacts, but missing files currently won't fail the workflow. The actions/upload-artifact action defaults to warn behavior when files are missing. Since emitRunResult only logs write failures, the benchmarks can pass without producing the required artifacts. Add if-no-files-found: error to all three upload-artifact steps to enforce the artifact contract.

Proposed workflow hardening
```diff
       - name: Upload benchmark results
         uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
         with:
           name: erc20-benchmark-results
           path: test/e2e/benchmark/erc20_run_result.json
+          if-no-files-found: error
@@
       - name: Upload benchmark results
         uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
         with:
           name: defi-benchmark-results
           path: test/e2e/benchmark/defi_run_result.json
+          if-no-files-found: error
@@
       - name: Upload benchmark results
         uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
         with:
           name: state-pressure-benchmark-results
           path: test/e2e/benchmark/state_pressure_run_result.json
+          if-no-files-found: error
```

Comment on lines +253 to +445
"erc20_throughput": {
"name": "ERC20 Throughput Benchmark",
"kind": "throughput_summary",
"source": {
"job_name": "ERC20 Throughput Benchmark",
"job_id": 72488470294,
"source_type": "workflow_job_log_summary"
},
"config": {
"service_name": "ev-node-erc20",
"scenario": "ERC20TX",
"block_time": "100ms",
"slot_duration": "250ms",
"gas_limit": "",
"scrape_interval": "1s",
"num_spammers": 4,
"count_per_spammer": 5000,
"target_total_transactions": 20000,
"throughput_per_spammer_tps": 200,
"warmup_txs": 200,
"gas_units_to_burn": 1000000,
"max_wallets": 500,
"max_pending": 50000,
"rebroadcast": 0,
"base_fee": 20,
"tip_fee": 2,
"wait_timeout": "10m0s"
},
"summary": {
"block_range": {
"start": 47,
"end": 206,
"total": 160,
"non_empty": 71
},
"non_empty_ratio_pct": 44.4,
"block_intervals": {
"avg_ms": 138,
"p50_ms": 0,
"p99_ms": 1000,
"max_ms": 1000
},
"gas_per_non_empty_block": {
"avg": 11736445,
"p50": 7335324,
"p99": 73866396
},
"tx_per_non_empty_block": {
"avg": 276.3,
"p50": 155,
"p99": 1829
},
"throughput": {
"mgas_per_sec": 43.86,
"tps": 1032.5,
"steady_state_sec": 19,
"wall_clock_sec": 22.405
},
"ev_node_overhead_pct": 2.1,
"ev_reth_ggas_per_sec": 0.197,
"produce_block_avg_ms": 97,
"seconds_per_gigagas": 22.8013
}
},
"defi_simulation": {
"name": "DeFi Simulation Benchmark",
"kind": "throughput_summary",
"source": {
"job_name": "DeFi Simulation Benchmark",
"job_id": 72488470322,
"source_type": "workflow_job_log_summary"
},
"config": {
"service_name": "ev-node-defi",
"scenario": "UniswapSwaps",
"pair_count": 1,
"block_time": "100ms",
"slot_duration": "250ms",
"gas_limit": "",
"scrape_interval": "1s",
"num_spammers": 4,
"count_per_spammer": 5000,
"target_total_transactions": 20000,
"throughput_per_spammer_tps": 200,
"warmup_txs": 200,
"gas_units_to_burn": 1000000,
"max_wallets": 500,
"max_pending": 50000,
"rebroadcast": 0,
"base_fee": 20,
"tip_fee": 2,
"wait_timeout": "10m0s"
},
"summary": {
"block_range": {
"start": 208,
"end": 397,
"total": 190,
"non_empty": 147
},
"non_empty_ratio_pct": 77.4,
"block_intervals": {
"avg_ms": 180,
"p50_ms": 0,
"p99_ms": 1000,
"max_ms": 1000
},
"gas_per_non_empty_block": {
"avg": 7099944,
"p50": 3381760,
"p99": 39062140
},
"tx_per_non_empty_block": {
"avg": 146.6,
"p50": 76,
"p99": 752
},
"throughput": {
"mgas_per_sec": 32.62,
"tps": 673.5,
"steady_state_sec": 32,
"wall_clock_sec": 34.205
},
"ev_node_overhead_pct": 2.3,
"ev_reth_ggas_per_sec": 0.119,
"produce_block_avg_ms": 152,
"seconds_per_gigagas": 30.6604
}
},
"state_pressure": {
"name": "State Pressure Benchmark",
"kind": "throughput_summary",
"source": {
"job_name": "State Pressure Benchmark",
"job_id": 72488470291,
"source_type": "workflow_job_log_summary"
},
"config": {
"service_name": "ev-node-state-pressure",
"scenario": "StorageSpam",
"block_time": "100ms",
"slot_duration": "250ms",
"gas_limit": "",
"scrape_interval": "1s",
"num_spammers": 4,
"count_per_spammer": 5000,
"target_total_transactions": 20000,
"throughput_per_spammer_tps": 200,
"warmup_txs": 200,
"gas_units_to_burn": 1000000,
"max_wallets": 500,
"max_pending": 50000,
"rebroadcast": 0,
"base_fee": 20,
"tip_fee": 2,
"wait_timeout": "10m0s"
},
"summary": {
"block_range": {
"start": 155,
"end": 223,
"total": 69,
"non_empty": 50
},
"non_empty_ratio_pct": 72.5,
"block_intervals": {
"avg_ms": 426,
"p50_ms": 0,
"p99_ms": 1330,
"max_ms": 2000
},
"gas_per_non_empty_block": {
"avg": 376732449,
"p50": 254408064,
"p99": 1179379427
},
"tx_per_non_empty_block": {
"avg": 361.3,
"p50": 244,
"p99": 1131
},
"throughput": {
"mgas_per_sec": 724.49,
"tps": 694.8,
"steady_state_sec": 26,
"wall_clock_sec": 28.204
},
"ev_node_overhead_pct": 2.1,
"ev_reth_ggas_per_sec": 2.193,
"produce_block_avg_ms": 463,
"seconds_per_gigagas": 1.3803
}
}

⚠️ Potential issue | 🟠 Major

Use the generated run-result schema for throughput baselines.

The workflow writes ERC20/DeFi/state-pressure artifacts via test/e2e/benchmark/run_result.go, which serializes top-level metrics and block_range. These baseline sections use a different summary shape and renamed fields like ev_node_overhead_pct / seconds_per_gigagas, so regression tooling will need bespoke mapping and direct schema validation against generated artifacts will fail.

Please store the generated runResult shape for each throughput benchmark, or rename/restructure these sections to match it.


@chatton (Contributor) commented Apr 23, 2026

I don't think we need this one; these are the metrics that just added a lot of noise and weren't really reliable on the hardware used for the GH runners. There is a baseline file added for configurations: https://github.com/01builders/infra/blob/7d6267d1b5e082093be8ce4f4c1e004e5d0a3aa0/ansible/roles/benchmarking/templates/matrices/baseline.json#L4

and all results will get uploaded here https://console.hetzner.com/projects/2037634/buckets/10387762/overview

I think we can close the linked issue.

@tac0turtle tac0turtle closed this Apr 23, 2026