
feat: add benchmark baseline artifact #3277

Closed

tac0turtle wants to merge 1 commit into main from marko/issue-3185-benchmark-artifact

Conversation

@tac0turtle (Contributor) commented Apr 22, 2026

Overview

Closes #3185

Summary by CodeRabbit

  • Chores

    • Enhanced benchmark infrastructure to emit structured output and automatically upload results as named artifacts for improved tracking and historical record-keeping across multiple workload scenarios.
  • Tests

    • Established baseline benchmark snapshot containing comprehensive performance metrics and system configurations for future performance analysis and regression detection.

@github-actions (Contributor) commented:

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ passed | ⏩ skipped | ✅ passed | ✅ passed | Apr 22, 2026, 2:41 PM |

@tac0turtle tac0turtle requested a review from chatton April 22, 2026 14:41
@claude (Bot, Contributor) commented Apr 22, 2026

Claude encountered an error.


I'll analyze this and get back to you.

@coderabbitai (Bot, Contributor) commented Apr 22, 2026

📝 Walkthrough

The pull request updates the benchmark workflow to emit structured JSON output files for three benchmark types and automatically upload them as artifacts. A new baseline results JSON file is added containing benchmark schema, provenance metadata, and comprehensive benchmark performance data across multiple test categories.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Benchmark Workflow Configuration**<br>`.github/workflows/benchmark.yml` | Updated three benchmark jobs to output results to JSON files via `BENCH_RESULT_OUTPUT` environment variables, added `BENCH_OBJECTIVE` descriptions for each workload, and integrated artifact upload steps (`actions/upload-artifact@...`) to publish results as named artifacts (`erc20-benchmark-results`, `defi-benchmark-results`, `state-pressure-benchmark-results`). |
| **Baseline Benchmark Data**<br>`test/e2e/benchmark/baseline-results.json` | New JSON file establishing a benchmark baseline snapshot with schema versioning, provenance tracking (workflow metadata, run IDs, branch/SHA, timestamps), and comprehensive results for EVM contract roundtrip, block executor production, trace collection, and throughput benchmarks (ERC20, DeFi, state pressure) with per-operation and aggregated metrics. |
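The workflow-side pattern described above can be sketched as a pair of steps. The env var names (`BENCH_RESULT_OUTPUT`, `BENCH_OBJECTIVE`), artifact name, and JSON path come from the change summary; the step names and the benchmark invocation itself are hypothetical placeholders, not the actual workflow contents:

```yaml
# Hypothetical sketch of one benchmark job's steps. Env var names,
# artifact name, and JSON path follow the change summary; the run
# command and step names are illustrative only.
- name: Run ERC20 throughput benchmark
  env:
    BENCH_RESULT_OUTPUT: test/e2e/benchmark/erc20_run_result.json
    BENCH_OBJECTIVE: "sustained ERC20 transfer throughput"
  run: make bench-erc20   # placeholder invocation

- name: Upload benchmark results
  uses: actions/upload-artifact@v7
  with:
    name: erc20-benchmark-results
    path: test/e2e/benchmark/erc20_run_result.json
```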

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 Benchmarks bloom like carrots so bright,
JSON treasures stored with all their might,
Artifacts hopping from branch to vault,
Baselines set—no performance fault!
Structured results to compare with care,
A rabbit's speedtest beyond compare! 🏃✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ❓ Inconclusive | The description references the linked issue #3185 but lacks detailed explanation of the implementation, changes made, or why the structure was chosen. | Expand description to explain what was added (baseline JSON file structure, workflow changes), why this structure was chosen, and how it serves the regression baseline purpose. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'feat: add benchmark baseline artifact' clearly and concisely summarizes the main change, which is adding a benchmark baseline JSON file as an artifact. |
| Linked Issues check | ✅ Passed | The PR successfully adds a machine-readable JSON artifact (baseline-results.json) and updates workflow configuration to emit structured benchmark output, directly addressing issue #3185's requirements. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to adding benchmark baseline artifacts: workflow updates to emit JSON output and upload artifacts, plus the new baseline-results.json file with benchmark data. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@codecov (Bot) commented Apr 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 61.81%. Comparing base (77b39d4) to head (f02f9a3).
⚠️ Report is 11 commits behind head on main.

Additional details and impacted files
```
@@            Coverage Diff             @@
##             main    #3277      +/-   ##
==========================================
- Coverage   62.33%   61.81%   -0.52%
==========================================
  Files         122      122
  Lines       12877    16241    +3364
==========================================
+ Hits         8027    10040    +2013
- Misses       3969     5315    +1346
- Partials      881      886       +5
```

| Flag | Coverage Δ |
| --- | --- |
| combined | 61.81% <ø> (-0.52%) ⬇️ |

Flags with carried forward coverage won't be shown.
@coderabbitai (Bot) left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

```
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/benchmark.yml:
- Around line 103-107: The upload-artifact steps that send benchmark JSONs
currently allow missing files and should instead fail the workflow; for each
actions/upload-artifact step that uploads benchmark results (the "Upload
benchmark results" steps that set name: erc20-benchmark-results and the two
analogous benchmark artifact uploads), add the input if-no-files-found: error to
the step's with: block so the action errors when the expected JSON (e.g.,
test/e2e/benchmark/erc20_run_result.json) is missing.

In @test/e2e/benchmark/baseline-results.json:
- Around line 253-445: The baseline JSON uses a different summary shape (fields
like ev_node_overhead_pct and seconds_per_gigagas and nested summary) than the
generated runResult produced by test/e2e/benchmark/run_result.go (which emits
top-level metrics and block_range), causing schema validation to fail; update
these throughput entries (erc20_throughput, defi_simulation, state_pressure) to
emit the exact runResult shape: move/rename the current "summary" data into the
runResult "metrics" object, ensure "block_range" is a top-level sibling of
"metrics" (matching runResult struct names), and drop/rename fields to match
runResult field names exactly so regression tooling can validate without a
bespoke mapper.
```
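The restructuring this prompt asks for can be sanity-checked mechanically. A minimal Go sketch follows; the key names (`metrics`, `block_range`, `summary`) are taken from the review comments above, and the real `runResult` struct in `test/e2e/benchmark/run_result.go` may differ:

```go
package main

import "fmt"

// entryMatchesRunResult reports whether a decoded baseline entry exposes
// the top-level keys the review says the generated runResult emits
// ("metrics" and "block_range"). Key names come from the review comment;
// the actual runResult definition may differ.
func entryMatchesRunResult(entry map[string]any) bool {
	_, hasMetrics := entry["metrics"]
	_, hasRange := entry["block_range"]
	return hasMetrics && hasRange
}

func main() {
	// Current baseline shape: everything nested under "summary".
	current := map[string]any{"summary": map[string]any{}}
	// Shape the reviewer asks for: metrics and block_range as siblings.
	wanted := map[string]any{
		"metrics":     map[string]any{},
		"block_range": map[string]any{},
	}
	fmt.Println(entryMatchesRunResult(current)) // false
	fmt.Println(entryMatchesRunResult(wanted))  // true
}
```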

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 17bd3af1-1482-49e2-bbae-b67e7bcfaeb3

📥 Commits

Reviewing files that changed from the base of the PR and between 3efcf93 and f02f9a3.

📒 Files selected for processing (2)
  • .github/workflows/benchmark.yml
  • test/e2e/benchmark/baseline-results.json

Comment on lines +103 to +107
```yaml
      - name: Upload benchmark results
        uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
        with:
          name: erc20-benchmark-results
          path: test/e2e/benchmark/erc20_run_result.json
```
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Verify the benchmark artifact upload steps fail when expected JSON outputs are absent.
rg -n -C 4 'name: (erc20|defi|state-pressure)-benchmark-results|if-no-files-found' .github/workflows/benchmark.yml
```

Repository: evstack/ev-node

Length of output: 1339


🌐 Web query:

actions/upload-artifact v7.0.1 if-no-files-found parameter documentation

💡 Result:

The if-no-files-found parameter for actions/upload-artifact v7.0.1 controls the behavior when no files match the provided path glob. Description from action.yml: "The desired behavior if no files are found using the provided path. Available Options: warn: Output a warning but do not fail the action, error: Fail the action with an error message, ignore: Do not output any warnings or errors, the action does not fail" Default: 'warn' Example usage: - uses: actions/upload-artifact@v7 with: name: my-artifact path: path/to/artifact/ if-no-files-found: error This parameter has remained unchanged since its introduction (around v2), through v7.0.0 and the patch release v7.0.1 (which only updated README examples and dependencies, no action.yml changes).


Add if-no-files-found: error to make benchmark artifact uploads fail when expected JSON files are missing.

Lines 103-107, 131-135, and 160-164 upload benchmark JSON artifacts, but missing files currently won't fail the workflow. The actions/upload-artifact action defaults to warn behavior when files are missing. Since emitRunResult only logs write failures, the benchmarks can pass without producing the required artifacts. Add if-no-files-found: error to all three upload-artifact steps to enforce the artifact contract.

Proposed workflow hardening
```diff
       - name: Upload benchmark results
         uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
         with:
           name: erc20-benchmark-results
           path: test/e2e/benchmark/erc20_run_result.json
+          if-no-files-found: error
@@
       - name: Upload benchmark results
         uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
         with:
           name: defi-benchmark-results
           path: test/e2e/benchmark/defi_run_result.json
+          if-no-files-found: error
@@
       - name: Upload benchmark results
         uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
         with:
           name: state-pressure-benchmark-results
           path: test/e2e/benchmark/state_pressure_run_result.json
+          if-no-files-found: error
```

Comment on lines +253 to +445
"erc20_throughput": {
"name": "ERC20 Throughput Benchmark",
"kind": "throughput_summary",
"source": {
"job_name": "ERC20 Throughput Benchmark",
"job_id": 72488470294,
"source_type": "workflow_job_log_summary"
},
"config": {
"service_name": "ev-node-erc20",
"scenario": "ERC20TX",
"block_time": "100ms",
"slot_duration": "250ms",
"gas_limit": "",
"scrape_interval": "1s",
"num_spammers": 4,
"count_per_spammer": 5000,
"target_total_transactions": 20000,
"throughput_per_spammer_tps": 200,
"warmup_txs": 200,
"gas_units_to_burn": 1000000,
"max_wallets": 500,
"max_pending": 50000,
"rebroadcast": 0,
"base_fee": 20,
"tip_fee": 2,
"wait_timeout": "10m0s"
},
"summary": {
"block_range": {
"start": 47,
"end": 206,
"total": 160,
"non_empty": 71
},
"non_empty_ratio_pct": 44.4,
"block_intervals": {
"avg_ms": 138,
"p50_ms": 0,
"p99_ms": 1000,
"max_ms": 1000
},
"gas_per_non_empty_block": {
"avg": 11736445,
"p50": 7335324,
"p99": 73866396
},
"tx_per_non_empty_block": {
"avg": 276.3,
"p50": 155,
"p99": 1829
},
"throughput": {
"mgas_per_sec": 43.86,
"tps": 1032.5,
"steady_state_sec": 19,
"wall_clock_sec": 22.405
},
"ev_node_overhead_pct": 2.1,
"ev_reth_ggas_per_sec": 0.197,
"produce_block_avg_ms": 97,
"seconds_per_gigagas": 22.8013
}
},
"defi_simulation": {
"name": "DeFi Simulation Benchmark",
"kind": "throughput_summary",
"source": {
"job_name": "DeFi Simulation Benchmark",
"job_id": 72488470322,
"source_type": "workflow_job_log_summary"
},
"config": {
"service_name": "ev-node-defi",
"scenario": "UniswapSwaps",
"pair_count": 1,
"block_time": "100ms",
"slot_duration": "250ms",
"gas_limit": "",
"scrape_interval": "1s",
"num_spammers": 4,
"count_per_spammer": 5000,
"target_total_transactions": 20000,
"throughput_per_spammer_tps": 200,
"warmup_txs": 200,
"gas_units_to_burn": 1000000,
"max_wallets": 500,
"max_pending": 50000,
"rebroadcast": 0,
"base_fee": 20,
"tip_fee": 2,
"wait_timeout": "10m0s"
},
"summary": {
"block_range": {
"start": 208,
"end": 397,
"total": 190,
"non_empty": 147
},
"non_empty_ratio_pct": 77.4,
"block_intervals": {
"avg_ms": 180,
"p50_ms": 0,
"p99_ms": 1000,
"max_ms": 1000
},
"gas_per_non_empty_block": {
"avg": 7099944,
"p50": 3381760,
"p99": 39062140
},
"tx_per_non_empty_block": {
"avg": 146.6,
"p50": 76,
"p99": 752
},
"throughput": {
"mgas_per_sec": 32.62,
"tps": 673.5,
"steady_state_sec": 32,
"wall_clock_sec": 34.205
},
"ev_node_overhead_pct": 2.3,
"ev_reth_ggas_per_sec": 0.119,
"produce_block_avg_ms": 152,
"seconds_per_gigagas": 30.6604
}
},
"state_pressure": {
"name": "State Pressure Benchmark",
"kind": "throughput_summary",
"source": {
"job_name": "State Pressure Benchmark",
"job_id": 72488470291,
"source_type": "workflow_job_log_summary"
},
"config": {
"service_name": "ev-node-state-pressure",
"scenario": "StorageSpam",
"block_time": "100ms",
"slot_duration": "250ms",
"gas_limit": "",
"scrape_interval": "1s",
"num_spammers": 4,
"count_per_spammer": 5000,
"target_total_transactions": 20000,
"throughput_per_spammer_tps": 200,
"warmup_txs": 200,
"gas_units_to_burn": 1000000,
"max_wallets": 500,
"max_pending": 50000,
"rebroadcast": 0,
"base_fee": 20,
"tip_fee": 2,
"wait_timeout": "10m0s"
},
"summary": {
"block_range": {
"start": 155,
"end": 223,
"total": 69,
"non_empty": 50
},
"non_empty_ratio_pct": 72.5,
"block_intervals": {
"avg_ms": 426,
"p50_ms": 0,
"p99_ms": 1330,
"max_ms": 2000
},
"gas_per_non_empty_block": {
"avg": 376732449,
"p50": 254408064,
"p99": 1179379427
},
"tx_per_non_empty_block": {
"avg": 361.3,
"p50": 244,
"p99": 1131
},
"throughput": {
"mgas_per_sec": 724.49,
"tps": 694.8,
"steady_state_sec": 26,
"wall_clock_sec": 28.204
},
"ev_node_overhead_pct": 2.1,
"ev_reth_ggas_per_sec": 2.193,
"produce_block_avg_ms": 463,
"seconds_per_gigagas": 1.3803
}
}

⚠️ Potential issue | 🟠 Major

Use the generated run-result schema for throughput baselines.

The workflow writes ERC20/DeFi/state-pressure artifacts via test/e2e/benchmark/run_result.go, which serializes top-level metrics and block_range. These baseline sections use a different summary shape and renamed fields like ev_node_overhead_pct / seconds_per_gigagas, so regression tooling will need bespoke mapping and direct schema validation against generated artifacts will fail.

Please store the generated runResult shape for each throughput benchmark, or rename/restructure these sections to match it.


@chatton (Contributor) commented Apr 23, 2026

I don't think we need this one; these are the metrics that just added a lot of noise and weren't really reliable on the hardware used for the GH runners. There is a baseline file added for configurations: https://github.com/01builders/infra/blob/7d6267d1b5e082093be8ce4f4c1e004e5d0a3aa0/ansible/roles/benchmarking/templates/matrices/baseline.json#L4

and all results will get uploaded here https://console.hetzner.com/projects/2037634/buckets/10387762/overview

I think we can close the linked issue.

@tac0turtle tac0turtle closed this Apr 23, 2026