feat(metrics): add block transaction count and SR set change monitoring #6624

Open
warku123 wants to merge 14 commits into tronprotocol:develop from warku123:metric_modify_clean

Conversation

warku123 commented on Apr 1, 2026

What does this PR do?

Add two new metrics enhancements for better network monitoring:

  1. Block Transaction Count Histogram

    • Replace tron:block_empty_total counter with tron:block_transaction_count histogram
    • Track transaction count distribution per block with buckets: [0, 10, 50, 100, 200, 500, 1000, 2000, 5000, 10000]
    • Empty blocks can be monitored via the le=0.0 bucket (see the example query after this list)
  2. SR Set Change Monitoring

    • Add tron:sr_set_change counter to track SR membership changes
    • Detect SR additions and removals at each maintenance time interval
    • Labels: action (add/remove), witness (SR address)
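
For illustration (the query itself is an assumption, not part of this PR): a Prometheus histogram is exported as <name>_bucket time series with le labels, so the empty-block rate could be derived with something along the lines of sum(rate(tron:block_transaction_count_bucket{le="0.0"}[5m])).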

Changes:

  • Add overloaded init() method in MetricsHistogram for custom buckets (see the sketch after this list)
  • Add BLOCK_TRANSACTION_COUNT histogram metric
  • Add SR_SET_CHANGE counter with SR_ADD/SR_REMOVE labels
  • Implement SR set change detection logic in BlockChainMetricManager
  • Update PrometheusApiServiceTest with comprehensive test coverage
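
A minimal sketch of what the custom-bucket overload could look like, assuming the Prometheus simpleclient Histogram builder; the class shape and the container map are illustrative assumptions, not the actual java-tron code:

import io.prometheus.client.Histogram;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class MetricsHistogramSketch {
  private static final Map<String, Histogram> container = new ConcurrentHashMap<>();

  // Existing-style init: registers a histogram with the default buckets.
  static void init(String key, String help, String... labels) {
    container.put(key, Histogram.build()
        .name(key).help(help).labelNames(labels).register());
  }

  // Overload: identical registration, but with caller-supplied bucket boundaries,
  // e.g. new double[] {0, 10, 50, 100, 200, 500, 1000, 2000, 5000, 10000}.
  static void init(String key, String help, double[] buckets, String... labels) {
    container.put(key, Histogram.build()
        .name(key).help(help).buckets(buckets).labelNames(labels).register());
  }
}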

Why are these changes required?

  • Transaction histogram provides richer insights than simple counter for network analysis while maintaining empty block detection capability
  • SR set change monitoring enables tracking of network consensus participant changes for governance analysis

Compared to #6602, this PR provides a cleaner implementation focused solely on metrics enhancement without additional complexity.

This PR has been tested by:

  • Unit Tests (updated PrometheusApiServiceTest)
  • Manual Testing

Follow up

N/A

Extra details

…monitoring

Replace the dedicated tron:block_empty_total counter with a more comprehensive
tron:block_transaction_count histogram that tracks the distribution of
transaction counts per block.

Changes:
- Add overloaded init() method in MetricsHistogram to support custom buckets
- Add BLOCK_TRANSACTION_COUNT histogram with buckets [0, 10, 50, 100, 200, 500, 1000, 2000, 5000, 10000]
- Record transaction count for all blocks (including empty blocks with txCount=0)
- Empty blocks can be queried via bucket le=0.0
- Remove unused BLOCK_EMPTY counter

This provides richer insights for network analysis while still supporting
empty block monitoring via histogram bucket queries.

Closes tronprotocol#6590
warku123 changed the title from "feat(metrics): add block transaction count histogram for empty block monitoring" to "feat(metrics): add block transaction count and SR set change monitoring" on Apr 1, 2026
warku123 added a commit to warku123/java-tron that referenced this pull request Apr 2, 2026
Replace duplicated miner string literal with MINER_LABEL constant
to fix SonarQube code smell.

Relates tronprotocol#6624
warku123 added a commit to warku123/java-tron that referenced this pull request Apr 2, 2026
- Replace duplicated miner string literal with MINER_LABEL constant
  in PrometheusApiServiceTest to fix SonarQube code smell
- Add //NOSONAR comment for new BLOCK_TRANSACTION_COUNT histogram
  label name in MetricsHistogram

Relates tronprotocol#6624
warku123 force-pushed the metric_modify_clean branch from 4433c52 to 93d0083 on April 2, 2026 03:44
warku123 added a commit to warku123/java-tron that referenced this pull request Apr 2, 2026
Extract repeated 'miner' string literal into MINER_LABEL constant
in MetricsHistogram to follow DRY principle and fix SonarQube warning.

Also update PrometheusApiServiceTest to use MINER_LABEL constant
for test assertions.

Relates tronprotocol#6624
warku123 force-pushed the metric_modify_clean branch from 93d0083 to 6315ffc on April 2, 2026 03:54
Comment thread: common/src/main/java/org/tron/common/prometheus/MetricKeys.java
init(MetricKeys.Counter.TXS, "tron txs info .", "type", "detail");
init(MetricKeys.Counter.MINER, "tron miner info .", "miner", "type");
init(MetricKeys.Counter.BLOCK_FORK, "tron block fork info .", "type");
init(MetricKeys.Counter.SR_SET_CHANGE, "tron sr set change .", "action", "witness");

Collaborator

The witness label uses addresses. Although the Active SR pool is relatively limited (27 slots), as SRs rotate in and out over time, label values will accumulate monotonically — they only grow, never shrink. This is a known Prometheus anti-pattern.

Practical impact assessment:

  • TRON's SR candidate pool is not infinite, and rotation frequency is low, so it won't explode in the short term

  • However, as a long-running node metric, months or years of operation could still accumulate a significant number of time series

Possible approaches:

  • If the address dimension is truly needed, keep the current design, but document this characteristic

  • Or remove the witness label entirely, keeping only action (add/remove), and output address details via logs instead. Follow the principle that Prometheus should be used for alerting and awareness, while logs are for precise investigation and root-cause analysis.

Author

Thanks for raising the high-cardinality concern — it's a valid Prometheus best practice to watch for.

However, I believe the witness label is acceptable here given TRON's specific constraints:

  1. Bounded candidate pool: The SR candidate pool is finite. Even over years of operation, the total number of distinct addresses that have ever held an SR slot has a practical ceiling, far from the unbounded growth typical of high-cardinality anti-patterns.
  2. Low rotation frequency: SR set changes only occur at maintenance intervals (every 6 hours), so new label values are introduced very infrequently.
  3. Diagnostic value: The primary purpose of this metric is to identify which SR was added or removed. Dropping the witness label would reduce the metric to just "something changed", requiring operators to cross-reference logs and losing the at-a-glance visibility in Grafana dashboards that makes this metric useful in the first place.

In summary, the worst-case time series count is roughly candidate_pool_size × 2 (add/remove), which stays well within Prometheus's comfortable operating range.

That said, I agree it's worth documenting this characteristic. Could you suggest what form you'd prefer — a code comment at the metric registration site, a note in the PR description, or something else?

Collaborator

I'd prefer a code comment at the metric registration site.

Author

@halibobo1205 Added a comment at the SR_SET_CHANGE constant definition in MetricKeys.java (commit 96bf315) explaining the cardinality rationale:

witness label: bounded cardinality -- SR candidate pool is finite, rotation is infrequent (at most once per maintenance interval); kept for at-a-glance SR identification in dashboards rather than requiring log cross-referencing.

- Extract MINER_LABEL to MetricLabels.Histogram.MINER as shared constant
- Use int instead of double for block count in tests
- Store witness addresses as base58 directly, removing unnecessary hex round-trip

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
warku123 requested a review from halibobo1205 on April 9, 2026 02:46

bladehan1 (Collaborator) left a comment

Review: 2 findings on lastActiveWitnesses / lastNextMaintenanceTime state management.

private long failProcessBlockNum = 0;
@Setter
private String failProcessBlockReason = "";
private final Set<String> lastActiveWitnesses = ConcurrentHashMap.newKeySet();

Collaborator

[P1] Two issues with lastActiveWitnesses and lastNextMaintenanceTime

1) False thread-safety guarantee from ConcurrentHashMap.newKeySet()

applyBlock() is only called from TronNetDelegate.processBlock(), which holds the dbManager lock — a single-threaded path. More importantly, the compound operations in recordSrSetChange() (read → diff → clear → addAll) are not atomic even with a concurrent set. If concurrent access were ever needed, this implementation would silently produce incorrect diffs.

Suggest replacing with a plain HashSet:

// Only accessed from block processing thread
private final Set<String> lastActiveWitnesses = new HashSet<>();

2) In-memory state is not rolled back during fork switching

lastActiveWitnesses and lastNextMaintenanceTime are pure in-memory state that does not participate in the revoke/undo mechanism.

During Manager.switchFork():

  1. eraseBlock() rolls back the old chain — DB state (nextMaintenanceTime, activeWitnesses) is reverted via revokingStore.fastPop()
  2. Manager.applyBlock() applies new-chain blocks — DB state is rebuilt
  3. But this internal applyBlock does not go through TronNetDelegate.processBlock(), so metricsService.applyBlock() is never called and the in-memory state is not updated

Scenario — fork across a maintenance boundary:

  • Old chain passed maintenance: lastNextMaintenanceTime = T1, lastActiveWitnesses = {SR_old}
  • Fork switches back to a pre-maintenance chain: DB now has nextMaintenanceTime = T0 < T1
  • On the next block, metrics sees nextMaintenanceTime(T0) != lastNextMaintenanceTime(T1) → triggers SR change detection
  • But it compares lastActiveWitnesses (stale, from old chain) against the current store (new chain) → phantom SR changes or missed real changes

Suggest adding a rollback guard:

} else if (nextMaintenanceTime != lastNextMaintenanceTime) {
    if (nextMaintenanceTime < lastNextMaintenanceTime) {
        // Fork rollback detected — reinitialize instead of diffing
        lastActiveWitnesses.clear();
        lastActiveWitnesses.addAll(...current witnesses...);
        lastNextMaintenanceTime = nextMaintenanceTime;
        return;
    }
    // normal maintenance transition — diff and record
    ...
}

Collaborator

On second thought, adding a rollback guard here over-complicates the monitoring layer. The core issue is that SR set change detection is business logic monitoring — it depends on chain state transitions (maintenance boundaries, fork switching), which makes it fundamentally unsuitable for an in-process metric that cannot participate in the revoke mechanism.

A simpler and more robust approach would be to move SR set change detection off-chain: an external monitor reads the /metrics endpoint or listens to block events, compares consecutive maintenance periods' active witness lists, and emits its own alerts. This completely decouples monitoring from chain state management and avoids the fork/rollback problem entirely.

For the histogram (BLOCK_TRANSACTION_COUNT), in-chain observation is fine — it's a stateless per-block metric with no accumulated state to go stale. The issue is specific to SR_SET_CHANGE because it maintains cross-block state (lastActiveWitnesses, lastNextMaintenanceTime) that the chain's undo mechanism cannot reach.

Author

Thanks for the thorough analysis! Both points are valid — I've addressed them in the latest commit (7fdb3c5):

1) ConcurrentHashMap.newKeySet() → HashSet

Agreed. Since applyBlock() is only called from TronNetDelegate.processBlock() under synchronized (blockLock), using a concurrent set adds no real safety and gives a misleading impression. Replaced with a plain HashSet with a comment clarifying the single-threaded access pattern.

2) Fork rollback guard

Added a check for nextMaintenanceTime < lastNextMaintenanceTime — since maintenance time only increases monotonically under normal conditions, a decrease is a reliable signal that a fork rollback has occurred. In that case, we reinitialize the in-memory state from the current DB instead of diffing against stale data:

if (nextMaintenanceTime < lastNextMaintenanceTime) {
    // Fork rollback detected — reinitialize instead of diffing
    lastActiveWitnesses.clear();
    lastActiveWitnesses.addAll(currentWitnesses);
} else {
    recordSrSetChange(currentWitnesses);
}
lastNextMaintenanceTime = nextMaintenanceTime;

Regarding your second comment about moving SR set change detection off-chain entirely — I understand the concern that cross-block state cannot participate in the revoke mechanism, but I believe the rollback guard above addresses the root issue with minimal complexity. Removing the in-process metric would lose the at-a-glance visibility in Grafana dashboards that makes it useful for operators, and pushing the diff logic to an external monitor adds operational overhead (deploying/maintaining a separate component) for an edge case that is already handled. I'd prefer to keep it in-chain with this guard in place.

- Replace ConcurrentHashMap.newKeySet() with HashSet since applyBlock()
  is only called from a single-threaded path (synchronized on blockLock)
- Add fork rollback detection: when nextMaintenanceTime < lastNextMaintenanceTime,
  reinitialize in-memory state instead of diffing, preventing phantom SR changes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
warku123 added a commit to warku123/tron-docker that referenced this pull request Apr 14, 2026
Add two new panel sections to the java-tron-server Grafana dashboard
to support metrics from tronprotocol/java-tron#6624:

- Block Metrics: empty block rate, per-SR empty block count,
  avg transactions per block, transaction count distribution
- SR Set Change: event table and cumulative totals by action

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
// Fork rollback detected — reinitialize instead of diffing
lastActiveWitnesses.clear();
lastActiveWitnesses.addAll(currentWitnesses);
} else {

Collaborator

If a fork rollback does occur, wouldn't the erroneous SR change events already have been written to the metrics?

Author

@bladehan1 @lvs0075 You are absolutely right: the rollback guard can only re-calibrate the in-memory baseline for subsequent blocks; it cannot retract Prometheus counters that have already been incremented. This is an inherent limitation because in-process metrics do not participate in the chain's revoking mechanism.

That said, this scenario — a fork rollback that crosses a maintenance boundary — is extremely rare in practice because TRON's maintenance interval is 6 hours and deep reorganizations of that magnitude are uncommon on the mainnet.

From an observability perspective, the recorded add/remove events are not necessarily "false positives" either: they reflect real SR set transitions that occurred on the abandoned chain at the time. When a fork rollback does cross a maintenance boundary, the chain genuinely experienced a different SR set during that (later invalidated) period. The metric captures that transient state change, even though the chain eventually switched to another fork.

In other words, SR_SET_CHANGE behaves like an event log of witnessed transitions rather than a guaranteed final-state snapshot. For operators, seeing an add followed by a remove (or vice versa) after a rollback is an indicator that a reorganization crossed the maintenance boundary.

This is a semantic trade-off. If the project prefers metrics to strictly mirror only the final canonical chain state, the only robust solution is to move SR change detection off-chain as @bladehan1 suggested. Otherwise, we can accept this limitation and document it clearly. Please let me know which direction the team prefers.

MetricLabels.Counter.TXS_SUCCESS, MetricLabels.Counter.TXS_SUCCESS);
}
// Record transaction count distribution for all blocks (including empty blocks)
Metrics.histogramObserve(MetricKeys.Histogram.BLOCK_TRANSACTION_COUNT, txCount,

Collaborator

[SHOULD] If the primary goal is empty block monitoring, a simple Counter or Gauge would suffice. A Histogram is better suited when transaction count distribution analysis is actually needed — it can always be upgraded later.

Author

Actually, we discussed this question in issue #6590 and ultimately decided to implement a Histogram instead of a simple Counter.

MetricLabels.Counter.TXS_SUCCESS, MetricLabels.Counter.TXS_SUCCESS);
}
// Record transaction count distribution for all blocks (including empty blocks)
Metrics.histogramObserve(MetricKeys.Histogram.BLOCK_TRANSACTION_COUNT, txCount,

Collaborator

The SR set change detection logic (reading WitnessScheduleStore, stream mapping, encode58Check encoding, HashSet diff) runs on every block regardless of whether metrics are enabled. The Metrics.enabled() guard inside counterInc()/histogramObserve() only prevents the final write — the upstream computation still executes unconditionally.

Suggest wrapping the entire SR detection block and the histogram observation (including the encode58Check call) with a Metrics.enabled() check to avoid unnecessary computation when metrics are disabled.

Author

Good catch. Wrapped the entire block (histogram observation + SR set change detection) with a Metrics.enabled() guard in commit 7791944. This avoids the DB reads, encode58Check encoding, stream mapping, and HashSet diff on every block when Prometheus is disabled.

Note that the existing metrics in the same method (e.g., MINER_LATENCY histogram + encode58Check at L151-152) have the same issue, but I'll leave those untouched to keep this PR scoped to the new metrics only.
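
For reference, a sketch of the shape of that guard (txCount, Metrics.enabled(), and histogramObserve come from the snippets quoted in this thread; the witness-address expression and the recordSrSetChange call site are assumptions):

// Wrapping the call site skips argument evaluation (DB reads, encode58Check,
// stream mapping, HashSet diff) entirely, not just the final Prometheus write.
if (Metrics.enabled()) {
  Metrics.histogramObserve(MetricKeys.Histogram.BLOCK_TRANSACTION_COUNT, txCount,
      StringUtil.encode58Check(witnessAddress));  // witnessAddress: assumed local byte[]
  recordSrSetChange(currentWitnesses);  // SR set diff runs only when metrics are on
}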

Wrap histogram observation and SR set change detection with
Metrics.enabled() guard to avoid unnecessary computation (DB reads,
encode58Check, stream mapping, HashSet diff) on every block when
metrics are not enabled.
warku123 requested a review from Little-Peony on April 27, 2026 03:03
The SR set change detection used to live in BlockChainMetricManager.applyBlock
and inferred SR set transitions from nextMaintenanceTime changes plus a
diff against a cached HashSet. That approach was indirect (re-derived
state already known at the source) and required a fork-rollback guard.

Hoist the recording into the consensus layer where the transition is
explicit: MaintenanceManager.doMaintenance already computes currentWits
and newWits when the active witness set changes, so it can call
SrSetChangeMetric.record(currentWits, newWits) directly. The new helper
lives in the common module so both consensus and framework can depend on
it without inverting the existing module hierarchy.

Tests: SrSetChangeMetricTest exercises SR_ADD via the real
doMaintenance() flow, plus the no-vote / metrics-disabled / SR_REMOVE
branches. PrometheusApiServiceTest drops the obsolete SR assertions.
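
A sketch of what the hoisted helper could look like (the record(currentWits, newWits) signature is from the commit message above; the ByteString element type, the SR_ADD/SR_REMOVE label constants, and the diff details are assumptions based on this thread):

import com.google.protobuf.ByteString;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public final class SrSetChangeMetric {

  private SrSetChangeMetric() {
  }

  public static void record(List<ByteString> currentWits, List<ByteString> newWits) {
    if (!Metrics.enabled()) {
      return;  // skip the set diff and base58 encoding entirely when metrics are off
    }
    Set<ByteString> before = new HashSet<>(currentWits);
    Set<ByteString> after = new HashSet<>(newWits);
    for (ByteString wit : after) {
      if (!before.contains(wit)) {  // present only in the new set: an added SR
        Metrics.counterInc(MetricKeys.Counter.SR_SET_CHANGE, 1,
            MetricLabels.Counter.SR_ADD, StringUtil.encode58Check(wit.toByteArray()));
      }
    }
    for (ByteString wit : before) {
      if (!after.contains(wit)) {  // present only in the old set: a removed SR
        Metrics.counterInc(MetricKeys.Counter.SR_SET_CHANGE, 1,
            MetricLabels.Counter.SR_REMOVE, StringUtil.encode58Check(wit.toByteArray()));
      }
    }
  }
}
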
Metrics.counterInc(MetricKeys.Counter.TXS, txCount,
MetricLabels.Counter.TXS_SUCCESS, MetricLabels.Counter.TXS_SUCCESS);
}
if (Metrics.enabled()) {

Contributor

I think this if (Metrics.enabled()) { is no longer needed; or should it be moved to the first line of this function?

Author

Thanks for the review. Two reasons I'd like to keep this guard where it is:

  1. Not redundant — the histogram label StringUtil.encode58Check(address) runs a sha256 checksum, and Java evaluates it before histogramObserve is even entered. The internal Metrics.enabled() check inside observe() only short-circuits the Prometheus call, not the argument evaluation. Wrapping the call site is what actually saves the per-block sha256 work when metrics are disabled.

  2. Can't move to the first line — the earlier part of applyBlock maintains witnessInfo / dupWitnessBlockNum, which are read by getDupWitness() to populate BlockChainInfo. That's non-metric state and must run regardless of Metrics.enabled(). An early return at the top of the function would skip the duplicate-witness bookkeeping.

So the guard is intentionally local to the histogram call.

Sunny6889 requested a review from halibobo1205 on April 27, 2026 08:07
warku123 requested a review from Sunny6889 on April 27, 2026 08:51


Development

Successfully merging this pull request may close these issues.

[Feature] Add Prometheus metrics for empty blocks and SR set changes

8 participants