[Feature] Exclude historical balance DBs from lite snapshot #6597

@halibobo1205

Background

Lite nodes retain all state data but keep non-state data (block and
transaction data) only for the most recent 65,536 blocks.

toolkit db lite uses an archiveDbs list to determine which databases
are excluded when splitting a lite node snapshot:

// plugins/.../DbLite.java
private static final List<String> archiveDbs = Arrays.asList(
    BLOCK_DB_NAME,                    // "block"
    BLOCK_INDEX_DB_NAME,              // "block-index"
    TRANS_DB_NAME,                    // "trans"
    TRANSACTION_RET_DB_NAME,          // "transactionRetStore"
    TRANSACTION_HISTORY_DB_NAME);     // "transactionHistoryStore"

account-trace and balance-trace are not in this list, so they are
copied in full into the lite node snapshot. These two databases:

  • are populated only when the source full node runs with
    historyBalanceLookup=true (CLI --history-balance-lookup /
    config storage.balance.history.lookup); off by default
  • back the historical balance query API (getAccountBalance,
    getBlockBalance), which is inherently unavailable on lite nodes
    (the feature has state-tree semantics — it requires continuous
    accumulation from genesis)
  • can grow very large on historyBalanceLookup=true nodes

Database       Contents                                                       Measured Size
balance-trace  Historical balance change records at block and tx level        ≈ 690 GB
account-trace  Historical account balances indexed by address + block number  ≈ 180 GB
Total          Mainnet full node snapshot measured on 2026-03-11              ≈ 870 GB
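
To illustrate why the trace stores currently end up in the lite snapshot, here is a minimal sketch of list-based filtering. It is not the actual DbLite code: the class name SnapshotDbFilter, the method snapshotDbs, and the sample database names passed in main are assumptions made for illustration. Only the five archive names come from the list quoted above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch; not the real DbLite implementation.
class SnapshotDbFilter {
  // The five names from the archiveDbs list quoted above.
  static final List<String> ARCHIVE_DBS = Arrays.asList(
      "block", "block-index", "trans",
      "transactionRetStore", "transactionHistoryStore");

  // Returns the databases that would be copied into the lite snapshot:
  // everything in the source that is not listed as an archive database.
  static List<String> snapshotDbs(List<String> allDbs) {
    return allDbs.stream()
        .filter(name -> !ARCHIVE_DBS.contains(name))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Illustrative input only; a real node has many more databases.
    List<String> source = Arrays.asList(
        "account", "block", "trans", "account-trace", "balance-trace");
    // Prints [account, account-trace, balance-trace]
    System.out.println(snapshotDbs(source));
  }
}
```

Because neither trace store appears in the archive list, both pass the filter and are copied along with the state databases.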

Problem Statement

For operators running with historyBalanceLookup=true, toolkit db lite
copies approximately 870 GB of effectively unusable data into the lite
node snapshot. Lite nodes cannot answer historical balance queries
regardless, so the data contributes nothing to lite node functionality
and the size cost makes lite-snapshot slicing operationally infeasible
on these nodes.

For operators running with the default historyBalanceLookup=false,
both databases are empty Spring-initialized directories and the size
cost is negligible — the existing behavior is fine for them.


Rationale

Why should this feature exist?

  • Operational viability: For historyBalanceLookup=true nodes,
    ~870 GB savings is the difference between "lite snapshot slicing is
    unusable" and "lite snapshot slicing is feasible".
  • No functional regression: The excluded databases are not consulted
    by any online lite-node logic; they serve only the historical balance
    query API, which is unavailable on lite nodes.
  • Default users unaffected: Operators running with the default
    historyBalanceLookup=false see no change in behavior.

What are the use cases?

  • Node operators using toolkit db lite on a
    historyBalanceLookup=true source full node, where producing a
    usable lite snapshot is currently blocked by snapshot size.
  • Snapshot distribution scenarios where a significantly smaller
    snapshot size lowers the barrier to initial sync.
  • Storage-constrained environments where eliminating unused data
    preserves valuable disk space.

Who would benefit from this feature?

Node operators who explicitly run with historyBalanceLookup=true
and need to produce or distribute lite snapshots.


Proposed Solution

Specification

Introduce an opt-in CLI flag rather than changing the default behavior,
because:

  • Default-configured users (historyBalanceLookup=false, the majority)
    do not need the change and should not see any behavior shift.
  • Operators who enable the flag must explicitly accept that the lite
    snapshot will not carry historical balance data, and that running
    merge afterwards cannot restore the feature (state-tree semantics
    require continuous accumulation from genesis).
  • Keeping merge untouched avoids introducing fragile compatibility
    logic for cross-toolkit-version history packs.

1. Split (lite)

Add a boolean CLI option to toolkit db lite -o split -t snapshot:

--exclude-historical-balance    default: false

When the flag is set to true, account-trace and balance-trace
are excluded from the lite snapshot. When the flag is omitted (the
default), the legacy behavior is preserved exactly: trace stores are
included in the snapshot.
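
The flag semantics above could be sketched as a helper that returns either the legacy or the extended exclusion list. This is only an illustration under stated assumptions: the class ExclusionSets, the method excludedDbs, and the constant names are hypothetical, and the real change would wire the boolean to a command-line option in DbLite.java.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; names are illustrative, not taken from DbLite.
class ExclusionSets {
  // Legacy 5-database archive set, as quoted in the Background section.
  static final List<String> LEGACY_ARCHIVE_DBS = Collections.unmodifiableList(
      Arrays.asList("block", "block-index", "trans",
          "transactionRetStore", "transactionHistoryStore"));

  // The two trace stores that the new flag additionally excludes.
  static final List<String> TRACE_DBS = Collections.unmodifiableList(
      Arrays.asList("account-trace", "balance-trace"));

  // Flag omitted (false): legacy list, unchanged.
  // Flag set (true): legacy list plus the trace stores.
  static List<String> excludedDbs(boolean excludeHistoricalBalance) {
    if (!excludeHistoricalBalance) {
      return LEGACY_ARCHIVE_DBS;
    }
    List<String> extended = new ArrayList<>(LEGACY_ARCHIVE_DBS);
    extended.addAll(TRACE_DBS);
    return Collections.unmodifiableList(extended);
  }

  public static void main(String[] args) {
    System.out.println(excludedDbs(false).size()); // prints 5
    System.out.println(excludedDbs(true).size());  // prints 7
  }
}
```

Returning the legacy list untouched on the default path keeps the "preserved exactly" guarantee easy to verify in a test.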

A loud warning is printed at split time when the flag is enabled,
covering:

  • the flag has functional impact only when the source full node ran
    with historyBalanceLookup=true
  • in that case the loss is permanent: lite nodes booted from the
    resulting snapshot cannot answer historical balance lookups, and
    running merge afterwards does NOT restore the feature
  • operators who need historical balance lookup on the resulting lite
    node must NOT enable this flag

split -t history and merge ignore the flag.
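
One way to express the gating described above is a function that yields warning lines only for split on a snapshot with the flag enabled. The class SplitWarning, the method warningLines, the string values for operation and type, and the exact wording are assumptions for illustration; the final message text is up to the implementation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; names and message wording are assumptions.
class SplitWarning {
  // Returns the warning lines to print, or an empty list when the flag
  // has no effect (merge, split -t history, or flag not enabled).
  static List<String> warningLines(String operate, String type,
                                   boolean excludeHistoricalBalance) {
    boolean applies = excludeHistoricalBalance
        && "split".equals(operate)
        && "snapshot".equals(type);
    if (!applies) {
      return Collections.emptyList();
    }
    return Arrays.asList(
        "WARNING: account-trace and balance-trace will be excluded from the"
            + " lite snapshot. This only matters if the source full node ran"
            + " with historyBalanceLookup=true.",
        "WARNING: in that case the loss is permanent. Lite nodes booted from"
            + " this snapshot cannot answer historical balance lookups, and"
            + " running merge afterwards does NOT restore the feature.",
        "WARNING: do NOT enable this flag if you need historical balance"
            + " lookup on the resulting lite node.");
  }

  public static void main(String[] args) {
    // Print the warning to stderr so it is not lost in regular output.
    warningLines("split", "snapshot", true).forEach(System.err::println);
  }
}
```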

2. Merge (merge)

No changes. The merge pipeline continues to operate on the legacy
5-database archive set. Operators who use
--exclude-historical-balance=true accept that the historical balance
feature is permanently unavailable on the resulting lite node.

  • API Changes: None.
  • Configuration Changes: None.
  • Protocol Changes: None.

Testing Strategy

Test Scenarios

  1. Default path (flag omitted): existing DbLiteRocksDbTest /
    DbLiteRocksDbV2Test continue to pass without modification — zero
    regression on default-configured nodes.
  2. Opt-in path (--exclude-historical-balance=true): a new test
    asserts that the produced lite snapshot directory contains neither
    account-trace nor balance-trace.
  3. Manual verification on a real historyBalanceLookup=true node:
    compare snapshot sizes before and after the flag is enabled;
    confirm the reduction matches expectations (≈ 870 GB).
  4. Manual verification: start a lite node from the opt-in snapshot
    and confirm block sync and core state queries work; confirm that
    historical balance queries are unavailable (as documented).
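
The check in scenario 2 could look roughly like the sketch below. It assumes the snapshot directory holds one subdirectory per database; the class TraceStoreCheck and the method traceStoresPresent are hypothetical and independent of the existing DbLiteRocksDbTest classes.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the opt-in assertion; not the project's test code.
class TraceStoreCheck {
  static final List<String> TRACE_DBS =
      Arrays.asList("account-trace", "balance-trace");

  // Returns the trace stores that exist as directories under snapshotDir.
  // An opt-in snapshot is expected to yield an empty list.
  static List<String> traceStoresPresent(Path snapshotDir) {
    List<String> found = new ArrayList<>();
    for (String name : TRACE_DBS) {
      if (Files.isDirectory(snapshotDir.resolve(name))) {
        found.add(name);
      }
    }
    return found;
  }

  public static void main(String[] args) {
    // Usage: pass the path of a produced lite snapshot directory.
    Path snapshotDir = Paths.get(args.length > 0 ? args[0] : ".");
    List<String> found = traceStoresPresent(snapshotDir);
    System.out.println(found.isEmpty()
        ? "OK: no trace stores in snapshot"
        : "Unexpected trace stores: " + found);
  }
}
```

Returning the offending names rather than a bare boolean makes a failing assertion easier to diagnose.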

Performance Considerations

The change affects only file-handling logic during snapshot split when
the flag is enabled; runtime performance of nodes is unaffected.


Scope of Impact

  • Core protocol
  • API/RPC
  • Database
  • Network layer
  • Smart contracts
  • Documentation (toolkit README)
  • Other: toolkit db lite

Breaking Changes

None. The flag defaults to off; default operators see no behavior
change.

Backward Compatibility

Fully backward compatible. Existing full-node and lite-node databases
are not modified, and previously generated snapshots are not affected.
Operators who opt in must accept that, as far as the historical balance
feature is concerned, the resulting lite snapshot is not interchangeable
with one produced under the default behavior.


Implementation

Do you have ideas regarding the implementation?

Add a @CommandLine.Option for --exclude-historical-balance in
DbLite.java, plus a small helper that returns the legacy or extended
exclusion set for getSnapshotDbs. Add a runtime warning at split
entry when the flag is enabled. split -t history and merge are
left unchanged.

Are you willing to implement this feature?

  • Yes, I can implement this feature

Estimated Complexity

  • Low (minimal changes; ~100 LOC across DbLite, README, tests)

Alternatives Considered

  • Make exclusion the default (original proposal). Rejected because
    it imposes a breaking semantic change on operators who explicitly
    opted into historyBalanceLookup=true and may not realize lite
    snapshots will silently drop the feature. Opt-in via flag preserves
    default behavior and forces explicit operator acknowledgement.
  • Split + complementary merge support (move trace stores to the
    history dataset and have merge re-assemble them). Rejected because
    state-tree semantics mean a re-assembled lite node cannot correctly
    answer historical balance queries anyway, and the merge-side
    complexity (cross-version history packs, partial-presence handling,
    bak replay) introduces real risk of silent data loss for marginal
    benefit.

Additional Context

Related Issues/PRs

References
