Skip to content

feat(net): optimize disconnectRandom by tracking block receive time#6704

Open
xxo1shine wants to merge 1 commit intotronprotocol:developfrom
xxo1shine:feature/random-disconnect-optimization
Open

feat(net): optimize disconnectRandom by tracking block receive time#6704
xxo1shine wants to merge 1 commit intotronprotocol:developfrom
xxo1shine:feature/random-disconnect-optimization

Conversation

@xxo1shine
Copy link
Copy Markdown
Collaborator

What does this PR do?

Optimize disconnectRandom by tracking block receive time peer

  • Add blockRcvTime/blockRcvTimeCmp fields to PeerConnection to track when a peer last delivered a valid block
  • Set blockRcvTime in BlockMsgHandler after each block is received
  • Fix lastInteractiveTime update in InventoryMsgHandler: only update for block inventories above current head block num, preventing attackers from forging activity via stale block hashes
  • Add getRandomDisconnectionPeers() to ResilienceService: narrows the disconnect candidate pool to the oldest half by blockRcvTime, so peers that recently delivered blocks are protected from random eviction

Fixes #6504

Why are these changes required?

This PR has been tested by:

  • Unit Tests
  • Manual Testing

Follow up

Extra details

@github-actions github-actions Bot requested a review from 317787106 April 24, 2026 03:59
@xxo1shine xxo1shine changed the title feat(net): optimize disconnectRandom by tracking block receive time per peer feat(net): optimize disconnectRandom by tracking block receive time Apr 24, 2026
@xxo1shine xxo1shine requested a review from lvs0075 April 24, 2026 04:02
@halibobo1205 halibobo1205 added the topic:net p2p net work, synchronization label Apr 24, 2026
@halibobo1205 halibobo1205 added this to the GreatVoyage-v4.8.2 milestone Apr 24, 2026

@Setter
@Getter
private volatile long blockRcvTimeCmp;
Copy link
Copy Markdown
Contributor

@warku123 warku123 Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[QUESTION]Why does blockRcvTimeCmp live in PeerConnection?

Its only use is inside ResilienceService.getRandomDisconnectionPeers() to snapshot blockRcvTime before sorting, so the comparator reads a stable value even if another thread updates blockRcvTime mid-sort. That's a valid concern, but it's an implementation detail of one method in one service — it doesn't describe any property of the peer itself.

A local snapshot map achieves the same safety without polluting the peer domain object:

private List<PeerConnection> getRandomDisconnectionPeers(List<PeerConnection> peers) {
    Map<PeerConnection, Long> snapshot = new IdentityHashMap<>();
    peers.forEach(p -> snapshot.put(p, p.getBlockRcvTime()));
    List<PeerConnection> sorted = new ArrayList<>(peers);
    sorted.sort(Comparator.comparingLong(snapshot::get));
    return sorted.subList(0, sorted.size() / 2);
}

Is there a reason this needs to be a field rather than a method-local variable?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason blockRcvTimeCmp was introduced as a field was to solve a concrete problem: blockRcvTime is updated concurrently by network threads during the sort, which causes the Comparator to observe inconsistent values across comparisons and triggers IllegalArgumentException: Comparison method violates its general contract — failing the sort. Making it a field on the peer was the way I picked at the time to give the comparator a stable snapshot value to read.

That said, your direction is right — a stable snapshot is purely an implementation detail of getRandomDisconnectionPeers and shouldn't leak onto PeerConnection. Switching to a method-local IdentityHashMap<PeerConnection, Long> populated once before the sort and read-only during the sort gives the same stability guarantee without polluting the domain model.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've optimized it according to your suggestions.

@lvs0075 lvs0075 requested a review from halibobo1205 April 27, 2026 04:58
…er peer

- Add blockRcvTime/blockRcvTimeCmp fields to PeerConnection to track when
  a peer last delivered a valid block
- Set blockRcvTime in BlockMsgHandler after each block is received
- Fix lastInteractiveTime update in InventoryMsgHandler: only update for
  block inventories above current head block num, preventing attackers from
  forging activity via stale block hashes
- Add getRandomDisconnectionPeers() to ResilienceService: narrows the
  disconnect candidate pool to the oldest half by blockRcvTime, so peers
  that recently delivered blocks are protected from random eviction

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@xxo1shine xxo1shine force-pushed the feature/random-disconnect-optimization branch from cdf1047 to a912a49 Compare April 27, 2026 06:35
if (type.equals(InventoryType.BLOCK) && peer.getAdvInvSpread().getIfPresent(item) == null) {
peer.setLastInteractiveTime(System.currentTimeMillis());
long headNum = tronNetDelegate.getHeadBlockId().getNum();
if (new BlockId(id).getNum() > headNum) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[QUESTION] Small wording suggestion. The PR description says this prevents "attackers from forging activity via stale block hashes" — the stale-replay part is correct and a real improvement, but I don't think arbitrary forgery is prevented:

BlockId(Sha256Hash) reinterprets the first 8 bytes of the hash as a big-endian long (BlockCapsule.java:347-352) without verifying the hash matches any real block. An adversary can craft a 32-byte hash whose high bytes encode headNum + 1 and still pass the check.

The check still has clear value (this is the only path that refreshes lastInteractiveTime on INV — P2pEventHandlerImpl.updateLastInteractiveTime doesn't include INVENTORY in its switch, so without this, tightening any historical block hash refreshes the inactivity clock). Just suggest narrowing the description to "prevent stale block hashes from refreshing lastInteractiveTime" rather than the broader "forging activity" framing — keeps expectations accurate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

topic:net p2p net work, synchronization

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Node fails to synchronize blocks after a peer is randomly disconnected

3 participants