feat(net): optimize disconnectRandom by tracking block receive time#6704
feat(net): optimize disconnectRandom by tracking block receive time#6704xxo1shine wants to merge 1 commit intotronprotocol:developfrom
Conversation
|
|
||
| @Setter | ||
| @Getter | ||
| private volatile long blockRcvTimeCmp; |
There was a problem hiding this comment.
[QUESTION]Why does blockRcvTimeCmp live in PeerConnection?
Its only use is inside ResilienceService.getRandomDisconnectionPeers() to snapshot blockRcvTime before sorting, so the comparator reads a stable value even if another thread updates blockRcvTime mid-sort. That's a valid concern, but it's an implementation detail of one method in one service — it doesn't describe any property of the peer itself.
A local snapshot map achieves the same safety without polluting the peer domain object:
private List<PeerConnection> getRandomDisconnectionPeers(List<PeerConnection> peers) {
Map<PeerConnection, Long> snapshot = new IdentityHashMap<>();
peers.forEach(p -> snapshot.put(p, p.getBlockRcvTime()));
List<PeerConnection> sorted = new ArrayList<>(peers);
sorted.sort(Comparator.comparingLong(snapshot::get));
return sorted.subList(0, sorted.size() / 2);
}Is there a reason this needs to be a field rather than a method-local variable?
There was a problem hiding this comment.
The reason blockRcvTimeCmp was introduced as a field was to solve a concrete problem: blockRcvTime is updated concurrently by network threads during the sort, which causes the Comparator to observe inconsistent values across comparisons and triggers IllegalArgumentException: Comparison method violates its general contract — failing the sort. Making it a field on the peer was the way I picked at the time to give the comparator a stable snapshot value to read.
That said, your direction is right — a stable snapshot is purely an implementation detail of getRandomDisconnectionPeers and shouldn't leak onto PeerConnection. Switching to a method-local IdentityHashMap<PeerConnection, Long> populated once before the sort and read-only during the sort gives the same stability guarantee without polluting the domain model.
There was a problem hiding this comment.
I've optimized it according to your suggestions.
…er peer - Add blockRcvTime/blockRcvTimeCmp fields to PeerConnection to track when a peer last delivered a valid block - Set blockRcvTime in BlockMsgHandler after each block is received - Fix lastInteractiveTime update in InventoryMsgHandler: only update for block inventories above current head block num, preventing attackers from forging activity via stale block hashes - Add getRandomDisconnectionPeers() to ResilienceService: narrows the disconnect candidate pool to the oldest half by blockRcvTime, so peers that recently delivered blocks are protected from random eviction Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
cdf1047 to
a912a49
Compare
| if (type.equals(InventoryType.BLOCK) && peer.getAdvInvSpread().getIfPresent(item) == null) { | ||
| peer.setLastInteractiveTime(System.currentTimeMillis()); | ||
| long headNum = tronNetDelegate.getHeadBlockId().getNum(); | ||
| if (new BlockId(id).getNum() > headNum) { |
There was a problem hiding this comment.
[QUESTION] Small wording suggestion. The PR description says this prevents "attackers from forging activity via stale block hashes" — the stale-replay part is correct and a real improvement, but I don't think arbitrary forgery is prevented:
BlockId(Sha256Hash) reinterprets the first 8 bytes of the hash as a big-endian long (BlockCapsule.java:347-352) without verifying the hash matches any real block. An adversary can craft a 32-byte hash whose high bytes encode headNum + 1 and still pass the check.
The check still has clear value (this is the only path that refreshes lastInteractiveTime on INV — P2pEventHandlerImpl.updateLastInteractiveTime doesn't include INVENTORY in its switch, so without this, tightening any historical block hash refreshes the inactivity clock). Just suggest narrowing the description to "prevent stale block hashes from refreshing lastInteractiveTime" rather than the broader "forging activity" framing — keeps expectations accurate.
What does this PR do?
Optimize disconnectRandom by tracking block receive time peer
Fixes #6504
Why are these changes required?
This PR has been tested by:
Follow up
Extra details