Skip to content

Fold resume into send — auto-detect prior incomplete transfers#4

Open
cdcseacave wants to merge 2 commits into
developfrom
resume
Open

Fold resume into send — auto-detect prior incomplete transfers#4
cdcseacave wants to merge 2 commits into
developfrom
resume

Conversation

@cdcseacave
Copy link
Copy Markdown
Owner

Summary

Re-running the identical send of an interrupted transfer now auto-detects the prior on-disk state and continues it. The explicit resume subcommand is removed. Detection is keyed by (peer fingerprint, file list) rather than a random UUID, so the user no longer has to remember a transfer id or pass a matching --state-dir.

A user runs p2p-transfer send /some/dir --rendezvous … --code …, gets disconnected at 60%, and runs the exact same command again → resumes from 60%. No --resume, no UUID, no --state-dir.

What changed

p2p-core

  • FolderTransferState records the negotiated peer_fingerprint (#[serde(default)]).
  • session::send_path captures the peer fingerprint, wires a per-file state-save callback (so an abrupt kill — not just a recoverable network error — leaves resumable state), and stamps the fingerprint on every persisted checkpoint.
  • Extracted public enumerate_files(path) so the live send and resume-detection build byte-identical file lists.

p2p-cli

  • util.rs: default_state_dir() (per-user data dir via the directories crate) and find_resumable_state() — strict, order-independent match on every file's (path, size, mtime); newest-wins + warning on conflict; unparseable/old state ignored (fresh start).
  • send.rs: handle_send locates resumable state after pairing, logs Resuming transfer …, and plumbs --no-resume.
  • Dropped the Resume command/variant/dispatch and deleted resume.rs.
  • Added directories = "5".

Resume state moved from the CWD to a per-user data dir by default (--state-dir still overrides).

Design notes

  • Because the session must be established first (to learn the peer fingerprint) before locating state, resume uses the current invocation's negotiated chunk size. This is correct for the intended "re-run the exact same command" path; re-running with a different --chunk-size starts fresh on mismatch.
  • Strict (path, size, mtime) match: any drift (resized/touched file, file added/removed) → fresh transfer. --no-resume is the manual escape hatch for content-edited sources whose size+mtime were restored.

Tests

  • Rewrote tests/rendezvous_disconnect_resume_test.rs to drive phase 2 through handle_send (asserts the resume is consumed, the remaining file is re-delivered, and Resuming transfer is logged). It reconstructs a deterministic mid-transfer checkpoint rather than racing a live interruption — in rendezvous mode the QUIC initiator/responder (and thus which side's ConfigMessage/bandwidth limit wins) is decided by a UUID race, so a live transfer's duration isn't stable enough to interrupt deterministically.
  • Added find_resumable_state unit tests for every match/reject branch.
  • Updated README / DESIGN / CHANGELOG / both AGENTS.md and the stress.sh T10 smoke step.

Verification

  • cargo fmt --all -- --check — clean
  • cargo clippy --workspace --all-targets -- -D warnings — clean
  • cargo test --workspace — green (integration test stable across repeated isolated + full-workspace runs)
  • cargo build --features full (GUI+CLI) — clean

🤖 Generated with Claude Code

Re-running the identical `send` of an interrupted transfer now auto-detects
the prior on-disk state and continues it; the explicit `resume` subcommand is
removed. Detection is keyed by (peer fingerprint, file list) rather than a
random UUID:

- FolderTransferState records the negotiated peer_fingerprint; send_path
  stamps it and persists a checkpoint on every file completion (so an abrupt
  kill, not just a recoverable network error, leaves resumable state).
- New p2p-cli util helpers: default_state_dir() (per-user data dir via the
  directories crate) and find_resumable_state() (strict, order-independent
  match on every file's (path, size, mtime); newest-wins on conflict).
- handle_send locates resumable state after pairing and logs "Resuming
  transfer …"; --no-resume forces a fresh transfer.
- enumerate_files extracted in p2p-core so the live send and resume-detection
  build identical file lists.

Removed: the resume subcommand, p2p-cli/src/resume.rs, and the Resume CLI
variant/dispatch. Resume state moved from CWD to a per-user data dir by
default (--state-dir still overrides).

Tests: rewrote the rendezvous disconnect/resume integration test to drive
phase 2 through handle_send; added find_resumable_state unit tests for every
match/reject branch. Docs (README/DESIGN/CHANGELOG/AGENTS) and the stress.sh
T10 smoke step updated for the new flow.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 30, 2026 19:52
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR removes the explicit resume CLI command and folds resume detection into send, using peer fingerprint plus source file metadata to locate prior transfer state automatically.

Changes:

  • Adds peer fingerprint stamping and shared file enumeration for persisted transfer state.
  • Updates CLI send to auto-detect resumable state, adds --no-resume, and removes resume.
  • Updates tests, docs, smoke script, and changelog for the new resume flow.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/rendezvous_disconnect_resume_test.rs Reworks rendezvous resume integration test to exercise repeated handle_send.
smoke/src/stress.sh Updates smoke resume scenario to re-run send with a pinned state directory.
README.md Documents automatic send resume and removes explicit resume command docs.
p2p-core/src/transfer_folder.rs Exposes shared file enumeration and records peer fingerprint in folder state.
p2p-core/src/session.rs Stamps peer fingerprint and adds per-file checkpoint persistence.
p2p-cli/src/util.rs Adds default state directory and resumable state discovery.
p2p-cli/src/send.rs Adds auto-resume resolution and --no-resume plumbing.
p2p-cli/src/resume.rs Removes the old resume command implementation.
p2p-cli/src/lib.rs Removes resume module/dispatch and updates send dispatch.
p2p-cli/src/cli.rs Adds --no-resume and removes Resume command.
p2p-cli/Cargo.toml Adds directories dependency.
p2p-cli/AGENTS.md Updates CLI structure notes for automatic resume.
DESIGN.md Updates resume design documentation.
CHANGELOG.md Adds unreleased entry for folded resume.
Cargo.toml Updates test comments and adds tracing dependencies for integration logging.
AGENTS.md Updates usage/contributor notes for automatic resume.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread p2p-cli/src/send.rs Outdated
// fresh transfer (e.g. when the source content changed in a way the
// size+mtime check can't see).
let (transfer_id, state_file, resume_from_bytes) =
match resolve_resume(&state_dir, &path, peer_fp, no_resume).await? {
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 212e39f. Resume now requires the saved config.chunk_size to equal the current invocation's negotiated chunk size (session.config().chunk_size); a candidate that matches peer+files but differs on chunk size is rejected (fresh transfer) with a warning. Re-negotiating the saved config isn't viable here — the handshake that exchanges chunk_size completes before we can identify the peer and locate the state — so treating a chunk-size change as drift is the safe path. See find_resumable_state in p2p-cli/src/util.rs and the new rejects_when_chunk_size_differs test.

Comment thread p2p-cli/src/util.rs Outdated
Comment on lines +135 to +140
if matches > 1 {
warn!(
"{} resumable state files match this source and peer; resuming the newest. \
Stale ones are removed when their transfer next completes.",
matches
);
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 212e39f. find_resumable_state now collects all matching state files, resumes the newest, and deletes the older duplicates on the spot (best-effort tokio::fs::remove_file), so at most one matching checkpoint ever remains and a later identical send can't resurrect a stale one. Doc + warning text updated accordingly, and a keeps_only_newest_match_and_deletes_stale test covers it.

Comment thread p2p-core/src/session.rs Outdated
Comment on lines +332 to +336
folder_session.set_state_callback(Arc::new(move |st: &FolderTransferState| {
let mut snapshot = st.clone();
snapshot.peer_fingerprint = peer_fp;
if let Ok(json) = serde_json::to_string_pretty(&snapshot) {
let _ = std::fs::write(&path_buf, json);
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 212e39f. The checkpoint callback is now throttled to at most once every 2s (gate checked before any clone/serialize, so skipped ticks are nearly free — kills the O(files²) serialization), serializes compactly with serde_json::to_vec, and performs the write via tokio::task::spawn_blocking with an atomic temp+rename so it no longer blocks a Tokio worker and readers never see a torn file. The error/reconnect paths still save the full final state, so durability is unchanged except for an abrupt SIGKILL between files (bounded to ~2s of completed-file progress).

- Guard resume on chunk_size: a re-run with a different --chunk-size has
  an incompatible .partial layout, so resume now requires the saved
  config.chunk_size to match the current negotiated size and otherwise
  starts a fresh transfer with a warning (no skipped/overwritten ranges).
- Clean up stale duplicate state files: when several checkpoints match
  the same source+peer, resume the newest and delete the rest so a later
  identical send can't pick up a stale checkpoint.
- Make checkpointing cheap: throttle to <=1 write/2s, serialize compact,
  and write via spawn_blocking with atomic temp+rename instead of a full
  pretty-JSON serialize + blocking std::fs::write per completed file
  (was O(files^2) serialization and blocked a Tokio worker).
- Correct the now-accurate FolderTransferState.config doc comment.
- Add unit tests for chunk-size rejection and stale-duplicate cleanup.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants