NetCopy ships two interchangeable data-plane protocols. The user picks one per transfer. This document explains the trade-offs and points to a manual reproduction for benchmark numbers.
Both protocols carry the same byte payload (file contents, in chunks, with XXH3-128 chunk-level verification). They differ in framing and how the hash gets onto the wire:
- HTTP: `GET /api/blob/{manifestId}/{fileId}` with a `Range: bytes=START-END` header per chunk. Connection reuse via keep-alive. Server pre-computes the chunk's XXH3-128, sets it as the `X-Chunk-Hash` response header, then streams the body via `FileChannel.transferTo` (which on Linux decays to `sendfile(2)`). Pro: trivial to debug with `curl`, plays well with any HTTP-aware proxy. Con: HTTP parsing overhead per chunk, and HTTP/1.1 connection-per-concurrent-chunk.
- TCP: one long-lived connection per peer, multiplexed by `reqId`. Custom binary framing (see `tasks/contracts/data-formats.md`). Versioned protocol: v1 is two-pass (hash → DataHead → stream → DataEnd, identical to the HTTP path conceptually); v2 (default since v0.3.0) streams and hashes in a single pass, putting the digest in a trailing `DataEndV2` frame. Pro: fewer TCP connections (one per peer), no HTTP overhead, single read pass on the source-side disk. Con: needs its own port (`--tcp-port`), not curl-debuggable.
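
The difference between hashing up front and hashing while streaming can be illustrated with a minimal sketch. Everything here is an assumption for illustration: `CRC32` stands in for XXH3-128 (which is not in the JDK), the 8-byte trailer is a made-up stand-in for the `DataEndV2` frame, and the class and method names are hypothetical. The real frame layout lives in `tasks/contracts/data-formats.md`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Sketch only: CRC32 replaces XXH3-128 and the trailer layout is invented.
class SinglePassSketch {

    /** Streams `in` to `out` while hashing, then appends the digest as a trailer. */
    static long streamWithTrailingDigest(InputStream in, DataOutputStream out) throws IOException {
        CRC32 digest = new CRC32();
        byte[] buf = new byte[64 * 1024];
        int n;
        while ((n = in.read(buf)) != -1) {
            digest.update(buf, 0, n); // hash the bytes just read...
            out.write(buf, 0, n);     // ...and send them, so the source is read only once
        }
        long value = digest.getValue();
        out.writeLong(value);         // hypothetical trailer, standing in for DataEndV2
        out.flush();
        return value;
    }

    /** Convenience wrapper: frames one in-memory chunk as payload + trailer. */
    static byte[] frame(byte[] chunk) {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        try {
            streamWithTrailingDigest(new ByteArrayInputStream(chunk), new DataOutputStream(wire));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return wire.toByteArray();
    }

    /** Receiver side: re-hash the payload and compare against the trailing digest. */
    static boolean verify(byte[] wire) {
        int payloadLen = wire.length - Long.BYTES;
        CRC32 digest = new CRC32();
        digest.update(wire, 0, payloadLen);
        long trailer = ByteBuffer.wrap(wire, payloadLen, Long.BYTES).getLong();
        return digest.getValue() == trailer;
    }

    public static void main(String[] args) {
        byte[] chunk = new byte[1 << 20];
        new java.util.Random(42).nextBytes(chunk);
        byte[] wire = frame(chunk);
        System.out.printf("sent %d bytes, verified=%b%n", wire.length, verify(wire));
    }
}
```

A two-pass sender would instead read the chunk once to compute the digest, send it ahead of the data, and then read the chunk a second time to stream it. That second read is what the trailing digest avoids.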
How the two compare depends on the workload:

- Many small files (≤ 1 MB each). TCP wins clearly. HTTP pays a full request line + headers per chunk; with thousands of files this dominates.
- One big file (multi-GB) on a fast disk. Mostly identical. Both protocols are CPU-bound on the hash and IO-bound on the disk; framing overhead is in the noise.
- One big file on a cold-cache HDD. TCP v2 is meaningfully faster because it does one disk read per chunk on the source instead of two. v1's two-pass design was tolerable on SSDs (the second pass came from the page cache), but on an HDD the source ended up reading the file twice with cold seeks. v0.3.0 fixed that.
- Lossy network. Both rely on the kernel's TCP retransmit; the application layers don't differ. NetCopy retries failed chunks with exponential backoff identically.
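
For readers unfamiliar with the retry behaviour mentioned in the last bullet, here is a minimal sketch of chunk retries with exponential backoff. It is not NetCopy's implementation: the class name, the doubling factor, the cap, and the attempt limit are all assumptions chosen for illustration.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

// Sketch only: parameters and names are hypothetical, not NetCopy's actual values.
class ChunkRetrySketch {

    /** Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`. */
    static Duration backoff(int attempt, Duration base, Duration max) {
        long factor = 1L << Math.min(attempt, 30); // bound the shift to avoid overflow
        long millis = base.toMillis() * factor;
        return Duration.ofMillis(Math.min(millis, max.toMillis()));
    }

    /** Calls `fetchChunk` up to `maxAttempts` times, sleeping between failed attempts. */
    static <T> T withRetries(Callable<T> fetchChunk, int maxAttempts,
                             Duration base, Duration max) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return fetchChunk.call();
            } catch (Exception e) {
                last = e;
                if (attempt + 1 < maxAttempts) {
                    Thread.sleep(backoff(attempt, base, max).toMillis());
                }
            }
        }
        throw last; // every attempt failed; surface the final error
    }
}
```

Because the retry wraps the chunk fetch rather than the transport, the same loop applies whether the chunk arrives over HTTP or TCP.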
In practice the user-visible bottleneck on a LAN is almost always the slower of the two disks (source HDD seek + receiver fsync), not the protocol. We've measured ~50–60 MB/s sustained from a single HDD source with 8 parallel chunks regardless of which protocol we pick.
To reproduce the comparison manually:

- Start two daemons with identical flags except `--port`, `--tcp-port`, and roots. Pin the JVM with `-XX:ActiveProcessorCount=N` if you want to compare across CPU budgets.
- Pre-generate the workload under one daemon's `--shared-root`.
- From the UI on the other daemon, plan a transfer, then start it twice in a row: once with `protocol: "http"`, once with `"tcp"`. Record the `TransferCompleted` event's `totalDurationMs` and `avgThroughputBps`, and screenshot the Performance modal's "This transfer (chunks)" tile for per-chunk timings.
- For loss runs, add packet loss with `sudo tc qdisc add dev <iface> root netem loss 1%`, run the transfer, then remove it again with `sudo tc qdisc del dev <iface> root`.
- Repeat with the TCP server disabled (`--tcp-port 0`) on the source side to confirm the HTTP fallback works.
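
Once both runs are recorded, the two `totalDurationMs` values can be compared with a few lines of arithmetic. The helper below is a hypothetical sketch, not part of NetCopy: it assumes you know the workload's total size in bytes and copy the durations by hand from the `TransferCompleted` events.

```java
// Sketch only: a hand-fed calculator, not a NetCopy tool.
class RunComparisonSketch {

    /** Throughput in MB/s (10^6 bytes per second) for one run. */
    static double mbPerSec(long totalBytes, long totalDurationMs) {
        return (totalBytes / 1e6) / (totalDurationMs / 1000.0);
    }

    /** How many times faster the candidate run was than the baseline run. */
    static double speedup(long baselineMs, long candidateMs) {
        return (double) baselineMs / candidateMs;
    }

    public static void main(String[] args) {
        if (args.length != 3) {
            System.err.println("usage: RunComparisonSketch <totalBytes> <httpDurationMs> <tcpDurationMs>");
            return;
        }
        long bytes = Long.parseLong(args[0]);
        long httpMs = Long.parseLong(args[1]);
        long tcpMs = Long.parseLong(args[2]);
        System.out.printf("http: %.1f MB/s%n", mbPerSec(bytes, httpMs));
        System.out.printf("tcp:  %.1f MB/s%n", mbPerSec(bytes, tcpMs));
        System.out.printf("tcp speedup over http: %.2fx%n", speedup(httpMs, tcpMs));
    }
}
```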
We deliberately don't ship a canned benchmark table here: numbers from a single hardware setup mislead readers comparing to their own. The Performance modal already exposes the per-chunk timings (source latency, wire time, persist time, pool acquire wait) you need to identify your own bottleneck.
| ID | Description |
|---|---|
| W1 | One 32 GB file (large-chunk path; tests sustained throughput) |
| W2 | 1000 small files of ~64 KB each (request count dominates) |
| W3 | Mixed: 4 GB ISO + 50 MB of small docs (typical real-world mix) |
| W4 | W1 again with `--file-parallelism=1 --chunks-per-file=1` (single-stream baseline) |
W2 is the workload where TCP shows its largest advantage; W4 is where the two protocols converge.
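
A workload like W2 can be pre-generated with a short program. This is a sketch under assumptions: the class name, file naming scheme, and the choice of pseudo-random content are all hypothetical, and the target directory is whatever you pass as the daemon's `--shared-root`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Random;

// Sketch only: generates W2-style small files; names and layout are made up.
class WorkloadGenSketch {

    /** Writes `count` files of `size` pseudo-random bytes each into `dir`. */
    static void smallFiles(Path dir, int count, int size, long seed) {
        try {
            Files.createDirectories(dir);
            Random rnd = new Random(seed); // fixed seed keeps runs reproducible
            byte[] buf = new byte[size];
            for (int i = 0; i < count; i++) {
                rnd.nextBytes(buf);
                Files.write(dir.resolve(String.format("w2-%04d.bin", i)), buf);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: WorkloadGenSketch <shared-root>");
            return;
        }
        // W2: 1000 files of 64 KB each, placed in a subdirectory of the shared root
        smallFiles(Paths.get(args[0], "w2"), 1000, 64 * 1024, 1L);
    }
}
```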