fix(v0.4.0, resource limits): registry cleanup, body cap, SO_TIMEOUT, conn cap, posix perms #61
Merged
…y cap, SO_TIMEOUT, conn cap, WS sub cap, posix perms, shared HttpClient

Closes audit findings HIGH H2 and a cluster of MEDIUM resource-bound DoS issues. No behaviour change for honest clients; only caps the worst case when a peer goes hostile or a long-running daemon accumulates state.

ManifestRegistry (H2)
- App.java now calls manifestRegistry.startBackgroundCleanup(15min). Pre-v0.4.0 the eviction task was wired but never started, so the 24h TTL documented in the registry's javadoc was a lie: entries leaked forever.
- New MAX_ENTRIES = 10_000 cap in ManifestRegistry.store: evicts the oldest entry to make room when full. Bounds memory under burst loads without rejecting brand-new manifests (the more useful behaviour).

HTTP body cap
- App.java sets cfg.http.maxRequestSize = 64 MiB. Javalin's 6.x default of ~1 MiB was too low for big-tree manifests; 64 MiB is comfortably above honest use (~300k entries) but well below trivial-DoS territory.

WebSocket subscription cap
- ProgressWebSocket: MAX_SUBS_PER_SESSION = 256. Pre-v0.4.0 a single session could subscribe to 2^31 transferIds (no validation, no cap), each one allocating a ProgressBus subscription. UI in practice holds a wildcard or a handful of subs.

JsonJobStore POSIX perms
- jobs/ directory: rwx------. Per-job *.json files: rw-------. Job-state files contain the full manifest with absolute paths and target subdirs, which is sensitive on multi-user hosts. No-op on Windows (FS uses ACLs).

RelayRoutes shared HttpClient
- AtomicReference-cached singleton instead of HttpClient.newBuilder() per request. JDK 21+ HttpClient keeps two background selector threads alive per client until shutdown(); pre-v0.4.0 every relay/push leaked threads until GC reclaimed them.

TCP HELLO timeout
- TcpConnectionHandler.handle: launches a virtual-thread watchdog that force-closes the channel if HELLO doesn't arrive in 30s. SocketChannel in blocking mode doesn't honour SO_TIMEOUT (Socket-level only), so we implement the deadline as a watchdog. Slowloris-style "open TCP, send one byte, sit there" no longer holds an FD indefinitely.

TCP connection cap
- BlobTcpServer.MAX_CONCURRENT_CONNECTIONS = 1024. Drops new accepts over the limit at the application layer. With chunksPerFile=8 a single honest peer keeps ~8 sockets per active file, so 1024 supports ~128 in-flight files, well above any realistic workload.

Local mvn test: 361 tests / 0 failures / 56 errors all in the known Jetty-loopback Windows-env category. Linux CI exercises them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
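For readers who want to picture the evict-oldest cap and TTL sweep that the commit message describes for ManifestRegistry, here is a minimal sketch. It is not the project's code: the class name BoundedRegistrySketch, its methods, and the choice of an insertion-ordered LinkedHashMap are all assumptions made for illustration, and the real registry may be structured quite differently.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Iterator;
import java.util.LinkedHashMap;

// Hypothetical sketch of a size-capped, TTL-bounded registry.
// Names and structure are illustrative assumptions, not the project's ManifestRegistry.
final class BoundedRegistrySketch<K, V> {

    private static final class Entry<V> {
        final V value;
        final Instant storedAt;

        Entry(V value, Instant storedAt) {
            this.value = value;
            this.storedAt = storedAt;
        }
    }

    private final int maxEntries;
    private final Duration ttl;
    // Insertion-ordered map: the first key is always the oldest stored entry.
    private final LinkedHashMap<K, Entry<V>> entries = new LinkedHashMap<>();

    BoundedRegistrySketch(int maxEntries, Duration ttl) {
        if (maxEntries < 1) {
            throw new IllegalArgumentException("maxEntries must be >= 1");
        }
        this.maxEntries = maxEntries;
        this.ttl = ttl;
    }

    // Stores an entry; when the cap is reached, the oldest entry is evicted
    // instead of rejecting the new one.
    synchronized void store(K key, V value, Instant now) {
        entries.remove(key); // re-storing a key moves it to the newest position
        if (entries.size() >= maxEntries) {
            Iterator<K> oldest = entries.keySet().iterator();
            oldest.next();
            oldest.remove();
        }
        entries.put(key, new Entry<>(value, now));
    }

    synchronized V get(K key) {
        Entry<V> e = entries.get(key);
        return e == null ? null : e.value;
    }

    synchronized int size() {
        return entries.size();
    }

    // Body of a periodic cleanup pass: drops entries strictly older than the TTL
    // and returns how many were removed.
    synchronized int evictExpired(Instant now) {
        int before = entries.size();
        entries.values().removeIf(e -> Duration.between(e.storedAt, now).compareTo(ttl) > 0);
        return before - entries.size();
    }
}
```

In a long-running daemon, something like a ScheduledExecutorService would call evictExpired(Instant.now()) at a fixed interval. The size cap in store() is a separate safeguard: it bounds memory even if that cleanup task is never started.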
Summary
Closes audit findings HIGH H2 + cluster of medium resource-bound DoS issues. Third of ~5 PRs gating v0.4.0. No behaviour change for honest clients; only caps the worst case under hostile or long-lived workloads.
Changes
ManifestRegistry actually evicts now (H2)
Pre-v0.4.0 the eviction task was wired but App.boot never called startBackgroundCleanup, so manifests leaked forever despite the documented 24h TTL.
App.java: manifestRegistry.startBackgroundCleanup(Duration.ofMinutes(15)).
ManifestRegistry.MAX_ENTRIES = 10_000: when full, evict oldest in store(). Bounds memory under bursts without rejecting fresh manifests.
HTTP body size cap (medium)
App.java: cfg.http.maxRequestSize = 64 MiB. Javalin 6.x default is ~1 MiB (too low for big-tree manifests). 64 MiB ≈ 300k entries: well above honest use, well below trivial-OOM territory.
WebSocket subscription cap per session (medium)
ProgressWebSocket.MAX_SUBS_PER_SESSION = 256. Pre-v0.4.0 a single auth'd session could subscribe to 2³¹ transferIds, each one allocating a ProgressBus subscription.
POSIX perms on state files (medium)
JsonJobStore: <state-dir>/jobs/ → rwx------, per-job *.json → rw-------. Job-state contains the full manifest with absolute paths and target subdirs, which is sensitive on multi-user hosts. Auto-no-op on Windows.
Shared HttpClient in RelayRoutes (medium)
AtomicReference-cached singleton instead of HttpClient.newBuilder() per request. JDK 21+ HttpClient keeps two background selector threads alive until shutdown(); pre-v0.4.0 every relay/push leaked threads until GC reclaimed them.
TCP HELLO timeout (medium)
TcpConnectionHandler.handle: a virtual-thread watchdog force-closes the channel if HELLO doesn't arrive in 30s. A SocketChannel in blocking mode doesn't honour SO_TIMEOUT (Socket-level only), so the deadline is enforced by a watchdog. Closes the Slowloris-style "open TCP, send 1 byte, hold the FD" path.
TCP connection cap (medium)
BlobTcpServer.MAX_CONCURRENT_CONNECTIONS = 1024. Drops new accepts over the limit at the app layer. With chunksPerFile=8, one honest peer holds ~8 sockets per active file, so 1024 connections support ~128 in-flight files concurrently (1024 / 8).
Test plan
mvn test → 361 tests, 0 failures, 56 errors all in the known Jetty-loopback Windows-env category. No new failures.
POST /api/manifest; confirm registry size stays ≤ 10k and reaches steady state.
<state-dir>/jobs/*.json on Linux → file mode -rw-------, dir drwx------.
🤖 Generated with Claude Code
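The file-mode check in the test plan can also be reproduced in isolation. The sketch below shows one way to write a job-state file with owner-only permissions using java.nio.file, skipping the chmod step on file systems without a POSIX attribute view (such as the default one on Windows). The class OwnerOnlyFilesSketch and its method names are hypothetical and are not taken from JsonJobStore.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Hypothetical helper, not the project's JsonJobStore: writes <stateDir>/jobs/<jobId>.json
// so that only the owning user can read or list it.
final class OwnerOnlyFilesSketch {

    static final Set<PosixFilePermission> DIR_PERMS = PosixFilePermissions.fromString("rwx------");
    static final Set<PosixFilePermission> FILE_PERMS = PosixFilePermissions.fromString("rw-------");

    // True when the path's file system exposes POSIX permissions (Linux, macOS).
    static boolean posixSupported(Path path) {
        return path.getFileSystem().supportedFileAttributeViews().contains("posix");
    }

    static Path writeOwnerOnly(Path stateDir, String jobId, String json) throws IOException {
        Path jobsDir = stateDir.resolve("jobs");
        Files.createDirectories(jobsDir);
        boolean posix = posixSupported(jobsDir);
        if (posix) {
            // Lock the directory down first so the file is never reachable by other users.
            Files.setPosixFilePermissions(jobsDir, DIR_PERMS);
        }
        Path file = jobsDir.resolve(jobId + ".json");
        Files.write(file, json.getBytes(StandardCharsets.UTF_8));
        if (posix) {
            Files.setPosixFilePermissions(file, FILE_PERMS);
        }
        return file;
    }
}
```

Restricting the directory before writing the file means there is no window in which another local user could open a freshly created file that still carries default permissions.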