fix(v0.4.0, resource limits): registry cleanup, body cap, SO_TIMEOUT, conn cap, posix perms #61

Merged
VirusAlex merged 1 commit into main from fix/v040-resource-limits on Apr 30, 2026

@VirusAlex
Owner

Summary

Closes audit finding HIGH H2 plus a cluster of MEDIUM resource-bound DoS issues. Third of ~5 PRs gating v0.4.0. No behaviour change for honest clients; this only caps the worst case under hostile or long-lived workloads.

Changes

ManifestRegistry actually evicts now (H2)

Pre-v0.4.0 the eviction task was wired but App.boot never called startBackgroundCleanup, so manifests leaked forever despite the documented 24h TTL.

  • App.java: manifestRegistry.startBackgroundCleanup(Duration.ofMinutes(15)).
  • ManifestRegistry.MAX_ENTRIES = 10_000: when full, evict oldest in store(). Bounds memory under bursts without rejecting fresh manifests.
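
The two eviction paths described above (TTL sweep plus evict-oldest when full) can be sketched roughly as follows. This is a minimal illustration, not the project's code: `BoundedRegistry`, its method names, and passing `now` explicitly are assumptions made so the sketch is self-contained and testable.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Iterator;
import java.util.LinkedHashMap;

/** Hypothetical stand-in for ManifestRegistry; all names here are assumptions. */
final class BoundedRegistry<V> {
    private static final class Entry<T> {
        final T value;
        final Instant storedAt;
        Entry(T value, Instant storedAt) { this.value = value; this.storedAt = storedAt; }
    }

    private final int maxEntries;
    private final Duration ttl;
    // Insertion-ordered map: the first key is always the oldest entry.
    private final LinkedHashMap<String, Entry<V>> entries = new LinkedHashMap<>();

    BoundedRegistry(int maxEntries, Duration ttl) {
        if (maxEntries < 1) throw new IllegalArgumentException("maxEntries must be >= 1");
        this.maxEntries = maxEntries;
        this.ttl = ttl;
    }

    /** Stores an entry; when the registry is full, the oldest entry is evicted first. */
    synchronized void store(String id, V value, Instant now) {
        entries.remove(id); // re-storing an id moves it to the newest position
        if (entries.size() >= maxEntries) {
            Iterator<String> oldest = entries.keySet().iterator();
            oldest.next();
            oldest.remove();
        }
        entries.put(id, new Entry<>(value, now));
    }

    synchronized V get(String id) {
        Entry<V> e = entries.get(id);
        return e == null ? null : e.value;
    }

    /** TTL sweep: drops every entry whose age is at least the TTL; returns the count removed. */
    synchronized int evictExpired(Instant now) {
        int before = entries.size();
        entries.values().removeIf(e -> Duration.between(e.storedAt, now).compareTo(ttl) >= 0);
        return before - entries.size();
    }

    synchronized int size() { return entries.size(); }
}
```

In a daemon, the sweep would typically be driven by a `ScheduledExecutorService.scheduleAtFixedRate` call with a 15-minute period, which is one plausible shape for what `startBackgroundCleanup(Duration.ofMinutes(15))` does.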

HTTP body size cap (medium)

  • App.java: cfg.http.maxRequestSize = 64 MiB. Javalin 6.x default is ~1 MiB (too low for big-tree manifests). 64 MiB ≈ 300k entries — well above honest use, well below trivial-OOM territory.
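
As a sanity check on the sizing claim, the arithmetic below derives the per-entry budget implied by "64 MiB ≈ 300k entries". The roughly 223 bytes per entry is computed from the PR's own numbers, not measured; `BodyCapBudget` is a hypothetical helper.

```java
/** Hypothetical helper; only the 64 MiB cap and the ~300k figure come from the PR. */
final class BodyCapBudget {
    static final long MAX_REQUEST_SIZE_BYTES = 64L * 1024 * 1024; // 64 MiB = 67,108,864 bytes

    /** Average serialized size each manifest entry may have if `entries` must fit under the cap. */
    static long bytesPerEntry(long entries) {
        return MAX_REQUEST_SIZE_BYTES / entries;
    }

    /** How many entries of a given average serialized size fit under the cap. */
    static long entriesThatFit(long avgEntryBytes) {
        return MAX_REQUEST_SIZE_BYTES / avgEntryBytes;
    }
}
```

For example, `BodyCapBudget.bytesPerEntry(300_000)` yields 223, so the ~300k estimate assumes manifest entries average a little over 200 bytes of JSON each.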

WebSocket subscription cap per session (medium)

  • ProgressWebSocket.MAX_SUBS_PER_SESSION = 256. Pre-v0.4.0 a single auth'd session could subscribe to 2³¹ transferIds, each one allocating a ProgressBus subscription.
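
A per-session cap of this kind can be sketched as a bounded set of transfer ids. The class below is illustrative only: `SessionSubscriptions`, the `int` id type, and the choice to treat a repeated subscribe as a no-op are assumptions; only the limit of 256 comes from the PR.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical per-session bookkeeping; one instance per WebSocket session. */
final class SessionSubscriptions {
    static final int MAX_SUBS_PER_SESSION = 256;

    private final Set<Integer> transferIds = new HashSet<>();

    /** Returns false when the session is at its cap, so the caller can reject the request. */
    synchronized boolean subscribe(int transferId) {
        if (transferIds.contains(transferId)) {
            return true; // already subscribed: no new allocation, so no cap check needed
        }
        if (transferIds.size() >= MAX_SUBS_PER_SESSION) {
            return false;
        }
        transferIds.add(transferId);
        return true;
    }

    synchronized void unsubscribe(int transferId) {
        transferIds.remove(transferId);
    }

    synchronized int count() {
        return transferIds.size();
    }
}
```

Checking the cap before allocating the underlying bus subscription is the point of the design: a rejected request should cost the server nothing beyond the set lookup.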

POSIX perms on state files (medium)

  • JsonJobStore: <state-dir>/jobs/ is rwx------, per-job *.json files are rw-------. Job state contains the full manifest with absolute paths and target subdirs, which is sensitive on multi-user hosts. Auto-no-op on Windows.
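
One way to apply these modes with `java.nio.file`, skipping the call on file systems without a POSIX attribute view, is sketched below. `PrivateStateFiles` and its method names are hypothetical; the actual JsonJobStore code may be structured differently.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

/** Hypothetical helper showing owner-only modes for a state directory and its files. */
final class PrivateStateFiles {
    private static final Set<PosixFilePermission> DIR_PERMS =
            PosixFilePermissions.fromString("rwx------");
    private static final Set<PosixFilePermission> FILE_PERMS =
            PosixFilePermissions.fromString("rw-------");

    /** False on Windows' default file system, which makes the permission calls a no-op. */
    static boolean posixSupported() {
        return FileSystems.getDefault().supportedFileAttributeViews().contains("posix");
    }

    static Path ensurePrivateDir(Path dir) throws IOException {
        Files.createDirectories(dir);
        if (posixSupported()) {
            Files.setPosixFilePermissions(dir, DIR_PERMS);
        }
        return dir;
    }

    static Path writePrivateFile(Path file, byte[] content) throws IOException {
        if (posixSupported() && Files.notExists(file)) {
            // Create with the restrictive mode first so the content is never briefly readable.
            Files.createFile(file, PosixFilePermissions.asFileAttribute(FILE_PERMS));
        }
        Files.write(file, content);
        if (posixSupported()) {
            // Also tightens files that already existed with a looser mode.
            Files.setPosixFilePermissions(file, FILE_PERMS);
        }
        return file;
    }
}
```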

Shared HttpClient in RelayRoutes (medium)

  • AtomicReference-cached singleton instead of HttpClient.newBuilder() per request. JDK 21+ HttpClient keeps two background selector threads alive until shutdown(); pre-v0.4.0 every relay/push leaked threads until GC reclaimed them.
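
The AtomicReference-cached singleton pattern might look like the sketch below. `SharedHttpClient` and the 10-second connect timeout are assumptions for illustration; the PR only states that one client is cached instead of building one per request.

```java
import java.net.http.HttpClient;
import java.time.Duration;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical holder for a lazily created, process-wide HttpClient. */
final class SharedHttpClient {
    private static final AtomicReference<HttpClient> CLIENT = new AtomicReference<>();

    static HttpClient get() {
        HttpClient existing = CLIENT.get();
        if (existing != null) {
            return existing;
        }
        HttpClient created = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(10)) // assumed value, not from the PR
                .build();
        // Only the first writer wins; a thread that loses the race returns the
        // winner's client and lets its own unpublished instance be collected.
        return CLIENT.compareAndSet(null, created) ? created : CLIENT.get();
    }
}
```

At most a handful of extra clients can be created during a startup race, which is bounded, in contrast to one client per relay request.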

TCP HELLO timeout (medium)

  • TcpConnectionHandler.handle: virtual-thread watchdog force-closes the channel if HELLO doesn't arrive in 30s. SocketChannel blocking mode doesn't honour SO_TIMEOUT (Socket-level only), so the deadline is a watchdog. Closes the Slowloris-style "open TCP, send 1 byte, hold the FD" path.
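
The watchdog idea can be sketched independently of the socket code: arm a timer when the connection is accepted, disarm it when HELLO is parsed, and close the channel if the timer fires first. The sketch below deliberately swaps the virtual thread for a plain daemon thread so it compiles on JDKs older than 21; on JDK 21+ `Thread.startVirtualThread` would be the natural substitute. `HelloWatchdog` and its methods are hypothetical names.

```java
import java.io.Closeable;
import java.io.IOException;
import java.time.Duration;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical handshake watchdog; closes the channel unless HELLO arrives in time. */
final class HelloWatchdog {
    // Whichever side flips this first (watchdog or handler) decides the outcome.
    private final AtomicBoolean settled = new AtomicBoolean(false);
    private final Thread thread;

    private HelloWatchdog(Closeable channel, Duration deadline) {
        thread = new Thread(() -> {
            try {
                Thread.sleep(deadline.toMillis());
            } catch (InterruptedException e) {
                return; // HELLO arrived and the handler disarmed us
            }
            if (settled.compareAndSet(false, true)) {
                try {
                    channel.close(); // a reader blocked on this channel is woken with an exception
                } catch (IOException ignored) {
                    // nothing useful to do if close itself fails
                }
            }
        }, "hello-watchdog");
        thread.setDaemon(true);
    }

    /** Starts the countdown; call right after accepting the connection. */
    static HelloWatchdog arm(Closeable channel, Duration deadline) {
        HelloWatchdog watchdog = new HelloWatchdog(channel, deadline);
        watchdog.thread.start();
        return watchdog;
    }

    /** Disarms the watchdog; returns false if the deadline already fired. */
    boolean helloReceived() {
        boolean inTime = settled.compareAndSet(false, true);
        if (inTime) {
            thread.interrupt();
        }
        return inTime;
    }
}
```

With a real `SocketChannel`, closing it from the watchdog causes a thread blocked in `read` to fail with `AsynchronousCloseException`, which is what lets the handler unwind and release the file descriptor.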

TCP connection cap (medium)

  • BlobTcpServer.MAX_CONCURRENT_CONNECTIONS = 1024. Drops new accepts over the limit at the app layer. With chunksPerFile=8, one honest peer holds ~8 sockets per active file → 1024 supports ~128 in-flight files concurrently.
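
An application-layer cap like this is commonly built on a counting semaphore: take a permit on accept, return it when the connection closes, and drop the socket if no permit is available. The sketch below assumes that approach; `ConnectionLimiter` is a hypothetical name and the PR does not say which primitive BlobTcpServer uses.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical admission gate for accepted TCP connections. */
final class ConnectionLimiter {
    static final int MAX_CONCURRENT_CONNECTIONS = 1024;

    private final int max;
    private final Semaphore permits;

    ConnectionLimiter(int max) {
        this.max = max;
        this.permits = new Semaphore(max);
    }

    /** Non-blocking: false means the caller should close the just-accepted socket. */
    boolean tryAdmit() {
        return permits.tryAcquire();
    }

    /** Call exactly once per admitted connection, typically in a finally block. */
    void release() {
        permits.release();
    }

    int active() {
        return max - permits.availablePermits();
    }

    /** In-flight files supported when each file holds `chunksPerFile` sockets. */
    static int filesSupported(int chunksPerFile) {
        return MAX_CONCURRENT_CONNECTIONS / chunksPerFile;
    }
}
```

With `chunksPerFile = 8`, `ConnectionLimiter.filesSupported(8)` gives 128, matching the estimate above.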

Test plan

  • Local mvn test → 361 tests, 0 failures, 56 errors all in the known Jetty-loopback Windows-env category. No new failures.
  • CI green on Linux.
  • Manual: leave a daemon running 24h with periodic POST /api/manifest; confirm registry size stays ≤ 10k and reaches steady state.
  • Manual: open a TCP connection without sending HELLO; expect server to close after ~30s.
  • Manual: ls -l <state-dir>/jobs/*.json on Linux → file mode -rw-------, dir drwx------.

🤖 Generated with Claude Code

…y cap, SO_TIMEOUT, conn cap, WS sub cap, posix perms, shared HttpClient

Closes audit findings HIGH H2 and a cluster of MEDIUM resource-bound DoS
issues. No behaviour change for honest clients; only caps the worst case
when a peer goes hostile or a long-running daemon accumulates state.

ManifestRegistry (H2)
- App.java now calls manifestRegistry.startBackgroundCleanup(15min). Pre-
  v0.4.0 the eviction task was wired but never started, so the 24h TTL
  documented in the registry's javadoc was a lie — entries leaked forever.
- New MAX_ENTRIES = 10_000 cap in ManifestRegistry.store: evicts the
  oldest entry to make room when full. Bounds memory under burst loads
  without rejecting brand-new manifests (the more useful behaviour).

HTTP body cap
- App.java sets cfg.http.maxRequestSize = 64 MiB. Javalin's 6.x default
  of ~1 MiB was too low for big-tree manifests; 64 MiB is comfortably
  above honest use (~300k entries) but well below trivial-DoS territory.

WebSocket subscription cap
- ProgressWebSocket: MAX_SUBS_PER_SESSION = 256. Pre-v0.4.0 a single
  session could subscribe to 2^31 transferIds (no validation, no cap),
  each one allocating a ProgressBus subscription. UI in practice holds
  a wildcard or a handful of subs.

JsonJobStore POSIX perms
- jobs/ directory: rwx------. Per-job *.json files: rw-------. Job-state
  files contain the full manifest with absolute paths and target subdirs
  — sensitive on multi-user hosts. No-op on Windows (FS uses ACLs).

RelayRoutes shared HttpClient
- AtomicReference-cached singleton instead of HttpClient.newBuilder()
  per request. JDK 21+ HttpClient keeps two background selector threads
  alive per client until shutdown(); pre-v0.4.0 every relay/push leaked
  threads until GC reclaimed them.

TCP HELLO timeout
- TcpConnectionHandler.handle: launches a virtual-thread watchdog that
  force-closes the channel if HELLO doesn't arrive in 30s. SocketChannel
  in blocking mode doesn't honour SO_TIMEOUT (Socket-level only), so we
  implement the deadline as a watchdog. Slowloris-style "open TCP, send
  one byte, sit there" no longer holds an FD indefinitely.

TCP connection cap
- BlobTcpServer.MAX_CONCURRENT_CONNECTIONS = 1024. Drops new accepts
  over the limit at the application layer. With chunksPerFile=8 a single
  honest peer keeps ~8 sockets per active file, so 1024 supports ~128
  in-flight files — well above any realistic workload.

Local mvn test: 361 tests / 0 failures / 56 errors all in the known
Jetty-loopback Windows-env category. Linux CI exercises them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@VirusAlex VirusAlex merged commit 93a9841 into main Apr 30, 2026
1 check passed
@VirusAlex VirusAlex deleted the fix/v040-resource-limits branch April 30, 2026 21:42