Context
When SPs are under sustained pull load, Curio will enforce per-user and global pull queue caps, returning:
- 429 Too Many Requests with Retry-After: 30 when the per-user cap (32 pieces) is exceeded
- 503 Service Unavailable with Retry-After: 60 when the global cap (128 pieces) is exceeded
Today the SDK has a fixed 5-minute pull timeout and no handling for these status codes. Pieces that exceed server-side budgets silently disappear from the client's perspective.
See filecoin-project/curio#1241 and this comment for the full server-side design.
Work needed in synapse-sdk
- Handle 429 and 503 responses from the pull endpoint. Respect the Retry-After header and retry automatically up to a configurable limit. Surface typed errors (carrying the retryAfter value) when retries are exhausted so callers can act on them.
- Expose and document the pull flow timeout. The current 5-minute client-side timeout should be configurable for high-throughput users doing bulk transfers. The default can stay at 5 min; the server-side budget is being lowered to 20 min (from 1h).
- Surface backpressure to callers in an actionable form. StorageManager.upload() and StorageContext.pull() should propagate typed errors that distinguish "SP is overloaded, retry later" from permanent failures, so callers can implement their own backoff or queuing strategies.
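To make the three items concrete, here is a rough TypeScript sketch of how they might fit together. Everything named here (`PullBackpressureError`, `PullOptions`, `parseRetryAfter`, `pullWithRetry`, and the `sendPull` callback) is hypothetical and not an existing synapse-sdk export. The default of 3 retries is an assumption, and the 30s/60s fallbacks simply mirror the Retry-After values quoted in the Context section.

```typescript
// Illustrative sketch only: none of these names exist in synapse-sdk today.

/** Thrown when the SP keeps answering 429/503 and retries or time run out. */
class PullBackpressureError extends Error {
  readonly status: number // 429 or 503
  readonly retryAfterSeconds: number
  readonly attempts: number

  constructor(status: number, retryAfterSeconds: number, attempts: number) {
    super(`pull rejected with ${status} after ${attempts} attempt(s); retry after ${retryAfterSeconds}s`)
    this.name = 'PullBackpressureError'
    this.status = status
    this.retryAfterSeconds = retryAfterSeconds
    this.attempts = attempts
  }

  /** 'user' for the per-user cap (429), 'global' for the global cap (503). */
  get scope(): 'user' | 'global' {
    return this.status === 429 ? 'user' : 'global'
  }
}

/** Retry-After is either delta-seconds or an HTTP-date; fall back if absent or unparseable. */
function parseRetryAfter(header: string | null, fallbackSeconds: number): number {
  if (header === null) return fallbackSeconds
  const trimmed = header.trim()
  if (/^\d+$/.test(trimmed)) return Number(trimmed)
  const dateMs = Date.parse(trimmed)
  if (Number.isNaN(dateMs)) return fallbackSeconds
  return Math.max(0, Math.ceil((dateMs - Date.now()) / 1000))
}

interface PullOptions {
  maxRetries?: number // retries after the first attempt (assumed default: 3)
  timeoutMs?: number // overall client-side budget (default: 5 minutes)
  sleep?: (ms: number) => Promise<void> // injectable so tests need not wait
}

/** Calls sendPull, waiting out 429/503 responses until retries or the time budget are exhausted. */
async function pullWithRetry(
  sendPull: (signal: AbortSignal) => Promise<Response>,
  options: PullOptions = {},
): Promise<Response> {
  const maxRetries = options.maxRetries ?? 3
  const timeoutMs = options.timeoutMs ?? 5 * 60_000
  const sleep = options.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)))

  const deadline = Date.now() + timeoutMs
  const signal = AbortSignal.timeout(timeoutMs) // aborts an in-flight request at the deadline

  for (let attempt = 1; ; attempt++) {
    const res = await sendPull(signal)
    const status = res.status
    if (status !== 429 && status !== 503) return res // other statuses are the caller's concern

    const waitSeconds = parseRetryAfter(res.headers.get('Retry-After'), status === 429 ? 30 : 60)
    const outOfRetries = attempt > maxRetries
    const outOfTime = Date.now() + waitSeconds * 1000 >= deadline
    if (outOfRetries || outOfTime) {
      throw new PullBackpressureError(status, waitSeconds, attempt)
    }
    await sleep(waitSeconds * 1000)
  }
}
```

A caller could then check `err instanceof PullBackpressureError` to tell "SP is overloaded, retry later" apart from permanent failures, and use `err.scope` and `err.retryAfterSeconds` to decide whether to requeue the piece or back off across the whole batch. Giving up early when the next wait would overrun the deadline is one possible design choice; it returns control to the caller instead of sleeping into a guaranteed timeout.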