Handle SP pull queue backpressure (503/429 with Retry-After) #799

@BigLep

Context

When SPs are under sustained pull load, Curio will enforce per-user and global pull queue caps, returning:

  • 429 Too Many Requests with Retry-After: 30 — per-user cap (32 pieces) exceeded
  • 503 Service Unavailable with Retry-After: 60 — global cap (128 pieces) exceeded
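As a rough illustration of how a client might interpret these two responses, here is a minimal TypeScript sketch. The names `parseRetryAfter` and `classifyBackpressure` are hypothetical and not part of the current synapse-sdk API. Falling back to 30 or 60 seconds when the header is missing or unparsable is an assumption based on the values quoted above, not confirmed server behavior.

```typescript
// Hypothetical helpers for illustration only; not an existing synapse-sdk API.
type Backpressure = {
  kind: "per-user" | "global";
  retryAfterSeconds: number;
};

// Retry-After may be a number of seconds or an HTTP-date (RFC 9110).
// Returns the delay in whole seconds, or undefined if the value is unusable.
function parseRetryAfter(value: string | null, nowMs: number = Date.now()): number | undefined {
  if (value === null) return undefined;
  const trimmed = value.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed);
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return undefined;
  return Math.max(0, Math.ceil((dateMs - nowMs) / 1000));
}

// Maps a pull response to a backpressure signal, or undefined for any other status.
function classifyBackpressure(status: number, retryAfter: string | null): Backpressure | undefined {
  if (status !== 429 && status !== 503) return undefined;
  // Assumed fallbacks: the Retry-After values listed in this issue.
  const fallback = status === 429 ? 30 : 60;
  return {
    kind: status === 429 ? "per-user" : "global",
    retryAfterSeconds: parseRetryAfter(retryAfter) ?? fallback,
  };
}
```

Keeping the per-user and global cases distinct lets a caller decide, for example, to throttle its own submissions on 429 but simply wait on 503.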

Today the SDK has a fixed 5-minute pull timeout and does not handle these status codes. From the client's perspective, pieces that exceed server-side budgets silently disappear.

See filecoin-project/curio#1241 and this comment for the full server-side design.

Work needed in synapse-sdk

  1. Handle 429 and 503 responses from the pull endpoint. Respect the Retry-After header and retry automatically up to a configurable limit. Surface typed errors (with retryAfter value) when retries are exhausted so callers can act on them.

  2. Expose and document the pull flow timeout. The current 5-minute client-side timeout should be configurable for high-throughput users doing bulk transfers. The default can stay at 5 minutes; the server-side budget is being lowered from 1 hour to 20 minutes.

  3. Surface backpressure to callers actionably. StorageManager.upload() and StorageContext.pull() should propagate typed errors that distinguish "SP is overloaded, retry later" from permanent failures, so callers can implement their own backoff or queuing strategies.
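One possible shape for the three items above, sketched in TypeScript under stated assumptions. Every name here (`PullBackpressureError`, `PullRetryOptions`, `pullWithBackpressure`, `sendPull`) is hypothetical and does not describe the existing synapse-sdk API. The default of 3 retries is an arbitrary choice for illustration, and the sketch only reads `Retry-After` as a number of seconds.

```typescript
// Hypothetical sketch; names and defaults are assumptions, not synapse-sdk's API.
class PullBackpressureError extends Error {
  readonly status: number;
  readonly retryAfterSeconds: number;
  readonly attempts: number;

  constructor(status: number, retryAfterSeconds: number, attempts: number) {
    super(`SP pull queue is full (HTTP ${status}); retry after ${retryAfterSeconds}s`);
    this.name = "PullBackpressureError";
    this.status = status;
    this.retryAfterSeconds = retryAfterSeconds;
    this.attempts = attempts;
  }
}

interface PullResponse {
  status: number;
  headers: { get(name: string): string | null };
}

interface PullRetryOptions {
  maxRetries?: number; // retries allowed after the first attempt (assumed default: 3)
  timeoutMs?: number; // overall client-side budget (default: 5 minutes)
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

async function pullWithBackpressure(
  sendPull: () => Promise<PullResponse>,
  options: PullRetryOptions = {},
): Promise<PullResponse> {
  const maxRetries = options.maxRetries ?? 3;
  const timeoutMs = options.timeoutMs ?? 5 * 60 * 1000;
  const sleep =
    options.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const deadline = Date.now() + timeoutMs;

  for (let attempt = 1; ; attempt++) {
    const res = await sendPull();
    // Anything other than 429/503 is returned for the caller's normal handling.
    if (res.status !== 429 && res.status !== 503) return res;

    // Only the delay-seconds form is handled; otherwise fall back to 30s/60s.
    const header = res.headers.get("Retry-After");
    const parsed = header === null ? NaN : Number(header);
    const fallback = res.status === 429 ? 30 : 60;
    const retryAfterSeconds = Number.isFinite(parsed) && parsed >= 0 ? parsed : fallback;

    // Give up if retries are exhausted or waiting would exceed the timeout.
    const outOfRetries = attempt > maxRetries;
    const outOfTime = Date.now() + retryAfterSeconds * 1000 > deadline;
    if (outOfRetries || outOfTime) {
      throw new PullBackpressureError(res.status, retryAfterSeconds, attempt);
    }
    await sleep(retryAfterSeconds * 1000);
  }
}
```

A caller could then tell "SP is overloaded, retry later" apart from permanent failures with `err instanceof PullBackpressureError` and use `err.retryAfterSeconds` to schedule its own backoff or queuing.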
