TUS upload finalize fails for large (30+ GB) files on posix storage driver #295

@Tellervo89

Describe the bug

TUS uploads of very large files (≥ ~30 GB) fail at the finalize step on the posix storage driver. All bytes transfer successfully (the final TUS PATCH reaches Offset == Size), but the post-upload metadata commit fails: a chain of "context canceled" errors originating in the posix idcache JetStream KV operation ends in "failed to init new node" with an empty path. The HTTP PATCH then returns 500 to the client.

Affected clients (all symptoms identical from the server's perspective):

  • Web UI (drag-and-drop upload) — shows file as "Finalizing… few seconds left" then fails with "1 of 1 item failed".
  • Desktop client (Linux) — fails the same file repeatedly.
  • rclone with vendor = infinitescale — fails identically.

A direct WebDAV PUT of the same file (bypassing TUS entirely) succeeds reliably:

curl -u USER:APPPASS -T bigfile.mp4 \
  "https://server/remote.php/dav/spaces/<space-id>/path/bigfile.mp4"
# → 201 Created, file appears immediately and is fully usable

So the issue is specific to the TUS upload finalize code path, not the byte transfer, the proxy, the disk, or the posix driver's general ability to store large files.

Steps to reproduce

  1. Deploy OpenCloud 7.0.0 with STORAGE_USERS_DRIVER=posix and STORAGE_SYSTEM_DRIVER=decomposed (standard docker-compose, antivirus disabled).
  2. Authenticate as a normal user.
  3. Upload a single ~30+ GB file via the web UI or desktop client into the personal space.
  4. Wait for the upload to reach 100% and enter "Finalizing".

Reproduces 100% of the time across multiple attempts with different files (verified with two distinct MP4 files of 32.1 GB and 57.5 GB).

Expected behavior

File is committed to the target location and visible in the user's space, exactly as occurs for smaller files and as occurs for a direct WebDAV PUT of the same file.

Actual behavior

  • HTTP PATCH (the final TUS chunk) returns 500 after ~166 seconds.
  • Response body: ERR_INTERNAL_SERVER_ERROR: open : no such file or directory.
  • Server-side upload session is left in Processing: true with Offset == Size, indefinitely (does not expire to the "expired" list, does not finalize, does not produce a visible file).
  • Subsequent retries create additional stuck sessions for the same file.
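
The gap between "open" and ":" in the response body is where a file path would normally be printed, which is consistent with an open call on an empty path. A minimal Python sketch of that reading follows. It only illustrates the message shape and is not OpenCloud code; the helper name describe_open_failure and its format string are hypothetical.

```python
import os


def describe_open_failure(path: str) -> str:
    """Try to open `path`; on failure, format the error as "open <path>: <reason>".

    Hypothetical helper for illustration only. The format string is an
    assumption chosen to mirror the shape of the response body in the report.
    """
    try:
        with open(path, "rb"):
            return "ok"
    except OSError as exc:
        # os.strerror turns the errno into the platform's standard text.
        reason = os.strerror(exc.errno).lower()
        return f"open {path}: {reason}"


# An empty path leaves a visible gap between "open" and ":".
print(describe_open_failure(""))
```

On a typical Linux host the printed message has the same shape as the 500 response body, which suggests that the path being opened was empty rather than pointing at a missing upload file.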

Setup

  • OpenCloud image: opencloudeu/opencloud-rolling:7.0.0
  • Deployment: docker-compose, single container, listening on 127.0.0.1:9200 on the host.
  • Reverse proxy: Nginx Proxy Manager in front, with client_max_body_size 0, proxy_buffering off, proxy_request_buffering off, proxy_read_timeout 12h, proxy_send_timeout 12h.
  • Storage driver: STORAGE_USERS_DRIVER=posix, STORAGE_SYSTEM_DRIVER=decomposed.
  • Antivirus: disabled
  • Host disk: Over 400 GB free at time of failure.
  • Host RAM: not at capacity during the failure.
  • TUS staging directory (/var/lib/opencloud/storage/users/uploads) is on the same filesystem as the user data root.

Relevant .env values:

STORAGE_USERS_DRIVER=posix
STORAGE_SYSTEM_DRIVER=decomposed
STORAGE_USERS_POSIX_ROOT=/var/lib/opencloud/storage
STORAGE_USERS_ID_CACHE_STORE=nats-js-kv

Server logs

Failure chain at the moment the final TUS PATCH lands:

{"level":"error","service":"storage-users","driver":"posix","error":"context canceled",
 "line":".../pkg/storage/fs/posix/idcache/idcache.go:215",
 "message":"error in jetstream kv operation, retrying"}

(several identical "context canceled" retries over ~5 seconds)

{"level":"error","service":"storage-users","driver":"posix","error":"context canceled",
 "line":".../pkg/storage/pkg/decomposedfs/upload/store.go:244",
 "message":"failed to cache id"}

{"level":"error","service":"storage-users","driver":"posix","path":"",
 "error":"open : no such file or directory",
 "line":".../pkg/storage/pkg/decomposedfs/upload/store.go:250",
 "message":"failed to init new node"}

{"level":"error","service":"storage-users","datatx":"tus","method":"PATCH",
 "path":"/14fae9ba-6b4a-4556-9b4c-84e156c2e471",
 "message":"open : no such file or directory",
 "line":".../pkg/rhttp/datatx/manager/tus/tus.go:282",
 "message":"InternalServerError"}

{"level":"info","service":"storage-users","datatx":"tus","method":"PATCH",
 "status":500,"body":"ERR_INTERNAL_SERVER_ERROR: open : no such file or directory\n"}

{"level":"error","service":"frontend",
 "error":"Patch \"http://localhost:9158/data/tus/14fae9ba-6b4a-4556-9b4c-84e156c2e471\": EOF",
 "message":"error doing PATCH request to data service"}

Total PATCH duration before the 500: 166.46 seconds (time_ns: 166460835118). The "context canceled" error in the idcache JetStream KV operation is logged before "failed to init new node" reports "path":"", so the empty path is a consequence of the cancellation, not its cause.
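
The ordering argument can be checked mechanically. The sketch below parses the first three log entries quoted above (joined onto single lines, with the elided ".../" path prefixes left as they are) and converts time_ns to seconds. The names LOG_LINES and summarize are illustrative and not part of any OpenCloud tooling.

```python
import json

# Log entries copied from the report; the ".../" prefixes are elided in the source.
LOG_LINES = [
    '{"level":"error","service":"storage-users","driver":"posix","error":"context canceled",'
    ' "line":".../pkg/storage/fs/posix/idcache/idcache.go:215",'
    ' "message":"error in jetstream kv operation, retrying"}',
    '{"level":"error","service":"storage-users","driver":"posix","error":"context canceled",'
    ' "line":".../pkg/storage/pkg/decomposedfs/upload/store.go:244",'
    ' "message":"failed to cache id"}',
    '{"level":"error","service":"storage-users","driver":"posix","path":"",'
    ' "error":"open : no such file or directory",'
    ' "line":".../pkg/storage/pkg/decomposedfs/upload/store.go:250",'
    ' "message":"failed to init new node"}',
]

PATCH_DURATION_NS = 166460835118  # time_ns value quoted in the report


def summarize(lines):
    """Return (source file:line, message, error) for each JSON log entry, in order."""
    steps = []
    for raw in lines:
        entry = json.loads(raw)
        source = entry["line"].rsplit("/", 1)[-1]  # keep only "file.go:NNN"
        steps.append((source, entry["message"], entry["error"]))
    return steps


chain = summarize(LOG_LINES)
# Position of the first cancellation and of the first empty-path open error.
first_cancel = next(i for i, s in enumerate(chain) if s[2] == "context canceled")
first_open_error = next(i for i, s in enumerate(chain) if s[2].startswith("open :"))
duration_s = PATCH_DURATION_NS / 1e9

for source, message, error in chain:
    print(f"{source:16} {message!r} <- {error!r}")
print(f"cancel precedes open error: {first_cancel < first_open_error}")
print(f"PATCH duration: {duration_s:.2f} s")
```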

Upload session state after failure

After several failed attempts at the same files, opencloud storage-users uploads sessions --processing lists multiple stuck sessions per file, all with Offset == Size and Processing: true:

filename-example1.mp4 │ 34470813213 / 34470813213 │ Processing: true   (×6 stuck sessions)
filename-example2.mp4 │ 61697531888 / 61697531888 │ Processing: true   (×3 stuck sessions)

None ever transition out of processing; none ever expire to the expired list; --restart does not recover them; only --clean removes them.
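
The byte counts in the listing line up with the 32.1 GB and 57.5 GB figures under "Steps to reproduce" only when they are read as binary gigabytes (GiB). The sketch below does that conversion and confirms Offset == Size for both rows. It assumes the example filenames stand for the two MP4 files mentioned earlier; the names SESSIONS and describe are illustrative.

```python
GIB = 2**30  # bytes per binary gigabyte
GB = 10**9   # bytes per decimal gigabyte

# (offset, size) pairs in bytes, copied from the session listing above.
SESSIONS = {
    "filename-example1.mp4": (34470813213, 34470813213),
    "filename-example2.mp4": (61697531888, 61697531888),
}


def describe(offset: int, size: int) -> dict:
    """Report whether all bytes arrived and the size in both unit systems."""
    return {
        "complete": offset == size,
        "gib": round(size / GIB, 1),
        "gb": round(size / GB, 1),
    }


report = {name: describe(*pair) for name, pair in SESSIONS.items()}
for name, info in report.items():
    print(name, info)
```

Read this way, both stuck files are above 30 GiB and fully transferred, matching the threshold described at the top of the report.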

Workaround

A direct WebDAV PUT (no TUS) finalizes identically sized files cleanly:

curl -u USER:APPPASS -T "/path/filename" \
  "https://server/remote.php/dav/spaces/<space>/path/filename"
# HTTP 201, file appears normally, posix storage driver commits metadata without issue
