Upload session state is not persisted; server restart causes 404 on chunk PUT and orphaned files on disk

Filed as a follow-up to [postguard-website#117](https://github.com/encryption4all/postguard-website/issues/117) — that issue is about the website surfacing the failure better; this one is about the underlying persistence gap on the cryptify side.

## What is wrong

`Store` keeps all upload session state in memory (`HashMap<String, Arc<Mutex<FileState>>>` in `src/store.rs:37`). There is no on-disk persistence and no startup hydration. On any cryptify process restart — deploy, panic, OOM, container reschedule — every in-progress upload session is lost.

Two visible symptoms follow:

1. **Chunk PUTs return 404 forever after restart.** `upload_chunk` looks up the uuid via `store.get(uuid)` and returns `Ok(None)` (HTTP 404) when the session is missing (`src/main.rs:284-287`). The client has no way to recover; from the client's point of view this is indistinguishable from a typo'd uuid.

2. **On-disk files leak.** `upload_init` creates `data_dir/<uuid>` (`src/main.rs:101`) and stores a 14-day expiry on the in-memory `FileState`. After a restart, the on-disk file remains but no `FileState` references it any more, and the purge task in `src/store.rs:152-173` only walks the in-memory expirations map. There is no separate disk reaper, so the file sits there until manually cleaned.

<small>[Source: src/store.rs](https://github.com/encryption4all/cryptify/blob/58883a86b369af08d92db93aa1025f9eba3c73eb/src/store.rs#L34-L49) · [src/main.rs upload_chunk](https://github.com/encryption4all/cryptify/blob/58883a86b369af08d92db93aa1025f9eba3c73eb/src/main.rs#L276-L290) · [src/main.rs upload_init](https://github.com/encryption4all/cryptify/blob/58883a86b369af08d92db93aa1025f9eba3c73eb/src/main.rs#L91-L120)</small>
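The leak in symptom 2 can be made concrete with a small check: compare what is on disk against the ids the process still knows about. The following is a minimal sketch, not code from cryptify; `find_orphans` and the `live` set (standing in for the keys of the in-memory session map) are hypothetical names, and it assumes each upload is a regular file named `<uuid>` directly under `data_dir`.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Returns the files in `data_dir` whose name is not a known session id.
/// `live` is a stand-in for the keys of the in-memory session map (assumption).
pub fn find_orphans(data_dir: &Path, live: &HashSet<String>) -> io::Result<Vec<PathBuf>> {
    let mut orphans = Vec::new();
    for entry in fs::read_dir(data_dir)? {
        let entry = entry?;
        // This sketch only treats regular files as upload payloads.
        if !entry.file_type()?.is_file() {
            continue;
        }
        let name = entry.file_name().to_string_lossy().into_owned();
        if !live.contains(&name) {
            orphans.push(entry.path());
        }
    }
    // Sort so the result does not depend on directory iteration order.
    orphans.sort();
    Ok(orphans)
}
```

Right after a restart the `live` set is empty, so under these assumptions every file in `data_dir` would be reported, which is exactly the set the current purge task never sees.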

## Why it matters

- For a single-replica deployment (the current postguard-ops setup), every cryptify deploy or crash silently kills every upload in progress. Ruben's report on postguard-website#117 is consistent with a deploy-time eviction.
- It also blocks **resume support** (suggested fix #4 on website#117) and **horizontal scaling** — a multi-replica deployment would need shared state regardless.

## Possible designs

In rough order of effort:

1. **Sidecar JSON per upload.** On `init`, write `data_dir/<uuid>.meta.json` with the `FileState` fields. On each chunk PUT, update `uploaded` and `cryptify_token` in that file; the handler already syncs the data file to disk at that point, so the extra write adds little. On startup, scan `data_dir/` and rebuild the `HashMap`. Cheapest change, single-replica only.

2. **SQLite file in `data_dir`.** Same shape as #1 but transactional and easier to query for the purge task. Still single-replica.

3. **PostgreSQL.** Reuses the postguard-business stack. Required if cryptify ever runs more than one replica.

The 14-day `expires` field already exists on `FileState`, so any of these can drive the purge of both the metadata and the disk file from a single source of truth.
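To make design #1 easier to review, here is a rough, dependency-free sketch of the three pieces it needs: writing the sidecar, rebuilding the map on startup, and purging from the same data. Everything named here is an assumption for illustration: `FileMeta` is a hypothetical two-field subset of `FileState` (`cryptify_token` is left out), and `save_meta`, `hydrate` and `purge_expired` are not existing cryptify functions. The JSON is hand-rolled only to keep the sketch self-contained; a real change would more likely use a serialization crate.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical subset of `FileState` worth persisting (field set is an assumption).
#[derive(Debug, Clone, PartialEq)]
pub struct FileMeta {
    pub uploaded: u64, // bytes received so far
    pub expires: u64,  // unix seconds
}

/// Writes `data_dir/<uuid>.meta.json` via a temp file and rename, so an
/// interrupted write leaves the previous sidecar in place.
pub fn save_meta(data_dir: &Path, uuid: &str, meta: &FileMeta) -> io::Result<()> {
    let tmp = data_dir.join(format!("{}.meta.json.tmp", uuid));
    let dst = data_dir.join(format!("{}.meta.json", uuid));
    let body = format!("{{\"uploaded\":{},\"expires\":{}}}", meta.uploaded, meta.expires);
    fs::write(&tmp, body)?;
    fs::rename(&tmp, &dst)
}

/// Extracts an unsigned integer field. Only understands the exact layout
/// that `save_meta` writes; it is not a general JSON parser.
fn field(body: &str, key: &str) -> Option<u64> {
    let needle = format!("\"{}\":", key);
    let start = body.find(&needle)? + needle.len();
    let digits: String = body[start..].chars().take_while(|c| c.is_ascii_digit()).collect();
    digits.parse().ok()
}

/// Rebuilds the session map by scanning `data_dir` for `*.meta.json` files.
/// Sidecars that fail to parse are skipped rather than aborting startup.
pub fn hydrate(data_dir: &Path) -> io::Result<HashMap<String, FileMeta>> {
    let mut sessions = HashMap::new();
    for entry in fs::read_dir(data_dir)? {
        let path = entry?.path();
        let name = match path.file_name().and_then(|n| n.to_str()) {
            Some(n) => n,
            None => continue,
        };
        let uuid = match name.strip_suffix(".meta.json") {
            Some(u) => u.to_string(),
            None => continue, // payload files and leftover .tmp files
        };
        let body = fs::read_to_string(&path)?;
        if let (Some(uploaded), Some(expires)) = (field(&body, "uploaded"), field(&body, "expires")) {
            sessions.insert(uuid, FileMeta { uploaded, expires });
        }
    }
    Ok(sessions)
}

/// Removes payload and sidecar for every session with `expires <= now`.
/// Returns how many sessions were purged.
pub fn purge_expired(
    data_dir: &Path,
    sessions: &mut HashMap<String, FileMeta>,
    now: u64,
) -> io::Result<usize> {
    let expired: Vec<String> = sessions
        .iter()
        .filter(|(_, m)| m.expires <= now)
        .map(|(k, _)| k.clone())
        .collect();
    for uuid in &expired {
        for p in &[data_dir.join(uuid), data_dir.join(format!("{}.meta.json", uuid))] {
            match fs::remove_file(p) {
                Ok(()) => {}
                // A file that is already gone is not an error for a purge.
                Err(e) if e.kind() == io::ErrorKind::NotFound => {}
                Err(e) => return Err(e),
            }
        }
        sessions.remove(uuid);
    }
    Ok(expired.len())
}
```

In this shape, startup would call `hydrate(data_dir)` once before serving requests, the chunk handler would call `save_meta` after each successful write, and the purge task would call `purge_expired` with the current unix time. Whether a sidecar rewrite per chunk is acceptable for large uploads is a separate question this sketch does not answer.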

## Out of scope here

- Client-side retry / better error message — covered by postguard-website#117.
- Resume support API — depends on this issue being fixed first.

## Suggested next step

Confirm the diagnosis in production: pick a 404 chunk PUT from logs and check whether its timestamp lines up with a cryptify restart, and whether `data_dir/<that-uuid>` still exists on the host. If both are true, scope the fix to design #1 above and I can have a draft PR ready.
