feat: streaming SQL/CSV/JSON dumps for large DBs (#59) #142
Open
brone1323 wants to merge 1 commit into outerbase:main from
Conversation
Previously the three `/export` endpoints loaded the entire result set into a JS array, built the full payload as a single string, and wrapped it in a `Blob` before responding. With a 1-10 GB SQLite DB this OOMs the DO and blocks the event loop long enough for Cloudflare to evict it. This change:

- Replaces buffered reads with paged `LIMIT`/`OFFSET` iteration over each table (1000 rows/page) so peak memory stays bounded regardless of DB size (sketched below).
- Yields back to the runtime between pages via `scheduler.wait(0)` (with a `setTimeout(0)` fallback) so the DO isolate can service alarms, websockets, and storage I/O instead of being killed for hogging the event loop.
- Streams responses as `ReadableStream<Uint8Array>` so bytes flow to the client as soon as they are encoded.
- Preserves existing on-the-wire formats: SQL escaping, CSV quoting, and JSON structure are byte-compatible with the previous output for small inputs (so existing client code keeps working).

Closes outerbase#59
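A minimal sketch of the paged-read-plus-yield core described above, assuming a simplified `execute` callback in place of this repo's real query helper (whose signature differs):

```ts
// Minimal sketch, not the PR's exact code. `execute` stands in for the
// repo's query helper; its real signature takes more arguments.
const DEFAULT_PAGE_SIZE = 1000;

// Cooperative yield between pages: prefer the Workers scheduler, fall back
// to setTimeout(0) under node/vitest.
async function breathe(): Promise<void> {
  const sched = (globalThis as { scheduler?: { wait(ms: number): Promise<void> } }).scheduler;
  if (sched?.wait) {
    await sched.wait(0);
  } else {
    await new Promise<void>((resolve) => setTimeout(resolve, 0));
  }
}

// Pages through a table with bound LIMIT/OFFSET; peak memory is one page.
async function* iterateTableRows(
  table: string,
  execute: (sql: string, params: unknown[]) => Promise<Record<string, unknown>[]>,
  pageSize = DEFAULT_PAGE_SIZE,
): AsyncGenerator<Record<string, unknown>[]> {
  for (let offset = 0; ; offset += pageSize) {
    const rows = await execute(`SELECT * FROM "${table}" LIMIT ? OFFSET ?`, [pageSize, offset]);
    if (rows.length > 0) yield rows;
    if (rows.length < pageSize) return; // short page: nothing left to read
    await breathe(); // let the DO service alarms, websockets, storage I/O
  }
}
```

Each yielded page is encoded and enqueued immediately, so the first bytes leave the isolate before the second page is even read.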
Demo video: generated programmatically (HTML deck → Playwright recordVideo → ffmpeg). Hosted as a release asset on the fork to comply with the Algora claim requirement. Sammy / @brone1323
Root cause
All three `/export` endpoints (`/export/dump`, `/export/json/:tableName`, `/export/csv/:tableName`) shared the same shape:

- `await executeOperation('SELECT * FROM <table>')` — the entire result set materialised as a JS array.
- The full payload accumulated as a single string (`dumpContent += ...`, `csvContent += ...`, or `JSON.stringify(allRows)`).
- The string wrapped in a `Blob` and handed to `new Response(blob)`.

For a 1-10 GB SQLite database this fails three different ways at once: the materialised rows and the accumulated string blow past the isolate's memory, the synchronous concatenation blocks the event loop, and Cloudflare evicts the DO for hogging it.
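For contrast, the buffered shape being replaced looked roughly like this (reconstructed from the description above, not a verbatim excerpt; `execute` and `toCsvLine` are stand-ins):

```ts
// Reconstructed sketch of the old buffered pattern (not verbatim).
// The rows array, the accumulated string, and the Blob all coexist in
// memory before a single byte reaches the client.
async function bufferedCsvExport(
  tableName: string,
  execute: (sql: string) => Promise<Record<string, unknown>[]>,
  toCsvLine: (row: Record<string, unknown>) => string, // stand-in for the real quoting logic
): Promise<Response> {
  const allRows = await execute(`SELECT * FROM ${tableName}`); // full result set as a JS array
  let csvContent = Object.keys(allRows[0] ?? {}).join(',') + '\n';
  for (const row of allRows) {
    csvContent += toCsvLine(row) + '\n'; // full payload as one growing string
  }
  const blob = new Blob([csvContent], { type: 'text/csv' });
  return new Response(blob); // nothing is sent until everything is built
}
```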
Design
The refactor is intentionally narrow — three endpoints, one shared helper, no public API changes.
- `src/export/streaming.ts` is the new shared layer:
  - `iterateTableRows(table, ds, cfg, pageSize=1000)` is an async generator that pages through a single table with parameterised `LIMIT ? OFFSET ?`. Peak memory is `pageSize` rows, regardless of total table size.
  - `breathe()` calls `scheduler.wait(0)` between pages — that's the canonical Cloudflare hook for cooperative yields. Falls back to `setTimeout(0)` outside the Workers runtime so vitest and node also work. This is the "breathing intervals" the issue calls out: it lets the DO service alarms, websockets, and storage I/O instead of being preempted for hogging the loop.
  - `chunksToStream(generator)` adapts an `AsyncGenerator<string>` into a `ReadableStream<Uint8Array>` ready to hand to `new Response(...)`. Errors thrown inside the generator propagate through `controller.error` so the client sees a truncated body — the right signal mid-export, since headers are committed before any I/O runs.
  - `streamingResponse(stream, fileName, contentType)` sets the standard download headers plus `Cache-Control: no-store` and `Transfer-Encoding: chunked` to discourage intermediaries from buffering.
- `src/export/dump.ts` rewrites the SQL dump as `dumpChunks()` — a generator that yields the SQLite header, then for each table yields its schema followed by one `INSERT INTO t VALUES (...);` per row. Single-quote escaping is preserved byte-for-byte; `NULL` is now emitted properly for null/undefined columns (the previous version emitted the literal text `undefined`, which produced an invalid SQL dump).
- `src/export/csv.ts` and `src/export/json.ts` follow the same pattern. The JSON encoder hand-assembles the array brackets and inter-row commas so it never calls `JSON.stringify` on more than one row at a time, while still producing a valid JSON document. The 404-on-missing-table behaviour is preserved by running an existence check up front (synchronous response with JSON body) before any streaming starts.

Sketches of the stream adapter and the per-row SQL encoding follow this list.
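First, the generator-to-stream adapter and the response wrapper. The names come from the PR; the bodies below are reconstructions, not the actual code:

```ts
// Reconstructed sketch of chunksToStream/streamingResponse; bodies are
// assumptions, only the names and described behaviour come from the PR.
function chunksToStream(chunks: AsyncGenerator<string>): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream<Uint8Array>({
    async pull(controller) {
      try {
        const { value, done } = await chunks.next();
        if (done) controller.close();
        else controller.enqueue(encoder.encode(value));
      } catch (err) {
        // Headers are already committed, so an errored (truncated) body is
        // the only honest failure signal left mid-export.
        controller.error(err);
      }
    },
    cancel() {
      void chunks.return(undefined); // stop paging if the client disconnects
    },
  });
}

function streamingResponse(stream: ReadableStream<Uint8Array>, fileName: string, contentType: string): Response {
  return new Response(stream, {
    headers: {
      'Content-Type': contentType,
      'Content-Disposition': `attachment; filename="${fileName}"`,
      'Cache-Control': 'no-store',
      'Transfer-Encoding': 'chunked',
    },
  });
}
```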
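Second, the per-value SQL encoding `dumpChunks()` depends on, illustrating the NULL fix. These helper names are hypothetical; only the escaping rules are stated in the PR:

```ts
// Hypothetical helpers illustrating the escaping rules described above:
// single quotes doubled, null/undefined emitted as SQL NULL (the old code
// interpolated undefined as the literal text `undefined`).
function sqlLiteral(value: unknown): string {
  if (value === null || value === undefined) return 'NULL';
  if (typeof value === 'number' || typeof value === 'bigint') return String(value);
  return `'${String(value).replace(/'/g, "''")}'`;
}

// One INSERT statement per row, the way the dump generator yields them.
function insertStatement(table: string, row: Record<string, unknown>): string {
  return `INSERT INTO ${table} VALUES (${Object.values(row).map(sqlLiteral).join(', ')});\n`;
}
```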
What is not in this PR

- `executeOperation`, the operation queue, RLS, allowlist, auth, and the import side are untouched. This is a discrete refactor inside `src/export/`.
- The old `getTableData`/`createExportResponse` helpers are removed rather than kept around. They had two callers (the JSON and CSV routes) and zero external users in this repo. Keeping dead exports just to delay the cleanup felt worse than removing them now; happy to add them back if there are downstream consumers I missed.
- `DEFAULT_PAGE_SIZE` is hard-coded at 1000. Making it tunable via a query param would be a reasonable follow-up but isn't required to fix the bug.
Tested

`pnpm test` — all touched test files pass:

- `src/export/dump.test.ts` (7 tests) — schema + row emission, NULL handling, single-quote escaping, paged reads use bound params, mid-stream errors propagate via `ReadableStreamDefaultReader.read()` (sketched after this list).
- `src/export/csv.test.ts` (6 tests) — header row, RFC-4180 quoting, 404, 500-on-existence-check-failure, paged reads.
- `src/export/json.test.ts` (5 tests) — `JSON.parse` round-trip on streamed body, `[]` for empty tables, special-character escaping, 404, 500.
- `src/export/streaming.test.ts` (7 tests, new) — single-page short-circuit, multi-page offset advancement, `scheduler.wait` preference vs `setTimeout` fallback, generator errors propagate to the stream.
- `src/export/index.test.ts` (2 tests) — `executeOperation` shape preserved.

Total: 41 export+import tests pass. The 4 pre-existing `src/rls/index.test.ts` failures on `main` are unrelated to this change.
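The mid-stream error assertion referenced above works roughly like this (shape assumed, not copied from the suite; it reuses the `chunksToStream` sketch from earlier):

```ts
// Assumed test shape, not copied from the suite. The generator throws after
// the first chunk; the rejection surfaces through reader.read().
import { expect, it } from 'vitest';

it('propagates mid-stream errors to the reader', async () => {
  async function* failing(): AsyncGenerator<string> {
    yield 'INSERT INTO t VALUES (1);\n';
    throw new Error('disk gone');
  }
  const res = new Response(chunksToStream(failing())); // adapter from the sketch above
  const reader = res.body!.getReader();
  const drain = (async () => {
    for (;;) {
      const { done } = await reader.read(); // rejects when controller.error fires
      if (done) break;
    }
  })();
  await expect(drain).rejects.toThrow('disk gone');
});
```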
What I could not test

A real multi-gigabyte export against a deployed DO. The unit tests cover the mechanics (responses are `ReadableStream`, paged queries use bound `LIMIT`/`OFFSET`, `scheduler.wait` is invoked when present), but a multi-GB DO export is the kind of thing that wants a real wrangler smoke test before merge.

Closes #59
/claim #59