feat(assets): cursor-based pagination on GET /api/assets#14014
feat(assets): cursor-based pagination on GET /api/assets#14014mattmillerai wants to merge 14 commits into
Conversation
Add 'after' query param and 'next_cursor' response field for keyset pagination. Matches the cloud Go implementation (BE-893) so frontend sees a unified contract across runtimes. Offset/limit remain as a deprecated fallback.
Port of cloud common/pagination/cursor.go. Wire format is base64url of
{"s", "v", "id"} JSON; times are Unix microseconds UTC to match
PostgreSQL timestamp precision.
Includes a byte-identity fixture pinned against the cloud Go wire
format so cross-runtime FE pagination can't silently drift.
list_assets_page accepts an opaque 'after' cursor and returns next_cursor when more pages are available. The query applies a keyset WHERE clause and a secondary ORDER BY id for deterministic tiebreak. Cursor sort field is validated against the request sort, and a last_access_time sort (OSS-only) falls back to offset/limit. Offset is ignored whenever a cursor is supplied.
Adds integration tests for: full cursor walk, invalid-cursor 400, sort/cursor mismatch 400, cursor-wins-over-offset, absent next_cursor when no more results, and pagination stability across deletes.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThis PR adds opaque keyset cursor pagination to GET /api/assets: a new cursor module implements encode/decode and validation (with InvalidCursorError), request/response schemas gain 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/assets/services/cursor.py`:
- Around line 117-125: In decode_cursor_time, catching only ValueError lets
extreme microsecond inputs bubble up from _unix_micros_to_datetime (which uses
datetime.fromtimestamp) as OSError or OverflowError; modify decode_cursor_time
to wrap the call to _unix_micros_to_datetime in a try/except that catches
OSError and OverflowError (in addition to the existing ValueError handling) and
re-raises InvalidCursorError with a clear message like "value is not a valid
timestamp" including the original exception text so out-of-range or absurdly
large cursor values produce a 400 INVALID_CURSOR instead of a 500.
In `@openapi.yaml`:
- Around line 1520-1523: The OpenAPI parameter "offset" is marked deprecated
only in prose; add a machine-readable deprecation flag by setting deprecated:
true on the "offset" parameter definition in openapi.yaml so generated
docs/clients surface it (keep the existing description and behavior that "after"
wins when both are supplied). Locate the parameter named offset in the same
parameter block that mentions "after" and add the deprecated: true property.
In `@tests-unit/assets_test/test_list_cursor.py`:
- Around line 99-110: The test reads JSON from HTTP responses without first
asserting success; update the requests in test_list_cursor.py that set r =
http.get(...) (the calls querying "/api/assets" with params including
"include_tags" and "limit") to validate the response before parsing: either call
r.raise_for_status() or assert r.status_code == 200 (or expected status)
immediately after each http.get and before r.json(), and apply this same change
to the other similar request sites in the file (the other r = http.get(...)
usages around the cursor/offset tests) so failures surface as HTTP errors
instead of JSON/key errors.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 22c42e99-003e-4fe5-8742-557ea32d6515
📒 Files selected for processing (10)
app/assets/api/routes.pyapp/assets/api/schemas_in.pyapp/assets/api/schemas_out.pyapp/assets/database/queries/asset_reference.pyapp/assets/services/asset_management.pyapp/assets/services/cursor.pyapp/assets/services/schemas.pyopenapi.yamltests-unit/assets_test/services/test_cursor.pytests-unit/assets_test/test_list_cursor.py
- Mint next_cursor on every cursor-supported sort, not only when 'after' was supplied. A first request (no 'after') previously returned next_cursor=None, leaving cursor mode unreachable from a clean start. - Over-fetch limit+1 so an exactly-full terminal page doesn't mint a spurious cursor pointing at a phantom next page. - Map crafted out-of-range microsecond cursors (OverflowError / OSError in datetime construction) to 400 INVALID_CURSOR instead of leaking 500. - Bump MAX_CURSOR_VALUE_LENGTH 256 -> 512 to match the AssetReference name column max; without this, a long-named asset minted a cursor the same server then refused on the next request. Cross-runtime byte identity with cloud is unaffected because no cloud cursor ever carries a value > 256 (cloud schema doesn't permit it). - Return None from _encode_next_cursor when the boundary row carries a NULL sort value (e.g. an Asset without size_bytes backfilled), instead of silently encoding 0 and mis-positioning the keyset. - Fix schemas_in.py comment so it matches actual handler behavior (last_access_time + 'after' raises 400, does not fall back). - Add AssetsApiError schema + 400 response to GET /api/assets in openapi.yaml so generated clients know the INVALID_CURSOR envelope. - Extend integration coverage: first-page mint, exact-multiple terminal page, cursor walks for created_at/updated_at/size sorts, datetime overflow surfaces as 400 not 500. - Add unit coverage for datetime overflow and 512-char round-trip.
Address three needs-judgment items from the cursor-review judge synthesis:
1. Cursor wire format now includes an "o" key carrying the sort
direction ("asc" / "desc") it was minted under. A request that
replays the cursor with a flipped `order` parameter is rejected
with 400 INVALID_CURSOR instead of silently walking the wrong
direction. Legacy cursors without "o" still decode (the binding
is best-effort until cloud mirrors the field — follow-up filed
separately).
2. JSON serialization now escapes `<`, `>`, `&`, U+2028, U+2029
to mirror Go's default `json.Marshal` behavior. Without this, an
asset name containing those characters produced different bytes on
Python vs cloud Go. The escaped form is what both runtimes emit.
3. Add direct query-layer tests for the keyset tiebreaker — the secondary
ORDER BY id branch was previously unexercised. Two scenarios: all
rows share a primary sort value, and mixed ties straddle page
boundaries. Both assert no row is dropped or duplicated across the
walk.
Wire-format note: Python cursors now differ from current cloud cursors
by exactly the "o" key. Cloud follow-up will bring the two back into
byte alignment.
- Soften offset param prose: it's not deprecated, just not preferred for sequential walks. Random-access UIs (jump-to-page, item count displays) legitimately still want offset, so dropping the 'deprecated' framing rather than promoting it to a machine-readable deprecated:true flag. - Add explicit HTTP status assertions before every json() / next_cursor read in test_list_cursor.py so a failing request surfaces as an HTTP error instead of a confusing KeyError on a 4xx/5xx body.
Cursor pagination hasn't shipped on either runtime yet — this PR is still draft and cloud's mirror is just behind it — so there are no legacy no-o cursors in the wild. Make o mandatory from day one rather than landing permissive and tightening later. decode_cursor now rejects any payload without o (or with a non-string o) as malformed. CursorPayload.order becomes a required str. Tests that constructed CursorPayload directly now pass order="desc"; test_legacy_cursor_without_order_accepted flips to test_cursor_without_order_rejected.
Strip prose references to sibling Go implementations and external ticket IDs from cursor.py, the cursor tests, the keyset integration tests, asset_management's sort-field comment, and the legacy prompt_id alias comment. Pure docstring/comment scrub — no behavior or wire-format changes. x-runtime: [cloud] field annotations in openapi.yaml are unchanged; those are the spec's structural cross-runtime convention, not internal references.
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
app/assets/api/routes.py (1)
226-230:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winKeep
INVALID_CURSORon the standard error envelope.This new branch bypasses
_build_error_response, soINVALID_CURSORis returned without the usualdetails: {}field. That introduces a response-shape inconsistency for clients consuming the shared error format.Suggested fix
except InvalidCursorError as e: - return web.json_response( - {"error": {"code": "INVALID_CURSOR", "message": str(e)}}, - status=400, - ) + return _build_error_response(400, "INVALID_CURSOR", str(e))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/assets/api/routes.py` around lines 226 - 230, The except InvalidCursorError branch returns a custom dict and skips the shared envelope; update the handler to use the existing _build_error_response helper so the response keeps the standard shape (including details: {}). Replace the manual return in the InvalidCursorError except block with a call to _build_error_response passing the error payload (code "INVALID_CURSOR" and message str(e)) and return that via web.json_response with status=400 so InvalidCursorError and _build_error_response are used to build the consistent error envelope.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests-unit/assets_test/services/test_cursor.py`:
- Around line 286-294: The test test_encoded_payload_shape_pinned currently only
checks for substring presence and should instead assert exact wire bytes to pin
byte identity; replace the substring assertions in that test with a single
equality check comparing decoded_bytes to the canonical expected bytes provided
by the TestByteIdentityFixtures (or a literal expected bytes sequence
constructed with the documented key order and exact serialization) so the test
fails on any change in key order, quoting or whitespace; use
encode_cursor/decoded_bytes as-is and assert decoded_bytes == expected_bytes
(and update the fixture to expose expected_bytes if needed).
In `@tests-unit/assets_test/test_list_cursor.py`:
- Around line 279-285: The test builds a cursor payload (payload / cursor) that
omits the required "o" (order) key so the request may fail before reaching the
microsecond-overflow decode branch; update the payload to include "o" with a
value matching the request param "sort" (e.g., "asc" or "desc") and ensure the
http.get params include the same order value so the request exercises the
intended decode path rather than an earlier missing-key failure.
---
Outside diff comments:
In `@app/assets/api/routes.py`:
- Around line 226-230: The except InvalidCursorError branch returns a custom
dict and skips the shared envelope; update the handler to use the existing
_build_error_response helper so the response keeps the standard shape (including
details: {}). Replace the manual return in the InvalidCursorError except block
with a call to _build_error_response passing the error payload (code
"INVALID_CURSOR" and message str(e)) and return that via web.json_response with
status=400 so InvalidCursorError and _build_error_response are used to build the
consistent error envelope.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1c513ea2-61ff-49aa-b912-70a1d0451bcc
📒 Files selected for processing (8)
app/assets/api/routes.pyapp/assets/api/schemas_in.pyapp/assets/services/asset_management.pyapp/assets/services/cursor.pyopenapi.yamltests-unit/assets_test/queries/test_asset_reference_keyset.pytests-unit/assets_test/services/test_cursor.pytests-unit/assets_test/test_list_cursor.py
The boundary test was building a cursor without the required `o` key, so decode failed on the missing-order branch before reaching the µs-overflow path the test is asserting. Both paths return 400 INVALID_CURSOR so the assertion passed for the wrong reason. Add `o` to the payload and matching `order=` to the request so the decode reaches the intended branch.
Six fact-checked findings from the multi-model review pass:
- Encoder/decoder length asymmetry: encode_cursor now rejects empty id,
oversized id (>128), oversized value (>512), and invalid order tokens
symmetrically with decode_cursor. Prevents the same server from minting
a cursor it then 400s on the next request (e.g. a filesystem-scanned
asset name >512 chars). The bad-order path now raises InvalidCursorError
(still subclasses ValueError) so route-layer handling stays uniform.
- Raw U+2028/U+2029 in cursor.py source: ripgrep treated those lines as
line-terminators, confirming the bytes were the actual separators. Any
editor save / autoformat / git tooling that normalizes invisibles would
silently break the encoder. Replaced with explicit
/
Python escape sequences.
- set(seen) == set(names) hid ordering regressions: a cursor walk that
dropped a row at a page boundary or returned duplicates could pass.
Reworked the assertion to (1) reject duplicates, (2) require full
coverage, and (3) assert strict positional order for size sort, the
only field with a clock-independent ordering.
- Flaky time.sleep(0.05) between inserts: Windows CI clock resolution is
~15ms, so back-to-back inserts under load could collide and exercise
the tiebreaker instead of the documented path. Removed the sleep and
let the strengthened assertion above carry coverage / no-duplicates,
with size sort carrying strict order.
- Cursor error envelope diverged from the rest of routes.py: cursor 400s
emitted {error: {code, message}} while every other 400 in the file
emits {error: {code, message, details}} via _build_error_response.
Switched to _build_error_response and added the details field to the
AssetsApiError schema in openapi.yaml.
- "Byte-identity fixtures" only checked substring containment, defeating
the test class's stated purpose of pinning the wire format. Switched
to exact-bytes equality against an inline expected payload string per
fixture, so any whitespace / key-order / escape drift fails loudly.
Also dropped Go / json.Marshal references from docstrings — the byte
format is the contract, not the runtime that mints it.
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests-unit/assets_test/services/test_cursor.py (1)
263-270:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUse escape sequences instead of literal U+2028/U+2029 in these fixture values.
These literals reintroduce the invisible-source-character problem the production code just avoided. Writing them as
"\u2028"/"\u2029"keeps the test stable across editors and VCS tooling.🧪 Suggested change
- ("foo bar.png", "\\u2028"), # JS line separator - ("foo bar.png", "\\u2029"), # JS paragraph separator + ("foo\u2028bar.png", "\\u2028"), # JS line separator + ("foo\u2029bar.png", "\\u2029"), # JS paragraph separator🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests-unit/assets_test/services/test_cursor.py` around lines 263 - 270, Replace the literal U+2028 and U+2029 characters in the pytest parametrize fixture for "value, escaped_substring" with explicit escape sequences; specifically, update the entries currently written as ("foo bar.png", "\\u2028") and ("foo bar.png", "\\u2029") to use the string escape forms "\u2028" and "\u2029" (e.g., "foo\\u2028bar.png") so the test_cursor parametrized data contains visible escapes rather than invisible Unicode characters.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/assets/services/cursor.py`:
- Around line 77-95: The current check uses character counts
(len(value)/len(id)) against MAX_CURSOR_VALUE_LENGTH but the wire cap
MAX_ENCODED_CURSOR_LENGTH applies to the final serialized UTF-8 +
escape-expanded + base64 payload, so compute the serialized form first (the
payload/raw -> escaped raw -> base64 urlsafe bytes as in this file) and then
enforce MAX_ENCODED_CURSOR_LENGTH on the final encoded ASCII string (the value
returned by base64.urlsafe_b64encode(...).rstrip(b"=").decode("ascii")), raising
InvalidCursorError if it exceeds the encoded cap; keep or tighten the existing
character-length guard (MAX_CURSOR_VALUE_LENGTH) if desired, and add a
regression test that generates a cursor with many multi-byte and escapable
characters (e.g., repeated "<" and non-ASCII like "é") to ensure the encoder
accepts/declines appropriately.
---
Outside diff comments:
In `@tests-unit/assets_test/services/test_cursor.py`:
- Around line 263-270: Replace the literal U+2028 and U+2029 characters in the
pytest parametrize fixture for "value, escaped_substring" with explicit escape
sequences; specifically, update the entries currently written as ("foo
bar.png",
"\\u2028") and ("foo
bar.png", "\\u2029") to use the string escape forms
"\u2028" and "\u2029" (e.g., "foo\\u2028bar.png") so the test_cursor
parametrized data contains visible escapes rather than invisible Unicode
characters.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ca75bf61-4ef9-4fce-9c75-6e9c557fab60
📒 Files selected for processing (5)
app/assets/api/routes.pyapp/assets/services/cursor.pyopenapi.yamltests-unit/assets_test/services/test_cursor.pytests-unit/assets_test/test_list_cursor.py
Char-count guards on value/id can still let multibyte or escape-heavy inputs blow past MAX_ENCODED_CURSOR_LENGTH once UTF-8 + escape expansion + base64url runs. A 512-character name of 'é' (2 bytes UTF-8) or '<' (serializes to the 6-byte '<' escape) passes the char check, mints a ~1500-byte cursor, then 400s when handed back on the next request. Compute the final encoded form and reject it before returning if it exceeds the wire cap. Adds regression tests for both inflation paths.
| # Match the default JSON escaping of HTML-significant characters and JS | ||
| # line/paragraph separators (U+2028 / U+2029) so an asset name carrying | ||
| # any of them encodes to identical bytes across runtimes. None of these | ||
| # characters appear in JSON structural syntax, so a global replace on the | ||
| # serialized output can only touch encoded values. Use explicit \uXXXX | ||
| # escapes for U+2028 / U+2029 so the source survives any editor / git | ||
| # tooling that normalizes invisible separators. | ||
| raw = ( | ||
| raw.replace("<", "\\u003c") | ||
| .replace(">", "\\u003e") | ||
| .replace("&", "\\u0026") | ||
| .replace("\u2028", "\\u2028") | ||
| .replace("\u2029", "\\u2029") | ||
| ) |
There was a problem hiding this comment.
Is there not a function we can use to handle this instead of hand-duplicating the replacement?
| # Final wire-size guard: the per-field caps above are char-counted, but the | ||
| # wire cap applies to the base64url of the UTF-8-encoded, escape-expanded | ||
| # payload. A value full of multibyte or HTML-significant characters (e.g. | ||
| # 512 \u00d7 "\u00e9" or 512 \u00d7 "<") inflates well past MAX_ENCODED_CURSOR_LENGTH even | ||
| # though it passes the char-count check. Refuse to mint a cursor the decoder | ||
| # on the next request would reject. | ||
| if len(encoded) > MAX_ENCODED_CURSOR_LENGTH: | ||
| raise InvalidCursorError("encoded cursor exceeds maximum length") |
There was a problem hiding this comment.
Can we just set the MAX_ENCODED_CURSOR_LENGTH to be large enough to not need to worry about this? This seems like it would be a confusing failure condition for users if it ever happened.
Summary
afterquery parameter andnext_cursorresponse field onGET /api/assetsasc/descrequestsContract
Cursor format (encoded as base64url, no padding):
The
ofield binds the cursor to the sort direction it was minted under — replaying adesccursor against anascrequest fails withINVALID_CURSORrather than silently walking the wrong direction.ois mandatory on every payload.Time values are Unix microseconds (UTC) — microsecond precision matches PostgreSQL's
timestampso a stored timestamp round-trips through the cursor exactly, with no risk of dropping rows in the same millisecond bucket.Sort fields supported:
created_at,updated_at,name,size. Passingafterwithsort=last_access_timereturns 400INVALID_CURSOR.INVALID_CURSORINVALID_CURSORINVALID_CURSORnext_cursoromitted,has_morefalseImplementation
app/assets/services/cursor.py— opaque cursor encode/decode with wire-format caps (1024 encoded / 512 value / 128 id bytes; value cap matches theAssetReference.namecolumn)field <op> v OR (field = v AND id <op> cursor_id)) and a secondaryORDER BY idfor stable tiebreakInvalidCursorError→ 400INVALID_CURSORTests
tests-unit/assets_test/services/test_cursor.py):tests-unit/assets_test/test_list_cursor.py):next_cursoromitted when no more resultsOpenAPI
Spec updated in
openapi.yaml—afterparam on/api/assets,next_cursoronListAssetsResponse.