Skip to content

feat(assets): cursor-based pagination on GET /api/assets#14014

Open
mattmillerai wants to merge 14 commits into
masterfrom
matt/be-944-core-cursor-based-pagination-for-get-apiassets
Open

feat(assets): cursor-based pagination on GET /api/assets#14014
mattmillerai wants to merge 14 commits into
masterfrom
matt/be-944-core-cursor-based-pagination-for-get-apiassets

Conversation

@mattmillerai
Copy link
Copy Markdown
Contributor

@mattmillerai mattmillerai commented May 20, 2026

Summary

  • Adds after query parameter and next_cursor response field on GET /api/assets
  • Cursors are opaque to the client; sort direction is baked into the cursor so it can't be misapplied across asc/desc requests

Contract

Cursor format (encoded as base64url, no padding):

JSON{"s": sort_field, "v": value, "id": id, "o": order}

The o field binds the cursor to the sort direction it was minted under — replaying a desc cursor against an asc request fails with INVALID_CURSOR rather than silently walking the wrong direction. o is mandatory on every payload.

Time values are Unix microseconds (UTC) — microsecond precision matches PostgreSQL's timestamp so a stored timestamp round-trips through the cursor exactly, with no risk of dropping rows in the same millisecond bucket.

Sort fields supported: created_at, updated_at, name, size. Passing after with sort=last_access_time returns 400 INVALID_CURSOR.

Scenario Behavior
Cursor sort field ≠ request sort 400 INVALID_CURSOR
Cursor order ≠ request order 400 INVALID_CURSOR
Malformed / oversized cursor 400 INVALID_CURSOR
Last page reached next_cursor omitted, has_more false
Asset deleted between pages Subsequent cursors still valid; skip missing rows

Implementation

  • New app/assets/services/cursor.py — opaque cursor encode/decode with wire-format caps (1024 encoded / 512 value / 128 id bytes; value cap matches the AssetReference.name column)
  • Query layer adds a keyset WHERE clause (field <op> v OR (field = v AND id <op> cursor_id)) and a secondary ORDER BY id for stable tiebreak
  • Handler maps InvalidCursorError → 400 INVALID_CURSOR

Tests

  • Unit tests on the cursor module (tests-unit/assets_test/services/test_cursor.py):
    • Encode/decode round-trip across sort field types (string, int, time, unicode)
    • Microsecond precision preserved through the wire
    • UTC normalization for time values
    • All invalid-cursor branches: oversized cursor, not base64, not JSON, empty/oversized id, oversized value, unsupported sort field, non-dict payload, missing/mismatched order
    • Per-field-cap encoding fits within the wire cap
    • Byte-identity wire-format fixtures so silent drift fails the test
  • Integration tests on the live HTTP endpoint (tests-unit/assets_test/test_list_cursor.py):
    • Full multi-page cursor walk returns items in order with no duplicates
    • Invalid cursor → 400
    • Sort/cursor mismatch → 400
    • next_cursor omitted when no more results
    • Pagination stable across deletes of already-returned rows
  • Existing list-filter tests pass unchanged

OpenAPI

Spec updated in openapi.yamlafter param on /api/assets, next_cursor on ListAssetsResponse.

Add 'after' query param and 'next_cursor' response field for keyset
pagination. Matches the cloud Go implementation (BE-893) so frontend
sees a unified contract across runtimes. Offset/limit remain as a
deprecated fallback.
Port of cloud common/pagination/cursor.go. Wire format is base64url of
{"s", "v", "id"} JSON; times are Unix microseconds UTC to match
PostgreSQL timestamp precision.

Includes a byte-identity fixture pinned against the cloud Go wire
format so cross-runtime FE pagination can't silently drift.
list_assets_page accepts an opaque 'after' cursor and returns
next_cursor when more pages are available. The query applies a keyset
WHERE clause and a secondary ORDER BY id for deterministic tiebreak.

Cursor sort field is validated against the request sort, and a
last_access_time sort (OSS-only) falls back to offset/limit. Offset is
ignored whenever a cursor is supplied.
Adds integration tests for: full cursor walk, invalid-cursor 400,
sort/cursor mismatch 400, cursor-wins-over-offset, absent next_cursor
when no more results, and pagination stability across deletes.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 20, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d529ea2c-8a79-4263-90f0-741c92f79291

📥 Commits

Reviewing files that changed from the base of the PR and between 37764dc and 2015bbb.

📒 Files selected for processing (2)
  • app/assets/services/cursor.py
  • tests-unit/assets_test/services/test_cursor.py

📝 Walkthrough

Walkthrough

This PR adds opaque keyset cursor pagination to GET /api/assets: a new cursor module implements encode/decode and validation (with InvalidCursorError), request/response schemas gain after/next_cursor, DB queries accept typed cursor inputs and use (sort_col,id) keyset filters, the service layer decodes/validates cursors and mints next-cursors via over-fetch, the API route maps InvalidCursorError to HTTP 400, OpenAPI is updated, and unit/integration tests cover primitives and end-to-end behavior.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 38.10% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat(assets): cursor-based pagination on GET /api/assets' clearly and concisely summarizes the main change—adding cursor-based pagination to the assets API endpoint.
Description check ✅ Passed The description provides comprehensive detail about the cursor implementation, contract, integration points, test coverage, and API changes, all directly related to the changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/assets/services/cursor.py`:
- Around line 117-125: In decode_cursor_time, catching only ValueError lets
extreme microsecond inputs bubble up from _unix_micros_to_datetime (which uses
datetime.fromtimestamp) as OSError or OverflowError; modify decode_cursor_time
to wrap the call to _unix_micros_to_datetime in a try/except that catches
OSError and OverflowError (in addition to the existing ValueError handling) and
re-raises InvalidCursorError with a clear message like "value is not a valid
timestamp" including the original exception text so out-of-range or absurdly
large cursor values produce a 400 INVALID_CURSOR instead of a 500.

In `@openapi.yaml`:
- Around line 1520-1523: The OpenAPI parameter "offset" is marked deprecated
only in prose; add a machine-readable deprecation flag by setting deprecated:
true on the "offset" parameter definition in openapi.yaml so generated
docs/clients surface it (keep the existing description and behavior that "after"
wins when both are supplied). Locate the parameter named offset in the same
parameter block that mentions "after" and add the deprecated: true property.

In `@tests-unit/assets_test/test_list_cursor.py`:
- Around line 99-110: The test reads JSON from HTTP responses without first
asserting success; update the requests in test_list_cursor.py that set r =
http.get(...) (the calls querying "/api/assets" with params including
"include_tags" and "limit") to validate the response before parsing: either call
r.raise_for_status() or assert r.status_code == 200 (or expected status)
immediately after each http.get and before r.json(), and apply this same change
to the other similar request sites in the file (the other r = http.get(...)
usages around the cursor/offset tests) so failures surface as HTTP errors
instead of JSON/key errors.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 22c42e99-003e-4fe5-8742-557ea32d6515

📥 Commits

Reviewing files that changed from the base of the PR and between a8d2519 and df1f6a7.

📒 Files selected for processing (10)
  • app/assets/api/routes.py
  • app/assets/api/schemas_in.py
  • app/assets/api/schemas_out.py
  • app/assets/database/queries/asset_reference.py
  • app/assets/services/asset_management.py
  • app/assets/services/cursor.py
  • app/assets/services/schemas.py
  • openapi.yaml
  • tests-unit/assets_test/services/test_cursor.py
  • tests-unit/assets_test/test_list_cursor.py

Comment thread app/assets/services/cursor.py Outdated
Comment thread openapi.yaml
Comment thread tests-unit/assets_test/test_list_cursor.py
- Mint next_cursor on every cursor-supported sort, not only when 'after'
  was supplied. A first request (no 'after') previously returned
  next_cursor=None, leaving cursor mode unreachable from a clean start.
- Over-fetch limit+1 so an exactly-full terminal page doesn't mint a
  spurious cursor pointing at a phantom next page.
- Map crafted out-of-range microsecond cursors (OverflowError / OSError
  in datetime construction) to 400 INVALID_CURSOR instead of leaking 500.
- Bump MAX_CURSOR_VALUE_LENGTH 256 -> 512 to match the AssetReference
  name column max; without this, a long-named asset minted a cursor the
  same server then refused on the next request. Cross-runtime byte
  identity with cloud is unaffected because no cloud cursor ever carries
  a value > 256 (cloud schema doesn't permit it).
- Return None from _encode_next_cursor when the boundary row carries a
  NULL sort value (e.g. an Asset without size_bytes backfilled), instead
  of silently encoding 0 and mis-positioning the keyset.
- Fix schemas_in.py comment so it matches actual handler behavior
  (last_access_time + 'after' raises 400, does not fall back).
- Add AssetsApiError schema + 400 response to GET /api/assets in
  openapi.yaml so generated clients know the INVALID_CURSOR envelope.
- Extend integration coverage: first-page mint, exact-multiple terminal
  page, cursor walks for created_at/updated_at/size sorts, datetime
  overflow surfaces as 400 not 500.
- Add unit coverage for datetime overflow and 512-char round-trip.
Address three needs-judgment items from the cursor-review judge synthesis:

1. Cursor wire format now includes an "o" key carrying the sort
   direction ("asc" / "desc") it was minted under. A request that
   replays the cursor with a flipped `order` parameter is rejected
   with 400 INVALID_CURSOR instead of silently walking the wrong
   direction. Legacy cursors without "o" still decode (the binding
   is best-effort until cloud mirrors the field — follow-up filed
   separately).

2. JSON serialization now escapes `<`, `>`, `&`, U+2028, U+2029
   to mirror Go's default `json.Marshal` behavior. Without this, an
   asset name containing those characters produced different bytes on
   Python vs cloud Go. The escaped form is what both runtimes emit.

3. Add direct query-layer tests for the keyset tiebreaker — the secondary
   ORDER BY id branch was previously unexercised. Two scenarios: all
   rows share a primary sort value, and mixed ties straddle page
   boundaries. Both assert no row is dropped or duplicated across the
   walk.

Wire-format note: Python cursors now differ from current cloud cursors
by exactly the "o" key. Cloud follow-up will bring the two back into
byte alignment.
- Soften offset param prose: it's not deprecated, just not preferred for
  sequential walks. Random-access UIs (jump-to-page, item count displays)
  legitimately still want offset, so dropping the 'deprecated' framing
  rather than promoting it to a machine-readable deprecated:true flag.
- Add explicit HTTP status assertions before every json() / next_cursor
  read in test_list_cursor.py so a failing request surfaces as an HTTP
  error instead of a confusing KeyError on a 4xx/5xx body.
Cursor pagination hasn't shipped on either runtime yet — this PR is
still draft and cloud's mirror is just behind it — so there are no
legacy no-o cursors in the wild. Make o mandatory from day one
rather than landing permissive and tightening later.

decode_cursor now rejects any payload without o (or with a non-string
o) as malformed. CursorPayload.order becomes a required str. Tests
that constructed CursorPayload directly now pass order="desc";
test_legacy_cursor_without_order_accepted flips to
test_cursor_without_order_rejected.
Strip prose references to sibling Go implementations and external
ticket IDs from cursor.py, the cursor tests, the keyset integration
tests, asset_management's sort-field comment, and the legacy
prompt_id alias comment. Pure docstring/comment scrub — no behavior
or wire-format changes. x-runtime: [cloud] field annotations in
openapi.yaml are unchanged; those are the spec's structural
cross-runtime convention, not internal references.
@mattmillerai mattmillerai marked this pull request as ready for review May 21, 2026 20:18
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
app/assets/api/routes.py (1)

226-230: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Keep INVALID_CURSOR on the standard error envelope.

This new branch bypasses _build_error_response, so INVALID_CURSOR is returned without the usual details: {} field. That introduces a response-shape inconsistency for clients consuming the shared error format.

Suggested fix
     except InvalidCursorError as e:
-        return web.json_response(
-            {"error": {"code": "INVALID_CURSOR", "message": str(e)}},
-            status=400,
-        )
+        return _build_error_response(400, "INVALID_CURSOR", str(e))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/assets/api/routes.py` around lines 226 - 230, The except
InvalidCursorError branch returns a custom dict and skips the shared envelope;
update the handler to use the existing _build_error_response helper so the
response keeps the standard shape (including details: {}). Replace the manual
return in the InvalidCursorError except block with a call to
_build_error_response passing the error payload (code "INVALID_CURSOR" and
message str(e)) and return that via web.json_response with status=400 so
InvalidCursorError and _build_error_response are used to build the consistent
error envelope.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests-unit/assets_test/services/test_cursor.py`:
- Around line 286-294: The test test_encoded_payload_shape_pinned currently only
checks for substring presence and should instead assert exact wire bytes to pin
byte identity; replace the substring assertions in that test with a single
equality check comparing decoded_bytes to the canonical expected bytes provided
by the TestByteIdentityFixtures (or a literal expected bytes sequence
constructed with the documented key order and exact serialization) so the test
fails on any change in key order, quoting or whitespace; use
encode_cursor/decoded_bytes as-is and assert decoded_bytes == expected_bytes
(and update the fixture to expose expected_bytes if needed).

In `@tests-unit/assets_test/test_list_cursor.py`:
- Around line 279-285: The test builds a cursor payload (payload / cursor) that
omits the required "o" (order) key so the request may fail before reaching the
microsecond-overflow decode branch; update the payload to include "o" with a
value matching the request param "sort" (e.g., "asc" or "desc") and ensure the
http.get params include the same order value so the request exercises the
intended decode path rather than an earlier missing-key failure.

---

Outside diff comments:
In `@app/assets/api/routes.py`:
- Around line 226-230: The except InvalidCursorError branch returns a custom
dict and skips the shared envelope; update the handler to use the existing
_build_error_response helper so the response keeps the standard shape (including
details: {}). Replace the manual return in the InvalidCursorError except block
with a call to _build_error_response passing the error payload (code
"INVALID_CURSOR" and message str(e)) and return that via web.json_response with
status=400 so InvalidCursorError and _build_error_response are used to build the
consistent error envelope.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1c513ea2-61ff-49aa-b912-70a1d0451bcc

📥 Commits

Reviewing files that changed from the base of the PR and between df1f6a7 and e4508af.

📒 Files selected for processing (8)
  • app/assets/api/routes.py
  • app/assets/api/schemas_in.py
  • app/assets/services/asset_management.py
  • app/assets/services/cursor.py
  • openapi.yaml
  • tests-unit/assets_test/queries/test_asset_reference_keyset.py
  • tests-unit/assets_test/services/test_cursor.py
  • tests-unit/assets_test/test_list_cursor.py

Comment thread tests-unit/assets_test/services/test_cursor.py Outdated
Comment thread tests-unit/assets_test/test_list_cursor.py Outdated
The boundary test was building a cursor without the required `o` key, so
decode failed on the missing-order branch before reaching the µs-overflow
path the test is asserting. Both paths return 400 INVALID_CURSOR so the
assertion passed for the wrong reason. Add `o` to the payload and matching
`order=` to the request so the decode reaches the intended branch.
Six fact-checked findings from the multi-model review pass:

- Encoder/decoder length asymmetry: encode_cursor now rejects empty id,
  oversized id (>128), oversized value (>512), and invalid order tokens
  symmetrically with decode_cursor. Prevents the same server from minting
  a cursor it then 400s on the next request (e.g. a filesystem-scanned
  asset name >512 chars). The bad-order path now raises InvalidCursorError
  (still subclasses ValueError) so route-layer handling stays uniform.
- Raw U+2028/U+2029 in cursor.py source: ripgrep treated those lines as
  line-terminators, confirming the bytes were the actual separators. Any
  editor save / autoformat / git tooling that normalizes invisibles would
  silently break the encoder. Replaced with explicit 
 / 

  Python escape sequences.
- set(seen) == set(names) hid ordering regressions: a cursor walk that
  dropped a row at a page boundary or returned duplicates could pass.
  Reworked the assertion to (1) reject duplicates, (2) require full
  coverage, and (3) assert strict positional order for size sort, the
  only field with a clock-independent ordering.
- Flaky time.sleep(0.05) between inserts: Windows CI clock resolution is
  ~15ms, so back-to-back inserts under load could collide and exercise
  the tiebreaker instead of the documented path. Removed the sleep and
  let the strengthened assertion above carry coverage / no-duplicates,
  with size sort carrying strict order.
- Cursor error envelope diverged from the rest of routes.py: cursor 400s
  emitted {error: {code, message}} while every other 400 in the file
  emits {error: {code, message, details}} via _build_error_response.
  Switched to _build_error_response and added the details field to the
  AssetsApiError schema in openapi.yaml.
- "Byte-identity fixtures" only checked substring containment, defeating
  the test class's stated purpose of pinning the wire format. Switched
  to exact-bytes equality against an inline expected payload string per
  fixture, so any whitespace / key-order / escape drift fails loudly.

Also dropped Go / json.Marshal references from docstrings — the byte
format is the contract, not the runtime that mints it.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests-unit/assets_test/services/test_cursor.py (1)

263-270: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use escape sequences instead of literal U+2028/U+2029 in these fixture values.

These literals reintroduce the invisible-source-character problem the production code just avoided. Writing them as "\u2028" / "\u2029" keeps the test stable across editors and VCS tooling.

🧪 Suggested change
-            ("foo
bar.png", "\\u2028"),  # JS line separator
-            ("foo
bar.png", "\\u2029"),  # JS paragraph separator
+            ("foo\u2028bar.png", "\\u2028"),  # JS line separator
+            ("foo\u2029bar.png", "\\u2029"),  # JS paragraph separator
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests-unit/assets_test/services/test_cursor.py` around lines 263 - 270,
Replace the literal U+2028 and U+2029 characters in the pytest parametrize
fixture for "value, escaped_substring" with explicit escape sequences;
specifically, update the entries currently written as ("foo
bar.png", "\\u2028")
and ("foo
bar.png", "\\u2029") to use the string escape forms "\u2028" and
"\u2029" (e.g., "foo\\u2028bar.png") so the test_cursor parametrized data
contains visible escapes rather than invisible Unicode characters.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/assets/services/cursor.py`:
- Around line 77-95: The current check uses character counts
(len(value)/len(id)) against MAX_CURSOR_VALUE_LENGTH but the wire cap
MAX_ENCODED_CURSOR_LENGTH applies to the final serialized UTF-8 +
escape-expanded + base64 payload, so compute the serialized form first (the
payload/raw -> escaped raw -> base64 urlsafe bytes as in this file) and then
enforce MAX_ENCODED_CURSOR_LENGTH on the final encoded ASCII string (the value
returned by base64.urlsafe_b64encode(...).rstrip(b"=").decode("ascii")), raising
InvalidCursorError if it exceeds the encoded cap; keep or tighten the existing
character-length guard (MAX_CURSOR_VALUE_LENGTH) if desired, and add a
regression test that generates a cursor with many multi-byte and escapable
characters (e.g., repeated "<" and non-ASCII like "é") to ensure the encoder
accepts/declines appropriately.

---

Outside diff comments:
In `@tests-unit/assets_test/services/test_cursor.py`:
- Around line 263-270: Replace the literal U+2028 and U+2029 characters in the
pytest parametrize fixture for "value, escaped_substring" with explicit escape
sequences; specifically, update the entries currently written as ("foo
bar.png",
"\\u2028") and ("foo
bar.png", "\\u2029") to use the string escape forms
"\u2028" and "\u2029" (e.g., "foo\\u2028bar.png") so the test_cursor
parametrized data contains visible escapes rather than invisible Unicode
characters.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ca75bf61-4ef9-4fce-9c75-6e9c557fab60

📥 Commits

Reviewing files that changed from the base of the PR and between cc62f2a and 37764dc.

📒 Files selected for processing (5)
  • app/assets/api/routes.py
  • app/assets/services/cursor.py
  • openapi.yaml
  • tests-unit/assets_test/services/test_cursor.py
  • tests-unit/assets_test/test_list_cursor.py

Comment thread app/assets/services/cursor.py Outdated
Char-count guards on value/id can still let multibyte or escape-heavy
inputs blow past MAX_ENCODED_CURSOR_LENGTH once UTF-8 + escape expansion
+ base64url runs. A 512-character name of 'é' (2 bytes UTF-8) or '<'
(serializes to the 6-byte '<' escape) passes the char check, mints
a ~1500-byte cursor, then 400s when handed back on the next request.

Compute the final encoded form and reject it before returning if it
exceeds the wire cap. Adds regression tests for both inflation paths.
@mattmillerai mattmillerai added the Core Core team dependency label May 22, 2026
Comment on lines +81 to +94
# Match the default JSON escaping of HTML-significant characters and JS
# line/paragraph separators (U+2028 / U+2029) so an asset name carrying
# any of them encodes to identical bytes across runtimes. None of these
# characters appear in JSON structural syntax, so a global replace on the
# serialized output can only touch encoded values. Use explicit \uXXXX
# escapes for U+2028 / U+2029 so the source survives any editor / git
# tooling that normalizes invisible separators.
raw = (
raw.replace("<", "\\u003c")
.replace(">", "\\u003e")
.replace("&", "\\u0026")
.replace("\u2028", "\\u2028")
.replace("\u2029", "\\u2029")
)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there not a function we can use to handle this instead of hand-duplicating the replacement?

Comment on lines +96 to +103
# Final wire-size guard: the per-field caps above are char-counted, but the
# wire cap applies to the base64url of the UTF-8-encoded, escape-expanded
# payload. A value full of multibyte or HTML-significant characters (e.g.
# 512 \u00d7 "\u00e9" or 512 \u00d7 "<") inflates well past MAX_ENCODED_CURSOR_LENGTH even
# though it passes the char-count check. Refuse to mint a cursor the decoder
# on the next request would reject.
if len(encoded) > MAX_ENCODED_CURSOR_LENGTH:
raise InvalidCursorError("encoded cursor exceeds maximum length")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just set the MAX_ENCODED_CURSOR_LENGTH to be large enough to not need to worry about this? This seems like it would be a confusing failure condition for users if it ever happened.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Core Core team dependency

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants