fix(eap): speed up GetTrace endpoint pagination #7614

xurui-c · 2026-01-05T22:28:05Z

Product team reports pagination to be slow, and we think making the order by clause of the query match the sort key of the CH table will help (consequently we change the pagination to match the order by). This is related to the pre-existing bug uncovered as part of #7586

We will be measuring the speed up by using our internal DD metrics

rollout plan:

monitor error count via DD: https://app.datadoghq.com/s/FH6-Y3/vet-mu7-eq2
monitor sentry issues
if things go wrong, click revert

kylemumma · 2026-01-05T22:38:57Z

snuba/web/rpc/v1/endpoint_get_trace.py

        ),
    ]
    new_order_by = [
-        OrderBy(
-            direction=OrderByDirection.ASC,
-            expression=column("timestamp"),
-        ),
-        OrderBy(
-            direction=OrderByDirection.ASC,
-            expression=column("item_timestamp"),
-        ),
-        OrderBy(
-            direction=OrderByDirection.ASC,
-            expression=column("item_id"),
-        ),
+        # we add organization_id and project_id to the order by to optimize data reading
+        # https://clickhouse.com/docs/sql-reference/statements/select/order-by#optimization-of-data-reading
+        OrderBy(direction=OrderByDirection.ASC, expression=column("organization_id")),
+        OrderBy(direction=OrderByDirection.ASC, expression=column("project_id")),
+        OrderBy(direction=OrderByDirection.ASC, expression=column("item_type")),


i think we still need item_timestamp in the order by (between trace_id and item_id). its a more precise timestamp than timestamp but we want to ensure its ordering. you can double check with pierre, and you will have to verify the implications on performance.

did you already test the performance of the existing order by using set_skip_transform_order_by? If not I think that would be worth testing first before changing the order by.

Copilot

Pull request overview

This PR fixes a pagination performance issue in the GetTrace endpoint by aligning the ORDER BY clause with the ClickHouse table's sort key. The mismatch was causing slow pagination as ClickHouse couldn't efficiently skip data.

Updated ORDER BY to match table sort key: organization_id, project_id, item_type, timestamp, trace_id, item_id
Added set_skip_transform_order_by(True) to prevent column transformations that would break the alignment
Added test to verify the SQL ORDER BY clause matches the expected ordering

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
snuba/web/rpc/v1/endpoint_get_trace.py	Updated ORDER BY clause to match ClickHouse table sort key (added organization_id, project_id, item_type, trace_id; removed item_timestamp alias). Added set_skip_transform_order_by(True) to preserve exact column names in ORDER BY.
tests/web/rpc/v1/test_endpoint_get_trace.py	Added new test `test_no_transformation_on_order_by` to verify that the ORDER BY clause in generated SQL matches the expected column order without transformations. Added necessary imports (QueryResult, run_query).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-01-05T22:38:10Z

snuba/web/rpc/v1/endpoint_get_trace.py

+        OrderBy(direction=OrderByDirection.ASC, expression=column("organization_id")),
+        OrderBy(direction=OrderByDirection.ASC, expression=column("project_id")),
+        OrderBy(direction=OrderByDirection.ASC, expression=column("item_type")),
+        OrderBy(direction=OrderByDirection.ASC, expression=column("timestamp")),


The new ORDER BY clause doesn't align with the existing pagination filter logic (page_token_filter at lines 254-283). The page_token_filter uses item_timestamp (a computed alias) and timestamp to determine the next page, but the new ORDER BY doesn't include item_timestamp. This mismatch means pagination will not work correctly.

For correct pagination behavior, either:

Update the page_token_filter to match this new ORDER BY (filtering on organization_id, project_id, item_type, timestamp, trace_id, item_id)

Include item_timestamp in the ORDER BY if precision-based pagination is required

Ensure the page_token stores and filters on all ORDER BY columns in the correct order

Currently, the page_token_filter uses timestamp and item_timestamp, but this ORDER BY only uses timestamp without item_timestamp, which will cause incorrect pagination boundaries.

Suggested change

OrderBy(direction=OrderByDirection.ASC, expression=column("timestamp")),

OrderBy(direction=OrderByDirection.ASC, expression=column("timestamp")),

OrderBy(direction=OrderByDirection.ASC, expression=column("item_timestamp")),

Copilot · 2026-01-05T22:38:10Z

tests/web/rpc/v1/test_endpoint_get_trace.py

+    def test_no_transformation_on_order_by(self, setup_teardown: Any, monkeypatch: Any) -> None:
+        # Wrap the real run_query to capture the actual QueryResult while still hitting ClickHouse.
+        captured: dict[str, Any] = {}
+
+        def wrapper(dataset, request, timer, robust: bool = False, concurrent_queries_gauge=None) -> QueryResult:  # type: ignore[no-untyped-def]
+            qr = run_query(dataset, request, timer, robust, concurrent_queries_gauge)
+            captured["query_result"] = qr
+            return qr
+
+        monkeypatch.setattr("snuba.web.rpc.v1.endpoint_get_trace.run_query", wrapper)
+
+        ts = Timestamp(seconds=int(_BASE_TIME.timestamp()))
+        three_hours_later = int((_BASE_TIME + timedelta(hours=3)).timestamp())
+        message = GetTraceRequest(
+            meta=RequestMeta(
+                project_ids=[1, 2, 3],
+                organization_id=1,
+                cogs_category="something",
+                referrer="something",
+                start_timestamp=ts,
+                end_timestamp=Timestamp(seconds=three_hours_later),
+                request_id=_REQUEST_ID,
+            ),
+            trace_id=_TRACE_ID,
+            items=[
+                GetTraceRequest.TraceItem(
+                    item_type=TraceItemType.TRACE_ITEM_TYPE_SPAN,
+                )
+            ],
+        )
+
+        EndpointGetTrace().execute(message)
+
+        qr = captured["query_result"]
+        assert (
+            "ORDER BY organization_id ASC, project_id ASC, item_type ASC, timestamp ASC, trace_id ASC, item_id ASC"
+            in qr.extra["sql"]
+        )


Consider explicitly setting state.set_config("enable_trace_pagination", 1) at the beginning of this test to make it clear that it's testing the new pagination order by behavior. While pagination is enabled by default (ENABLE_TRACE_PAGINATION_DEFAULT = 1), explicit configuration makes the test intent clearer and more resilient to future default changes.

volokluev · 2026-01-07T21:21:36Z

What will you monitor to see if this has an impact on the product in aggregate?

snuba/web/rpc/v1/endpoint_get_trace.py

+
+        # Update last seen pagination values
+        last_seen_item_type = int(row.pop("item_type", TraceItemType.TRACE_ITEM_TYPE_UNSPECIFIED))
+        last_seen_timestamp = row.get("integer_timestamp", None)


snuba/web/rpc/v1/endpoint_get_trace.py

+
+        # Update last seen pagination values
+        last_seen_item_type = int(row.pop("item_type", TraceItemType.TRACE_ITEM_TYPE_UNSPECIFIED))
+        last_seen_timestamp = row.pop("integer_timestamp", None)


snuba/web/rpc/v1/endpoint_get_trace.py

sentry · 2026-01-16T19:33:06Z

snuba/web/rpc/v1/endpoint_get_trace.py

+                                    column("trace_id"), literal(page_token.last_seen_trace_id)
+                                ),
+                                f.greater(
+                                    f.reinterpretAsUInt128(f.reverse(f.unhex(column("item_id")))),
+                                    literal(page_token.last_seen_item_id),
+                                ),
+                            ),
+                        ),
                    ),
                ),
            )


Bug: The keyset pagination filter omits organization_id, creating a mismatch with the ORDER BY clause and negating a performance optimization.
_{Severity: MEDIUM}

Suggested Fix

Update the pagination filter logic to include organization_id as the first condition. The filter conditions should be built to match the full ORDER BY clause: (organization_id, project_id, item_type, timestamp, trace_id, item_id). This will ensure the pagination is logically correct and fully leverages the intended ClickHouse optimization.

Prompt for AI Agent

Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid. Location: snuba/web/rpc/v1/endpoint_get_trace.py#L327-L394 Potential issue: The keyset pagination filter logic omits `organization_id`, while the `ORDER BY` clause includes it. A code comment explicitly states `organization_id` is added to the `ORDER BY` for ClickHouse performance optimization, which relies on the order matching the table's sort key. Keyset pagination requires the filter conditions to match the `ORDER BY` columns. Although current queries are filtered to a single organization, this mismatch renders the optimization ineffective and makes the pagination logic technically incorrect, risking silent failures if the API is ever used for cross-organization queries.

_{Did we get this right? 👍 / 👎 to inform future reviews.}

Rachel Chen added 2 commits January 5, 2026 14:25

fix(eap): GetTrace endpoint uses correct order by for pagination

56eb13b

breakpoint

1634737

xurui-c marked this pull request as ready for review January 5, 2026 22:31

xurui-c requested a review from a team as a code owner January 5, 2026 22:31

Copilot AI review requested due to automatic review settings January 5, 2026 22:31

xurui-c requested a review from a team as a code owner January 5, 2026 22:31

Copilot started reviewing on behalf of xurui-c January 5, 2026 22:31 View session

xurui-c changed the title ~~fix(eap): GetTrace endpoint uses correct order by for pagination~~ fix(eap): speed up GetTrace endpoint pagination Jan 5, 2026

sentry bot reviewed Jan 5, 2026

View reviewed changes

Copilot AI reviewed Jan 5, 2026

View reviewed changes

Rachel Chen and others added 3 commits January 9, 2026 13:50

pagination change

ba442fc

fix

d987994

timestamp

ff1ea91

sentry bot reviewed Jan 12, 2026

View reviewed changes

snuba/web/rpc/v1/endpoint_get_trace.py Outdated

# Update last seen pagination values

last_seen_item_type = int(row.pop("item_type", TraceItemType.TRACE_ITEM_TYPE_UNSPECIFIED))

last_seen_timestamp = row.get("integer_timestamp", None)

This comment was marked as outdated.

Sign in to view

fix

8cb5e4c

sentry bot reviewed Jan 12, 2026

View reviewed changes

snuba/web/rpc/v1/endpoint_get_trace.py

# Update last seen pagination values

last_seen_item_type = int(row.pop("item_type", TraceItemType.TRACE_ITEM_TYPE_UNSPECIFIED))

last_seen_timestamp = row.pop("integer_timestamp", None)

This comment was marked as outdated.

Sign in to view

phacops added bug Something isn't working and removed bug Something isn't working labels Jan 12, 2026

cleanup

46df390

sentry bot reviewed Jan 13, 2026

View reviewed changes

snuba/web/rpc/v1/endpoint_get_trace.py Outdated Show resolved Hide resolved

more rigorous snuba test

c68efe1

sentry bot reviewed Jan 15, 2026

View reviewed changes

snuba/web/rpc/v1/endpoint_get_trace.py Show resolved Hide resolved

Rachel Chen added 3 commits January 16, 2026 00:48

project id

dc2312e

fix

84d5973

fix

b7a1255

phacops approved these changes Jan 16, 2026

View reviewed changes

remove comment

b24d50b

sentry bot reviewed Jan 16, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

fix(eap): speed up GetTrace endpoint pagination #7614

fix(eap): speed up GetTrace endpoint pagination #7614

xurui-c commented Jan 5, 2026 •

edited

Loading

Uh oh!

This comment was marked as outdated.

Uh oh!

kylemumma Jan 5, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Jan 5, 2026

Uh oh!

Copilot AI Jan 5, 2026

Uh oh!

volokluev commented Jan 7, 2026

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

Uh oh!

Uh oh!

sentry bot Jan 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

	OrderBy(direction=OrderByDirection.ASC, expression=column("timestamp")),
	OrderBy(direction=OrderByDirection.ASC, expression=column("timestamp")),
	OrderBy(direction=OrderByDirection.ASC, expression=column("item_timestamp")),

Uh oh!

fix(eap): speed up GetTrace endpoint pagination #7614

Are you sure you want to change the base?

fix(eap): speed up GetTrace endpoint pagination #7614

Conversation

xurui-c commented Jan 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

This comment was marked as outdated.

Uh oh!

kylemumma Jan 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Jan 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 5, 2026

Choose a reason for hiding this comment

Uh oh!

volokluev commented Jan 7, 2026

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

Uh oh!

Uh oh!

sentry bot Jan 16, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

xurui-c commented Jan 5, 2026 •

edited

Loading