WIP: Troubleshoot the bucket limit for sync streams#374
Conversation
simolus3
left a comment
I've only checked the examples so far, not the suggested workarounds.
debugging/troubleshooting.mdx
Outdated
| | Subscription parameter: `WHERE project_id = subscription.parameter('project_id')` | 1 per unique parameter value the client subscribes with | | ||
| | Subquery returning N rows: `WHERE id IN (SELECT org_id FROM org_membership WHERE user_id = auth.user_id())` | N — one per result row of the subquery | | ||
| | INNER JOIN through an intermediate table: `SELECT tasks.* FROM tasks JOIN projects ON tasks.project_id = projects.id WHERE projects.org_id IN (...)` | N — one per row of the joined table (one per project) | | ||
| | Many-to-many JOIN: `SELECT assets.* FROM assets JOIN project_assets ON project_assets.asset_id = assets.id WHERE project_assets.project_id IN (...)` | N — one per primary table row (one per asset) | |
This is true, but I wonder if explaining it as a special case (the paragraph above also calls out many-to-many joins as an exception to the rule) is really that helpful. The two paragraphs below also point this out separately.
For subqueries and one-to-many JOINs, each row returned creates a bucket
That also applies here: there would be one bucket per `project_assets` row for the user. So each row returned in the joined table creates one bucket, and the many-to-many join doesn't make this an exception.
Maybe it makes sense to explain the general rule first: for each expression of the form `a = b`, `a IN b` or `a && b` where either `a` or `b` depends on the table being synced, we create a parameter. Assuming `a` is the side that depends on the table being synced, we create one bucket per row of `b`. All of the cases can be explained with that rule. It might be harder to grasp, but it avoids having to explain many-to-many joins separately.
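To make the rule concrete, here is a rough mental model in Python. It is only a sketch: `buckets_for_parameter` and the sample rows are hypothetical and say nothing about how the compiler is actually implemented. The point is that the count is the number of rows the parameter side `b` evaluates to for the user, regardless of which table those rows come from.

```python
# Mental model only: hypothetical names, not the actual sync streams compiler.
def buckets_for_parameter(b_rows):
    """One bucket per row that the parameter side `b` evaluates to."""
    return len(b_rows)


# Subquery case: SELECT org_id FROM org_membership WHERE user_id = auth.user_id()
org_ids = ["org-1", "org-2"]

# Many-to-many case: the project_assets rows reachable for this user (made-up data).
project_asset_rows = [
    ("proj-1", "asset-a"),
    ("proj-1", "asset-b"),
    ("proj-2", "asset-c"),
]

subquery_buckets = buckets_for_parameter(org_ids)
join_buckets = buckets_for_parameter(project_asset_rows)
```

Both cases go through the same function, which is why the many-to-many join would not need its own row in the table.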
| | No parameters: `SELECT * FROM regions` | 1 global bucket, shared by all users | | ||
| | Direct auth filter only: `WHERE user_id = auth.user_id()` | 1 per user | | ||
| | Subscription parameter: `WHERE project_id = subscription.parameter('project_id')` | 1 per unique parameter value the client subscribes with | | ||
| | Subquery returning N rows: `WHERE id IN (SELECT org_id FROM org_membership WHERE user_id = auth.user_id())` | N — one per result row of the subquery | |
Maybe also add a JSON array example, e.g. `WHERE id IN auth.parameter('project_ids')` would give `jwt.project_ids.length` buckets.
debugging/troubleshooting.mdx
Outdated
| user_projects → [proj-1, proj-2, proj-3, proj-4, proj-5, proj-6] (6 values) | ||
| ``` | ||
|
|
||
| Each query creates its own bucket namespace, even when two queries use the same CTE: |
This is not generally true. The compiler is allowed to merge the projects and tasks queries into a single bucket in this example, precisely because the CTE is the same (or, more generally, because the buckets have the same instantiation and are part of the same stream).
(I haven't checked whether the buckets are actually merged in this case, but we are supposed to be able to exploit that.)
debugging/troubleshooting.mdx
Outdated
| | `tasks` | `user_projects` | proj-1 … proj-6 | 6 | | ||
| | | | **Total** | **14** | | ||
|
|
||
| At scale — 10 orgs and 50 projects per org — this becomes 10 + 500 + 500 = 1,010 buckets, which exceeds the limit. |
This O(n * m) blowup still applies if the buckets are merged and is a good thing to be aware of so I think we should mention it fwiw. But removing one of the projects / tasks queries might be easier to understand.
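A quick back-of-the-envelope sketch of why the blowup survives merging. It assumes that merging collapses the projects and tasks buckets into one bucket per project, which has not been verified here; `bucket_count` is a hypothetical helper, not part of any API.

```python
# Hypothetical helper for the arithmetic in the doc example; not a real API.
def bucket_count(orgs, projects_per_org, merged):
    org_buckets = orgs
    project_buckets = orgs * projects_per_org  # the O(n * m) term
    # Assumption: when merged, projects and tasks share one bucket per project.
    queries_per_project = 1 if merged else 2
    return org_buckets + queries_per_project * project_buckets


unmerged_total = bucket_count(10, 50, merged=False)  # 10 + 500 + 500
merged_total = bucket_count(10, 50, merged=True)     # 10 + 500
```

Merging halves the per-project term, but the total still grows with orgs times projects per org.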
| ↔ users (org_membership.user_id → users.id) | ||
| ``` | ||
|
|
||
| | Query pattern | Buckets per user | |
Perhaps also add an example using multiple parameters (`WHERE id IN (SELECT org_id FROM org_membership WHERE user_id = auth.user_id()) AND region = subscription.parameter('region')`).
This would give one bucket per (org_id, region) pair, so up to N * M for N org ids and M distinct subscriptions.
We could also explain OR separately (those would give N + M buckets in most cases).
This is currently not necessarily an exhaustive list
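For the multi-parameter case, a small sketch of the difference between `AND` and `OR`. The values are made up and the `("org", ...)` / `("region", ...)` tuples are just an illustrative way to label buckets, not an actual naming scheme.

```python
from itertools import product

org_ids = ["org-1", "org-2", "org-3"]  # N = 3 rows from the subquery
regions = ["eu", "us"]                 # M = 2 distinct subscriptions

# AND: one bucket per (org_id, region) pair, so up to N * M.
and_buckets = list(product(org_ids, regions))

# OR: each side contributes its own buckets, so N + M in most cases.
or_buckets = [("org", o) for o in org_ids] + [("region", r) for r in regions]
```

With these inputs `AND` yields 6 buckets and `OR` yields 5; the gap widens quickly as N and M grow.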