Track download size in `ps_buckets` #160

simolus3 · 2026-01-26T19:47:19Z

This adds a download_size column to ps_buckets. It is initialized to 0, the sync client increments it whenever it receives a data line for a particular bucket.

This also helps with #155: By tracking this in the Rust client, SDKs won't have to decode lines to correlate buckets with original message sizes.

rkistner

I'm happy with tracking this in the ps_buckets table.

Do we already have something similar to track the number of operations we have downloaded for the bucket? I know we have count_at_last / count_since_last, but I believe those are reset when the checkpoint completes, so not quite the same?

We should also keep in mind there are some differences in tracking it here versus the current diagnostics client logic:

Diagnostics client tracks empty buckets using the checkpoint data, this only tracks buckets when it has data. Empty buckets can be useful to see to distinguish parameter query / subquery issues from data query issues.
The diagnostics client keeps the bucket download history around when removing buckets, while the ps_buckets row can be deleted (and potentially re-created) when the bucket is removed, or when there is a checksum failure.

crates/core/src/sync/operations.rs

simolus3 · 2026-01-28T10:15:08Z

but I believe those are reset when the checkpoint completes, so not quite the same?

count_since_last would get reset on checkpoints since the download is complete. But count_at_last would get set to BucketChecksum.count on completed checkpoints, so it should be a reasonable estimate for the amount of downloaded operations? I think that would only be inaccurate for re-deploys without the versioned-bucket-names compatibility option, but in that case we'd redownload either way.

Diagnostics client tracks empty buckets using the checkpoint data, this only tracks buckets when it has data

That's true, but those phantom buckets are essentially in-memory only right? They would also be tracked through diagnostics from this PR, so SDKs could react to that (and infer a default downloaded size of zero for buckets that exist in memory but not in ps_buckets).

The diagnostics client keeps the bucket download history around when removing buckets, while the ps_buckets row can be deleted (and potentially re-created) when the bucket is removed, or when there is a checksum failure.

That is something worth remembering. I can see pros and cons for either approach (a growing download size without any notification when we've thrown a bucket away and are downloading it again is also not that helpful).

rkistner · 2026-01-28T11:00:36Z

But count_at_last would get set to BucketChecksum.count on completed checkpoints, so it should be a reasonable estimate for the amount of downloaded operations? I think that would only be inaccurate for re-deploys without the versioned-bucket-names compatibility option, but in that case we'd redownload either way.

I think that's fine. We should perhaps then just add a note somewhere in the diagnostics client on what it specifically means (i.e. it's the count from the server, not the actual count on the client).

That's true, but those phantom buckets are essentially in-memory only right? They would also be tracked through diagnostics from this PR, so SDKs could react to that (and infer a default downloaded size of zero for buckets that exist in memory but not in ps_buckets).

I think it's semi-persisted in a localOnly table at the moment, but it's fine to have it in-memory only.

That is something worth remembering. I can see pros and cons for either approach (a growing download size without any notification when we've thrown a bucket away and are downloading it again is also not that helpful).

Yeah, I think that's fine.

We can also keep track of the total downloaded data size for the current session in-memory on the client (i.e. count the bytes, no need to inspect the messages). That plus the new diagnostics events should be sufficient to identify cases of for example repeatedly re-downloading the same buckets.

simolus3 added 2 commits January 26, 2026 18:02

Track download size

256fb53

Don't track for legacy oplog vtab

555b467

simolus3 requested a review from rkistner January 26, 2026 19:47

rkistner reviewed Jan 27, 2026

View reviewed changes

crates/core/src/sync/operations.rs Outdated Show resolved Hide resolved

download_size -> downloaded_size

77e8f05

rkistner approved these changes Jan 28, 2026

View reviewed changes

simolus3 merged commit b61f87c into main Jan 28, 2026
25 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Track download size in `ps_buckets` #160

Track download size in `ps_buckets` #160

simolus3 commented Jan 26, 2026

Uh oh!

rkistner left a comment

Uh oh!

Uh oh!

simolus3 commented Jan 28, 2026

Uh oh!

rkistner commented Jan 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Track download size in ps_buckets #160

Track download size in ps_buckets #160

Conversation

simolus3 commented Jan 26, 2026

Uh oh!

rkistner left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

simolus3 commented Jan 28, 2026

Uh oh!

rkistner commented Jan 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Track download size in `ps_buckets` #160

Track download size in `ps_buckets` #160