Skip to content

perf(sync): eliminate scheduling delay and increase download buffer#9992

Open
qoole wants to merge 3 commits into
nextcloud:masterfrom
qoole:perf/sync-scheduling
Open

perf(sync): eliminate scheduling delay and increase download buffer#9992
qoole wants to merge 3 commits into
nextcloud:masterfrom
qoole:perf/sync-scheduling

Conversation

@qoole
Copy link
Copy Markdown
Contributor

@qoole qoole commented May 7, 2026

Summary

Two related sync throughput improvements:

scheduleNextJob 3ms timer

OwncloudPropagator::scheduleNextJob schedules the next call via QTimer::singleShot(3, ...). The 3ms delay was originally there to debounce re-entry, but the _jobScheduled flag at the top of the function already prevents that — once set, repeat calls return immediately.

Replaced with QMetaObject::invokeMethod(Qt::QueuedConnection) which posts to the event loop with no artificial delay.

With 10,000 small files this delay accumulates to roughly 30 seconds of scheduling latency that is pure idle time on the event loop. On a sync of many tiny files this is the dominant cost of the propagator.

Download read buffer

GETFileJob::newReplyHook and slotMetaDataChanged both set reply->setReadBufferSize(16 * 1024). The original comment says "keep low so we can easier limit the bandwidth", but BandwidthManager quantizes by qMin(bytesAvailable, quota) per readyRead, so the buffer size only changes the maximum granularity of a single read, not the limiting precision.

Increased to 256 * 1024 to match the order of magnitude of the checksum buffer (500KB). Bandwidth limits still apply correctly via the per-read qMin clamp; they simply operate on slightly larger reads when no limit is set. For unthrottled downloads this reduces read/write iteration count by 16x.

Checklist

AI (if applicable)

  • The content of this PR was partly or fully generated using AI

@github-actions
Copy link
Copy Markdown

Hello there,
Thank you so much for taking the time and effort to create a pull request to our Nextcloud project.

We hope that the review process is going smooth and is helpful for you. We want to ensure your pull request is reviewed to your satisfaction. If you have a moment, our community management team would very much appreciate your feedback on your experience with this PR review process.

Your feedback is valuable to us as we continuously strive to improve our community developer experience. Please take a moment to complete our short survey by clicking on the following link: https://cloud.nextcloud.com/apps/forms/s/i9Ago4EQRZ7TWxjfmeEpPkf6

Thank you for contributing to Nextcloud and we hope to hear from you soon!

(If you believe you should not receive this message, you can add yourself to the blocklist.)

Copy link
Copy Markdown
Collaborator

@mgallien mgallien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add some benchmark in the test folder to prove that we would indeed really gain the 30 seconds for 10 000 files ?
I would need to be convinced
you also need to split unrelated changes in different commits with proper commit messages ?
the change of the buffer size is unrelated to the other change

qoole added 3 commits May 26, 2026 20:25
scheduleNextJob() deferred the next scheduling pass with
QTimer::singleShot(3, ...). The _jobScheduled flag already guarantees
only one scheduling pass is pending at a time, so the 3ms delay served
no purpose other than throttling. Because the flag serializes passes,
each of the ~N scheduling passes for a sync of N files paid the full
3ms latency back-to-back; for 10,000 small files that is ~30 seconds of
pure scheduling latency.

Replace it with QMetaObject::invokeMethod(Qt::QueuedConnection), which
defers to the next event-loop iteration (preserving re-entrancy safety)
without the artificial delay.

Signed-off-by: Qoole <2862661+qoole@users.noreply.github.com>
The per-iteration read buffer in GETFileJob was capped at 8KB. Raising
it to 256KB (still bounded by reply()->bytesAvailable() via qMin) cuts
the number of read/write iterations by up to 32x for fast downloads,
matching the scale of the existing 500KB checksum buffer.

Signed-off-by: Qoole <2862661+qoole@users.noreply.github.com>
Add ScheduleDelay benchmark isolating the deferral mechanism used by
OwncloudPropagator::scheduleNextJob(). Because the propagator serializes
scheduling passes with a single _jobScheduled flag, each of the N passes
for an N-file sync pays the deferral latency back-to-back. The benchmark
runs 10,000 such passes through both mechanisms.

Measured locally (RelWithDebInfo, Windows/MSVC):

  ROUNDS 10000
  QTimer::singleShot(3ms):                     34845 ms
  QMetaObject::invokeMethod(QueuedConnection):   472 ms
  SAVED:                                       34373 ms

This confirms the ~30 s scheduling latency eliminated for a 10,000-file
sync by replacing the 3 ms QTimer with a queued invocation.

Signed-off-by: Qoole <2862661+qoole@users.noreply.github.com>
@qoole qoole force-pushed the perf/sync-scheduling branch from c0615ec to caabebc Compare May 26, 2026 20:15
@qoole
Copy link
Copy Markdown
Contributor Author

qoole commented May 26, 2026

Addressed both points.

1. Split unrelated changes into separate commits:

  • 27e80d7 perf(sync): remove 3ms scheduling delay between propagation jobs
  • 1cfb96a perf(download): increase download read buffer from 8KB to 256KB

2. Benchmark for the 30 s claim. Added test/benchmarks/benchscheduledelay.cpp (caabebc), registered as the ScheduleDelay benchmark. It isolates the deferral mechanism: because scheduleNextJob() serializes passes with the single _jobScheduled flag, each of the N passes for an N-file sync pays the deferral latency back-to-back. Running 10,000 passes through each mechanism, measured locally (RelWithDebInfo, Windows/MSVC):

ROUNDS 10000
QTimer::singleShot(3ms):                     34845 ms
QMetaObject::invokeMethod(QueuedConnection):   472 ms
SAVED:                                       34373 ms

So ~34 s of pure scheduling latency is removed for a 10,000-file sync, confirming the original estimate. The benchmark is runnable in CI so you can verify the delta independently.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants