perf(sync): eliminate scheduling delay and increase download buffer#9992
perf(sync): eliminate scheduling delay and increase download buffer#9992qoole wants to merge 3 commits into
Conversation
|
Hello there, We hope that the review process is going smooth and is helpful for you. We want to ensure your pull request is reviewed to your satisfaction. If you have a moment, our community management team would very much appreciate your feedback on your experience with this PR review process. Your feedback is valuable to us as we continuously strive to improve our community developer experience. Please take a moment to complete our short survey by clicking on the following link: https://cloud.nextcloud.com/apps/forms/s/i9Ago4EQRZ7TWxjfmeEpPkf6 Thank you for contributing to Nextcloud and we hope to hear from you soon! (If you believe you should not receive this message, you can add yourself to the blocklist.) |
mgallien
left a comment
There was a problem hiding this comment.
can you add some benchmark in the test folder to prove that we would indeed really gain the 30 seconds for 10 000 files ?
I would need to be convinced
you also need to split unrelated changes in different commits with proper commit messages ?
the change of the buffer size is unrelated to the other change
scheduleNextJob() deferred the next scheduling pass with QTimer::singleShot(3, ...). The _jobScheduled flag already guarantees only one scheduling pass is pending at a time, so the 3ms delay served no purpose other than throttling. Because the flag serializes passes, each of the ~N scheduling passes for a sync of N files paid the full 3ms latency back-to-back; for 10,000 small files that is ~30 seconds of pure scheduling latency. Replace it with QMetaObject::invokeMethod(Qt::QueuedConnection), which defers to the next event-loop iteration (preserving re-entrancy safety) without the artificial delay. Signed-off-by: Qoole <2862661+qoole@users.noreply.github.com>
The per-iteration read buffer in GETFileJob was capped at 8KB. Raising it to 256KB (still bounded by reply()->bytesAvailable() via qMin) cuts the number of read/write iterations by up to 32x for fast downloads, matching the scale of the existing 500KB checksum buffer. Signed-off-by: Qoole <2862661+qoole@users.noreply.github.com>
Add ScheduleDelay benchmark isolating the deferral mechanism used by OwncloudPropagator::scheduleNextJob(). Because the propagator serializes scheduling passes with a single _jobScheduled flag, each of the N passes for an N-file sync pays the deferral latency back-to-back. The benchmark runs 10,000 such passes through both mechanisms. Measured locally (RelWithDebInfo, Windows/MSVC): ROUNDS 10000 QTimer::singleShot(3ms): 34845 ms QMetaObject::invokeMethod(QueuedConnection): 472 ms SAVED: 34373 ms This confirms the ~30 s scheduling latency eliminated for a 10,000-file sync by replacing the 3 ms QTimer with a queued invocation. Signed-off-by: Qoole <2862661+qoole@users.noreply.github.com>
c0615ec to
caabebc
Compare
|
Addressed both points. 1. Split unrelated changes into separate commits:
2. Benchmark for the 30 s claim. Added So ~34 s of pure scheduling latency is removed for a 10,000-file sync, confirming the original estimate. The benchmark is runnable in CI so you can verify the delta independently. |
Summary
Two related sync throughput improvements:
scheduleNextJob3ms timerOwncloudPropagator::scheduleNextJobschedules the next call viaQTimer::singleShot(3, ...). The 3ms delay was originally there to debounce re-entry, but the_jobScheduledflag at the top of the function already prevents that — once set, repeat calls return immediately.Replaced with
QMetaObject::invokeMethod(Qt::QueuedConnection)which posts to the event loop with no artificial delay.With 10,000 small files this delay accumulates to roughly 30 seconds of scheduling latency that is pure idle time on the event loop. On a sync of many tiny files this is the dominant cost of the propagator.
Download read buffer
GETFileJob::newReplyHookandslotMetaDataChangedboth setreply->setReadBufferSize(16 * 1024). The original comment says "keep low so we can easier limit the bandwidth", butBandwidthManagerquantizes byqMin(bytesAvailable, quota)perreadyRead, so the buffer size only changes the maximum granularity of a single read, not the limiting precision.Increased to
256 * 1024to match the order of magnitude of the checksum buffer (500KB). Bandwidth limits still apply correctly via the per-readqMinclamp; they simply operate on slightly larger reads when no limit is set. For unthrottled downloads this reduces read/write iteration count by 16x.Checklist
AI (if applicable)