feat: Add configurable sync window (mail.max_sync_days) to limit database growth #12633
Rikdekker wants to merge 7 commits into nextcloud:main
…base growth
Add a new admin setting `mail.max_sync_days` that limits how far back the Mail app synchronizes messages from IMAP to the local database. At scale (10,000+ accounts), the database grows unbounded; this setting reduces it by 80-95% while preserving full access to older messages via IMAP search and on-demand fetching.
Components:
- Admin UI + API endpoint for setting the sync window
- SEARCH SINCE filter on initial sync (RFC 3501)
- Daily SyncWindowCleanupJob to purge old records (batched)
- BackfillSearchResultsJob for search results outside the window
- OnDemandSyncJob for scroll-past-cache older message loading
- DeepSyncJob to offload deep-sync from the main cron cycle
Default value 0 (unlimited) preserves current behavior, so this is not a breaking change.
Signed-off-by: Rikdekker <Rikdekker@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix import ordering (alphabetical) in SyncJob.php, SyncWindowCleanupJob.php
- Use single quotes for strings without interpolation
- Remove redundant do-while condition in SyncWindowCleanupJob
- Remove non-existent PostgreSQL120Platform class reference
- Fix dateSearch() signature: remove invalid 'UTC' 4th argument
Signed-off-by: Rikdekker <Rikdekker@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
c76b869 to cc5ef1d
insertBulkIgnore() uses INSERT IGNORE / ON CONFLICT DO NOTHING for deduplication, which requires a UNIQUE constraint on (mailbox_id, uid). The existing index is non-unique, so without this migration duplicate rows would be silently created.
Signed-off-by: Rikdekker <Rikdekker@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix constructor closing brace placement in ImapToDbSynchronizer
- Use single quotes for non-interpolated string starts
- Remove redundant (int) cast on lastInsertId()
- Add null coalescing for mb_strcut() arguments
Signed-off-by: Rikdekker <Rikdekker@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Move first constructor param to new line (php-cs multi-line rule)
- Add trailing comma after last constructor param
- Fix CRLF → LF line endings in migration file
- Remove redundant ?? '' on getEmail() (already non-null after null check)
Signed-off-by: Rikdekker <Rikdekker@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- MailSearchTest: add IConfig, ImapToDbSynchronizer, IJobList mocks
- AdminSettingsTest: expect 15 provideInitialState calls (was 14)
- Fix psalm PossiblyNullArgument on untagMessage via type assertion
Signed-off-by: Rikdekker <Rikdekker@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix alphabetical import order in MailSearchTest
- Skip testDeleteDuplicateUids: the new UNIQUE(mailbox_id, uid) constraint prevents the duplicate inserts this test relies on
Signed-off-by: Rikdekker <Rikdekker@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Thank you for your contribution! I have three questions.
ChristophWurst left a comment
Some comments about the code
 * Instead of blocking the HTTP request with a synchronous IMAP fetch,
 * MailSearch schedules this job. The frontend retries after a short delay
 * and picks up the newly synced messages from the local DB.
A standard deployment has a cron interval of 5 minutes. That means the asynchronous job will be delayed by up to 5 minutes if the cron queue is empty, and potentially longer if there are more time-critical jobs in the queue.
	->from($this->getTableName())
	->where($query->expr()->eq('mailbox_id', $query->createNamedParameter($mailbox->getId(), IQueryBuilder::PARAM_INT), IQueryBuilder::PARAM_INT))
	->andWhere($query->expr()->in('uid', $query->createNamedParameter($chunk, IQueryBuilder::PARAM_INT_ARRAY)));
$existing = array_merge($existing, $this->findUids($query));
perf: array_merge in a loop is very memory hungry
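One possible way to avoid this, as a minimal sketch: `array_merge()` returns a new array on every call, so the accumulated result is copied in each iteration; collecting the per-chunk results and merging once after the loop avoids that. The function names `findExistingUids()` and `collectExistingUids()` are hypothetical, and the first is only a database-free stand-in for `findUids($query)` from the diff above, not the PR's code.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for the per-chunk lookup (findUids($query) in the diff).
 * It returns the even UIDs of the chunk so the example runs without a database.
 *
 * @param int[] $chunk
 * @return int[]
 */
function findExistingUids(array $chunk): array {
	return array_values(array_filter($chunk, static fn (int $uid): bool => $uid % 2 === 0));
}

/**
 * Collect the per-chunk results first and merge them once at the end,
 * instead of calling array_merge() inside the loop.
 *
 * @param int[] $uids
 * @return int[]
 */
function collectExistingUids(array $uids, int $chunkSize = 1000): array {
	$perChunk = [];
	foreach (array_chunk($uids, $chunkSize) as $chunk) {
		$perChunk[] = findExistingUids($chunk);
	}
	// Single merge over all chunks; with no arguments array_merge() returns [] (PHP >= 7.4).
	return array_merge(...$perChunk);
}

$existing = collectExistingUids(range(1, 10), 4);
echo implode(',', $existing), PHP_EOL; // prints 2,4,6,8,10
```

The chunk size of 1000 is an assumption for illustration; the diff excerpt does not show the value the PR uses.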
 * Messages are marked as structure_analyzed to skip the PreviewEnhancer's
 * second IMAP connection -- structure is loaded lazily when opened.
The structure is loaded, true, but the metadata acquired and persisted during preprocessing will then be missing. That is:
- Flag if email has attachments
- Preview text
- Flag if it's an IMIP message
- Flag if it's encrypted
- Flag if it's mentioning the user
 * to set up and the deleteDuplicateUids() method unnecessary.
 */
public function testDeleteDuplicateUids(): void {
	$this->markTestSkipped('UNIQUE(mailbox_id, uid) constraint prevents duplicate inserts');
in that case the test can go
Heads-up: I have some ideas for an alternative approach that reduces database size while keeping UX, sorting and threading intact. I'll draft a ticket shortly.
Summary
Add a new admin setting `mail.max_sync_days` that limits how far back the Mail app synchronizes messages from IMAP to the local database. Default `0` (unlimited), so there is no breaking change for existing installations.
At scale (10,000+ accounts), the Mail database grows unbounded. In our production deployment:
- `mail_messages` (100K messages/account × 10K accounts)
- `SyncJob` blocks cron for 80–190 minutes
A 90-day sync window reduces database size by 80–95% while preserving full access to older messages via IMAP search and on-demand fetching.
Related issues
- #10646 (PHP Fatal error: Allowed memory size of 536870912 bytes exhausted (tried to allocate 11605029 bytes) in /var/www/html/nextcloud/apps/mail/lib/Db/MessageMapper.php on line 583): PHP Fatal error, memory exhausted during sync
How it works
1. Admin setting + UI
Numeric input in Admin Settings → Mail → Synchronization window:
2. Sync filter
When `max_sync_days > 0`, `ImapToDbSynchronizer::runInitialSync()` adds `SEARCH SINCE <date>` (RFC 3501 §6.4.4) via `Horde_Imap_Client_Search_Query::dateSearch()`. Only messages within the window are fetched.
3. Daily cleanup job
`SyncWindowCleanupJob` (TimedJob, every 24h) deletes records where `sent_at < now - max_sync_days`. Batched at 500 rows per transaction. No-op when `max_sync_days = 0`.
4. Search fallback
When IMAP search returns UIDs outside the local cache, `BackfillSearchResultsJob` fetches those specific messages asynchronously (batches of 200 UIDs).
5. On-demand scroll
When a user scrolls past cached messages, `OnDemandSyncJob` fetches the next batch from IMAP asynchronously.
6. Database migration
Upgrades the existing `(mailbox_id, uid)` index on `mail_messages` from non-unique to UNIQUE. This is required for the `INSERT IGNORE` / `ON CONFLICT DO NOTHING` deduplication used by the on-demand sync and search backfill jobs. The migration is idempotent and runs automatically via `occ upgrade`.
Architecture
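As a rough illustration of the window logic in steps 2 and 3 above (not the code in this PR), the sketch below derives the cutoff from `max_sync_days`, formats it as an RFC 3501 date for `SEARCH SINCE`, and splits expired row IDs into batches of 500. The function names `syncWindowCutoff()`, `imapSinceDate()` and `cleanupBatches()` are hypothetical, and it assumes the cutoff is taken in UTC; the PR itself builds the query through `Horde_Imap_Client_Search_Query::dateSearch()`.

```php
<?php
declare(strict_types=1);

/**
 * Cutoff for a sync window of $maxSyncDays days, or null when the window
 * is disabled (0 = unlimited, the default).
 */
function syncWindowCutoff(int $maxSyncDays, DateTimeImmutable $now): ?DateTimeImmutable {
	if ($maxSyncDays <= 0) {
		return null;
	}
	return $now->sub(new DateInterval('P' . $maxSyncDays . 'D'));
}

/**
 * Date in the RFC 3501 "date-text" form (day-month-year, e.g. 9-Dec-2025),
 * which is what a SEARCH SINCE criterion carries. UTC is an assumption here.
 */
function imapSinceDate(DateTimeImmutable $cutoff): string {
	return $cutoff->setTimezone(new DateTimeZone('UTC'))->format('j-M-Y');
}

/**
 * Split the IDs of expired rows into batches so each delete transaction stays small.
 *
 * @param int[] $expiredIds
 * @return int[][]
 */
function cleanupBatches(array $expiredIds, int $batchSize = 500): array {
	return array_chunk($expiredIds, $batchSize);
}

$now = new DateTimeImmutable('2026-03-09 12:00:00', new DateTimeZone('UTC'));
$cutoff = syncWindowCutoff(90, $now);
if ($cutoff !== null) {
	echo 'SEARCH SINCE ', imapSinceDate($cutoff), PHP_EOL; // prints SEARCH SINCE 9-Dec-2025
}
$batches = cleanupBatches(range(1, 1201));
echo count($batches), PHP_EOL; // prints 3
```

With `max_sync_days = 0` the cutoff is `null`, which corresponds to the no-op case: no `SINCE` criterion is added and the cleanup has nothing to delete.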
Impact
Files changed
- `lib/Settings/AdminSettings.php`: `max_sync_days` config for admin UI
- `lib/Controller/SettingsController.php`: `setMaxSyncDays()` endpoint
- `src/components/settings/AdminSettings.vue`
- `lib/Service/Sync/ImapToDbSynchronizer.php`: `SEARCH SINCE` filter in `runInitialSync()`
- `lib/BackgroundJob/SyncWindowCleanupJob.php`
- `lib/Service/Search/MailSearch.php`
- `lib/BackgroundJob/OnDemandSyncJob.php`
- `lib/BackgroundJob/BackfillSearchResultsJob.php`
- `lib/BackgroundJob/DeepSyncJob.php`
- `lib/Migration/Version5200Date20260309000000.php`: `(mailbox_id, uid)`
- `appinfo/info.xml`
Production context
We operate a Nextcloud 32 instance that will serve 10,000+ accounts in the future, syncing against Exchange Online. This feature was developed to address the database growth issues we are experiencing at this scale. It has been tested in our staging environment but is not yet deployed to production; we would like to align with upstream before rolling it out.
Test plan
- `max_sync_days=0` (default): verify no behavioral change
- `max_sync_days=90`: verify only recent messages sync
- `SyncWindowCleanupJob` deletes old records in batches
- `BackfillSearchResultsJob` runs
- `OnDemandSyncJob` fetches older messages
🤖 Generated with Claude Code