feat: Add configurable sync window (mail.max_sync_days) to limit database growth#12633

Open
Rikdekker wants to merge 7 commits into nextcloud:main from Rikdekker:feat/max-sync-days
Rikdekker commented Mar 21, 2026

Summary

Add a new admin setting mail.max_sync_days that limits how far back the Mail app synchronizes messages from IMAP to the local database. Default 0 (unlimited) — no breaking change for existing installations.

At scale (10,000+ accounts), the Mail database grows unbounded. In our production deployment:

  • ~1 billion rows in mail_messages (100K messages/account × 10K accounts)
  • Database exceeds 200 GB and grows daily
  • Initial sync for large mailboxes fails with PHP memory exhaustion
  • Background SyncJob blocks cron for 80–190 minutes

A 90-day sync window reduces database size by 80–95% while preserving full access to older messages via IMAP search and on-demand fetching.

Related issues

How it works

1. Admin setting + UI

Numeric input in Admin Settings → Mail → Synchronization window:

occ config:app:set mail max_sync_days --value=90

2. Sync filter

When max_sync_days > 0, ImapToDbSynchronizer::runInitialSync() adds SEARCH SINCE <date> (RFC 3501 §6.4.4) via Horde_Imap_Client_Search_Query::dateSearch(). Only messages within the window are fetched.
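A minimal sketch of how the cutoff could be derived, assuming the setting arrives as an integer and that 0 disables the filter. `syncWindowCutoff()` and `imapSearchDate()` are hypothetical helper names, not code from this PR. Horde's `dateSearch()` takes a date object and builds the search key itself, so the explicit RFC 3501 formatting below only illustrates what the server receives.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns the SINCE cutoff for a max_sync_days value,
 * or null when the window is disabled (0 = unlimited, the default).
 */
function syncWindowCutoff(int $maxSyncDays, DateTimeImmutable $now): ?DateTimeImmutable {
	if ($maxSyncDays <= 0) {
		return null; // no SEARCH SINCE filter: sync the whole mailbox
	}
	// SINCE ignores the time of day, so anchor the cutoff at midnight UTC.
	return $now->setTimezone(new DateTimeZone('UTC'))
		->setTime(0, 0)
		->sub(new DateInterval('P' . $maxSyncDays . 'D'));
}

/** Formats a date as an RFC 3501 search date, e.g. "1-Feb-2026". */
function imapSearchDate(DateTimeImmutable $date): string {
	return $date->format('j-M-Y');
}

$now = new DateTimeImmutable('2026-03-21 12:39:00', new DateTimeZone('UTC'));
$cutoff = syncWindowCutoff(90, $now);
echo 'SEARCH SINCE ' . imapSearchDate($cutoff) . PHP_EOL; // SEARCH SINCE 21-Dec-2025
```

Anchoring at midnight UTC is a choice rather than a requirement: RFC 3501 SINCE disregards time and timezone, so a finer-grained cutoff would not be honoured by the server anyway.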

3. Daily cleanup job

SyncWindowCleanupJob (TimedJob, every 24h) deletes records where sent_at < now - max_sync_days. Batched at 500 rows per transaction. No-op when max_sync_days = 0.
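The batched delete loop can be modelled without a database. In this sketch, assuming `sent_at` is a Unix timestamp, the two callables are hypothetical stand-ins: `$fetchExpiredIds` for a `SELECT id ... WHERE sent_at < :cutoff LIMIT 500` and `$deleteIds` for one `DELETE ... WHERE id IN (...)` committed in its own transaction. The loop ends as soon as a batch comes back smaller than 500.

```php
<?php
declare(strict_types=1);

const CLEANUP_BATCH_SIZE = 500;

/**
 * Sketch of a batched cleanup loop. Returns the number of deleted rows.
 *
 * @param callable(int, int): list<int> $fetchExpiredIds ($cutoff, $limit) => ids with sent_at < $cutoff
 * @param callable(list<int>): void $deleteIds deletes one batch (one transaction)
 */
function runCleanup(int $maxSyncDays, int $now, callable $fetchExpiredIds, callable $deleteIds): int {
	if ($maxSyncDays <= 0) {
		return 0; // window disabled: no-op
	}
	$cutoff = $now - $maxSyncDays * 86400; // sent_at assumed to be a Unix timestamp
	$deleted = 0;
	do {
		$ids = $fetchExpiredIds($cutoff, CLEANUP_BATCH_SIZE);
		if ($ids !== []) {
			$deleteIds($ids);
			$deleted += count($ids);
		}
	} while (count($ids) === CLEANUP_BATCH_SIZE); // a short batch means nothing is left
	return $deleted;
}

// In-memory stand-in for mail_messages: id => sent_at
$now = 1_774_000_000;
$table = array_fill(1, 1200, $now - 100 * 86400); // 1200 messages, 100 days old
$table[1201] = $now - 10 * 86400;                   // one recent message
$batches = [];

$fetch = function (int $cutoff, int $limit) use (&$table): array {
	$ids = [];
	foreach ($table as $id => $sentAt) {
		if ($sentAt < $cutoff) {
			$ids[] = $id;
			if (count($ids) === $limit) {
				break;
			}
		}
	}
	return $ids;
};
$delete = function (array $ids) use (&$table, &$batches): void {
	$batches[] = count($ids);
	foreach ($ids as $id) {
		unset($table[$id]);
	}
};

$deleted = runCleanup(90, $now, $fetch, $delete); // 1200 rows in batches of 500, 500, 200
```

Small per-transaction batches keep lock duration and transaction size bounded on a large table, at the cost of more round trips.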

4. Search fallback

When IMAP search returns UIDs outside the local cache, BackfillSearchResultsJob fetches those specific messages asynchronously (batches of 200 UIDs).
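How the missing UIDs might be computed and split, as a sketch: `backfillBatches()` is a hypothetical helper that takes the UIDs returned by IMAP search and the UIDs already in the local cache, and returns the uncached ones in chunks of 200, each of which would become the argument of one queued backfill job.

```php
<?php
declare(strict_types=1);

const BACKFILL_BATCH_SIZE = 200;

/**
 * Hypothetical sketch: UIDs matched by IMAP search but missing locally,
 * split into job-sized batches.
 *
 * @param list<int> $searchUids
 * @param list<int> $cachedUids
 * @return list<list<int>>
 */
function backfillBatches(array $searchUids, array $cachedUids): array {
	// Flip the cached UIDs once so each lookup is a hash probe, not a scan.
	$cached = array_flip($cachedUids);
	$missing = [];
	foreach ($searchUids as $uid) {
		if (!isset($cached[$uid])) {
			$missing[] = $uid;
		}
	}
	return array_chunk($missing, BACKFILL_BATCH_SIZE);
}

$searchUids = range(1, 500);   // UIDs matched by IMAP search
$cachedUids = range(401, 500); // only the newest 100 are in the local DB
$batches = backfillBatches($searchUids, $cachedUids);
// 400 missing UIDs: two batches of 200
```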

5. On-demand scroll

When a user scrolls past cached messages, OnDemandSyncJob fetches the next batch from IMAP asynchronously.
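A sketch of choosing the next batch when the user scrolls past the cache. The batch size, the helper name `nextOlderBatch()`, and the use of UID order as a proxy for age are all assumptions. UIDs only reflect the order in which messages were added to the mailbox, so moved or imported mail can have a high UID but an old Date header, which matters when the list is sorted by date.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: pick up to $batchSize UIDs that are lower than the
 * oldest cached UID, newest first, to fetch on the next scroll.
 *
 * @param list<int> $serverUids all UIDs the IMAP server reports for the mailbox
 * @return list<int>
 */
function nextOlderBatch(array $serverUids, int $oldestCachedUid, int $batchSize): array {
	// Assumption: lower UID = added to the mailbox earlier (not necessarily an older Date header)
	$older = array_filter($serverUids, static fn (int $uid): bool => $uid < $oldestCachedUid);
	rsort($older); // descending, matching a newest-first message list
	return array_slice($older, 0, $batchSize);
}

$serverUids = range(1, 1000);
$batch = nextOlderBatch($serverUids, 901, 50); // cache holds UIDs 901..1000
// $batch is [900, 899, ..., 851]
```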

6. Database migration

Upgrades the existing (mailbox_id, uid) index on mail_messages from non-unique to UNIQUE. This is required for INSERT IGNORE / ON CONFLICT DO NOTHING deduplication used by the on-demand sync and search backfill jobs. The migration is idempotent and runs automatically via occ upgrade.
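One practical detail for this migration: creating a UNIQUE index fails on both MySQL/MariaDB and PostgreSQL if duplicate (mailbox_id, uid) pairs already exist, so a database that already contains duplicates has to be cleaned first. A minimal in-memory sketch of picking the rows to drop, keeping the lowest id per pair; `redundantRowIds()` is a hypothetical helper, not part of the PR.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical pre-migration check: ids of rows that would violate
 * UNIQUE(mailbox_id, uid), keeping the lowest id for each pair.
 *
 * @param list<array{id:int, mailbox_id:int, uid:int}> $rows
 * @return list<int>
 */
function redundantRowIds(array $rows): array {
	usort($rows, static fn (array $a, array $b): int => $a['id'] <=> $b['id']);
	$keep = [];      // "mailbox_id:uid" => first (lowest) id seen
	$redundant = [];
	foreach ($rows as $row) {
		$key = $row['mailbox_id'] . ':' . $row['uid'];
		if (isset($keep[$key])) {
			$redundant[] = $row['id'];
		} else {
			$keep[$key] = $row['id'];
		}
	}
	return $redundant;
}

$rows = [
	['id' => 3, 'mailbox_id' => 1, 'uid' => 10],
	['id' => 1, 'mailbox_id' => 1, 'uid' => 10],
	['id' => 2, 'mailbox_id' => 2, 'uid' => 10], // same uid, other mailbox: allowed
];
$redundant = redundantRowIds($rows); // [3]
```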

Architecture

Admin UI ──→ mail.max_sync_days (oc_appconfig)
                    │
    ┌───────────────┼───────────────────┐
    ▼               ▼                   ▼
Initial Sync    Cleanup Job      Search/Scroll
(SEARCH SINCE)  (daily purge)    (backfill on demand)

Impact

| Deployment | Accounts | Without sync window | With 90-day window |
|---|---|---|---|
| Small | 100 | 5M rows | 1M rows |
| Medium | 1,000 | 50M rows | 10M rows |
| Large | 10,000 | 500M–1B rows | 30–50M rows |

Files changed

| File | Change |
|---|---|
| lib/Settings/AdminSettings.php | Load max_sync_days config for admin UI |
| lib/Controller/SettingsController.php | setMaxSyncDays() endpoint |
| src/components/settings/AdminSettings.vue | Number input field |
| lib/Service/Sync/ImapToDbSynchronizer.php | SEARCH SINCE filter in runInitialSync() |
| lib/BackgroundJob/SyncWindowCleanupJob.php | New: daily cleanup job |
| lib/Service/Search/MailSearch.php | On-demand sync + search backfill scheduling |
| lib/BackgroundJob/OnDemandSyncJob.php | New: async older message fetch |
| lib/BackgroundJob/BackfillSearchResultsJob.php | New: async search result backfill |
| lib/BackgroundJob/DeepSyncJob.php | New: background deep-sync |
| lib/Migration/Version5200Date20260309000000.php | New: UNIQUE constraint on (mailbox_id, uid) |
| appinfo/info.xml | Job registration |

Production context

We operate a Nextcloud 32 instance that will serve 10,000+ accounts syncing against Exchange Online. This feature was developed to address the database growth issues we face at this scale. It has been tested in our staging environment but is not yet deployed to production; we would like to align with upstream before rolling it out.

Test plan

  • Set max_sync_days=0 (default) — verify no behavioral change
  • Set max_sync_days=90 — verify only recent messages sync
  • Verify SyncWindowCleanupJob deletes old records in batches
  • Search for a term that matches old messages — verify BackfillSearchResultsJob runs
  • Scroll past cached messages — verify OnDemandSyncJob fetches older messages
  • Verify admin UI input saves and loads correctly
  • Verify migration upgrades index to UNIQUE without errors
  • Test with PostgreSQL and MySQL/MariaDB

🤖 Generated with Claude Code

Rikdekker and others added 2 commits March 21, 2026 12:39
…base growth

Add a new admin setting `mail.max_sync_days` that limits how far back the
Mail app synchronizes messages from IMAP to the local database. At scale
(10,000+ accounts), the database grows unbounded — this setting reduces it
by 80-95% while preserving full access to older messages via IMAP search
and on-demand fetching.

Components:
- Admin UI + API endpoint for setting the sync window
- SEARCH SINCE filter on initial sync (RFC 3501)
- Daily SyncWindowCleanupJob to purge old records (batched)
- BackfillSearchResultsJob for search results outside the window
- OnDemandSyncJob for scroll-past-cache older message loading
- DeepSyncJob to offload deep-sync from the main cron cycle

Default value 0 (unlimited) preserves current behavior — no breaking change.

Signed-off-by: Rikdekker <Rikdekker@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix import ordering (alphabetical) in SyncJob.php, SyncWindowCleanupJob.php
- Use single quotes for strings without interpolation
- Remove redundant do-while condition in SyncWindowCleanupJob
- Remove non-existent PostgreSQL120Platform class reference
- Fix dateSearch() signature: remove invalid 'UTC' 4th argument

Signed-off-by: Rikdekker <Rikdekker@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rikdekker force-pushed the feat/max-sync-days branch from c76b869 to cc5ef1d on March 21, 2026 11:39
Rikdekker and others added 5 commits March 21, 2026 12:43
insertBulkIgnore() uses INSERT IGNORE / ON CONFLICT DO NOTHING for
deduplication, which requires a UNIQUE constraint on (mailbox_id, uid).
The existing index is non-unique, so without this migration duplicate
rows would be silently created.

Signed-off-by: Rikdekker <Rikdekker@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix constructor closing brace placement in ImapToDbSynchronizer
- Use single quotes for non-interpolated string starts
- Remove redundant (int) cast on lastInsertId()
- Add null coalescing for mb_strcut() arguments

Signed-off-by: Rikdekker <Rikdekker@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Move first constructor param to new line (php-cs multi-line rule)
- Add trailing comma after last constructor param
- Fix CRLF → LF line endings in migration file
- Remove redundant ?? '' on getEmail() (already non-null after null check)

Signed-off-by: Rikdekker <Rikdekker@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- MailSearchTest: add IConfig, ImapToDbSynchronizer, IJobList mocks
- AdminSettingsTest: expect 15 provideInitialState calls (was 14)
- Fix psalm PossiblyNullArgument on untagMessage via type assertion

Signed-off-by: Rikdekker <Rikdekker@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix alphabetical import order in MailSearchTest
- Skip testDeleteDuplicateUids: the new UNIQUE(mailbox_id, uid)
  constraint prevents the duplicate inserts this test relies on

Signed-off-by: Rikdekker <Rikdekker@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ChristophWurst (Member) commented:

Thank you for your contribution!

I have three questions:

  1. How is the time-based cutoff handled for the user? Will they reach the threshold and see no more messages as if none existed or is there an explanatory text?
  2. How do you handle the user's ability to change the sort order? Does sorting by oldest emails first show the oldest emails of the mailbox, or the oldest email of the time period?
  3. How is threading handled? The app has its own threading algorithm because the IMAP threading feature is limited to a mailbox, while ours can work across mailboxes and deliver a conversation-style thread with received and sent emails. This algorithm needs a holistic view of all emails to construct a tree of emails. With threads constructed from the References header it is possible to see ancestors without them being present, but for threads constructed from the In-Reply-To header it's necessary to have all ancestors. With the time-based cutoff this information will be missing, so emails that belong to one thread may be wrongly split into multiple threads.
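A deliberately simplified illustration of that split, not the Mail app's actual threading algorithm (which also uses the References header): messages are grouped only by following In-Reply-To links. Here the root `a` was backfilled by a search, the intermediate reply `b` is older than the window and was never fetched, and the recent reply `c` ends up in a thread of its own. `threadRoots()` and the message IDs are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Simplified illustration: map each Message-ID to the root reached by
 * following In-Reply-To links through messages that exist locally.
 *
 * @param array<string, ?string> $inReplyTo Message-ID => In-Reply-To (null for a root)
 * @return array<string, string> Message-ID => root Message-ID
 */
function threadRoots(array $inReplyTo): array {
	$roots = [];
	foreach (array_keys($inReplyTo) as $id) {
		$current = $id;
		$seen = [];
		// Walk up while the parent is present locally; $seen guards against cycles.
		while (($parent = $inReplyTo[$current] ?? null) !== null
			&& array_key_exists($parent, $inReplyTo)
			&& !isset($seen[$parent])) {
			$seen[$current] = true;
			$current = $parent;
		}
		$roots[$id] = $current;
	}
	return $roots;
}

$all = [
	'<a@example.com>' => null,
	'<b@example.com>' => '<a@example.com>',
	'<c@example.com>' => '<b@example.com>',
];
$full = threadRoots($all); // all three map to <a@example.com>

// b is outside the sync window; a was backfilled by a search, c is recent
$windowed = $all;
unset($windowed['<b@example.com>']);
$split = threadRoots($windowed); // a => <a@example.com>, c => <c@example.com>
```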

ChristophWurst (Member) left a comment:

Some comments about the code

Comment on lines +23 to +25
* Instead of blocking the HTTP request with a synchronous IMAP fetch,
* MailSearch schedules this job. The frontend retries after a short delay
* and picks up the newly synced messages from the local DB.

A standard deployment has a cron interval of 5 minutes. That means the asynchronous job will be delayed up to 5 minutes if the cron queue is empty, potentially longer if there are more time critical jobs in the queue.

->from($this->getTableName())
->where($query->expr()->eq('mailbox_id', $query->createNamedParameter($mailbox->getId(), IQueryBuilder::PARAM_INT), IQueryBuilder::PARAM_INT))
->andWhere($query->expr()->in('uid', $query->createNamedParameter($chunk, IQueryBuilder::PARAM_INT_ARRAY)));
$existing = array_merge($existing, $this->findUids($query));

perf: array_merge in a loop is very memory hungry
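One way to address this: collect the per-chunk results and merge once at the end, or append element by element. In the sketch below `findUidsStub()` is a hypothetical stand-in for the mapper's `findUids()`, and the chunk size of 1000 is an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for the mapper's findUids(): pretends that every
 * even UID in the chunk already exists in the local database.
 *
 * @param list<int> $chunk
 * @return list<int>
 */
function findUidsStub(array $chunk): array {
	return array_values(array_filter($chunk, static fn (int $uid): bool => $uid % 2 === 0));
}

/**
 * Collects one result array per chunk and merges them in a single call.
 * `$existing = array_merge($existing, ...)` inside the loop would instead
 * copy the whole accumulated array again on every iteration.
 *
 * @param list<int> $uids
 * @return list<int>
 */
function existingUids(array $uids, int $chunkSize = 1000): array {
	$parts = [];
	foreach (array_chunk($uids, $chunkSize) as $chunk) {
		$parts[] = findUidsStub($chunk);
	}
	return array_merge(...$parts); // returns [] when there were no chunks (PHP 7.4+)
}

$existing = existingUids(range(1, 10), 3); // [2, 4, 6, 8, 10]
```

Appending in place (`foreach ($this->findUids($query) as $uid) { $existing[] = $uid; }`) avoids the intermediate per-chunk arrays entirely.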

Comment on lines +598 to +599
* Messages are marked as structure_analyzed to skip the PreviewEnhancer's
* second IMAP connection -- structure is loaded lazily when opened.

The structure is loaded, true, but the metadata acquired and persisted during preprocessing will then be missing. That is:

  • Flag if email has attachments
  • Preview text
  • Flag if it's an IMIP message
  • Flag if it's encrypted
  • Flag if it's mentioning the user

* to set up and the deleteDuplicateUids() method unnecessary.
*/
public function testDeleteDuplicateUids(): void {
$this->markTestSkipped('UNIQUE(mailbox_id, uid) constraint prevents duplicate inserts');

In that case the test can go.

ChristophWurst (Member) commented:

Heads-up: I have some ideas for an alternative approach that reduces database size while keeping UX, sorting and threading intact. I'll draft a ticket shortly.
