Commit 70e9726

waleedlatif1 and claude committed
fix(sync-engine): dedup externalIds, enable deletion reconciliation, merge sourceUrl
Three sync engine gaps identified during audit:

1. Duplicate externalId guard: if a connector returns the same externalId across pages (pagination overlap), skip the second occurrence to prevent unique constraint violations on add and double-uploads on update.

2. Deletion reconciliation: previously required explicit fullSync or syncMode='full', meaning docs deleted from the source accumulated in the KB forever. Now runs on all non-incremental syncs (which return ALL docs). Includes a safety threshold: if >50% of existing docs (and >5 docs) would be deleted, skip and warn. This protects against partial listing failures. Explicit fullSync bypasses the threshold.

3. sourceUrl merge: hydration now picks up sourceUrl from getDocument, falling back to the stub's sourceUrl if getDocument doesn't set one.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 8c409c2 commit 70e9726

1 file changed, +17 -4 lines changed

apps/sim/lib/knowledge/connectors/sync-engine.ts

Lines changed: 17 additions & 4 deletions
@@ -403,6 +403,7 @@ export async function executeSync(
 
   const pendingOps: DocOp[] = []
   for (const extDoc of externalDocs) {
+    if (seenExternalIds.has(extDoc.externalId)) continue
     seenExternalIds.add(extDoc.externalId)
 
     if (excludedExternalIds.has(extDoc.externalId)) {
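
The guard in this hunk can be illustrated in isolation. The sketch below is not the sync engine's code: `ExtDoc` and `dedupeByExternalId` are hypothetical names introduced here, and the two "pages" are made-up data. It only shows the first-occurrence-wins behaviour that a `Set` check followed by `continue` gives when a connector returns the same `externalId` twice.

```typescript
// Hypothetical minimal shape for illustration; the real document type has more fields.
interface ExtDoc {
  externalId: string
  title: string
}

/** Keep the first occurrence of each externalId, preserving input order. */
function dedupeByExternalId(docs: ExtDoc[]): ExtDoc[] {
  const seen = new Set<string>()
  const unique: ExtDoc[] = []
  for (const doc of docs) {
    if (seen.has(doc.externalId)) continue // later duplicate: skip it
    seen.add(doc.externalId)
    unique.push(doc)
  }
  return unique
}

// Two pages that overlap on "b", as can happen when items shift between page fetches.
const page1: ExtDoc[] = [
  { externalId: 'a', title: 'A' },
  { externalId: 'b', title: 'B (page 1)' },
]
const page2: ExtDoc[] = [
  { externalId: 'b', title: 'B (page 2)' },
  { externalId: 'c', title: 'C' },
]

const deduped = dedupeByExternalId([...page1, ...page2])
console.log(deduped.map((d) => `${d.externalId}: ${d.title}`))
```

Because the first occurrence wins, a newer copy of the document on a later page would be ignored for that run; whether that matters depends on how the connector orders its results.
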
@@ -472,6 +473,7 @@ export async function executeSync(
     content: fullDoc.content,
     contentHash: hydratedHash,
     contentDeferred: false,
+    sourceUrl: fullDoc.sourceUrl ?? op.extDoc.sourceUrl,
     metadata: { ...op.extDoc.metadata, ...fullDoc.metadata },
   },
 }
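
A small sketch of the fallback semantics used in this hunk, under the assumption that both the listing stub and the hydrated document carry an optional `sourceUrl`. The `DocStub` and `FullDoc` types and the `mergeHydrated` helper are hypothetical and exist only for this illustration. The point worth noting is that `??` falls back only on `null` or `undefined`, so an empty string returned by `getDocument` would be kept rather than replaced by the stub's value.

```typescript
// Hypothetical shapes for illustration only.
interface DocStub {
  sourceUrl?: string
  metadata?: Record<string, unknown>
}
interface FullDoc {
  sourceUrl?: string | null
  metadata?: Record<string, unknown>
}

/** Merge a hydrated document over its stub, mirroring the hunk's two merge rules. */
function mergeHydrated(stub: DocStub, full: FullDoc) {
  return {
    // Prefer the hydrated URL; fall back to the stub only when it is null/undefined.
    sourceUrl: full.sourceUrl ?? stub.sourceUrl,
    // Hydrated metadata keys override stub keys with the same name.
    metadata: { ...stub.metadata, ...full.metadata },
  }
}

const stub: DocStub = { sourceUrl: 'https://example.com/stub', metadata: { a: 1, b: 1 } }

const fromFull = mergeHydrated(stub, { sourceUrl: 'https://example.com/full', metadata: { b: 2 } })
const fromStub = mergeHydrated(stub, {})
const emptyKept = mergeHydrated(stub, { sourceUrl: '' })

console.log(fromFull.sourceUrl, fromStub.sourceUrl, JSON.stringify(emptyKept.sourceUrl))
```

If empty strings from a connector should also trigger the fallback, `||` would do that instead of `??`; the commit as written uses `??`.
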
@@ -552,15 +554,26 @@ export async function executeSync(
     }
   }
 
-  // Skip deletion reconciliation during incremental syncs — results only contain changed docs
-  if (!isIncremental && (options?.fullSync || connector.syncMode === 'full')) {
+  // Reconcile deletions for non-incremental syncs (which return ALL docs).
+  // Skip for incremental syncs since results only contain changed docs.
+  if (!isIncremental) {
     const removedIds = existingDocs
       .filter((d) => d.externalId && !seenExternalIds.has(d.externalId))
       .map((d) => d.id)
 
     if (removedIds.length > 0) {
-      await hardDeleteDocuments(removedIds, syncLogId)
-      result.docsDeleted += removedIds.length
+      const deletionRatio =
+        existingDocs.length > 0 ? removedIds.length / existingDocs.length : 0
+
+      if (deletionRatio > 0.5 && removedIds.length > 5 && !options?.fullSync) {
+        logger.warn(
+          `Skipping deletion of ${removedIds.length}/${existingDocs.length} docs — exceeds safety threshold. Trigger a full sync to force cleanup.`,
+          { connectorId, deletionRatio: Math.round(deletionRatio * 100) }
+        )
+      } else {
+        await hardDeleteDocuments(removedIds, syncLogId)
+        result.docsDeleted += removedIds.length
+      }
     }
   }
 
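
The threshold condition in this hunk is easiest to check at its boundaries. The sketch below extracts it into a pure function; `shouldSkipDeletion` is a hypothetical name introduced here, not a helper in the sync engine. Both comparisons are strict, so exactly 50% of docs, or exactly 5 docs, still proceeds with deletion, and an explicit full sync always proceeds.

```typescript
/**
 * Hypothetical helper mirroring the hunk's condition: returns true when the
 * deletion should be skipped (more than half of existing docs AND more than
 * 5 docs would be removed, and no explicit full sync was requested).
 */
function shouldSkipDeletion(existingCount: number, removedCount: number, fullSync = false): boolean {
  const deletionRatio = existingCount > 0 ? removedCount / existingCount : 0
  return deletionRatio > 0.5 && removedCount > 5 && !fullSync
}

const cases = {
  sixOfTen: shouldSkipDeletion(10, 6), // 60% and 6 docs: skipped
  fiveOfTen: shouldSkipDeletion(10, 5), // exactly 50%: not skipped
  fiveOfEight: shouldSkipDeletion(8, 5), // 62.5% but only 5 docs: not skipped
  fortyOfHundred: shouldSkipDeletion(100, 40), // 40%: not skipped
  sixOfTenForced: shouldSkipDeletion(10, 6, true), // explicit full sync bypasses the guard
}
console.log(cases)
```

In the diff, the bypass checks only `options?.fullSync`; a connector with `syncMode === 'full'` but no explicit `fullSync` option appears to remain subject to the threshold.
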
