SDK-471 Fix Swift access race in Pending/Fulfill#1063
Open
sumeruchat wants to merge 2 commits into
Open
Conversation
Thread Sanitizer flagged concurrent read/write on `successCallbacks`, `errorCallbacks`, and `result` because `Pending` had zero synchronization while URLSession delegate queues, persistenceContext, and main thread all mutate the same instance during flows like `setUserId`. Serialize all state mutations through a private `DispatchQueue` (matches the `AuthManager.callbackQueue` convention) and snapshot callbacks before invoking them outside the critical section so re-entrant registrations do not deadlock. Folds in three related correctness fixes: - `onCompletion` now appends both callbacks atomically (previously a `.failure` resolve landing between the two appends would lose the error handler). - Late registration on an already-resolved `Pending` now fires only the newly-registered callback (was: iterated the full accumulated array, causing O(N^2) firing). - Double-resolve / double-reject are now ignored with a debug log (was: silently double-fired through `didSet`). Adds `PendingConcurrencyTests` covering concurrent register+resolve, `onCompletion` atomicity under race, re-entrant callback registration, flatMap cross-queue resolution, late-register-after-resolve, double resolve/reject, and a 1000-iteration stress run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #1063 +/- ##
==========================================
+ Coverage 71.11% 71.22% +0.10%
==========================================
Files 112 112
Lines 9328 9337 +9
==========================================
+ Hits 6634 6650 +16
+ Misses 2694 2687 -7 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Address review feedback: the previous commit over-scoped the fix by turning `Fulfill` into a one-shot promise. In-tree callers depend on `Fulfill` being usable as a multi-event broadcast — `MockInAppDisplayer` calls `onShow.resolve(with:)` once per shown in-app message and expects the single registered `onSuccess` to fire each time. The one-shot guard broke `testAutoShowInAppMultipleWithOrdering` and similar tests. This commit narrows the fix to thread-safety only: - `setResult` no longer drops duplicate resolves and no longer drains the callback arrays. Each call overwrites `result` and fires every currently-registered callback against a snapshot taken under the lock. Callbacks still fire OUTSIDE the critical section so re-entrant registrations cannot deadlock. - `onSuccess` / `onError` / `onCompletion` always append the callback, including on already-resolved Pendings, so future resolves continue to notify late-registered listeners (matching prior behavior). - The O(N^2) latent bug remains fixed: late registration on a resolved Pending replays the stored result for the newly-registered callback only, instead of iterating the full accumulated array. - `onCompletion` still appends both callbacks inside a single `stateQueue.sync`, eliminating the partial-append race. Tests updated: - `testOnCompletionAtomicityUnderRace` switched from a 50ms `asyncAfter` drain (flaky under TSan / busy CI) to a `DispatchGroup`-driven assert that waits on actual completion before reading counters. - Removed `testDoubleResolveIsIgnored`, `testDoubleRejectIsIgnored`, and `testResolveAfterRejectIsIgnored` — those semantics are no longer claimed. - Added `testRepeatedResolveNotifiesRegisteredCallbacks`, `testRepeatedRejectNotifiesRegisteredCallbacks`, `testResolveAfterRejectNotifiesMatchingBranches`, and `testLateRegisterAfterResolveStaysRegisteredForFutureResolves` to pin down the preserved multi-event behavior. Verified locally: PendingTests 9/9, PendingConcurrencyTests 11/11, InAppTests 43/43 (including the previously-broken case), AuthTests 37/37. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Thread Sanitizer flagged a Swift access race inside
Pending/Fulfill(reproducible in customer apps, GitHub issue #767, easiest repro viasetUserId). All mutable state —successCallbacks,errorCallbacks, andresult— was accessed with zero synchronization while URLSession delegate queues, persistenceContext, and main thread routinely fan into the same instance.This change serializes all state mutations behind a private
DispatchQueue(matching the establishedAuthManager.callbackQueuepattern) and snapshots callbacks before invoking them outside the critical section so re-entrant registrations cannot deadlock. Multi-event semantics ofFulfill(used byMockInAppDisplayer.onShowand similar in-tree callers) are preserved.Changes
swift-sdk/Internal/Pending.swift— All reads/writes ofresult,successCallbacks,errorCallbacksnow run insidestateQueue.sync. Callbacks fire OUTSIDE the sync block.Fulfill.resolve/rejectroute through a newsetResult(_:)instead of relying ondidSet. Repeated resolution / rejection still notifies every registered callback (Fulfillis still usable as an event signal — no behavior change).onCompletionatomicity — both callbacks are appended inside one sync block; a.failureresolve landing between the two appends can no longer lose the error handler.tests/unit-tests/PendingConcurrencyTests.swift— 11 new tests covering concurrent register+resolve,onCompletionatomicity under race (deterministic viaDispatchGroup), re-entrant callback registration, flatMap cross-queue resolution, late-register-after-resolve, repeated resolve/reject behaviour, mixed resolve-after-reject, and a 1000-iteration stress run.Impact
dispatch_syncper operation.Testing
Automated (verified locally):
./agent_test.sh PendingTests— all 9 original tests pass./agent_test.sh PendingConcurrencyTests— all 11 new tests pass./agent_test.sh InAppTests— 43/43 pass (coversMockInAppDisplayer.onShowmulti-event use)./agent_test.sh AuthTests— 37/37 passHow to verify TSan locally:
swift-sdk.xcodeprojin Xcodeswift-sdkscheme → Test → Diagnostics → enable Thread SanitizerPendingConcurrencyTests— expect zero TSan reportssetUserIdflow against a sample app with TSan enabled — the original prestolabs warning should be goneEdge cases covered:
onSuccessregistrations from a concurrent queueonErrorregistrationsonCompletion(value:error:)partial-append race (200 iterations,DispatchGroup-synchronized)onSuccessregistration from inside an executing callbackflatMapchain crossing 3 queues (background → main → background)resolve/rejectnotifies every registered callbackrejectthenresolvenotifies the matching branch each timeconcurrentPerform-based stress run of 1000 Fulfills