Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -57,7 +57,14 @@ internal class SubscriptionOperationExecutor(
val startingOp = operations.first()

return if (startingOp is CreateSubscriptionOperation) {
createSubscription(startingOp, operations)
// If the subscription already exists on the backend (non-local id), POSTing
// to /subscriptions with that id is silently a server-side no-op and drops
// the merged enabled/status payload. Dispatch as an update instead.
if (!IDManager.isLocalId(startingOp.subscriptionId)) {
updateExistingSubscriptionFromCreate(startingOp, operations)
} else {
createSubscription(startingOp, operations)
}
Comment on lines 59 to +67
} else if (operations.any { it is DeleteSubscriptionOperation }) {
if (operations.size > 1) {
throw Exception("Only supports one operation! Attempted operations:\n$operations")
Expand All @@ -76,6 +83,30 @@ internal class SubscriptionOperationExecutor(
}
}

private suspend fun updateExistingSubscriptionFromCreate(
createOperation: CreateSubscriptionOperation,
operations: List<Operation>,
): ExecutionResponse {
// if there are any deletes all operations should be tossed, nothing to do.
if (operations.any { it is DeleteSubscriptionOperation }) {
return ExecutionResponse(ExecutionResult.SUCCESS)
}

val lastUpdateOperation = operations.lastOrNull { it is UpdateSubscriptionOperation } as UpdateSubscriptionOperation?
val patchOp =
UpdateSubscriptionOperation(
createOperation.appId,
createOperation.onesignalId,
createOperation.externalId,
createOperation.subscriptionId,
createOperation.type,
lastUpdateOperation?.enabled ?: createOperation.enabled,
lastUpdateOperation?.address ?: createOperation.address,
lastUpdateOperation?.status ?: createOperation.status,
)
return updateSubscription(patchOp, listOf(patchOp))
Comment on lines +86 to +107
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Routing every CreateSubscriptionOperation with a non-local subscriptionId through PATCH breaks the existing 404/MISSING recovery flow in updateSubscription(): outside the missing-retry window, updateSubscription returns FAIL_NORETRY with a follow-up CreateSubscriptionOperation that re-uses the same non-local subscriptionId, which is now re-routed back through updateExistingSubscriptionFromCreate -> PATCH -> 404 -> ..., looping indefinitely with no escape. Suggested fix: catch BackendException(404) inside updateExistingSubscriptionFromCreate and fall back to createSubscription (POST) so the rebuild-user / FAIL_NORETRY recovery in createSubscription can terminate the cycle (or null out the id on the recovery op so it forces POST).

Extended reasoning...

Bug

The new updateExistingSubscriptionFromCreate helper unconditionally dispatches every CreateSubscriptionOperation whose subscriptionId is non-local through updateSubscription (PATCH /subscriptions/{id}). When that PATCH returns 404 outside the missing-retry window, updateSubscription (lines 235-260) returns FAIL_NORETRY with a recovery op:

ExecutionResponse(
    ExecutionResult.FAIL_NORETRY,
    operations = listOf(
        CreateSubscriptionOperation(
            lastOperation.appId,
            lastOperation.onesignalId,
            lastOperation.subscriptionId, // <-- same non-local id
            ...
        ),
    ),
)

OperationRepo.executeOperations (OperationRepo.kt:314-322) enqueues response.operations at the front of the queue regardless of the FAIL_NORETRY result. On the next pass, startingOp is CreateSubscriptionOperation and IDManager.isLocalId(...) is still false, so it re-enters updateExistingSubscriptionFromCreate -> PATCH -> 404 -> another CreateSubscriptionOperation with the same id... loop.

Step-by-step proof

  1. Operation queue contains CreateSubscriptionOperation(subscriptionId = "remote-uuid", ...) (e.g. server-side deletion happened after the local model cached the server uuid).
  2. execute() (line 56) sees startingOp is CreateSubscriptionOperation and !IDManager.isLocalId("remote-uuid") is true -> calls updateExistingSubscriptionFromCreate.
  3. The helper builds a patchOp and calls updateSubscription(patchOp, listOf(patchOp)).
  4. Backend returns 404. _newRecordState.isInMissingRetryWindow(...) is false (window expired or never opened). updateSubscription enters the MISSING branch and returns FAIL_NORETRY + a CreateSubscriptionOperation re-using lastOperation.subscriptionId = "remote-uuid".
  5. OperationRepo.executeOperations removes the original op (FAIL_NORETRY clears retries) and if (response.operations != null) enqueues the new CreateSubscriptionOperation("remote-uuid") at the front.
  6. Loop repeats from step 2. Each iteration produces a fresh CreateSubscriptionOperation UUID but the same non-local subscriptionId, and retries never accumulates because each cycle is FAIL_NORETRY. Backoff is whatever delayBeforeNextExecution(0, null) gives - effectively nothing.

Why pre-PR code did not have this problem

Pre-PR, the recovery CreateSubscriptionOperation went through createSubscription (POST). On 404, createSubscription's MISSING handler calls _buildUserService.getRebuildOperationsIfCurrentUser(...), which either bootstraps a rebuild (terminating the create cycle for this id) or returns null -> terminating FAIL_NORETRY. Either way the cycle ends. The new helper has no such fallback and no 404 handling at all - it just delegates to updateSubscription and lets the recovery op come back to the same branch.

Impact

Trigger scenarios are uncommon but plausible: server-side subscription deletion (admin/GDPR cleanup, manual REST delete, server lifecycle), or any case where the cached server uuid no longer exists on the backend. Consequence is unbounded queue churn - the SDK will hammer PATCH /subscriptions/{id} indefinitely with no client-side termination, draining battery and traffic until the process dies. There is no test exercising this path.

Fix

Either:

  • Catch BackendException with 404 in updateExistingSubscriptionFromCreate and fall through to createSubscription (POST), restoring the rebuild-user recovery path; or
  • In updateSubscription's 404 handler, set subscriptionId = null on the rebuilt CreateSubscriptionOperation so it is forced through the local-id POST branch on re-execution.

A unit test exercising PATCH-404-outside-missing-retry-window -> recovery should be added.

🔬 also observed by copilot-pr-reviewer

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sherwinski should we look at this?

}

private suspend fun createSubscription(
createOperation: CreateSubscriptionOperation,
operations: List<Operation>,
Expand All @@ -91,12 +122,11 @@ internal class SubscriptionOperationExecutor(
val enabled = lastUpdateOperation?.enabled ?: createOperation.enabled
val address = lastUpdateOperation?.address ?: createOperation.address
val status = lastUpdateOperation?.status ?: createOperation.status
val subId = if (!IDManager.isLocalId(createOperation.subscriptionId)) createOperation.subscriptionId else null

try {
val subscription =
SubscriptionObject(
id = subId,
id = null,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

convert(createOperation.type),
address,
enabled,
Expand Down Expand Up @@ -220,7 +250,12 @@ internal class SubscriptionOperationExecutor(
) {
return ExecutionResponse(ExecutionResult.FAIL_RETRY, retryAfterSeconds = ex.retryAfterSeconds)
}
// toss this, but create an identical CreateSubscriptionOperation to re-create the subscription being updated.
// The original update target no longer exists on the backend, so issue
// a fresh local-id Create. A non-local id here would re-route through
// updateExistingSubscriptionFromCreate -> PATCH -> 404 indefinitely because
// execute() dispatches non-local-id Creates as PATCHes; only a local id
// forces the POST/rebuild-user recovery path that can actually re-create
// the subscription.
ExecutionResponse(
ExecutionResult.FAIL_NORETRY,
operations =
Expand All @@ -229,7 +264,7 @@ internal class SubscriptionOperationExecutor(
lastOperation.appId,
lastOperation.onesignalId,
lastOperation.externalId,
lastOperation.subscriptionId,
IDManager.createLocalId(),
lastOperation.type,
lastOperation.enabled,
lastOperation.address,
Expand Down
Loading
Loading