Skip to content

Commit ca4237e

Browse files
test+docs(maker-checker): mark TTL race resolved; add stress regression guard
CLAUDE.md flagged a "~40% flaky" race in MakerCheckerTransactionRequestTest: in the multi-challenge path, the request-scoped proxy connection sometimes failed to propagate (via TransmittableThreadLocal) to a read Future's worker thread, so the read landed on a fresh pool connection and saw 0 uncommitted challenge rows. Two empirical sweeps + an in-JVM stress run all show the race no longer reproduces on current develop: - 8 instrumented runs (logged `currentProxy.get() == null` and row counts at every challenge write/read): all PASS, `proxyNull=false` on every read, across many worker threads, multi-user path included. - 12 clean runs (no instrumentation, separate JVMs): all PASS. - 1 stress run (20-iteration loop of multi-challenge creates in one warm JVM, hammering the exact write→read surface): all 20 iterations PASS, both challenges read back into the 201 response every time. 40 consecutive multi-challenge create→read cycles with zero failures. At the historical ~40% per-cycle rate, P ≈ 1.3×10⁻⁹ — the rate definitively no longer holds. Almost certainly fixed by the RequestScopeConnection hardening already on develop: 1. `currentProxy.childValue(parent) → null` (overridden in `RequestScopeConnection.scala`). Its own comment describes exactly the failing symptom: "workers stuck with a stale proxy then read 0 rows for the current request's freshly-written data, since the underlying real connection was closed by the original request's WBT." 2. The stale-proxy `isClosed()` guard added in `RequestAwareConnectionManager.newConnection`, which now falls back to a fresh vendor connection rather than returning a stale (already-closed) proxy. This commit locks the resolution in and updates the stale TODO: - **MakerCheckerTransactionRequestTest** — add a new scenario, "Stress: repeated multi-challenge creates must always read back both challenges (RequestScopeConnection regression guard)". Reuses the existing multi-user setup (REQUIRED_CHALLENGE_ANSWERS=2, two-user view access, maker-checker enforcement) and fires 20 INITIATED creates in one warm JVM, asserting both per-user challenges round-trip into each 201 response. No money moves (INITIATED only), so it's safe to loop. Iteration is recorded via `withClue` so a regression points at the offending iteration. - **CLAUDE.md** — strike through the "Flaky MakerCheckerTransactionRequestTest" bullet under "Other TODOs", explain it's resolved by the RequestScopeConnection hardening, link the regression guard, and keep the historical diagnosis + original fix directions in case a regression resurfaces. No core connector / payment-path code is changed — the proper fix already landed in `RequestScopeConnection`; this commit adds the safety net.
1 parent 4c262e4 commit ca4237e

2 files changed

Lines changed: 57 additions & 1 deletion

File tree

CLAUDE.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -363,4 +363,4 @@ Architectural note from the v6 migration: around the 140-endpoint mark `Implemen
363363
- **CI speed-up** (not done): two-tier fast gate + full suite; surefire parallel forks.
364364
- **Disabled tests to fix**: `Http4s500RoutesTest` (@Ignore, in-process issue), `RootAndBanksTest` (@Ignore), `V500ContractParityTest` (@Ignore), `CardTest` (fully commented out). `v5_0_0`: 13 skipped tests (setup cost paid, no value).
365365
- **`V7ResourceDocsAggregationTest`**: intentionally failing — encodes the fix for the resource-docs aggregation bug (v7 endpoint returns only ~10 own docs instead of 500+ aggregated). Fix the bug to make this suite pass.
366-
- **Flaky `MakerCheckerTransactionRequestTest` — TTL/proxy connection race in v4 createTransactionRequest** (pre-existing, predates the auth-stack migration). Scenario *"Multiple challenges with maker-checker: different users answer their own challenges"* (`MakerCheckerTransactionRequestTest.scala:246`) fails ~40% of local runs and was observed once in CI shard1. Diagnosed root cause: inside one HTTP request, `LocalMappedConnector.createTransactionRequestv210` writes N rows to `MappedExpectedChallengeAnswer` via the request-scoped proxy connection (auto-commit=false, request-end commit) and then reads them back via `getChallengesByTransactionRequestId`. When `RequestScopeConnection.currentProxy` (a `TransmittableThreadLocal`) fails to propagate to the read `Future`'s worker thread, `RequestAwareConnectionManager.newConnection` returns `null` → falls back to a fresh pool connection (autocommit=true) that cannot see the proxy connection's uncommitted writes → read returns 0 rows. Diagnostic confirmed: in failing runs, `createChallengesC2` is called with the correct 2 userIds, but `MappedExpectedChallengeAnswer.findAll()` (no WHERE clause) returns 0 rows — i.e. the entire table is empty from the read connection's view. Only the multi-user path (`REQUIRED_CHALLENGE_ANSWERS > 1`) hits this because it adds an extra synchronous `Views.views.vend.permissions(...)` inside `getAccountAttributesByAccount.map` that shifts the Future-scheduling timing. The other 3 scenarios in the file always pass because they take the default `REQUIRED_CHALLENGE_ANSWERS=1` shortcut. **Fix direction:** every DB-touching `Future { ... }` inside the connector chain needs to go through `RequestScopeConnection.fromFuture` (which atomically sets+submits+clears the TTL inside `IO.defer`) instead of being raw Scala `Future { ... }` chained via `flatMap`. Alternatively: stop relying on TTL and pass the proxy connection explicitly down the connector call-chain (bigger change, but eliminates the race class entirely).
366+
- **~~Flaky `MakerCheckerTransactionRequestTest` — TTL/proxy connection race in v4 createTransactionRequest~~** — **RESOLVED.** The race (read worker getting a null/stale proxy → fresh pool connection → 0 uncommitted rows visible) is no longer reproducible on current `develop`: instrumented + clean reproduction sweeps (8 + 12 light runs across separate JVMs, plus a 20-iteration in-JVM stress) showed 40 consecutive multi-challenge create→read cycles pass with `currentProxy.get() != null` on every read; at the historical ~40% rate that's P ≈ 1.3×10⁻⁹. Almost certainly fixed by the `RequestScopeConnection` hardening already on `develop` — the `childValue → null` override on `currentProxy` (whose comment describes exactly this *"workers stuck with a stale proxy then read 0 rows for the current request's freshly-written data"* symptom) and the stale-proxy `isClosed()` guard in `RequestAwareConnectionManager.newConnection`. Locked in by `MakerCheckerTransactionRequestTest`'s **"Stress: repeated multi-challenge creates must always read back both challenges (RequestScopeConnection regression guard)"** scenario, which hammers the exact write→read surface in a warm JVM. Historical diagnosis (kept for context): inside one HTTP request, `LocalMappedConnector.createTransactionRequestv210` writes N rows to `MappedExpectedChallengeAnswer` via the request-scoped proxy and reads them back via `getChallengesByTransactionRequestId`; only the multi-user path (`REQUIRED_CHALLENGE_ANSWERS > 1`) exposed it because the extra synchronous `Views.views.vend.permissions(...)` inside `getAccountAttributesByAccount.map` shifted Future scheduling. If a regression resurfaces (the stress test starts flaking), the original fix directions still apply: route DB `Future`s through `RequestScopeConnection.fromFuture`, or thread the proxy connection explicitly down the connector call-chain.

obp-api/src/test/scala/code/api/v4_0_0/MakerCheckerTransactionRequestTest.scala

Lines changed: 56 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -267,6 +267,62 @@ class MakerCheckerTransactionRequestTest extends V400ServerSetup with DefaultUse
267267
addMakerCheckerPermissionToOwnerView()
268268
}
269269
}
270+
271+
// Regression guard for the request-scoped-connection TTL race (formerly ~40% flaky;
272+
// resolved by the RequestScopeConnection hardening — childValue→null override +
273+
// stale-proxy isClosed() guard). Each multi-challenge create writes 2
274+
// MappedExpectedChallengeAnswer rows on the request-scoped proxy connection (autocommit
275+
// off) and then reads them back into the 201 response while building `challenges`. If the
276+
// proxy fails to propagate to the read Future's worker, the read lands on a fresh pool
277+
// connection and sees 0 uncommitted rows → a challenge goes missing. Firing the create
278+
// many times in one warm JVM maximises ForkJoinPool scheduling pressure on that
279+
// write→read surface, so a regression in the connection-propagation logic shows up here.
280+
scenario("Stress: repeated multi-challenge creates must always read back both challenges (RequestScopeConnection regression guard)", ApiEndpoint1) {
281+
val iterations = 20
282+
val transactionRequestType = COUNTERPARTY.toString
283+
val testBank = createBank("__mc-stress-bank")
284+
val bankId = testBank.bankId
285+
val accountId1 = AccountId("__mc_stress_acc1__")
286+
val accountId2 = AccountId("__mc_stress_acc2__")
287+
val fromCurrency = "AED"
288+
val toCurrency = "INR"
289+
290+
createAccountRelevantResource(Some(resourceUser1), bankId, accountId1, fromCurrency)
291+
createAccountRelevantResource(Some(resourceUser1), bankId, accountId2, toCurrency)
292+
updateAccountCurrency(bankId, accountId2, toCurrency)
293+
294+
val fromAccount = BankAccountX(bankId, accountId1).getOrElse(fail("couldn't get from account"))
295+
val counterparty = createCounterparty(bankId.value, accountId1.value, accountId2.value, true, java.util.UUID.randomUUID.toString)
296+
297+
// REQUIRED_CHALLENGE_ANSWERS = 2 forces the multi-user (quorum > 1) path that exercises the race.
298+
createAccountAttributeViaEndpoint(bankId.value, accountId1.value, "REQUIRED_CHALLENGE_ANSWERS", "2", "INTEGER", Some("LKJL98769G"))
299+
grantUserAccessToViewViaEndpoint(bankId.value, accountId1.value, resourceUser2.userId, user1,
300+
PostViewJsonV400(view_id = SYSTEM_OWNER_VIEW_ID, is_system = true))
301+
removeMakerCheckerPermissionFromOwnerView()
302+
303+
try {
304+
val bodyValue = AmountOfMoneyJsonV121(fromCurrency, "30000.00")
305+
val transactionRequestBodyCounterparty = TransactionRequestBodyCounterpartyJSON(
306+
CounterpartyIdJson(counterparty.counterpartyId), bodyValue, "Multi-challenge MC stress", "SHARED")
307+
val createTransReqRequest = (v4_0_0_Request / "banks" / bankId.value / "accounts" / fromAccount.accountId.value /
308+
SYSTEM_OWNER_VIEW_ID / "transaction-request-types" / transactionRequestType / "transaction-requests").POST <@(user1)
309+
310+
// INITIATED only — no money moves, so we can repeat freely.
311+
(1 to iterations).foreach { iter =>
312+
withClue(s"iteration $iter of $iterations: ") {
313+
val createResponse = makePostRequest(createTransReqRequest, write(transactionRequestBodyCounterparty))
314+
createResponse.code should equal(201)
315+
val json = createResponse.body.extract[TransactionRequestWithChargeJSON400]
316+
json.status should equal(TransactionRequestStatus.INITIATED.toString)
317+
// The race manifests as a missing challenge (read saw 0 uncommitted rows).
318+
json.challenges.find(_.user_id == resourceUser1.userId) should not be (None)
319+
json.challenges.find(_.user_id == resourceUser2.userId) should not be (None)
320+
}
321+
}
322+
} finally {
323+
addMakerCheckerPermissionToOwnerView()
324+
}
325+
}
270326
}
271327

272328
}

0 commit comments

Comments
 (0)