JAVA-5950 Update Transactions Convenient API with exponential backoff on retries#1899
JAVA-5950 Update Transactions Convenient API with exponential backoff on retries#1899nhachicha wants to merge 44 commits intomongodb:backpressurefrom
Conversation
…Impl.java Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…s exceeded (ex operationContext.getTimeoutContext().getReadTimeoutMS())
…tionProseTest.java Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>
…tionProseTest.java Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>
…tionProseTest.java Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>
…tionProseTest.java Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>
…tionProseTest.java Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>
…tionProseTest.java Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>
…tionProseTest.java Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>
Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>
…mmediate retry without backoff.
- Adding logic for DRIVERS-3391
…lBackoffTest.java Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>
…tionProseTest.java Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>
|
The commit corresponding to the last reviewed commit is the previous PR is fda498c. |
stIncMale
left a comment
There was a problem hiding this comment.
I haven't finished reviewing the changes.
| import static org.junit.jupiter.api.Assertions.assertEquals; | ||
| import static org.junit.jupiter.api.Assertions.assertTrue; | ||
|
|
||
| public class ExponentialBackoffTest { |
There was a problem hiding this comment.
Original suggestion #1852 (comment) was auto-resolved, but the corresponding change 1655dad hasn't been included in the current PR:
| public class ExponentialBackoffTest { | |
| class ExponentialBackoffTest { |
| String.format("Attempt %d: backoff should be 0-%d ms, got: %d", attemptNumber, | ||
| expectedBackoff, backoff)); |
There was a problem hiding this comment.
Original suggestion #1852 (comment) was auto-resolved, but the corresponding change f5274cb hasn't been fully included in the current PR:
| String.format("Attempt %d: backoff should be 0-%d ms, got: %d", attemptNumber, | |
| expectedBackoff, backoff)); | |
| String.format("Attempt %d: backoff should be between 0 ms and %d ms, got: %d", attemptNumber, | |
| expectedBackoff, backoff)); |
| @Test | ||
| void testCalculateTransactionBackoffMsRespectsMaximum() { | ||
| // Even at high attempt numbers, backoff should never exceed ExponentialBackoff.TRANSACTION_MAX_MS | ||
| for (int attemptNumber = 1; attemptNumber < 26; attemptNumber++) { |
There was a problem hiding this comment.
Let's make the number of iterations here based on EXPECTED_BACKOFFS_MAX_VALUES.length (this way it is more clear than when a magic number like 26 is used)
| for (int attemptNumber = 1; attemptNumber < 26; attemptNumber++) { | |
| for (int attemptNumber = 1; attemptNumber < EXPECTED_BACKOFFS_MAX_VALUES.length * 2; attemptNumber++) { |
| } | ||
| } | ||
|
|
||
| private long measureTransactionLatency(final double jitter, final BsonDocument failPointDocument) throws InterruptedException { |
There was a problem hiding this comment.
Let's either specify the units returned in the method name, like measureTransactionLatencyMs, or, better, return Duration, and let the calling code do the conversion by calling, for example, Duration.toMillis, if needed.
| public void testRetryBackoffIsEnforced() throws InterruptedException { | ||
| // Run with jitter = 0 (no backoff) | ||
| ExponentialBackoff.setTestJitterSupplier(() -> 0.0); | ||
|
|
||
| BsonDocument failPointDocument = BsonDocument.parse("{'configureFailPoint': 'failCommand', 'mode': {'times': 13}, " | ||
| + "'data': {'failCommands': ['commitTransaction'], 'errorCode': 251}}"); | ||
|
|
||
| long noBackoffTime; | ||
| try (ClientSession session = client.startSession(); | ||
| FailPoint ignored = FailPoint.enable(failPointDocument, getPrimary())) { | ||
| StartTime startTime = StartTime.now(); | ||
| session.withTransaction(() -> collection.insertOne(session, Document.parse("{}"))); | ||
| noBackoffTime = startTime.elapsed().toMillis(); | ||
| } finally { | ||
| // Clear the test jitter supplier to avoid affecting other tests | ||
| ExponentialBackoff.clearTestJitterSupplier(); | ||
| } | ||
|
|
||
| // Run with jitter = 1 (full backoff) | ||
| ExponentialBackoff.setTestJitterSupplier(() -> 1.0); | ||
|
|
||
| failPointDocument = BsonDocument.parse("{'configureFailPoint': 'failCommand', 'mode': {'times': 13}, " | ||
| final BsonDocument failPointDocument = BsonDocument.parse("{'configureFailPoint': 'failCommand', 'mode': {'times': 13}, " | ||
| + "'data': {'failCommands': ['commitTransaction'], 'errorCode': 251}}"); | ||
|
|
||
| long withBackoffTime; | ||
| try (ClientSession session = client.startSession(); | ||
| FailPoint ignored = FailPoint.enable(failPointDocument, getPrimary())) { | ||
| StartTime startTime = StartTime.now(); | ||
| session.withTransaction(() -> collection.insertOne(session, Document.parse("{}"))); | ||
| withBackoffTime = startTime.elapsed().toMillis(); | ||
| } finally { | ||
| ExponentialBackoff.clearTestJitterSupplier(); | ||
| } | ||
| long noBackoffTime = measureTransactionLatency(0.0, failPointDocument); | ||
| long withBackoffTime = measureTransactionLatency(1.0, failPointDocument); |
There was a problem hiding this comment.
Code deduplication in this test was achieved by extracting the duplicated code into measureTransactionLatency. There was a separate comment about the duplicated string that represents the failpoint document, but that comment wasn't about saving the computing resources by avoiding to parse the document more than once.
Having failPointDocument outside of the measureTransactionLatency method hurts readability. It may in principle be tolerable if the performance improvement its provides is worth the compromise. However, that's not the case here, as performance is mostly irrelevant.
Let's move the whole final BsonDocument failPointDocument = BsonDocument.parse... into failPointDocument.
| public <T> T withTransaction(final TransactionBody<T> transactionBody, final TransactionOptions options) { | ||
| notNull("transactionBody", transactionBody); | ||
| TimeoutContext withTransactionTimeoutContext = createTimeoutContext(options); | ||
| final boolean hasTimeoutMS = withTransactionTimeoutContext.hasTimeoutMS(); |
There was a problem hiding this comment.
[just information]
Us using both timeoutOrAlternative (or getTimeout) and hasTimeoutMS now got me thinking if it's obvious that getTimeout() is null iff getTimeoutSettings().getTimeoutMS() is null. It is not, I had to spend time analyzing the constructors and how they are called to verify that. To make this invariant explicit, and to enforce it, I created #1908.
| } | ||
| } | ||
|
|
||
| private static MongoException timeoutException(final boolean hasTimeoutMS, final Throwable cause) { |
There was a problem hiding this comment.
Let's rename hasTimeoutMS to something like timeoutMsConfigured or csotConfigured? That way the name does not beg the question "what is the thing that has the timeoutMS?".
The same applies to the hasTimeoutMS parameter of the backoff method, and to the local variable hasTimeoutMS in the withTransaction method.
| private TimeoutContext(final TimeoutSettings timeoutSettings, @Nullable final Timeout timeout) { | ||
| this(false, timeoutSettings, timeout); | ||
| } |
There was a problem hiding this comment.
When we removed the copyTimeoutContext method in this PR, the constructor TimeoutContext(final TimeoutSettings timeoutSettings, @Nullable final Timeout timeout) became dead code. Let's remove the constructor.
| final boolean hasTimeoutMS = withTransactionTimeoutContext.hasTimeoutMS(); | ||
| Timeout withTransactionTimeout = withTransactionTimeoutContext.timeoutOrAlternative( | ||
| assertNotNull(TimeoutContext.startTimeout(MAX_RETRY_TIME_LIMIT_MS))); |
There was a problem hiding this comment.
Since we now have to call hasTimeoutMS, the timeoutOrAlternative we added is longer useful, as it was introduced specifically to avoid doing the explicit hasTimeoutMS check.
Let's remove timeoutOrAlternative, and change the code here as follows (note also how the local variable hasTimeoutMS is no longer final, because in this codebase we don't mark local variables as final):
| final boolean hasTimeoutMS = withTransactionTimeoutContext.hasTimeoutMS(); | |
| Timeout withTransactionTimeout = withTransactionTimeoutContext.timeoutOrAlternative( | |
| assertNotNull(TimeoutContext.startTimeout(MAX_RETRY_TIME_LIMIT_MS))); | |
| boolean hasTimeoutMS = withTransactionTimeoutContext.hasTimeoutMS(); | |
| Timeout withTransactionTimeout = assertNotNull(hasTimeoutMS | |
| ? withTransactionTimeoutContext.getTimeout() | |
| : TimeoutContext.startTimeout(MAX_RETRY_TIME_LIMIT_MS)); |
| public Timeout timeoutOrAlternative(final Timeout alternative) { | ||
| return timeout == null ? alternative : timeout; | ||
| } | ||
|
|
There was a problem hiding this comment.
Let's remove this new method. See #1899 (comment) for the details.
| public Timeout timeoutOrAlternative(final Timeout alternative) { | |
| return timeout == null ? alternative : timeout; | |
| } |
Original PR accidentally closed #1852, it has outstanding review comments for @stIncMale to go over when re-reviewing.
Relevant specification changes:
JAVA-5950, JAVA-6046, JAVA-6093, JAVA-6113 (only the part about
transactions-convenient-api)