Skip to content

JAVA-5950 Update Transactions Convenient API with exponential backoff on retries#1899

Open
nhachicha wants to merge 44 commits intomongodb:backpressurefrom
nhachicha:nh/backpressure_convenient_api
Open

JAVA-5950 Update Transactions Convenient API with exponential backoff on retries#1899
nhachicha wants to merge 44 commits intomongodb:backpressurefrom
nhachicha:nh/backpressure_convenient_api

Conversation

@nhachicha
Copy link
Collaborator

@nhachicha nhachicha commented Feb 25, 2026

Original PR accidentally closed #1852, it has outstanding review comments for @stIncMale to go over when re-reviewing.

Relevant specification changes:

JAVA-5950, JAVA-6046, JAVA-6093, JAVA-6113 (only the part about transactions-convenient-api)

strogiyotec and others added 30 commits February 25, 2026 13:00
…Impl.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…s exceeded (ex operationContext.getTimeoutContext().getReadTimeoutMS())
…tionProseTest.java

Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>
…tionProseTest.java

Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>
…tionProseTest.java

Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>
…tionProseTest.java

Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>
…tionProseTest.java

Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>
…tionProseTest.java

Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>
…tionProseTest.java

Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>
Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>
@nhachicha nhachicha self-assigned this Feb 25, 2026
@nhachicha nhachicha marked this pull request as ready for review February 25, 2026 16:57
@nhachicha nhachicha requested a review from a team as a code owner February 25, 2026 16:57
@nhachicha nhachicha requested review from rozza and stIncMale and removed request for rozza February 25, 2026 16:57
@stIncMale
Copy link
Member

stIncMale commented Mar 6, 2026

The commit corresponding to the last reviewed commit is the previous PR is fda498c.

Copy link
Member

@stIncMale stIncMale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't finished reviewing the changes.

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertTrue;

public class ExponentialBackoffTest {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Original suggestion #1852 (comment) was auto-resolved, but the corresponding change 1655dad hasn't been included in the current PR:

Suggested change
public class ExponentialBackoffTest {
class ExponentialBackoffTest {

Comment on lines +39 to +40
String.format("Attempt %d: backoff should be 0-%d ms, got: %d", attemptNumber,
expectedBackoff, backoff));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Original suggestion #1852 (comment) was auto-resolved, but the corresponding change f5274cb hasn't been fully included in the current PR:

Suggested change
String.format("Attempt %d: backoff should be 0-%d ms, got: %d", attemptNumber,
expectedBackoff, backoff));
String.format("Attempt %d: backoff should be between 0 ms and %d ms, got: %d", attemptNumber,
expectedBackoff, backoff));

@Test
void testCalculateTransactionBackoffMsRespectsMaximum() {
// Even at high attempt numbers, backoff should never exceed ExponentialBackoff.TRANSACTION_MAX_MS
for (int attemptNumber = 1; attemptNumber < 26; attemptNumber++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make the number of iterations here based on EXPECTED_BACKOFFS_MAX_VALUES.length (this way it is more clear than when a magic number like 26 is used)

Suggested change
for (int attemptNumber = 1; attemptNumber < 26; attemptNumber++) {
for (int attemptNumber = 1; attemptNumber < EXPECTED_BACKOFFS_MAX_VALUES.length * 2; attemptNumber++) {

}
}

private long measureTransactionLatency(final double jitter, final BsonDocument failPointDocument) throws InterruptedException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's either specify the units returned in the method name, like measureTransactionLatencyMs, or, better, return Duration, and let the calling code do the conversion by calling, for example, Duration.toMillis, if needed.

Comment on lines 221 to +226
public void testRetryBackoffIsEnforced() throws InterruptedException {
// Run with jitter = 0 (no backoff)
ExponentialBackoff.setTestJitterSupplier(() -> 0.0);

BsonDocument failPointDocument = BsonDocument.parse("{'configureFailPoint': 'failCommand', 'mode': {'times': 13}, "
+ "'data': {'failCommands': ['commitTransaction'], 'errorCode': 251}}");

long noBackoffTime;
try (ClientSession session = client.startSession();
FailPoint ignored = FailPoint.enable(failPointDocument, getPrimary())) {
StartTime startTime = StartTime.now();
session.withTransaction(() -> collection.insertOne(session, Document.parse("{}")));
noBackoffTime = startTime.elapsed().toMillis();
} finally {
// Clear the test jitter supplier to avoid affecting other tests
ExponentialBackoff.clearTestJitterSupplier();
}

// Run with jitter = 1 (full backoff)
ExponentialBackoff.setTestJitterSupplier(() -> 1.0);

failPointDocument = BsonDocument.parse("{'configureFailPoint': 'failCommand', 'mode': {'times': 13}, "
final BsonDocument failPointDocument = BsonDocument.parse("{'configureFailPoint': 'failCommand', 'mode': {'times': 13}, "
+ "'data': {'failCommands': ['commitTransaction'], 'errorCode': 251}}");

long withBackoffTime;
try (ClientSession session = client.startSession();
FailPoint ignored = FailPoint.enable(failPointDocument, getPrimary())) {
StartTime startTime = StartTime.now();
session.withTransaction(() -> collection.insertOne(session, Document.parse("{}")));
withBackoffTime = startTime.elapsed().toMillis();
} finally {
ExponentialBackoff.clearTestJitterSupplier();
}
long noBackoffTime = measureTransactionLatency(0.0, failPointDocument);
long withBackoffTime = measureTransactionLatency(1.0, failPointDocument);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code deduplication in this test was achieved by extracting the duplicated code into measureTransactionLatency. There was a separate comment about the duplicated string that represents the failpoint document, but that comment wasn't about saving the computing resources by avoiding to parse the document more than once.

Having failPointDocument outside of the measureTransactionLatency method hurts readability. It may in principle be tolerable if the performance improvement its provides is worth the compromise. However, that's not the case here, as performance is mostly irrelevant.

Let's move the whole final BsonDocument failPointDocument = BsonDocument.parse... into failPointDocument.

public <T> T withTransaction(final TransactionBody<T> transactionBody, final TransactionOptions options) {
notNull("transactionBody", transactionBody);
TimeoutContext withTransactionTimeoutContext = createTimeoutContext(options);
final boolean hasTimeoutMS = withTransactionTimeoutContext.hasTimeoutMS();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[just information]

Us using both timeoutOrAlternative (or getTimeout) and hasTimeoutMS now got me thinking if it's obvious that getTimeout() is null iff getTimeoutSettings().getTimeoutMS() is null. It is not, I had to spend time analyzing the constructors and how they are called to verify that. To make this invariant explicit, and to enforce it, I created #1908.

}
}

private static MongoException timeoutException(final boolean hasTimeoutMS, final Throwable cause) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's rename hasTimeoutMS to something like timeoutMsConfigured or csotConfigured? That way the name does not beg the question "what is the thing that has the timeoutMS?".

The same applies to the hasTimeoutMS parameter of the backoff method, and to the local variable hasTimeoutMS in the withTransaction method.

Comment on lines 112 to 114
private TimeoutContext(final TimeoutSettings timeoutSettings, @Nullable final Timeout timeout) {
this(false, timeoutSettings, timeout);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we removed the copyTimeoutContext method in this PR, the constructor TimeoutContext(final TimeoutSettings timeoutSettings, @Nullable final Timeout timeout) became dead code. Let's remove the constructor.

Comment on lines +261 to +263
final boolean hasTimeoutMS = withTransactionTimeoutContext.hasTimeoutMS();
Timeout withTransactionTimeout = withTransactionTimeoutContext.timeoutOrAlternative(
assertNotNull(TimeoutContext.startTimeout(MAX_RETRY_TIME_LIMIT_MS)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we now have to call hasTimeoutMS, the timeoutOrAlternative we added is longer useful, as it was introduced specifically to avoid doing the explicit hasTimeoutMS check.

Let's remove timeoutOrAlternative, and change the code here as follows (note also how the local variable hasTimeoutMS is no longer final, because in this codebase we don't mark local variables as final):

Suggested change
final boolean hasTimeoutMS = withTransactionTimeoutContext.hasTimeoutMS();
Timeout withTransactionTimeout = withTransactionTimeoutContext.timeoutOrAlternative(
assertNotNull(TimeoutContext.startTimeout(MAX_RETRY_TIME_LIMIT_MS)));
boolean hasTimeoutMS = withTransactionTimeoutContext.hasTimeoutMS();
Timeout withTransactionTimeout = assertNotNull(hasTimeoutMS
? withTransactionTimeoutContext.getTimeout()
: TimeoutContext.startTimeout(MAX_RETRY_TIME_LIMIT_MS));

Comment on lines +171 to +174
public Timeout timeoutOrAlternative(final Timeout alternative) {
return timeout == null ? alternative : timeout;
}

Copy link
Member

@stIncMale stIncMale Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove this new method. See #1899 (comment) for the details.

Suggested change
public Timeout timeoutOrAlternative(final Timeout alternative) {
return timeout == null ? alternative : timeout;
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants