Skip to content
/ server Public

Conversation

@dr-m
Copy link
Contributor

@dr-m dr-m commented Jan 26, 2026

The design of "binlog group commit" involves carrying some state across transaction boundaries. This includes trx_t::commit_lsn, which keeps track of how much write-ahead log needs to be written. Unfortunately, this field was not reset in a commit where a log write was elided. That would cause an unnecessary wait in a subsequent read-only transaction that happened to reuse the same transaction object.

trx_deregister_from_2pc(): Reset trx->commit_lsn so that an earlier write that was executed in the same client connection will not result in an unnecessary wait during a subsequent read operation.

trx_commit_complete_for_mysql(): Unless we are inside a binlog group commit, reset trx->commit_lsn.

unlock_and_close_files(): Reset trx->commit_lsn after durably writing the log, and remove a redundant log write call from some callers.

@dr-m dr-m requested a review from vaintroub January 26, 2026 10:49
@dr-m dr-m self-assigned this Jan 26, 2026
@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Comment on lines 4382 to 4383
log_write_up_to(trx->commit_lsn, true);
trx->commit_lsn= 0;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be more in line with later branches, I intend to change this to the following:

  if (lsn_t lsn= trx->commit_lsn)
  {
    trx->commit_lsn= 0;
    log_write_up_to(lsn, true);
  }

Copy link
Member

@vaintroub vaintroub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have to trust you on most of that code. But I have 2 remarks about it . could you maybe add an assertion in trx_flush_log_if_needed() that it is not participating in write/flush/group commit logic if it is in read-only transaction? Because that was what caused problems in the MDEV-38589. Something along the lines

diff --git a/storage/innobase/trx/trx0trx.cc b/storage/innobase/trx/trx0trx.cc
index 9175e5098c0..0792224f74f 100644
--- a/storage/innobase/trx/trx0trx.cc
+++ b/storage/innobase/trx/trx0trx.cc
@@ -1244,6 +1244,8 @@ static void trx_flush_log_if_needed(lsn_t lsn, trx_t *trx)
   if (log_sys.get_flushed_lsn() > lsn)
     return;

+  ut_ad(!trx->mysql_thd || !thd_tx_is_read_only(trx->mysql_thd));
+
   const bool flush=
     (srv_file_flush_method != SRV_NOSYNC &&
      (srv_flush_log_at_trx_commit & 1));

And second, adding debug only code to zero trx-commit_lsn, to workaround debug assertion in trx->free(), which then always zeros trx->commit_lsn looks slightly weird, but I understand it is appropriate for old branches, because we do not want to introduce new non-debug assertions, that would be risky.

I think however, that it would be better, later when you merge into development branch, remove all these ut_d(), around zeroing trx->commit_lsn just prior to trx->free(). And, inside trx_free, add ut_a(!commit_lsn) , i.e unconditional assert, not just debug. Will help to discover such bugs in optimized code, e.g during benchmarks, at a very low cost of single ut_a()

@dr-m
Copy link
Contributor Author

dr-m commented Jan 26, 2026

I'll have to trust you on most of that code. But I have 2 remarks about it . could you maybe add an assertion in trx_flush_log_if_needed() that it is not participating in write/flush/group commit logic if it is in read-only transaction? Because that was what caused problems in the MDEV-38589. Something along the lines

diff --git a/storage/innobase/trx/trx0trx.cc b/storage/innobase/trx/trx0trx.cc
index 9175e5098c0..0792224f74f 100644
--- a/storage/innobase/trx/trx0trx.cc
+++ b/storage/innobase/trx/trx0trx.cc
@@ -1244,6 +1244,8 @@ static void trx_flush_log_if_needed(lsn_t lsn, trx_t *trx)
   if (log_sys.get_flushed_lsn() > lsn)
     return;

+  ut_ad(!trx->mysql_thd || !thd_tx_is_read_only(trx->mysql_thd));
+
   const bool flush=
     (srv_file_flush_method != SRV_NOSYNC &&
      (srv_flush_log_at_trx_commit & 1));

Your suggestion would depend on 9aca89f, which was not allowed to enter the 10.6 branch. Maybe I should target the 10.11 branch with this one as well.

I think however, that it would be better, later when you merge into development branch, remove all these ut_d(), around zeroing trx->commit_lsn just prior to trx->free(). And, inside trx_free, add ut_a(!commit_lsn) , i.e unconditional assert, not just debug. Will help to discover such bugs in optimized code, e.g during benchmarks, at a very low cost of single ut_a()

It’s not only an additional cost of inserting the assertion, but an extra cost of zeroing the field in several additional places. We could introduce a non-inline wrapper function of trx_t::free() to minimize the overhead.

@dr-m dr-m changed the base branch from 10.6 to 10.11 January 26, 2026 14:07
@vaintroub
Copy link
Member

Actually, this code ut_ad(!trx->mysql_thd || !thd_tx_is_read_only(trx->mysql_thd)); compiled in 10.6, in your branch. Accessor or not , it is unimportant, it is just some code that executes in debug mode. It just catches the error earlier than we had in main branch, where next query has already started.

@dr-m
Copy link
Contributor Author

dr-m commented Jan 26, 2026

Actually, this code ut_ad(!trx->mysql_thd || !thd_tx_is_read_only(trx->mysql_thd)); compiled in 10.6, in your branch. Accessor or not , it is unimportant, it is just some code that executes in debug mode. It just catches the error earlier than we had in main branch, where next query has already started.

Right, I realized that later. In any case, performance fixes that do not address a correctness issue should not qualify being pushed into the 10.6 branch.

@vaintroub vaintroub self-requested a review January 27, 2026 14:26
The design of "binlog group commit" involves carrying some state across
transaction boundaries. This includes trx_t::commit_lsn, which keeps track
of how much write-ahead log needs to be written. Unfortunately, this
field was not reset in a commit where a log write was elided. That would
cause an unnecessary wait in a subsequent read-only transaction that
happened to reuse the same transaction object.

trx_deregister_from_2pc(): Reset trx->commit_lsn so that
an earlier write that was executed in the same client connection
will not result in an unnecessary wait during a subsequent read
operation.

trx_commit_complete_for_mysql(): Unless we are inside a binlog
group commit, reset trx->commit_lsn.

unlock_and_close_files(): Reset trx->commit_lsn after durably
writing the log, and remove a redundant log write call from some
callers.

trx_t::rollback_finish(): Clear commit_lsn, because a rolled-back
transaction will not need to be durably written.

trx_t::clear_and_free(): Wrapper function to suppress a debug check
in trx_t::free().

Also, remove some redundant ut_ad(!trx->will_lock) that will be checked
in trx_t::free().

Reviewed by: Vladislav Vaintroub
@dr-m dr-m merged commit 7614f8f into 10.11 Jan 28, 2026
14 of 17 checks passed
@dr-m dr-m deleted the MDEV-38589 branch January 28, 2026 12:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

4 participants