Skip to content
/ server Public

Conversation

@mariadb-TafzeelShams
Copy link
Contributor

  • The Jira issue number for this PR is: MDEV-38079

Description

Issue:
Recovery fails because the expected space ID does not match the space
ID stored in the page.

Root Cause:

  • Before the crash, the nth page (n != 0) gets flushed to disk as a
    compressed page.
  • Page 0 remains unflushed, and the compressed flag for the space is
    made durable only in the redo logs.
  • During recovery, the compressed flag is first set to indicate a
    compressed space.
  • Later, while applying redo logs, an earlier LSN may reset it to
    non-compressed and then back to compressed.
  • If the nth page is read during this intermediate state, a compressed
    page may be read as non-compressed, causing a space ID mismatch.

Fix:

  • recv_sys_t::space_flags_lsn : Added a map to track the last applied
    LSN for each space and avoid stale updates from earlier LSNs.
  • recv_sys_t::update_space_flags() : Updates space->flags during
    recovery only if the update comes from the latest LSN.

How can this PR be tested?

Tested equivalent 10.11 patch with the data directory mentioned in the Jira ticket.

Basing the PR against the correct MariaDB version

  • This is a new feature and the PR is based against the latest MariaDB development branch
  • This is a bug fix and the PR is based against the earliest branch in which the bug can be reproduced

case log_phys_t::APPLIED_TO_FSP_HEADER:
s->flags = mach_read_from_4(
case log_phys_t::APPLIED_TO_FSP_HEADER: {
uint32_t flags = mach_read_from_4(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need this temporary variable "flags".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

auto it = space_flags_lsn.find(space->id);
if (it == space_flags_lsn.end() || lsn >= it->second)
{
space->flags = flags;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There shouldn't be space before "="

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

FSP_HEADER_OFFSET + FSP_FREE
+ FLST_LEN + frame);
break;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need this curly brace?

Copy link
Contributor Author

@mariadb-TafzeelShams mariadb-TafzeelShams Jan 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This brace introduce a local scope, so flags is kept limited to the APPLIED_TO_FSP_HEADER case.

Copy link
Member

@Thirunarayanan Thirunarayanan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure to address the review comments.

FSP_HEADER_OFFSET
+ FSP_SPACE_FLAGS + frame);
recv_sys.update_space_flags(s,
mach_read_from_4(FSP_HEADER_OFFSET
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

recv_sys.update_space_flags(
   s, mach_read_from_4(FSP_HEADER_OFFSET +
                       FSP_SPACE_FLAGS + frame),
   l->start_lsn);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

recv_sys.update_space_flags(
  s, mach_read_from_4(FSP_HEADER_OFFSET +
                      FSP_SPACE_FLAGS + frame),
  l->start_lsn);

Copy link
Member

@Thirunarayanan Thirunarayanan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved with the code format changes

Issue:
Recovery fails because the expected space ID does not match the space
ID stored in the page.

Root Cause:
- Before the crash, the nth page (n != 0) gets flushed to disk as a
  compressed page.
- Page 0 remains unflushed, and the compressed flag for the space is
  made durable only in the redo logs.
- During recovery, the compressed flag is first set to indicate a
  compressed space.
- Later, while applying redo logs, an earlier LSN may reset it to
  non-compressed and then back to compressed.
- If the nth page is read during this intermediate state, a compressed
  page may be read as non-compressed, causing a space ID mismatch.

Fix:
- recv_sys_t::space_flags_lsn : Added a map to track the last applied
  LSN for each space and avoid stale updates from earlier LSNs.
- recv_sys_t::update_space_flags() : Updates space->flags during
  recovery only if the update comes from the latest LSN.
Copy link
Contributor

@dr-m dr-m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a step to the right direction, but I think that it needs a little more work. When it comes to changing FSP_SIZE, we can exploit the fact that the size never shrinks, except in undo tablespace truncation, whose recovery is handled separately. I believe that the LSN tracking must be kept strictly related to changes of FSP_SPACE_FLAGS.

Comment on lines 1672 to +1678
/** Set the recovered size of a tablespace in pages.
@param id tablespace ID
@param size recovered size in pages
@param flags tablespace flags */
@param flags tablespace flags
@param lsn lsn of the redo log */
void fil_space_set_recv_size_and_flags(ulint id, uint32_t size,
uint32_t flags) noexcept;
uint32_t flags, lsn_t lsn) noexcept;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The @param lsn comment is not adding any value. Which specific LSN does it refer to? Also the size and flags could be described more accurately. Let us use spaces instead of TAB also in the comment:

/** Set the recovered size and flags of a tablespace.
@param id         tablespace ID
@param size       recovered FSP_SIZE, or a less-than-equal value if no change
@param flags      recovered FSP_SPACE_FLAGS if applicable
@param flags_lsn  start LSN of modifying flags, or 0 */

This comment would also answer the question what would happen or how we would identify the situation when a mini-transaction is modifying only one of FSP_SIZE or FSP_SPACE_FLAGS.

Comment on lines +271 to +272
/** Last applied LSN for space->flags updates (used to prevent stale updates) */
std::map<uint32_t, lsn_t> space_flags_lsn;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could significantly increase the memory footprint. Do we really need an ordered map? We already store similar information in file_name_t linked from recv_spaces.

Could we slightly repurpose fil_space_t::create_lsn for storing this information? I think that the name fil_space_t::format_lsn would also cover the meaning of create_lsn, since the format is usually determined at creation.

If the information is being read before a fil_space_t object is available, then it would seem possible to add file_name_t::format_lsn for storing this.

Comment on lines +356 to +357
/** Update space->flags only if lsn >= last recorded space-flags LSN */
void update_space_flags(fil_space_t *space, uint32_t flags, lsn_t lsn);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which LSN? This comment should also somehow refer to fil_space_set_recv_size_and_flags().

Comment on lines +1099 to +1104
auto it= space_flags_lsn.find(space->id);
if (it == space_flags_lsn.end() || lsn >= it->second)
{
space->flags= flags;
space_flags_lsn[space->id]= lsn;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not it->lsn= lsn;?

We are missing mysql_mutex_assert_owner(&recv_sys.mutex) here. The access to a shared data structure needs to be protected somehow.

Comment on lines 1345 to 1349
if (f.size
|| f.flags != f.initial_flags) {
fil_space_set_recv_size_and_flags(
space->id, f.size, f.flags);
space->id, f.size, f.flags, lsn);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is passing the start LSN of the mini-transaction of a FILE_ record, instead of the start LSN of the mini-transaction that actually modified FSP_SPACE_FLAGS.

Comment on lines +2730 to +2737
else if (!it->second.space)
{
if (has_size)
it->second.size= size;
if (has_flags)
it->second.flags= flags;
}
fil_space_set_recv_size_and_flags(space_id, size, flags);
fil_space_set_recv_size_and_flags(space_id, size, flags, start_lsn);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may have to pass has_flags ? start_lsn : 0 to distinguish the case where the flags were not changed. Maybe the file_name_t::initial_flags could be removed altogether, now that we would keep track of the start_lsn of the mini-transaction that modifed FSP_SPACE_FLAGS, in a new field it->second.flags_lsn or it->second.format_lsn.

Comment on lines 3094 to +3098
case log_phys_t::APPLIED_TO_FSP_HEADER:
s->flags = mach_read_from_4(
FSP_HEADER_OFFSET
+ FSP_SPACE_FLAGS + frame);
recv_sys.update_space_flags(
s, mach_read_from_4(FSP_HEADER_OFFSET +
FSP_SPACE_FLAGS + frame),
l->start_lsn);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this code be moved to log_phys_t::apply() and the special return value APPLIED_TO_FSP_HEADER removed? That return value does not currently distinguish changes to FSP_SPACE_FLAGS and FSP_SIZE. That is, if a mini-transaction only modifies FSP_SIZE but not FSP_SPACE_FLAGS, we would execute some unnecessary code here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

4 participants