MDEV-37316#5073
Conversation
To prepare for MDEV-37321, this commit moves them to a
`Remote_event_stream` class outside of `sql/`.
The `.cc` file will switch to Connector/C unbeknownst to `rpl_mi.cc`.
Only the following steps are reörganized to reusable instance methods:
* Connection setup and startup
* VIO management
* Abort when the thread is killed
* Closing
(already covered by Connector/C, but not by the current `libmysql`)
The following steps are currently not öptimal for moving
(have verbose changesets), so they are only wrapped, not reörganized:
* steps that run queries *with copy-pasted code*
* sending commands `COM_REGISTER_SLAVE` & `COM_BINLOG_DUMP`,
including serializing their arguments
* Connector/C does not have a formal API for serializing
command arguments, especially variable-length strings.
* Connector/C has a Binlog API that duplicates this step, but that
API is not modular for the Server’s full replication capabilities.
* error checking
These require refactors and ideally an evaluation for `mariadb-binlog`.
To enable this migration, this commit additionally:
* Reserves the management of the IO thread’s VIO to the client library
* Draft-specific: omits MDEV-28114’s kludge failure
recovery for Semi-Sync graceful self-KILL since
* MDEV-32551 fixed the root bug on the ACK receiver, and
* MDEV-39583 plans to remove this entire sub-component.
Draft-specific: Until Connector/C exports the pre-decompress (network)
byte count, `@@read_binlog_speed_limit` will measure the post-decompress
count even in `@@slave_compressed_protocol` for now.
There was a problem hiding this comment.
Code Review
This pull request introduces the Remote_event_stream class to encapsulate MariaDB Connector/C handles, refactoring the replication slave I/O thread and semi-sync logic to use this new abstraction instead of raw MYSQL pointers. Feedback highlights critical bugs, including a missing error check for cli_safe_read that could cause memory corruption, a buffer over-read vulnerability in semisync_ack, and a 1-byte offset error in read_event that breaks event parsing. Other recommendations include ensuring my_global.h is the first include, removing a redundant vio_close call, and using const char * in Connection_options for better ownership semantics.
| auto strlen= static_cast<size_t>(cli_safe_read(connector)); | ||
| return {connector->net.read_pos, strlen}; |
There was a problem hiding this comment.
The return value of cli_safe_read is not checked for errors. If it returns (unsigned long)-1 (the value of packet_error), strlen will be cast to a very large size_t, resulting in an invalid string_view. This will cause next().size() to be non-zero, bypassing error checks in the caller and potentially leading to memory corruption.
unsigned long len= cli_safe_read(connector);
if (unlikely(len == (unsigned long)-1))
return {};
return {connector->net.read_pos, static_cast<size_t>(len)};There was a problem hiding this comment.
packet_error is 0 in libmysql.
| log_name.copy(&(payload[HEAD_SIZE]), sizeof(payload)-HEAD_SIZE); | ||
| NET *net= &connector->net; | ||
| //@deprecated: not required in Connector/C | ||
| net->pkt_nr_can_be_reset= true; | ||
| // Connector/C might not require resetting; better be safe until confirmed. | ||
| net_clear(net, false); | ||
| return my_net_write(net, reinterpret_cast<const unsigned char *>(payload), | ||
| HEAD_SIZE + (log_name.size()+1)) || net_flush(net); |
There was a problem hiding this comment.
There is a buffer over-read risk in semisync_ack. The payload buffer has a fixed size based on FN_REFLEN, but my_net_write uses log_name.size() + 1 as the length without validation. If log_name.size() exceeds FN_REFLEN, my_net_write will read past the end of the payload array. Also, std::string_view::copy does not null-terminate.
size_t copied= log_name.copy(&(payload[HEAD_SIZE]), sizeof(payload) - HEAD_SIZE - 1);
payload[HEAD_SIZE + copied]= '\0';
NET *net= &connector->net;
//@deprecated: not required in Connector/C
net->pkt_nr_can_be_reset= true;
// Connector/C might not require resetting; better be safe until confirmed.
net_clear(net, false);
return my_net_write(net, reinterpret_cast<const unsigned char *>(payload),
HEAD_SIZE + copied + 1) || net_flush(net);There was a problem hiding this comment.
If
log_name.size()exceedsFN_REFLEN,my_net_writewill read past the end of thepayloadarray.
It will not de facto.
Also,
std::string_view::copydoes not null-terminate.
The array initialization does.
| DBUG_RETURN(len - 1); | ||
| DBUG_PRINT("exit", ("len: %zu net->read_pos[4]: %d", | ||
| len, packet[4])); | ||
| DBUG_RETURN(packet); |
There was a problem hiding this comment.
The read_event function now returns the entire packet including the MySQL protocol status byte (at index 0). However, the rest of the replication logic in handle_slave_io expects event_buf and event_len to represent only the event data (excluding the status byte). Returning the full packet here causes a 1-byte offset error in event parsing.
DBUG_RETURN(packet.substr(1));
Just a draft/preview that also serves as CI upload and backup; not final yet, not even close…
What problem is the patch trying to solve?
If some output changed that is not visible in a test case, what was it looking like before the change and how it's looking with this patch applied?
Do you think this patch might introduce side-effects in other parts of the server?
Anything from release notes?