Skip to content

Conversation

@nrwahl2
Copy link
Contributor

@nrwahl2 nrwahl2 commented Jan 13, 2026

This is the next batch from #4011

@nrwahl2 nrwahl2 requested a review from clumens January 13, 2026 20:03
@nrwahl2
Copy link
Contributor Author

nrwahl2 commented Jan 13, 2026

Updated to fix Coverity false positive:

20:57:45  /pacemaker/lib/common/xml_element.c:1542: check_return: Calling "pcmk__xe_get_bool" without checking return value (as is done elsewhere 12 out of 13 times).

@nrwahl2 nrwahl2 force-pushed the nrwahl2-based_first branch from 84de150 to ef0a401 Compare January 13, 2026 23:43
@nrwahl2 nrwahl2 marked this pull request as draft January 14, 2026 02:50
@nrwahl2
Copy link
Contributor Author

nrwahl2 commented Jan 14, 2026

Moved to draft. Investigating cts-lab errors that pre-date this PR.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
With extern declarations in based_remote.h.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
based_init_remote_listener() becomes static.

Note that the_cib was already set to cib when startCib() called
pcmk__xe_get().

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Since based_read_cib() returns non-NULL and based_activate_cib() assigns
its argument to the_cib, we can simply pass the return value of
based_read_cib() directly into based_activate_cib(). Then pass the_cib
to cib_read_config().

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
It was short and simple enough. There's also no harm in moving its
contents above the pcmk_cluster_new() call, as there are no dependencies
between these parts.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
The only thing it's used for is a trace message that logs how many
clients are still active after one's connection is destroyed. That
doesn't seem useful.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
This shouldn't change behavior but is a good practice.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
@nrwahl2 nrwahl2 marked this pull request as ready for review January 14, 2026 09:32
@nrwahl2 nrwahl2 force-pushed the nrwahl2-based_first branch from ef0a401 to 4efb973 Compare January 14, 2026 09:32
@nrwahl2
Copy link
Contributor Author

nrwahl2 commented Jan 14, 2026

cts-lab ran fine for 50 iterations.

pcmk__trace("%s operation took %llds to complete", op, elapsed);
crm_write_blackbox(0, NULL);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I previously looked at simplifying this block of code, but our regression test and cts-lab test output is so picky when it came to what messages are logged at what level that it discouraged me from continuing to work on it. It felt like there could potentially be outside users relying on those messages also. We'll see what happens in the next commits here.


rc = cib_process_command(request, operation, op_function, &op_reply,
privileged);
log_op_result(request, operation, rc, (time(NULL) - start_time));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should be using difftime for this (and elsewhere, for that matter).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember looking into that for something... maybe my Booth work... and I might have decided against it for some reason. I see no problem with using it here though. All we do is check whether it's more than 3 seconds and log it at trace level with a blackbox if so.

I'm not sure I even want to do that, since we don't check how long most operations take. I would be very open to dropping the elapsed time calculation and logging. But it's not taking up much space.

if ((PCMK__CLIENT_TYPE(client) == pcmk__client_ipc)
&& pcmk__str_eq(op, CRM_OP_REGISTER, pcmk__str_none)) {

if (pcmk__is_set(flags, crm_ipc_client_response)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to load all this remote CIB stuff back into my head every year or two when I need to look at it, so I'm definitely not remembering all the nuances. If I were to guess, I'd say that the differences in behavior are due to the fact that IPC is all happening between daemons on the same host (which presumably we control?) and remote CIB operations could be from a completely separate host and could involve the whole TLS handshake setup process.

We only use three possible cred types. We always use certificate if
they're configured. Otherwise, some callers default to pre-shared keys,
while remote CIB clients and listeners default to anonymous
authentication. Also improve the doc comments, since it was not obvious
to me how this worked and why.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Also add an error message on TLS init failure, don't log "Starting TLS
listener..." until we've initialized TLS, and warn that the clear-text
listener is not recommended.

It seems cleaner to handle this little piece of condtional code in the
caller, rather than passing a boolean argument.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
No other code changes, only moving functions around.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
No other code changes, only moving functions around.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
No other code changes, only moving functions around.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Also rename the two destroy callbacks so that their purposes are
clearer. Typically we wouldn't use the "based_" prefix since these are
static. However, there are similarly named callbacks in liblrmd (using
the "lrmd_" prefix) and we typically don't like having multiple
functions with the same name even if they're static (due to difficulty
of grepping through source code).

No code changes aside from the renames, moves, and dropping "return"
statements at the end of two void functions.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
This is a GSourceFunc, so use G_SOURCE_REMOVE for clarity.

Also add Doxygen.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
I got confused for a little while about remote_auth_timeout_cb(). I
thought returning G_SOURCE_REMOVE would cause g_source_remove() to get
called on client->remote->source. But that is not the case.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
This is a more logical place for it.

We drop the ipc_internal.h include from remote_internal.h to avoid
circular includes.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
An anonymous struct is fine for the typedef.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Previously, we were looking up the user, making sure we found it,
looking up the user's primary group, and checking whether it was
CRM_DAEMON_GROUP. If not, then we fell back to looking up
CRM_DAEMON_GROUP and checking whether user was in its member list.

That fallback is all we need. The additional checks were added by commit
14d9ae4. No reasoning was provided.

I'm guessing they were added in the name of efficiency in the common
case. But they required two additional system calls, so I doubt that was
any more efficient than iterating over the list of group members and
doing string comparisons. And since this is done only once per remote
client, efficiency gains probably don't matter much.

Also add Doxygen.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
And other minor changes

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
No code changes other than moving function definitions.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
...to based_common_callback_worker(), to avoid using public API prefix.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Avoid using the public API prefix.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
This will probably get renamed and/or split up further when we implement
IPC/message-related best practices in pacemaker-based. However, for now,
simply make the header name match the .c file name.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
@nrwahl2 nrwahl2 force-pushed the nrwahl2-based_first branch from 4efb973 to 987e475 Compare January 14, 2026 23:11
@nrwahl2
Copy link
Contributor Author

nrwahl2 commented Jan 14, 2026

Updated to address review

Make based_process_request() a bit more legible.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
And change/add some variable names.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
There are still plenty of IPC-specific pieces in based_callbacks.c.
However, this pulls out all of the libqb IPC callbacks, which can be
easily isolated. Pulling out the rest of the IPC-specific code will
require more refactoring.

This is based on commit 24d4455 and similar ones for other daemons.

Ref T566

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Also drop the unused size argument from dispatch_common().

This is based on commit 4f8e790 and similar ones for other daemons.

Ref T566

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
* Add Doxygen.
* Standardize log messages with what's happening in the same functions
  in other daemons.
* Standardize variable names (cib_client becomes client, op_request
  becomes msg).
* Un-indent "msg != NULL" block after moving the "msg == NULL" block
  upward and standardizing it with other daemons.
* Make sure return values and comments match what libqb documents.

The "Dispatching (un)privileged request" message is out of alignment
with the common code in other daemons. I've left it for now, because the
alternative is to fetch the client in based_ipc_dispatch_rw() and
based_ipc_dispatch_ro(), log the message, and then fetch the client
again in dispatch_common() (or pass it as an argument, which is also
non-standard).

Also add a TODO for based_ipc_destroy(). The other daemons have comments
in their DAEMON_ipc_closed() doxygen that say:

"We handle a destroyed connection the same as a closed one, but we need
a separate handler because the return type is different."

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Now, if cib_api_operations_t:add_notify_callback or
cib_api_operations_t:del_notify_callback is called with the event
argument set to any value other than "cib_diff_notify", it will result
in an error.

Previously, the following were also treated as valid:
* "cib_pre_notify"
* "cib_post_notify"
* "cib_update_confirmation"

However, at least as far back as Pacemaker 2.0.0, Pacemaker has not sent
any notifications for those events. So if a client added or deleted a
notify callback for them, the addition or deletion would succeed, but
the callback would never run.

Likewise, at least as far back as Pacemaker 2.0.0, no internal clients
have added or deleted callbacks for those events. So this change should
not risk breaking compatibility with older internal clients.

External clients are probably not registering CIB callbacks. But in
theory they could be, since these functions are public API.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
There's only one flag now, but we still need this instead of a boolean,
since we're setting it in pcmk__client_t:flags.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
And simplify the body. The value doesn't have to be "true"
(PCMK_VALUE_TRUE); it can be anything that pcmk__parse_bool() parses to
true.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Also move enum based_notify_flags and based_parse_notify_flags into
based_notify.c. No other file needs them now.

Technically we don't need based_nf_none or based_parse_notify_flags()
since we only have one meaningful flag. Those seemed clearer than
checking directly for PCMK__VALUE_CIB_DIFF_NOTIFY and setting the
based_nf_diff flag directly in based_update_notify_flags().

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
These don't affect behavior, but they make the code clearer. We should
not be calling IPC functions for remote clients. I spent longer than I'd
like to admit, trying to figure out how the notify update ack worked for
remote clients, before I noticed the crm_ipc_client_response guard in
pcmk__ipc_create_ack_as(). Only cib_native.c sets that flag for CIB
clients. It's never set for remote clients as far as I can tell (which
makes sense).

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
For consistency with other IPC/message-related code.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
...clients.

CIB IPC clients send an initial request with PCMK__XA_CIB_OP set to
CRM_OP_REGISTER (in cib_native_signon()).

**For some reason**, CIB remote clients don't do that. Instead, they
send an initial request with PCMK_XA_OP (a different attribute) set to
"authenticate" (a different value). The PCMK__XA_CIB_OP attribute is not
set at all.

If the CIB manager receives a request with PCMK__XA_CIB_OP set to
CRM_OP_REGISTER, then it sends an ack in based_common_callback_worker().
As discussed above, that will happen only for an IPC client. It will
never happen for a remote client.

The CIB manager sends a CRM_OP_REGISTER ack to a CIB remote client
in cib_remote_msg(), after authentication.

It would be nice to isolate the IPC code from the remote code better.
This small change highlights that this block is needed only for IPC
clients.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Log the client name, and also log the bad request at info level.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
..and into based_ipc.c and based_remote.c. This introduces some
duplication, but we want to disentangle the IPC code from the remote
code.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Move the remaining functionality into based_ipc.c:dispatch_common() and
based_remote.c:cib_handle_remote_msg(). There is a bit of duplication,
but it's worth it to finally separate IPC and remote request processing,
up to the point of based_process_request().

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
FALSE is defined as 0, so this doesn't change behavior.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
@nrwahl2 nrwahl2 force-pushed the nrwahl2-based_first branch from 987e475 to bc0ab87 Compare January 14, 2026 23:13
@clumens clumens merged commit 6a6d410 into ClusterLabs:main Jan 15, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants