Skip to content

oplogPopulator: route oplog key through TransformObjectKey SMT when available#2741

Open
delthas wants to merge 1 commit into
development/9.5from
improvement/BB-768/oplog-partition-by-object-key
Open

oplogPopulator: route oplog key through TransformObjectKey SMT when available#2741
delthas wants to merge 1 commit into
development/9.5from
improvement/BB-768/oplog-partition-by-object-key

Conversation

@delthas
Copy link
Copy Markdown
Contributor

@delthas delthas commented May 21, 2026

Summary

  • Adds an oplogPopulator.transformObjectKey config flag (default false). When enabled, oplog source connectors are configured with the com.scality.kafka.connect.transforms.TransformObjectKey SMT (shipped by Zenko via ZENKO-5274), keying messages by the raw S3 object key.
  • extensions/oplogPopulator/constants.js: defaultConnectorConfig (legacy {ns, fullDocument.value.key} key schema) + smtKeyConfig (projects documentKey._id, adds the SMT + key.converter=StringConverter), applied on top when the flag is set.
  • ConnectorsManager._getDefaultConnectorConfiguration picks the SMT key config from the flag. _processOldConnectors scrubs SMT-only keys from a stale connector config when the flag is off, so a connector that previously ran with the SMT doesn't keep referencing a class that isn't there.
  • OplogPopulatorConfigValidator: validates the new boolean flag.

Context

BB-768: the oplog Kafka topic keys messages from fullDocument.value.key, which is null on update/delete events (BB-355 removed change.stream.full.document=updateLookup). Those op types collapse onto hash({ns, null}) while inserts spread across partitions, breaking per-object ordering across op types.

The fix lives in the Zenko-side SMT (ZENKO-5274): it derives the key from documentKey._id (always populated) and strips the arsenal master/version encoding, so master and all versions of an S3 object hash to the same partition. This PR is the Backbeat side that configures the connector to use it.

Enablement model

Rather than Backbeat probing the Kafka Connect plugin path at runtime, the SMT is gated by an explicit config flag owned by the operator (ZKOP / zenkoversion CR). The flag is flipped on once the Connect image ships the TransformObjectKey plugin and the supporting Backbeat version is deployed. This keeps Backbeat free of auto-detection/runtime-flip machinery and gives a controlled rollout. A config change goes through the existing reconciliation: the connector is updated in place via PUT /connectors/{name}/config (no recreate, no resume-token loss — the change touches only the key schema + transforms, not the pipeline match stage).

Follow-up

  • ZKOP / zenko-operator must surface transformObjectKey into the generated Backbeat config (tracked separately).

Issue: BB-768

@bert-e
Copy link
Copy Markdown
Contributor

bert-e commented May 21, 2026

Hello delthas,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options
name description privileged authored
/after_pull_request Wait for the given pull request id to be merged before continuing with the current one.
/bypass_author_approval Bypass the pull request author's approval
/bypass_build_status Bypass the build and test status
/bypass_commit_size Bypass the check on the size of the changeset TBA
/bypass_incompatible_branch Bypass the check on the source branch prefix
/bypass_jira_check Bypass the Jira issue check
/bypass_peer_approval Bypass the pull request peers' approval
/bypass_leader_approval Bypass the pull request leaders' approval
/approve Instruct Bert-E that the author has approved the pull request. ✍️
/create_pull_requests Allow the creation of integration pull requests.
/create_integration_branches Allow the creation of integration branches.
/no_octopus Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead
/unanimity Change review acceptance criteria from one reviewer at least to all reviewers
/wait Instruct Bert-E not to run until further notice.
Available commands
name description privileged
/help Print Bert-E's manual in the pull request.
/status Print Bert-E's current status in the pull request TBA
/clear Remove all comments from Bert-E from the history TBA
/retry Re-start a fresh build TBA
/build Re-start a fresh build TBA
/force_reset Delete integration branches & pull requests, and restart merge process from the beginning.
/reset Try to remove integration branches unless there are commits on them which do not appear on the source branch.

Status report is not available.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.66%. Comparing base (c52fbcc) to head (5bbc166).

Additional details and impacted files

Impacted file tree graph

Files with missing lines Coverage Δ
extensions/oplogPopulator/OplogPopulator.js 89.14% <ø> (ø)
...ns/oplogPopulator/OplogPopulatorConfigValidator.js 100.00% <ø> (ø)
extensions/oplogPopulator/constants.js 100.00% <100.00%> (ø)
...nsions/oplogPopulator/modules/ConnectorsManager.js 94.32% <100.00%> (+0.25%) ⬆️

... and 7 files with indirect coverage changes

Components Coverage Δ
Bucket Notification 80.22% <ø> (ø)
Core Library 80.82% <ø> (-0.16%) ⬇️
Ingestion 71.24% <ø> (ø)
Lifecycle 79.06% <ø> (ø)
Oplog Populator 86.04% <100.00%> (+0.20%) ⬆️
Replication 59.71% <ø> (-0.08%) ⬇️
Bucket Scanner 85.76% <ø> (ø)
@@                 Coverage Diff                 @@
##           development/9.5    #2741      +/-   ##
===================================================
- Coverage            74.73%   74.66%   -0.07%     
===================================================
  Files                  199      199              
  Lines                13650    13659       +9     
===================================================
- Hits                 10201    10199       -2     
- Misses                3439     3450      +11     
  Partials                10       10              
Flag Coverage Δ
api:retry 9.12% <0.00%> (-0.01%) ⬇️
api:routes 8.94% <0.00%> (-0.01%) ⬇️
bucket-scanner 85.76% <ø> (ø)
ft_test:queuepopulator 9.07% <0.00%> (-1.05%) ⬇️
ingestion 12.56% <0.00%> (-0.01%) ⬇️
lib 7.76% <0.00%> (-0.03%) ⬇️
lifecycle 18.98% <0.00%> (-0.02%) ⬇️
notification 1.02% <0.00%> (-0.01%) ⬇️
oplogPopulator 0.16% <33.33%> (+0.02%) ⬆️
replication 18.71% <0.00%> (-0.01%) ⬇️
unit 51.26% <100.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Comment thread extensions/oplogPopulator/modules/ConnectorsManager.js
Comment thread extensions/oplogPopulator/modules/ConnectorsManager.js
@delthas delthas force-pushed the improvement/BB-768/oplog-partition-by-object-key branch from 650dd4e to 396d23b Compare May 21, 2026 15:05
Comment thread extensions/oplogPopulator/modules/ConnectorsManager.js Outdated
@scality scality deleted a comment from claude Bot May 21, 2026
@delthas delthas force-pushed the improvement/BB-768/oplog-partition-by-object-key branch from 396d23b to 55af2e9 Compare May 21, 2026 15:41
@bert-e
Copy link
Copy Markdown
Contributor

bert-e commented May 21, 2026

Request integration branches

Waiting for integration branch creation to be requested by the user.

To request integration branches, please comment on this pull request with the following command:

/create_integration_branches

Alternatively, the /approve and /create_pull_requests commands will automatically
create the integration branches.

@scality scality deleted a comment from bert-e May 21, 2026
@scality scality deleted a comment from claude Bot May 21, 2026
@delthas delthas force-pushed the improvement/BB-768/oplog-partition-by-object-key branch from 55af2e9 to 3f61602 Compare May 21, 2026 15:46
@scality scality deleted a comment from claude Bot May 21, 2026
@delthas delthas marked this pull request as ready for review May 22, 2026 15:36
@delthas delthas requested a review from francoisferrand May 22, 2026 15:40
Copy link
Copy Markdown
Contributor

@francoisferrand francoisferrand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Do we need to make this dynamic, or should we just let ZKOP enable the plugin depending on backbeat/zenko version (could be a "feature flag" in zenkoversion cr, so it can be enabled when Zenko has both the the plugin, the supporting backbeat version, and we decide it is time) ?
  • How critical/dangerous is this whole fix (SMT...) ? Should it land in a patch release, or should we be more cautious?

Comment thread extensions/oplogPopulator/modules/ConnectorsManager.js Outdated
Comment thread extensions/oplogPopulator/modules/ConnectorsManager.js Outdated
@delthas delthas force-pushed the improvement/BB-768/oplog-partition-by-object-key branch from 3f61602 to aa29999 Compare May 27, 2026 09:38
@delthas delthas changed the base branch from development/9.4 to development/9.5 May 27, 2026 09:38
Comment thread extensions/oplogPopulator/modules/ConnectorsManager.js Outdated
…a config flag

Adds an oplogPopulator 'transformObjectKey' config flag (default false).
When enabled, oplog source connectors are configured with the
com.scality.kafka.connect.transforms.TransformObjectKey SMT (shipped by
Zenko via ZENKO-5274), keying messages by the raw S3 object key derived
from documentKey._id with the arsenal master/version encoding stripped.
Master and all versions of the same S3 object then hash to the same
partition, regardless of op type — fixing BB-768, where update/delete
events collapse onto one partition because fullDocument (today's key
source) is null on those op types.

The flag is owned by the operator (ZKOP / zenkoversion CR) and flipped on
once the Kafka Connect image ships the TransformObjectKey plugin and the
supporting Backbeat version is deployed — rather than Backbeat probing
the Connect plugin path at runtime. This keeps Backbeat free of the
auto-detection / runtime-flip machinery and gives a controlled rollout.

* extensions/oplogPopulator/constants: defaultConnectorConfig (legacy key
  schema) + smtKeyConfig (documentKey._id projection + transforms +
  key.converter), applied on top when the flag is set.
* ConnectorsManager: _getDefaultConnectorConfiguration picks the SMT key
  config from the flag; _processOldConnectors scrubs SMT-only keys from a
  stale oldConfig when the flag is off (so a connector that previously ran
  with the SMT doesn't keep referencing a missing class).
* OplogPopulatorConfigValidator: validate the new boolean flag.

Issue: BB-768
@delthas delthas force-pushed the improvement/BB-768/oplog-partition-by-object-key branch from aa29999 to 5bbc166 Compare May 27, 2026 09:43
@scality scality deleted a comment from claude Bot May 27, 2026
@scality scality deleted a comment from claude Bot May 27, 2026
@bert-e
Copy link
Copy Markdown
Contributor

bert-e commented May 27, 2026

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

@claude
Copy link
Copy Markdown

claude Bot commented May 27, 2026

  • PR description vs. implementation mismatch: The Summary section describes adding getConnectorPlugins() to KafkaConnectWrapper, runtime probing for the SMT plugin on each reconciliation tick, an _smtAvailable flag, and automatic SMT activation without operator action. None of this exists in the diff — the actual implementation uses a static transformObjectKey config flag that the operator must set manually and restart Backbeat. The Context section's claim that "an upgrade rollout that bumps the Kafka Connect image after Backbeat auto-activates the SMT config on the very next tick, without operator action" does not match the code. Either update the PR description to reflect the simpler config-flag approach, or the probing logic was intended but is missing from this PR.
    - Code is otherwise clean: config spread order is correct, SMT-key scrubbing on downgrade handles the legacy restore properly (output.schema.key is overwritten by the default while SMT-only keys are deleted), test coverage for both directions (enable/disable) is solid, and the Joi schema + validator tests are adequate.

    Review by Claude Code

@scality scality deleted a comment from claude Bot May 27, 2026
@delthas
Copy link
Copy Markdown
Contributor Author

delthas commented May 27, 2026

Pivoted per your first point: dropped the runtime /connector-plugins probe and the auto-flip machinery entirely. The SMT is now gated by an oplogPopulator.transformObjectKey config flag (default off), which ZKOP/zenkoversion flips on once the Connect image ships the plugin and the supporting Backbeat is deployed. This also dissolves the _applyKeyConfig / smtOnlyConfigKeys complexity flagged inline — constants.js is now just defaultConnectorConfig + smtKeyConfig (the base + additions shape you suggested).

Per your second point: retargeted to development/9.5 only (dropped the 9.4.1 patch); fix version updated to 9.5.0.

Follow-up: the zenkoversion CR → Backbeat config is tracked in ZKOP-553.

@scality scality deleted a comment from bert-e May 27, 2026
// SMT so the oplog message key is the raw S3 object key. Enabled via
// the oplogPopulator 'transformObjectKey' config flag, set by the
// operator once Kafka Connect ships the TransformObjectKey plugin.
this._transformObjectKey = params.transformObjectKey || false;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

defaulting is not needed here, already handling by joi ?

'value.converter': 'org.apache.kafka.connect.storage.StringConverter',
// Kafka message key config (legacy).
// The key schema projects {ns.coll, fullDocument.value.key}. fullDocument
// is null on update/delete events (BB-355 removed updateLookup), so for
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

c.f. https://scality.atlassian.net/browse/BB-768?focusedCommentId=477005 :

  • we always pass the full document (AFAIK) on update, so the key should be available (but maybe not in "fullDocument")
  • the delete event is not used AFAIK (we generate an update with the previous document right before it), and may be dropped instead

→ is there a simpler (as in "less maitenance") fix?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants