monitor lifecycle conductor by benzekrimaha · Pull Request #2723 · scality/backbeat

benzekrimaha · 2026-03-02T16:23:04Z

bert-e · 2026-03-02T16:23:08Z

Hello benzekrimaha,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options

name	description	privileged	authored
`/after_pull_request`	Wait for the given pull request id to be merged before continuing with the current one.
`/bypass_author_approval`	Bypass the pull request author's approval	⭐
`/bypass_build_status`	Bypass the build and test status	⭐
`/bypass_commit_size`	Bypass the check on the size of the changeset `TBA`	⭐
`/bypass_incompatible_branch`	Bypass the check on the source branch prefix	⭐
`/bypass_jira_check`	Bypass the Jira issue check	⭐
`/bypass_peer_approval`	Bypass the pull request peers' approval	⭐
`/bypass_leader_approval`	Bypass the pull request leaders' approval	⭐
`/approve`	Instruct Bert-E that the author has approved the pull request.		✍️
`/create_pull_requests`	Allow the creation of integration pull requests.
`/create_integration_branches`	Allow the creation of integration branches.
`/no_octopus`	Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead
`/unanimity`	Change review acceptance criteria from `one reviewer at least` to `all reviewers`
`/wait`	Instruct Bert-E not to run until further notice.

Available commands

name	description	privileged
`/help`	Print Bert-E's manual in the pull request.
`/status`	Print Bert-E's current status in the pull request `TBA`
`/clear`	Remove all comments from Bert-E from the history `TBA`
`/retry`	Re-start a fresh build `TBA`
`/build`	Re-start a fresh build `TBA`
`/force_reset`	Delete integration branches & pull requests, and restart merge process from the beginning.
`/reset`	Try to remove integration branches unless there are commits on them which do not appear on the source branch.

Status report is not available.

bert-e · 2026-03-02T16:23:13Z

Incorrect fix version

The Fix Version/s in issue BB-740 contains:

9.3.0

Considering where you are trying to merge, I ignored possible hotfix versions and I expected to find:

9.3.1

Please check the Fix Version/s of BB-740, or the target
branch of this pull request.

codecov · 2026-03-02T16:48:01Z

Codecov Report

❌ Patch coverage is 97.72727% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.83%. Comparing base (145f7a6) to head (5d84665).
⚠️ Report is 46 commits behind head on development/9.3.

Files with missing lines	Patch %	Lines
...ecycle/bucketProcessor/LifecycleBucketProcessor.js	80.00%	3 Missing ⚠️

Additional details and impacted files

Files with missing lines	Coverage Δ
extensions/lifecycle/LifecycleConfigValidator.js	`100.00% <100.00%> (ø)`
extensions/lifecycle/LifecycleMetrics.js	`100.00% <100.00%> (+1.78%)`	⬆️
...tensions/lifecycle/conductor/LifecycleConductor.js	`84.63% <100.00%> (+0.92%)`	⬆️
...sions/lifecycle/tasks/LifecycleDeleteObjectTask.js	`92.90% <100.00%> (+0.14%)`	⬆️
extensions/lifecycle/tasks/LifecycleTask.js	`91.62% <100.00%> (+0.07%)`	⬆️
extensions/lifecycle/tasks/LifecycleTaskV2.js	`88.99% <100.00%> (+0.10%)`	⬆️
...s/lifecycle/tasks/LifecycleUpdateExpirationTask.js	`81.81% <100.00%> (+0.73%)`	⬆️
...s/lifecycle/tasks/LifecycleUpdateTransitionTask.js	`92.15% <100.00%> (+0.23%)`	⬆️
...ecycle/bucketProcessor/LifecycleBucketProcessor.js	`80.83% <80.00%> (+0.96%)`	⬆️

... and 5 files with indirect coverage changes

Components	Coverage Δ
Bucket Notification	`80.37% <ø> (ø)`
Core Library	`81.11% <ø> (+0.41%)`	⬆️
Ingestion	`71.14% <ø> (ø)`
Lifecycle	`79.34% <97.72%> (+0.72%)`	⬆️
Oplog Populator	`85.83% <ø> (ø)`
Replication	`59.61% <ø> (-0.04%)`	⬇️
Bucket Scanner	`85.76% <ø> (ø)`

@@                 Coverage Diff                 @@
##           development/9.3    #2723      +/-   ##
===================================================
+ Coverage            74.48%   74.83%   +0.35%     
===================================================
  Files                  200      200              
  Lines                13603    13701      +98     
===================================================
+ Hits                 10132    10253     +121     
+ Misses                3461     3438      -23     
  Partials                10       10

Flag	Coverage Δ
api:retry	`9.08% <0.75%> (-0.06%)`	⬇️
api:routes	`8.90% <0.75%> (-0.06%)`	⬇️
bucket-scanner	`85.76% <ø> (ø)`
ft_test:queuepopulator	`10.88% <9.84%> (+0.86%)`	⬆️
ingestion	`12.47% <0.75%> (-0.08%)`	⬇️
lib	`7.57% <0.75%> (-0.05%)`	⬇️
lifecycle	`19.13% <62.12%> (+0.28%)`	⬆️
notification	`1.02% <0.00%> (-0.01%)`	⬇️
oplogPopulator	`0.14% <0.00%> (-0.01%)`	⬇️
replication	`18.41% <9.84%> (-0.09%)`	⬇️
unit	`51.49% <91.66%> (+0.47%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

delthas · 2026-03-18T16:29:03Z

        const log = this.logger.newRequestLogger();
-        const start = new Date();
+        const start = Date.now();
+        this._scanId = uuid();


Hmm, we're storing the scan ID as a "global" field variable, but it sounds like it is really relevant/used only inside this function (through indirect calls). Could we drop the global field and instead pass it through to whatever uses it? Maybe in _createBucketTaskMessages?

claude · 2026-05-21T14:18:37Z

+        'configured bucket processor retention interval without update to ' +
+        'avoid unbounded process memory growth. ' +
+        'Prometheus retains scraped scan-id series until TSDB retention.',
+    labelNames: [LIFECYCLE_LABEL_ORIGIN, LIFECYCLE_LABEL_CONDUCTOR_SCAN_ID],


The conductor_scan_id label on bucketProcessorScanMessagesProcessed has unbounded cardinality — each scan produces a new UUID. The local cleanup timer limits prom-client memory, but Prometheus TSDB retains every scraped series until its own retention expires. Over weeks/months this can cause significant storage growth and slow queries on the TSDB side.

Consider whether you've estimated the TSDB cardinality impact (e.g. one new scan-id per hour × 30-day retention = 720 label values × replicas). If this is expected and acceptable, no change needed, but it's worth documenting the expected growth rate in a runbook or the alert description.

— Claude Code

This is an intentional trade-off for the overlapping-scan alert: the alert needs to distinguish scan IDs to detect bucket processors handling work from more than one conductor scan. Local prom-client cleanup is configurable through bucketProcessor.scanMetricRetentionMs, and the metric help text now states that Prometheus retains scraped scan-id series until TSDB retention. I kept the wording concise to avoid over-explaining this internal troubleshooting detail in user-facing monitoring text.

claude · 2026-05-21T14:18:38Z

+const DEFAULT_SCAN_METRIC_RETENTION_MS = 24 * 60 * 60 * 1000;
+const CONDUCTOR_ORIGIN = 'conductor';
+const BUCKET_PROCESSOR_ORIGIN = 'bucket_processor';
+let scanMetricRetentionMs = DEFAULT_SCAN_METRIC_RETENTION_MS;


Module-level mutable state (let scanMetricRetentionMs) shared across tests. The resetLifecycleScanMetricCleanupTimers function handles this for tests, but in production, if configureLifecycleScanMetricRetention is called multiple times (e.g. multiple LifecycleBucketProcessor instances in the same process), the last call wins silently. This is likely fine given the current single-instance architecture, but worth noting.

— Claude Code

This is process-wide state by design for the current bucket processor deployment model: metrics are process-wide and there is a single lifecycle bucket processor service configuration per process. Tests reset it through resetLifecycleScanMetricCleanupTimers(). If we ever run multiple differently configured lifecycle bucket processors in one process, this should be revisited with instance-scoped metric state.

claude · 2026-05-21T14:19:02Z

+            details,
+        };
+    }
+


_getContinuationEntry spreads ...bucketData and then overwrites contextInfo and details. This is correct as long as bucketData doesn't have additional properties that shouldn't be forwarded (e.g. stale details from a previous continuation). The previous code used Object.assign({}, bucketData, ...) which had the same behavior, so this is not a regression — just noting that the spread inherits all properties from bucketData, which is the intended behavior for continuation entries.

— Claude Code

Yes, this is intentional and matches the previous Object.assign({}, bucketData, ...) behavior. Continuation entries preserve the bucket task envelope and only replace contextInfo / details for the continuation message.

claude · 2026-05-21T14:19:13Z

+
    /**
     * This function forces syncing the latest data mover topic
     * offsets to the 'lifecycle' metrics snapshot. It is called when


Minor: the _applyTransitionRule call in _processCurrents at line 1191 passes ...this._getScanContext(bucketData) into the params object, but _applyTransitionRule then builds context with conductorScanId: params.conductorScanId. This works because _getScanContext returns exactly those two keys. However, the scan context is propagated via two different mechanisms in this file: via _getActionContext (which calls _getScanContext internally) for delete/MPU actions, and via spreading _getScanContext into params objects for transition actions. A single pattern would be slightly less error-prone.

— Claude Code

Fixed. Transition action entries now use the same action-context pattern as the other lifecycle actions: callers pass actionContext: this._getActionContext(bucketData, log, 'transition'), and _getTransitionActionEntry adds that context to the entry. I also updated the V2 transition test to assert the context through actionContext.

claude · 2026-05-21T14:19:27Z

extensions/lifecycle/LifecycleMetrics.js:117 — The conductor_scan_id label creates unbounded Prometheus TSDB cardinality (one UUID per scan). Local prom-client cleanup is handled, but TSDB-side growth should be estimated and documented.
- extensions/lifecycle/LifecycleMetrics.js:18 — Module-level mutable scanMetricRetentionMs is a shared singleton; multiple callers would silently race. Fine under current architecture but worth noting.
- extensions/lifecycle/conductor/LifecycleConductor.js:482 — On throttled scan skips, _currentScanId retains the previous scan's value (harmless since no messages are produced, but latent coupling).
- extensions/lifecycle/tasks/LifecycleTask.js:161 — _getContinuationEntry spread semantics match the old Object.assign behavior; no regression.
- extensions/lifecycle/tasks/LifecycleTask.js:164 — Scan context propagated via two different mechanisms (params spread vs _getActionContext); a single pattern would reduce maintenance risk.

Overall: well-structured change with good test coverage, proper error handling, rolling-upgrade safety for the new context fields, and sensible metric cleanup. The main point to consider is the TSDB cardinality impact of the per-scan-id counter label.

Review by Claude Code

claude · 2026-05-21T14:28:40Z

            attempt,
        });
-        entry.addContext({
+        entry.addContext(params.actionContext || {


The fallback means that if params.actionContext is undefined (callers that haven't been updated), the context won't include scan fields. This is fine for rolling upgrades, but every call site in this PR that calls _applyTransitionRule should pass actionContext. I verified the callers in this PR and they all do — just flagging that any future caller that forgets actionContext will silently lose scan context tracing without any warning.

— Claude Code

Agreed. The fallback has been removed, and current callers now pass actionContext explicitly so missing scan context is not silently hidden.

claude · 2026-05-21T14:28:49Z

    processBucketEntry(bucketLCRules, bucketData, s3target,
    backbeatMetadataProxy, nbRetries, done) {
        const log = this.log.newRequestLogger();
+        if (log.addDefaultFields) {


The guard if (log.addDefaultFields) suggests this might be called with a logger that doesn't have addDefaultFields. However, just a few lines down in the same method, this.log.newRequestLogger() is called — and in the conductor and bucket processor, log.addDefaultFields is called unconditionally. This inconsistency suggests either the guard is unnecessary (and could mask a bug if the wrong logger type is passed), or there's a known code path where a non-request logger is used. If it's the latter, a comment explaining when that happens would help.

— Claude Code

This was addressed: the guard was unnecessary and has been removed, so addDefaultFields is now called consistently with the conductor and bucket processor paths.

claude · 2026-05-21T14:29:03Z

LifecycleMetrics.js:117 — The conductor_scan_id label creates unbounded cardinality in Prometheus TSDB (one new UUID per scan). Process memory is handled by the cleanup timer, but TSDB accumulation at high scan frequencies should be validated against retention sizing.
- LifecycleConductor.js:482 — On the Throttling early-return path, _batchInProgress is not reset (pre-existing). With the new scan state fields preserved intentionally on this path, worth confirming no code path can set _batchInProgress = true then hit Throttling without clearing it.
- LifecycleTask.js:1184 — The params.actionContext || { ... } fallback in _applyTransitionRule means future callers that forget to pass actionContext will silently lose scan tracing. All current callers in this PR are correct.
- LifecycleTask.js:1832 — The if (log.addDefaultFields) guard is inconsistent with other call sites that call addDefaultFields unconditionally. Either the guard is unnecessary or there's an undocumented code path using a non-request logger.

Overall: well-structured change with good test coverage, proper error handling on all metric calls, rolling-upgrade safety for the new context fields, and thorough dashboard/alert additions.

Review by Claude Code

Carry conductor scan start timestamps through lifecycle bucket messages, task contexts, and action entries for consistent downstream observability. Issue: BB-740

Record bucket-task message pickup by conductor scan id before scheduler dispatch, observe message age, and make per-scan metric retention configurable. Issue: BB-740

Separate stalled-scan thresholds from lifecycle latency and alert only when bucket processors handle multiple scan ids in parallel. Issue: BB-740

Clarify bucket processor pickup panels, split pod and scan-id views, and render scan message age as a heatmap. Issue: BB-740

claude · 2026-05-21T14:40:11Z

+            const unknownCanonicalIds = this._accountIdCache.getMisses();
            if (err && err.Throttling) {
-                log.info('not starting new lifecycle batch', { reason: err });
+                if (scanStarted) {


When _controlBacklog throttles but scanStarted is true (meaning the waterfall step that sets scanStarted = true already ran, then listBuckets returned a Throttling error), this block resets _batchInProgress = false and clears _currentScanId/_currentScanStartTimestamp. That's correct for cleanup.

However, this block does not call LifecycleMetrics.onProcessBuckets to undo the heartbeat set at line 446, and it does not call onConductorScanComplete. This means s3_lifecycle_latest_batch_start_time was already set (line 446), but s3_lifecycle_latest_batch_end_time never gets updated. The LifecycleStalledScan alert will fire because start_time > end_time and enough wall-clock time passes — even though the conductor is not actually stalled, it just got throttled mid-scan. Consider calling onConductorScanComplete here too (with the partial totalBucketsListed count and elapsed time) so the alert self-resolves.

— Claude Code

I updated the conductor so that when throttling happens after a scan has already started, we still record the scan end metrics before clearing the in-progress scan state. This avoids leaving latest_batch_start_time > latest_batch_end_time and prevents LifecycleStalledScan from firing for a scan that already exited.

claude · 2026-05-21T14:40:31Z

+        'configured bucket processor retention interval without update to ' +
+        'avoid unbounded process memory growth. ' +
+        'Prometheus retains scraped scan-id series until TSDB retention.',
+    labelNames: [LIFECYCLE_LABEL_ORIGIN, LIFECYCLE_LABEL_CONDUCTOR_SCAN_ID],


The conductor_scan_id label on bucketProcessorScanMessagesProcessed can take a unique UUID value per scan. Since each scan generates a new UUID, this creates a new time series for every scan. While the cleanup timer in setScanMetricTimeout bounds prom-client's in-process memory, Prometheus server will retain all scraped series until TSDB retention expires. For a system scanning every few minutes, this could produce thousands of unique label values per day. Consider whether the LifecycleBucketProcessorMultipleParallelScans alert (which is the main consumer of this cardinality) could be achieved with a lower-cardinality approach, e.g. a gauge tracking the count of distinct active scans rather than one series per scan UUID.

— Claude Code

This is a known trade-off and matches the compromise discussed with François earlier. We keep conductor_scan_id as a label because it makes overlapping scans visible if the expected “one scan at a time” invariant breaks. The prom-client side is bounded by configurable local cleanup, and the help text explicitly notes that Prometheus retains scraped scan-id series until TSDB retention. So I do not think we should replace this with a lower-cardinality gauge in this PR unless we decide to change that accepted trade-off.

Issue: BB-740

claude · 2026-05-21T15:20:52Z

+                    this._batchInProgress = false;
+                    this._currentScanId = null;
+                    this._currentScanStartTimestamp = null;
+                    LifecycleMetrics.onConductorScanComplete(


When the scan was started (scanStarted = true) and a Throttling error occurs during listing, onConductorScanComplete is called. But on the non-throttling error path (line ~502–506), the same logic runs — this is correct. However, I notice the _currentScanId is reset to null inside the Throttling branch but _batchInProgress is also set to false. If a concurrent processBuckets call were to sneak in between these assignments (Node.js is single-threaded so this can't actually happen), it could see inconsistent state. As-is this is fine for single-threaded Node.js, just noting for future multi-instance awareness.

— Claude Code

claude · 2026-05-21T15:21:07Z

-            ruleType: 'transition',
-            reqId: log.getSerializedUids(),
-        });
+        entry.addContext(params.actionContext);


_applyTransitionRule receives params.actionContext and passes it directly to entry.addContext(). Callers construct actionContext via this._getActionContext(bucketData, log, 'transition') which always includes origin, ruleType, reqId, and scan context. However, nothing prevents a caller from omitting actionContext entirely (it's not validated), in which case entry.addContext(undefined) would be called. Consider adding a guard or documenting actionContext as required in the JSDoc for _applyTransitionRule.

— Claude Code

claude · 2026-05-21T15:21:08Z

+const DEFAULT_SCAN_METRIC_RETENTION_MS = 24 * 60 * 60 * 1000;
+const CONDUCTOR_ORIGIN = 'conductor';
+const BUCKET_PROCESSOR_ORIGIN = 'bucket_processor';
+let scanMetricRetentionMs = DEFAULT_SCAN_METRIC_RETENTION_MS;


scanMetricRetentionMs is module-level mutable state shared across all callers in the same process. configureLifecycleScanMetricRetention is called from LifecycleBucketProcessor constructor. If a process ever created two LifecycleBucketProcessor instances with different configs, the last one wins silently. This is probably fine for the current single-instance usage, but worth noting.

— Claude Code

claude · 2026-05-21T15:21:25Z

PR Review Summary

This PR adds comprehensive monitoring for lifecycle conductor scans: a UUID-based scan ID propagated through conductor → bucket processor → action tasks, new Prometheus metrics (batch duration, bucket count, message age histogram, per-scan message counter), new alerts (LifecycleStalledScan, LifecycleBucketProcessorMultipleParallelScans), and Grafana dashboard panels. The scan context flows cleanly through Kafka messages and is attached to all downstream action entries and log lines. Test coverage is thorough.

- extensions/lifecycle/LifecycleMetrics.js:117 — The conductor_scan_id label on bucketProcessorScanMessagesProcessed creates one time series per UUID scan. Although prom-client cleanup timers are in place, Prometheus TSDB retains scraped series until retention expires, which can cause cardinality churn. Consider documenting the expected impact for operators.
- extensions/lifecycle/LifecycleMetrics.js:18 — scanMetricRetentionMs is module-level mutable state; multiple LifecycleBucketProcessor instances in the same process would silently overwrite each other's retention config.
- extensions/lifecycle/tasks/LifecycleTask.js:1184 — entry.addContext(params.actionContext) passes the context directly without validation. If a caller omits actionContext, addContext(undefined) is called. Consider guarding or documenting it as required.
- extensions/lifecycle/conductor/LifecycleConductor.js:475 — Throttling-after-scan-start error path correctly calls onConductorScanComplete and resets state (single-threaded safe).

Overall the implementation is solid — well-structured metrics with proper error handling, rolling-upgrade safety for missing fields, and good alert design. No blocking issues found.

Review by Claude Code

benzekrimaha · 2026-05-21T15:29:20Z

PR is kept as-is for these points:

The conductor_scan_id cardinality is a known trade-off already discussed : we keep it because it exposes overlapping scans if the expected “one scan at a time” invariant breaks, while prom-client memory is bounded by configurable cleanup. The help text also calls out Prometheus TSDB retention.
scanMetricRetentionMs being module-level is fine for the current single bucket-processor configuration per process. If we ever support multiple differently configured instances in the same process, that would deserve a separate refactor. ( AFAIK there is no current path with multiple differently configured bucket-processor instances in the same process. If that process model changes later, we should move the retention value into instance-owned state, but I do not think this needs to block this PR.)
For entry.addContext(params.actionContext), all current transition callers pass actionContext explicitly. The previous fallback was removed intentionally so missing scan context is not silently hidden.
I do not see any additional code change needed.

benzekrimaha force-pushed the improvement/BB-740-monitor-lifecycle-conductor branch from 380069a to 25ea9d5 Compare March 2, 2026 16:32

benzekrimaha force-pushed the improvement/BB-740-monitor-lifecycle-conductor branch 5 times, most recently from 8316f88 to 408c96c Compare March 11, 2026 16:03

benzekrimaha marked this pull request as ready for review March 11, 2026 16:35

benzekrimaha force-pushed the improvement/BB-740-monitor-lifecycle-conductor branch from 408c96c to e1c5b13 Compare March 11, 2026 16:48

benzekrimaha requested review from a team, SylvainSenechal and francoisferrand March 13, 2026 08:49

francoisferrand requested a review from delthas March 18, 2026 09:19

benzekrimaha force-pushed the improvement/BB-740-monitor-lifecycle-conductor branch from e1c5b13 to aefb677 Compare March 18, 2026 10:04

benzekrimaha changed the title ~~Improvement/bb 740 monitor lifecycle conductor~~ Improvement/BB-740 monitor lifecycle conductor Mar 18, 2026

francoisferrand changed the title ~~Improvement/BB-740 monitor lifecycle conductor~~ monitor lifecycle conductor Mar 18, 2026

delthas reviewed Mar 18, 2026

View reviewed changes

claude Bot reviewed Mar 19, 2026

View reviewed changes

Comment thread extensions/lifecycle/conductor/LifecycleConductor.js Outdated

claude Bot reviewed Mar 19, 2026

View reviewed changes

Comment thread extensions/lifecycle/conductor/LifecycleConductor.js Outdated

claude Bot reviewed Mar 19, 2026

View reviewed changes

Comment thread extensions/lifecycle/bucketProcessor/LifecycleBucketProcessor.js Outdated

benzekrimaha force-pushed the improvement/BB-740-monitor-lifecycle-conductor branch from 725c3df to 11a94ea Compare March 19, 2026 09:36

claude Bot reviewed Mar 19, 2026

View reviewed changes

Comment thread extensions/lifecycle/conductor/LifecycleConductor.js Outdated

claude Bot reviewed Mar 19, 2026

View reviewed changes

Comment thread extensions/lifecycle/bucketProcessor/LifecycleBucketProcessor.js Outdated

claude Bot reviewed Mar 19, 2026

View reviewed changes

Comment thread extensions/lifecycle/bucketProcessor/LifecycleBucketProcessor.js Outdated

benzekrimaha force-pushed the improvement/BB-740-monitor-lifecycle-conductor branch from a2128cf to a464b39 Compare March 19, 2026 09:43

claude Bot reviewed Mar 19, 2026

View reviewed changes

Comment thread extensions/lifecycle/conductor/LifecycleConductor.js Outdated

claude Bot reviewed Mar 19, 2026

View reviewed changes

Comment thread tests/unit/lifecycle/LifecycleConductor.spec.js Outdated

claude Bot reviewed Mar 19, 2026

View reviewed changes

Comment thread tests/unit/lifecycle/LifecycleConductor.spec.js Outdated

claude Bot reviewed May 21, 2026

View reviewed changes

Comment thread extensions/lifecycle/conductor/LifecycleConductor.js

claude Bot reviewed May 21, 2026

View reviewed changes

benzekrimaha force-pushed the improvement/BB-740-monitor-lifecycle-conductor branch from 85f1055 to 4da31b3 Compare May 21, 2026 14:26

claude Bot reviewed May 21, 2026

View reviewed changes

Comment thread extensions/lifecycle/LifecycleMetrics.js

claude Bot reviewed May 21, 2026

View reviewed changes

Comment thread extensions/lifecycle/conductor/LifecycleConductor.js

claude Bot reviewed May 21, 2026

View reviewed changes

benzekrimaha added 4 commits May 21, 2026 16:35

Propagate lifecycle scan timestamps.

43d3f90

Carry conductor scan start timestamps through lifecycle bucket messages, task contexts, and action entries for consistent downstream observability. Issue: BB-740

Track lifecycle bucket task pickups.

8a19b71

Record bucket-task message pickup by conductor scan id before scheduler dispatch, observe message age, and make per-scan metric retention configurable. Issue: BB-740

Refine lifecycle scan alerts.

d7000eb

Separate stalled-scan thresholds from lifecycle latency and alert only when bucket processors handle multiple scan ids in parallel. Issue: BB-740

Refine lifecycle scan dashboard panels.

4520f5a

Clarify bucket processor pickup panels, split pod and scan-id views, and render scan message age as a heatmap. Issue: BB-740

benzekrimaha force-pushed the improvement/BB-740-monitor-lifecycle-conductor branch from 4da31b3 to 4520f5a Compare May 21, 2026 14:38

claude Bot reviewed May 21, 2026

View reviewed changes

Comment thread extensions/lifecycle/conductor/LifecycleConductor.js

claude Bot reviewed May 21, 2026

View reviewed changes

Close lifecycle scans on errors.

5d84665

Issue: BB-740

claude Bot reviewed May 21, 2026

View reviewed changes

Comment thread extensions/lifecycle/LifecycleMetrics.js

claude Bot reviewed May 21, 2026

View reviewed changes

scality deleted a comment from claude Bot May 21, 2026

benzekrimaha requested review from a team and francoisferrand May 21, 2026 15:29

Conversation

benzekrimaha commented Mar 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bert-e commented Mar 2, 2026

Hello benzekrimaha,

Uh oh!

bert-e commented Mar 2, 2026

Incorrect fix version

Uh oh!

codecov Bot commented Mar 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

claude Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

claude Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

claude Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

claude Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

claude Bot commented May 21, 2026

Uh oh!

Uh oh!

Uh oh!

claude Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

claude Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

claude Bot commented May 21, 2026

Uh oh!

claude Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

claude Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

claude Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

benzekrimaha commented Mar 2, 2026 •

edited

Loading

codecov Bot commented Mar 2, 2026 •

edited

Loading