Skip to content

CCDB-4523: Strip config.providers from kafka-ready preflight check#1667

Open
kunal sevkani (kunalmnnit) wants to merge 11 commits into
masterfrom
fix/ccdb-4523-resilient-config-provider-preflight
Open

CCDB-4523: Strip config.providers from kafka-ready preflight check#1667
kunal sevkani (kunalmnnit) wants to merge 11 commits into
masterfrom
fix/ccdb-4523-resilient-config-provider-preflight

Conversation

@kunalmnnit
Copy link
Copy Markdown
Member

@kunalmnnit kunal sevkani (kunalmnnit) commented May 6, 2026

Summary

  • Makes KafkaReadyCommand resilient to config provider class loading failures during the kafka-ready preflight check
  • Uses a try/catch/retry strategy that handles all scenarios correctly
  • Adds 23 unit tests covering helpers and orchestration logic

Problem

When a Connect worker config includes config.providers entries (e.g. for AWS Secrets Manager), the kafka-ready preflight check fails with ClassNotFoundException because config provider JARs live on the worker plugin.path, not on CUB_CLASSPATH/UB_CLASSPATH.

Fix

checkKafkaReadyWithConfigProviderResilience() implements a four-step strategy:

  1. No config.providers in config -> run check normally (fast path)
  2. Config providers present -> try with full config (works if provider JARs are on classpath)
  3. ConfigException from class loading + Kafka client connectivity properties (security., sasl., ssl.*) have unresolved ${provider:...} refs -> SKIP the check with a WARN (Connect worker will verify connectivity on startup)
  4. ConfigException from class loading + all client connectivity values are literals -> strip config.providers and RETRY

This ensures:

  • Scenario A (providers for connector secrets, literal security props): retries successfully
  • Scenario B (providers for security props like SSL passwords): skips gracefully with warning
  • Scenario C (providers on classpath): works on first try, no behavior change

Test plan

  • 23 unit tests: stripConfigProviders, hasConfigProviders, isConfigProviderLoadFailure (CNFE, NCDFE, nested cause, message fallback, SASL/SSL negative cases), findUnresolvedConfigProviderVars (client keys, non-client keys, non-provider placeholders), orchestration (5 scenarios via KafkaReadyChecker functional interface)
  • Integration: build Connect Docker image with custom config provider on plugin.path, verify kafka-ready no longer crashes

Fixes: https://confluentinc.atlassian.net/browse/CCDB-4523
Related: https://confluentinc.atlassian.net/browse/DP-6628

When the Connect worker config includes config.providers entries (e.g.
for AWS Secrets Manager), the kafka-ready preflight check fails with
ClassNotFoundException because the config provider plugin JARs are on
the worker's plugin.path but not on the CUB/UB classpath.

The kafka-ready check only needs Kafka client properties (bootstrap
servers, security config) to verify broker connectivity. Config
providers are irrelevant for this check, so strip them before creating
the AdminClient. Removed entries are logged at WARN level.
Copilot AI review requested due to automatic review settings May 6, 2026 07:13
@kunalmnnit kunal sevkani (kunalmnnit) requested a review from a team as a code owner May 6, 2026 07:13
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates the kafka-ready CLI preflight check to ignore Kafka Connect config provider properties (which can trigger ClassNotFoundException during AdminClient initialization when provider JARs aren’t on the preflight classpath), and adds unit tests for the stripping behavior.

Changes:

  • Strip config.providers and config.providers.* keys from the loaded --config properties before running the readiness check.
  • Add WARN logging to make stripped properties visible to operators.
  • Add a new unit test class covering the stripping behavior across multiple scenarios.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
utility-belt/src/main/java/io/confluent/admin/utils/cli/KafkaReadyCommand.java Strips config-provider properties prior to calling the readiness check; adds operator-facing WARN logs.
utility-belt/src/test/java/io/confluent/admin/utils/cli/KafkaReadyCommandTest.java Adds unit tests for config-provider stripping behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread utility-belt/src/main/java/io/confluent/admin/utils/cli/KafkaReadyCommand.java Outdated
…ass loading errors

When the Connect worker config includes config.providers entries (e.g.
for AWS Secrets Manager), the kafka-ready preflight check fails with
ClassNotFoundException because the config provider plugin JARs are on
the worker's plugin.path but not on the CUB/UB classpath.

The fix uses a try/catch/retry strategy:
1. Try the check with the full config (works if providers are on classpath)
2. On ConfigException from class loading, strip config.providers entries
3. If remaining properties have unresolved ${provider:...} variable
   references (security props depend on config providers), skip the
   check with a WARN — the Connect worker will verify connectivity
   on startup
4. Otherwise, retry the check without config.providers
…k trace

- Consolidate per-key WARN logs into a single message with count + key list
- Replace Unicode em dash with plain hyphen for ASCII-safe log parsing
- Include exception stack trace in config provider load failure warning
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

Comment thread utility-belt/src/main/java/io/confluent/admin/utils/cli/KafkaReadyCommand.java Outdated
Comment thread utility-belt/src/main/java/io/confluent/admin/utils/cli/KafkaReadyCommand.java Outdated
- Tighten CONFIG_PROVIDER_VAR_PATTERN to require at least one colon
  separator (${provider:path[:key]}), avoiding false matches on
  non-provider ${...} placeholders like ${ENV_VAR}
- Check exception cause chain for ClassNotFoundException /
  NoClassDefFoundError instead of brittle message string matching;
  keep message check as fallback
- Add 4 new tests: ClassNotFoundException cause, NoClassDefFoundError
  cause, nested cause chain, non-provider placeholder filtering
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

…, cause chain check

- Extract ClusterStatus.isKafkaReady behind KafkaReadyChecker functional
  interface so checkKafkaReadyWithConfigProviderResilience orchestration
  logic can be unit tested with mocked checkers
- Add 5 orchestration tests covering all scenarios: no config providers,
  providers loadable (first try succeeds), class not found with literal
  props (strip + retry), class not found with unresolved vars (skip),
  non-provider ConfigException (rethrow)
- Tighten CONFIG_PROVIDER_VAR_PATTERN to require at least one colon
  separator, avoiding false matches on non-provider ${...} placeholders
- Check exception cause chain for ClassNotFoundException /
  NoClassDefFoundError instead of brittle message string matching;
  keep message check as fallback for truncated cause chains
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

Comment thread utility-belt/src/main/java/io/confluent/admin/utils/cli/KafkaReadyCommand.java Outdated
Comment thread utility-belt/src/main/java/io/confluent/admin/utils/cli/KafkaReadyCommand.java Outdated
- Move ConfigException import into org.apache.kafka.* group for
  consistent import ordering
- Require exception message to mention "config.providers" before
  checking cause chain for ClassNotFoundException/NoClassDefFoundError,
  so SASL/SSL handler class loading failures are not misclassified
- Add tests for SASL and SSL CNFE cases that must NOT match
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

- Fix 2 test failures: mock ConfigException messages now include
  "config.providers" to match the tightened isConfigProviderLoadFailure
  predicate from round 3
- Limit findUnresolvedConfigProviderVars to Kafka client connectivity
  keys only (security.*, sasl.*, ssl.*, bootstrap.servers) so
  unresolved references in non-client keys (e.g. connector configs)
  don't cause the preflight to be skipped unnecessarily
- Add test for non-client keys being ignored
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

…s in message

The resilience_classNotFound_unresolvedVars_skipsWithWarning test was
still using a bare "Could not load" message without "config.providers",
which the tightened isConfigProviderLoadFailure predicate rejects.
Convert KafkaReadyCommandTest from JUnit 5 to JUnit 4 for consistency
with the rest of the module. Add junit-vintage-engine dependency so
JUnit 4 tests are discovered by the JUnit Platform surefire provider.
…ures

The vintage engine causes surefire to discover existing JUnit 4 tests
that have a broken EmbeddedKafkaCluster setup. Removing it keeps the
PR scoped to the kafka-ready fix without inheriting unrelated failures.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment thread utility-belt/src/main/java/io/confluent/admin/utils/cli/KafkaReadyCommand.java Outdated
- Operate on a copy of workerProps when stripping config.providers,
  so the caller's map is never mutated
- Update javadoc to document fast-path and no-mutation guarantee
- Update test assertion to verify original map is untouched
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants