CCDB-4523: Strip config.providers from kafka-ready preflight check#1667
Open
kunal sevkani (kunalmnnit) wants to merge 11 commits into
Open
CCDB-4523: Strip config.providers from kafka-ready preflight check#1667kunal sevkani (kunalmnnit) wants to merge 11 commits into
kunal sevkani (kunalmnnit) wants to merge 11 commits into
Conversation
When the Connect worker config includes config.providers entries (e.g. for AWS Secrets Manager), the kafka-ready preflight check fails with ClassNotFoundException because the config provider plugin JARs are on the worker's plugin.path but not on the CUB/UB classpath. The kafka-ready check only needs Kafka client properties (bootstrap servers, security config) to verify broker connectivity. Config providers are irrelevant for this check, so strip them before creating the AdminClient. Removed entries are logged at WARN level.
There was a problem hiding this comment.
Pull request overview
Updates the kafka-ready CLI preflight check to ignore Kafka Connect config provider properties (which can trigger ClassNotFoundException during AdminClient initialization when provider JARs aren’t on the preflight classpath), and adds unit tests for the stripping behavior.
Changes:
- Strip
config.providersandconfig.providers.*keys from the loaded--configproperties before running the readiness check. - Add WARN logging to make stripped properties visible to operators.
- Add a new unit test class covering the stripping behavior across multiple scenarios.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| utility-belt/src/main/java/io/confluent/admin/utils/cli/KafkaReadyCommand.java | Strips config-provider properties prior to calling the readiness check; adds operator-facing WARN logs. |
| utility-belt/src/test/java/io/confluent/admin/utils/cli/KafkaReadyCommandTest.java | Adds unit tests for config-provider stripping behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…ass loading errors
When the Connect worker config includes config.providers entries (e.g.
for AWS Secrets Manager), the kafka-ready preflight check fails with
ClassNotFoundException because the config provider plugin JARs are on
the worker's plugin.path but not on the CUB/UB classpath.
The fix uses a try/catch/retry strategy:
1. Try the check with the full config (works if providers are on classpath)
2. On ConfigException from class loading, strip config.providers entries
3. If remaining properties have unresolved ${provider:...} variable
references (security props depend on config providers), skip the
check with a WARN — the Connect worker will verify connectivity
on startup
4. Otherwise, retry the check without config.providers
…k trace - Consolidate per-key WARN logs into a single message with count + key list - Replace Unicode em dash with plain hyphen for ASCII-safe log parsing - Include exception stack trace in config provider load failure warning
- Tighten CONFIG_PROVIDER_VAR_PATTERN to require at least one colon
separator (${provider:path[:key]}), avoiding false matches on
non-provider ${...} placeholders like ${ENV_VAR}
- Check exception cause chain for ClassNotFoundException /
NoClassDefFoundError instead of brittle message string matching;
keep message check as fallback
- Add 4 new tests: ClassNotFoundException cause, NoClassDefFoundError
cause, nested cause chain, non-provider placeholder filtering
…, cause chain check
- Extract ClusterStatus.isKafkaReady behind KafkaReadyChecker functional
interface so checkKafkaReadyWithConfigProviderResilience orchestration
logic can be unit tested with mocked checkers
- Add 5 orchestration tests covering all scenarios: no config providers,
providers loadable (first try succeeds), class not found with literal
props (strip + retry), class not found with unresolved vars (skip),
non-provider ConfigException (rethrow)
- Tighten CONFIG_PROVIDER_VAR_PATTERN to require at least one colon
separator, avoiding false matches on non-provider ${...} placeholders
- Check exception cause chain for ClassNotFoundException /
NoClassDefFoundError instead of brittle message string matching;
keep message check as fallback for truncated cause chains
- Move ConfigException import into org.apache.kafka.* group for consistent import ordering - Require exception message to mention "config.providers" before checking cause chain for ClassNotFoundException/NoClassDefFoundError, so SASL/SSL handler class loading failures are not misclassified - Add tests for SASL and SSL CNFE cases that must NOT match
- Fix 2 test failures: mock ConfigException messages now include "config.providers" to match the tightened isConfigProviderLoadFailure predicate from round 3 - Limit findUnresolvedConfigProviderVars to Kafka client connectivity keys only (security.*, sasl.*, ssl.*, bootstrap.servers) so unresolved references in non-client keys (e.g. connector configs) don't cause the preflight to be skipped unnecessarily - Add test for non-client keys being ignored
…s in message The resilience_classNotFound_unresolvedVars_skipsWithWarning test was still using a bare "Could not load" message without "config.providers", which the tightened isConfigProviderLoadFailure predicate rejects.
Convert KafkaReadyCommandTest from JUnit 5 to JUnit 4 for consistency with the rest of the module. Add junit-vintage-engine dependency so JUnit 4 tests are discovered by the JUnit Platform surefire provider.
…ures The vintage engine causes surefire to discover existing JUnit 4 tests that have a broken EmbeddedKafkaCluster setup. Removing it keeps the PR scoped to the kafka-ready fix without inheriting unrelated failures.
- Operate on a copy of workerProps when stripping config.providers, so the caller's map is never mutated - Update javadoc to document fast-path and no-mutation guarantee - Update test assertion to verify original map is untouched
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Problem
When a Connect worker config includes config.providers entries (e.g. for AWS Secrets Manager), the kafka-ready preflight check fails with ClassNotFoundException because config provider JARs live on the worker plugin.path, not on CUB_CLASSPATH/UB_CLASSPATH.
Fix
checkKafkaReadyWithConfigProviderResilience() implements a four-step strategy:
This ensures:
Test plan
Fixes: https://confluentinc.atlassian.net/browse/CCDB-4523
Related: https://confluentinc.atlassian.net/browse/DP-6628