
LOG-8972: Enhance cluster-logging-operator to react to cluster TLS Profile updates #3228

Open
jcantrill wants to merge 1 commit into openshift:master from jcantrill:LOG-8972

@jcantrill jcantrill commented Mar 19, 2026

Summary

This PR enhances the cluster-logging-operator to react to cluster TLS Profile updates, ensuring both the operator itself and deployed collectors use the cluster's TLS security configuration.

Fixes LOG-8972

Changes

Operator's Own TLS Configuration

  • TLS Conversion Helpers: Added functions to convert OpenShift TLSProfileSpec to Go crypto/tls.Config
  • Metrics Server: Operator's metrics endpoint now uses cluster TLS profile (cipher suites and min TLS version)
  • TLS Profile Watcher: New controller that watches for APIServer TLS profile changes and gracefully restarts the operator pod to apply new configuration

Collector TLS Configuration

  • APIServer Watch: ClusterLogForwarder controller now watches APIServer for TLS profile changes
  • Automatic Reconciliation: All ClusterLogForwarders are reconciled when cluster TLS profile changes
  • Collector Rollout: Collector configs are regenerated and pods are rolled out with updated TLS configuration

Implementation Details

Part A: Operator TLS Configuration

  1. Added internal/tls/tls.go conversion functions:

    • CipherSuiteStringToID: Convert cipher suite names to crypto/tls IDs
    • TLSVersionToConstant: Convert TLS version strings to crypto/tls constants
    • TLSConfigFromProfile: Create crypto/tls.Config from TLSProfileSpec
    • GetTLSConfigOptions: Get TLS options for controller-runtime manager
  2. Created internal/controller/tlsprofile/ watcher controller:

    • Monitors APIServer resource for TLS profile changes
    • Compares current profile to initial startup profile
    • Gracefully exits operator pod when changes detected
    • Kubernetes automatically restarts pod with new configuration
  3. Updated cmd/main.go:

    • Fetch cluster TLS profile at startup
    • Configure metrics server with cluster TLS profile
    • Register TLS profile watcher controller
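
The conversion helpers listed above could look roughly like the following. This is a minimal sketch, not the code in internal/tls/tls.go: the lower-case function names and signatures are hypothetical, and it assumes cipher suites arrive as IANA-style names (a real profile may use a different naming scheme that needs translating first). Only standard-library crypto/tls calls are used.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// tlsVersionToConstant maps a TLSProtocolVersion string such as
// "VersionTLS12" to the matching crypto/tls constant.
func tlsVersionToConstant(v string) (uint16, error) {
	switch v {
	case "VersionTLS10":
		return tls.VersionTLS10, nil
	case "VersionTLS11":
		return tls.VersionTLS11, nil
	case "VersionTLS12":
		return tls.VersionTLS12, nil
	case "VersionTLS13":
		return tls.VersionTLS13, nil
	}
	return 0, fmt.Errorf("unknown TLS version %q", v)
}

// cipherSuiteStringToID looks up a cipher suite by its IANA name among the
// suites that crypto/tls reports as implemented and secure.
func cipherSuiteStringToID(name string) (uint16, bool) {
	for _, cs := range tls.CipherSuites() {
		if cs.Name == name {
			return cs.ID, true
		}
	}
	return 0, false
}

// tlsConfigFromProfile builds a tls.Config from a minimum version and a list
// of cipher names. Names that crypto/tls does not know are skipped.
func tlsConfigFromProfile(minVersion string, ciphers []string) (*tls.Config, error) {
	minVer, err := tlsVersionToConstant(minVersion)
	if err != nil {
		return nil, err
	}
	cfg := &tls.Config{MinVersion: minVer}
	for _, name := range ciphers {
		if id, ok := cipherSuiteStringToID(name); ok {
			cfg.CipherSuites = append(cfg.CipherSuites, id)
		}
	}
	return cfg, nil
}

func main() {
	cfg, err := tlsConfigFromProfile("VersionTLS12",
		[]string{"TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256", "NOT_A_SUITE"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("min=%#x suites=%d\n", cfg.MinVersion, len(cfg.CipherSuites))
}
```

Note that crypto/tls does not allow TLS 1.3 cipher suites to be configured, so the CipherSuites field only affects TLS 1.2 and earlier.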

Part B: Collector TLS Configuration

  1. Updated internal/controller/observability/clusterlogforwarder_controller.go:
    • Added watch on config.openshift.io/v1/APIServer
    • Implemented event handler to enqueue all CLFs on TLS profile changes
    • Added predicate to filter only TLS profile change events
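
The predicate and the fan-out handler described above can be illustrated with plain Go. This is a sketch under stated assumptions: tlsProfile, request, tlsProfileChanged and enqueueAll are hypothetical stand-ins, and the real controller would express the same logic through controller-runtime predicates and event handlers rather than these types.

```go
package main

import (
	"fmt"
	"reflect"
)

// tlsProfile is a hypothetical stand-in for the TLS-relevant part of the
// APIServer spec.
type tlsProfile struct {
	Type          string
	MinTLSVersion string
	Ciphers       []string
}

// request is a hypothetical reconcile key for one ClusterLogForwarder.
type request struct {
	Namespace string
	Name      string
}

// tlsProfileChanged reports whether an update event altered the TLS profile.
// reflect.DeepEqual treats a nil and an empty cipher list as different.
func tlsProfileChanged(oldP, newP *tlsProfile) bool {
	return !reflect.DeepEqual(oldP, newP)
}

// enqueueAll maps a single APIServer event to one request per forwarder.
func enqueueAll(forwarders []request) []request {
	out := make([]request, len(forwarders))
	copy(out, forwarders)
	return out
}

func main() {
	oldP := &tlsProfile{Type: "Intermediate", MinTLSVersion: "VersionTLS12"}
	newP := &tlsProfile{Type: "Modern", MinTLSVersion: "VersionTLS13"}
	clfs := []request{{"openshift-logging", "instance"}, {"team-a", "audit"}}

	if tlsProfileChanged(oldP, newP) {
		for _, r := range enqueueAll(clfs) {
			fmt.Printf("reconcile %s/%s\n", r.Namespace, r.Name)
		}
	}
}
```

Filtering in the predicate keeps unrelated APIServer updates from triggering a reconcile of every forwarder.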

Behavior

When Cluster TLS Profile Changes

Operator:

  1. TLS profile watcher detects the change
  2. Operator pod exits gracefully (exit code 0)
  3. Kubernetes restarts the pod (~10 seconds downtime)
  4. New pod applies updated TLS configuration to metrics server

Collectors:

  1. All ClusterLogForwarders are enqueued for reconciliation
  2. Each CLF regenerates collector config with new TLS profile
  3. Collector pods are rolled out using Kubernetes rolling update
  4. No log loss (gradual rollout ensures availability)

TLS Profile Precedence

  1. Output-specific profile (highest priority)
  2. Cluster TLS profile (default when output doesn't specify)
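
The precedence rule amounts to a simple fallback. A minimal sketch, using a hypothetical tlsProfile type and effectiveProfile helper rather than the operator's real types:

```go
package main

import "fmt"

// tlsProfile is a hypothetical, simplified profile.
type tlsProfile struct {
	Type          string
	MinTLSVersion string
}

// effectiveProfile returns the output-specific profile when one is set and
// falls back to the cluster profile otherwise.
func effectiveProfile(output, cluster *tlsProfile) *tlsProfile {
	if output != nil {
		return output
	}
	return cluster
}

func main() {
	cluster := &tlsProfile{Type: "Intermediate", MinTLSVersion: "VersionTLS12"}
	custom := &tlsProfile{Type: "Modern", MinTLSVersion: "VersionTLS13"}

	fmt.Println(effectiveProfile(nil, cluster).Type)    // output has no profile
	fmt.Println(effectiveProfile(custom, cluster).Type) // output overrides cluster
}
```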

Testing

Unit Tests

  • ✅ TLS conversion functions: 12/12 tests passing
  • ✅ TLS profile watcher controller: 5/5 tests passing

Verification

  • make build - Success
  • make lint - 0 issues
  • ✅ All new unit tests passing

Manual Testing Recommendations

  1. Deploy operator in a test cluster
  2. Verify metrics endpoint uses cluster TLS profile
  3. Change cluster TLS profile: oc patch apiserver cluster --type=merge -p '{"spec":{"tlsSecurityProfile":{"type":"Modern"}}}'
  4. Verify operator pod restarts
  5. Create CLF without output-specific TLS profile
  6. Verify collector uses cluster TLS profile
  7. Change cluster TLS profile again
  8. Verify collector pods are rolled out

RBAC

No changes needed - existing ClusterRole already has permissions to read APIServer resources.

Notes

  • Why restart the operator? The controller-runtime manager's TLS configuration is set at creation time and cannot be dynamically updated. Restarting is the cleanest way to apply new TLS settings.

  • Impact on running collectors: The operator restart does not affect running collectors. They continue operating normally during the brief restart period.

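  • TLS Curves: OpenShift's TLSProfileSpec doesn't have a separate field for EC curves, so this PR does not configure them. Cipher suites only determine whether an EC key exchange is used (e.g., ECDHE suites); the curve itself is negotiated separately, in Go through tls.Config.CurvePreferences or the library defaults.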

  • Graceful degradation: If the APIServer resource cannot be fetched, the operator logs a warning and falls back to the default TLS configuration (minimum TLS 1.2).

Commits

  1. feat(tls): Add TLS profile conversion helpers for crypto/tls config
  2. feat(controller): Add TLS profile watcher to restart operator on changes
  3. feat(operator): Apply cluster TLS profile to metrics endpoint
  4. feat(controller): Watch APIServer TLS profile for collector updates

Documentation

Follow-up PR needed to update docs/features/tls_security_profile.adoc with:

  • Operator metrics endpoint uses cluster TLS profile
  • Automatic reaction to cluster TLS profile changes
  • Operator restart behavior
  • Minimal disruption during updates

🤖 Generated with Claude Code via /jira:solve [LOG-8972](https://redhat.atlassian.net/browse/LOG-8972)

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 19, 2026

openshift-ci-robot commented Mar 19, 2026

@jcantrill: This pull request references LOG-8972 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.8.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 19, 2026

openshift-ci bot commented Mar 19, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the midstream/Dockerfile A Dockerfile.in sync is needed with midstream label Mar 19, 2026

openshift-ci bot commented Mar 19, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jcantrill

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 19, 2026
@jcantrill jcantrill force-pushed the LOG-8972 branch 2 times, most recently from 3e8e69a to 18d5156 Compare March 19, 2026 18:17
@jcantrill

/test all

@jcantrill jcantrill marked this pull request as ready for review March 20, 2026 18:19
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 20, 2026
@openshift-ci openshift-ci bot requested review from alanconway and vparfonov March 20, 2026 18:20
@jcantrill

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 20, 2026

openshift-ci bot commented Mar 23, 2026

@jcantrill: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@anpingli anpingli left a comment

LGTM

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • midstream/Dockerfile: A Dockerfile.in sync is needed with midstream
  • release/6.6

3 participants