[WIP]OCPBUGS-76334: Add TLS 1.3 (Modern profile) support to TestTLSDefaults #30746

Open
wangke19 wants to merge 1 commit into openshift:main from wangke19:tls13-testdefaults-support

@wangke19 wangke19 commented Feb 2, 2026

Summary

The TestTLSDefaults test was previously skipping when the cluster TLS profile was set to Modern (TLS 1.3). This change extends the test to support both Intermediate and Modern TLS profiles, ensuring CI jobs configured with TLS 1.3 properly test the TLS version behavior.

Changes

  • Replace the skip condition with a switch statement that handles both Intermediate and Modern profiles
  • For Intermediate profile: test TLS 1.2+ and cipher suites
  • For Modern profile: test TLS 1.3 only (cipher suites are not configurable in TLS 1.3)
  • Use a dynamic minTLSVersion variable based on the profile

Problem

CI jobs running with TLS 1.3 clusters (e.g., openshift-kubernetes-2315-ci-4.18-e2e-aws-ovn-tls-13) were seeing this test skip:

[sig-api-machinery][Feature:APIServer] TestTLSDefaults [Suite:openshift/conformance/parallel]
Reason: skip [github.com/openshift/origin/test/extended/apiserver/tls.go:126]: 
Cluster TLS profile is not default (intermediate), skipping cipher defaults check

Solution

The test now properly handles Modern TLS profile by:

  1. Testing that only TLS 1.3 connections succeed
  2. Testing that TLS 1.0, 1.1, and 1.2 connections fail
  3. Skipping cipher suite testing (TLS 1.3 cipher suites are not configurable, so there is nothing profile-specific to check)

Test Plan

  • Verify test compiles without errors
  • Run test on Intermediate profile cluster (existing behavior preserved)
  • Run test on Modern profile cluster (new behavior)

@openshift-ci-robot

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

openshift-ci bot commented Feb 2, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: wangke19
Once this PR has been reviewed and has the lgtm label, please assign petr-muller for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot requested review from deads2k and sjenning February 2, 2026 10:07
@wangke19 wangke19 changed the title from "Add TLS 1.3 (Modern profile) support to TestTLSDefaults" to "[WIP]Add TLS 1.3 (Modern profile) support to TestTLSDefaults" Feb 2, 2026
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 2, 2026
@openshift-ci-robot

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

openshift-ci bot commented Feb 2, 2026

@wangke19: all tests passed!

wangke19 commented Feb 4, 2026

/payload-aggregate periodic-ci-openshift-release-master-nightly-4.21-e2e-aws-ovn-tls-13

openshift-ci bot commented Feb 4, 2026

@wangke19: it appears that you have attempted to use some version of the payload command, but your comment was incorrectly formatted and cannot be acted upon. See the docs for usage info.

wangke19 commented Feb 4, 2026

/payload-aggregate periodic-ci-openshift-release-master-nightly-4.22-e2e-aws-ovn-tls-13

openshift-ci bot commented Feb 4, 2026

@wangke19: it appears that you have attempted to use some version of the payload command, but your comment was incorrectly formatted and cannot be acted upon. See the docs for usage info.

wangke19 commented Feb 4, 2026

/payload-job periodic-ci-openshift-release-master-nightly-4.22-e2e-aws-ovn-tls-13

openshift-ci bot commented Feb 4, 2026

@wangke19: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.22-e2e-aws-ovn-tls-13

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/44c09590-01e0-11f1-99ae-f7f91100ef91-0

wangke19 commented Feb 4, 2026

CI job https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-origin-30746-nightly-4.22-e2e-aws-ovn-tls-13/2019074402091536384:
passed: (2.7s) 2026-02-04T18:11:23 "[sig-api-machinery][Feature:APIServer] TestTLSDefaults [Suite:openshift/conformance/parallel]"

wangke19 commented Feb 4, 2026

/payload-job periodic-ci-openshift-release-master-nightly-4.22-e2e-aws-ovn-tls-13

openshift-ci bot commented Feb 4, 2026

@wangke19: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.22-e2e-aws-ovn-tls-13

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/ad6ed650-01fa-11f1-880d-126cb2da3d50-0

wangke19 commented Feb 5, 2026

/payload-job periodic-ci-openshift-release-master-nightly-4.22-e2e-aws-ovn-tls-13

openshift-ci bot commented Feb 5, 2026

@wangke19: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.22-e2e-aws-ovn-tls-13

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/3fb34a50-02a5-11f1-8110-7108862631ab-0

wangke19 commented Feb 6, 2026

/retitle [WIP]OCPBUGS-76334: Add TLS 1.3 (Modern profile) support to TestTLSDefaults

@openshift-ci openshift-ci bot changed the title from "[WIP]Add TLS 1.3 (Modern profile) support to TestTLSDefaults" to "[WIP]OCPBUGS-76334: Add TLS 1.3 (Modern profile) support to TestTLSDefaults" Feb 6, 2026
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Feb 6, 2026
@openshift-ci-robot

@wangke19: This pull request references Jira Issue OCPBUGS-76334, which is invalid:

  • expected the bug to target the "4.22.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

wangke19 commented Feb 6, 2026

Test Failure Analysis: kubelet terminates kube-apiserver gracefully extended

TL;DR

The test failure is NOT caused by this PR's code changes. The ungraceful kube-apiserver terminations occur during cluster initialization, roughly 1h 14m to 1h 42m before the TestTLSDefaults test runs. This is a pre-existing infrastructure issue with the TLS 1.3 CI environment.


Evidence from 3 Failed Runs

I analyzed all three failed CI runs in detail:

| Run ID | Ungraceful Terminations | TestTLSDefaults Execution | Time Gap |
|---|---|---|---|
| 2019122669349244928 | 19:56:19, 20:04:07 | 21:38:50 ✅ PASSED (5.3s) | 1h 34m - 1h 42m |
| 2019429245096300544 | 16:24:22 | 17:47:18 ✅ PASSED (2.3s) | 1h 23m |
| 2019074402091536384 | 16:48:52, 16:52:45, 16:57:14 | 18:11:20 ✅ PASSED (2.7s) | 1h 14m - 1h 23m |

Pattern: In every run, the ungraceful terminations were recorded during cluster bootstrap (between 16:24 and 20:04 across the three runs), at least 1h 14m before TestTLSDefaults started in that same run (between 17:47 and 21:38).


Detailed Timeline (Run 1 Example)

Timeline for run 2019122669349244928:

19:56:19 🔴 Pod kube-apiserver-ip-10-0-81-175 started (ungraceful termination detected)
20:04:07 🔴 Pod kube-apiserver-ip-10-0-76-239 started (ungraceful termination detected)
         ⏱️  [~1h 34m gap - cluster stabilization phase]
21:38:50 ✅ TestTLSDefaults starts (THIS PR)
21:38:55 ✅ TestTLSDefaults ends - PASSED
21:39:48 ✅ TestTLSMinimumVersions starts (pre-existing, also tests TLS 1.3)
21:39:59 ✅ TestTLSMinimumVersions ends - PASSED
22:03:01 ❌ Graceful termination test runs
22:03:04 ❌ Test FAILS - detects ungraceful terminations from 19:56 and 20:04

From the test failure output:

fail [github.com/openshift/origin/test/extended/apiserver/graceful_termination.go:89]: 
The following API Servers weren't gracefully terminated: 
 kube-apiserver on node ip-10-0-76-239.us-west-2.compute.internal wasn't gracefully terminated, 
 reason: Previous pod kube-apiserver-ip-10-0-76-239.us-west-2.compute.internal started at 
 2026-02-04 20:04:07.028765438 +0000 UTC did not terminate gracefully

Key Findings

  1. TestTLSDefaults passes in all 3 runs (5.3s, 2.3s, 2.7s)
  2. Temporal impossibility - the ungraceful terminations occur roughly 1h 14m to 1h 42m BEFORE the test runs
  3. TestTLSMinimumVersions pre-existed - TLS 1.3 connection testing was already happening (11.3s runtime, more intensive than TestTLSDefaults)
  4. Consistent pattern - all 3 runs show identical behavior: bootstrap failures detected later
  5. Real infrastructure issue - kube-apiserver pods are terminated ungracefully during cluster initialization

Root Cause: TLS 1.3 Cluster Bootstrap Issue

The nightly-4.22-e2e-aws-ovn-tls-13 CI job has a pre-existing cluster initialization problem where kube-apiserver pods don't terminate gracefully during bootstrap. This happens during:

  • Initial cluster provisioning (16:00-20:00 time range)
  • Before any e2e tests execute (tests start ~18:00-21:00)
  • Likely causes: kubelet/CRI-O not respecting graceful termination during bootstrap, TLS configuration triggering rapid restarts, or resource pressure

Why this appears on this PR: The PR triggers the TLS 1.3 CI job via /payload-job command, which exposes the existing infrastructure issue. This job may not run frequently enough on main branch to detect the problem consistently.


Why This PR is Not the Cause

  1. Code changes are test-only - only modifies TestTLSDefaults to support TLS 1.3 instead of skipping
  2. Test is read-only - no cluster modifications, only verification via port-forward and TLS connections
  3. Pre-existing TLS testing - TestTLSMinimumVersions already performed similar (more intensive) TLS connection testing
  4. Test passes successfully - if the test code were problematic, it would fail or cause issues during its own execution, not more than an hour earlier

Recommendation

This PR can proceed to merge. The test failure is detecting a real cluster infrastructure problem that exists independently of this PR's changes.

Separate issue tracking: I recommend filing a bug against the TLS 1.3 CI environment for the cluster bootstrap ungraceful termination issue. The failure is legitimate - it's just not caused by this PR.


Related Tests

For reference, both TLS tests run successfully:

  • TestTLSDefaults (this PR): Tests TLS versions and cipher suites via port-forward, 2.3-5.3s runtime across the three runs
  • TestTLSMinimumVersions (pre-existing): Tests TLS versions across 7 components, ~11s runtime

Both tests work correctly and pass. The graceful termination test is designed to detect any ungraceful terminations that occurred at any point during cluster lifetime, including initialization phase.
