[WIP] OCPBUGS-76334: Add TLS 1.3 (Modern profile) support to TestTLSDefaults #30746
wangke19 wants to merge 1 commit into openshift:main
The TestTLSDefaults test was previously skipping when the cluster TLS profile was set to Modern (TLS 1.3). This change extends the test to support both Intermediate and Modern TLS profiles.
Changes:
- Replace the skip condition with a switch statement that handles both Intermediate and Modern profiles
- For Intermediate profile: test TLS 1.2+ and cipher suites
- For Modern profile: test TLS 1.3 only (cipher suites are not configurable in TLS 1.3)
- Use a dynamic minTLSVersion variable based on the profile
This ensures CI jobs configured with TLS 1.3 will properly test the TLS version behavior instead of skipping the test.
Pipeline controller notification
For optional jobs, comment
This repository is configured in: automatic mode
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: wangke19
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Scheduling required tests:
@wangke19: all tests passed!
/payload-aggregate periodic-ci-openshift-release-master-nightly-4.21-e2e-aws-ovn-tls-13
/payload-aggregate periodic-ci-openshift-release-master-nightly-4.22-e2e-aws-ovn-tls-13
/payload-job periodic-ci-openshift-release-master-nightly-4.22-e2e-aws-ovn-tls-13
@wangke19: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/44c09590-01e0-11f1-99ae-f7f91100ef91-0
CI job https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-origin-30746-nightly-4.22-e2e-aws-ovn-tls-13/2019074402091536384:
/payload-job periodic-ci-openshift-release-master-nightly-4.22-e2e-aws-ovn-tls-13
@wangke19: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/ad6ed650-01fa-11f1-880d-126cb2da3d50-0
/payload-job periodic-ci-openshift-release-master-nightly-4.22-e2e-aws-ovn-tls-13
@wangke19: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/3fb34a50-02a5-11f1-8110-7108862631ab-0
/retitle [WIP]OCPBUGS-76334: Add TLS 1.3 (Modern profile) support to TestTLSDefaults
@wangke19: This pull request references Jira Issue OCPBUGS-76334, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker.
Test Failure Analysis:
| Run ID | Ungraceful Terminations | TestTLSDefaults Execution | Time Gap |
|---|---|---|---|
| 2019122669349244928 | 19:56:19, 20:04:07 | 21:38:50 ✅ PASSED (5.3s) | 1h 34m - 1h 42m |
| 2019429245096300544 | 16:24:22 | 17:47:18 ✅ PASSED (2.3s) | 1h 23m |
| 2019074402091536384 | 16:48:52, 16:52:45, 16:57:14 | 18:11:20 ✅ PASSED (2.7s) | 1h 14m - 1h 22m |
Pattern: All ungraceful terminations occur during cluster bootstrap (between 16:24 and 20:04 across the three runs), at least 1h 14m before TestTLSDefaults executes in the same run (between 17:47 and 21:38).
Detailed Timeline (Run 1 Example)
Timeline for run 2019122669349244928:
19:56:19 🔴 Pod kube-apiserver-ip-10-0-81-175 started (ungraceful termination detected)
20:04:07 🔴 Pod kube-apiserver-ip-10-0-76-239 started (ungraceful termination detected)
⏱️ [~1h 34m gap - cluster stabilization phase]
21:38:50 ✅ TestTLSDefaults starts (THIS PR)
21:38:55 ✅ TestTLSDefaults ends - PASSED
21:39:48 ✅ TestTLSMinimumVersions starts (pre-existing, also tests TLS 1.3)
21:39:59 ✅ TestTLSMinimumVersions ends - PASSED
22:03:01 ❌ Graceful termination test runs
22:03:04 ❌ Test FAILS - detects ungraceful terminations from 19:56 and 20:04
From the test failure output:
fail [github.com/openshift/origin/test/extended/apiserver/graceful_termination.go:89]:
The following API Servers weren't gracefully terminated:
kube-apiserver on node ip-10-0-76-239.us-west-2.compute.internal wasn't gracefully terminated,
reason: Previous pod kube-apiserver-ip-10-0-76-239.us-west-2.compute.internal started at
2026-02-04 20:04:07.028765438 +0000 UTC did not terminate gracefully
Key Findings
- ✅ TestTLSDefaults always passes in all 3 runs (5.3s, 2.3s, 2.7s)
- ✅ Temporal impossibility - the ungraceful terminations occur 1h 14m to 1h 42m BEFORE the test runs, so the test cannot have caused them
- ✅ TestTLSMinimumVersions pre-existed - TLS 1.3 connection testing was already happening (11.3s runtime, more intensive than TestTLSDefaults)
- ✅ Consistent pattern - all 3 runs show identical behavior: bootstrap failures detected later
- ❌ Real infrastructure issue - kube-apiserver pods are terminated ungracefully during cluster initialization
Root Cause: TLS 1.3 Cluster Bootstrap Issue
The nightly-4.22-e2e-aws-ovn-tls-13 CI job has a pre-existing cluster initialization problem where kube-apiserver pods don't terminate gracefully during bootstrap. The terminations:
- Occur during initial cluster provisioning (16:00-20:00 time range)
- Occur before any e2e tests execute (tests start ~18:00-21:00)
- Are likely caused by kubelet/CRI-O not respecting graceful termination during bootstrap, TLS configuration triggering rapid restarts, or resource pressure
Why this appears on this PR: The PR triggers the TLS 1.3 CI job via the /payload-job command, which exposes the existing infrastructure issue. This job may not run frequently enough on the main branch to detect the problem consistently.
Why This PR is Not the Cause
- Code changes are test-only - only modifies TestTLSDefaults to support TLS 1.3 instead of skipping
- Test is read-only - no cluster modifications, only verification via port-forward and TLS connections
- Pre-existing TLS testing - TestTLSMinimumVersions already performed similar (more intensive) TLS connection testing
- Test passes successfully - if the test code was problematic, it would fail or cause issues during its execution, not 1.5 hours earlier
Recommendation
This PR can proceed to merge. The test failure is detecting a real cluster infrastructure problem that exists independently of this PR's changes.
Separate issue tracking: I recommend filing a bug against the TLS 1.3 CI environment for the cluster bootstrap ungraceful termination issue. The failure is legitimate - it's just not caused by this PR.
Related Tests
For reference, both TLS tests run successfully:
- TestTLSDefaults (this PR): Tests TLS versions and cipher suites via port-forward, 2-5s runtime across the three runs
- TestTLSMinimumVersions (pre-existing): Tests TLS versions across 7 components, ~11s runtime
Both tests work correctly and pass. The graceful termination test is designed to detect any ungraceful termination that occurred at any point during the cluster's lifetime, including the initialization phase.
Summary
The TestTLSDefaults test was previously skipping when the cluster TLS profile was set to Modern (TLS 1.3). This change extends the test to support both Intermediate and Modern TLS profiles, ensuring CI jobs configured with TLS 1.3 properly test the TLS version behavior.
Changes
- Use a dynamic minTLSVersion variable based on the profile
Problem
CI jobs running with TLS 1.3 clusters (e.g., openshift-kubernetes-2315-ci-4.18-e2e-aws-ovn-tls-13) were seeing this test skip:
Solution
The test now properly handles Modern TLS profile by:
Test Plan