
Conversation

@rhmdnd
Collaborator

@rhmdnd rhmdnd commented Aug 21, 2024

Some remediations are more invasive than others and make changes to the
cluster that require time to propagate through the system. Before the
suite starts running subsequent scans, we should wait for the cluster to
become stable so that we know the remediations applied properly, or at
the very least didn't make things worse.

@rhmdnd rhmdnd requested review from Vincent056 and yuumasato August 21, 2024 21:33
}

func (ctx *e2econtext) waitForStableCluster() error {
	// Output is intentionally ignored; we only care that the command exits
	// successfully before giving up on waiting for a stable cluster.
	_, err := exec.Command("oc", "adm", "wait-for-stable-cluster", "--minimum-stable-period=2m").Output()
	return err
}
Collaborator Author

@rhmdnd rhmdnd Aug 21, 2024

The assumption here is that we don't care about the command output, just that it doesn't time out waiting for a stable cluster.

Using a client library here instead would be nice because it might give us more useful error messages without having to parse raw output.
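
For illustration, a rough sketch of what a client-library based check could look like, assuming the openshift/client-go config clientset; the helper name `isClusterStable` is hypothetical and this is not what the PR currently does:

```go
package e2e

import (
	"context"
	"fmt"

	configv1 "github.com/openshift/api/config/v1"
	configclient "github.com/openshift/client-go/config/clientset/versioned"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/rest"
)

// isClusterStable reports every ClusterOperator that is not Available, or is
// Progressing or Degraded, which gives a more descriptive error than parsing
// the raw `oc` output.
func isClusterStable(ctx context.Context, cfg *rest.Config) error {
	client, err := configclient.NewForConfig(cfg)
	if err != nil {
		return err
	}
	operators, err := client.ConfigV1().ClusterOperators().List(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	var unstable []string
	for _, co := range operators.Items {
		for _, cond := range co.Status.Conditions {
			switch {
			case cond.Type == configv1.OperatorAvailable && cond.Status != configv1.ConditionTrue,
				cond.Type == configv1.OperatorProgressing && cond.Status == configv1.ConditionTrue,
				cond.Type == configv1.OperatorDegraded && cond.Status == configv1.ConditionTrue:
				unstable = append(unstable, fmt.Sprintf("%s: %s=%s (%s)", co.Name, cond.Type, cond.Status, cond.Message))
			}
		}
	}
	if len(unstable) > 0 {
		return fmt.Errorf("cluster not stable: %v", unstable)
	}
	return nil
}
```

A caller would still need to poll this until it stays nil for a minimum stable period to match the behavior of `oc adm wait-for-stable-cluster`.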


@xiaojiey xiaojiey Aug 22, 2024

Per the testing for PR ComplianceAsCode/content#12220, the remediation took about 25-30 minutes for a 6-node cluster. Until then, the ingress or apiserver operators are still in an updating state.

Collaborator

Could the cluster be modified to have a faster rollout? The Machine Config Operator used to have such an option.

Collaborator Author

Yeah - that's a significant increase in our testing times. I'll do some digging around to see if there is a way to speed this up.
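
For reference, a hypothetical sketch (not something this PR does) of one way to speed up rollouts: raising `maxUnavailable` on the worker MachineConfigPool so the Machine Config Operator can update more nodes in parallel. The helper name and the value of 2 are assumptions.

```go
package e2e

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

// speedUpWorkerRollout patches the worker MachineConfigPool so that two nodes
// can be updated at a time instead of the default of one.
func speedUpWorkerRollout(ctx context.Context, cfg *rest.Config) error {
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		return err
	}
	gvr := schema.GroupVersionResource{
		Group:    "machineconfiguration.openshift.io",
		Version:  "v1",
		Resource: "machineconfigpools",
	}
	// Assumption: allowing two unavailable worker nodes is acceptable for the test cluster.
	patch := []byte(`{"spec":{"maxUnavailable":2}}`)
	_, err = client.Resource(gvr).Patch(ctx, "worker", types.MergePatchType, patch, metav1.PatchOptions{})
	return err
}
```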

@openshift-ci

openshift-ci bot commented Sep 2, 2025

@rhmdnd: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-openshift-node-compliance | 882969e | link | true | /test e2e-aws-openshift-node-compliance |
| ci/prow/e2e-aws-openshift-platform-compliance | 882969e | link | true | /test e2e-aws-openshift-platform-compliance |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
