[occm]: fix openstack getServerByName return error code#3083
tyiying wants to merge 1 commit into kubernetes:master
What this PR does / why we need it:
After upgrading occm from v1.29 to v1.32, we found that occm no longer deletes a node from the k8s node list after the node has been removed from OpenStack.
Which issue this PR fixes(if applicable):
fixes #2999. We have already fixed this issue in our product using the Nokia repo, but would like to fix it upstream as well.
Special notes for reviewers:
After deleting the node from OpenStack, we expect occm to delete the node from the k8s node list. However, the node is still listed:
[cloud-admin@hotel-ncs03-control-01 ~]$ sudo kubectl get node
Enter login password:
NAME                     STATUS     ROLES   AGE   VERSION
hotel-ncs03-control-01   Ready              44h   v1.32.8
hotel-ncs03-control-02   Ready              44h   v1.32.8
hotel-ncs03-control-03   Ready              44h   v1.32.8
hotel-ncs03-edge-01      Ready              44h   v1.32.8
hotel-ncs03-edge-02      Ready              44h   v1.32.8
hotel-ncs03-storage-01   NotReady           44h   v1.32.8   <= this should be gone
hotel-ncs03-storage-02   Ready              44h   v1.32.8
hotel-ncs03-storage-03   Ready              44h   v1.32.8
hotel-ncs03-worker-01    Ready              44h   v1.32.8
hotel-ncs03-worker-02    Ready              44h   v1.32.8
[cloud-admin@hotel-ncs03-control-01 ~]$
Here are the logs we see:
E0922 11:38:01.217813 1 node_lifecycle_controller.go:156] error checking if node hotel-ncs03-storage-01 exists: failed to find object
Comparing the code between 1.29 and 1.32, I can see that the error returned by the getInstance() function causes the different behavior.
In both ccm 1.29 and ccm 1.32, the caller expects that error to be cloudprovider.InstanceNotFound:
// InstanceExists indicates whether a given node exists according to the cloud provider
func (i *InstancesV2) InstanceExists(ctx context.Context, node *v1.Node) (bool, error) {
	_, err := i.getInstance(ctx, node)
	if err == cloudprovider.InstanceNotFound {
		klog.V(6).Infof("instance not found for node: %s", node.Name)
		return false, nil
	}
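To illustrate why the returned error value matters, here is a minimal, self-contained sketch of the sentinel-error check. It is not the actual occm code: errInstanceNotFound stands in for cloudprovider.InstanceNotFound, errLookupFailed is a hypothetical error that reuses the message from the log above, and instanceExists is a simplified stand-in whose final two branches are assumptions about how the rest of such a check typically looks.

```go
package main

import (
	"errors"
	"fmt"
)

// errInstanceNotFound is a local stand-in for cloudprovider.InstanceNotFound
// (assumption: defined here only to keep the sketch self-contained).
var errInstanceNotFound = errors.New("instance not found")

// errLookupFailed is a hypothetical "other" error; its message is copied
// from the log line in this PR description.
var errLookupFailed = errors.New("failed to find object")

// instanceExists mirrors the shape of the check quoted above: only the
// sentinel error is translated into "the instance does not exist".
// Any other error is passed back to the caller unchanged.
func instanceExists(lookup func() error) (bool, error) {
	err := lookup()
	if err == errInstanceNotFound {
		return false, nil
	}
	if err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	// Lookup returns the sentinel: the caller learns the instance is gone.
	exists, err := instanceExists(func() error { return errInstanceNotFound })
	fmt.Println(exists, err) // false <nil>

	// Lookup returns a different error: the caller only sees a failure.
	exists, err = instanceExists(func() error { return errLookupFailed })
	fmt.Println(exists, err) // false failed to find object
}
```

In the second case the caller cannot conclude that the instance is gone, which would be consistent with the node staying in the node list while the "error checking if node ... exists" message is logged.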
Release note: