Skip to content

Conversation

@jamand
Copy link
Contributor

@jamand jamand commented Nov 13, 2025

How to categorize this PR?

/area quality
/kind enhancement
/platform openstack

What this PR does / why we need it:

This PR implements the GetMachineStatus method of the mcm-provider-openstack, which is optional as per the Machine Error code handling (but which is implemented for other providers like aws or gcp).

Working on GEP-28 (self-hosted shoots, managed infra scenario) we noticed that the control plane worker node never became ready due to the logic in the mcm not adding the required node label on the Machine objects. Would the GetMachineStatus have been implemented it would have done so. As the implementation is optional, @maboehm implemented a fix in the gardener/machine-controller-manager#1050 itself. The implementation of the GetMachineStatus is now more of a "nice to have" which could potentially reduce some loads on the OpenStack API Server by doing a lookup by provider ID first if provided and then by name instead of having to call Create of the driver which has to fetch the list of all Servers in the project every time.

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Tested in Self-hosted Managed infrastructure flow. Requires running e2e.

Release note:

Implement the optional GetMachineStatus driver method

@gardener-robot gardener-robot added area/quality Output qualification (tests, checks, scans, automation in general, etc.) related kind/enhancement Enhancement, improvement, extension platform/openstack OpenStack platform/infrastructure needs/review Needs review size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 13, 2025
@jamand jamand changed the title Feat/implement get machine status feat: Implement optional driver GetMachineStatus method Nov 13, 2025
@jamand jamand force-pushed the feat/implement-GetMachineStatus branch 2 times, most recently from b8e2bff to ae88578 Compare November 13, 2025 12:06
@jamand jamand marked this pull request as ready for review November 21, 2025 09:03
@jamand jamand requested review from a team as code owners November 21, 2025 09:03
@jamand jamand force-pushed the feat/implement-GetMachineStatus branch from ae88578 to 5e609d1 Compare December 4, 2025 12:09
@aaronfern aaronfern self-assigned this Dec 9, 2025
Copy link
Member

@aaronfern aaronfern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

Can you please explain a little more about the backing reason for this PR?
Will GetmachineStatus now need to be implemented for all providers for GEP-28?

func (ex *Executor) getMachineByID(ctx context.Context, serverID string) (*servers.Server, error) {
klog.V(2).Infof("finding server with [ID=%q]", serverID)
// GetMachineByID fetches the data for a server based on a provider-encoded ID.
func (ex *Executor) GetMachineByID(ctx context.Context, providerID string) (*servers.Server, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please change the name of this method to GetMachineByProviderID so that it's clear?

Copy link
Member

@aaronfern aaronfern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please also run integration tests and add those logs here? (Please reach out if you need assistance with that)

// to store tags in a respective field and do a server-side filtering. To avoid incompatibility with older versions
// we will continue making the filtering clientside.
func (ex *Executor) getMachineByName(ctx context.Context, machineName string) (*servers.Server, error) {
func (ex *Executor) GetMachineByName(ctx context.Context, machineName string) (*servers.Server, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please update this method to also return providerID? You can construct it using encodeProviderID()
You can then use this provider id in the GetMachineStatusResponse of GetMachineStatus()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I get the exact change you want, GetMachineByName returns the machine (similar to GetMachineByProviderID) from which I in return extract the providerID of the machine that was found, so it is already part of the returned Server object, and used in the GetMachineStatusResponse.

@gardener-robot gardener-robot added the needs/changes Needs (more) changes label Dec 11, 2025
@jamand
Copy link
Contributor Author

jamand commented Dec 12, 2025

Hi @aaronfern! Thanks for the first review. I'll check your comments and do the testing.

For the motivation: Working on GEP-28 (self-hosted shoots, managed infra scenario) we noticed that the control plane worker node never became ready due to the logic in the mcm not adding the required node label on the Machine objects. Would the GetMachineStatus have been implemented it would have done so. But as per Machine Error code handling the implementation of this method should be optional, so @maboehm implemented a fix in the gardener/machine-controller-manager#1050 itself.

I noticed thatGetMachineStatus was implemented in the aws and gcp provider, so in a first step to get the self-hosted flow running locally, I implemented the status method and verified that the control plane machine became ready. So with the mcm fix merged, this is now more of a "nice to have", as I also refactored the code that is duplicated between all the driver methods.

I can also update the PR description soon.

@jamand jamand force-pushed the feat/implement-GetMachineStatus branch from 5e609d1 to ed75679 Compare December 12, 2025 11:40
@aaronfern
Copy link
Member

Hey @jamand
Thanks for the explanation! 😄
Sure, this is fine then, please address the review comments and we'll review it again

I can also update the PR description soon.

Yes, please, that would be really helpful

@jamand
Copy link
Contributor Author

jamand commented Dec 19, 2025

@aaronfern I added comments and updated the PR description. I'll be on holidays the next weeks. I haven't been able to run the integration tests on our environment yet unfortunately, is there a possibility to run it on the sap-landscape-dev directly?
Maybe through @timebertt who will also be back at the same time as me and who is Gardener maintainer?

@aaronfern
Copy link
Member

Thank you @jamand, I will re-review your PR

I'll be on holidays the next weeks.

Sure, not a problem, if there are still some more concerns or changes, you can take it up when you're back

is there a possibility to run it on the sap-landscape-dev directly?

Yes, but it is still a manual process. The make test-integration target runs a script that asks for inputs such as project name and seed/shoot name, and this runs the IT on a shoot in the dev landscape.
If you still face issues let me know, and I will run them for you

Copy link
Contributor

@kon-angelo kon-angelo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jamand I am actually kinda opposed to the PR.

When I made the initial commits, I decided explicitly to skip enabling the optional GetMachineStatus. The reason was that back then we didn't have the InitializeMachine and because of that, things that need to be done post server creation had the chance that if they failed, then a GetMachineStatus like this implementation would leave machines potentially undeleted but also uninitialized.

You would have to refactor everything that happens after the server creation to the InitializeMachine call and even that I consider inferior in the sense that with the GetMachineStatus unimplemented, the CreateMachine call is a full reconciliation of the machine rather than an imperative call for certain operations. I would still think it is okay if you decide to go the InitializeMachine for consistency reasons but not sure if it's an improvement. 🤷

@kon-angelo
Copy link
Contributor

With regards to

The implementation of the GetMachineStatus is now more of a "nice to have" which could potentially reduce some loads on the OpenStack API Server by doing a lookup by provider ID first if provided and then by name instead of having to call Create of the driver which has to fetch the list of all Servers in the project every time.

is it that MCM calls the list everytime before making the create call or is there some kind of list call there that I forgot 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/quality Output qualification (tests, checks, scans, automation in general, etc.) related kind/enhancement Enhancement, improvement, extension needs/changes Needs (more) changes needs/review Needs review platform/openstack OpenStack platform/infrastructure size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants