
Conversation

@4censord (Contributor) commented May 2, 2025

Basically just changes the loops in Increase and Decrease to run in parallel using goroutines.

So now, instead of waiting until one instance is completely created on the cloudscale side (Servers.WaitFor) before continuing on, it creates all instances at the same time.

I'd assume one might run into rate limits with huge setups, but for us that hasn't happened yet.
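
For illustration, a rough sketch of that shape, not the actual plugin code: createAndWait is a placeholder standing in for the cloudscale create call followed by Servers.WaitFor. Each instance gets its own goroutine, so Increase no longer blocks on one WaitFor before starting the next create.

```go
// Rough sketch only, not the actual plugin code.
package fleeting

import "sync"

// createAndWait stands in for the real cloudscale sequence:
// POST the server, then poll until it is running (Servers.WaitFor).
func createAndWait(name string) error {
	return nil
}

func increase(names []string) {
	var wg sync.WaitGroup
	for _, name := range names {
		wg.Add(1)
		go func(n string) {
			defer wg.Done()
			_ = createAndWait(n) // all instances are provisioned concurrently
		}(name)
	}
	wg.Wait()
}
```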

@4censord force-pushed the 4c/create-in-parallel branch from 72b064c to dffe954 on May 2, 2025 13:52
@href (Contributor) commented May 6, 2025

Thank you for your PR. I took a look at it, and I think the direction is right, but there are some issues with the approach, due to backend constraints you could not have been aware of:

The server API DELETE and POST calls are largely serial in the backend. If they arrive at the endpoint at the same time, they will mostly be worked on in a serialized fashion. This is partly an implementation detail, but it also has to do with scheduling: the algorithm that places a VM in OpenStack considers one VM at a time, as scheduling multiple VMs in one call would be orders of magnitude harder from an algorithmic standpoint.

So if you were to time it, you would find that you do not gain much with concurrency.

What does matter is waiting for servers to be actually started (via the WaitFor call). So I suggest updating this PR as follows:

  • Create servers serially, then await them concurrently. Maybe the main thread can create the servers in a loop, send the resulting server instances to workers via a channel, then wait for all the workers to await the servers (see the sketch after this list).
  • Limit how many servers are created concurrently. Maybe if a channel is used to await the servers, the channel could be size-constrained, so that the main loop would block if too many workers are already awaiting servers. I mean this mainly as some kind of upper bound for sanity that should never be breached (e.g., I don't expect there to ever be more than 10 servers launched in parallel).
  • Do not use concurrency for delete requests. As noted, those are serial anyway, and in our experience deleting a number of servers serially vs. concurrently does not make a difference in time. So the additional code complexity there is not worth it in my opinion.
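
A hedged sketch of the suggested shape, not the actual plugin code: the server type and the createServer/waitForServer helpers are placeholders rather than the real cloudscale-go-sdk API. Creation stays serial; a small, fixed pool of workers awaits readiness, and the buffered channel blocks the main loop once too many servers are already being awaited.

```go
// Hedged sketch under the assumptions above; helper names are placeholders.
package fleeting

import "sync"

type server struct{ uuid string }

func createServer(name string) (*server, error) { return &server{uuid: name}, nil }
func waitForServer(s *server) error             { return nil }

func increase(names []string) error {
	const maxWaiters = 10 // sanity bound, never expected to be reached

	pending := make(chan *server, maxWaiters) // main loop blocks when full
	var wg sync.WaitGroup

	// Fixed pool of waiters that do nothing but await server readiness.
	for i := 0; i < maxWaiters; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for s := range pending {
				_ = waitForServer(s) // error handling omitted for brevity
			}
		}()
	}

	// Serial creation: the backend works on POSTs one at a time anyway.
	for _, name := range names {
		s, err := createServer(name)
		if err != nil {
			close(pending)
			wg.Wait()
			return err
		}
		pending <- s
	}

	close(pending)
	wg.Wait()
	return nil
}
```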

Let me know what you think.

@4censord (Contributor, Author) commented May 6, 2025

So I've implemented the “create in serial, await in parallel” approach, and reverted the deletion to serial.

But instead of having a pre-created, limited number of awaiters, it dynamically creates goroutines to wait for servers to become ready.

Mostly because I dislike having an upper bound in the plugin, as these limits are already available as part of the GitLab runner configuration (max_instances combined with the scale_throttle.limit and scale_throttle.burst settings [1] [2]).

Also, some of our pipelines currently have more than 10 jobs that run in parallel, so such a cap would slow them down.

[2]: https://docs.gitlab.com/runner/configuration/advanced-configuration/#the-runnersautoscalerscale_throttle-section
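
For comparison, a minimal sketch of this variant, again with placeholder createServer/waitForServer helpers rather than the real cloudscale API: servers are created serially, and a waiter goroutine is spawned per server on demand, with no fixed cap.

```go
// Minimal sketch of the unbounded-waiter variant; helper names are placeholders.
package fleeting

import (
	"errors"
	"sync"
)

type server struct{ uuid string }

func createServer(name string) (*server, error) { return &server{uuid: name}, nil }
func waitForServer(s *server) error             { return nil }

func increase(names []string) error {
	var wg sync.WaitGroup
	errs := make([]error, len(names))

	for i, name := range names {
		srv, err := createServer(name) // creation remains serial
		if err != nil {
			errs[i] = err
			continue
		}
		wg.Add(1)
		go func(i int, s *server) { // one waiter per server, spawned dynamically
			defer wg.Done()
			errs[i] = waitForServer(s)
		}(i, srv)
	}

	wg.Wait()
	return errors.Join(errs...) // nil when every server became ready
}
```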

@href (Contributor) commented May 7, 2025

Thanks, I think the approach looks fair now. Setting an upper bound borders a bit on the paranoid, so I think you make a good argument. Since server creation is serial now, I don't foresee the API being a bottleneck either: Checking the server state is fairly fast.

I didn't do a functional test yet, as I would like to do that in one go, together with the other PR, once that is ready.

@href (Contributor) left a comment


I had some trouble getting this to work. It looks like in its current form the Increase call never returns.

@4censord force-pushed the 4c/create-in-parallel branch from 42c7958 to 311753d on May 9, 2025 06:46
@4censord force-pushed the 4c/create-in-parallel branch from 311753d to 6d34e38 on May 9, 2025 06:48
@href (Contributor) commented May 13, 2025

Thank you. I'll go ahead and merge this PR, and the other one once the tests pass. Afterward I want to create a beta release and integrate it into https://github.com/cloudscale-ch/gitlab-runner. This will give me some additional testing mileage and would also make it possible for you to run this already, pending a proper release.

However, quay.io currently throws HTTP 500 whenever I push a container, which is slowing me down a bit. I'll wait and see if this resolves itself before maybe publishing it elsewhere. Either way, I'm aiming to have at least the beta release this week, and a proper release by next week.

In the meantime, thank you for your contributions and your patience 😅

@href merged commit 411ea6b into cloudscale-ch:main on May 13, 2025
3 checks passed