mantle/platform/qemu: remove mutex; protect tearingDown; add helper function#4486
mantle/platform/qemu: remove mutex; protect tearingDown; add helper function#4486dustymabe merged 4 commits intocoreos:mainfrom
Conversation
RenderUserData() operates on read-only shared state (rconf, bf.baseopts) and local variables, making it safe for concurrent calls without synchronization. No other platform cluster (AWS, GCP, Azure, DO, ESX, OpenStack) uses a mutex around RenderUserData either. Remove the mu sync.Mutex field from both the qemu and qemuiso Cluster structs and the corresponding Lock/Unlock calls around RenderUserData, along with the now-unused sync import. This code was originally introduced in 67b9c0d (which was a large commit) and most likely was an artifact of the development process.. OR maybe it was needed at one time, but current analysis seems to think not. Assisted-By: <anthropic/claude-opus-4.6>
The tearingDown field is written in Destroy() from the main goroutine and read from a background goroutine spawned in NewMachineWithQemuOptions() that waits for the QEMU process to exit. This unsynchronized access constitutes a data race detectable by Go's race detector. Switch tearingDown from bool to sync/atomic.Bool and use Store()/Load() to elminate the race. Assisted-By: <anthropic/claude-opus-4.6>
This will render the userdata and write the resulting Ignition file out to the working `dir`. Co-authored-by: Dusty Mabe <dusty@dustymabe.com>
This is one step towards merging in `kola testiso` into `kola run` [1] since the testiso tests usually provide a config directly rather than the userdata that then needs to be rendered. [1] coreos#3989 Co-authored-by: Nikita Dubrovskii <nikita@linux.ibm.com>
|
@jlebon really interested in your thoughts here (at least on the safety of the first commit). |
There was a problem hiding this comment.
Code Review
This pull request correctly replaces a mutex with a more efficient atomic.Bool for the tearingDown flag and removes an unnecessary mutex from qemuiso/cluster.go. The extraction of user data rendering and file writing logic into the new RenderUserDataAndWriteIgnitionFileToDir helper function in qemu/cluster.go is a good structural improvement. I have one suggestion to enhance the readability of this new function.
| if config != nil { | ||
| if config.IsIgnition() { | ||
| configPath = filepath.Join(dir, "ignition.json") | ||
| if err = config.WriteFile(configPath); err != nil { | ||
| return nil, "", err | ||
| } | ||
| } else if !config.IsEmpty() { | ||
| return nil, "", fmt.Errorf("qemu only supports Ignition or empty configs") | ||
| } | ||
| } | ||
| return config, configPath, nil |
There was a problem hiding this comment.
The logic to handle the rendered configuration can be simplified to improve readability by using early returns and reducing nesting. This makes the control flow flatter and easier to follow.
if config == nil {
return nil, "", nil
}
if !config.IsIgnition() {
if !config.IsEmpty() {
return nil, "", fmt.Errorf("qemu only supports Ignition or empty configs")
}
return config, "", nil
}
configPath := filepath.Join(dir, "ignition.json")
if err := config.WriteFile(configPath); err != nil {
return nil, "", err
}
return config, configPath, nil|
Ignore the linter error here. I'll fix it when I fix the repo-templates PR and get it merged. |
nikita-dubrovskii
left a comment
There was a problem hiding this comment.
LGTM. Ignoring gemini suggestions since this PR is based on #4377, and introducing changes there could make the rebase more difficult.
For the record this is the analysis that Claude/Opus 4.6 produced on the topic of safety of removing the mutex: RenderUserData (mantle/platform/cluster.go:190-283) operates primarily on:
No other platform (AWS, GCP, Azure, DigitalOcean, ESX, OpenStack) locks around RenderUserData, which is evidence that the community considers it concurrency-safe. Conclusion on RenderUserData:The mutex was not protecting against a real race. The function reads immutable configuration and produces independent *Conf values. Concurrent calls from NewMachines() (platform.go:388-400), which spawns n goroutines each calling NewMachineWithOptions, are safe without the lock. |
|
I traced this down to coreos/mantle#104. If you look at what it was protecting at the time, AFAICT none of those resources are relevant today. Most of them were dropped when we dropped the privileged qemu path, and the |
Nice. I didn't consider the old repo.
Thank you for the sanity check! |
See individual commit messages. These are essentially prep work for #3989 and some of it was broken out from the larger effort over in #4377