Skip to content

mantle/platform/qemu: remove mutex; protect tearingDown; add helper function#4486

Merged
dustymabe merged 4 commits intocoreos:mainfrom
dustymabe:dusty-testiso-prep
Mar 18, 2026
Merged

mantle/platform/qemu: remove mutex; protect tearingDown; add helper function#4486
dustymabe merged 4 commits intocoreos:mainfrom
dustymabe:dusty-testiso-prep

Conversation

@dustymabe
Copy link
Member

See individual commit messages. These are essentially prep work for #3989 and some of it was broken out from the larger effort over in #4377

dustymabe and others added 4 commits March 17, 2026 16:05
RenderUserData() operates on read-only shared state (rconf, bf.baseopts)
and local variables, making it safe for concurrent calls without
synchronization. No other platform cluster (AWS, GCP, Azure, DO, ESX,
OpenStack) uses a mutex around RenderUserData either.

Remove the mu sync.Mutex field from both the qemu and qemuiso Cluster
structs and the corresponding Lock/Unlock calls around RenderUserData,
along with the now-unused sync import.

This code was originally introduced in 67b9c0d (which was a large commit)
and most likely was an artifact of the development process.. OR maybe
it was needed at one time, but current analysis seems to think not.

Assisted-By: <anthropic/claude-opus-4.6>
The tearingDown field is written in Destroy() from the main goroutine and
read from a background goroutine spawned in NewMachineWithQemuOptions()
that waits for the QEMU process to exit. This unsynchronized access
constitutes a data race detectable by Go's race detector.

Switch tearingDown from bool to sync/atomic.Bool and use Store()/Load()
to elminate the race.

Assisted-By: <anthropic/claude-opus-4.6>
This will render the userdata and write the resulting Ignition file
out to the working `dir`.

Co-authored-by: Dusty Mabe <dusty@dustymabe.com>
This is one step towards merging in `kola testiso` into `kola run` [1]
since the testiso tests usually provide a config directly rather than
the userdata that then needs to be rendered.

[1] coreos#3989

Co-authored-by: Nikita Dubrovskii <nikita@linux.ibm.com>
@dustymabe
Copy link
Member Author

@jlebon really interested in your thoughts here (at least on the safety of the first commit).

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request correctly replaces a mutex with a more efficient atomic.Bool for the tearingDown flag and removes an unnecessary mutex from qemuiso/cluster.go. The extraction of user data rendering and file writing logic into the new RenderUserDataAndWriteIgnitionFileToDir helper function in qemu/cluster.go is a good structural improvement. I have one suggestion to enhance the readability of this new function.

Comment on lines +266 to +276
if config != nil {
if config.IsIgnition() {
configPath = filepath.Join(dir, "ignition.json")
if err = config.WriteFile(configPath); err != nil {
return nil, "", err
}
} else if !config.IsEmpty() {
return nil, "", fmt.Errorf("qemu only supports Ignition or empty configs")
}
}
return config, configPath, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The logic to handle the rendered configuration can be simplified to improve readability by using early returns and reducing nesting. This makes the control flow flatter and easier to follow.

if config == nil {
	return nil, "", nil
}

if !config.IsIgnition() {
	if !config.IsEmpty() {
		return nil, "", fmt.Errorf("qemu only supports Ignition or empty configs")
	}
	return config, "", nil
}

configPath := filepath.Join(dir, "ignition.json")
if err := config.WriteFile(configPath); err != nil {
	return nil, "", err
}
return config, configPath, nil

@dustymabe
Copy link
Member Author

Ignore the linter error here. I'll fix it when I fix the repo-templates PR and get it merged.

Copy link
Contributor

@nikita-dubrovskii nikita-dubrovskii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Ignoring gemini suggestions since this PR is based on #4377, and introducing changes there could make the rebase more difficult.

@dustymabe
Copy link
Member Author

@jlebon really interested in your thoughts here (at least on the safety of the first commit).

For the record this is the analysis that Claude/Opus 4.6 produced on the topic of safety of removing the mutex:


RenderUserData (mantle/platform/cluster.go:190-283) operates primarily on:

  • Read-only shared state: bc.rconf, bc.bf.baseopts, bc.Distribution() — these are set at initialization and never mutated during the test lifecycle.
  • Local variables: confSources, conf, userdata (reassigned locally via Subst(), which returns a new value).
  • File reads: os.ReadFile() on AppendButane/AppendIgnition paths — inherently safe for concurrent reads.
  • bc.bf.Keys(): would need verification, but key generation typically also reads immutable state.

No other platform (AWS, GCP, Azure, DigitalOcean, ESX, OpenStack) locks around RenderUserData, which is evidence that the community considers it concurrency-safe.

Conclusion on RenderUserData:

The mutex was not protecting against a real race. The function reads immutable configuration and produces independent *Conf values. Concurrent calls from NewMachines() (platform.go:388-400), which spawns n goroutines each calling NewMachineWithOptions, are safe without the lock.

@jlebon
Copy link
Member

jlebon commented Mar 18, 2026

I traced this down to coreos/mantle#104. If you look at what it was protecting at the time, AFAICT none of those resources are relevant today. Most of them were dropped when we dropped the privileged qemu path, and the machines map mutations moved to BaseCluster which has its own mutex. So yeah, I think it's safe to drop it.

@dustymabe
Copy link
Member Author

I traced this down to coreos/mantle#104.

Nice. I didn't consider the old repo.

So yeah, I think it's safe to drop it.

Thank you for the sanity check!

@dustymabe dustymabe merged commit 9467ee2 into coreos:main Mar 18, 2026
5 of 6 checks passed
@dustymabe dustymabe deleted the dusty-testiso-prep branch March 18, 2026 14:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants