Fix race condition causing sshd start failure during provisioning #460
base: ubuntu-jammy
Run first-boot tasks via systemd so sshd never races with host-key regeneration.

The old `rc.local` script ran after network.target, but in parallel with other regular system services such as ssh.service. As a result, ssh.service often started (and restarted) while `/root/firstboot.sh` was deleting keys. cloud-init's set-passwords module made this worse by restarting ssh mid-run.

* Replace `rc.local` with a oneshot firstboot.service (delete keys, create new keys, reconfigure sysstat) that runs Before=ssh.service and leaves the `/root/firstboot_done` file as a marker.
* Add a cloud-config.service drop-in so cloud-init's config stage waits for firstboot.service.
* Update walinuxagent.service to wait for firstboot.service, ensuring ssh keys have been regenerated.

This guarantees sshd, cloud-init, and WALinuxAgent all start only after the first-boot tasks succeed.
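The drop-in mentioned above is not quoted anywhere on this page, so the following is only a minimal sketch of what such a file might contain. The path matches the file name listed in the PR; the directives are assumptions.

```ini
# /etc/systemd/system/cloud-config.service.d/firstboot-blocker.conf
# Sketch only: the actual contents of the drop-in are not shown on this page.
[Unit]
# Order cloud-init's config stage after the first-boot tasks.
After=firstboot.service
# Assumption: also pull firstboot.service into the same start transaction.
Wants=firstboot.service
```

A drop-in like this extends the vendor-supplied cloud-config.service without replacing it, so later package updates to the main unit file are still picked up.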
Warning It's important to be aware that this change could affect how the ssh service behaves. If the firstboot script was intended only for host key regeneration, using the |
We should not introduce this within jammy. We currently have similar issues on noble as well, as we have set bosh-agent to use systemd.
Pull request overview
This PR fixes a race condition where the SSH daemon could start before host keys were regenerated during first boot, causing provisioning failures. The fix replaces the rc.local-based firstboot mechanism with a systemd service that establishes explicit ordering dependencies.
Key Changes
- Introduces firstboot.service (oneshot systemd unit) that runs before ssh.service to regenerate host keys and configure sysstat
- Removes the legacy rc.local script and firstboot.sh in favor of systemd-native orchestration
- Updates walinuxagent.service to depend on firstboot.service completion instead of polling for the marker file
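For orientation, here is a sketch of how the complete unit might look when assembled from the fragments quoted in the review comments further down. `Type=oneshot` follows from the PR description; the key-deletion command and the `[Install]` section are assumptions, since neither is shown on this page.

```ini
# firstboot.service: sketch assembled from the quoted fragments.
# Lines marked "assumption" are not shown in the source.
[Unit]
Description=Run first boot tasks
ConditionPathExists=!/root/firstboot_done
Before=ssh.service

[Service]
# The PR description calls the unit a oneshot.
Type=oneshot
# Assumption: the exact key-deletion command is not shown. systemd does not
# expand globs itself, so a shell is used here.
ExecStartPre=/bin/sh -c 'rm -f /etc/ssh/ssh_host_*'
ExecStart=/usr/bin/ssh-keygen -A -v
ExecStartPost=/usr/sbin/dpkg-reconfigure -fnoninteractive sysstat
ExecStartPost=/usr/bin/touch /root/firstboot_done
RemainAfterExit=yes

[Install]
# Assumption: the install target is not shown in the source.
WantedBy=multi-user.target
```

Because of `ConditionPathExists=!/root/firstboot_done`, the unit is skipped on every boot after the marker file has been created.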
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `stemcell_builder/stages/base_ubuntu_firstboot/assets/etc/systemd/system/firstboot.service` | New systemd oneshot service that deletes old SSH keys, generates new ones, and reconfigures sysstat before SSH starts |
| `stemcell_builder/stages/base_ubuntu_firstboot/assets/etc/rc.local` | Removed legacy rc.local script that previously executed firstboot tasks |
| `stemcell_builder/stages/base_ubuntu_firstboot/assets/root/firstboot.sh` | Removed shell script containing firstboot logic, now handled by systemd service |
| `stemcell_builder/stages/base_ubuntu_firstboot/apply.sh` | Updated to install and enable the new firstboot.service instead of copying rc.local and firstboot.sh scripts |
| `stemcell_builder/stages/system_azure_init/assets/etc/systemd/system/cloud-config.service.d/firstboot-blocker.conf` | New drop-in configuration ensuring cloud-init waits for firstboot.service completion |
| `stemcell_builder/stages/system_azure_init/assets/etc/waagent/walinuxagent.service` | Replaced polling loop with proper systemd dependency on firstboot.service |
| `stemcell_builder/stages/system_azure_init/apply.sh` | Added installation of cloud-config.service drop-in to ensure proper ordering |
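The walinuxagent.service change is described here but not quoted, so the fragment below is only a sketch of what "a systemd dependency instead of a polling loop" could look like. Both directives are assumptions about the actual change.

```ini
# walinuxagent.service, [Unit] section only (sketch; other directives omitted)
[Unit]
# Start only after firstboot.service has finished its start-up.
After=firstboot.service
# Assumption: a hard dependency, so the agent is not started if the
# first-boot tasks fail on the boot where they run.
Requires=firstboot.service
```

One design point in favor of this shape: according to the systemd.unit documentation, a failing condition check on the required unit does not cause the dependent unit's start job to fail. On later boots, when `/root/firstboot_done` exists and firstboot.service is skipped, a unit declared this way should still start normally.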
    [Unit]
    Description=Run first boot tasks
    ConditionPathExists=!/root/firstboot_done
    Before=ssh.service
Copilot AI, Dec 18, 2025
Consider adding "Before=sshd-keygen.service" to ensure firstboot.service completes key regeneration before the system's default SSH key generation service (sshd-keygen.service) attempts to run. While both services likely check for existing keys, explicitly ordering them would prevent any potential race condition between the two key generation mechanisms.
Suggested change:

    -Before=ssh.service
    +Before=sshd-keygen.service ssh.service
    ExecStart=/usr/bin/ssh-keygen -A -v
    ExecStartPost=/usr/sbin/dpkg-reconfigure -fnoninteractive sysstat
    ExecStartPost=/usr/bin/touch /root/firstboot_done
    RemainAfterExit=yes
Copilot AI, Dec 18, 2025
If the ssh-keygen command in ExecStart fails, the ExecStartPost commands (including the creation of the marker file) will still execute. This could leave the system in an inconsistent state where firstboot_done exists but SSH keys were never generated. Consider moving the marker file creation to a separate ExecStartPost command that only runs on success, or use a more robust approach to ensure the service fails if key generation fails.
Suggested change:

    -ExecStart=/usr/bin/ssh-keygen -A -v
    -ExecStartPost=/usr/sbin/dpkg-reconfigure -fnoninteractive sysstat
    -ExecStartPost=/usr/bin/touch /root/firstboot_done
    -RemainAfterExit=yes
    +ExecStart=/bin/sh -c '/usr/bin/ssh-keygen -A -v && /usr/bin/touch /root/firstboot_done'
    +ExecStartPost=/usr/sbin/dpkg-reconfigure -fnoninteractive sysstat
    +RemainAfterExit=yes
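One point worth checking before adopting this suggestion: the systemd.service documentation states that `ExecStartPost=` commands run only after the `ExecStart=` commands have been invoked successfully, which for `Type=oneshot` means after the last `ExecStart=` process has exited successfully. If that holds on the target systemd version, the original unit would already skip the marker file when `ssh-keygen` fails. A throwaway unit along these lines could confirm the behavior; the unit name and marker path are hypothetical and not part of this PR.

```ini
# /etc/systemd/system/poststart-check.service (hypothetical test unit)
[Unit]
Description=Check whether ExecStartPost runs after a failed ExecStart

[Service]
Type=oneshot
# Always fails, standing in for a failed ssh-keygen run.
ExecStart=/bin/false
# Expected to be skipped when ExecStart fails.
ExecStartPost=/usr/bin/touch /tmp/poststart-check-marker
```

After `systemctl daemon-reload` and `systemctl start poststart-check.service`, the unit should end up in the failed state, and `/tmp/poststart-check-marker` should be absent if the documented behavior applies.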
Resolves #458