@fok666 fok666 commented Jan 18, 2026

Summary

This PR fixes the pipeline failures in the create-manifest job by improving reliability and error handling.

Changes

  • Remove continue-on-error from build-push step to properly catch and report failures
  • Add verification step after image push with retry logic (3 attempts, 10s intervals)
  • Add retry logic to manifest creation with image existence check (3 attempts, 15s delays)
  • Wait for images to propagate in registry before creating manifests
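
The workflow file is not shown in this conversation, so the following is only a minimal sketch of what a post-push verification step with 3 attempts and 10-second intervals might look like. The `retry` helper and the `IMAGE` placeholder are hypothetical names, not taken from the PR.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS runs have failed.
retry() {
  local attempts="$1"
  local delay="$2"
  shift 2
  local n=1
  while [ "$n" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $n/$attempts failed: $*" >&2
    # Sleep only between attempts, not after the last one.
    if [ "$n" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    n=$((n + 1))
  done
  return 1
}

# Hypothetical use right after the push step (IMAGE is a placeholder):
#   retry 3 10 docker manifest inspect "$IMAGE" >/dev/null
```

Because the helper returns non-zero once all attempts are used up, a step that calls it fails the job instead of hiding the error.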

Problem Solved

The pipeline was failing because:

  1. Push failures were being silently ignored due to continue-on-error
  2. Manifest creation was attempting to combine images before they were fully available in the registry
  3. There was no retry logic to handle transient registry issues
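
To illustrate the second cause, a manifest step could refuse to run until every per-architecture image is visible in the registry. The sketch below assumes `docker manifest inspect` as the existence check. The `require_images` function and the tag names in the closing comment are hypothetical, and the PR's actual workflow may do this differently.

```shell
# require_images ATTEMPTS DELAY_SECONDS IMAGE...
# Succeeds only once every IMAGE can be inspected in the registry.
require_images() {
  local attempts="$1"
  local delay="$2"
  shift 2
  local n=1
  local image
  local missing
  while [ "$n" -le "$attempts" ]; do
    missing=""
    for image in "$@"; do
      # Assumption: a successful inspect means the image has propagated.
      docker manifest inspect "$image" >/dev/null 2>&1 || missing="$missing $image"
    done
    if [ -z "$missing" ]; then
      return 0
    fi
    echo "attempt $n/$attempts: not yet available:$missing" >&2
    if [ "$n" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    n=$((n + 1))
  done
  echo "error: images still missing after $attempts attempts:$missing" >&2
  return 1
}

# Hypothetical use (REPO and TAG are placeholders):
#   require_images 3 15 "$REPO:$TAG-amd64" "$REPO:$TAG-arm64" &&
#     docker buildx imagetools create -t "$REPO:$TAG" \
#       "$REPO:$TAG-amd64" "$REPO:$TAG-arm64"
```

Naming the missing images in the error output makes it easier to tell a slow registry from a push that never happened.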

Testing

The changes ensure:

  • Images are verified to exist before proceeding
  • Transient failures are handled with automatic retries
  • Clear error messages are emitted when operations fail
  • The job fails properly if images cannot be pushed or manifests cannot be created

Fernando Korndorfer and others added 4 commits January 18, 2026 16:26

Added a comprehensive profiles matrix documenting the five available configurations:
- full: Complete toolset with all available tools
- minimal: Lightweight profile with essential tools only
- k8s: Kubernetes-focused profile
- iac: Infrastructure as Code profile with bash-based tools
- iac-pwsh: Infrastructure as Code profile with PowerShell support

- Fix CWE-78: Quote variables in seq command to prevent command injection
- Fix CWE-78: Add proper quoting for EventId in vmss_monitor.sh
- Fix deprecated apt-key usage, replace with gpg --dearmor
- Fix insecure curl piping to bash, download scripts first
- Add security documentation for privileged mode and NOPASSWD sudo
- Improve curl error handling with standardized flags
- Quote sensitive variables to prevent word splitting

- Use jq to safely construct JSON payload in vmss_monitor.sh to properly handle special characters in EventId
- Add -L flag to curl for Kustomize download to follow redirects

…ogic

- Remove continue-on-error from build-push step to catch failures
- Add verification step after image push with retry logic
- Add retry logic to manifest creation with image existence check
- Wait for images to propagate in registry before creating manifests
@gemini-code-assist

Summary of Changes

Hello @fok666, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the reliability and error handling of the CI/CD pipeline, specifically addressing issues in the image build, push, and manifest creation processes. By introducing explicit error catching, image verification, and retry mechanisms, it mitigates transient failures and ensures that images are fully propagated before being used, leading to a more stable and predictable pipeline. Additionally, it includes minor script improvements for robustness and documentation updates for clarity.

Highlights

  • Pipeline Reliability: Enhanced pipeline reliability by removing 'continue-on-error' from the build-push step, ensuring proper failure reporting instead of silently ignoring issues.
  • Image Verification and Retries: Introduced robust image verification with retry logic (3 attempts, 10s intervals) after image pushes to confirm availability in the registry.
  • Manifest Creation Retries: Implemented retry logic for manifest creation, including an image existence check (3 attempts, 15s delays), to handle transient registry issues and ensure images are fully propagated before use.
  • Dockerfile Tool Installation Improvements: Updated installation methods for Azure CLI, Terraspace, and Kustomize in the Dockerfile to improve robustness and security, often by downloading scripts to temporary files before execution.
  • Security Notes Added: Added explicit security notes to the Dockerfile regarding NOPASSWD:ALL for sudo and to run.sh explaining the use and risks of --privileged Docker mode.
  • Shell Script Robustness: Improved general shell script robustness by consistently quoting variables in run.sh and start.sh, and enhancing JSON payload handling using jq in vmss_monitor.sh.
  • Documentation Update: Added a new 'Available Profiles' section to the README.md to clearly document pre-configured toolsets for different runner use cases.
Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/docker-image.yml

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@fok666 fok666 self-assigned this Jan 18, 2026

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request significantly improves the reliability and security of the Docker image build process and the runner scripts. The changes to avoid curl | bash, replace the deprecated apt-key, and properly quote variables in shell scripts are excellent. Using jq to construct JSON payloads is also a great improvement for robustness. The added documentation and security notes are very helpful. Overall, these are high-quality changes that make the project more secure and maintainable.

Comment on lines +112 to +114
curl -sLS "https://aka.ms/InstallAzureCLIDeb" -o /tmp/install-azure-cli.sh \
&& bash /tmp/install-azure-cli.sh \
&& rm /tmp/install-azure-cli.sh \

security-high high

This is a great security improvement. Piping curl directly to bash can be risky as it executes remote code without inspection. Downloading the script to a temporary file before executing it is a much safer practice. It also improves debuggability if the download or script execution fails.
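
A minimal sketch of the download-then-execute pattern described above, written as a reusable function. The name `run_downloaded_script` is hypothetical, and the `-f` flag (fail on HTTP errors) is a choice made for this sketch, not necessarily what the Dockerfile uses.

```shell
# run_downloaded_script URL
# Downloads URL to a temp file and runs it with bash only if the
# download completed successfully.
run_downloaded_script() {
  local url="$1"
  local tmp
  tmp=$(mktemp) || return 1
  # -f: fail on HTTP errors, -sS: quiet but still report errors, -L: follow redirects
  if ! curl -fsSL "$url" -o "$tmp"; then
    echo "error: download failed: $url" >&2
    rm -f "$tmp"
    return 1
  fi
  # A checksum or signature check could be added here when the
  # publisher provides one.
  local status=0
  bash "$tmp" || status=$?
  rm -f "$tmp"
  return "$status"
}

# Hypothetical use:
#   run_downloaded_script "https://aka.ms/InstallAzureCLIDeb"
```

Unlike a pipe into bash, a truncated or failed download here never reaches the interpreter, and the temp file can be inspected when debugging.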

Comment on lines +183 to +184
curl -sL https://apt.boltops.com/boltops-key.public | gpg --dearmor -o /usr/share/keyrings/boltops-archive-keyring.gpg \
&& echo "deb [signed-by=/usr/share/keyrings/boltops-archive-keyring.gpg] https://apt.boltops.com stable main" > /etc/apt/sources.list.d/boltops.list \

security-high high

Excellent change! apt-key is deprecated and its usage is discouraged for security reasons. Switching to gpg --dearmor and storing the key in /usr/share/keyrings is the modern, recommended, and more secure way to handle repository keys. This prevents the key from being trusted for all repositories on the system.

Comment on lines +201 to +204
curl -sLf "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" -o /tmp/install_kustomize.sh \
&& bash /tmp/install_kustomize.sh \
&& install -o root -g root -m 0755 kustomize /usr/local/bin/kustomize \
- && rm -f kustomize
+ && rm -f kustomize /tmp/install_kustomize.sh

security-high high

This change significantly improves security by avoiding the curl | bash anti-pattern. Downloading the script to a temporary file before execution is the recommended and safer approach. Great job on applying this consistently.


GITHUB_TOKEN_FILE=/runner/.token
- echo -n $GITHUB_TOKEN > "$GITHUB_TOKEN_FILE"
+ echo -n "$GITHUB_TOKEN" > "$GITHUB_TOKEN_FILE"

high

Excellent! Quoting the $GITHUB_TOKEN variable is a crucial security and robustness improvement. It prevents potential issues like word splitting or globbing if the token were to contain special characters. This adheres to the shell scripting best practices outlined in the repository style guide.

References
  1. Variables, especially those containing secrets, must be quoted to prevent unexpected behavior and potential security risks. This is from line 8 of the repository style guide. (link)
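
The effect of quoting can be seen with a small demonstration. The `count_args` helper and the sample value are made up for illustration; a real token is unlikely to contain spaces, but the shell applies the same splitting rules to any unquoted expansion.

```shell
# Prints how many arguments it received.
count_args() {
  echo "$#"
}

value='one two three'   # hypothetical value containing whitespace

# Unquoted: the shell splits the expansion on whitespace.
count_args $value       # prints 3

# Quoted: the expansion is passed through as a single argument.
count_args "$value"     # prints 1
```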

# So, give it some time to finish the job.
while true; do
- ./config.sh remove --token $(cat "$GITHUB_TOKEN_FILE") && break
+ ./config.sh remove --token "$(cat "$GITHUB_TOKEN_FILE")" && break

high

Good catch! Quoting the command substitution $(cat "$GITHUB_TOKEN_FILE") ensures that the token is passed as a single argument to the remove command, even if it contains whitespace or other special characters. This is a great improvement for script robustness and aligns with the style guide.

References
  1. Command substitutions should be quoted to ensure their output is treated as a single string, which is critical when dealing with secrets or file contents. This is from line 8 of the repository style guide. (link)

--name "${RUNNER_NAME:-$(hostname)}" \
--url "$GITHUB_URL" \
- --token $(cat "$GITHUB_TOKEN_FILE") \
+ --token "$(cat "$GITHUB_TOKEN_FILE")" \

high

Another great fix. Quoting the command substitution for the token is essential for script reliability, preventing issues if the token contains special characters. This change correctly follows the repository's style guide on quoting variables and command substitutions.

References
  1. Command substitutions should be quoted to ensure their output is treated as a single string, which is critical when dealing with secrets or file contents. This is from line 8 of the repository style guide. (link)

Comment on lines +33 to +38
+ # Use jq to safely construct JSON payload to handle special characters
+ JSON_PAYLOAD=$(jq -n --arg eventId "$EventId" '{"StartRequests": [{"EventId": $eventId}]}')
  curl -sf -X POST "$METADATA_ENDPOINT" \
  -H 'Metadata: true' \
- -d "{\"StartRequests\": [{\"EventId\": \"${EventId}\"}]}"
+ -H 'Content-Type: application/json' \
+ -d "$JSON_PAYLOAD"

security-high high

This is an excellent improvement for script robustness and security. Using jq to construct the JSON payload is much safer than manually creating the JSON string. It correctly handles any special characters that might be present in the EventId, preventing potential errors or injection vulnerabilities.
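
The difference between the two approaches shows up as soon as the value contains a double quote. The following sketch contrasts them; `naive_payload` and `jq_payload` are hypothetical helper names, `-c` is added only to get compact one-line output, and the sample EventId is invented.

```shell
# String interpolation: the value is pasted into the JSON text as-is.
naive_payload() {
  printf '{"StartRequests": [{"EventId": "%s"}]}' "$1"
}

# jq: the value is passed as data via --arg and escaped by jq itself.
jq_payload() {
  jq -c -n --arg eventId "$1" '{"StartRequests": [{"EventId": $eventId}]}'
}

# Hypothetical use with a value containing a double quote:
#   naive_payload 'a"b'   ->  {"StartRequests": [{"EventId": "a"b"}]}   (invalid JSON)
#   jq_payload 'a"b'      ->  {"StartRequests":[{"EventId":"a\"b"}]}    (valid JSON)
```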


# Launch runners
- for R in $(seq 1 $RUNNER_COUNT); do
+ for R in $(seq 1 "$RUNNER_COUNT"); do

medium

Good job quoting the $RUNNER_COUNT variable. This prevents potential word splitting and globbing issues, making the script more robust. This change aligns with the repository's style guide.

References
  1. Variables should be quoted to prevent word splitting and globbing. This is from line 8 of the repository style guide. (link)
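
Quoting stops the shell from splitting the value, but the value itself is still handed to `seq`. One possible extra safeguard, which is not part of this PR, is to check that the count is a positive integer before looping. In this sketch, `is_positive_int` and `launch_runners` are hypothetical names and the `echo` stands in for the real launch command.

```shell
# Succeeds only for a non-empty, digits-only value greater than zero.
is_positive_int() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$1" -gt 0 ]
}

# launch_runners COUNT
launch_runners() {
  local count="$1"
  if ! is_positive_int "$count"; then
    echo "error: RUNNER_COUNT must be a positive integer, got '$count'" >&2
    return 1
  fi
  local r
  for r in $(seq 1 "$count"); do
    echo "launching runner $r"   # placeholder for the real launch command
  done
}

# Hypothetical use:
#   launch_runners "${RUNNER_COUNT:-1}"
```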

@fok666 fok666 merged commit 82329ff into main Jan 18, 2026
11 checks passed
@fok666 fok666 deleted the fix/improve-pipeline-reliability branch January 18, 2026 19:42