github: isolate trivy scan from credentials and SARIF upload#11596
github: isolate trivy scan from credentials and SARIF upload#11596
Conversation
Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
📝 WalkthroughWalkthroughThe Trivy scanning workflow was restructured to isolate permissions and responsibilities. A new Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip CodeRabbit can generate a title for your PR based on the changes with custom instructions.Set the |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e062fef36f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
| # Run Trivy with no registry credentials and no GitHub write permissions. | ||
| trivy-latest: | ||
| needs: prepare-images |
There was a problem hiding this comment.
Avoid gating all scans on the full fetch matrix
Because trivy-latest now needs: prepare-images, GitHub will wait for the entire fetch matrix to succeed before starting any scan leg. Since prepare-images no longer has continue-on-error: true, a single failure while pulling/exporting one architecture (for example, a missing linux/arm/v7 image or a transient registry/artifact error) causes all three scan jobs and all SARIF uploads to be skipped. In the previous workflow only the failing architecture was lost, so this change turns a partial outage into a complete loss of security-scan coverage.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/cron-trivy.yaml:
- Around line 67-72: Remove the job-level "continue-on-error: true" from the
trivy-latest matrix job (the job named "trivy-latest") so scan/artifact/SARIF
upload failures are not masked; keep advisory behavior by configuring Trivy's
own exit-code settings in the Trivy step instead, and rely on existing
orchestration controls (fail-fast: false on the matrix and downstream steps
using if: ${{ always() }}) to prevent the whole workflow from failing while
still surfacing real job-level errors.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f138ffca-0e93-4c2a-b0cc-16619e8f6399
📒 Files selected for processing (1)
.github/workflows/cron-trivy.yaml
| trivy-latest: | ||
| needs: prepare-images | ||
| name: ${{ matrix.arch }} container scan | ||
| runs-on: [ ubuntu-latest ] | ||
| continue-on-error: true | ||
| permissions: {} |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In GitHub Actions, what does job-level continue-on-error: truedo for failed steps and matrix jobs, and what is the defaultexit-codebehavior ofaquasecurity/trivy-actionwhenexit-code is omitted?
💡 Result:
In GitHub Actions, setting continue-on-error: true at the job level allows subsequent jobs that depend on it (via needs) to run even if the job encounters a failure in one of its steps. However, within the job itself, execution stops at the first failed step—subsequent steps in the same job do not run. Failed steps are still marked as failed (with steps..outcome and steps..conclusion set to 'failure'), but the job's overall result is treated as 'success' for dependency purposes (e.g., needs..result is 'success'). The workflow status remains failed, and the job shows a red X in the UI indicating internal failure. For matrix jobs (using strategy.matrix), continue-on-error: true at the job level applies to the entire matrix strategy. If any matrix leg fails, the job continues (subsequent jobs run), but unlike non-matrix jobs with strategy.fail-fast: false, the workflow may still succeed overall unless configured otherwise. Individual failed matrix legs are marked failed, but the job result is 'success' for dependencies. When exit-code is omitted in aquasecurity/trivy-action, the default is 0. Trivy exits with code 0 by default even if vulnerabilities are found (unless overridden via exit-code: '1' or config). This means the step succeeds regardless of findings, allowing workflows to continue and upload results (e.g., SARIF for code scanning).
Citations:
- 1: https://www.kenmuse.com/blog/how-to-handle-step-and-job-errors-in-github-actions/
- 2: https://stackoverflow.com/questions/70830094/marking-github-actions-workflow-as-failed-if-a-single-job-fails-in-matrix/70841204
- 3: Wrong behaviour when combining 'continue-on-error' and 'failure()' in subsequent steps actions/toolkit#1034
- 4: https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions
- 5: https://docs.github.com/en/actions/using-jobs/using-jobs-in-a-workflow
- 6: https://github.com/aquasecurity/trivy-action/blob/master/README.md
- 7: https://github.com/aquasecurity/trivy-action
- 8: https://trivy.dev/docs/latest/configuration/others/
- 9: How to get the exit code of the trivy-action? aquasecurity/trivy-action#330
- 10: https://github.com/aquasecurity/trivy-action/blob/master/action.yaml
🏁 Script executed:
cat -n .github/workflows/cron-trivy.yaml | sed -n '60,130p'Repository: fluent/fluent-bit
Length of output: 3232
Don't mask scan and SARIF upload failures with job-level continue-on-error.
If you want findings to stay advisory, keep that policy in Trivy's exit-code setting, not on the whole job. At the job level, this mainly swallows real breakage—artifact download/load failures, DB fetch issues, or SARIF upload failures—so coverage can be lost without the workflow failing. The orchestration you need (fan-out/fan-in across matrix jobs) is already handled by fail-fast: false and if: ${{ always() }}.
Suggested changes
trivy-latest:
needs: prepare-images
name: ${{ matrix.arch }} container scan
runs-on: [ ubuntu-latest ]
- continue-on-error: true
permissions: {}
strategy:
fail-fast: false upload-trivy-results:
needs: trivy-latest
name: ${{ matrix.arch }} SARIF upload
runs-on: [ ubuntu-latest ]
if: ${{ always() }}
- continue-on-error: true
permissions:
contents: read
security-events: write📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| trivy-latest: | |
| needs: prepare-images | |
| name: ${{ matrix.arch }} container scan | |
| runs-on: [ ubuntu-latest ] | |
| continue-on-error: true | |
| permissions: {} | |
| trivy-latest: | |
| needs: prepare-images | |
| name: ${{ matrix.arch }} container scan | |
| runs-on: [ ubuntu-latest ] | |
| permissions: {} |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/cron-trivy.yaml around lines 67 - 72, Remove the job-level
"continue-on-error: true" from the trivy-latest matrix job (the job named
"trivy-latest") so scan/artifact/SARIF upload failures are not masked; keep
advisory behavior by configuring Trivy's own exit-code settings in the Trivy
step instead, and rely on existing orchestration controls (fail-fast: false on
the matrix and downstream steps using if: ${{ always() }}) to prevent the whole
workflow from failing while still surfacing real job-level errors.
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit