Implement automated CI failure investigator #3861
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,77 @@ | ||
| description = "Automatically investigates and diagnoses CI test failures" | ||
| prompt = """ | ||
| You are a world-class autonomous software diagnostics agent. Your purpose is to analyze failed CI/CD runs, pinpoint the root cause in the codebase, and write a highly actionable diagnostic comment. | ||
|
|
||
| ## Context Available: | ||
| - **Failed Log Excerpt:** Available in `.gemini/failed_logs.txt`. Use `cat` to view it. | ||
| - **Pull Request Ref / Diff:** Use the `pull_request_read` tool or explore files using shell tools. | ||
|
|
||
| ## Systematic Diagnostics Protocol: | ||
|
|
||
| *Optimizing Investigation Efficiency:* Perform cheap, lightweight actions first (reading local log files, searching git history, checking previous issues/diffs) before initiating deep analysis of codebase modules or downloading heavy build artifacts. | ||
|
|
||
| 1. **Read & Parse failed_logs.txt:** | ||
| - Locate the failing test functions, classes, or scripts. | ||
| - Extract the exact error messages and tracebacks. | ||
| - **Group and Compare Failures:** If a run has multiple failures, determine if they are all part of a single cascade (sharing the same root cause) or if different independent root causes are at play. Focus deep-dive analysis on the most recent/representative failure, but explicitly note if multiple distinct failure modes were found. | ||
| - Limit detailed trace extraction to at most 3 representative examples to avoid cluttering the final report. | ||
|
|
||
| 2. **Explore Related Issues / Previous Runs (Cheap Search):** | ||
| - Check if this is a recurring or known flake by searching recent issues, discussions, or git history for similar error messages or failing test names. | ||
| - If a similar error has been encountered before, reference those occurrences or prior investigation outcomes. | ||
|
|
||
| 3. **Locate the Failing Component:** | ||
| - Search the codebase using `search_code` or look up the files where the failing tests or code reside. | ||
|
|
||
| 4. **Analyze Changes & Identify Culprits:** | ||
| - **PR runs:** Compare the failure traces with the recent code additions/deletions in the PR. Identify if the failures are due to syntax, logic, sharding changes, parameter mismatches, environment configuration, or infrastructure issues. | ||
| - *Fallback check:* If no clear link is found between the failure and the PR changes (low confidence), check the git log/blame of the failing component on `main`. Try to identify if a recent upstream PR or commit merged to the base branch (`main`) might be the actual culprit. | ||
| - **Scheduled runs (on main):** If investigating a scheduled failure on the `main` branch, inspect the git log history (e.g., `git log`, `git blame`) of the failing component/test file. Identify recent merges or commits that modified relevant paths, and try to identify the specific 'culprit PR' or commit that likely introduced the failure. | ||
|
|
||
| 5. **Calibrate Tone and Confidence:** | ||
| - State your confidence level: **low**, **moderate**, or **high**. | ||
| - **Codebase vs. Infrastructure Distinction:** Explicitly distinguish whether you believe the failure is a codebase regression (e.g., logical bugs, syntax, API mismatches) or an infrastructure/environment flake (e.g., TPU provisioning failures, GCS timeout errors, CUDA out-of-memory or driver issues). | ||
| - Default to "possible cause" or "hypothesis" language. | ||
| - Upgrade to "likely cause" only when multiple independent pieces of evidence converge (e.g., a suspicious commit + matching error signature + timing correlation). | ||
| - Use "confirmed cause" only when evidence is unambiguous. | ||
| - If inconclusive, say so. Partial findings and ruled-out causes are still valuable. Avoid assertive phrasing like "the root cause is" unless genuinely certain. | ||
|
|
||
| 6. **Formulate the Diagnostics Report:** | ||
| - Write a clean, professional, and precise markdown report matching the template below. Do not be overly wordy; get straight to the facts. | ||
| - **Save the Report**: You MUST write and save this formulated markdown report to `.gemini/findings.md`. | ||
| - **Keep it Concise:** If there are many failing tests due to the same error or infra issue, mention that a cascade occurred, list up to 3 representative examples, and explain the single root cause instead of repeating sections. | ||
|
|
||
| ## Report Template: | ||
|
|
||
| ```markdown | ||
| ### 🤖 CI Failure Investigation Report | ||
|
|
||
| I have analyzed the recent test failures in the CI pipeline and identified the following: | ||
|
|
||
| #### 🔍 What Failed | ||
| *(If there are many failures, group them by root cause and list only up to 3 representative example test cases)* | ||
| * **Job/Matrix**: `Matrix-Flavor-Name` | ||
| * **Failing Test**: `test_filename.py::test_function_name` | ||
| * **Error**: `TypeError: ...` | ||
|
|
||
| #### 🪵 Error Details & Stack Trace | ||
| ```python | ||
| [Short stack trace snippet showing where the error(s) occurred] | ||
| ``` | ||
|
|
||
| #### 💡 Root Cause Analysis & Context | ||
| **Confidence:** [low / moderate / high] *(Calibrate based on whether this is a hypothesis, a likely cause, or a confirmed cause)* | ||
|
|
||
| [Provide a clear explanation connecting the failure(s) to recent changes made in this PR, or to infrastructure issues. If you searched for previous occurrences or similar issues, summarize those findings here.] | ||
|
|
||
| #### 🛠️ Recommended Fix *(Only include this section if Confidence is HIGH)* | ||
| [Provide the recommended code block diff(s) or specific file edit(s) to fix the issue(s).] | ||
| ``` | ||
|
|
||
| 7. **Execute the Report:** | ||
| - **Determine Target Destination:** | ||
| - If the environment variable `PULL_REQUEST_NUMBER` is present and non-empty, post the report as a comment on that PR/issue using the `add_issue_comment` tool. | ||
| - If `PULL_REQUEST_NUMBER` is empty or not a valid number (such as in a scheduled CI run failure), use `gh issue list --state open` with the shell tool to locate the open failure notification issue for the "MaxText Package Tests" workflow. If found, post the report as a comment on that issue using the `gh issue comment <issue-number> --body-file .gemini/findings.md` command. | ||
| - If no target issue is found, verify that the findings are written to `.gemini/findings.md` so they are preserved in the runner's artifacts. | ||
|
|
||
| """ |
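The destination rule in step 7 of the prompt can be read as a small decision function. The sketch below is only an illustration of that rule, not code from this PR: `report_target` is a hypothetical helper name, and treating `0` as "not a valid number" is an assumption.

```shell
# Hypothetical helper: decide where the report goes, following step 7.
# Prints "pr:<number>" when the argument is a positive integer,
# otherwise "issue-search" (the scheduled-run path).
report_target() {
  case "${1:-}" in
    ''|0|*[!0-9]*) printf 'issue-search\n' ;;  # empty, zero, or contains a non-digit
    *)             printf 'pr:%s\n' "$1" ;;    # digits only: treat as a PR number
  esac
}

# PULL_REQUEST_NUMBER is the environment variable named in the prompt.
report_target "${PULL_REQUEST_NUMBER:-}"
```

With `PULL_REQUEST_NUMBER` unset, as in a scheduled run, this takes the `issue-search` branch, which corresponds to the `gh issue list --state open` lookup described in the prompt.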
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,119 @@ | ||||||||||||||||||||
| name: 'Gemini Failure Investigator' | ||||||||||||||||||||
|
|
||||||||||||||||||||
| on: | ||||||||||||||||||||
| workflow_call: | ||||||||||||||||||||
| inputs: | ||||||||||||||||||||
| additional_context: | ||||||||||||||||||||
| type: 'string' | ||||||||||||||||||||
|
🟡 The `workflow_call` trigger defines `additional_context` as an input, but `failed_run_id` is missing from the `workflow_call` inputs definition. This means `github.event.inputs.failed_run_id` will likely be undefined when called from another workflow.
Author reply: Added `failed_run_id` below. |
||||||||||||||||||||
| required: false | ||||||||||||||||||||
| failed_run_id: | ||||||||||||||||||||
| type: 'string' | ||||||||||||||||||||
| required: false | ||||||||||||||||||||
|
|
||||||||||||||||||||
| permissions: | ||||||||||||||||||||
| contents: 'read' | ||||||||||||||||||||
| id-token: 'write' | ||||||||||||||||||||
| issues: 'write' | ||||||||||||||||||||
| pull-requests: 'write' | ||||||||||||||||||||
| actions: 'read' # Required to fetch workflow logs | ||||||||||||||||||||
|
|
||||||||||||||||||||
| jobs: | ||||||||||||||||||||
| investigate: | ||||||||||||||||||||
| runs-on: 'ubuntu-latest' | ||||||||||||||||||||
| steps: | ||||||||||||||||||||
| - name: 'Checkout repository' | ||||||||||||||||||||
| uses: 'actions/checkout@v4' | ||||||||||||||||||||
| with: | ||||||||||||||||||||
| persist-credentials: 'false' | ||||||||||||||||||||
|
|
||||||||||||||||||||
| - name: 'Gather failed logs' | ||||||||||||||||||||
| env: | ||||||||||||||||||||
| GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} | ||||||||||||||||||||
| RUN_ID: ${{ github.event.workflow_run.id || inputs.failed_run_id }} | ||||||||||||||||||||
| REPO: ${{ github.repository }} | ||||||||||||||||||||
| BRANCH: ${{ github.event.pull_request.head.ref }} | ||||||||||||||||||||
| SHA: ${{ github.event.pull_request.head.sha }} | ||||||||||||||||||||
| run: | | ||||||||||||||||||||
| mkdir -p .gemini | ||||||||||||||||||||
|
|
||||||||||||||||||||
| # Determine target run ID | ||||||||||||||||||||
| if [ -z "$RUN_ID" ]; then | ||||||||||||||||||||
| # Fallback to finding the latest failed run for this PR's specific commit | ||||||||||||||||||||
| if [ -n "$SHA" ]; then | ||||||||||||||||||||
| echo "Searching for failed runs for commit: $SHA" | ||||||||||||||||||||
| RUN_ID=$(gh run list --workflow "MaxText Package Tests" --status failure --commit "$SHA" --limit 1 --json databaseId --jq '.[0].databaseId' --repo "$REPO") | ||||||||||||||||||||
| fi | ||||||||||||||||||||
|
|
||||||||||||||||||||
| # Fallback to branch if commit-specific run wasn't found | ||||||||||||||||||||
| if [ -z "$RUN_ID" ] && [ -n "$BRANCH" ]; then | ||||||||||||||||||||
| echo "Searching for failed runs on branch: $BRANCH" | ||||||||||||||||||||
| RUN_ID=$(gh run list --workflow "MaxText Package Tests" --status failure --branch "$BRANCH" --limit 1 --json databaseId --jq '.[0].databaseId' --repo "$REPO") | ||||||||||||||||||||
| fi | ||||||||||||||||||||
|
|
||||||||||||||||||||
| # Global fallback | ||||||||||||||||||||
| if [ -z "$RUN_ID" ]; then | ||||||||||||||||||||
| echo "Searching for latest failed run across the repository" | ||||||||||||||||||||
| RUN_ID=$(gh run list --workflow "MaxText Package Tests" --status failure --limit 1 --json databaseId --jq '.[0].databaseId' --repo "$REPO") | ||||||||||||||||||||
| fi | ||||||||||||||||||||
| fi | ||||||||||||||||||||
|
|
||||||||||||||||||||
| echo "Gathering logs for failed run: $RUN_ID" | ||||||||||||||||||||
|
|
||||||||||||||||||||
| if [ -n "$RUN_ID" ]; then | ||||||||||||||||||||
| # Retrieve only the failing lines/jobs to avoid token limit overhead | ||||||||||||||||||||
| gh run view "$RUN_ID" --log-failed --repo "$REPO" > .gemini/failed_logs.txt || true | ||||||||||||||||||||
| else | ||||||||||||||||||||
| echo "No failed runs found." > .gemini/failed_logs.txt | ||||||||||||||||||||
| fi | ||||||||||||||||||||
|
|
||||||||||||||||||||
| - name: 'Run Gemini Failure Investigator' | ||||||||||||||||||||
| uses: 'google-github-actions/run-gemini-cli@v0' | ||||||||||||||||||||
| env: | ||||||||||||||||||||
| GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} | ||||||||||||||||||||
| REPOSITORY: ${{ github.repository }} | ||||||||||||||||||||
| PULL_REQUEST_NUMBER: ${{ github.event.workflow_run.pull_requests[0].number || github.event.pull_request.number || github.event.issue.number }} | ||||||||||||||||||||
| with: | ||||||||||||||||||||
| gcp_location: '${{ vars.GOOGLE_CLOUD_LOCATION }}' | ||||||||||||||||||||
| gcp_project_id: '${{ vars.GOOGLE_CLOUD_PROJECT }}' | ||||||||||||||||||||
| gcp_service_account: '${{ vars.SERVICE_ACCOUNT_EMAIL }}' | ||||||||||||||||||||
| gcp_workload_identity_provider: '${{ vars.GCP_WIF_PROVIDER }}' | ||||||||||||||||||||
| gemini_api_key: '${{ secrets.GEMINI_API_KEY }}' | ||||||||||||||||||||
| gemini_cli_version: '${{ vars.GEMINI_CLI_VERSION }}' | ||||||||||||||||||||
| gemini_model: '${{ vars.GEMINI_MODEL }}' | ||||||||||||||||||||
| workflow_name: 'gemini-investigate' | ||||||||||||||||||||
| settings: |- | ||||||||||||||||||||
| { | ||||||||||||||||||||
| "model": { | ||||||||||||||||||||
| "maxSessionTurns": 15 | ||||||||||||||||||||
| }, | ||||||||||||||||||||
| "mcpServers": { | ||||||||||||||||||||
| "github": { | ||||||||||||||||||||
| "command": "docker", | ||||||||||||||||||||
| "args": [ | ||||||||||||||||||||
| "run", | ||||||||||||||||||||
| "-i", | ||||||||||||||||||||
| "--rm", | ||||||||||||||||||||
|
🟢 It's good practice to ensure that the investigator has access to the most relevant tools. Since the prompt mentions searching git history and exploring files, ensure `mcpServers` configuration includes all necessary permissions if they are not already covered by the defaults or the explicitly listed tools.
The current list is good, but for "searching git history", you might eventually want tools that can run
Author reply: Added more tools. |
||||||||||||||||||||
| "-e", | ||||||||||||||||||||
| "GITHUB_PERSONAL_ACCESS_TOKEN", | ||||||||||||||||||||
| "ghcr.io/github/github-mcp-server:v0.27.0" | ||||||||||||||||||||
| ], | ||||||||||||||||||||
| "includeTools": [ | ||||||||||||||||||||
| "add_issue_comment", | ||||||||||||||||||||
| "pull_request_read", | ||||||||||||||||||||
| "search_code", | ||||||||||||||||||||
| "get_file_contents", | ||||||||||||||||||||
| "list_commits", | ||||||||||||||||||||
| "get_commit" | ||||||||||||||||||||
| ], | ||||||||||||||||||||
| "env": { | ||||||||||||||||||||
| "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}" | ||||||||||||||||||||
| } | ||||||||||||||||||||
| } | ||||||||||||||||||||
| }, | ||||||||||||||||||||
| "tools": { | ||||||||||||||||||||
| "shell": { | ||||||||||||||||||||
| "allowCommands": ["cat", "grep", "head", "tail", "gh", "git", "find"] | ||||||||||||||||||||
| } | ||||||||||||||||||||
| } | ||||||||||||||||||||
| } | ||||||||||||||||||||
| prompt: '/gemini-investigate' | ||||||||||||||||||||
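The fallback chain in the 'Gather failed logs' step moves on only when `[ -z "$RUN_ID" ]` is true. If a lookup ever printed something non-empty that is not a run ID, the later fallbacks would be skipped and `gh run view` would receive a bad ID. Whether `gh run list ... --jq '.[0].databaseId'` can do that for an empty result has not been verified here, so the guard below is purely defensive. `normalize_run_id` is a hypothetical helper, not part of this PR.

```shell
# Hypothetical helper: print the value only when it looks like a numeric run ID.
# Prints nothing for empty, "null", or any other non-numeric input, so the
# existing [ -z "$RUN_ID" ] checks keep working as fallbacks.
normalize_run_id() {
  case "${1:-}" in
    ''|*[!0-9]*) ;;                 # empty or contains a non-digit: "not found"
    *) printf '%s\n' "$1" ;;        # digits only: pass through unchanged
  esac
}

# Example with a captured lookup result (the value here is made up).
raw="null"
RUN_ID="$(normalize_run_id "$raw")"
if [ -z "$RUN_ID" ]; then
  echo "no usable run ID, continue with the next fallback"
fi
```

Wrapping each `gh run list` result this way would keep the three fallbacks independent of how the CLI renders an empty result.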
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -150,6 +150,7 @@ dmypy.json | ||
|
|
||
| # Gemini CLI | ||
| .gemini/ | ||
| !.gemini/commands/ | ||
| gha-creds-*.json | ||
|
|
||
| # vscode workspace | ||
|
|
||
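One thing worth checking in this hunk: the gitignore documentation notes that a file cannot be re-included when a parent directory of that file is excluded, so `!.gemini/commands/` may have no effect after `.gemini/`. The sketch below tests both forms in a throwaway repository using `git check-ignore`. The helper name `is_ignored`, the file name `example.toml`, and the `.gemini/*` alternative are illustrative assumptions, not part of this PR.

```shell
# Hypothetical helper: returns 0 when git would ignore path $1 under the
# .gitignore rules given as the remaining arguments (one rule per argument).
# Everything happens in a temporary repository that is removed afterwards.
is_ignored() {
  ii_path="$1"; shift
  ii_repo="$(mktemp -d)" || return 2
  (
    cd "$ii_repo" || exit 2
    git init -q . >/dev/null 2>&1 || exit 2
    printf '%s\n' "$@" > .gitignore
    mkdir -p "$(dirname "$ii_path")" && : > "$ii_path"
    git check-ignore -q "$ii_path"   # exit 0 = ignored, 1 = not ignored
  )
  ii_rc=$?
  rm -rf "$ii_repo"
  return "$ii_rc"
}

# Rules as written in the diff: the directory itself is excluded.
if is_ignored ".gemini/commands/example.toml" ".gemini/" "!.gemini/commands/"; then
  echo "as written: ignored"
else
  echo "as written: not ignored"
fi

# Assumed alternative: exclude the directory's contents, then re-include commands/.
if is_ignored ".gemini/commands/example.toml" ".gemini/*" "!.gemini/commands/"; then
  echo "alternative: ignored"
else
  echo "alternative: not ignored"
fi
```

Files that are already tracked are unaffected by either form. The difference only matters for new files added under `.gemini/commands/`.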
It's intentional, to protect against context injection.
I think the comment is questioning the removal of `context` for PR review. Could you help cross-check? Originally, we could write something like "@gemini-cli /review I am worried about the numerics", and the sentence after the command was passed as the extra context.