77 changes: 77 additions & 0 deletions .gemini/commands/gemini-investigate.toml
@@ -0,0 +1,77 @@
description = "Automatically investigates and diagnoses CI test failures"
prompt = """
You are a world-class autonomous software diagnostics agent. Your purpose is to analyze failed CI/CD runs, pinpoint the root cause in the codebase, and write a highly actionable diagnostic comment.

## Context Available:
- **Failed Log Excerpt:** Available in `.gemini/failed_logs.txt`. Use `cat` to view it.
- **Pull Request Ref / Diff:** Use the `pull_request_read` tool or explore files using shell tools.

## Systematic Diagnostics Protocol:

*Optimizing Investigation Efficiency:* Perform cheap, lightweight actions first (reading local log files, searching git history, checking previous issues/diffs) before initiating deep analysis of codebase modules or downloading heavy build artifacts.

1. **Read & Parse failed_logs.txt:**
- Locate the failing test functions, classes, or scripts.
- Extract the exact error messages and tracebacks.
- **Group and Compare Failures:** If a run has multiple failures, determine if they are all part of a single cascade (sharing the same root cause) or if different independent root causes are at play. Focus deep-dive analysis on the most recent/representative failure, but explicitly note if multiple distinct failure modes were found.
- Limit detailed trace extraction to up to 3 representative examples to avoid cluttering the final report.

2. **Explore Related Issues / Previous Runs (Cheap Search):**
- Check if this is a recurring or known flake by searching recent issues, discussions, or git history for similar error messages or failing test names.
- If a similar error has been encountered before, reference those occurrences or prior investigation outcomes.

3. **Locate the Failing Component:**
- Search the codebase using `search_code` or look up the files where the failing tests or code reside.

4. **Analyze Changes & Identify Culprits:**
- **PR runs:** Compare the failure traces with the recent code additions/deletions in the PR. Identify if the failures are due to syntax, logic, sharding changes, parameter mismatches, environment configuration, or infrastructure issues.
- *Fallback check:* If no clear link is found between the failure and the PR changes (low confidence), check the git log/blame of the failing component on `main`. Try to identify if a recent upstream PR or commit merged to the base branch (`main`) might be the actual culprit.
- **Scheduled runs (on main):** If investigating a scheduled failure on the `main` branch, inspect the git log history (e.g., `git log`, `git blame`) of the failing component/test file. Identify recent merges or commits that modified relevant paths, and try to identify the specific 'culprit PR' or commit that likely introduced the failure.

5. **Calibrate Tone and Confidence:**
- State your confidence level: **low**, **moderate**, or **high**.
- **Codebase vs. Infrastructure Distinction:** Explicitly distinguish whether you believe the failure is a codebase regression (e.g., logical bugs, syntax, API mismatches) or an infrastructure/environment flake (e.g., TPU provisioning failures, GCS timeout errors, CUDA out-of-memory or driver issues).
- Default to "possible cause" or "hypothesis" language.
- Upgrade to "likely cause" only when multiple independent pieces of evidence converge (e.g., a suspicious commit + matching error signature + timing correlation).
- Use "confirmed cause" only when evidence is unambiguous.
- If inconclusive, say so. Partial findings and ruling things out is still valuable. Avoid assertive phrasing like "the root cause is" unless genuinely certain.

6. **Formulate the Diagnostics Report:**
- Write a clean, professional, and precise markdown report matching the template below. Do not be overly wordy; get straight to the facts.
- **Save the Report**: You MUST write and save this formulated markdown report to `.gemini/findings.md`.
- **Keep it Concise:** If there are many failing tests due to the same error or infra issue, mention that a cascade occurred, list up to 3 representative examples, and explain the single root cause instead of repeating sections.

## Report Template:

```markdown
### 🤖 CI Failure Investigation Report

I have analyzed the recent test failures in the CI pipeline and identified the following:

#### 🔍 What Failed
*(If there are many failures, group them by root cause and list only up to 3 representative example test cases)*
* **Job/Matrix**: `Matrix-Flavor-Name`
* **Failing Test**: `test_filename.py::test_function_name`
* **Error**: `TypeError: ...`

#### 🪵 Error Details & Stack Trace
```python
[Short stack trace snippet showing where the error(s) occurred]
```

#### 💡 Root Cause Analysis & Context
**Confidence:** [low / moderate / high] *(Calibrate based on whether this is a hypothesis, a likely cause, or a confirmed cause)*

[Provide a clear explanation connecting the failure(s) to recent changes made in this PR, or to infrastructure issues. If you searched for previous occurrences or similar issues, summarize those findings here.]

#### 🛠️ Recommended Fix *(Only include this section if Confidence is HIGH)*
[Provide the recommended code block diff(s) or specific file edit(s) to fix the issue(s).]
```

7. **Execute the Report:**
- **Determine Target Destination:**
- If the environment variable `PULL_REQUEST_NUMBER` is present and non-empty, post the report as a comment on that PR/issue using the `add_issue_comment` tool.
- If `PULL_REQUEST_NUMBER` is empty or not a valid number (such as in a scheduled CI run failure), use `gh issue list --state open` with the shell tool to locate the open failure notification issue for the "MaxText Package Tests" workflow. If found, post the report as a comment on that issue using the `gh issue comment <issue-number> --body-file .gemini/findings.md` command.
- If no target issue is found, verify that the findings are written to `.gemini/findings.md` so it is preserved in the runner's artifacts.

"""
31 changes: 29 additions & 2 deletions .github/workflows/gemini-dispatch.yml
@@ -9,6 +9,10 @@ on:
pull_request_review:
types: ['submitted']

# Trigger when a comment is added to the main conversation of a PR/Issue
issue_comment:
types: ['created']

# Trigger when any label is attached to the PR
pull_request:
types: ['labeled']
@@ -61,6 +65,7 @@ jobs:
command: '${{ steps.extract_command.outputs.command }}'
request: '${{ steps.extract_command.outputs.request }}'
additional_context: '${{ steps.extract_command.outputs.additional_context }}'
failed_run_id: '${{ steps.extract_command.outputs.failed_run_id }}'
issue_number: '${{ github.event.pull_request.number || github.event.issue.number }}'
steps:
- name: 'Mint identity token'
@@ -92,8 +97,13 @@ jobs:
core.setOutput('command', 'review');
} else if (request.startsWith("@gemini-cli /review")) {
core.setOutput('command', 'review');
const additionalContext = request.replace(/^@gemini-cli \/review/, '').trim();
core.setOutput('additional_context', additionalContext);
core.setOutput('additional_context', '');
} else if (request.startsWith("@gemini-cli /investigate")) {

🟠 This change removes the ability to provide additional context to the `/review` command, which appears to be an unintended regression. The previous implementation correctly parsed and passed the context.
Suggested change
} else if (request.startsWith("@gemini-cli /investigate")) {
} else if (request.startsWith("@gemini-cli /review")) {
core.setOutput('command', 'review');
const additionalContext = request.replace(/^@gemini-cli \/review/, '').trim();
core.setOutput('additional_context', additionalContext);
} else if (request.startsWith("@gemini-cli /investigate")) {
core.setOutput('command', 'investigate');
const additionalContext = request.replace(/^@gemini-cli \/investigate/, '').trim();
core.setOutput('additional_context', additionalContext);

Collaborator Author

It's intended, to protect against context injection.

Collaborator

I think the comment is questioning the removal of additional context for PR reviews. Could you help cross-check?

Originally, we could write something like "@gemini-cli /review I am worried about the numerics", and the text after the command was passed as extra context.

core.setOutput('command', 'investigate');
const parts = request.split(/\s+/);
const failedRunId = parts.length > 2 ? parts[2] : '';
core.setOutput('failed_run_id', failedRunId);
core.setOutput('additional_context', '');
} else if (request.startsWith("@gemini-cli")) {
const additionalContext = request.replace(/^@gemini-cli/, '').trim();
core.setOutput('command', 'invoke');
@@ -142,11 +152,28 @@ jobs:
additional_context: '${{ needs.dispatch.outputs.additional_context }}'
secrets: 'inherit'

investigate:
needs: 'dispatch'
if: |-
${{ needs.dispatch.outputs.command == 'investigate' }}
uses: './.github/workflows/gemini-investigate.yml'
permissions:
contents: 'read'
id-token: 'write'
issues: 'write'
pull-requests: 'write'
actions: 'read'
with:
additional_context: '${{ needs.dispatch.outputs.additional_context }}'
failed_run_id: '${{ needs.dispatch.outputs.failed_run_id }}'
secrets: 'inherit'

fallthrough:
needs:
- 'dispatch'
- 'review'
- 'invoke'
- 'investigate'
if: |-
${{ always() && !cancelled() && (failure() || needs.dispatch.outputs.command == 'fallthrough') }}
runs-on: 'ubuntu-latest'
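The command parsing discussed in the thread above can be sketched as a standalone function. This is a hedged illustration rather than the workflow's actual code: `parseCommand` and the `allowReviewContext` option are hypothetical names, the option exists only to show both positions from the review thread side by side, and the digits-only check on the run ID is an added assumption (the diff passes `parts[2]` through unchanged).

```javascript
// Hypothetical parser; the real workflow does this inline in a github-script step.
function parseCommand(request, { allowReviewContext = false } = {}) {
  const text = request.trim();

  if (text.startsWith("@gemini-cli /review")) {
    // The PR drops free-form context; the earlier behaviour forwarded it.
    const rest = text.replace(/^@gemini-cli \/review/, "").trim();
    return {
      command: "review",
      additionalContext: allowReviewContext ? rest : "",
      failedRunId: "",
    };
  }

  if (text.startsWith("@gemini-cli /investigate")) {
    const parts = text.split(/\s+/);
    const candidate = parts.length > 2 ? parts[2] : "";
    // Assumption: run IDs are plain decimal integers, so anything else is dropped.
    const failedRunId = /^\d+$/.test(candidate) ? candidate : "";
    return { command: "investigate", additionalContext: "", failedRunId };
  }

  if (text.startsWith("@gemini-cli")) {
    const rest = text.replace(/^@gemini-cli/, "").trim();
    return { command: "invoke", additionalContext: rest, failedRunId: "" };
  }

  return { command: "fallthrough", additionalContext: "", failedRunId: "" };
}

// Example: only a numeric third token is accepted as a run ID.
console.log(parseCommand("@gemini-cli /investigate 123456").failedRunId);
console.log(parseCommand("@gemini-cli /investigate not-a-number").failedRunId === "");
```

Restricting `/investigate` to a numeric token keeps the untrusted comment text out of the prompt, while the flag makes the `/review` trade-off an explicit choice instead of a side effect of the refactor.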
119 changes: 119 additions & 0 deletions .github/workflows/gemini-investigate.yml
@@ -0,0 +1,119 @@
name: 'Gemini Failure Investigator'

on:
workflow_call:
inputs:
additional_context:
type: 'string'

🟡 The `workflow_call` trigger defines `additional_context` as an input, but `failed_run_id` is missing from the `workflow_call` inputs definition. This means `github.event.inputs.failed_run_id` will likely be undefined when called from another workflow.
Suggested change
type: 'string'
workflow_call:
inputs:
additional_context:
type: 'string'
required: false
failed_run_id:
type: 'string'
required: false

Collaborator Author

Added `failed_run_id` below.

required: false
failed_run_id:
type: 'string'
required: false

permissions:
contents: 'read'
id-token: 'write'
issues: 'write'
pull-requests: 'write'
actions: 'read' # Required to fetch workflow logs

jobs:
investigate:
runs-on: 'ubuntu-latest'
steps:
- name: 'Checkout repository'
uses: 'actions/checkout@v4'
with:
persist-credentials: 'false'

- name: 'Gather failed logs'
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
RUN_ID: ${{ github.event.workflow_run.id || inputs.failed_run_id }}
REPO: ${{ github.repository }}
BRANCH: ${{ github.event.pull_request.head.ref }}
SHA: ${{ github.event.pull_request.head.sha }}
run: |
mkdir -p .gemini

# Determine target run ID
if [ -z "$RUN_ID" ]; then
# Fallback to finding the latest failed run for this PR's specific commit
if [ -n "$SHA" ]; then
echo "Searching for failed runs for commit: $SHA"
RUN_ID=$(gh run list --workflow "MaxText Package Tests" --status failure --commit "$SHA" --limit 1 --json databaseId --jq '.[0].databaseId' --repo "$REPO")
fi

# Fallback to branch if commit-specific run wasn't found
if [ -z "$RUN_ID" ] && [ -n "$BRANCH" ]; then
echo "Searching for failed runs on branch: $BRANCH"
RUN_ID=$(gh run list --workflow "MaxText Package Tests" --status failure --branch "$BRANCH" --limit 1 --json databaseId --jq '.[0].databaseId' --repo "$REPO")
fi

# Global fallback
if [ -z "$RUN_ID" ]; then
echo "Searching for latest failed run across the repository"
RUN_ID=$(gh run list --workflow "MaxText Package Tests" --status failure --limit 1 --json databaseId --jq '.[0].databaseId' --repo "$REPO")
fi
fi

echo "Gathering logs for failed run: $RUN_ID"

if [ -n "$RUN_ID" ]; then
# Retrieve only the failing lines/jobs to avoid token limit overhead
gh run view "$RUN_ID" --log-failed --repo "$REPO" > .gemini/failed_logs.txt || true
else
echo "No failed runs found." > .gemini/failed_logs.txt
fi

- name: 'Run Gemini Failure Investigator'
uses: 'google-github-actions/run-gemini-cli@v0'
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
REPOSITORY: ${{ github.repository }}
PULL_REQUEST_NUMBER: ${{ github.event.workflow_run.pull_requests[0].number || github.event.pull_request.number || github.event.issue.number }}
with:
gcp_location: '${{ vars.GOOGLE_CLOUD_LOCATION }}'
gcp_project_id: '${{ vars.GOOGLE_CLOUD_PROJECT }}'
gcp_service_account: '${{ vars.SERVICE_ACCOUNT_EMAIL }}'
gcp_workload_identity_provider: '${{ vars.GCP_WIF_PROVIDER }}'
gemini_api_key: '${{ secrets.GEMINI_API_KEY }}'
gemini_cli_version: '${{ vars.GEMINI_CLI_VERSION }}'
gemini_model: '${{ vars.GEMINI_MODEL }}'
workflow_name: 'gemini-investigate'
settings: |-
{
"model": {
"maxSessionTurns": 15
},
"mcpServers": {
"github": {
"command": "docker",
"args": [
"run",
"-i",
"--rm",

🟢 It's good practice to ensure that the investigator has access to the most relevant tools. Since the prompt mentions searching git history and exploring files, ensure `mcpServers` configuration includes all necessary permissions if they are not already covered by the defaults or the explicitly listed tools.

The current list is good, but for "searching git history", you might eventually want tools that can run git log or git blame if the shell tool is too restricted (though here you've allowed cat, grep, etc., which is a good start).

Collaborator Author

Added more tools.

"-e",
"GITHUB_PERSONAL_ACCESS_TOKEN",
"ghcr.io/github/github-mcp-server:v0.27.0"
],
"includeTools": [
"add_issue_comment",
"pull_request_read",
"search_code",
"get_file_contents",
"list_commits",
"get_commit"
],
"env": {
"GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}"
}
}
},
"tools": {
"shell": {
"allowCommands": ["cat", "grep", "head", "tail", "gh", "git", "find"]
}
}
}
prompt: '/gemini-investigate'
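The fallback order in the 'Gather failed logs' step (explicit run ID, then commit, then branch, then the whole repository) can be summarized as a short sketch. It is an illustration only: `resolveRunId` is a hypothetical name, and `findFailedRun` stands in for the `gh run list ... --status failure --limit 1` calls in the workflow. It is assumed to return a run ID string, or an empty string when nothing matches.

```javascript
// Hypothetical sketch of the run ID fallback chain; not part of the PR.
// findFailedRun(filter) is an injected lookup so the order can be checked
// without calling the GitHub CLI.
function resolveRunId({ explicitId, sha, branch }, findFailedRun) {
  // An ID from the triggering event or the /investigate command wins outright.
  if (explicitId) {
    return { runId: explicitId, source: "explicit" };
  }

  // Narrowest search first; searches with no usable input are skipped.
  const attempts = [
    sha ? { source: "commit", filter: { commit: sha } } : null,
    branch ? { source: "branch", filter: { branch } } : null,
    { source: "repository", filter: {} },
  ].filter(Boolean);

  for (const { source, filter } of attempts) {
    const runId = findFailedRun(filter);
    if (runId) {
      return { runId, source };
    }
  }

  // Corresponds to writing "No failed runs found." in the workflow.
  return { runId: "", source: "none" };
}

// Example with a stubbed lookup that only knows about branch failures.
const stubLookup = (filter) => (filter.branch ? "555" : "");
const resolved = resolveRunId({ explicitId: "", sha: "abc123", branch: "feature" }, stubLookup);
console.log(resolved.runId, resolved.source);
```

Recording which level produced the ID is a design choice made for this sketch; it would let the report say whether the logs belong to the exact commit or only to the most recent failure in the repository.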
1 change: 1 addition & 0 deletions .gitignore
@@ -150,6 +150,7 @@ dmypy.json

# Gemini CLI
.gemini/
!.gemini/commands/
gha-creds-*.json

# vscode workspace