
Add commit analyst for commit-driven pipeline orchestration#60

Open
mcljot wants to merge 8 commits into main from commit-analyst

Conversation


@mcljot mcljot commented Mar 25, 2026

The story of a code change becoming documentation:

  1. A developer pushes code to a repo (e.g., adds a new --timeout flag to a CLI tool)
  2. The gate checker asks: "Are there any new commits since the last time I looked?" It reads a tiny bookmark file that says "last time I checked, I was at commit abc123." If there
    are new commits after that, it says "yes, ready!" and hands over the list of commit SHAs.
  3. The commit analyst reads the code diffs and answers one question: "Did anything change that users would need to know about?" It grades the answer: High, Medium, Low, or None.
    If None → stop here, update the bookmark, done.
  4. The requirements analyst reads the commit analyst's report and figures out: "What docs need to be created or updated?" It produces a checklist like "add deprecation notice to
    the manifests page" and "remove deleted targets from the reference table."
  5. The planner maps those requirements to actual files — which pages to edit, what content to add, what format to use.
  6. The writer creates or edits the documentation files based on the plan.
  7. Two reviewers check the work — one for technical accuracy ("is this correct?"), one for style ("does this follow the style guide?").
  8. Optionally, a PR is created with the doc changes.
  9. The bookmark is updated to the latest commit SHA, so next time the gate checker only looks at newer commits.

The whole thing is like a factory assembly line: each step reads the output of the previous step and produces input for the next one. If a step fails, the progress is saved and
you can resume from where it left off.
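The gate-and-short-circuit control flow in steps 2–3 can be sketched in a few lines of shell. This is an illustration only: the variable names and SHAs are hypothetical, and the real pipeline reads the bookmark from a marker file and gets the grade from the commit-analyst agent.

```shell
# Sketch of steps 2-3: compare the bookmark to the branch head, then let the
# analyst's impact grade decide whether the rest of the pipeline runs.
LAST_SHA="abc123"   # last processed commit, as read from the bookmark file
HEAD_SHA="def456"   # current tip of the watched branch

if [ "$HEAD_SHA" = "$LAST_SHA" ]; then
  ACTION="skip: no new commits"
else
  IMPACT="None"     # grade produced by the commit analyst: High/Medium/Low/None
  if [ "$IMPACT" = "None" ]; then
    ACTION="short-circuit: update bookmark to $HEAD_SHA"
  else
    ACTION="run full pipeline"
  fi
fi
echo "$ACTION"
```

Because the bookmark is only advanced at the end, a failed run leaves it untouched and the same commits are picked up again on the next pass.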

Summary by CodeRabbit

  • New Features

    • Commit-driven documentation workflow (alternative to JIRA) with commit-analysis, ready-batch gating, short-circuit on no-impact, and optional PR/MR creation.
    • Commit-inspection CLI to list commits, files, and diffs, plus a CI gate to detect ready batches.
    • Automated PR/MR creation step that opens PRs/MRs and returns metadata.
  • Documentation

    • Extensive docs, examples, and agent/skill specs for commit-driven setup, CI integration, and marker-file tracking.
  • Chores

    • Plugin version bumped to 0.0.16.

mcljot and others added 2 commits March 25, 2026 11:21
Add the ability to trigger docs-orchestrator workflows from code
repository commits instead of JIRA tickets. This enables automated
documentation generation when code changes land on main.

New components:
- git_commit_reader.py: commit-level API script (list, diff, range-diff, files)
- docs-workflow-commits-ready: gate skill to poll for new commits
- commit-analyst agent: extracts change signals and grades doc impact
- docs-workflow-commit-analysis: step skill wrapping the commit-analyst
- docs-workflow-create-pr: optional step to create PR/MR in docs repo
- docs-commit-workflow.yaml: 8-step workflow definition

Modified components:
- docs-orchestrator: --commits and --create-pr flags, short-circuit on
  None impact, commit marker updates, source_type in progress file
- requirements-analyst: accepts commit analysis as input source
- docs-workflow-requirements: --commit-analysis argument and prompt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Commits where all changed files are excluded by git_filters.yaml
(tests, CI, lock files, etc.) cannot have doc impact. The new
--drop-empty flag on git_commit_reader.py list removes these commits
before the LLM sees them, saving analysis time and cost.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
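Conceptually, the `--drop-empty` filter behaves like the `jq` expression below. The JSON shape here is a simplified illustration, though the `relevant_files_changed` field does appear in the reader's output handling.

```shell
# Two commits as the reader might list them: the second touches only
# filtered-out files (tests, CI, lock files), so --drop-empty would remove
# it before the LLM ever sees it.
cat > /tmp/commits.json <<'EOF'
[
  {"sha": "abc123", "files_changed": 4, "relevant_files_changed": 3},
  {"sha": "def456", "files_changed": 2, "relevant_files_changed": 0}
]
EOF
# Keep only commits with at least one relevant (non-filtered) file
jq -c '[.[] | select(.relevant_files_changed > 0) | .sha]' /tmp/commits.json
```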

coderabbitai bot commented Mar 25, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds a commit-driven documentation workflow to docs-tools: new commit-analysis agent/spec, commit-ready gate script, git commit reader CLI, commit-oriented orchestrator steps and workflow YAML, create-PR automation, README additions, and plugin version bumps to 0.0.16.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Plugin Version**<br>`.claude-plugin/marketplace.json`, `plugins/docs-tools/.claude-plugin/plugin.json` | Bumped docs-tools version from 0.0.15 to 0.0.16. |
| **README & Docs**<br>`plugins/docs-tools/README.md` | Added "Commit-driven workflow" docs, CI examples, CLI examples, and commit-marker tracking. |
| **Agent Specs**<br>`plugins/docs-tools/agents/commit-analyst.md`, `plugins/docs-tools/agents/requirements-analyst.md` | Added commit-analyst spec (analysis rubric, output schema) and updated requirements-analyst to accept commit-analysis input. |
| **Orchestrator Skill**<br>`plugins/docs-tools/skills/docs-orchestrator/SKILL.md` | Extended orchestrator to support JIRA- and commit-driven modes, new flags (`--commits`, `--create-pr`), updated progress schema, DOC_IMPACT short-circuit, and commit-marker updates. |
| **Commit Workflow YAML**<br>`plugins/docs-tools/skills/docs-orchestrator/defaults/docs-commit-workflow.yaml` | New docs-commit-workflow with ordered steps and conditional create-pr step. |
| **Commit-Analysis Skill**<br>`plugins/docs-tools/skills/docs-workflow-commit-analysis/SKILL.md` | New skill that dispatches docs-tools:commit-analyst and writes `<base-path>/commit-analysis/analysis.md`. |
| **Commits-Ready Gate**<br>`plugins/docs-tools/skills/docs-workflow-commits-ready/SKILL.md`, `plugins/docs-tools/skills/docs-workflow-commits-ready/scripts/commits-ready-check.sh` | New CI-facing gate skill and script: reads per-repo marker, lists commits via git_commit_reader.py, checks prior progress, emits JSON readiness payload. |
| **Create-PR Skill & Script**<br>`plugins/docs-tools/skills/docs-workflow-create-pr/SKILL.md`, `plugins/docs-tools/skills/docs-workflow-create-pr/scripts/create_pr.py` | New step and script to create a GitHub PR / GitLab MR from workflow outputs; detects platform/default branch and writes create-pr/pr-info.md. |
| **Requirements Skill Update**<br>`plugins/docs-tools/skills/docs-workflow-requirements/SKILL.md` | Accepts a generalized identifier argument and optional `--commit-analysis <path>`; agent prompts and inputs are mode-dependent. |
| **Commit Reader CLI**<br>`plugins/docs-tools/skills/git-pr-reader/scripts/git_commit_reader.py` | New Python CLI supporting list, diff, range-diff, files, range-files; file filtering, GitHub/GitLab readers, JSON output and error semantics. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant CI as CI/CD Pipeline
    participant Gate as commits-ready Gate
    participant Reader as git_commit_reader
    participant Orch as docs-orchestrator
    participant CommitA as commit-analyst Agent
    participant Steps as Workflow Steps
    participant PRAPI as GitHub/GitLab API
    participant Marker as Repo Marker File

    CI->>Gate: call commits-ready-check.sh (--repo, --branch, --base-path)
    Gate->>Marker: read last_processed_sha
    Gate->>Reader: list commits (--drop-empty ...)
    Reader-->>Gate: commit list / filtered_stats
    Gate->>Marker: check for existing workflow progress
    Gate-->>CI: return JSON {ready: true/false, batch}

    alt commits ready
        CI->>Orch: invoke docs-orchestrator (--commits <repo> <shas> [, --create-pr])
        Orch->>CommitA: dispatch commit-analyst (repo + SHAs)
        CommitA-->>Orch: write analysis.md (includes DOC_IMPACT)

        alt DOC_IMPACT: None
            Orch->>Marker: update marker (last_processed_sha, last_processed_at)
            Orch-->>CI: complete (short-circuit)
        else impact != None
            Orch->>Steps: run requirements → planning → prepare-branch → writing → reviews
            Steps-->>Orch: produce artifacts
            Orch->>PRAPI: call create_pr.py to open PR/MR
            PRAPI-->>Orch: return PR/MR URL
            Orch->>Marker: update marker & commit state
            Orch-->>CI: complete with PR/MR URL
        end
    else no new commits
        Gate-->>CI: return ready: false
    end
```
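The gate's JSON payload can be consumed with the same `jq` pattern the README's CI examples use. The payload values below are hypothetical, but the `.ready`, `.batch.identifier`, and `.batch.commits` fields match what those examples read:

```shell
# Hypothetical output of commits-ready-check.sh for a ready batch
RESULT='{"ready": true, "batch": {"identifier": "org-code-repo-main", "commits": ["abc123", "def456"]}}'

READY=$(echo "$RESULT" | jq -r '.ready')                       # gate decision
IDENTIFIER=$(echo "$RESULT" | jq -r '.batch.identifier')       # workflow identifier
COMMITS=$(echo "$RESULT" | jq -r '.batch.commits | join(",")') # comma-joined SHA list
echo "$READY $IDENTIFIER $COMMITS"
```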

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes



Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Git Safety Rules | ❌ Error | Pull request violates Git Safety Rules with hardcoded remote names, missing branch protection validation, and git push execution without user permission. | Replace hardcoded remotes with dynamic discovery via `git remote -v`, add branch protection validation rejecting `main`/`master`, require explicit user permission before pushing, and update SKILL.md documentation. |
✅ Passed checks (7 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The PR title accurately describes the primary change: introducing a commit analyst for a commit-driven documentation pipeline orchestration workflow. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 91.67%, which is sufficient. The required threshold is 80.00%. |
| No Real People Names In Style References | ✅ Passed | No real people's names are used as style references, example prompts, or manner descriptors in modified files; only generic role descriptions and established style guides are referenced. |
| No Untrusted MCP Servers | ✅ Passed | Pull request contains no MCP server installations, suspicious package-manager calls, or untrusted dependencies; uses only standard libraries and official packages. |
| Skill And Script Conventions | ✅ Passed | Pull request fully adheres to skill and script conventions, with all skill references properly qualified using the docs-tools plugin prefix and correct script invocation patterns. |
| Plugin Registry Consistency | ✅ Passed | Marketplace sync verified: version 0.0.16 matches in both marketplace.json and plugin.json files with consistent metadata. |

Comment @coderabbitai help to get the list of available commands and usage tips.


github-actions bot commented Mar 25, 2026


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

🧹 Nitpick comments (2)
plugins/docs-tools/skills/docs-orchestrator/defaults/docs-commit-workflow.yaml (1)

44-48: Consider adding technical-review as an input dependency for create-pr.

The create-pr step currently depends on style-review and prepare-branch, but not technical-review. This means a PR could be created before technical accuracy is confirmed. If the workflow intends to ensure both reviews pass before PR creation, consider:

```diff
     - name: create-pr
       skill: docs-tools:docs-workflow-create-pr
       description: Create PR/MR with documentation changes
       when: create_pr
-      inputs: [style-review, prepare-branch]
+      inputs: [technical-review, style-review, prepare-branch]
```

If the current behavior is intentional (e.g., style review implicitly follows technical review in practice), this can be ignored.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/docs-tools/skills/docs-orchestrator/defaults/docs-commit-workflow.yaml`
around lines 44 - 48, The create-pr step (name: create-pr, skill:
docs-tools:docs-workflow-create-pr) currently lists inputs [style-review,
prepare-branch]; update its inputs to include technical-review so the step waits
for the technical review to complete before creating the PR—i.e., modify the
inputs array on the create-pr step to include "technical-review" alongside
"style-review" and "prepare-branch".
plugins/docs-tools/skills/docs-workflow-create-pr/scripts/create_pr.py (1)

31-36: Cross-skill import is fragile but functional.

The relative path resolution for importing load_env_file from git-pr-reader works but is brittle—it will break if directory structure changes. Consider documenting this dependency or extracting load_env_file to a shared utilities module.

```python
sys.path.insert(
    0,
    str(pathlib.Path(__file__).resolve().parent.parent.parent / "git-pr-reader" / "scripts"),
)
from git_pr_reader import load_env_file
```

This is acceptable for now given the internal plugin structure.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/docs-tools/skills/docs-workflow-create-pr/scripts/create_pr.py`
around lines 31 - 36, The cross-skill import using sys.path.insert and a
relative path built from pathlib.Path(__file__).resolve().parent.parent.parent /
"git-pr-reader" / "scripts" to import load_env_file is brittle; either extract
load_env_file into a shared utilities package used by both plugins or add a
clear comment and documentation near the sys.path.insert explaining the
dependency and why the path is chosen (include the exact relative path and that
it must remain in sync), and ensure the import line (from git_pr_reader import
load_env_file) remains after the documented sys.path insertion so future
refactors know to update or replace it with the shared module.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugins/docs-tools/agents/commit-analyst.md`:
- Around line 102-105: The short-circuit marker is inconsistent: Step 5
specifies `<!-- DOC_IMPACT: None -->` but the output template still shows `<!--
DOC_IMPACT: ... | NONE -->`; update the documentation in commit-analyst.md so
both the short-circuit step and the output template use the exact token `<!--
DOC_IMPACT: None -->` (replace any occurrences of `<!-- DOC_IMPACT: ... | NONE
-->` or `NONE` variants with the exact `None` token) to ensure the orchestrator
matches deterministically.

In `@plugins/docs-tools/agents/requirements-analyst.md`:
- Line 124: Update the cross-reference text so it uses fully qualified skill
names: replace mentions of jira-reader and git-pr-reader with
docs-tools:jira-reader and docs-tools:git-pr-reader respectively (so the line
reads that additional context should be fetched using docs-tools:jira-reader and
docs-tools:git-pr-reader). Ensure this rule is applied exactly as typed to match
the repo-wide skill naming convention referenced in the guidelines.

In `@plugins/docs-tools/README.md`:
- Around line 321-380: The CI examples don’t persist
`.claude/docs/.commit-markers/<repo-slug>.json`, causing repeated discovery of
the same commits; update the pipeline steps that run the claude commands (the
shell block that sets RESULT/READY/IDENTIFIER/COMMITS and calls the
docs-orchestrator) to restore the marker directory before checkout processing
and save it after processing (use your CI’s cache/artifact/volume actions), or
alternatively add and document a persistent base path flag to the orchestrator
invocation (e.g., pass a --base-path or env var pointing to a mounted persistent
directory instead of the repo working tree) and ensure the same path is restored
between scheduled runs so last_processed_sha is preserved.

In `@plugins/docs-tools/skills/docs-workflow-commit-analysis/SKILL.md`:
- Around line 39-60: The dispatch step currently relies on the subagent
docs-tools:commit-analyst to write <OUTPUT_FILE>, which is non-deterministic;
change the step so the parent skill captures the dispatched agent's analysis
response and writes it itself to a deterministic file name (analysis.md).
Specifically, after calling the agent with subagent_type
docs-tools:commit-analyst and the given prompt, read the agent's returned
content (response body), validate it, and perform the file write to analysis.md
(instead of assuming the subagent will create <OUTPUT_FILE>); ensure the
verification step checks the locally written analysis.md. Use the existing
dispatch/verify flow and reference the dispatch invocation and <OUTPUT_FILE>
placeholders when updating SKILL.md instructions.

In
`@plugins/docs-tools/skills/docs-workflow-commits-ready/scripts/commits-ready-check.sh`:
- Around line 30-33: The script currently sets COMMIT_READER by resolving a
hard-coded relative path via SCRIPT_DIR and "../../git-pr-reader/..."; update it
to use the repo-wide cross-skill root variable instead: replace the
COMMIT_READER assignment to reference
"${CLAUDE_PLUGIN_ROOT}/skills/git-pr-reader/scripts/git_commit_reader.py"
(preserve quoting) so the script uses ${CLAUDE_PLUGIN_ROOT} for cross-skill
lookup rather than ../../ traversal; keep SCRIPT_DIR if used elsewhere but
ensure COMMIT_READER points to the CLAUDE_PLUGIN_ROOT-based path and still
points to the correct git_commit_reader.py filename.
- Around line 81-84: The script currently swallows jq failures when reading the
marker file (the SINCE_SHA assignment block using jq on MARKER_FILE), causing
malformed markers to be ignored and triggering a full replay; change this to
detect and fail on malformed marker instead: run jq without the "|| true"
fallback, capture jq's exit code or output, and if jq fails or returns
empty/invalid JSON, log an error (using your existing logging or stderr) and
exit non-zero to stop the gate; specifically modify the SINCE_SHA assignment
block that calls jq -r '.last_processed_sha // empty' "$MARKER_FILE" so it
validates jq succeeded and the value is non-empty before proceeding.

In `@plugins/docs-tools/skills/docs-workflow-create-pr/scripts/create_pr.py`:
- Around line 94-112: The fallback regex in read_branch_info currently uses a
character class that only lists lowercase letters which can fail for uppercase
branch names; update the fallback pattern (the second re.search assigned to
branch_match) to use an explicit case-inclusive class like [A-Za-z0-9_/-]+ (or a
more permissive pattern such as \S+ if you prefer) so uppercase characters are
accepted, keep or remove re.IGNORECASE as desired, and ensure the
branch_match.group(1) extraction still returns the branch name from branch_file
content.

In `@plugins/docs-tools/skills/git-pr-reader/scripts/git_commit_reader.py`:
- Around line 406-415: If reader.get_commit_files(c["sha"]) raises, don't
fabricate counts and keep the commit actionable; instead, in the except block
set c["files_changed"]=0 and c["relevant_files_changed"]=0, do not increment
total_files or relevant_files, and add a marker like c["files_fetch_error"]=True
(and log the exception) so downstream logic (e.g., --drop-empty handling) can
drop or skip the commit; update the except branch that currently surrounds
get_commit_files/should_include_file to implement this behavior rather than
using c.get("files_changed", 0).
- Around line 78-93: The platform detection is too permissive for GitHub and too
strict for self‑hosted GitLab: change the conditional that currently uses if
"github.com" in host to an exact check (host == "github.com") so unrelated hosts
like notgithub.com don't match, keep an explicit host == "gitlab.com" branch,
and add a fallback branch that treats any other non‑GitHub host as a self‑hosted
GitLab (reusing the existing GitLab logic that builds owner, repo, owner_repo
from path_parts); update the conditions around host, owner, repo, and owner_repo
to reflect this.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: cab2e3e5-c379-4a9c-b881-4c2de96bbd6b

📥 Commits

Reviewing files that changed from the base of the PR and between 1b4e143 and 5ae0035.

📒 Files selected for processing (14)
  • .claude-plugin/marketplace.json
  • plugins/docs-tools/.claude-plugin/plugin.json
  • plugins/docs-tools/README.md
  • plugins/docs-tools/agents/commit-analyst.md
  • plugins/docs-tools/agents/requirements-analyst.md
  • plugins/docs-tools/skills/docs-orchestrator/SKILL.md
  • plugins/docs-tools/skills/docs-orchestrator/defaults/docs-commit-workflow.yaml
  • plugins/docs-tools/skills/docs-workflow-commit-analysis/SKILL.md
  • plugins/docs-tools/skills/docs-workflow-commits-ready/SKILL.md
  • plugins/docs-tools/skills/docs-workflow-commits-ready/scripts/commits-ready-check.sh
  • plugins/docs-tools/skills/docs-workflow-create-pr/SKILL.md
  • plugins/docs-tools/skills/docs-workflow-create-pr/scripts/create_pr.py
  • plugins/docs-tools/skills/docs-workflow-requirements/SKILL.md
  • plugins/docs-tools/skills/git-pr-reader/scripts/git_commit_reader.py

Comment thread plugins/docs-tools/agents/commit-analyst.md
Comment thread plugins/docs-tools/agents/requirements-analyst.md
Comment on lines +321 to +380
```yaml
docs-commit-check:
  stage: docs
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"
  script:
    - |
      RESULT=$(claude -p "Skill: docs-tools:docs-workflow-commits-ready, \
        args: \"--repo https://github.com/org/code-repo --branch main\"")
      READY=$(echo "$RESULT" | jq -r '.ready')
      if [ "$READY" != "true" ]; then
        echo "No doc-impacting commits found."
        exit 0
      fi
      IDENTIFIER=$(echo "$RESULT" | jq -r '.batch.identifier')
      COMMITS=$(echo "$RESULT" | jq -r '.batch.commits | join(",")')
      claude -p "Skill: docs-tools:docs-orchestrator, \
        args: \"$IDENTIFIER --commits https://github.com/org/code-repo $COMMITS --create-pr\""
```

#### GitHub Actions example

```yaml
name: Docs from Commits
on:
  schedule:
    - cron: '0 */6 * * *'  # Every 6 hours

jobs:
  docs-from-commits:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install dependencies
        run: |
          npm install -g @anthropic-ai/claude-code
          python3 -m pip install PyGithub python-gitlab jira pyyaml ratelimit requests beautifulsoup4 html2text

      - name: Check and process commits
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          RESULT=$(claude -p "Skill: docs-tools:docs-workflow-commits-ready, \
            args: \"--repo https://github.com/org/code-repo --branch main\"")
          READY=$(echo "$RESULT" | jq -r '.ready')
          if [ "$READY" != "true" ]; then
            echo "No doc-impacting commits found."
            exit 0
          fi
          IDENTIFIER=$(echo "$RESULT" | jq -r '.batch.identifier')
          COMMITS=$(echo "$RESULT" | jq -r '.batch.commits | join(",")')
          claude -p "Skill: docs-tools:docs-orchestrator, \
            args: \"$IDENTIFIER --commits https://github.com/org/code-repo $COMMITS --create-pr\""
```

### Marker files

The commit-driven workflow tracks which commits have been processed using marker files at `.claude/docs/.commit-markers/<repo-slug>.json`. The orchestrator updates the marker on workflow completion, so failed workflows don't skip commits on retry.
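A marker file of this kind would plausibly look like the snippet below. The two field names come from the workflow's sequence diagram (`last_processed_sha`, `last_processed_at`); the values are illustrative only.

```json
{
  "last_processed_sha": "5ae0035",
  "last_processed_at": "2026-03-25T11:21:00Z"
}
```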

⚠️ Potential issue | 🟠 Major

The scheduled CI examples need persistent marker storage.

Both examples start from a fresh checkout but never restore or save .claude/docs/.commit-markers/. As written, the next scheduled run loses last_processed_sha and rediscovers the same commit range, which can reopen duplicate workflows or PRs. Please show a cache/artifact/volume restore-save step, or move marker storage to a persistent --base-path.
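For the GitHub Actions example, one way to persist the marker would be an `actions/cache` step around the marker directory. This is a sketch under the assumption that the default `.claude/docs/.commit-markers` location is used; the unique `key` plus a `restore-keys` prefix makes each run restore the most recent marker and save a fresh copy.

```yaml
# Hypothetical cache step; place it before the "Check and process commits" step
- uses: actions/cache@v4
  with:
    path: .claude/docs/.commit-markers
    key: commit-markers-${{ github.run_id }}
    restore-keys: |
      commit-markers-
```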

🧰 Tools
🪛 markdownlint-cli2 (0.21.0)

[warning] 321-321: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)


[warning] 343-343: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)


Comment on lines +39 to +60
### 2. Dispatch agent

Dispatch the `docs-tools:commit-analyst` agent with the following prompt.

**Agent tool parameters:**
- `subagent_type`: `docs-tools:commit-analyst`
- `description`: `Analyze commits for documentation impact`

**Prompt:**

> Analyze the following commits for documentation impact.
>
> **Repository**: `<REPO_URL>`
> **Commits**: `<SHA1>`, `<SHA2>`, ...
>
> Save your complete analysis to: `<OUTPUT_FILE>`
>
> Follow your standard analysis methodology (fetch commit data, parse messages, analyze diffs for change signals, grade doc impact). Format the output as structured markdown for the requirements-analyst.

### 3. Verify output

After the agent completes, verify the output file exists at `<OUTPUT_FILE>`.

⚠️ Potential issue | 🟠 Major

Have the step write analysis.md itself.

This step verifies <OUTPUT_FILE> after dispatch, but the actual write is delegated to docs-tools:commit-analyst. That agent does not advertise Write/Edit tools in the provided context, so file creation depends on ad-hoc shell output inside the subagent. Capture the agent response here and write it in the step to make the contract deterministic.


Comment on lines +30 to +33
```bash
# Resolve git_commit_reader.py relative to this script
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
COMMIT_READER="${SCRIPT_DIR}/../../git-pr-reader/scripts/git_commit_reader.py"
```


🛠️ Refactor suggestion | 🟠 Major

Resolve the sibling reader via CLAUDE_PLUGIN_ROOT.

This reaches into another skill with ../../git-pr-reader/.... The repo convention is to use ${CLAUDE_PLUGIN_ROOT} for cross-skill script calls instead of hard-coded relative traversal.

Suggested fix
```diff
-# Resolve git_commit_reader.py relative to this script
-SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
-COMMIT_READER="${SCRIPT_DIR}/../../git-pr-reader/scripts/git_commit_reader.py"
+: "${CLAUDE_PLUGIN_ROOT:?CLAUDE_PLUGIN_ROOT must be set}"
+COMMIT_READER="${CLAUDE_PLUGIN_ROOT}/skills/git-pr-reader/scripts/git_commit_reader.py"
```

As per coding guidelines "Cross-skill and cross-command script calls must use ${CLAUDE_PLUGIN_ROOT} for paths like python3 ${CLAUDE_PLUGIN_ROOT}/skills/git-pr-reader/scripts/git_pr_reader.py".


Comment on lines +94 to +112
```python
def read_branch_info(base_path: str) -> dict:
    """Read branch info from the prepare-branch step output."""
    branch_file = os.path.join(base_path, "prepare-branch", "branch-info.md")
    if not os.path.exists(branch_file):
        raise FileNotFoundError(f"Branch info not found at {branch_file}")

    with open(branch_file) as f:
        content = f.read()

    # Extract branch name from the markdown
    branch_match = re.search(r"\*\*Branch\*\*:\s*`?([^\s`]+)`?", content)
    if not branch_match:
        # Try alternative format
        branch_match = re.search(r"branch[:\s]+`?([a-z0-9_/-]+)`?", content, re.IGNORECASE)

    if not branch_match:
        raise ValueError(f"Cannot extract branch name from {branch_file}")

    return {"branch": branch_match.group(1), "raw": content}
```

⚠️ Potential issue | 🟡 Minor

Branch name extraction regex may be too permissive.

The fallback regex on line 107 (``branch[:\s]+`?([a-z0-9_/-]+)`?``) pairs a lowercase-only character class with re.IGNORECASE. In Python, IGNORECASE does extend into character classes, so uppercase branch names still match, but the pattern reads as if they would not; an explicit case-inclusive class is clearer.

Suggested fix
-        branch_match = re.search(r"branch[:\s]+`?([a-z0-9_/-]+)`?", content, re.IGNORECASE)
+        branch_match = re.search(r"branch[:\s]+`?([A-Za-z0-9_/-]+)`?", content, re.IGNORECASE)
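The explicit class in the suggested fix can be sanity-checked standalone (the sample content string and branch name here are invented for illustration):

```python
import re

# Minimal check that the explicit class captures a mixed-case branch name.
content = "branch: Feature/JIRA-123"
m = re.search(r"branch[:\s]+`?([A-Za-z0-9_/-]+)`?", content, re.IGNORECASE)
branch = m.group(1) if m else None  # "Feature/JIRA-123"
```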
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
```diff
 def read_branch_info(base_path: str) -> dict:
     """Read branch info from the prepare-branch step output."""
     branch_file = os.path.join(base_path, "prepare-branch", "branch-info.md")
     if not os.path.exists(branch_file):
         raise FileNotFoundError(f"Branch info not found at {branch_file}")
     with open(branch_file) as f:
         content = f.read()
     # Extract branch name from the markdown
     branch_match = re.search(r"\*\*Branch\*\*:\s*`?([^\s`]+)`?", content)
     if not branch_match:
         # Try alternative format
-        branch_match = re.search(r"branch[:\s]+`?([a-z0-9_/-]+)`?", content, re.IGNORECASE)
+        branch_match = re.search(r"branch[:\s]+`?([A-Za-z0-9_/-]+)`?", content, re.IGNORECASE)
     if not branch_match:
         raise ValueError(f"Cannot extract branch name from {branch_file}")
     return {"branch": branch_match.group(1), "raw": content}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/docs-tools/skills/docs-workflow-create-pr/scripts/create_pr.py`
around lines 94 - 112, The fallback regex in read_branch_info currently uses a
character class that only lists lowercase letters which can fail for uppercase
branch names; update the fallback pattern (the second re.search assigned to
branch_match) to use an explicit case-inclusive class like [A-Za-z0-9_/-]+ (or a
more permissive pattern such as \S+ if you prefer) so uppercase characters are
accepted, keep or remove re.IGNORECASE as desired, and ensure the
branch_match.group(1) extraction still returns the branch name from branch_file
content.

Comment on lines +78 to +93
if "github.com" in host:
platform = "github"
owner = path_parts[0]
repo = path_parts[1].removesuffix(".git")
owner_repo = f"{owner}/{repo}"
elif "gitlab" in host:
platform = "gitlab"
# GitLab supports nested groups: group/subgroup/project
repo = path_parts[-1].removesuffix(".git")
owner = "/".join(path_parts[:-1])
owner_repo = f"{owner}/{repo}"
else:
raise ValueError(
f"Cannot determine platform from URL: {url}\n"
"Supported: github.com, gitlab.com (or self-hosted GitLab)"
)

⚠️ Potential issue | 🟠 Major

The URL parser does not actually support generic self-hosted GitLab.

`elif "gitlab" in host` only works when the hostname literally contains `gitlab`, while `if "github.com" in host` also accepts unrelated hosts like `notgithub.com`. That contradicts the supported-URL contract in the docstring/help text and can route valid repos to the wrong backend.
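The stricter matching the review asks for can be sketched in isolation (a minimal sketch, not the PR's code; the example hostnames are invented):

```python
from urllib.parse import urlparse

def detect_platform(url: str) -> str:
    """Exact-host match for GitHub; everything else is treated as GitLab,
    which covers self-hosted instances whose hostname lacks 'gitlab'."""
    host = urlparse(url).netloc.lower()
    if host == "github.com":
        return "github"
    # Any non-GitHub host falls through to the GitLab path-parsing logic.
    return "gitlab"
```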

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/docs-tools/skills/git-pr-reader/scripts/git_commit_reader.py` around
lines 78 - 93, The platform detection is too permissive for GitHub and too
strict for self‑hosted GitLab: change the conditional that currently uses if
"github.com" in host to an exact check (host == "github.com") so unrelated hosts
like notgithub.com don't match, keep an explicit host == "gitlab.com" branch,
and add a fallback branch that treats any other non‑GitHub host as a self‑hosted
GitLab (reusing the existing GitLab logic that builds owner, repo, owner_repo
from path_parts); update the conditions around host, owner, repo, and owner_repo
to reflect this.

Comment on lines +406 to +415
try:
files = reader.get_commit_files(c["sha"])
all_count = len(files)
rel_count = sum(1 for f in files if should_include_file(f["filename"], filters))
c["files_changed"] = all_count
c["relevant_files_changed"] = rel_count
total_files += all_count
relevant_files += rel_count
except Exception:
c["relevant_files_changed"] = c.get("files_changed", 0)

⚠️ Potential issue | 🟠 Major

Don't turn file lookup failures into actionable commits.

If get_commit_files() fails here, the code keeps the commit and fabricates relevant_files_changed from a fallback field. With --drop-empty, an auth/rate-limit/API failure can therefore start the workflow on unverified commits.

Suggested fix
-        except Exception:
-            c["relevant_files_changed"] = c.get("files_changed", 0)
+        except Exception as exc:
+            raise RuntimeError(
+                f"Failed to load changed files for commit {c['sha']}: {exc}"
+            ) from exc
🧰 Tools
🪛 Ruff (0.15.6)

[warning] 414-414: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/docs-tools/skills/git-pr-reader/scripts/git_commit_reader.py` around
lines 406 - 415, If reader.get_commit_files(c["sha"]) raises, don't fabricate
counts and keep the commit actionable; instead, in the except block set
c["files_changed"]=0 and c["relevant_files_changed"]=0, do not increment
total_files or relevant_files, and add a marker like c["files_fetch_error"]=True
(and log the exception) so downstream logic (e.g., --drop-empty handling) can
drop or skip the commit; update the except branch that currently surrounds
get_commit_files/should_include_file to implement this behavior rather than
using c.get("files_changed", 0).

mcljot and others added 2 commits March 25, 2026 16:55
Repos use either main or master as their default branch. Instead of
hardcoding main, detect the default branch from the GitHub/GitLab API
when --branch is not specified. The gate script also omits --branch
by default so auto-detection flows through.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Separate stderr from stdout in commits-ready-check.sh so python-gitlab
warnings don't corrupt the JSON output. Suppress the pagination warning
by passing get_all=False. Use the resolved branch name from the reader
output in all gate script JSON responses.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
plugins/docs-tools/skills/git-pr-reader/scripts/git_commit_reader.py (2)

224-226: Unused branch parameter in get_range_files methods.

Both GitHubCommitReader.get_range_files (line 225) and GitLabCommitReader.get_range_files (line 352) declare a branch parameter that is never used in the method body. Consider removing it or documenting why it's reserved for future use.

♻️ Proposed fix for GitHubCommitReader
     def get_range_files(
-        self, from_sha: str, to_sha: str, branch: str = "main"
+        self, from_sha: str, to_sha: str
     ) -> List[Dict[str, Any]]:

Also applies to: 351-353

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/docs-tools/skills/git-pr-reader/scripts/git_commit_reader.py` around
lines 224 - 226, The branch parameter is declared but unused in both
GitHubCommitReader.get_range_files and GitLabCommitReader.get_range_files;
remove the branch: str = "main" parameter from both method signatures (and any
corresponding interface/abstract method declaration) and update all internal
callers to stop passing a branch argument, or if removal breaks external
compatibility, mark it as intentionally unused by renaming to _branch and add a
short comment/docstring stating it's reserved for future use; ensure the change
is mirrored across overridden/related methods so signatures remain consistent.

166-177: Remove unused found_since variable.

The variable found_since is assigned on line 167 but never used anywhere in the function. This appears to be leftover from an earlier implementation.

♻️ Proposed fix
         results = []
-        found_since = since_sha is None
         collected = 0
 
         for commit in commits_iter:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/docs-tools/skills/git-pr-reader/scripts/git_commit_reader.py` around
lines 166 - 177, Remove the unused variable assignment "found_since = since_sha
is None" in the function that iterates over commits (the loop using
commits_iter); it's dead code leftover—delete that single assignment and ensure
no other references to found_since remain in the function (e.g., around the for
commit in commits_iter block and related logic that checks since_sha).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@plugins/docs-tools/skills/docs-workflow-commits-ready/scripts/commits-ready-check.sh`:
- Around line 52-53: Remove unused variables or implement their intended
behavior: if you want a dry-run mode keep the --dry-run case and the DRY_RUN
variable and wire it into the script's side-effecting steps (e.g., in functions
that mutate state or call external commands like git push/notify, check DRY_RUN
and only print the action instead of executing); otherwise remove the --dry-run
branch and DRY_RUN assignment. Likewise either remove the ALL_SHAS,
FILTERED_COUNT, and FILTERED_MAP variables if they are vestigial, or use them
where filtering/aggregation is intended (populate ALL_SHAS with the list of
candidate commit SHAs, update FILTERED_MAP when filtering commits, and set
FILTERED_COUNT from its length) so downstream logic references them. Ensure
references use the exact names ALL_SHAS, FILTERED_MAP, FILTERED_COUNT and update
any conditional or reporting output to reflect dry-run behavior if kept.

---

Nitpick comments:
In `@plugins/docs-tools/skills/git-pr-reader/scripts/git_commit_reader.py`:
- Around line 224-226: The branch parameter is declared but unused in both
GitHubCommitReader.get_range_files and GitLabCommitReader.get_range_files;
remove the branch: str = "main" parameter from both method signatures (and any
corresponding interface/abstract method declaration) and update all internal
callers to stop passing a branch argument, or if removal breaks external
compatibility, mark it as intentionally unused by renaming to _branch and add a
short comment/docstring stating it's reserved for future use; ensure the change
is mirrored across overridden/related methods so signatures remain consistent.
- Around line 166-177: Remove the unused variable assignment "found_since =
since_sha is None" in the function that iterates over commits (the loop using
commits_iter); it's dead code leftover—delete that single assignment and ensure
no other references to found_since remain in the function (e.g., around the for
commit in commits_iter block and related logic that checks since_sha).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ddbc1748-8e43-4a6c-900b-c3e7b70dfd86

📥 Commits

Reviewing files that changed from the base of the PR and between 5ae0035 and e4ad9e2.

📒 Files selected for processing (3)
  • plugins/docs-tools/skills/docs-workflow-commits-ready/SKILL.md
  • plugins/docs-tools/skills/docs-workflow-commits-ready/scripts/commits-ready-check.sh
  • plugins/docs-tools/skills/git-pr-reader/scripts/git_commit_reader.py
✅ Files skipped from review due to trivial changes (1)
  • plugins/docs-tools/skills/docs-workflow-commits-ready/SKILL.md

- Fix DOC_IMPACT marker case inconsistency (NONE → None) to match
  orchestrator short-circuit pattern
- Add CI marker persistence tip to README commit-driven workflow docs
- Log warning to stderr on file enrichment failures in git_commit_reader

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
plugins/docs-tools/skills/git-pr-reader/scripts/git_commit_reader.py (1)

427-430: ⚠️ Potential issue | 🟡 Minor

Error handling on file fetch still masks failures for --drop-empty logic.

When get_commit_files() fails, the code falls back to using c.get("files_changed", 0) (which may be the line-change count from stats.total) for both files_changed and relevant_files_changed. This means:

  1. Failed fetches are indistinguishable from successful ones in the output
  2. With --drop-empty, commits where the API failed won't be dropped even if they have no relevant files

Consider adding an explicit error marker (e.g., c["files_fetch_error"] = True) so downstream consumers can identify and handle these cases appropriately.
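The marker approach can be sketched as a small helper (hypothetical names — `enrich_commit` and the `is_relevant` callback are illustrative, not the PR's actual signatures):

```python
import sys

def enrich_commit(c: dict, reader, is_relevant) -> None:
    """Populate file counts on a commit record; on fetch failure, zero the
    counts and set an explicit marker instead of fabricating values."""
    try:
        files = reader.get_commit_files(c["sha"])
    except Exception as exc:
        print(f"Warning: failed to fetch files for {c['sha'][:7]}: {exc}",
              file=sys.stderr)
        c["files_changed"] = 0
        c["relevant_files_changed"] = 0
        c["files_fetch_error"] = True  # lets --drop-empty logic see the failure
        return
    c["files_changed"] = len(files)
    c["relevant_files_changed"] = sum(
        1 for f in files if is_relevant(f["filename"])
    )
```

With the marker set, downstream consumers can choose to drop, retry, or surface the commit rather than silently treating it as verified.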

🛡️ Suggested improvement
         except Exception as exc:
             print(f"Warning: failed to fetch files for {c['sha'][:7]}: {exc}", file=sys.stderr)
-            c["files_changed"] = c.get("files_changed", 0)
-            c["relevant_files_changed"] = c.get("files_changed", 0)
+            c["files_changed"] = 0
+            c["relevant_files_changed"] = 0
+            c["files_fetch_error"] = True
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/docs-tools/skills/git-pr-reader/scripts/git_commit_reader.py` around
lines 427 - 430, The except block that handles failures from get_commit_files
currently masks errors by populating c["files_changed"] and
c["relevant_files_changed"] from stats, making failed fetches indistinguishable
and preventing --drop-empty from dropping those commits; update that except
handler to set a clear error marker c["files_fetch_error"]=True, keep or set
c["files_changed"] to the available fallback value, and explicitly set
c["relevant_files_changed"]=0 (or None if upstream expects that) so downstream
logic (including --drop-empty) can detect the fetch failure via
files_fetch_error and treat the commit as having no relevant files; locate this
change in the except handling around the get_commit_files call where c is
modified.
🧹 Nitpick comments (1)
plugins/docs-tools/skills/git-pr-reader/scripts/git_commit_reader.py (1)

167-167: Dead code: found_since is assigned but never used.

This variable is set but never referenced in the function body.

♻️ Suggested fix
         results = []
-        found_since = since_sha is None
         collected = 0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/docs-tools/skills/git-pr-reader/scripts/git_commit_reader.py` at line
167, The variable found_since is assigned but never used; remove the unused
assignment "found_since = since_sha is None" from the function to eliminate dead
code, or if the original intent was to track whether we've encountered
since_sha, use found_since in the function's control flow (e.g., replace any
implicit checks with this boolean) so the variable is actually referenced;
ensure references are to the exact variable names found_since and since_sha when
applying the fix.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugins/docs-tools/skills/git-pr-reader/scripts/git_commit_reader.py`:
- Line 189: The field "files_changed" is populated with commit.stats.total
(which is total line changes), so update the code that builds commit records
(the object using "files_changed") to use len(commit.files) instead of
commit.stats.total to represent the number of files changed; alternatively, if
you intend to keep line-change counts, rename the field to something like
"lines_changed" and preserve commit.stats.total. Locate the creation of the
commit dict/record where "files_changed" is assigned (the usage of
commit.stats.total) and replace with len(commit.files) or rename the key
consistently, and ensure any downstream callers such as list_commits() or
cmd_list() are updated to expect the new key or value semantics.

---

Duplicate comments:
In `@plugins/docs-tools/skills/git-pr-reader/scripts/git_commit_reader.py`:
- Around line 427-430: The except block that handles failures from
get_commit_files currently masks errors by populating c["files_changed"] and
c["relevant_files_changed"] from stats, making failed fetches indistinguishable
and preventing --drop-empty from dropping those commits; update that except
handler to set a clear error marker c["files_fetch_error"]=True, keep or set
c["files_changed"] to the available fallback value, and explicitly set
c["relevant_files_changed"]=0 (or None if upstream expects that) so downstream
logic (including --drop-empty) can detect the fetch failure via
files_fetch_error and treat the commit as having no relevant files; locate this
change in the except handling around the get_commit_files call where c is
modified.

---

Nitpick comments:
In `@plugins/docs-tools/skills/git-pr-reader/scripts/git_commit_reader.py`:
- Line 167: The variable found_since is assigned but never used; remove the
unused assignment "found_since = since_sha is None" from the function to
eliminate dead code, or if the original intent was to track whether we've
encountered since_sha, use found_since in the function's control flow (e.g.,
replace any implicit checks with this boolean) so the variable is actually
referenced; ensure references are to the exact variable names found_since and
since_sha when applying the fix.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9e2abf59-391b-421c-8a21-128d806120f7

📥 Commits

Reviewing files that changed from the base of the PR and between e4ad9e2 and a47e6f1.

📒 Files selected for processing (3)
  • plugins/docs-tools/README.md
  • plugins/docs-tools/agents/commit-analyst.md
  • plugins/docs-tools/skills/git-pr-reader/scripts/git_commit_reader.py
✅ Files skipped from review due to trivial changes (1)
  • plugins/docs-tools/README.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • plugins/docs-tools/agents/commit-analyst.md

```python
    "author": commit.commit.author.name if commit.commit.author else "unknown",
    "date": commit.commit.author.date.isoformat() if commit.commit.author else None,
    "parents": [p.sha for p in commit.parents],
    "files_changed": commit.stats.total if commit.stats else 0,
```

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

PyGithub commit stats.total vs files count

💡 Result:

In PyGithub, for a Commit object, commit.stats is a CommitStats object from the GitHub API's /repos/{owner}/{repo}/commits/{ref} endpoint. The GitHub API response includes a 'stats' object with fields: 'additions' (lines added), 'deletions' (lines deleted), and 'total' (total lines changed, i.e., additions + deletions). There is also a separate 'files' array listing changed files, each with per-file stats. PyGithub's CommitStats.total corresponds to the API's stats.total (total line changes). The number of files changed is len(commit.files), which may be limited to 300 by default (paginated up to 3000). They differ: stats.total is total lines changed across all files; files count is the number of unique files modified in the commit.


🏁 Script executed:

cat -n plugins/docs-tools/skills/git-pr-reader/scripts/git_commit_reader.py | sed -n '180,200p'

Repository: redhat-documentation/redhat-docs-agent-tools

Length of output: 1144


🏁 Script executed:

# Check the get_commit_files method to understand file handling
cat -n plugins/docs-tools/skills/git-pr-reader/scripts/git_commit_reader.py | sed -n '199,220p'

Repository: redhat-documentation/redhat-docs-agent-tools

Length of output: 1051


🏁 Script executed:

# Search for usages of "files_changed" in the codebase
rg "files_changed" plugins/docs-tools/skills/git-pr-reader/ -n

Repository: redhat-documentation/redhat-docs-agent-tools

Length of output: 1207


🏁 Script executed:

# Check lines around 423-424 to see what all_count and rel_count are
cat -n plugins/docs-tools/skills/git-pr-reader/scripts/git_commit_reader.py | sed -n '400,440p'

Repository: redhat-documentation/redhat-docs-agent-tools

Length of output: 1889


🏁 Script executed:

# Check what list_commits returns and if it's used before being enriched
cat -n plugins/docs-tools/skills/git-pr-reader/scripts/git_commit_reader.py | sed -n '1,50p'

Repository: redhat-documentation/redhat-docs-agent-tools

Length of output: 2015


🏁 Script executed:

# Look at the full list_commits method to understand the flow
rg "def list_commits" plugins/docs-tools/skills/git-pr-reader/scripts/git_commit_reader.py -A 30

Repository: redhat-documentation/redhat-docs-agent-tools

Length of output: 2146


commit.stats.total returns total line changes, not file count.

In PyGithub, stats.total is the sum of additions and deletions (total line changes), not the number of files changed. This creates a semantic mismatch with the field name files_changed. Using commit.stats.total here means the field contains incorrect data when consumed directly from list_commits() or if the downstream enrichment in cmd_list() fails. Consider using len(commit.files) for accurate file counts, or rename the field to clarify that it represents line changes rather than file count.
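The distinction can be shown with a stand-in object (SimpleNamespace mimics the shape of PyGithub's Commit; the numbers are invented):

```python
from types import SimpleNamespace

# Stand-in for a PyGithub Commit: stats.total is additions + deletions,
# while len(commit.files) is the number of files touched.
commit = SimpleNamespace(
    stats=SimpleNamespace(additions=40, deletions=10, total=50),
    files=[SimpleNamespace(filename="a.py"), SimpleNamespace(filename="b.py")],
)

files_changed = len(commit.files)   # 2 — number of files
lines_changed = commit.stats.total  # 50 — total line changes
```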

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/docs-tools/skills/git-pr-reader/scripts/git_commit_reader.py` at line
189, The field "files_changed" is populated with commit.stats.total (which is
total line changes), so update the code that builds commit records (the object
using "files_changed") to use len(commit.files) instead of commit.stats.total to
represent the number of files changed; alternatively, if you intend to keep
line-change counts, rename the field to something like "lines_changed" and
preserve commit.stats.total. Locate the creation of the commit dict/record where
"files_changed" is assigned (the usage of commit.stats.total) and replace with
len(commit.files) or rename the key consistently, and ensure any downstream
callers such as list_commits() or cmd_list() are updated to expect the new key
or value semantics.

- Remove unused variables in commits-ready-check.sh (DRY_RUN,
  ALL_SHAS, FILTERED_COUNT, FILTERED_MAP)
- Remove dead found_since assignment in git_commit_reader.py
- Prefix unused branch parameter as _branch in get_range_files
- Fix files_changed initial value (was line count, now 0 like GitLab)
- Simplify exception handler to not mask --drop-empty defaults

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mcljot mcljot marked this pull request as ready for review March 25, 2026 20:57

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (4)
plugins/docs-tools/skills/git-pr-reader/scripts/git_commit_reader.py (2)

418-429: ⚠️ Potential issue | 🟠 Major

Do not keep commits actionable when file fetch fails.

If get_commit_files() fails, the commit keeps the default include behavior (relevant_files_changed stays unset, so the drop logic falls back to treating it as 1), which can trigger downstream analysis on unverified commits.

Proposed fix
         except Exception as exc:
             print(f"Warning: failed to fetch files for {c['sha'][:7]}: {exc}", file=sys.stderr)
-            # Leave relevant_files_changed unset; --drop-empty defaults to keeping commit
+            c["files_changed"] = 0
+            c["relevant_files_changed"] = 0
+            c["files_fetch_error"] = True
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/docs-tools/skills/git-pr-reader/scripts/git_commit_reader.py` around
lines 418 - 429, When reader.get_commit_files(c["sha"]) raises, don't leave
c["relevant_files_changed"] unset (which currently falls back to 1); instead
explicitly mark the commit as having no relevant files so it won't be considered
actionable: inside the except block set c["relevant_files_changed"] = 0 (and
also set c["files_changed"] = 0 if you want total_files accounting to exclude
this commit), and avoid incrementing total_files or relevant_files for this
commit (update/remove the total_files/relevant_files increments accordingly).
This change touches the error handler around reader.get_commit_files, and
references c["relevant_files_changed"], c["files_changed"], total_files,
relevant_files and should_include_file for locating the logic.

78-93: ⚠️ Potential issue | 🟠 Major

Fix platform detection for GitHub and self-hosted GitLab hosts.

if "github.com" in host matches unrelated hosts (e.g., notgithub.com), while elif "gitlab" in host rejects valid self-hosted GitLab domains that do not contain gitlab.

Proposed fix
-    if "github.com" in host:
+    if host == "github.com":
         platform = "github"
         owner = path_parts[0]
         repo = path_parts[1].removesuffix(".git")
         owner_repo = f"{owner}/{repo}"
-    elif "gitlab" in host:
+    elif host == "gitlab.com" or host != "github.com":
         platform = "gitlab"
         # GitLab supports nested groups: group/subgroup/project
         repo = path_parts[-1].removesuffix(".git")
         owner = "/".join(path_parts[:-1])
         owner_repo = f"{owner}/{repo}"
-    else:
-        raise ValueError(
-            f"Cannot determine platform from URL: {url}\n"
-            "Supported: github.com, gitlab.com (or self-hosted GitLab)"
-        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/docs-tools/skills/git-pr-reader/scripts/git_commit_reader.py` around
lines 78 - 93, The platform detection incorrectly uses substring checks on host;
update the logic that examines host (the host variable used with path_parts and
url) to use more precise domain/suffix matching: detect GitHub when host ==
"github.com" or host.endswith(".github.com"); detect GitLab when host ==
"gitlab.com" or host.endswith(".gitlab.com"); for self-hosted GitLab allow a
configurable list (e.g., GITLAB_HOSTS) or as a fallback treat any non-GitHub
host with a standard repo path (path_parts length >=2 and
repo.endswith(".git")/no suffix) as GitLab; keep the existing owner/repo
extraction (owner = path_parts[0] / repo = path_parts[1].removesuffix(".git")
for GitHub, and repo = path_parts[-1].removesuffix(".git"), owner =
"/".join(path_parts[:-1]) for GitLab) and raise the same ValueError only when
none of these detection methods match.
plugins/docs-tools/skills/docs-workflow-commits-ready/scripts/commits-ready-check.sh (2)

29-31: ⚠️ Potential issue | 🟠 Major

Use ${CLAUDE_PLUGIN_ROOT} for cross-skill script resolution.

Hard-coded ../../git-pr-reader/... is brittle and violates the repository rule for cross-skill script calls.

Proposed fix
-# Resolve git_commit_reader.py relative to this script
-SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
-COMMIT_READER="${SCRIPT_DIR}/../../git-pr-reader/scripts/git_commit_reader.py"
+: "${CLAUDE_PLUGIN_ROOT:?CLAUDE_PLUGIN_ROOT must be set}"
+COMMIT_READER="${CLAUDE_PLUGIN_ROOT}/skills/git-pr-reader/scripts/git_commit_reader.py"

As per coding guidelines "Cross-skill and cross-command script calls must use ${CLAUDE_PLUGIN_ROOT} for paths like python3 ${CLAUDE_PLUGIN_ROOT}/skills/git-pr-reader/scripts/git_pr_reader.py".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/docs-tools/skills/docs-workflow-commits-ready/scripts/commits-ready-check.sh`
around lines 29 - 31, Replace the brittle relative path assigned to
COMMIT_READER (currently built from SCRIPT_DIR and "../../git-pr-reader/...")
with a cross-skill-aware invocation using the CLAUDE_PLUGIN_ROOT environment
variable; update the COMMIT_READER reference so it invokes the git_pr_reader
script via python3
${CLAUDE_PLUGIN_ROOT}/skills/git-pr-reader/scripts/git_pr_reader.py (keep
SCRIPT_DIR unchanged if still needed elsewhere) so all cross-skill script calls
follow the repository guideline.

79-82: ⚠️ Potential issue | 🟠 Major

Fail fast on malformed marker JSON instead of silently replaying.

jq ... || true on Line 81 suppresses marker corruption and causes reprocessing from scratch.

Proposed fix
 if [[ -z "$SINCE_SHA" ]]; then
   if [[ -f "$MARKER_FILE" ]]; then
-    SINCE_SHA=$(jq -r '.last_processed_sha // empty' "$MARKER_FILE" 2>/dev/null || true)
+    if ! jq empty "$MARKER_FILE" >/dev/null 2>&1; then
+      echo "{\"error\": \"Invalid marker file: $MARKER_FILE\"}"
+      exit 1
+    fi
+    SINCE_SHA=$(jq -r '.last_processed_sha // empty' "$MARKER_FILE")
   fi
 fi
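The same fail-fast idea, sketched in Python for clarity (the helper name and file layout are illustrative):

```python
import json
import sys

def read_marker_sha(marker_path: str):
    """Return last_processed_sha from the marker file, exiting loudly on
    corrupt JSON instead of silently replaying from scratch."""
    try:
        with open(marker_path) as f:
            marker = json.load(f)
    except json.JSONDecodeError as exc:
        sys.exit(f"Invalid marker file {marker_path}: {exc}")
    return marker.get("last_processed_sha")
```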
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/docs-tools/skills/docs-workflow-commits-ready/scripts/commits-ready-check.sh`
around lines 79 - 82, The script silently ignores jq errors when reading
MARKER_FILE into SINCE_SHA (the command using jq -r '.last_processed_sha //
empty' "$MARKER_FILE" 2>/dev/null || true), which causes corrupted marker JSON
to be treated as empty and replayed; change this so malformed JSON causes
failure: remove the "|| true" suppression, propagate jq failures (or use jq -e)
and on error print a clear message including MARKER_FILE and exit with a
non-zero status so the problem is surfaced instead of silently replaying.
🧹 Nitpick comments (1)
plugins/docs-tools/skills/docs-workflow-commits-ready/scripts/commits-ready-check.sh (1)

86-95: Build reader invocation as an argument array to avoid splitting bugs.

Constructing READER_ARGS as a single string and expanding unquoted can break on edge-case argument values.
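The hazard is the same one Python avoids by passing argument lists instead of flattened command strings — a Python analogue (sample values invented):

```python
import shlex

# Keep arguments as a list (the shell-array analogue): values with spaces
# survive intact. Joining into one string and re-splitting corrupts them.
args = ["list", "my repo", "--max", "5"]   # array-style: safe
flattened = " ".join(args)                 # string-style: lossy
resplit = shlex.split(flattened)           # "my repo" becomes two tokens
```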

Proposed refactor
-READER_ARGS="list $REPO --max $MAX_COMMITS --no-merges --drop-empty --json"
+READER_ARGS=(list "$REPO" --max "$MAX_COMMITS" --no-merges --drop-empty --json)
 if [[ -n "$BRANCH" ]]; then
-  READER_ARGS="$READER_ARGS --branch $BRANCH"
+  READER_ARGS+=(--branch "$BRANCH")
 fi
 if [[ -n "$SINCE_SHA" ]]; then
-  READER_ARGS="$READER_ARGS --since $SINCE_SHA"
+  READER_ARGS+=(--since "$SINCE_SHA")
 fi
@@
-COMMIT_OUTPUT=$(python3 "$COMMIT_READER" $READER_ARGS 2>"$COMMIT_STDERR") || {
+COMMIT_OUTPUT=$(python3 "$COMMIT_READER" "${READER_ARGS[@]}" 2>"$COMMIT_STDERR") || {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/docs-tools/skills/docs-workflow-commits-ready/scripts/commits-ready-check.sh`
around lines 86 - 95, The READER_ARGS string expansion is unsafe and can split
on spaces; change READER_ARGS into a bash array (e.g., declare -a READER_ARGS)
and build it as READER_ARGS=(list "$REPO" --max "$MAX_COMMITS" --no-merges
--drop-empty --json), append branch/sha with READER_ARGS+=(--branch "$BRANCH")
and READER_ARGS+=(--since "$SINCE_SHA") when set, then invoke the reader with
python3 "$COMMIT_READER" "${READER_ARGS[@]}" 2>"$COMMIT_STDERR" so arguments are
preserved; keep COMMIT_STDERR and COMMIT_OUTPUT usage the same.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugins/docs-tools/skills/git-pr-reader/scripts/git_commit_reader.py`:
- Around lines 170-176: The loop currently breaks on collected >= max_commits
only when since_sha is None, causing --max to be ignored if --since is provided;
update the logic in the loop that inspects commits (variables: collected,
max_commits, since_sha, commit.sha) so that you check and break on collected >=
max_commits regardless of since_sha (i.e., perform the max_commits check before
or independently of the since_sha checks), ensuring you still break if
commit.sha equals or startswith since_sha.

---

Duplicate comments:
In
`@plugins/docs-tools/skills/docs-workflow-commits-ready/scripts/commits-ready-check.sh`:
- Around lines 29-31: Replace the brittle relative path assigned to COMMIT_READER
(currently built from SCRIPT_DIR and "../../git-pr-reader/...") with a
cross-skill-aware invocation using the CLAUDE_PLUGIN_ROOT environment variable;
update the COMMIT_READER reference so it invokes the git_pr_reader script via
python3 ${CLAUDE_PLUGIN_ROOT}/skills/git-pr-reader/scripts/git_pr_reader.py
(keep SCRIPT_DIR unchanged if still needed elsewhere) so all cross-skill script
calls follow the repository guideline.
- Around lines 79-82: The script silently ignores jq errors when reading
MARKER_FILE into SINCE_SHA (the command using jq -r '.last_processed_sha //
empty' "$MARKER_FILE" 2>/dev/null || true), which causes corrupted marker JSON
to be treated as empty and replayed; change this so malformed JSON causes
failure: remove the "|| true" suppression, propagate jq failures (or use jq -e)
and on error print a clear message including MARKER_FILE and exit with a
non-zero status so the problem is surfaced instead of silently replaying.
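The CLAUDE_PLUGIN_ROOT resolution described in the first item above can be sketched as a helper (the function name and the relative fallback for local runs are assumptions; the `git_pr_reader.py` path follows the guideline quoted in the comment):

```shell
set -euo pipefail

# resolve_commit_reader SCRIPT_DIR
# Prints the path to the cross-skill reader script, preferring the
# CLAUDE_PLUGIN_ROOT exported by the host over a relative fallback
# computed from the calling script's own directory.
resolve_commit_reader() {
  local script_dir="$1"
  # skills/<skill>/scripts -> the plugin root is three levels up
  local plugin_root="${CLAUDE_PLUGIN_ROOT:-$script_dir/../../..}"
  printf '%s' "$plugin_root/skills/git-pr-reader/scripts/git_pr_reader.py"
}
```

Invoking it as `python3 "$(resolve_commit_reader "$SCRIPT_DIR")" …` keeps cross-skill calls working whether or not the host has exported the plugin root variable.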

In `@plugins/docs-tools/skills/git-pr-reader/scripts/git_commit_reader.py`:
- Around lines 418-429: When reader.get_commit_files(c["sha"]) raises, don't
leave c["relevant_files_changed"] unset (which currently falls back to 1);
instead explicitly mark the commit as having no relevant files so it won't be
considered actionable: inside the except block set c["relevant_files_changed"] =
0 (and also set c["files_changed"] = 0 if you want total_files accounting to
exclude this commit), and avoid incrementing total_files or relevant_files for
this commit (update/remove the total_files/relevant_files increments
accordingly). This change touches the error handler around
reader.get_commit_files, and references c["relevant_files_changed"],
c["files_changed"], total_files, relevant_files and should_include_file for
locating the logic.
- Around lines 78-93: The platform detection incorrectly uses substring checks on
host; update the logic that examines host (the host variable used with
path_parts and url) to use more precise domain/suffix matching: detect GitHub
when host == "github.com" or host.endswith(".github.com"); detect GitLab when
host == "gitlab.com" or host.endswith(".gitlab.com"); for self-hosted GitLab
allow a configurable list (e.g., GITLAB_HOSTS) or as a fallback treat any
non-GitHub host with a standard repo path (path_parts length >=2 and
repo.endswith(".git")/no suffix) as GitLab; keep the existing owner/repo
extraction (owner = path_parts[0] / repo = path_parts[1].removesuffix(".git")
for GitHub, and repo = path_parts[-1].removesuffix(".git"), owner =
"/".join(path_parts[:-1]) for GitLab) and raise the same ValueError only when
none of these detection methods match.

---

Nitpick comments:
In
`@plugins/docs-tools/skills/docs-workflow-commits-ready/scripts/commits-ready-check.sh`:
- Around lines 86-95: The READER_ARGS string expansion is unsafe and can split on
spaces; change READER_ARGS into a bash array (e.g., declare -a READER_ARGS) and
build it as READER_ARGS=(list "$REPO" --max "$MAX_COMMITS" --no-merges
--drop-empty --json), append branch/sha with READER_ARGS+=(--branch "$BRANCH")
and READER_ARGS+=(--since "$SINCE_SHA") when set, then invoke the reader with
python3 "$COMMIT_READER" "${READER_ARGS[@]}" 2>"$COMMIT_STDERR" so arguments are
preserved; keep COMMIT_STDERR and COMMIT_OUTPUT usage the same.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e4a17921-d1be-4320-b516-0fb360920374

📥 Commits

Reviewing files that changed from the base of the PR and between a47e6f1 and fbe127b.

📒 Files selected for processing (2)
  • plugins/docs-tools/skills/docs-workflow-commits-ready/scripts/commits-ready-check.sh
  • plugins/docs-tools/skills/git-pr-reader/scripts/git_commit_reader.py

Comment on lines +170 to +176
if collected >= max_commits and since_sha is None:
break

if since_sha and commit.sha == since_sha:
break
if since_sha and commit.sha.startswith(since_sha):
break

⚠️ Potential issue | 🟠 Major

--max is currently ignored for GitHub when --since is provided.

On line 170, the cap only applies when since_sha is None, so a long history can return an unbounded number of commits despite --max.

Proposed fix
-            if collected >= max_commits and since_sha is None:
+            if collected >= max_commits:
                 break
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/docs-tools/skills/git-pr-reader/scripts/git_commit_reader.py` around
lines 170-176, the loop currently breaks on collected >= max_commits only when
since_sha is None, causing --max to be ignored if --since is provided; update
the logic in the loop that inspects commits (variables: collected, max_commits,
since_sha, commit.sha) so that you check and break on collected >= max_commits
regardless of since_sha (i.e., perform the max_commits check before or
independently of the since_sha checks), ensuring you still break if commit.sha
equals or startswith since_sha.

mcljot and others added 2 commits March 25, 2026 21:14
The create-pr step previously assumed the branch was already pushed
to the remote. Add commit_and_push() to stage, commit, and push
documentation changes before creating the PR/MR via the API. This
closes the gap that prevented --create-pr from working in CI.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add technical-review to create-pr input dependencies in workflow YAML
- Add --draft + --create-pr mutual exclusion check in orchestrator
- Replace git add -A with targeted staging that excludes process
  artifacts (analysis, requirements, plan, reviews) from the PR commit
- Fix setup-hooks.sh to use SCRIPT_DIR instead of CLAUDE_PLUGIN_ROOT
  env var, which is not guaranteed outside hook/MCP subprocesses

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mcljot mcljot self-assigned this Mar 25, 2026
@@ -0,0 +1,195 @@
#!/bin/bash
Could we just update the existing git-pr-reader to fetch commits?

@aireilly

aireilly commented Apr 1, 2026

Could the existing https://github.com/redhat-documentation/redhat-docs-agent-tools/blob/main/plugins/docs-tools/agents/requirements-analyst.md be updated to also take code repo commit ranges as an input?
