Skip to content

fix(vale): scope changed-files diff to PR branch additions only#6211

Merged
dhtclk merged 1 commit into
mainfrom
fix/vale-pr-scope
May 12, 2026
Merged

fix(vale): scope changed-files diff to PR branch additions only#6211
dhtclk merged 1 commit into
mainfrom
fix/vale-pr-scope

Conversation

@dhtclk
Copy link
Copy Markdown
Collaborator

@dhtclk dhtclk commented May 12, 2026

Summary

  • Vale CI was flagging style issues on files no PR ever touched when the PR's branch was behind main. Reproduced on clickpipes: support 32 vcpus #6210 (a one-file PR; Vale reported 30 errors across 17 unrelated files).
  • Root cause: scripts/vale/changed_lines_to_json.py used endpoint diff (git diff <base> <head>), which surfaces both the PR's edits and main's drift since the branch was created.
  • Fix: switch to three-dot syntax (<base>...<head>), which scopes the diff to what the branch added since it diverged from main.

Problem (reproduced on #6210)

PR #6210 ("support 32 vcpus") changes a single file: docs/integrations/data-ingestion/clickpipes/postgres/scaling.md, +5/−5. Vale CI failed with 30 annotations across 17 different files, none of them scaling.md. Examples of flagged files:

  • docs/whats-new/changelog/index.md
  • docs/best-practices/json_type.md
  • docs/cloud/_snippets/_clickpipes_faq.md
  • docs/chdb/index.md

These are pre-existing style issues on files modified on main after the PR's branch was cut.

Cause

scripts/vale/changed_lines_to_json.py builds the "PR-changed" file list with git diff --name-only <base_sha> <head_sha> (endpoint comparison). When base.sha is today's main HEAD and head.sha is a PR commit whose parent is an older main, this diff returns two kinds of differences lumped together:

  1. The PR's actual edits (correct).
  2. Everything main has gained since the branch diverged but that the PR doesn't have (incorrect — that's main's drift, not the PR's work).

Vale then runs on the combined file set and reports pre-existing issues in files the PR never modified.

Fix

-cmd = f"git diff --name-only {base_sha} {head_sha}"
+cmd = f"git diff --name-only {base_sha}...{head_sha}"

-cmd = f"git diff --unified=0 {base_sha} {head_sha} -- {file_path}"
+cmd = f"git diff --unified=0 {base_sha}...{head_sha} -- {file_path}"

<base>...<head> is git shorthand for merge-base(base, head)..head — only what the branch added since it diverged. No behavior change for up-to-date PRs (when the branch's parent equals base.sha, three-dot and endpoint forms are identical).

Verification

Running the patched script against #6210's actual SHAs:

Found 1 changed files matching pattern
Processing file: docs/integrations/data-ingestion/clickpipes/postgres/scaling.md
Found 5 changed lines in docs/integrations/data-ingestion/clickpipes/postgres/scaling.md

1 file, 5 lines — matches the PR's actual +5 diff exactly. With the same SHAs on the unpatched script, the file list contained 17 files of main-drift.

Test plan

  • Patched script returns 1 file (scaling.md) when run against clickpipes: support 32 vcpus #6210's SHAs locally
  • CI re-run on a behind-main PR (e.g., clickpipes: support 32 vcpus #6210) confirms Vale annotations are scoped to PR-touched files
  • CI re-run on an up-to-date PR confirms no regression — three-dot and endpoint forms are equivalent when the branch is current
  • Spot-check that a PR with multi-hunk edits still produces correct line numbers in logs/changed_lines.json

Related

Sibling fix to #6188, which fixed within-hunk line-range tracking in vale_annotations.py. That PR corrected how the annotation parser maps Vale output to PR lines; this PR corrects which files and lines are considered "PR-changed" in the first place. Both scripts now agree on the PR scope.

🤖 Generated with Claude Code

Switch the two git diff invocations in changed_lines_to_json.py from
endpoint comparison (base head) to three-dot syntax (base...head),
which scopes the diff to what the PR branch added since it diverged
from main. The previous form folded main's drift since the branch
was created into the "PR-changed" file set, causing Vale to run on
files the PR never touched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dhtclk dhtclk requested a review from a team as a code owner May 12, 2026 20:27
@vercel
Copy link
Copy Markdown

vercel Bot commented May 12, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
clickhouse-docs Ready Ready Preview, Comment May 12, 2026 8:38pm
clickhouse-docs-jp Building Building Preview, Comment May 12, 2026 8:38pm
3 Skipped Deployments
Project Deployment Actions Updated (UTC)
clickhouse-docs-ko Ignored Ignored Preview May 12, 2026 8:38pm
clickhouse-docs-ru Ignored Ignored Preview May 12, 2026 8:38pm
clickhouse-docs-zh Ignored Ignored Preview May 12, 2026 8:38pm

Request Review

@dhtclk dhtclk merged commit a9d9dd3 into main May 12, 2026
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant