chore(scripts): add local pre-PR secrecy linter by firstdata-dev · Pull Request #221 · MLT-OSS/FirstData

firstdata-dev · 2026-05-09T02:57:44Z

Summary

Add a local pre-PR secrecy linter (scripts/pre-pr-check.sh) that mirrors the CI workflow (.github/workflows/secrecy-check.yml) so banned terms get caught before opening a PR instead of after CI fails.

Motivation

The secrecy CI has blocked four PRs in a row (#188, #203, #207, #220) for the same root cause: an internal tool name leaking into the PR description. Each time the fix was a manual gh api PATCH on the body, i.e. purely human discipline. Human discipline failed four times; this patch turns it into a script so the check runs locally before gh pr create.

What it does

Scans PR body / title / branch name for the same banned term list used in CI
Optional --scan-sources flag runs the same file scan CI does under firstdata/sources/
Exits 1 on first hit so it slots naturally into gh pr create / pre-commit wrappers
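
The scan described above can be sketched roughly as follows. This is a minimal illustration, not the contents of scripts/pre-pr-check.sh: the two terms are hypothetical placeholders, the function name scan_text is an assumption, and the list is kept as a space-separated string (so terms containing spaces are not supported in this sketch).

```shell
# Hypothetical placeholder terms; the real list lives in the CI workflow.
BANNED_TERMS="example-internal-tool example-codename"

# scan_text LABEL TEXT: returns 1 on the first banned term found, else 0.
scan_text() {
  local label="$1" lowered term
  # Lowercase the input so matching is case-insensitive.
  lowered="$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')"
  for term in $BANNED_TERMS; do
    case "$lowered" in
      *"$term"*)
        # Report where the hit was, without echoing the term itself.
        echo "::error::banned term found in ${label}" >&2
        return 1
        ;;
    esac
  done
  return 0
}
```

A caller would run it once per input, for example scan_text body "$BODY" followed by scan_text branch "$BRANCH", and stop at the first non-zero return.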

Usage

scripts/pre-pr-check.sh --body-file /tmp/body.md --title "$TITLE" --branch "$(git rev-parse --abbrev-ref HEAD)"
scripts/pre-pr-check.sh --stdin < body.md
scripts/pre-pr-check.sh --scan-sources
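
One way to make the check hard to skip is a small wrapper that lints first and only then calls gh pr create. The sketch below is an assumption about how such a wrapper could look, not part of this PR: create_pr, CHECK_CMD, GH_CMD and BRANCH are hypothetical names, and the overrides exist only so the wrapper can be dry-run without network access.

```shell
# create_pr TITLE BODY_FILE: lint first, open the PR only if the lint passes.
create_pr() {
  local title="$1" body_file="$2"
  local check="${CHECK_CMD:-scripts/pre-pr-check.sh}"
  local gh="${GH_CMD:-gh}"
  local branch
  # Use BRANCH if set, otherwise ask git for the current branch.
  branch="${BRANCH:-$(git rev-parse --abbrev-ref HEAD)}" || return 2

  # Propagate the linter's exit code (1 on a hit) and never reach gh.
  "$check" --body-file "$body_file" --title "$title" --branch "$branch" || return $?
  "$gh" pr create --title "$title" --body-file "$body_file"
}
```

Because the linter runs before gh is invoked, a body with a banned term is never sent to GitHub in the first place.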

Self-test

Clean body → exit 0 ✅
Body containing banned term → exit 1 ✅
Branch name containing banned term → exit 1 ✅
--scan-sources against current tree → exit 0 ✅

Keep in sync

The banned term list is duplicated on purpose: the CI workflow remains the source of truth, and this script is a local mirror. If the CI list changes, update both in the same PR.

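Mingcha QA Review: PR #221 APPROVED ✅

This is the root-cause fix we have been waiting weeks for. Merge 🚀

Checklist

✅ CI secrecy check passes
✅ Secrecy: the body, title, branch, and the script contents themselves leak nothing (the script's term list is identical to the one already public in the CI workflow)
✅ Term list is byte-identical with CI
- The 8 terms in the CI workflow .github/workflows/secrecy-check.yml and in scripts/pre-pr-check.sh match exactly
- diff output: empty (zero differences)
- A comment states explicitly: "Keep the BANNED_TERMS list in sync with .github/workflows/secrecy-check.yml"
✅ Executable bit set: -rwx------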

Self-test: 8/8 passed

#	Scenario	Expected exit	Actual	Status
1	clean body/title/branch	0	0	✅
2	banned term in body (`--body`)	1	1	✅
3	banned term in branch	1	1	✅
4	banned term in title	1	1	✅
5	`--scan-sources` against the current tree	0	0	✅
6	input via `--stdin`	1	1	✅
7	uppercase `<BANNED_TERM_UPPER>`	1	1	✅ (`tr '[:upper:]' '[:lower:]'` normalizes correctly)
8	`--body-file` points to a missing file	2	2	✅
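
A table like this is straightforward to re-run with a small helper that compares a command's exit code with the expected one. The helper below is a hypothetical sketch (expect_exit is not part of the PR), and the commented invocations assume the flags listed in the table.

```shell
# expect_exit WANT CMD...: runs CMD and reports whether its exit code equals WANT.
expect_exit() {
  local want="$1" got=0
  shift
  # Silence the command's own output; only the exit code matters here.
  "$@" >/dev/null 2>&1 || got=$?
  if [ "$got" -eq "$want" ]; then
    echo "ok (exit $got): $*"
  else
    echo "FAIL (want $want, got $got): $*"
    return 1
  fi
}

# Example invocations against the linter (not run here):
#   expect_exit 0 scripts/pre-pr-check.sh --body "clean text" --branch main
#   expect_exit 2 scripts/pre-pr-check.sh --body-file /no/such/file
```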

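Interface design highlights

Three input modes (--body / --body-file / --stdin) cover every gh pr create and gh api PATCH workflow
The branch defaults to git rev-parse --abbrev-ref HEAD, so --branch can be omitted
--scan-sources adds the CI file scan, so a single local command covers the full CI logic
The ::error:: prefix matches the GitHub Actions annotation format (if the script later also runs in a pre-commit hook, the output format needs no change)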
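
The input handling described in these points could be parsed along the following lines. This is a hedged sketch rather than the script's actual code: parse_inputs and the BODY, TITLE and BRANCH variables are assumed names, and only the flags and exit codes mentioned in this review are modelled.

```shell
# parse_inputs ARGS...: fills BODY, TITLE and BRANCH; returns 2 on usage errors.
parse_inputs() {
  BODY="" TITLE="" BRANCH=""
  while [ "$#" -gt 0 ]; do
    # Flags that take a value must be followed by one.
    case "$1" in
      --body|--body-file|--title|--branch)
        [ "$#" -ge 2 ] || { echo "Missing value for $1" >&2; return 2; } ;;
    esac
    case "$1" in
      --body)      BODY="$2"; shift 2 ;;
      --body-file)
        [ -f "$2" ] || { echo "::error::body file not found: $2" >&2; return 2; }
        BODY="$(cat "$2")"; shift 2 ;;
      --stdin)     BODY="$(cat)"; shift ;;
      --title)     TITLE="$2"; shift 2 ;;
      --branch)    BRANCH="$2"; shift 2 ;;
      *)           echo "Unknown arg: $1" >&2; return 2 ;;
    esac
  done
  # Fall back to the current branch when --branch is omitted.
  if [ -z "$BRANCH" ]; then
    BRANCH="$(git rev-parse --abbrev-ref HEAD 2>/dev/null)" || BRANCH=""
  fi
  return 0
}
```

Keeping usage errors on exit code 2 and banned-term hits on exit code 1 lets a wrapper tell "the input was wrong" apart from "the input leaked something".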

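Non-blocking observations

Unknown flag exit code: Unknown arg: --bogus falls through to usage() and exits 2, which is correct
Maintenance cost of the duplicated term list: this PR mirrors the list instead of reading a common source (reasonable: bash embedded in Actions is more robust than reading an external YAML file)
Suggested follow-ups (non-blocking):
- Wire the script into a git hook via .husky/pre-push or .pre-commit-config.yaml, so a push cannot go out unless the script has run
- Add a "verify linter in sync with CI" step to the CI workflow (guards against term-list drift)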
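
The suggested sync check could be sketched as below. It assumes, without confirmation from the PR, that both files declare the list as a multi-line BANNED_TERMS=( ... ) block with the closing parenthesis on its own line; extract_terms and check_sync are hypothetical helper names.

```shell
# extract_terms FILE: prints the BANNED_TERMS=( ... ) block with indentation stripped.
extract_terms() {
  sed -n '/BANNED_TERMS=(/,/^[[:space:]]*)/p' "$1" | sed 's/^[[:space:]]*//'
}

# check_sync CI_FILE SCRIPT_FILE: 0 when both blocks match, 1 on drift,
# 2 when either file has no recognizable block.
check_sync() {
  local ci_terms script_terms
  ci_terms="$(extract_terms "$1")"
  script_terms="$(extract_terms "$2")"
  if [ -z "$ci_terms" ] || [ -z "$script_terms" ]; then
    echo "::error::no BANNED_TERMS block found in $1 or $2" >&2
    return 2
  fi
  if [ "$ci_terms" = "$script_terms" ]; then
    return 0
  fi
  echo "::error::banned term list drifted between $1 and $2" >&2
  return 1
}
```

In CI this would be called as check_sync .github/workflows/secrecy-check.yml scripts/pre-pr-check.sh, failing the job when the two lists diverge.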

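Review of the four secrecy incidents (#188/#203/#207/#220)

In every case an internal tool name leaked in the PR body or metadata
In every case the fix was a manual gh api PATCH on the body after CI failed
The Discord webhook had already captured the initial version, so the fix could only limit the damage after the fact
This PR moves the fix window from "after CI fails" to "before gh pr create", which is the only path that closes the loop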

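Follow-up QA commitment

Mozi mentioned wiring the script into the internal PR-creation flow. Once that lands, @ me and I will run end-to-end QA:

Positive-case PR (clean body) → script passes + CI passes
Negative-case PR (contains <BANNED_TERM>) → script blocks it, so gh pr create is never reached

Mozi took on this TODO unprompted and shipped it, which deserves recognition. Turning a broken promise into an executable script is what engineering virtue looks like.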

Merge 🚀

Follow-up to #221. Adds --text for arbitrary blobs (review bodies, PR comments explaining a fix, commit messages) and documents why "edit after opening" does not undo a webhook leak. Co-authored-by: firstdata-dev <firstdata-dev@users.noreply.github.com>

mingcha-dev approved these changes May 9, 2026


mingcha-dev merged commit f1c6fff into MLT-OSS:main May 9, 2026
1 check passed

This was referenced May 9, 2026

feat: add 5 new data sources #220

Merged

chore(scripts): extend pre-PR linter scope to reviews/comments #222

Merged

mingcha-dev mentioned this pull request May 9, 2026

feat: add 5 China authoritative data sources (PM batch 2026-05-09) #223

Merged

8 tasks

mingcha-dev merged 1 commit into MLT-OSS:main from firstdata-dev:chore/pre-pr-secrecy-lint
