Skip to content

[TRTLLM-12152][infra] change based testing rules on tests#13899

Open
crazydemo wants to merge 14 commits intoNVIDIA:mainfrom
crazydemo:cbts-v1
Open

[TRTLLM-12152][infra] change based testing rules on tests#13899
crazydemo wants to merge 14 commits intoNVIDIA:mainfrom
crazydemo:cbts-v1

Conversation

@crazydemo
Copy link
Copy Markdown
Collaborator

@crazydemo crazydemo commented May 8, 2026

Summary by CodeRabbit

  • New Features

    • Added three new change-based test selection rules: test definition matching, test list validation, and out-of-scope detection.
    • Introduced dry-run tool to test CBTS behavior across historical commits.
    • Enhanced test matching to support both test IDs and file path lookups.
  • Documentation

    • Expanded rule system documentation with detailed scope definitions and fallback behaviors.
    • Added comprehensive rule architecture guide with trigger patterns and stage resolution logic.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

crazydemo added 14 commits May 8, 2026 17:39
Three rules cover the change surfaces that map to test selection:
WaivesRule (existing), TestsDefRule, TestListRule. A PR mixing them
combines to scope=testsonly via _TESTSONLY_FAMILY in _combine_scopes,
so test-def + test-list + waive edits in one PR still narrow.

blocks.py:
- YAMLIndex._yaml_first_components: top-level path components from
  every entry's canonical target. Drives namespace boundary discovery.
- YAMLIndex.git_path_to_yaml_key(git_path): translate repo-relative
  path to YAML namespace by walking from the first matching component.
  Returns None when no component matches (file outside any YAML tree).
- YAMLIndex.find_match_for_path(path): bidirectional lineage match in
  the pytest tree. Per block, prefix is the most-ancestral covering
  target (clamped at the anchor when only descendants match).
- _path_lookup_anchor: conftest.py / __init__.py anchor on enclosing
  dir; other files anchor on themselves.

rules/_helpers.py (new): pure-function pipeline shared by rules:
- stages_by_yaml_stem
- iter_diff_changes / iter_diff_post_line_numbers
- lookup_ids_into_block_filters (waives + testlist path)
- lookup_paths_into_block_filters (testdef path; registers each
  covering entry's raw text so it passes its own -k guard)
- resolve_affected_stages

rules/waives_rule.py: refactored to use _helpers; behavior unchanged.
parse_waives_diff inlined (no external importers).

rules/tests_def_rule.py (new): narrows on .py changes under tests/.
- AST scope mapping: function-level (file::TestC::test_m), class-level
  (file::TestC, when change is in setup_method etc.), or module-level
  -> file fallback. Reads post-PR file content for AST.
- Blast-radius cap (0.8) so changes lighting up most blocks fall back
  rather than narrow to "everything".
- sanity_relevant from l0_sanity_check membership; perfsanity_relevant
  from l0_perf / *perf_sanity* stems.

rules/test_list_rule.py (new): narrows on test-db YAML changes.
- Only ADDED entries drive narrow. Removed entries don't trigger
  verification. Comments / blank lines ignored. Any other non-entry
  +/- line (condition / mako / whole-block) -> fallback.

main.py: register both new rules, plumb repo_root to TestsDefRule via
build_rules; add _TESTSONLY_FAMILY combine to _combine_scopes.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Some PR file paths (currently: tests/integration/test_lists/qa/**) are
consumed only by post-merge / nightly QA workflows, not by pre-merge CI.
Previously such PRs hit Selector's "Unhandled files" path and fell back
to running the full pre-merge baseline, wasting CI on changes the
pipeline doesn't read.

OutOfScopeRule (rules/out_of_scope_rule.py): claims files matching
OUT_OF_SCOPE_PREFIXES, emits scope="noop" with empty stages and
sanity_relevant=False / perfsanity_relevant=False.

main.py:
- Register OutOfScopeRule.
- _combine_scopes: scope="noop" gives way to any other actionable
  rule that fired in the same PR; "noop" is returned only when it's
  the sole scope.
- Selector.run: skip the empty-stages safety net when scope=="noop"
  so the empty stage set passes through to Groovy Layer 2 instead of
  promoting to scope=None / fallback. Groovy already produces a no-op
  parallelJobsFiltered for cbts != null with empty affected_stages.

Effect: a PR that only touches tests/integration/test_lists/qa/**
narrows to 0 test stages (Build still runs upstream of CBTS Layer 2).
Mixed PRs combine normally -- noop drops out and the actionable scope
drives narrowing.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
…odules

Two related blocks.py fixes that recover narrow on PRs previously
falling back to baseline.

1. _yaml_first_components parser missed top-level YAML entries
   (e.g. test_e2e.py::test_X). The old code took canonical.split("/", 1)[0]
   which returns "test_e2e.py::test_X" for a top-level entry, then rejected
   it because "::" is in the head. Fix: split by "/" then by "::" then by
   "[" to extract the file/dir component cleanly. Adds test_e2e.py /
   test_doc.py / test_fmha.py to the namespace boundary set.

2. _path_lookup_anchor previously special-cased only conftest.py and
   __init__.py to anchor on enclosing dir. Generalized to: any file whose
   basename does not start with `test_` anchors on its enclosing dir.
   Captures helper modules like disagg_test_utils.py, pytorch_model_config.py,
   perf_regression_utils.py, etc. — they're typically imported by tests in
   the same dir, and the blast-radius cap (0.8) protects against helpers
   covering most blocks.

Dry-run on last 200 upstream/main commits (60 tests-only PRs) under
/bot run mode: fallback count drops 12 -> 8, gaining narrow on:
  - pr-13482 (test_e2e.py): None -> testdefonly, 39 stages
  - pr-12796 (perf/pytorch_model_config.py): None -> testdefonly, 19 stages
  - pr-13367 (perf/perf_regression_utils.py): None -> testdefonly, 8 stages
  - pr-13496 (perf/open_search_db_utils.py): None -> testsonly, 8 stages

Remaining fallbacks are intentional: top-level conftest blast radius,
waivesonly unmatchable safety net, or test methods not yet enrolled in
any YAML test list.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Previously any waive id that found no covering YAML entry caused
WaivesRule to return scope=None, vetoing the whole PR's narrow even
when other rules (testdef / testlist) had a clean narrow contribution.
Unmatchable waives are pre-merge no-ops (the test isn't in any
pre-merge YAML test list, so adding/removing its SKIP doesn't affect
what runs), so vetoing is over-conservative.

Drop the early scope=None branch on misses. Continue with the matched
ids' narrow (lookup_ids_into_block_filters already excludes misses
from block_filters), and append a note to the reason string for
visibility. Pure-waivesonly all-miss PRs still fall back via Selector's
empty-stages safety net (no other rule contributes -> nothing to run).
Mixed PRs (e.g. waives.txt removal + helper module change) now combine
correctly to testsonly.

Effect on the 60-PR dry-run: pr-13634 (disagg_test_utils.py +
unmatchable waive removals) goes from None -> testsonly, 21 stages.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
…tion

Two empty-narrow paths in WaivesRule were emitting scope="waiveonly"
with empty stages, tripping Selector's empty-stages safety net and
falling back to baseline:

- diff has no parseable test ids (whitespace / comment-only edits)
- all waive ids are unmatchable (test not in any pre-merge YAML)

Both are pre-merge no-ops semantically: they don't add or remove any
test from the run set. Emit scope="noop" (mirroring OutOfScopeRule),
so the Selector exception for noop scope passes through to Groovy as
a true no-op and mixed PRs combine cleanly via _combine_scopes drop-
ping noop in favor of any actionable rule that fired.

Effect on the 60-PR dry-run: pure-waivesonly all-miss PRs (pr-13254,
pr-13363, pr-13437, pr-13504, pr-13508) move from None -> noop.
Mixed pr-13634 (helper + unmatchable waive) goes testsonly ->
testdefonly (waives is now noop, only testdef contributes narrow).

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
…tribution

Apply the principle that rule firing depends only on file-pattern match,
not on YAML lookup outcome:

testdef: out-of-namespace files (e.g. tests/unittest/...) and in-namespace
files with no covering YAML entry are now claimed as scope=noop instead
of being left unhandled (which forced fallback). Partial-narrow runs note
the no-narrow path count in the reason.

testlist: "no additions" and "all-miss" paths now return scope=noop
(previously testlistonly with empty stages and scope=None respectively).
Partial-miss runs note unresolved entries in the reason, mirroring
waivesonly's miss_note pattern.

This unblocks PRs whose only "unhandled" files are integration test
modules that don't yet appear in any YAML test list (e.g. test_fp8 when
only test_nvfp4 variants are listed) — they now narrow via class-level
lineage instead of falling back to full CI.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
… noop

Out-of-namespace .py files (where git_path_to_yaml_key returns None,
e.g. tests/integration/defs/conftest.py — the top-level integration
conftest auto-imported by every integration test) can have wide blast
radius via implicit pytest discovery. Treating them as noop silently
hides this risk.

Restore the asymmetric handling:
- in-namespace + empty lookup: rule decided no impact → scope=noop
- out-of-namespace: rule cannot decide → unhandled → fallback (None)

Reason text now distinguishes "in-namespace path(s) with no covering
YAML entry" from out-of-namespace paths.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Top-level: replace stale "v0 scope" section with current 4-rule map,
scope catalog (waiveonly / testdefonly / testlistonly / testsonly /
noop / null), and trigger-mode filter section. Update INPUT_JSON
and decision JSON examples to reflect post_merge field, sanity/
perfsanity flags, and noop scope.

rules/README: full per-rule writeup (WaivesRule, TestsDefRule,
TestListRule, OutOfScopeRule) covering each rule's diff handling,
lookup, and outcome matrix. Add helpers table for _helpers.py.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Markdown files under tests/ have no test-execution impact; claim them as
scope=noop so PRs that only touch docs (or mix docs with a code change)
don't fall back over the .md files.

Adds OUT_OF_SCOPE_TESTS_SUFFIXES alongside the existing
OUT_OF_SCOPE_PREFIXES; both feed _is_out_of_scope. Updates rule docs in
the top-level and rules/ READMEs.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Generalize TestsDefRule to handle any path under tests/ in YAML
namespace — not just .py files. find_match_for_path walks up enclosing
directories to the narrowest YAML-covered ancestor, so:

- accuracy/references/*.yaml          → anchor accuracy/
- disaggregated/test_configs/*.yaml   → anchor disaggregated/
- unittest/api_stability/references/*.yaml → anchor unittest/api_stability/
- conftest.py / __init__.py / helpers (existing)

For test_*.py files, file-level anchor is preserved (no walk-up) so
new test files in already-covered dirs still get the noop semantic
when no YAML entry covers them.

Out-of-namespace paths (top-level integration conftest, dirs no YAML
references like tests/integration/test_input_files/) still fall back —
those could have wide implicit blast radius.

Smoke-tested across accuracy refs, disagg configs, api_stability refs,
input fixtures, and a mixed py+yaml PR. Re-ran 60-PR dry-run; same
distribution (none of the historical PRs touched these data files in
the sampled window).

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
…hors

Each top-level YAML key in `accuracy/references/<dataset>.yaml` is a
HF model name. AST-scan `accuracy/test_*.py` for class-level
`MODEL_NAME = "<hf>"` literals to build {model_name → [Class anchor]}
(cached). A diff line under `meta-llama/Llama-3.1-8B-Instruct:` narrows
to `accuracy/test_*.py::TestLlama3_1_8BInstruct` only.

Smoke: single-model diff on mmlu.yaml → 21 blocks/45 stages instead of
the dir walk-up's 48 blocks/90 stages (>50% reduction).

Models in the YAML with no matching test class (aliases, new entries
without test code yet) fall back to the dir-level anchor — same as
before. No diff / unreadable file → same fallback.

Map covers 110 distinct MODEL_NAMEs across 9 accuracy/test_*.py files.
For mmlu.yaml's 65 model keys, 59 (90%) match a test class.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
…ts / microbenchmarks

Add three subtrees no L0 pipeline (pre-merge or post-merge) consumes:

- tests/integration/test_lists/dev/      — dev artifacts, no L0 consumer
- tests/integration/defs/.test_durations — pytest-split timing cache
- tests/microbenchmarks/                  — benchmarking scripts only

Verified consumption: dev/ has no Jenkins or test-db reference;
.test_durations is a pytest cache file; microbenchmarks/ has zero
references in jenkins/ and only a single internal cross-link from
allreduce_perf/ — no L0 stage runs it.

500-commit dry-run effective narrow rate: 97.6% (161/165), up from 95.8%
(158/165). Three previously-fallback PRs (12978, 12887, 12886) now
correctly resolve to noop.

tests/scripts/perf-sanity/aggregated/*.yaml is intentionally left as
fallback: those configs are loaded by perf/test_perf_sanity.py at
runtime in both pre-merge and post-merge PerfSanity stages, so noop
would be wrong. Refining them to per-config narrow is a future-work
item.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
A standalone `test_*.py` whose path is out-of-namespace
(`git_path_to_yaml_key` returns None) is one no L0 YAML references —
i.e. no L0 stage runs it. Pytest doesn't auto-import test files into
other tests, so editing such a file can't change what L0 selects.
Claim it as noop instead of falling back.

The existing fallback path is preserved for non-test out-of-namespace
files (conftest, __init__, helpers, data fixtures) — those can still
impact selection via implicit pytest discovery / sys.path imports.

500-commit dry-run effective narrow rate: 98.2% (162/165), up from
97.6%. PR-12625 (which deleted top-level tests/integration/defs/
test_mlpf_results.py and test_sanity.py — neither in any YAML) now
narrows correctly via the 20 in-namespace test files it also touched.

Reword the partial-narrow miss-note to "N path(s) not in any L0 YAML"
since `no_match` now covers both in-namespace files with no covering
entry and standalone out-of-namespace test files.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
…ical commits

Promotes the one-shot /tmp/cbts_dryrun.py into a permanent debug CLI
under jenkins/scripts/cbts/tools/. Snapshots test-db YAMLs and
L0_Test.groovy at each commit's own SHA so results reflect the CBTS
decision at the time the PR landed. Supports tests-only / all
filters, --limit, --post-merge, and explicit --sha replay; writes
per-PR summary.txt + narrowed cbts_test_db YAMLs + a top-level
INDEX.md.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
@crazydemo crazydemo requested review from a team as code owners May 8, 2026 09:45
@crazydemo crazydemo requested review from dpitman-nvda and niukuo May 8, 2026 09:45
@crazydemo crazydemo requested a review from QiJune May 8, 2026 09:45
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 8, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR expands CBTS from a single-rule waive-matching system to a multi-rule architecture. It adds path-based YAML matching in YAMLIndex, introduces three new rules (TestsDefRule for test file changes, TestListRule for test-db YAML additions, OutOfScopeRule for exclusions), refactors WaivesRule to use shared helpers, updates scope-combination logic to merge testsonly-family scopes, and includes a dry-run tool for testing against historical commits.

Changes

CBTS Multi-Rule System

Layer / File(s) Summary
Architecture & Documentation
jenkins/scripts/cbts/README.md, jenkins/scripts/cbts/rules/README.md
READMEs expanded to document four-rule system, scope combinations, YAML lookup algorithms including path-based matching, trigger-mode filtering, and guidelines for adding rules.
YAMLIndex Path Matching
jenkins/scripts/cbts/blocks.py
Added git-path-to-YAML-namespace translation (git_path_to_yaml_key), path-based block matching (find_match_for_path), and support data (_yaml_first_components, _path_lookup_anchor).
Shared Helper Utilities
jenkins/scripts/cbts/rules/_helpers.py
New module providing diff parsing (iter_diff_changes, iter_diff_post_line_numbers), stage grouping (stages_by_yaml_stem), and block-filter resolution (lookup_ids_into_block_filters, lookup_paths_into_block_filters, resolve_affected_stages).
Rule Implementations
jenkins/scripts/cbts/rules/out_of_scope_rule.py, jenkins/scripts/cbts/rules/test_list_rule.py, jenkins/scripts/cbts/rules/tests_def_rule.py, jenkins/scripts/cbts/rules/waives_rule.py
OutOfScopeRule identifies out-of-scope paths; TestListRule narrows stages from test-db YAML additions; TestsDefRule maps test file changes to affected blocks via AST and path matching; WaivesRule refactored to use shared helpers.
Rules Orchestration
jenkins/scripts/cbts/main.py
RULE_CLASSES extended to register all four rules; build_rules() now accepts repo_root for TestsDefRule; _combine_scopes() merges waiveonly/testdefonly/testlistonly into testsonly; selector respects noop scope.
Dry-Run Testing Tool
jenkins/scripts/cbts/tools/dryrun.py
New executable script replaying CBTS over historical git commits, generating per-commit output directories and an INDEX.md summary with scope distribution.

🎯 4 (Complex) | ⏱️ ~75 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: introducing new change-based testing rules focused on test file modifications to improve CI test selection.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

Tools execution failed with the following error:

Failed to run tools: 14 UNAVAILABLE: read ECONNRESET

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@jenkins/scripts/cbts/rules/test_list_rule.py`:
- Around line 40-45: The _ENTRY_RE currently treats any indented list item as a
test entry; restrict matching so only list items that are children of a tests:
field are captured (or alternatively parse indentation context) by checking for
a preceding "tests:" key and matching indentation level before accepting "-
<pytest_id>" — update _ENTRY_RE and the related logic (also applied in the code
around lines 72-84) to require the same indentation as the tests: block or to
detect the nearest parent key == "tests" before classifying entries, so
wildcard/non-tests lists (e.g. "- \"*h100*\"") are not misclassified as test
IDs.

In `@jenkins/scripts/cbts/rules/tests_def_rule.py`:
- Around line 143-146: The try/except that reads the changed test asset with
(self._repo_root / git_path).read_text(encoding="utf-8") only catches OSError
but not decode failures; update the handler to also catch UnicodeDecodeError (or
UnicodeError) so binary/non-UTF8 fixtures fall back to returning [yaml_path]
instead of crashing, and make the same change for the second occurrence that
mirrors this logic later in the file (the other read_text block around the
second occurrence).
- Around line 167-181: The current per-model refinement is unsafe because
iter_diff_post_line_numbers can misattribute deleted top-level model sections;
fix by detecting deletions in the raw diff and falling back to the full file
when any deletion could affect top-level keys: in the block using diff, before
calling iter_diff_post_line_numbers/_yaml_top_keys_for_lines, check the diff
string for removal lines (e.g. any line starting with '-' in the unified diff
hunks) and if found return [yaml_path]; keep the existing flow (diff ->
iter_diff_post_line_numbers -> changed_models -> model_map -> anchors) only when
no deletion lines are present so changed_models are provable from the post-image
alone.
- Around line 218-247: In apply(), markdown docs under test YAML trees are being
processed into actionable scopes; before calling
yaml_index.git_path_to_yaml_key/_compute_anchors/_combine_scopes, detect
markdown (e.g., git_path.endswith(".md") or basename == "README.md") and treat
them as noop by adding the path to handled and no_match (or out_of_namespace
depending on existing semantics) and continue, so markdown files do not produce
testdefonly results; update the branch that currently handles yaml_path != None
to early-skip .md files before calling _compute_anchors/_combine_scopes.

In `@jenkins/scripts/cbts/tools/dryrun.py`:
- Around line 314-323: The CLI currently allows invalid numeric values for
--window and --limit; update the ap.add_argument calls for "--window" and
"--limit" to validate at parse time by using custom argparse types (or a
validator) that raise argparse.ArgumentTypeError for invalid inputs: implement a
positive_int(value) used for --window that ensures int(value) > 0, and use a
positive_int_or_none(value) (or accept None via default and validate if
provided) for --limit to ensure int(value) > 0 when not None; replace the
current type=int usages in the ap.add_argument calls and/or add explicit
argparse validation so invalid or negative values are rejected during parsing.
- Around line 252-255: The loop over pr_dir.iterdir() uses old.unlink() which
raises on directories and causes crashes when --keep-stale leaves
subdirectories; update the cleanup in dryrun.py to detect whether old is a
file/symlink or a directory (using old.is_dir()/old.is_file()/old.is_symlink())
and call the appropriate remover (unlink for files/symlinks, shutil.rmtree for
directories) or otherwise skip non-removable entries so per-label cleanup in the
for loop does not raise; adjust imports if needed and keep the rest of the logic
(the pr_dir.iterdir() loop and subsequent shared_out assignment) unchanged.
- Around line 100-113: The snapshot helpers (_snapshot_test_db and
_snapshot_groovy) currently call _git with check=False and unconditionally write
.stdout, which can create empty/missing snapshot files; change them to fail fast
by checking the _git result (use check=True or validate result.returncode and
result.stdout) and raise/abort if the git show/ls-tree calls failed or returned
empty content for any expected file; specifically validate the outputs from
_git(repo, "ls-tree", ..., sha, "--", TEST_DB_REL) and each _git(repo, "show",
f"{sha}:{f}") in _snapshot_test_db, and validate the _git(repo, "show",
f"{sha}:{GROOVY_REL}") call in _snapshot_groovy, and only write files when the
calls succeeded and stdout is non-empty, otherwise raise an error so snapshot
creation fails fast.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 622d9a53-626c-4e14-892b-a9e0af1b6311

📥 Commits

Reviewing files that changed from the base of the PR and between 2e4b05c and 2df702c.

📒 Files selected for processing (10)
  • jenkins/scripts/cbts/README.md
  • jenkins/scripts/cbts/blocks.py
  • jenkins/scripts/cbts/main.py
  • jenkins/scripts/cbts/rules/README.md
  • jenkins/scripts/cbts/rules/_helpers.py
  • jenkins/scripts/cbts/rules/out_of_scope_rule.py
  • jenkins/scripts/cbts/rules/test_list_rule.py
  • jenkins/scripts/cbts/rules/tests_def_rule.py
  • jenkins/scripts/cbts/rules/waives_rule.py
  • jenkins/scripts/cbts/tools/dryrun.py

Comment on lines +40 to +45
# `<2+ space indent>- <pytest_id>`. Top-level `- condition:` (column-0
# block-list item) intentionally does not match — the leading `\s+`
# discriminates entry-within-block from block-boundary edits.
_ENTRY_RE = re.compile(r"^\s+-\s+(\S.*)$")
_COMMENT_RE = re.compile(r"^\s*#")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Don’t classify every indented YAML list item as a test entry.

_ENTRY_RE also matches list items under non-tests: fields, e.g. a changed wildcard list item like - "*h100*". Those edits then get parsed as added/removed test IDs instead of structural YAML changes, so this rule can return noop or narrow when it should force fallback because stage selection semantics changed.

Also applies to: 72-84

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@jenkins/scripts/cbts/rules/test_list_rule.py` around lines 40 - 45, The
_ENTRY_RE currently treats any indented list item as a test entry; restrict
matching so only list items that are children of a tests: field are captured (or
alternatively parse indentation context) by checking for a preceding "tests:"
key and matching indentation level before accepting "- <pytest_id>" — update
_ENTRY_RE and the related logic (also applied in the code around lines 72-84) to
require the same indentation as the tests: block or to detect the nearest parent
key == "tests" before classifying entries, so wildcard/non-tests lists (e.g. "-
\"*h100*\"") are not misclassified as test IDs.

Comment on lines +143 to +146
try:
content = (self._repo_root / git_path).read_text(encoding="utf-8")
except OSError:
return [yaml_path]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle decode failures when changed test assets aren’t UTF-8 text.

Path.read_text() can raise UnicodeDecodeError for binary or non-UTF8 fixtures under tests/. Right now that bubbles out of CBTS instead of taking the intended file-level fallback, so a fixture-only PR can crash selection.

Suggested fix
-        except OSError:
+        except (OSError, UnicodeError):
             return [yaml_path]
@@
-        except OSError:
+        except (OSError, UnicodeError):
             return [yaml_path]

Also applies to: 172-175

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@jenkins/scripts/cbts/rules/tests_def_rule.py` around lines 143 - 146, The
try/except that reads the changed test asset with (self._repo_root /
git_path).read_text(encoding="utf-8") only catches OSError but not decode
failures; update the handler to also catch UnicodeDecodeError (or UnicodeError)
so binary/non-UTF8 fixtures fall back to returning [yaml_path] instead of
crashing, and make the same change for the second occurrence that mirrors this
logic later in the file (the other read_text block around the second
occurrence).

Comment on lines +167 to +181
if not diff:
return [yaml_path]
line_numbers = iter_diff_post_line_numbers(diff)
if not line_numbers:
return [yaml_path]
try:
content = (self._repo_root / git_path).read_text(encoding="utf-8")
except OSError:
return [yaml_path]
changed_models = _yaml_top_keys_for_lines(content, line_numbers)
if not changed_models:
return [yaml_path]
model_map = self._accuracy_model_to_classes()
anchors = sorted({a for m in changed_models for a in model_map.get(m, ())})
return anchors or [yaml_path]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Per-model refinement is unsafe when the diff removes reference sections.

iter_diff_post_line_numbers() anchors - lines to the next post-image line, so deleting or moving an entire top-level model section in accuracy/references/*.yaml can be attributed to the following model. That narrows to the wrong test class. The safe behavior here is to fall back to [yaml_path] whenever the refinement can’t prove the changed top-level keys from the post-image alone.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@jenkins/scripts/cbts/rules/tests_def_rule.py` around lines 167 - 181, The
current per-model refinement is unsafe because iter_diff_post_line_numbers can
misattribute deleted top-level model sections; fix by detecting deletions in the
raw diff and falling back to the full file when any deletion could affect
top-level keys: in the block using diff, before calling
iter_diff_post_line_numbers/_yaml_top_keys_for_lines, check the diff string for
removal lines (e.g. any line starting with '-' in the unified diff hunks) and if
found return [yaml_path]; keep the existing flow (diff ->
iter_diff_post_line_numbers -> changed_models -> model_map -> anchors) only when
no deletion lines are present so changed_models are provable from the post-image
alone.

Comment on lines +218 to +247
def apply(self, pr: PRInputs) -> Optional[RuleResult]:
candidates = [f for f in pr.changed_files if f.startswith("tests/")]
if not candidates:
return None

block_filters: dict[tuple[str, int], dict[str, set[str]]] = {}
narrowed: set[str] = set()
handled: set[str] = set()
out_of_namespace: set[str] = set()
no_match: set[str] = set()

for git_path in candidates:
yaml_path = self.yaml_index.git_path_to_yaml_key(git_path)
if yaml_path is None:
base = git_path.rsplit("/", 1)[-1]
if base.startswith("test_") and base.endswith(".py"):
# Standalone test_*.py not referenced by any L0 YAML —
# pytest doesn't auto-import test files into other
# tests, so this file's edits can't affect what L0
# runs. Claim as noop contribution.
handled.add(git_path)
no_match.add(git_path)
continue
# conftest / __init__ / helper / data — could impact
# selection via implicit pytest discovery; let Selector
# report it as Unhandled → fallback.
out_of_namespace.add(git_path)
continue
handled.add(git_path)
anchors = self._compute_anchors(git_path, yaml_path, pr.diffs.get(git_path, ""))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Skip out-of-scope markdown before attempting testdef narrowing.

An in-namespace doc like tests/integration/defs/accuracy/README.md will currently be translated to accuracy/README.md, walked up to accuracy/, and produce an actionable testdefonly result. Because _combine_scopes() lets actionable scopes override noop, that defeats OutOfScopeRule and still schedules CI for markdown-only edits under YAML-covered trees.

Suggested fix
+from .out_of_scope_rule import _is_out_of_scope
+
     def apply(self, pr: PRInputs) -> Optional[RuleResult]:
         candidates = [f for f in pr.changed_files if f.startswith("tests/")]
         if not candidates:
             return None

         block_filters: dict[tuple[str, int], dict[str, set[str]]] = {}
@@
         for git_path in candidates:
+            if _is_out_of_scope(git_path):
+                continue
             yaml_path = self.yaml_index.git_path_to_yaml_key(git_path)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def apply(self, pr: PRInputs) -> Optional[RuleResult]:
candidates = [f for f in pr.changed_files if f.startswith("tests/")]
if not candidates:
return None
block_filters: dict[tuple[str, int], dict[str, set[str]]] = {}
narrowed: set[str] = set()
handled: set[str] = set()
out_of_namespace: set[str] = set()
no_match: set[str] = set()
for git_path in candidates:
yaml_path = self.yaml_index.git_path_to_yaml_key(git_path)
if yaml_path is None:
base = git_path.rsplit("/", 1)[-1]
if base.startswith("test_") and base.endswith(".py"):
# Standalone test_*.py not referenced by any L0 YAML —
# pytest doesn't auto-import test files into other
# tests, so this file's edits can't affect what L0
# runs. Claim as noop contribution.
handled.add(git_path)
no_match.add(git_path)
continue
# conftest / __init__ / helper / data — could impact
# selection via implicit pytest discovery; let Selector
# report it as Unhandled → fallback.
out_of_namespace.add(git_path)
continue
handled.add(git_path)
anchors = self._compute_anchors(git_path, yaml_path, pr.diffs.get(git_path, ""))
def apply(self, pr: PRInputs) -> Optional[RuleResult]:
candidates = [f for f in pr.changed_files if f.startswith("tests/")]
if not candidates:
return None
block_filters: dict[tuple[str, int], dict[str, set[str]]] = {}
narrowed: set[str] = set()
handled: set[str] = set()
out_of_namespace: set[str] = set()
no_match: set[str] = set()
for git_path in candidates:
if _is_out_of_scope(git_path):
continue
yaml_path = self.yaml_index.git_path_to_yaml_key(git_path)
if yaml_path is None:
base = git_path.rsplit("/", 1)[-1]
if base.startswith("test_") and base.endswith(".py"):
# Standalone test_*.py not referenced by any L0 YAML —
# pytest doesn't auto-import test files into other
# tests, so this file's edits can't affect what L0
# runs. Claim as noop contribution.
handled.add(git_path)
no_match.add(git_path)
continue
# conftest / __init__ / helper / data — could impact
# selection via implicit pytest discovery; let Selector
# report it as Unhandled → fallback.
out_of_namespace.add(git_path)
continue
handled.add(git_path)
anchors = self._compute_anchors(git_path, yaml_path, pr.diffs.get(git_path, ""))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@jenkins/scripts/cbts/rules/tests_def_rule.py` around lines 218 - 247, In
apply(), markdown docs under test YAML trees are being processed into actionable
scopes; before calling
yaml_index.git_path_to_yaml_key/_compute_anchors/_combine_scopes, detect
markdown (e.g., git_path.endswith(".md") or basename == "README.md") and treat
them as noop by adding the path to handled and no_match (or out_of_namespace
depending on existing semantics) and continue, so markdown files do not produce
testdefonly results; update the branch that currently handles yaml_path != None
to early-skip .md files before calling _compute_anchors/_combine_scopes.

Comment on lines +100 to +113
def _snapshot_test_db(repo: Path, sha: str, dst: Path) -> Path:
out = _git(repo, "ls-tree", "-r", "--name-only", sha, "--", TEST_DB_REL, check=False)
files = [ln for ln in out.stdout.splitlines() if ln.endswith(".yml")]
db = dst / "test-db"
db.mkdir(parents=True, exist_ok=True)
for f in files:
(db / Path(f).name).write_text(_git(repo, "show", f"{sha}:{f}", check=False).stdout)
return db


def _snapshot_groovy(repo: Path, sha: str, dst: Path) -> Path:
p = dst / "L0_Test.groovy"
p.write_text(_git(repo, "show", f"{sha}:{GROOVY_REL}", check=False).stdout)
return p
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast when snapshot files are unavailable.

The current check=False + unconditional write_text(...stdout) path can silently create empty snapshots, which can produce misleading CBTS decisions instead of a clear replay failure.

Suggested fix
 def _snapshot_test_db(repo: Path, sha: str, dst: Path) -> Path:
     out = _git(repo, "ls-tree", "-r", "--name-only", sha, "--", TEST_DB_REL, check=False)
     files = [ln for ln in out.stdout.splitlines() if ln.endswith(".yml")]
     db = dst / "test-db"
     db.mkdir(parents=True, exist_ok=True)
     for f in files:
-        (db / Path(f).name).write_text(_git(repo, "show", f"{sha}:{f}", check=False).stdout)
+        shown = _git(repo, "show", f"{sha}:{f}", check=False)
+        if shown.returncode != 0:
+            raise subprocess.CalledProcessError(
+                shown.returncode, shown.args, output=shown.stdout, stderr=shown.stderr
+            )
+        (db / Path(f).name).write_text(shown.stdout)
     return db


 def _snapshot_groovy(repo: Path, sha: str, dst: Path) -> Path:
     p = dst / "L0_Test.groovy"
-    p.write_text(_git(repo, "show", f"{sha}:{GROOVY_REL}", check=False).stdout)
+    shown = _git(repo, "show", f"{sha}:{GROOVY_REL}", check=False)
+    if shown.returncode != 0:
+        raise subprocess.CalledProcessError(
+            shown.returncode, shown.args, output=shown.stdout, stderr=shown.stderr
+        )
+    p.write_text(shown.stdout)
     return p
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@jenkins/scripts/cbts/tools/dryrun.py` around lines 100 - 113, The snapshot
helpers (_snapshot_test_db and _snapshot_groovy) currently call _git with
check=False and unconditionally write .stdout, which can create empty/missing
snapshot files; change them to fail fast by checking the _git result (use
check=True or validate result.returncode and result.stdout) and raise/abort if
the git show/ls-tree calls failed or returned empty content for any expected
file; specifically validate the outputs from _git(repo, "ls-tree", ..., sha,
"--", TEST_DB_REL) and each _git(repo, "show", f"{sha}:{f}") in
_snapshot_test_db, and validate the _git(repo, "show", f"{sha}:{GROOVY_REL}")
call in _snapshot_groovy, and only write files when the calls succeeded and
stdout is non-empty, otherwise raise an error so snapshot creation fails fast.

Comment on lines +252 to +255
for old in pr_dir.iterdir():
if old.name != "summary.txt":
old.unlink()
shared_out = repo / "cbts_test_db"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle stale directories during per-label cleanup.

old.unlink() will raise on directories. With --keep-stale, this can crash replay for a label if any subdirectory exists from prior runs.

Suggested fix
     for old in pr_dir.iterdir():
         if old.name != "summary.txt":
-            old.unlink()
+            if old.is_dir():
+                shutil.rmtree(old)
+            else:
+                old.unlink()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@jenkins/scripts/cbts/tools/dryrun.py` around lines 252 - 255, The loop over
pr_dir.iterdir() uses old.unlink() which raises on directories and causes
crashes when --keep-stale leaves subdirectories; update the cleanup in dryrun.py
to detect whether old is a file/symlink or a directory (using
old.is_dir()/old.is_file()/old.is_symlink()) and call the appropriate remover
(unlink for files/symlinks, shutil.rmtree for directories) or otherwise skip
non-removable entries so per-label cleanup in the for loop does not raise;
adjust imports if needed and keep the rest of the logic (the pr_dir.iterdir()
loop and subsequent shared_out assignment) unchanged.

Comment on lines +314 to +323
"--window", type=int, default=500, help="commits back from --ref (default: 500)"
)
ap.add_argument(
"--filter",
choices=["tests-only", "all"],
default="tests-only",
help="commit filter (default: tests-only)",
)
ap.add_argument("--limit", type=int, default=None, help="cap commits after filter")
ap.add_argument(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate numeric CLI bounds at parse time.

--window and --limit currently accept invalid values (e.g., --window 0, negative values), which can cause runtime failures or surprising selection behavior.

Suggested fix
+def _positive_int(value: str) -> int:
+    ivalue = int(value)
+    if ivalue <= 0:
+        raise argparse.ArgumentTypeError("must be > 0")
+    return ivalue
+
+
+def _non_negative_int(value: str) -> int:
+    ivalue = int(value)
+    if ivalue < 0:
+        raise argparse.ArgumentTypeError("must be >= 0")
+    return ivalue
+
+
 def main(argv: Optional[list[str]] = None) -> int:
@@
-    ap.add_argument(
-        "--window", type=int, default=500, help="commits back from --ref (default: 500)"
-    )
+    ap.add_argument(
+        "--window", type=_positive_int, default=500, help="commits back from --ref (default: 500)"
+    )
@@
-    ap.add_argument("--limit", type=int, default=None, help="cap commits after filter")
+    ap.add_argument("--limit", type=_non_negative_int, default=None, help="cap commits after filter")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@jenkins/scripts/cbts/tools/dryrun.py` around lines 314 - 323, The CLI
currently allows invalid numeric values for --window and --limit; update the
ap.add_argument calls for "--window" and "--limit" to validate at parse time by
using custom argparse types (or a validator) that raise
argparse.ArgumentTypeError for invalid inputs: implement a positive_int(value)
used for --window that ensures int(value) > 0, and use a
positive_int_or_none(value) (or accept None via default and validate if
provided) for --limit to ensure int(value) > 0 when not None; replace the
current type=int usages in the ap.add_argument calls and/or add explicit
argparse validation so invalid or negative values are rejected during parsing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant