[TRTLLM-12152][infra] change based testing rules on tests by crazydemo · Pull Request #13899 · NVIDIA/TensorRT-LLM

crazydemo · 2026-05-08T09:45:42Z

Summary by CodeRabbit

New Features
- Added three new change-based test selection rules: test definition matching, test list validation, and out-of-scope detection.
- Introduced dry-run tool to test CBTS behavior across historical commits.
- Enhanced test matching to support both test IDs and file path lookups.
Documentation
- Expanded rule system documentation with detailed scope definitions and fallback behaviors.
- Added comprehensive rule architecture guide with trigger patterns and stage resolution logic.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Three rules cover the change surfaces that map to test selection: WaivesRule (existing), TestsDefRule, TestListRule. A PR mixing them combines to scope=testsonly via _TESTSONLY_FAMILY in _combine_scopes, so test-def + test-list + waive edits in one PR still narrow. blocks.py: - YAMLIndex._yaml_first_components: top-level path components from every entry's canonical target. Drives namespace boundary discovery. - YAMLIndex.git_path_to_yaml_key(git_path): translate repo-relative path to YAML namespace by walking from the first matching component. Returns None when no component matches (file outside any YAML tree). - YAMLIndex.find_match_for_path(path): bidirectional lineage match in the pytest tree. Per block, prefix is the most-ancestral covering target (clamped at the anchor when only descendants match). - _path_lookup_anchor: conftest.py / __init__.py anchor on enclosing dir; other files anchor on themselves. rules/_helpers.py (new): pure-function pipeline shared by rules: - stages_by_yaml_stem - iter_diff_changes / iter_diff_post_line_numbers - lookup_ids_into_block_filters (waives + testlist path) - lookup_paths_into_block_filters (testdef path; registers each covering entry's raw text so it passes its own -k guard) - resolve_affected_stages rules/waives_rule.py: refactored to use _helpers; behavior unchanged. parse_waives_diff inlined (no external importers). rules/tests_def_rule.py (new): narrows on .py changes under tests/. - AST scope mapping: function-level (file::TestC::test_m), class-level (file::TestC, when change is in setup_method etc.), or module-level -> file fallback. Reads post-PR file content for AST. - Blast-radius cap (0.8) so changes lighting up most blocks fall back rather than narrow to "everything". - sanity_relevant from l0_sanity_check membership; perfsanity_relevant from l0_perf / *perf_sanity* stems. rules/test_list_rule.py (new): narrows on test-db YAML changes. - Only ADDED entries drive narrow. Removed entries don't trigger verification. Comments / blank lines ignored. Any other non-entry +/- line (condition / mako / whole-block) -> fallback. main.py: register both new rules, plumb repo_root to TestsDefRule via build_rules; add _TESTSONLY_FAMILY combine to _combine_scopes. Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

Some PR file paths (currently: tests/integration/test_lists/qa/**) are consumed only by post-merge / nightly QA workflows, not by pre-merge CI. Previously such PRs hit Selector's "Unhandled files" path and fell back to running the full pre-merge baseline, wasting CI on changes the pipeline doesn't read. OutOfScopeRule (rules/out_of_scope_rule.py): claims files matching OUT_OF_SCOPE_PREFIXES, emits scope="noop" with empty stages and sanity_relevant=False / perfsanity_relevant=False. main.py: - Register OutOfScopeRule. - _combine_scopes: scope="noop" gives way to any other actionable rule that fired in the same PR; "noop" is returned only when it's the sole scope. - Selector.run: skip the empty-stages safety net when scope=="noop" so the empty stage set passes through to Groovy Layer 2 instead of promoting to scope=None / fallback. Groovy already produces a no-op parallelJobsFiltered for cbts != null with empty affected_stages. Effect: a PR that only touches tests/integration/test_lists/qa/** narrows to 0 test stages (Build still runs upstream of CBTS Layer 2). Mixed PRs combine normally -- noop drops out and the actionable scope drives narrowing. Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

…odules Two related blocks.py fixes that recover narrow on PRs previously falling back to baseline. 1. _yaml_first_components parser missed top-level YAML entries (e.g. test_e2e.py::test_X). The old code took canonical.split("/", 1)[0] which returns "test_e2e.py::test_X" for a top-level entry, then rejected it because "::" is in the head. Fix: split by "/" then by "::" then by "[" to extract the file/dir component cleanly. Adds test_e2e.py / test_doc.py / test_fmha.py to the namespace boundary set. 2. _path_lookup_anchor previously special-cased only conftest.py and __init__.py to anchor on enclosing dir. Generalized to: any file whose basename does not start with `test_` anchors on its enclosing dir. Captures helper modules like disagg_test_utils.py, pytorch_model_config.py, perf_regression_utils.py, etc. — they're typically imported by tests in the same dir, and the blast-radius cap (0.8) protects against helpers covering most blocks. Dry-run on last 200 upstream/main commits (60 tests-only PRs) under /bot run mode: fallback count drops 12 -> 8, gaining narrow on: - pr-13482 (test_e2e.py): None -> testdefonly, 39 stages - pr-12796 (perf/pytorch_model_config.py): None -> testdefonly, 19 stages - pr-13367 (perf/perf_regression_utils.py): None -> testdefonly, 8 stages - pr-13496 (perf/open_search_db_utils.py): None -> testsonly, 8 stages Remaining fallbacks are intentional: top-level conftest blast radius, waivesonly unmatchable safety net, or test methods not yet enrolled in any YAML test list. Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

Previously any waive id that found no covering YAML entry caused WaivesRule to return scope=None, vetoing the whole PR's narrow even when other rules (testdef / testlist) had a clean narrow contribution. Unmatchable waives are pre-merge no-ops (the test isn't in any pre-merge YAML test list, so adding/removing its SKIP doesn't affect what runs), so vetoing is over-conservative. Drop the early scope=None branch on misses. Continue with the matched ids' narrow (lookup_ids_into_block_filters already excludes misses from block_filters), and append a note to the reason string for visibility. Pure-waivesonly all-miss PRs still fall back via Selector's empty-stages safety net (no other rule contributes -> nothing to run). Mixed PRs (e.g. waives.txt removal + helper module change) now combine correctly to testsonly. Effect on the 60-PR dry-run: pr-13634 (disagg_test_utils.py + unmatchable waive removals) goes from None -> testsonly, 21 stages. Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

…tion Two empty-narrow paths in WaivesRule were emitting scope="waiveonly" with empty stages, tripping Selector's empty-stages safety net and falling back to baseline: - diff has no parseable test ids (whitespace / comment-only edits) - all waive ids are unmatchable (test not in any pre-merge YAML) Both are pre-merge no-ops semantically: they don't add or remove any test from the run set. Emit scope="noop" (mirroring OutOfScopeRule), so the Selector exception for noop scope passes through to Groovy as a true no-op and mixed PRs combine cleanly via _combine_scopes drop- ping noop in favor of any actionable rule that fired. Effect on the 60-PR dry-run: pure-waivesonly all-miss PRs (pr-13254, pr-13363, pr-13437, pr-13504, pr-13508) move from None -> noop. Mixed pr-13634 (helper + unmatchable waive) goes testsonly -> testdefonly (waives is now noop, only testdef contributes narrow). Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

…tribution Apply the principle that rule firing depends only on file-pattern match, not on YAML lookup outcome: testdef: out-of-namespace files (e.g. tests/unittest/...) and in-namespace files with no covering YAML entry are now claimed as scope=noop instead of being left unhandled (which forced fallback). Partial-narrow runs note the no-narrow path count in the reason. testlist: "no additions" and "all-miss" paths now return scope=noop (previously testlistonly with empty stages and scope=None respectively). Partial-miss runs note unresolved entries in the reason, mirroring waivesonly's miss_note pattern. This unblocks PRs whose only "unhandled" files are integration test modules that don't yet appear in any YAML test list (e.g. test_fp8 when only test_nvfp4 variants are listed) — they now narrow via class-level lineage instead of falling back to full CI. Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

… noop Out-of-namespace .py files (where git_path_to_yaml_key returns None, e.g. tests/integration/defs/conftest.py — the top-level integration conftest auto-imported by every integration test) can have wide blast radius via implicit pytest discovery. Treating them as noop silently hides this risk. Restore the asymmetric handling: - in-namespace + empty lookup: rule decided no impact → scope=noop - out-of-namespace: rule cannot decide → unhandled → fallback (None) Reason text now distinguishes "in-namespace path(s) with no covering YAML entry" from out-of-namespace paths. Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

Top-level: replace stale "v0 scope" section with current 4-rule map, scope catalog (waiveonly / testdefonly / testlistonly / testsonly / noop / null), and trigger-mode filter section. Update INPUT_JSON and decision JSON examples to reflect post_merge field, sanity/ perfsanity flags, and noop scope. rules/README: full per-rule writeup (WaivesRule, TestsDefRule, TestListRule, OutOfScopeRule) covering each rule's diff handling, lookup, and outcome matrix. Add helpers table for _helpers.py. Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

Markdown files under tests/ have no test-execution impact; claim them as scope=noop so PRs that only touch docs (or mix docs with a code change) don't fall back over the .md files. Adds OUT_OF_SCOPE_TESTS_SUFFIXES alongside the existing OUT_OF_SCOPE_PREFIXES; both feed _is_out_of_scope. Updates rule docs in the top-level and rules/ READMEs. Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

Generalize TestsDefRule to handle any path under tests/ in YAML namespace — not just .py files. find_match_for_path walks up enclosing directories to the narrowest YAML-covered ancestor, so: - accuracy/references/*.yaml → anchor accuracy/ - disaggregated/test_configs/*.yaml → anchor disaggregated/ - unittest/api_stability/references/*.yaml → anchor unittest/api_stability/ - conftest.py / __init__.py / helpers (existing) For test_*.py files, file-level anchor is preserved (no walk-up) so new test files in already-covered dirs still get the noop semantic when no YAML entry covers them. Out-of-namespace paths (top-level integration conftest, dirs no YAML references like tests/integration/test_input_files/) still fall back — those could have wide implicit blast radius. Smoke-tested across accuracy refs, disagg configs, api_stability refs, input fixtures, and a mixed py+yaml PR. Re-ran 60-PR dry-run; same distribution (none of the historical PRs touched these data files in the sampled window). Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

…hors Each top-level YAML key in `accuracy/references/<dataset>.yaml` is a HF model name. AST-scan `accuracy/test_*.py` for class-level `MODEL_NAME = "<hf>"` literals to build {model_name → [Class anchor]} (cached). A diff line under `meta-llama/Llama-3.1-8B-Instruct:` narrows to `accuracy/test_*.py::TestLlama3_1_8BInstruct` only. Smoke: single-model diff on mmlu.yaml → 21 blocks/45 stages instead of the dir walk-up's 48 blocks/90 stages (>50% reduction). Models in the YAML with no matching test class (aliases, new entries without test code yet) fall back to the dir-level anchor — same as before. No diff / unreadable file → same fallback. Map covers 110 distinct MODEL_NAMEs across 9 accuracy/test_*.py files. For mmlu.yaml's 65 model keys, 59 (90%) match a test class. Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

…ts / microbenchmarks Add three subtrees no L0 pipeline (pre-merge or post-merge) consumes: - tests/integration/test_lists/dev/ — dev artifacts, no L0 consumer - tests/integration/defs/.test_durations — pytest-split timing cache - tests/microbenchmarks/ — benchmarking scripts only Verified consumption: dev/ has no Jenkins or test-db reference; .test_durations is a pytest cache file; microbenchmarks/ has zero references in jenkins/ and only a single internal cross-link from allreduce_perf/ — no L0 stage runs it. 500-commit dry-run effective narrow rate: 97.6% (161/165), up from 95.8% (158/165). Three previously-fallback PRs (12978, 12887, 12886) now correctly resolve to noop. tests/scripts/perf-sanity/aggregated/*.yaml is intentionally left as fallback: those configs are loaded by perf/test_perf_sanity.py at runtime in both pre-merge and post-merge PerfSanity stages, so noop would be wrong. Refining them to per-config narrow is a future-work item. Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

A standalone `test_*.py` whose path is out-of-namespace (`git_path_to_yaml_key` returns None) is one no L0 YAML references — i.e. no L0 stage runs it. Pytest doesn't auto-import test files into other tests, so editing such a file can't change what L0 selects. Claim it as noop instead of falling back. The existing fallback path is preserved for non-test out-of-namespace files (conftest, __init__, helpers, data fixtures) — those can still impact selection via implicit pytest discovery / sys.path imports. 500-commit dry-run effective narrow rate: 98.2% (162/165), up from 97.6%. PR-12625 (which deleted top-level tests/integration/defs/ test_mlpf_results.py and test_sanity.py — neither in any YAML) now narrows correctly via the 20 in-namespace test files it also touched. Reword the partial-narrow miss-note to "N path(s) not in any L0 YAML" since `no_match` now covers both in-namespace files with no covering entry and standalone out-of-namespace test files. Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

…ical commits Promotes the one-shot /tmp/cbts_dryrun.py into a permanent debug CLI under jenkins/scripts/cbts/tools/. Snapshots test-db YAMLs and L0_Test.groovy at each commit's own SHA so results reflect the CBTS decision at the time the PR landed. Supports tests-only / all filters, --limit, --post-merge, and explicit --sha replay; writes per-PR summary.txt + narrowed cbts_test_db YAMLs + a top-level INDEX.md. Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

coderabbitai · 2026-05-08T09:53:18Z

📝 Walkthrough

Walkthrough

This PR expands CBTS from a single-rule waive-matching system to a multi-rule architecture. It adds path-based YAML matching in YAMLIndex, introduces three new rules (TestsDefRule for test file changes, TestListRule for test-db YAML additions, OutOfScopeRule for exclusions), refactors WaivesRule to use shared helpers, updates scope-combination logic to merge testsonly-family scopes, and includes a dry-run tool for testing against historical commits.

Changes

CBTS Multi-Rule System

Layer / File(s)	Summary
Architecture & Documentation `jenkins/scripts/cbts/README.md`, `jenkins/scripts/cbts/rules/README.md`	READMEs expanded to document four-rule system, scope combinations, YAML lookup algorithms including path-based matching, trigger-mode filtering, and guidelines for adding rules.
YAMLIndex Path Matching `jenkins/scripts/cbts/blocks.py`	Added git-path-to-YAML-namespace translation (`git_path_to_yaml_key`), path-based block matching (`find_match_for_path`), and support data (`_yaml_first_components`, `_path_lookup_anchor`).
Shared Helper Utilities `jenkins/scripts/cbts/rules/_helpers.py`	New module providing diff parsing (`iter_diff_changes`, `iter_diff_post_line_numbers`), stage grouping (`stages_by_yaml_stem`), and block-filter resolution (`lookup_ids_into_block_filters`, `lookup_paths_into_block_filters`, `resolve_affected_stages`).
Rule Implementations `jenkins/scripts/cbts/rules/out_of_scope_rule.py`, `jenkins/scripts/cbts/rules/test_list_rule.py`, `jenkins/scripts/cbts/rules/tests_def_rule.py`, `jenkins/scripts/cbts/rules/waives_rule.py`	OutOfScopeRule identifies out-of-scope paths; TestListRule narrows stages from test-db YAML additions; TestsDefRule maps test file changes to affected blocks via AST and path matching; WaivesRule refactored to use shared helpers.
Rules Orchestration `jenkins/scripts/cbts/main.py`	RULE_CLASSES extended to register all four rules; `build_rules()` now accepts `repo_root` for TestsDefRule; `_combine_scopes()` merges waiveonly/testdefonly/testlistonly into testsonly; selector respects noop scope.
Dry-Run Testing Tool `jenkins/scripts/cbts/tools/dryrun.py`	New executable script replaying CBTS over historical git commits, generating per-commit output directories and an INDEX.md summary with scope distribution.

🎯 4 (Complex) | ⏱️ ~75 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main change: introducing new change-based testing rules focused on test file modifications to improve CI test selection.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Warning

Tools execution failed with the following error:

Failed to run tools: 14 UNAVAILABLE: read ECONNRESET

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

Generate code and open pull requests
Plan features and break down work
Investigate incidents and troubleshoot customer tickets together
Automate recurring tasks and respond to alerts with triggers
Summarize progress and report instantly

Built for teams:

Shared memory across your entire org—no repeating context
Per-thread sandboxes to safely plan and execute work
Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@jenkins/scripts/cbts/rules/test_list_rule.py`:
- Around line 40-45: The _ENTRY_RE currently treats any indented list item as a
test entry; restrict matching so only list items that are children of a tests:
field are captured (or alternatively parse indentation context) by checking for
a preceding "tests:" key and matching indentation level before accepting "-
<pytest_id>" — update _ENTRY_RE and the related logic (also applied in the code
around lines 72-84) to require the same indentation as the tests: block or to
detect the nearest parent key == "tests" before classifying entries, so
wildcard/non-tests lists (e.g. "- \"*h100*\"") are not misclassified as test
IDs.

In `@jenkins/scripts/cbts/rules/tests_def_rule.py`:
- Around line 143-146: The try/except that reads the changed test asset with
(self._repo_root / git_path).read_text(encoding="utf-8") only catches OSError
but not decode failures; update the handler to also catch UnicodeDecodeError (or
UnicodeError) so binary/non-UTF8 fixtures fall back to returning [yaml_path]
instead of crashing, and make the same change for the second occurrence that
mirrors this logic later in the file (the other read_text block around the
second occurrence).
- Around line 167-181: The current per-model refinement is unsafe because
iter_diff_post_line_numbers can misattribute deleted top-level model sections;
fix by detecting deletions in the raw diff and falling back to the full file
when any deletion could affect top-level keys: in the block using diff, before
calling iter_diff_post_line_numbers/_yaml_top_keys_for_lines, check the diff
string for removal lines (e.g. any line starting with '-' in the unified diff
hunks) and if found return [yaml_path]; keep the existing flow (diff ->
iter_diff_post_line_numbers -> changed_models -> model_map -> anchors) only when
no deletion lines are present so changed_models are provable from the post-image
alone.
- Around line 218-247: In apply(), markdown docs under test YAML trees are being
processed into actionable scopes; before calling
yaml_index.git_path_to_yaml_key/_compute_anchors/_combine_scopes, detect
markdown (e.g., git_path.endswith(".md") or basename == "README.md") and treat
them as noop by adding the path to handled and no_match (or out_of_namespace
depending on existing semantics) and continue, so markdown files do not produce
testdefonly results; update the branch that currently handles yaml_path != None
to early-skip .md files before calling _compute_anchors/_combine_scopes.

In `@jenkins/scripts/cbts/tools/dryrun.py`:
- Around line 314-323: The CLI currently allows invalid numeric values for
--window and --limit; update the ap.add_argument calls for "--window" and
"--limit" to validate at parse time by using custom argparse types (or a
validator) that raise argparse.ArgumentTypeError for invalid inputs: implement a
positive_int(value) used for --window that ensures int(value) > 0, and use a
positive_int_or_none(value) (or accept None via default and validate if
provided) for --limit to ensure int(value) > 0 when not None; replace the
current type=int usages in the ap.add_argument calls and/or add explicit
argparse validation so invalid or negative values are rejected during parsing.
- Around line 252-255: The loop over pr_dir.iterdir() uses old.unlink() which
raises on directories and causes crashes when --keep-stale leaves
subdirectories; update the cleanup in dryrun.py to detect whether old is a
file/symlink or a directory (using old.is_dir()/old.is_file()/old.is_symlink())
and call the appropriate remover (unlink for files/symlinks, shutil.rmtree for
directories) or otherwise skip non-removable entries so per-label cleanup in the
for loop does not raise; adjust imports if needed and keep the rest of the logic
(the pr_dir.iterdir() loop and subsequent shared_out assignment) unchanged.
- Around line 100-113: The snapshot helpers (_snapshot_test_db and
_snapshot_groovy) currently call _git with check=False and unconditionally write
.stdout, which can create empty/missing snapshot files; change them to fail fast
by checking the _git result (use check=True or validate result.returncode and
result.stdout) and raise/abort if the git show/ls-tree calls failed or returned
empty content for any expected file; specifically validate the outputs from
_git(repo, "ls-tree", ..., sha, "--", TEST_DB_REL) and each _git(repo, "show",
f"{sha}:{f}") in _snapshot_test_db, and validate the _git(repo, "show",
f"{sha}:{GROOVY_REL}") call in _snapshot_groovy, and only write files when the
calls succeeded and stdout is non-empty, otherwise raise an error so snapshot
creation fails fast.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 622d9a53-626c-4e14-892b-a9e0af1b6311

📥 Commits

Reviewing files that changed from the base of the PR and between 2e4b05c and 2df702c.

📒 Files selected for processing (10)

jenkins/scripts/cbts/README.md
jenkins/scripts/cbts/blocks.py
jenkins/scripts/cbts/main.py
jenkins/scripts/cbts/rules/README.md
jenkins/scripts/cbts/rules/_helpers.py
jenkins/scripts/cbts/rules/out_of_scope_rule.py
jenkins/scripts/cbts/rules/test_list_rule.py
jenkins/scripts/cbts/rules/tests_def_rule.py
jenkins/scripts/cbts/rules/waives_rule.py
jenkins/scripts/cbts/tools/dryrun.py

coderabbitai · 2026-05-08T09:53:24Z

+# `<2+ space indent>- <pytest_id>`. Top-level `- condition:` (column-0
+# block-list item) intentionally does not match — the leading `\s+`
+# discriminates entry-within-block from block-boundary edits.
+_ENTRY_RE = re.compile(r"^\s+-\s+(\S.*)$")
+_COMMENT_RE = re.compile(r"^\s*#")
+


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Don’t classify every indented YAML list item as a test entry.

_ENTRY_RE also matches list items under non-tests: fields, e.g. a changed wildcard list item like - "*h100*". Those edits then get parsed as added/removed test IDs instead of structural YAML changes, so this rule can return noop or narrow when it should force fallback because stage selection semantics changed.

Also applies to: 72-84

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@jenkins/scripts/cbts/rules/test_list_rule.py` around lines 40 - 45, The _ENTRY_RE currently treats any indented list item as a test entry; restrict matching so only list items that are children of a tests: field are captured (or alternatively parse indentation context) by checking for a preceding "tests:" key and matching indentation level before accepting "- <pytest_id>" — update _ENTRY_RE and the related logic (also applied in the code around lines 72-84) to require the same indentation as the tests: block or to detect the nearest parent key == "tests" before classifying entries, so wildcard/non-tests lists (e.g. "- \"*h100*\"") are not misclassified as test IDs.

coderabbitai · 2026-05-08T09:53:24Z

+        try:
+            content = (self._repo_root / git_path).read_text(encoding="utf-8")
+        except OSError:
+            return [yaml_path]


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle decode failures when changed test assets aren’t UTF-8 text.

Path.read_text() can raise UnicodeDecodeError for binary or non-UTF8 fixtures under tests/. Right now that bubbles out of CBTS instead of taking the intended file-level fallback, so a fixture-only PR can crash selection.

Suggested fix

- except OSError: + except (OSError, UnicodeError): return [yaml_path] @@ - except OSError: + except (OSError, UnicodeError): return [yaml_path]

Also applies to: 172-175

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@jenkins/scripts/cbts/rules/tests_def_rule.py` around lines 143 - 146, The try/except that reads the changed test asset with (self._repo_root / git_path).read_text(encoding="utf-8") only catches OSError but not decode failures; update the handler to also catch UnicodeDecodeError (or UnicodeError) so binary/non-UTF8 fixtures fall back to returning [yaml_path] instead of crashing, and make the same change for the second occurrence that mirrors this logic later in the file (the other read_text block around the second occurrence).

coderabbitai · 2026-05-08T09:53:25Z

+        if not diff:
+            return [yaml_path]
+        line_numbers = iter_diff_post_line_numbers(diff)
+        if not line_numbers:
+            return [yaml_path]
+        try:
+            content = (self._repo_root / git_path).read_text(encoding="utf-8")
+        except OSError:
+            return [yaml_path]
+        changed_models = _yaml_top_keys_for_lines(content, line_numbers)
+        if not changed_models:
+            return [yaml_path]
+        model_map = self._accuracy_model_to_classes()
+        anchors = sorted({a for m in changed_models for a in model_map.get(m, ())})
+        return anchors or [yaml_path]


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Per-model refinement is unsafe when the diff removes reference sections.

iter_diff_post_line_numbers() anchors - lines to the next post-image line, so deleting or moving an entire top-level model section in accuracy/references/*.yaml can be attributed to the following model. That narrows to the wrong test class. The safe behavior here is to fall back to [yaml_path] whenever the refinement can’t prove the changed top-level keys from the post-image alone.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@jenkins/scripts/cbts/rules/tests_def_rule.py` around lines 167 - 181, The current per-model refinement is unsafe because iter_diff_post_line_numbers can misattribute deleted top-level model sections; fix by detecting deletions in the raw diff and falling back to the full file when any deletion could affect top-level keys: in the block using diff, before calling iter_diff_post_line_numbers/_yaml_top_keys_for_lines, check the diff string for removal lines (e.g. any line starting with '-' in the unified diff hunks) and if found return [yaml_path]; keep the existing flow (diff -> iter_diff_post_line_numbers -> changed_models -> model_map -> anchors) only when no deletion lines are present so changed_models are provable from the post-image alone.

coderabbitai · 2026-05-08T09:53:25Z

+    def apply(self, pr: PRInputs) -> Optional[RuleResult]:
+        candidates = [f for f in pr.changed_files if f.startswith("tests/")]
+        if not candidates:
+            return None
+
+        block_filters: dict[tuple[str, int], dict[str, set[str]]] = {}
+        narrowed: set[str] = set()
+        handled: set[str] = set()
+        out_of_namespace: set[str] = set()
+        no_match: set[str] = set()
+
+        for git_path in candidates:
+            yaml_path = self.yaml_index.git_path_to_yaml_key(git_path)
+            if yaml_path is None:
+                base = git_path.rsplit("/", 1)[-1]
+                if base.startswith("test_") and base.endswith(".py"):
+                    # Standalone test_*.py not referenced by any L0 YAML —
+                    # pytest doesn't auto-import test files into other
+                    # tests, so this file's edits can't affect what L0
+                    # runs. Claim as noop contribution.
+                    handled.add(git_path)
+                    no_match.add(git_path)
+                    continue
+                # conftest / __init__ / helper / data — could impact
+                # selection via implicit pytest discovery; let Selector
+                # report it as Unhandled → fallback.
+                out_of_namespace.add(git_path)
+                continue
+            handled.add(git_path)
+            anchors = self._compute_anchors(git_path, yaml_path, pr.diffs.get(git_path, ""))


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Skip out-of-scope markdown before attempting testdef narrowing.

An in-namespace doc like tests/integration/defs/accuracy/README.md will currently be translated to accuracy/README.md, walked up to accuracy/, and produce an actionable testdefonly result. Because _combine_scopes() lets actionable scopes override noop, that defeats OutOfScopeRule and still schedules CI for markdown-only edits under YAML-covered trees.

Suggested fix

+from .out_of_scope_rule import _is_out_of_scope + def apply(self, pr: PRInputs) -> Optional[RuleResult]: candidates = [f for f in pr.changed_files if f.startswith("tests/")] if not candidates: return None block_filters: dict[tuple[str, int], dict[str, set[str]]] = {} @@ for git_path in candidates: + if _is_out_of_scope(git_path): + continue yaml_path = self.yaml_index.git_path_to_yaml_key(git_path)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def apply(self, pr: PRInputs) -> Optional[RuleResult]:

candidates = [f for f in pr.changed_files if f.startswith("tests/")]

if not candidates:

return None

block_filters: dict[tuple[str, int], dict[str, set[str]]] = {}

narrowed: set[str] = set()

handled: set[str] = set()

out_of_namespace: set[str] = set()

no_match: set[str] = set()

for git_path in candidates:

yaml_path = self.yaml_index.git_path_to_yaml_key(git_path)

if yaml_path is None:

base = git_path.rsplit("/", 1)[-1]

if base.startswith("test_") and base.endswith(".py"):

# Standalone test_*.py not referenced by any L0 YAML —

# pytest doesn't auto-import test files into other

# tests, so this file's edits can't affect what L0

# runs. Claim as noop contribution.

handled.add(git_path)

no_match.add(git_path)

continue

# conftest / __init__ / helper / data — could impact

# selection via implicit pytest discovery; let Selector

# report it as Unhandled → fallback.

out_of_namespace.add(git_path)

continue

handled.add(git_path)

anchors = self._compute_anchors(git_path, yaml_path, pr.diffs.get(git_path, ""))

def apply(self, pr: PRInputs) -> Optional[RuleResult]:

candidates = [f for f in pr.changed_files if f.startswith("tests/")]

if not candidates:

return None

block_filters: dict[tuple[str, int], dict[str, set[str]]] = {}

narrowed: set[str] = set()

handled: set[str] = set()

out_of_namespace: set[str] = set()

no_match: set[str] = set()

for git_path in candidates:

if _is_out_of_scope(git_path):

continue

yaml_path = self.yaml_index.git_path_to_yaml_key(git_path)

if yaml_path is None:

base = git_path.rsplit("/", 1)[-1]

if base.startswith("test_") and base.endswith(".py"):

# Standalone test_*.py not referenced by any L0 YAML —

# pytest doesn't auto-import test files into other

# tests, so this file's edits can't affect what L0

# runs. Claim as noop contribution.

handled.add(git_path)

no_match.add(git_path)

continue

# conftest / __init__ / helper / data — could impact

# selection via implicit pytest discovery; let Selector

# report it as Unhandled → fallback.

out_of_namespace.add(git_path)

continue

handled.add(git_path)

anchors = self._compute_anchors(git_path, yaml_path, pr.diffs.get(git_path, ""))

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@jenkins/scripts/cbts/rules/tests_def_rule.py` around lines 218 - 247, In apply(), markdown docs under test YAML trees are being processed into actionable scopes; before calling yaml_index.git_path_to_yaml_key/_compute_anchors/_combine_scopes, detect markdown (e.g., git_path.endswith(".md") or basename == "README.md") and treat them as noop by adding the path to handled and no_match (or out_of_namespace depending on existing semantics) and continue, so markdown files do not produce testdefonly results; update the branch that currently handles yaml_path != None to early-skip .md files before calling _compute_anchors/_combine_scopes.

coderabbitai · 2026-05-08T09:53:25Z

+def _snapshot_test_db(repo: Path, sha: str, dst: Path) -> Path:
+    out = _git(repo, "ls-tree", "-r", "--name-only", sha, "--", TEST_DB_REL, check=False)
+    files = [ln for ln in out.stdout.splitlines() if ln.endswith(".yml")]
+    db = dst / "test-db"
+    db.mkdir(parents=True, exist_ok=True)
+    for f in files:
+        (db / Path(f).name).write_text(_git(repo, "show", f"{sha}:{f}", check=False).stdout)
+    return db
+
+
+def _snapshot_groovy(repo: Path, sha: str, dst: Path) -> Path:
+    p = dst / "L0_Test.groovy"
+    p.write_text(_git(repo, "show", f"{sha}:{GROOVY_REL}", check=False).stdout)
+    return p


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast when snapshot files are unavailable.

The current check=False + unconditional write_text(...stdout) path can silently create empty snapshots, which can produce misleading CBTS decisions instead of a clear replay failure.

Suggested fix

def _snapshot_test_db(repo: Path, sha: str, dst: Path) -> Path: out = _git(repo, "ls-tree", "-r", "--name-only", sha, "--", TEST_DB_REL, check=False) files = [ln for ln in out.stdout.splitlines() if ln.endswith(".yml")] db = dst / "test-db" db.mkdir(parents=True, exist_ok=True) for f in files: - (db / Path(f).name).write_text(_git(repo, "show", f"{sha}:{f}", check=False).stdout) + shown = _git(repo, "show", f"{sha}:{f}", check=False) + if shown.returncode != 0: + raise subprocess.CalledProcessError( + shown.returncode, shown.args, output=shown.stdout, stderr=shown.stderr + ) + (db / Path(f).name).write_text(shown.stdout) return db def _snapshot_groovy(repo: Path, sha: str, dst: Path) -> Path: p = dst / "L0_Test.groovy" - p.write_text(_git(repo, "show", f"{sha}:{GROOVY_REL}", check=False).stdout) + shown = _git(repo, "show", f"{sha}:{GROOVY_REL}", check=False) + if shown.returncode != 0: + raise subprocess.CalledProcessError( + shown.returncode, shown.args, output=shown.stdout, stderr=shown.stderr + ) + p.write_text(shown.stdout) return p

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@jenkins/scripts/cbts/tools/dryrun.py` around lines 100 - 113, The snapshot helpers (_snapshot_test_db and _snapshot_groovy) currently call _git with check=False and unconditionally write .stdout, which can create empty/missing snapshot files; change them to fail fast by checking the _git result (use check=True or validate result.returncode and result.stdout) and raise/abort if the git show/ls-tree calls failed or returned empty content for any expected file; specifically validate the outputs from _git(repo, "ls-tree", ..., sha, "--", TEST_DB_REL) and each _git(repo, "show", f"{sha}:{f}") in _snapshot_test_db, and validate the _git(repo, "show", f"{sha}:{GROOVY_REL}") call in _snapshot_groovy, and only write files when the calls succeeded and stdout is non-empty, otherwise raise an error so snapshot creation fails fast.

coderabbitai · 2026-05-08T09:53:25Z

+    for old in pr_dir.iterdir():
+        if old.name != "summary.txt":
+            old.unlink()
+    shared_out = repo / "cbts_test_db"


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle stale directories during per-label cleanup.

old.unlink() will raise on directories. With --keep-stale, this can crash replay for a label if any subdirectory exists from prior runs.

Suggested fix

for old in pr_dir.iterdir(): if old.name != "summary.txt": - old.unlink() + if old.is_dir(): + shutil.rmtree(old) + else: + old.unlink()

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@jenkins/scripts/cbts/tools/dryrun.py` around lines 252 - 255, The loop over pr_dir.iterdir() uses old.unlink() which raises on directories and causes crashes when --keep-stale leaves subdirectories; update the cleanup in dryrun.py to detect whether old is a file/symlink or a directory (using old.is_dir()/old.is_file()/old.is_symlink()) and call the appropriate remover (unlink for files/symlinks, shutil.rmtree for directories) or otherwise skip non-removable entries so per-label cleanup in the for loop does not raise; adjust imports if needed and keep the rest of the logic (the pr_dir.iterdir() loop and subsequent shared_out assignment) unchanged.

coderabbitai · 2026-05-08T09:53:25Z

+        "--window", type=int, default=500, help="commits back from --ref (default: 500)"
+    )
+    ap.add_argument(
+        "--filter",
+        choices=["tests-only", "all"],
+        default="tests-only",
+        help="commit filter (default: tests-only)",
+    )
+    ap.add_argument("--limit", type=int, default=None, help="cap commits after filter")
+    ap.add_argument(


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate numeric CLI bounds at parse time.

--window and --limit currently accept invalid values (e.g., --window 0, negative values), which can cause runtime failures or surprising selection behavior.

Suggested fix

+def _positive_int(value: str) -> int: + ivalue = int(value) + if ivalue <= 0: + raise argparse.ArgumentTypeError("must be > 0") + return ivalue + + +def _non_negative_int(value: str) -> int: + ivalue = int(value) + if ivalue < 0: + raise argparse.ArgumentTypeError("must be >= 0") + return ivalue + + def main(argv: Optional[list[str]] = None) -> int: @@ - ap.add_argument( - "--window", type=int, default=500, help="commits back from --ref (default: 500)" - ) + ap.add_argument( + "--window", type=_positive_int, default=500, help="commits back from --ref (default: 500)" + ) @@ - ap.add_argument("--limit", type=int, default=None, help="cap commits after filter") + ap.add_argument("--limit", type=_non_negative_int, default=None, help="cap commits after filter")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@jenkins/scripts/cbts/tools/dryrun.py` around lines 314 - 323, The CLI currently allows invalid numeric values for --window and --limit; update the ap.add_argument calls for "--window" and "--limit" to validate at parse time by using custom argparse types (or a validator) that raise argparse.ArgumentTypeError for invalid inputs: implement a positive_int(value) used for --window that ensures int(value) > 0, and use a positive_int_or_none(value) (or accept None via default and validate if provided) for --limit to ensure int(value) > 0 when not None; replace the current type=int usages in the ap.add_argument calls and/or add explicit argparse validation so invalid or negative values are rejected during parsing.

crazydemo added 14 commits May 8, 2026 17:39

crazydemo requested review from a team as code owners May 8, 2026 09:45

crazydemo requested review from dpitman-nvda and niukuo May 8, 2026 09:45

github-actions Bot assigned crazydemo May 8, 2026

crazydemo requested a review from QiJune May 8, 2026 09:45

coderabbitai Bot reviewed May 8, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[TRTLLM-12152][infra] change based testing rules on tests#13899

[TRTLLM-12152][infra] change based testing rules on tests#13899
crazydemo wants to merge 14 commits intoNVIDIA:mainfrom
crazydemo:cbts-v1

crazydemo commented May 8, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 8, 2026

Walkthrough

Changes

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 8, 2026

Uh oh!

coderabbitai Bot May 8, 2026

Uh oh!

coderabbitai Bot May 8, 2026

Uh oh!

coderabbitai Bot May 8, 2026

Uh oh!

coderabbitai Bot May 8, 2026

Uh oh!

coderabbitai Bot May 8, 2026

Uh oh!

coderabbitai Bot May 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

crazydemo commented May 8, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Description

Test Coverage

PR Checklist

GitHub Bot Help

Uh oh!

coderabbitai Bot commented May 8, 2026

Walkthrough

Changes

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

crazydemo commented May 8, 2026 •

edited by coderabbitai Bot

Loading