[TRTLLM-12152][infra] change based testing rules on tests#13899
crazydemo wants to merge 14 commits into NVIDIA:main from
Conversation
Three rules cover the change surfaces that map to test selection: WaivesRule (existing), TestsDefRule, and TestListRule. A PR mixing them combines to scope=testsonly via _TESTSONLY_FAMILY in _combine_scopes, so test-def + test-list + waive edits in one PR still narrow.

blocks.py:
- YAMLIndex._yaml_first_components: top-level path components from every entry's canonical target. Drives namespace boundary discovery.
- YAMLIndex.git_path_to_yaml_key(git_path): translates a repo-relative path to a YAML namespace by walking from the first matching component. Returns None when no component matches (file outside any YAML tree).
- YAMLIndex.find_match_for_path(path): bidirectional lineage match in the pytest tree. Per block, the prefix is the most-ancestral covering target (clamped at the anchor when only descendants match).
- _path_lookup_anchor: conftest.py / __init__.py anchor on the enclosing dir; other files anchor on themselves.

rules/_helpers.py (new): pure-function pipeline shared by rules:
- stages_by_yaml_stem
- iter_diff_changes / iter_diff_post_line_numbers
- lookup_ids_into_block_filters (waives + testlist path)
- lookup_paths_into_block_filters (testdef path; registers each covering entry's raw text so it passes its own -k guard)
- resolve_affected_stages

rules/waives_rule.py: refactored to use _helpers; behavior unchanged. parse_waives_diff inlined (no external importers).

rules/tests_def_rule.py (new): narrows on .py changes under tests/.
- AST scope mapping: function-level (file::TestC::test_m), class-level (file::TestC, when the change is in setup_method etc.), or module-level -> file fallback. Reads post-PR file content for the AST.
- Blast-radius cap (0.8) so changes lighting up most blocks fall back rather than narrow to "everything".
- sanity_relevant from l0_sanity_check membership; perfsanity_relevant from l0_perf / *perf_sanity* stems.

rules/test_list_rule.py (new): narrows on test-db YAML changes.
- Only ADDED entries drive narrow. Removed entries don't trigger verification. Comments and blank lines are ignored. Any other non-entry +/- line (condition / mako / whole-block) -> fallback.

main.py: register both new rules, plumb repo_root to TestsDefRule via build_rules; add the _TESTSONLY_FAMILY combine to _combine_scopes.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Some PR file paths (currently: tests/integration/test_lists/qa/**) are consumed only by post-merge / nightly QA workflows, not by pre-merge CI. Previously such PRs hit Selector's "Unhandled files" path and fell back to running the full pre-merge baseline, wasting CI on changes the pipeline doesn't read.

OutOfScopeRule (rules/out_of_scope_rule.py): claims files matching OUT_OF_SCOPE_PREFIXES, emits scope="noop" with empty stages and sanity_relevant=False / perfsanity_relevant=False.

main.py:
- Register OutOfScopeRule.
- _combine_scopes: scope="noop" gives way to any other actionable rule that fired in the same PR; "noop" is returned only when it's the sole scope.
- Selector.run: skip the empty-stages safety net when scope=="noop" so the empty stage set passes through to Groovy Layer 2 instead of promoting to scope=None / fallback. Groovy already produces a no-op parallelJobsFiltered for cbts != null with empty affected_stages.

Effect: a PR that only touches tests/integration/test_lists/qa/** narrows to 0 test stages (Build still runs upstream of CBTS Layer 2). Mixed PRs combine normally: noop drops out and the actionable scope drives narrowing.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Two related blocks.py fixes that recover narrow on PRs previously
falling back to baseline.
1. _yaml_first_components parser missed top-level YAML entries
(e.g. test_e2e.py::test_X). The old code took canonical.split("/", 1)[0]
which returns "test_e2e.py::test_X" for a top-level entry, then rejected
it because "::" is in the head. Fix: split by "/" then by "::" then by
"[" to extract the file/dir component cleanly. Adds test_e2e.py /
test_doc.py / test_fmha.py to the namespace boundary set.
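The fixed extraction can be sketched as a standalone helper (a hypothetical reduction of the parser inside _yaml_first_components; the real code operates over the whole index):

```python
def first_component(canonical: str) -> str:
    """Extract the leading file/dir component of a canonical YAML target.

    Split by "/" first, then "::", then "[": a top-level entry like
    "test_e2e.py::test_X" yields "test_e2e.py" instead of being rejected
    because "::" appears in the head.
    """
    head = canonical.split("/", 1)[0]
    head = head.split("::", 1)[0]
    return head.split("[", 1)[0]
```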
2. _path_lookup_anchor previously special-cased only conftest.py and
__init__.py to anchor on enclosing dir. Generalized to: any file whose
basename does not start with `test_` anchors on its enclosing dir.
Captures helper modules like disagg_test_utils.py, pytorch_model_config.py,
perf_regression_utils.py, etc. — they're typically imported by tests in
the same dir, and the blast-radius cap (0.8) protects against helpers
covering most blocks.
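The generalized anchor rule reduces to a short predicate on the basename. A sketch under the stated behavior (the real _path_lookup_anchor may differ in detail):

```python
from pathlib import PurePosixPath


def path_lookup_anchor(git_path: str) -> str:
    """Any file whose basename does not start with "test_" anchors on its
    enclosing directory (conftest.py, __init__.py, helper modules); test
    modules anchor on themselves."""
    p = PurePosixPath(git_path)
    if p.name.startswith("test_"):
        return str(p)
    return str(p.parent)
```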
Dry-run on last 200 upstream/main commits (60 tests-only PRs) under
/bot run mode: fallback count drops 12 -> 8, gaining narrow on:
- pr-13482 (test_e2e.py): None -> testdefonly, 39 stages
- pr-12796 (perf/pytorch_model_config.py): None -> testdefonly, 19 stages
- pr-13367 (perf/perf_regression_utils.py): None -> testdefonly, 8 stages
- pr-13496 (perf/open_search_db_utils.py): None -> testsonly, 8 stages
Remaining fallbacks are intentional: top-level conftest blast radius,
waivesonly unmatchable safety net, or test methods not yet enrolled in
any YAML test list.
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Previously any waive id that found no covering YAML entry caused WaivesRule to return scope=None, vetoing the whole PR's narrow even when other rules (testdef / testlist) had a clean narrow contribution. Unmatchable waives are pre-merge no-ops (the test isn't in any pre-merge YAML test list, so adding/removing its SKIP doesn't affect what runs), so vetoing is over-conservative.

Drop the early scope=None branch on misses. Continue with the matched ids' narrow (lookup_ids_into_block_filters already excludes misses from block_filters), and append a note to the reason string for visibility. Pure-waivesonly all-miss PRs still fall back via Selector's empty-stages safety net (no other rule contributes, so there is nothing to run). Mixed PRs (e.g. waives.txt removal + helper module change) now combine correctly to testsonly.

Effect on the 60-PR dry-run: pr-13634 (disagg_test_utils.py + unmatchable waive removals) goes from None -> testsonly, 21 stages.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Two empty-narrow paths in WaivesRule were emitting scope="waiveonly" with empty stages, tripping Selector's empty-stages safety net and falling back to baseline:
- diff has no parseable test ids (whitespace / comment-only edits)
- all waive ids are unmatchable (test not in any pre-merge YAML)

Both are pre-merge no-ops semantically: they don't add or remove any test from the run set. Emit scope="noop" (mirroring OutOfScopeRule), so the Selector exception for noop scope passes through to Groovy as a true no-op, and mixed PRs combine cleanly via _combine_scopes dropping noop in favor of any actionable rule that fired.

Effect on the 60-PR dry-run: pure-waivesonly all-miss PRs (pr-13254, pr-13363, pr-13437, pr-13504, pr-13508) move from None -> noop. Mixed pr-13634 (helper + unmatchable waive) goes testsonly -> testdefonly (waives is now noop; only testdef contributes narrow).

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Apply the principle that rule firing depends only on file-pattern match, not on YAML lookup outcome:

testdef: out-of-namespace files (e.g. tests/unittest/...) and in-namespace files with no covering YAML entry are now claimed as scope=noop instead of being left unhandled (which forced fallback). Partial-narrow runs note the no-narrow path count in the reason.

testlist: "no additions" and "all-miss" paths now return scope=noop (previously testlistonly with empty stages and scope=None respectively). Partial-miss runs note unresolved entries in the reason, mirroring waivesonly's miss_note pattern.

This unblocks PRs whose only "unhandled" files are integration test modules that don't yet appear in any YAML test list (e.g. test_fp8 when only test_nvfp4 variants are listed): they now narrow via class-level lineage instead of falling back to full CI.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Out-of-namespace .py files (where git_path_to_yaml_key returns None, e.g. tests/integration/defs/conftest.py, the top-level integration conftest auto-imported by every integration test) can have wide blast radius via implicit pytest discovery. Treating them as noop silently hides this risk. Restore the asymmetric handling:
- in-namespace + empty lookup: rule decided no impact -> scope=noop
- out-of-namespace: rule cannot decide -> unhandled -> fallback (None)

Reason text now distinguishes "in-namespace path(s) with no covering YAML entry" from out-of-namespace paths.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Top-level README: replace the stale "v0 scope" section with the current 4-rule map, scope catalog (waiveonly / testdefonly / testlistonly / testsonly / noop / null), and trigger-mode filter section. Update the INPUT_JSON and decision JSON examples to reflect the post_merge field, sanity/perfsanity flags, and noop scope.

rules/README: full per-rule writeup (WaivesRule, TestsDefRule, TestListRule, OutOfScopeRule) covering each rule's diff handling, lookup, and outcome matrix. Add a helpers table for _helpers.py.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Markdown files under tests/ have no test-execution impact; claim them as scope=noop so PRs that only touch docs (or mix docs with a code change) don't fall back over the .md files. Adds OUT_OF_SCOPE_TESTS_SUFFIXES alongside the existing OUT_OF_SCOPE_PREFIXES; both feed _is_out_of_scope. Updates rule docs in the top-level and rules/ READMEs.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Generalize TestsDefRule to handle any path under tests/ in the YAML namespace, not just .py files. find_match_for_path walks up enclosing directories to the narrowest YAML-covered ancestor, so:
- accuracy/references/*.yaml -> anchor accuracy/
- disaggregated/test_configs/*.yaml -> anchor disaggregated/
- unittest/api_stability/references/*.yaml -> anchor unittest/api_stability/
- conftest.py / __init__.py / helpers (existing)

For test_*.py files, the file-level anchor is preserved (no walk-up) so new test files in already-covered dirs still get the noop semantic when no YAML entry covers them. Out-of-namespace paths (the top-level integration conftest, dirs no YAML references like tests/integration/test_input_files/) still fall back; those could have wide implicit blast radius.

Smoke-tested across accuracy refs, disagg configs, api_stability refs, input fixtures, and a mixed py+yaml PR. Re-ran the 60-PR dry-run; same distribution (none of the historical PRs touched these data files in the sampled window).

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Each top-level YAML key in `accuracy/references/<dataset>.yaml` is a
HF model name. AST-scan `accuracy/test_*.py` for class-level
`MODEL_NAME = "<hf>"` literals to build {model_name → [Class anchor]}
(cached). A diff line under `meta-llama/Llama-3.1-8B-Instruct:` narrows
to `accuracy/test_*.py::TestLlama3_1_8BInstruct` only.
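The AST scan described above can be sketched as a single-file helper. Assumptions: no caching, one module at a time, and the `module` argument exists only to format anchors; the real implementation aggregates across all accuracy/test_*.py files.

```python
import ast
from collections import defaultdict


def model_name_to_classes(source: str, module: str) -> dict[str, list[str]]:
    """Map class-level MODEL_NAME string literals to file::Class anchors."""
    mapping: dict[str, list[str]] = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        for stmt in node.body:
            # Match `MODEL_NAME = "<hf>"` assignments directly in the class body.
            if (isinstance(stmt, ast.Assign)
                    and any(isinstance(t, ast.Name) and t.id == "MODEL_NAME"
                            for t in stmt.targets)
                    and isinstance(stmt.value, ast.Constant)
                    and isinstance(stmt.value.value, str)):
                mapping[stmt.value.value].append(f"{module}::{node.name}")
    return dict(mapping)
```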
Smoke: single-model diff on mmlu.yaml → 21 blocks/45 stages instead of
the dir walk-up's 48 blocks/90 stages (>50% reduction).
Models in the YAML with no matching test class (aliases, new entries
without test code yet) fall back to the dir-level anchor — same as
before. No diff / unreadable file → same fallback.
Map covers 110 distinct MODEL_NAMEs across 9 accuracy/test_*.py files.
For mmlu.yaml's 65 model keys, 59 (90%) match a test class.
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Add three subtrees no L0 pipeline (pre-merge or post-merge) consumes:
- tests/integration/test_lists/dev/ — dev artifacts, no L0 consumer
- tests/integration/defs/.test_durations — pytest-split timing cache
- tests/microbenchmarks/ — benchmarking scripts only

Verified consumption: dev/ has no Jenkins or test-db reference; .test_durations is a pytest cache file; microbenchmarks/ has zero references in jenkins/ and only a single internal cross-link from allreduce_perf/ — no L0 stage runs it.

500-commit dry-run effective narrow rate: 97.6% (161/165), up from 95.8% (158/165). Three previously-fallback PRs (12978, 12887, 12886) now correctly resolve to noop.

tests/scripts/perf-sanity/aggregated/*.yaml is intentionally left as fallback: those configs are loaded by perf/test_perf_sanity.py at runtime in both pre-merge and post-merge PerfSanity stages, so noop would be wrong. Refining them to per-config narrow is a future-work item.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
A standalone `test_*.py` whose path is out-of-namespace (`git_path_to_yaml_key` returns None) is one no L0 YAML references — i.e. no L0 stage runs it. Pytest doesn't auto-import test files into other tests, so editing such a file can't change what L0 selects. Claim it as noop instead of falling back.

The existing fallback path is preserved for non-test out-of-namespace files (conftest, __init__, helpers, data fixtures): those can still impact selection via implicit pytest discovery / sys.path imports.

500-commit dry-run effective narrow rate: 98.2% (162/165), up from 97.6%. PR-12625 (which deleted top-level tests/integration/defs/test_mlpf_results.py and test_sanity.py, neither in any YAML) now narrows correctly via the 20 in-namespace test files it also touched.

Reword the partial-narrow miss-note to "N path(s) not in any L0 YAML" since `no_match` now covers both in-namespace files with no covering entry and standalone out-of-namespace test files.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Promotes the one-shot /tmp/cbts_dryrun.py into a permanent debug CLI under jenkins/scripts/cbts/tools/. Snapshots test-db YAMLs and L0_Test.groovy at each commit's own SHA so results reflect the CBTS decision at the time the PR landed. Supports tests-only / all filters, --limit, --post-merge, and explicit --sha replay; writes per-PR summary.txt + narrowed cbts_test_db YAMLs + a top-level INDEX.md.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
📝 Walkthrough

This PR expands CBTS from a single-rule waive-matching system to a multi-rule architecture. It adds path-based YAML matching in YAMLIndex, introduces three new rules (TestsDefRule for test file changes, TestListRule for test-db YAML additions, OutOfScopeRule for exclusions), refactors WaivesRule to use shared helpers, updates scope-combination logic to merge testsonly-family scopes, and includes a dry-run tool for testing against historical commits.

Changes: CBTS Multi-Rule System
🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)

Warning: Tools execution failed with the following error: Failed to run tools: 14 UNAVAILABLE: read ECONNRESET
Actionable comments posted: 7
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@jenkins/scripts/cbts/rules/test_list_rule.py`:
- Around line 40-45: The _ENTRY_RE currently treats any indented list item as a
test entry; restrict matching so only list items that are children of a tests:
field are captured (or alternatively parse indentation context) by checking for
a preceding "tests:" key and matching indentation level before accepting "-
<pytest_id>" — update _ENTRY_RE and the related logic (also applied in the code
around lines 72-84) to require the same indentation as the tests: block or to
detect the nearest parent key == "tests" before classifying entries, so
wildcard/non-tests lists (e.g. "- \"*h100*\"") are not misclassified as test
IDs.
In `@jenkins/scripts/cbts/rules/tests_def_rule.py`:
- Around line 143-146: The try/except that reads the changed test asset with
(self._repo_root / git_path).read_text(encoding="utf-8") only catches OSError
but not decode failures; update the handler to also catch UnicodeDecodeError (or
UnicodeError) so binary/non-UTF8 fixtures fall back to returning [yaml_path]
instead of crashing, and make the same change for the second occurrence that
mirrors this logic later in the file (the other read_text block around the
second occurrence).
- Around line 167-181: The current per-model refinement is unsafe because
iter_diff_post_line_numbers can misattribute deleted top-level model sections;
fix by detecting deletions in the raw diff and falling back to the full file
when any deletion could affect top-level keys: in the block using diff, before
calling iter_diff_post_line_numbers/_yaml_top_keys_for_lines, check the diff
string for removal lines (e.g. any line starting with '-' in the unified diff
hunks) and if found return [yaml_path]; keep the existing flow (diff ->
iter_diff_post_line_numbers -> changed_models -> model_map -> anchors) only when
no deletion lines are present so changed_models are provable from the post-image
alone.
- Around line 218-247: In apply(), markdown docs under test YAML trees are being
processed into actionable scopes; before calling
yaml_index.git_path_to_yaml_key/_compute_anchors/_combine_scopes, detect
markdown (e.g., git_path.endswith(".md") or basename == "README.md") and treat
them as noop by adding the path to handled and no_match (or out_of_namespace
depending on existing semantics) and continue, so markdown files do not produce
testdefonly results; update the branch that currently handles yaml_path != None
to early-skip .md files before calling _compute_anchors/_combine_scopes.
In `@jenkins/scripts/cbts/tools/dryrun.py`:
- Around line 314-323: The CLI currently allows invalid numeric values for
--window and --limit; update the ap.add_argument calls for "--window" and
"--limit" to validate at parse time by using custom argparse types (or a
validator) that raise argparse.ArgumentTypeError for invalid inputs: implement a
positive_int(value) used for --window that ensures int(value) > 0, and use a
positive_int_or_none(value) (or accept None via default and validate if
provided) for --limit to ensure int(value) > 0 when not None; replace the
current type=int usages in the ap.add_argument calls and/or add explicit
argparse validation so invalid or negative values are rejected during parsing.
- Around line 252-255: The loop over pr_dir.iterdir() uses old.unlink() which
raises on directories and causes crashes when --keep-stale leaves
subdirectories; update the cleanup in dryrun.py to detect whether old is a
file/symlink or a directory (using old.is_dir()/old.is_file()/old.is_symlink())
and call the appropriate remover (unlink for files/symlinks, shutil.rmtree for
directories) or otherwise skip non-removable entries so per-label cleanup in the
for loop does not raise; adjust imports if needed and keep the rest of the logic
(the pr_dir.iterdir() loop and subsequent shared_out assignment) unchanged.
- Around line 100-113: The snapshot helpers (_snapshot_test_db and
_snapshot_groovy) currently call _git with check=False and unconditionally write
.stdout, which can create empty/missing snapshot files; change them to fail fast
by checking the _git result (use check=True or validate result.returncode and
result.stdout) and raise/abort if the git show/ls-tree calls failed or returned
empty content for any expected file; specifically validate the outputs from
_git(repo, "ls-tree", ..., sha, "--", TEST_DB_REL) and each _git(repo, "show",
f"{sha}:{f}") in _snapshot_test_db, and validate the _git(repo, "show",
f"{sha}:{GROOVY_REL}") call in _snapshot_groovy, and only write files when the
calls succeeded and stdout is non-empty, otherwise raise an error so snapshot
creation fails fast.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 622d9a53-626c-4e14-892b-a9e0af1b6311
📒 Files selected for processing (10)
- jenkins/scripts/cbts/README.md
- jenkins/scripts/cbts/blocks.py
- jenkins/scripts/cbts/main.py
- jenkins/scripts/cbts/rules/README.md
- jenkins/scripts/cbts/rules/_helpers.py
- jenkins/scripts/cbts/rules/out_of_scope_rule.py
- jenkins/scripts/cbts/rules/test_list_rule.py
- jenkins/scripts/cbts/rules/tests_def_rule.py
- jenkins/scripts/cbts/rules/waives_rule.py
- jenkins/scripts/cbts/tools/dryrun.py
```python
# `<2+ space indent>- <pytest_id>`. Top-level `- condition:` (column-0
# block-list item) intentionally does not match — the leading `\s+`
# discriminates entry-within-block from block-boundary edits.
_ENTRY_RE = re.compile(r"^\s+-\s+(\S.*)$")
_COMMENT_RE = re.compile(r"^\s*#")
```
Don’t classify every indented YAML list item as a test entry.
_ENTRY_RE also matches list items under non-tests: fields, e.g. a changed wildcard list item like - "*h100*". Those edits then get parsed as added/removed test IDs instead of structural YAML changes, so this rule can return noop or narrow when it should force fallback because stage selection semantics changed.
Also applies to: 72-84
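One way to restrict matching to children of a `tests:` key, as this comment suggests, is to track the indentation of the active `tests:` block. This is an indentation-tracking sketch of the proposed fix, not the rule's actual code:

```python
import re
from typing import Iterator

_TESTS_KEY_RE = re.compile(r"^(\s*)tests:\s*$")
_ITEM_RE = re.compile(r"^(\s*)-\s+(\S.*)$")


def iter_test_entries(lines: list[str]) -> Iterator[str]:
    """Yield only list items that are indented children of a `tests:` key."""
    tests_indent = None  # indentation of the active tests: key, if any
    for line in lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # blank and comment lines never change state
        key = _TESTS_KEY_RE.match(line)
        if key:
            tests_indent = len(key.group(1))
            continue
        item = _ITEM_RE.match(line)
        if item and tests_indent is not None and len(item.group(1)) > tests_indent:
            yield item.group(2)
            continue
        # Any other line at or above the tests: indent closes the block, so
        # wildcard lists under sibling keys are not misclassified as test ids.
        if tests_indent is not None and len(line) - len(line.lstrip()) <= tests_indent:
            tests_indent = None
```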
```python
        try:
            content = (self._repo_root / git_path).read_text(encoding="utf-8")
        except OSError:
            return [yaml_path]
```
Handle decode failures when changed test assets aren’t UTF-8 text.
Path.read_text() can raise UnicodeDecodeError for binary or non-UTF8 fixtures under tests/. Right now that bubbles out of CBTS instead of taking the intended file-level fallback, so a fixture-only PR can crash selection.
Suggested fix
```diff
-        except OSError:
+        except (OSError, UnicodeError):
             return [yaml_path]
@@
-        except OSError:
+        except (OSError, UnicodeError):
             return [yaml_path]
```

Also applies to: 172-175
```python
        if not diff:
            return [yaml_path]
        line_numbers = iter_diff_post_line_numbers(diff)
        if not line_numbers:
            return [yaml_path]
        try:
            content = (self._repo_root / git_path).read_text(encoding="utf-8")
        except OSError:
            return [yaml_path]
        changed_models = _yaml_top_keys_for_lines(content, line_numbers)
        if not changed_models:
            return [yaml_path]
        model_map = self._accuracy_model_to_classes()
        anchors = sorted({a for m in changed_models for a in model_map.get(m, ())})
        return anchors or [yaml_path]
```
Per-model refinement is unsafe when the diff removes reference sections.
iter_diff_post_line_numbers() anchors - lines to the next post-image line, so deleting or moving an entire top-level model section in accuracy/references/*.yaml can be attributed to the following model. That narrows to the wrong test class. The safe behavior here is to fall back to [yaml_path] whenever the refinement can’t prove the changed top-level keys from the post-image alone.
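The safe-fallback guard this comment proposes could look like the following sketch: treat the refinement as provable only when the unified diff contains no removal lines. Real hunk parsing may need to be more careful; this is an illustration, not the reviewed code.

```python
def refinement_is_safe(diff: str) -> bool:
    """True only when the unified diff removes nothing, so the changed
    top-level keys are provable from the post-image alone."""
    for line in diff.splitlines():
        # Skip the "---" file header; any other leading "-" is a deletion
        # that may shift attribution to the following top-level key.
        if line.startswith("-") and not line.startswith("---"):
            return False
    return True
```

A caller would fall back to `[yaml_path]` whenever `refinement_is_safe(diff)` is False.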
```python
    def apply(self, pr: PRInputs) -> Optional[RuleResult]:
        candidates = [f for f in pr.changed_files if f.startswith("tests/")]
        if not candidates:
            return None

        block_filters: dict[tuple[str, int], dict[str, set[str]]] = {}
        narrowed: set[str] = set()
        handled: set[str] = set()
        out_of_namespace: set[str] = set()
        no_match: set[str] = set()

        for git_path in candidates:
            yaml_path = self.yaml_index.git_path_to_yaml_key(git_path)
            if yaml_path is None:
                base = git_path.rsplit("/", 1)[-1]
                if base.startswith("test_") and base.endswith(".py"):
                    # Standalone test_*.py not referenced by any L0 YAML —
                    # pytest doesn't auto-import test files into other
                    # tests, so this file's edits can't affect what L0
                    # runs. Claim as noop contribution.
                    handled.add(git_path)
                    no_match.add(git_path)
                    continue
                # conftest / __init__ / helper / data — could impact
                # selection via implicit pytest discovery; let Selector
                # report it as Unhandled → fallback.
                out_of_namespace.add(git_path)
                continue
            handled.add(git_path)
            anchors = self._compute_anchors(git_path, yaml_path, pr.diffs.get(git_path, ""))
```
Skip out-of-scope markdown before attempting testdef narrowing.
An in-namespace doc like tests/integration/defs/accuracy/README.md will currently be translated to accuracy/README.md, walked up to accuracy/, and produce an actionable testdefonly result. Because _combine_scopes() lets actionable scopes override noop, that defeats OutOfScopeRule and still schedules CI for markdown-only edits under YAML-covered trees.
Suggested fix
```diff
+from .out_of_scope_rule import _is_out_of_scope
+
 def apply(self, pr: PRInputs) -> Optional[RuleResult]:
     candidates = [f for f in pr.changed_files if f.startswith("tests/")]
     if not candidates:
         return None
     block_filters: dict[tuple[str, int], dict[str, set[str]]] = {}
@@
     for git_path in candidates:
+        if _is_out_of_scope(git_path):
+            continue
         yaml_path = self.yaml_index.git_path_to_yaml_key(git_path)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
    def apply(self, pr: PRInputs) -> Optional[RuleResult]:
        candidates = [f for f in pr.changed_files if f.startswith("tests/")]
        if not candidates:
            return None
        block_filters: dict[tuple[str, int], dict[str, set[str]]] = {}
        narrowed: set[str] = set()
        handled: set[str] = set()
        out_of_namespace: set[str] = set()
        no_match: set[str] = set()
        for git_path in candidates:
            if _is_out_of_scope(git_path):
                continue
            yaml_path = self.yaml_index.git_path_to_yaml_key(git_path)
            if yaml_path is None:
                base = git_path.rsplit("/", 1)[-1]
                if base.startswith("test_") and base.endswith(".py"):
                    # Standalone test_*.py not referenced by any L0 YAML —
                    # pytest doesn't auto-import test files into other
                    # tests, so this file's edits can't affect what L0
                    # runs. Claim as noop contribution.
                    handled.add(git_path)
                    no_match.add(git_path)
                    continue
                # conftest / __init__ / helper / data — could impact
                # selection via implicit pytest discovery; let Selector
                # report it as Unhandled → fallback.
                out_of_namespace.add(git_path)
                continue
            handled.add(git_path)
            anchors = self._compute_anchors(git_path, yaml_path, pr.diffs.get(git_path, ""))
```
```python
def _snapshot_test_db(repo: Path, sha: str, dst: Path) -> Path:
    out = _git(repo, "ls-tree", "-r", "--name-only", sha, "--", TEST_DB_REL, check=False)
    files = [ln for ln in out.stdout.splitlines() if ln.endswith(".yml")]
    db = dst / "test-db"
    db.mkdir(parents=True, exist_ok=True)
    for f in files:
        (db / Path(f).name).write_text(_git(repo, "show", f"{sha}:{f}", check=False).stdout)
    return db


def _snapshot_groovy(repo: Path, sha: str, dst: Path) -> Path:
    p = dst / "L0_Test.groovy"
    p.write_text(_git(repo, "show", f"{sha}:{GROOVY_REL}", check=False).stdout)
    return p
```
Fail fast when snapshot files are unavailable.
The current check=False + unconditional write_text(...stdout) path can silently create empty snapshots, which can produce misleading CBTS decisions instead of a clear replay failure.
Suggested fix
```diff
 def _snapshot_test_db(repo: Path, sha: str, dst: Path) -> Path:
     out = _git(repo, "ls-tree", "-r", "--name-only", sha, "--", TEST_DB_REL, check=False)
     files = [ln for ln in out.stdout.splitlines() if ln.endswith(".yml")]
     db = dst / "test-db"
     db.mkdir(parents=True, exist_ok=True)
     for f in files:
-        (db / Path(f).name).write_text(_git(repo, "show", f"{sha}:{f}", check=False).stdout)
+        shown = _git(repo, "show", f"{sha}:{f}", check=False)
+        if shown.returncode != 0:
+            raise subprocess.CalledProcessError(
+                shown.returncode, shown.args, output=shown.stdout, stderr=shown.stderr
+            )
+        (db / Path(f).name).write_text(shown.stdout)
     return db


 def _snapshot_groovy(repo: Path, sha: str, dst: Path) -> Path:
     p = dst / "L0_Test.groovy"
-    p.write_text(_git(repo, "show", f"{sha}:{GROOVY_REL}", check=False).stdout)
+    shown = _git(repo, "show", f"{sha}:{GROOVY_REL}", check=False)
+    if shown.returncode != 0:
+        raise subprocess.CalledProcessError(
+            shown.returncode, shown.args, output=shown.stdout, stderr=shown.stderr
+        )
+    p.write_text(shown.stdout)
     return p
```

🤖 Prompt for AI Agents
In `@jenkins/scripts/cbts/tools/dryrun.py` around lines 100 - 113, The snapshot
helpers (_snapshot_test_db and _snapshot_groovy) currently call _git with
check=False and unconditionally write .stdout, which can create empty/missing
snapshot files; change them to fail fast by checking the _git result (use
check=True or validate result.returncode and result.stdout) and raise/abort if
the git show/ls-tree calls failed or returned empty content for any expected
file; specifically validate the outputs from _git(repo, "ls-tree", ..., sha,
"--", TEST_DB_REL) and each _git(repo, "show", f"{sha}:{f}") in
_snapshot_test_db, and validate the _git(repo, "show", f"{sha}:{GROOVY_REL}")
call in _snapshot_groovy, and only write files when the calls succeeded and
stdout is non-empty, otherwise raise an error so snapshot creation fails fast.
```python
for old in pr_dir.iterdir():
    if old.name != "summary.txt":
        old.unlink()
shared_out = repo / "cbts_test_db"
```
Handle stale directories during per-label cleanup.
old.unlink() will raise on directories. With --keep-stale, this can crash replay for a label if any subdirectory exists from prior runs.
Suggested fix
```diff
 for old in pr_dir.iterdir():
     if old.name != "summary.txt":
-        old.unlink()
+        if old.is_dir():
+            shutil.rmtree(old)
+        else:
+            old.unlink()
```

🤖 Prompt for AI Agents
In `@jenkins/scripts/cbts/tools/dryrun.py` around lines 252 - 255, The loop over
pr_dir.iterdir() uses old.unlink() which raises on directories and causes
crashes when --keep-stale leaves subdirectories; update the cleanup in dryrun.py
to detect whether old is a file/symlink or a directory (using
old.is_dir()/old.is_file()/old.is_symlink()) and call the appropriate remover
(unlink for files/symlinks, shutil.rmtree for directories) or otherwise skip
non-removable entries so per-label cleanup in the for loop does not raise;
adjust imports if needed and keep the rest of the logic (the pr_dir.iterdir()
loop and subsequent shared_out assignment) unchanged.
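The dispatching remover the prompt describes can be sketched as a small helper; `_remove_entry` is a hypothetical name, and the real cleanup loop lives in `dryrun.py`:

```python
import shutil
import tempfile
from pathlib import Path


def _remove_entry(path: Path) -> None:
    """Remove a file, symlink, or directory.

    Hypothetical helper mirroring the suggested fix: Path.unlink()
    raises on directories, so dispatch on is_dir(). Symlinks to
    directories are unlinked, not recursed into.
    """
    if path.is_dir() and not path.is_symlink():
        shutil.rmtree(path)
    else:
        path.unlink()


# Quick self-check in a throwaway directory
root = Path(tempfile.mkdtemp())
stale_file = root / "old.json"
stale_file.write_text("{}")
stale_dir = root / "test-db"
stale_dir.mkdir()
(stale_dir / "l0.yml").write_text("stages: []")
_remove_entry(stale_file)
_remove_entry(stale_dir)
```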
```python
    "--window", type=int, default=500, help="commits back from --ref (default: 500)"
)
ap.add_argument(
    "--filter",
    choices=["tests-only", "all"],
    default="tests-only",
    help="commit filter (default: tests-only)",
)
ap.add_argument("--limit", type=int, default=None, help="cap commits after filter")
ap.add_argument(
```
Validate numeric CLI bounds at parse time.
--window and --limit currently accept invalid values (e.g., --window 0, negative values), which can cause runtime failures or surprising selection behavior.
Suggested fix
```diff
+def _positive_int(value: str) -> int:
+    ivalue = int(value)
+    if ivalue <= 0:
+        raise argparse.ArgumentTypeError("must be > 0")
+    return ivalue
+
+
+def _non_negative_int(value: str) -> int:
+    ivalue = int(value)
+    if ivalue < 0:
+        raise argparse.ArgumentTypeError("must be >= 0")
+    return ivalue
+
+
 def main(argv: Optional[list[str]] = None) -> int:
@@
-    ap.add_argument(
-        "--window", type=int, default=500, help="commits back from --ref (default: 500)"
-    )
+    ap.add_argument(
+        "--window", type=_positive_int, default=500, help="commits back from --ref (default: 500)"
+    )
@@
-    ap.add_argument("--limit", type=int, default=None, help="cap commits after filter")
+    ap.add_argument("--limit", type=_non_negative_int, default=None, help="cap commits after filter")
```

🤖 Prompt for AI Agents
In `@jenkins/scripts/cbts/tools/dryrun.py` around lines 314 - 323, The CLI
currently allows invalid numeric values for --window and --limit; update the
ap.add_argument calls for "--window" and "--limit" to validate at parse time by
using custom argparse types (or a validator) that raise
argparse.ArgumentTypeError for invalid inputs: implement a positive_int(value)
used for --window that ensures int(value) > 0, and use a
positive_int_or_none(value) (or accept None via default and validate if
provided) for --limit to ensure int(value) > 0 when not None; replace the
current type=int usages in the ap.add_argument calls and/or add explicit
argparse validation so invalid or negative values are rejected during parsing.
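The validator pattern can be exercised end to end. A minimal sketch assuming a standalone parser (the real `dryrun.py` parser has more arguments); invalid values are rejected at parse time rather than surfacing later as surprising selection behavior:

```python
import argparse


def _positive_int(value: str) -> int:
    """argparse type that rejects zero and negative values at parse time."""
    ivalue = int(value)
    if ivalue <= 0:
        raise argparse.ArgumentTypeError("must be > 0")
    return ivalue


ap = argparse.ArgumentParser()
ap.add_argument(
    "--window", type=_positive_int, default=500,
    help="commits back from --ref (default: 500)",
)
```

With this in place, `--window 0` fails during `parse_args` with "argument --window: must be > 0" instead of producing an empty commit window downstream.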
Summary by CodeRabbit
New Features
Documentation
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment /bot help.