fix(validator): bound inline-test regex indentation #1360
Open
JSONbored wants to merge 1 commit into
Restrict inline-test marker indentation to horizontal whitespace so multiline regex searches cannot consume newline-heavy source files while scanning for Rust, Zig, or D test markers. Add regression coverage that prevents cross-line matching while preserving detection of valid markers on later lines.
Summary
Change the inline-test indentation prefix from `\s*` to `[ \t]*`.
Why
The inline-test patterns run with `re.MULTILINE` during PR token scoring. In Python, `\s` also matches newlines, so a pattern like `^\s*...` can consume many following lines from each line start before the marker alternative fails.
That makes newline-heavy files with no inline-test marker much more expensive to scan than intended. The existing 1 MB file-size cap does not prevent this, because much smaller inputs already show the effect.
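The cross-line behavior described above can be illustrated with a minimal sketch. The two patterns below are hypothetical stand-ins that cover only the Rust `#[test]` marker; the real patterns in `gittensor/constants.py` are not reproduced here and likely cover more alternatives.

```python
import re

# Hypothetical simplified patterns (assumption: the real ones in
# gittensor/constants.py differ and cover Rust, Zig, and D markers).
OLD_PATTERN = re.compile(r'^\s*#\[test\]', re.MULTILINE)     # \s* can cross newlines
NEW_PATTERN = re.compile(r'^[ \t]*#\[test\]', re.MULTILINE)  # horizontal whitespace only

# A blank first line followed by a marker on the second line.
source = "\n#[test]\nfn works() {}\n"

# match() is anchored at position 0, i.e. the empty first line.
old_match = OLD_PATTERN.match(source)   # succeeds: \s* swallows the leading "\n"
new_match = NEW_PATTERN.match(source)   # fails: [ \t]* cannot cross the newline

# search() tries every position, so it still finds the marker on line 2.
new_search = NEW_PATTERN.search(source)

print(repr(old_match.group()))  # the match includes the newline before the marker
print(new_match)
print(new_search.start())
```

The point of the sketch is that narrowing the prefix changes where a match may begin, not whether a valid marker is found.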
This change keeps the current file-level inline-test behavior from #314. It only narrows the indentation prefix to horizontal whitespace, so `search()` still finds valid markers on later lines.
Local proof
I compared the current patterns against this patch with synthetic whitespace-only source below the 1 MB file cap.
Raw Rust regex scan:
End-to-end `calculate_token_score_from_file_changes()` timing was measured on 80 KB `.rs`, `.zig`, and `.d` files.
A 160 KB Rust file through the same scoring path dropped from 2.895s to 0.0027s.
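A rough way to reproduce the raw regex comparison is sketched below. It uses hypothetical single-marker patterns and a newline-only input rather than the real patterns or the real scoring path, so absolute numbers will differ from the figures in this PR; `time_search` and the input size are illustrative assumptions.

```python
import re
import time

# Hypothetical simplified patterns, not the ones from gittensor/constants.py.
OLD_PATTERN = re.compile(r'^\s*#\[test\]', re.MULTILINE)
NEW_PATTERN = re.compile(r'^[ \t]*#\[test\]', re.MULTILINE)


def time_search(pattern, text):
    """Return (search result, elapsed seconds) for one pattern.search() call."""
    start = time.perf_counter()
    result = pattern.search(text)
    return result, time.perf_counter() - start


# Newline-only source with no marker: every position is a line start, and
# with \s* each attempt can scan the rest of the input before failing.
blank_source = "\n" * 20_000

old_result, old_seconds = time_search(OLD_PATTERN, blank_source)
new_result, new_seconds = time_search(NEW_PATTERN, blank_source)

print(f"old pattern: {old_seconds:.4f}s  new pattern: {new_seconds:.4f}s")
```

Neither pattern finds a marker in this input; only the time spent reaching that conclusion differs.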
Behavior check:
- With `\s*`, `match("\n#[test]\n...")` can match from the previous line because `\s*` crosses the newline.
- With `[ \t]*`, `match("\n#[test]\n...")` does not match from the previous line.
- With `[ \t]*`, `search("\n#[test]\n...")` still detects the valid marker on the later line.
Validation
- `uv run --extra dev pytest tests/validator/test_inline_test_detection.py tests/validator/test_token_scoring_integration.py tests/validator/oss_contributions/mirror/test_base_score_helper.py -q` -> 51 passed
- `uv run --extra dev ruff check gittensor/constants.py tests/validator/test_inline_test_detection.py` -> passed
- `uv run --extra dev pre-commit run --files gittensor/constants.py tests/validator/test_inline_test_detection.py` -> passed
- `git diff --check` -> passed
- `uv run --extra dev pytest -q` -> 860 passed
Notes
This is complementary to the tree-sitter parser timeout/isolation work. That work bounds parser hangs; this patch removes a separate Python regex scan cost before it can dominate scoring on newline-heavy Rust/Zig/D files.