fix(validator): bound inline-test regex indentation by JSONbored · Pull Request #1360 · entrius/gittensor

JSONbored · 2026-05-25T13:17:06Z

Summary

Restrict Rust, Zig, and D inline-test regex indentation from \s* to [ \t]*.
Prevent multiline regex searches from treating newlines as indentation while scanning source files for inline test markers.
Add regression coverage that prevents cross-line matching while preserving detection of valid markers on later lines.

Why

The inline-test patterns run with re.MULTILINE during PR token scoring. In Python, \s also matches newlines, so a pattern like ^\s*... can consume every following blank or whitespace-only line from each line start before the marker alternative fails, and the engine then backtracks through all of them.

That makes scanning newline-heavy files with no inline-test marker roughly quadratic in file size rather than linear. The existing 1 MB file-size cap does not prevent this, because much smaller inputs already show the effect.

This change keeps the current file-level inline-test behavior from #314. It only narrows the indentation prefix to horizontal whitespace, so search() still finds valid markers on later lines.

Local proof

I compared the current patterns against this patch with synthetic whitespace-only source below the 1 MB file cap.

Raw Rust regex scan:

Input	Current pattern	Patched pattern	Current-pattern growth vs. previous row / speedup from patch
10 KB	0.0122s	0.000062s	196x speedup
20 KB	0.0474s	0.000132s	3.88x growth
40 KB	0.1903s	0.000241s	4.02x growth
80 KB	0.7260s	0.000479s	3.82x growth
160 KB	2.8946s	0.000862s	3357x speedup
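
A comparison of this kind can be reproduced with a short script. The sketch below is an illustration, not the benchmark used for the table: OLD_PATTERN and NEW_PATTERN are hypothetical simplified stand-ins for the Rust patterns in gittensor/constants.py, the helper names are made up, and the input is assumed to be newline-only text.

```python
import re
import time

# Hypothetical simplified stand-ins for the Rust inline-test patterns.
# The real patterns live in gittensor/constants.py and may differ.
OLD_PATTERN = re.compile(r'^\s*#\[(?:test|cfg\(test\))\]', re.MULTILINE)
NEW_PATTERN = re.compile(r'^[ \t]*#\[(?:test|cfg\(test\))\]', re.MULTILINE)


def time_scan(pattern, text):
    """Return (elapsed_seconds, match) for a single search() over text."""
    start = time.perf_counter()
    match = pattern.search(text)
    return time.perf_counter() - start, match


def compare(sizes_kb=(10, 20)):
    """Scan newline-only inputs of each size with both patterns."""
    rows = []
    for kb in sizes_kb:
        # Assumption: the synthetic input is nothing but newlines.
        text = "\n" * (kb * 1024)
        old_seconds, old_match = time_scan(OLD_PATTERN, text)
        new_seconds, new_match = time_scan(NEW_PATTERN, text)
        rows.append((kb, old_seconds, new_seconds, old_match, new_match))
    return rows
```

Calling compare() and printing the two timing columns should show the old pattern's time growing much faster than the patched pattern's as the input doubles. Absolute numbers depend on the machine and Python version.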

End-to-end calculate_token_score_from_file_changes() timing on 80 KB files:

Extension	Current pattern	Patched pattern	Speedup
`.rs`	0.755s	0.0015s	491x
`.zig`	0.735s	0.0030s	245x
`.d`	0.753s	0.0029s	263x

A 160 KB Rust file through the same scoring path dropped from 2.895s to 0.0027s.

Behavior check:

Current pattern: match("\n#[test]\n...") can match from the previous line because \s* crosses the newline.
Patched pattern: match("\n#[test]\n...") does not match from the previous line.
Patched pattern: search("\n#[test]\n...") still detects the valid marker on the later line.
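
The three checks above can be sketched in a few lines of Python. The patterns here are hypothetical simplified stand-ins, not the exact expressions from gittensor/constants.py, and the sample source string is made up for illustration.

```python
import re

# Hypothetical simplified stand-ins for the current and patched Rust patterns.
OLD_PATTERN = re.compile(r'^\s*#\[(?:test|cfg\(test\))\]', re.MULTILINE)
NEW_PATTERN = re.compile(r'^[ \t]*#\[(?:test|cfg\(test\))\]', re.MULTILINE)

# A blank first line followed by a valid marker on the second line.
source = "\n#[test]\nfn checks_something() {}\n"

# match() anchors at position 0, i.e. on the blank first line.
old_match = OLD_PATTERN.match(source)   # \s* crosses the newline, so this matches
new_match = NEW_PATTERN.match(source)   # [ \t]* cannot cross it, so this is None

# search() may start at any line start, so the marker on line 2 is still found.
new_search = NEW_PATTERN.search(source)
```

With the old pattern the matched text begins with the newline from the previous line. With the patched pattern the match begins at the marker itself.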

Validation

uv run --extra dev pytest tests/validator/test_inline_test_detection.py tests/validator/test_token_scoring_integration.py tests/validator/oss_contributions/mirror/test_base_score_helper.py -q -> 51 passed
uv run --extra dev ruff check gittensor/constants.py tests/validator/test_inline_test_detection.py -> passed
uv run --extra dev pre-commit run --files gittensor/constants.py tests/validator/test_inline_test_detection.py -> passed
git diff --check -> passed
uv run --extra dev pytest -q -> 860 passed

Notes

This is complementary to the tree-sitter parser timeout/isolation work. That work bounds parser hangs; this patch removes a separate Python regex scan cost before it can dominate scoring on newline-heavy Rust/Zig/D files.

Restrict inline-test marker indentation to horizontal whitespace so multiline regex searches cannot consume newline-heavy source files while scanning for Rust, Zig, or D test markers. Add regression coverage that prevents cross-line matching while preserving detection of valid markers on later lines.

xiao-xiao-mao Bot added the bug Something isn't working label May 25, 2026

JSONbored wants to merge 1 commit into entrius:test from JSONbored:codex/inline-test-regex-hardening
