fix(validator): bound inline-test regex indentation #1360

Open: JSONbored wants to merge 1 commit into entrius:test from JSONbored:codex/inline-test-regex-hardening

@JSONbored
Contributor

Summary

  • Restrict Rust, Zig, and D inline-test regex indentation from \s* to [ \t]*.
  • Prevent multiline regex searches from treating newlines as indentation while scanning source files for inline test markers.
  • Add regression coverage that prevents cross-line matching while preserving detection of valid markers on later lines.
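A minimal sketch of what the change amounts to, for illustration only. The marker alternatives in MARKERS (Rust `#[test]` / `#[cfg(test)]`, Zig `test` blocks, D `unittest`) and the names MARKERS, build, CURRENT and PATCHED are hypothetical; the real patterns in gittensor/constants.py are not reproduced in this description and may differ. The only thing that varies between the two dictionaries is the indentation prefix.

```python
import re

# Hypothetical marker alternatives; the real patterns in gittensor/constants.py may differ.
MARKERS = {
    "rust": r"#\[(?:test|cfg\(test\))\]",
    "zig": r'test(?:[ \t]+"[^"\n]*")?[ \t]*\{',
    "d": r"unittest\b",
}


def build(marker: str, indent: str) -> re.Pattern:
    """Compile a line-anchored inline-test pattern with the given indentation prefix."""
    return re.compile(rf"^{indent}(?:{marker})", re.MULTILINE)


# Before: \s* also matches "\n", so the prefix can run across line boundaries.
CURRENT = {lang: build(m, r"\s*") for lang, m in MARKERS.items()}
# After: [ \t]* only matches spaces and tabs, so the prefix stays on one line.
PATCHED = {lang: build(m, r"[ \t]*") for lang, m in MARKERS.items()}

# Markers indented with spaces or tabs are still detected by the patched patterns.
assert PATCHED["rust"].search("mod tests {\n    #[cfg(test)]\n}") is not None
assert PATCHED["zig"].search('const x = 1;\n\ttest "adds" {\n}') is not None
assert PATCHED["d"].search("int f();\n  unittest\n{\n}") is not None
```

Narrowing the prefix rather than the marker keeps every same-line indentation style working while removing the only part of the pattern that could span lines.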

Why

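The inline-test patterns run with re.MULTILINE during PR token scoring, so ^ can match at the start of every line. In Python, \s also matches newlines, so a pattern like ^\s*... can consume all of the following whitespace-only lines from each line start, and then backtrack through them, before the marker alternative fails.

That makes newline-heavy files with no inline-test marker much more expensive to scan than intended: every line start rescans the whitespace that follows it, so the cost grows roughly quadratically with the number of lines. The existing 1 MB file-size cap does not prevent this, because inputs well under 1 MB (tens to hundreds of KB) already show the effect.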
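To make the growth concrete, the sketch below counts how many characters the indentation prefix alone consumes when a match is attempted from every line start of a file made only of newlines (an assumed worst case). It counts only the first greedy pass and ignores the backtracking that follows, so it understates the engine's real work; the point is the n(n+1)/2 growth, not the absolute number. prefix_work, CURRENT_PREFIX and PATCHED_PREFIX are hypothetical names.

```python
import re

CURRENT_PREFIX = re.compile(r"^\s*", re.MULTILINE)     # old indentation prefix
PATCHED_PREFIX = re.compile(r"^[ \t]*", re.MULTILINE)  # new indentation prefix


def prefix_work(prefix: re.Pattern, text: str) -> int:
    """Sum the characters the prefix consumes, starting from every line start.

    With re.MULTILINE, ^ matches at position 0 and right after each newline,
    which is where a line-anchored inline-test pattern gets a fresh attempt.
    """
    total = 0
    for pos in range(len(text) + 1):
        if pos == 0 or text[pos - 1] == "\n":
            m = prefix.match(text, pos)
            if m is not None:
                total += m.end() - pos
    return total


for n in (1_000, 2_000, 4_000):
    text = "\n" * n  # assumed worst case: newline-only, marker-free input
    # The old prefix's total roughly quadruples per doubling; the new one stays 0.
    print(n, prefix_work(CURRENT_PREFIX, text), prefix_work(PATCHED_PREFIX, text))
```

Because the patched prefix never leaves its own line, its total is bounded by the number of spaces and tabs in the file rather than by the square of the line count.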

This change keeps the current file-level inline-test behavior from #314. It only narrows the indentation prefix to horizontal whitespace, so search() still finds valid markers on later lines.

Local proof

I compared the current patterns against the patched ones on synthetic whitespace-only source, with every input well below the 1 MB file-size cap.

Raw Rust regex scan:

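| Input | Current pattern | Patched pattern | Growth / speedup |
|---|---|---|---|
| 10 KB | 0.0122s | 0.000062s | 196x speedup |
| 20 KB | 0.0474s | 0.000132s | 3.88x growth (current pattern vs. 10 KB) |
| 40 KB | 0.1903s | 0.000241s | 4.02x growth (current pattern vs. 20 KB) |
| 80 KB | 0.7260s | 0.000479s | 3.82x growth (current pattern vs. 40 KB) |
| 160 KB | 2.8946s | 0.000862s | 3357x speedup |

Each doubling of the input roughly quadruples the current pattern's time, while the patched pattern's time roughly doubles.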
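A rough harness along these lines can reproduce the shape of the table above. The original benchmark script is not included in this description, so the synthetic input (bare newlines, with 1 KB taken as 1024 bytes), the simplified Rust marker `#\[test\]`, and the names time_search and benchmark are all assumptions. Absolute timings vary by machine, so compare the growth and speedup ratios rather than raw seconds.

```python
import re
import time
from typing import Iterable, Iterator, Optional, Tuple

# Assumed, simplified Rust marker; the real pattern in gittensor/constants.py may differ.
CURRENT = re.compile(r"^\s*#\[test\]", re.MULTILINE)
PATCHED = re.compile(r"^[ \t]*#\[test\]", re.MULTILINE)

# (size_kb, current_s, patched_s, speedup, growth of current vs. previous row)
Row = Tuple[int, float, float, float, Optional[float]]


def time_search(pattern: re.Pattern, text: str, repeats: int = 3) -> float:
    """Return the best-of-N wall time for one full search() over text."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        pattern.search(text)
        best = min(best, time.perf_counter() - start)
    return best


def benchmark(sizes_kb: Iterable[int]) -> Iterator[Row]:
    """Yield one timing row per input size; growth is None for the first row."""
    prev: Optional[float] = None
    for kb in sizes_kb:
        text = "\n" * (kb * 1024)  # synthetic, marker-free, newline-only input
        cur = time_search(CURRENT, text)
        pat = time_search(PATCHED, text)
        speedup = cur / pat if pat > 0 else float("inf")
        growth = cur / prev if prev else None
        yield kb, cur, pat, speedup, growth
        prev = cur


if __name__ == "__main__":
    for row in benchmark([10, 20, 40]):
        print(row)
```

Best-of-N timing reduces noise from other processes; larger sizes such as 160 KB take seconds per run with the current pattern, which is the effect being measured.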

End-to-end calculate_token_score_from_file_changes() timing on 80 KB files:

| Extension | Current pattern | Patched pattern | Speedup |
|---|---|---|---|
| .rs | 0.755s | 0.0015s | 491x |
| .zig | 0.735s | 0.0030s | 245x |
| .d | 0.753s | 0.0029s | 263x |

A 160 KB Rust file through the same scoring path dropped from 2.895s to 0.0027s.

Behavior check:

  • Current pattern: match("\n#[test]\n...") can match from the previous line because \s* crosses the newline.
  • Patched pattern: match("\n#[test]\n...") does not match from the previous line.
  • Patched pattern: search("\n#[test]\n...") still detects the valid marker on the later line.
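The three checks above can be reproduced with a standalone snippet. The simplified Rust marker `#\[test\]` is an assumed stand-in for the real pattern, and the sample source string is hypothetical.

```python
import re

# Assumed stand-in for the Rust inline-test pattern (marker simplified to #[test]).
CURRENT = re.compile(r"^\s*#\[test\]", re.MULTILINE)
PATCHED = re.compile(r"^[ \t]*#\[test\]", re.MULTILINE)

source = "\n#[test]\nfn it_works() {}\n"

# match() only tries position 0, which is the empty first line.
cur = CURRENT.match(source)
print(cur is not None, repr(cur.group()) if cur else None)  # \s* swallowed the "\n"
print(PATCHED.match(source) is not None)  # the patched prefix cannot cross the newline

# search() still reaches the marker at the start of the second line (offset 1).
hit = PATCHED.search(source)
print(hit is not None, hit.start() if hit else None)
```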

Validation

  • uv run --extra dev pytest tests/validator/test_inline_test_detection.py tests/validator/test_token_scoring_integration.py tests/validator/oss_contributions/mirror/test_base_score_helper.py -q -> 51 passed
  • uv run --extra dev ruff check gittensor/constants.py tests/validator/test_inline_test_detection.py -> passed
  • uv run --extra dev pre-commit run --files gittensor/constants.py tests/validator/test_inline_test_detection.py -> passed
  • git diff --check -> passed
  • uv run --extra dev pytest -q -> 860 passed

Notes

This is complementary to the tree-sitter parser timeout/isolation work: that work bounds parser hangs, while this patch removes a separate Python regex scanning cost that can otherwise dominate scoring time on newline-heavy Rust/Zig/D files.

xiao-xiao-mao (bot) added the bug (Something isn't working) label on May 25, 2026.