feat: enhance indentation handling#16
Conversation
cb33e5e to
189321e
Compare
227f789 to
7630750
Compare
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughReworks HSML indentation handling: replaces Changes
Sequence Diagram(s)mermaid Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
📝 Coding Plan
Comment |
e435519 to
c10986b
Compare
7630750 to
26df05e
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/parser/text/process.rs (1)
20-52:⚠️ Potential issue | 🔴 CriticalUse byte offsets when scanning piped text blocks.
Line 21 counts characters with
chars().enumerate(), but Line 30 and the final slices at Lines 50-52 treat those values as byte offsets. A block containing multi-byte text before a newline can therefore mis-slice or panic on valid UTF-8 input. Switch this loop to byte indices before storing offsets.🛠️ UTF-8-safe offset handling
- let mut text_block_index = 0; + let mut text_block_end = 0; - for (index, c) in rest.chars().enumerate() { + for (index, c) in rest.char_indices() { if c == '\n' { // if next char is also a \n, then continue - let next_char = rest.chars().nth(index + 1); + let line_start = index + 1; + let next_char = rest[line_start..].chars().next(); if next_char == Some('\n') { - text_block_index = index + 1; + text_block_end = line_start + 1; continue; } - let line = &rest[index + 1..]; + let line = &rest[line_start..]; // otherwise check the indentation and if it does not fulfill the indentation, then break // TODO `@Shinigami92` 2025-03-16: right now this does not support mixed indentations on tag level indentation, but only withing the text block if !line.starts_with(&context.indent_string) { break; @@ } else { - text_block_index = index; + text_block_end = index + c.len_utf8(); continue; } } - let text_block = &rest[..text_block_index + 1]; + let text_block = &rest[..text_block_end]; - let rest = &rest[text_block_index + 1..]; + let rest = &rest[text_block_end..];🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/parser/text/process.rs` around lines 20 - 52, The loop uses chars().enumerate() to compute indexes but then uses those indexes as byte offsets when slicing rest and checking prefixes, which is unsafe for multi-byte UTF-8; replace the char-based scan with a byte-based scan (e.g. iterate rest.as_bytes().iter().enumerate() or rest.bytes().enumerate()) so the numeric indexes are valid byte offsets; when checking newline use b'\n', when getting next byte use the byte slice (e.g. rest.as_bytes().get(index + 1)), when building sub-slices like let line = &rest[index + 1..] continue to use those byte indexes, and when testing leading space/tab compare the first byte to b' ' and b'\t' or convert the small byte slice to &str only after validating boundaries; keep variable names text_block_index, rest and context.indent_string unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/parser/tag/node.rs`:
- Around line 121-140: The code currently allows mixed space/tab indentation and
uses indentation.len() to infer nesting depth; change the gate in the
indentation handling so you (1) explicitly reject mixed tabs and spaces by
collecting/returning a diagnostic when indentation.contains('\t') &&
indentation.contains(' '), and (2) require the new indentation to extend the
existing prefix by checking that indentation.starts_with(&context.indent_string)
and that indentation.len() > context.indent_string.len() (instead of the current
plain len() comparison) before incrementing context.nested_tag_level and
assigning context.indent_string; use the local nested_tag_level and
indent_string copies to restore state on failure if needed and ensure
diagnostics are emitted when the checks fail.
In `@src/parser/text/node.rs`:
- Around line 18-23: The current logic for normalizing a piped text block uses
text.lines().map(|line| line.trim_start()) which strips all leading whitespace
and flattens relative indentation; instead compute the minimal/common
leading-whitespace prefix across all non-empty lines of the block and remove
only that shared prefix from each line so deeper indentation (e.g., child lines)
is preserved; update the transformation around the text variable in
src/parser/text/node.rs to calculate that common indent and then map each line
to line[common_prefix_len..] (or equivalent) rather than using trim_start().
---
Outside diff comments:
In `@src/parser/text/process.rs`:
- Around line 20-52: The loop uses chars().enumerate() to compute indexes but
then uses those indexes as byte offsets when slicing rest and checking prefixes,
which is unsafe for multi-byte UTF-8; replace the char-based scan with a
byte-based scan (e.g. iterate rest.as_bytes().iter().enumerate() or
rest.bytes().enumerate()) so the numeric indexes are valid byte offsets; when
checking newline use b'\n', when getting next byte use the byte slice (e.g.
rest.as_bytes().get(index + 1)), when building sub-slices like let line =
&rest[index + 1..] continue to use those byte indexes, and when testing leading
space/tab compare the first byte to b' ' and b'\t' or convert the small byte
slice to &str only after validating boundaries; keep variable names
text_block_index, rest and context.indent_string unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3da8d46b-be8e-4f62-8a15-748042ebb644
📒 Files selected for processing (6)
src/parser/attribute/process.rssrc/parser/mod.rssrc/parser/parse.rssrc/parser/tag/node.rssrc/parser/text/node.rssrc/parser/text/process.rs
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/parser/text/node.rs (1)
18-18: Update the inline comment to match actual dedent behavior.At Line 18, the comment says all leading whitespace is removed per line, but the code strips only
block_indentviastrip_prefix. This is misleading during maintenance.✏️ Suggested comment tweak
- // On every line, replace all leading spaces and tabs with an empty string + // Strip the first non-empty line's indentation prefix from each line.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/parser/text/node.rs` at line 18, Update the inline comment to accurately describe the dedent behavior: instead of saying "replace all leading spaces and tabs" explain that the code removes only the configured block_indent prefix from each line using strip_prefix (leaving other leading whitespace intact). Reference the dedent logic that uses block_indent and strip_prefix in src/parser/text/node.rs so the comment matches the actual behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/parser/text/process.rs`:
- Around line 20-49: The first line after the pipe marker isn't validated before
entering the char loop, so check the initial line stored in rest before
scanning: verify rest starts with context.indent_string and that the character
immediately after that indent is a space or tab; if it fails, do not treat the
following content as a text block (leave text_block_end unchanged / exit early).
Apply this validation using the same identifiers (rest, context.indent_string,
text_block_end) prior to the for (index, c) in rest.char_indices() loop so the
loop only runs when the first piped-text line is confirmed to belong to the text
block.
---
Nitpick comments:
In `@src/parser/text/node.rs`:
- Line 18: Update the inline comment to accurately describe the dedent behavior:
instead of saying "replace all leading spaces and tabs" explain that the code
removes only the configured block_indent prefix from each line using
strip_prefix (leaving other leading whitespace intact). Reference the dedent
logic that uses block_indent and strip_prefix in src/parser/text/node.rs so the
comment matches the actual behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a7912d23-1e94-4d6c-ba3a-f6c761c5e9c8
📒 Files selected for processing (3)
src/parser/tag/node.rssrc/parser/text/node.rssrc/parser/text/process.rs
Summary by CodeRabbit