Skip to content

feat: enhance indentation handling#16

Merged
Shinigami92 merged 3 commits intomainfrom
feat-enhance-indentation-handling
Mar 18, 2026
Merged

feat: enhance indentation handling#16
Shinigami92 merged 3 commits intomainfrom
feat-enhance-indentation-handling

Conversation

@Shinigami92
Copy link
Copy Markdown
Member

@Shinigami92 Shinigami92 commented Mar 16, 2025

Summary by CodeRabbit

  • Refactor
    • Unified and improved indentation handling across the parser so multiline attributes and text blocks follow consistent, line-oriented rules.
    • Parsing now preserves and trims text blocks more reliably, with better handling of blank lines, multibyte/CJK text, and emojis.
    • Mixed tab/space indentation is detected more strictly, preventing ambiguous parsing.
  • Tests
    • Updated tests to reflect the new indentation and trimming behavior.

@Shinigami92 Shinigami92 added the c: feature Request for new feature label Mar 16, 2025
@Shinigami92 Shinigami92 self-assigned this Mar 16, 2025
@Shinigami92 Shinigami92 force-pushed the feat-enhance-indentation-handling branch from 227f789 to 7630750 Compare March 16, 2026 21:43
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 16, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 203be5e3-246a-48f6-bdd4-28972d1d3c5c

📥 Commits

Reviewing files that changed from the base of the PR and between 364bc8a and 9b52bd3.

📒 Files selected for processing (2)
  • src/parser/text/node.rs
  • src/parser/text/process.rs

📝 Walkthrough

Walkthrough

Reworks HSML indentation handling: replaces indent_level with nested_tag_level, changes indent_string from Option<String> to String, computes/stores an explicit indent string during parse, and shifts text-node processing to per-line indentation validation and stripping.

Changes

Cohort / File(s) Summary
Context Structure Definition
src/parser/mod.rs
Public HsmlProcessContext now exposes nested_tag_level: usize and indent_string: String (was indent_level: usize and indent_string: Option<String>); added docs about semantics.
Parse entry / indent extraction
src/parser/parse.rs
Capture leading whitespace up to first non-whitespace, compute concrete indent_string (trailing spaces/tabs after last newline), assign into context.indent_string, and continue parsing with remaining input.
Tag node parsing & state
src/parser/tag/node.rs
Switches to nested_tag_level + concrete indent_string, changes indentation checks to starts_with/length-based logic, propagates/restores new state per child, and treats mixed-tabs-and-spaces as Failure. Tests updated.
Text node processing
src/parser/text/node.rs, src/parser/text/process.rs
Move from block-level trimming to per-line prefix detection/stripping: compute block_indent from first non-empty line, require each line to start with context.indent_string (and a following space/tab), update line-end logic and extraction. Adds many indentation/encoding edge-case tests.
Attribute & other tests
src/parser/attribute/process.rs, src/parser/.../tests
Tests updated to construct HsmlProcessContext with nested_tag_level and concrete indent_string: String; expected outputs adjusted for new per-line indentation behavior and indentation formatting changes.

Sequence Diagram(s)

mermaid
sequenceDiagram
participant Input as "Raw Input"
participant Parser as "parse.rs"
participant Context as "HsmlProcessContext"
participant TagNode as "tag/node.rs"
participant TextNode as "text/process.rs"
rect rgba(200,200,255,0.5)
Input->>Parser: feed input
Parser->>Context: capture leading whitespace\ncompute indent_string
Parser->>TagNode: attempt tag parse (uses Context)
Parser->>TextNode: attempt text parse (uses Context)
TagNode->>Context: push nested_tag_level,\nupdate indent_string
TagNode-->>Context: restore nested_tag_level\nand indent_string
TextNode->>Context: compute block_indent from\nfirst non-empty line
TextNode->>Context: validate per-line indent\nstrip prefixes accordingly
end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hopped through tabs and spaces, neat and fleet,
I found the string that marks each indented beat,
Per-line I nibble, peel each leading seam,
Nested levels tidy — parsing's a dream,
Nose twitch, carrots ready — code feels sweet! 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: enhance indentation handling' directly reflects the main changes across all modified files, which consistently refactor indentation handling mechanisms.
Docstring Coverage ✅ Passed Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat-enhance-indentation-handling
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

Base automatically changed from feat-init-lsp to main March 18, 2026 14:12
@Shinigami92 Shinigami92 force-pushed the feat-enhance-indentation-handling branch from 7630750 to 26df05e Compare March 18, 2026 14:14
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/parser/text/process.rs (1)

20-52: ⚠️ Potential issue | 🔴 Critical

Use byte offsets when scanning piped text blocks.

Line 21 counts characters with chars().enumerate(), but Line 30 and the final slices at Lines 50-52 treat those values as byte offsets. A block containing multi-byte text before a newline can therefore mis-slice or panic on valid UTF-8 input. Switch this loop to byte indices before storing offsets.

🛠️ UTF-8-safe offset handling
-    let mut text_block_index = 0;
+    let mut text_block_end = 0;
 
-    for (index, c) in rest.chars().enumerate() {
+    for (index, c) in rest.char_indices() {
         if c == '\n' {
             // if next char is also a \n, then continue
-            let next_char = rest.chars().nth(index + 1);
+            let line_start = index + 1;
+            let next_char = rest[line_start..].chars().next();
             if next_char == Some('\n') {
-                text_block_index = index + 1;
+                text_block_end = line_start + 1;
                 continue;
             }
 
-            let line = &rest[index + 1..];
+            let line = &rest[line_start..];
 
             // otherwise check the indentation and if it does not fulfill the indentation, then break
             // TODO `@Shinigami92` 2025-03-16: right now this does not support mixed indentations on tag level indentation, but only withing the text block
             if !line.starts_with(&context.indent_string) {
                 break;
@@
         } else {
-            text_block_index = index;
+            text_block_end = index + c.len_utf8();
             continue;
         }
     }
 
-    let text_block = &rest[..text_block_index + 1];
+    let text_block = &rest[..text_block_end];
 
-    let rest = &rest[text_block_index + 1..];
+    let rest = &rest[text_block_end..];
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/parser/text/process.rs` around lines 20 - 52, The loop uses
chars().enumerate() to compute indexes but then uses those indexes as byte
offsets when slicing rest and checking prefixes, which is unsafe for multi-byte
UTF-8; replace the char-based scan with a byte-based scan (e.g. iterate
rest.as_bytes().iter().enumerate() or rest.bytes().enumerate()) so the numeric
indexes are valid byte offsets; when checking newline use b'\n', when getting
next byte use the byte slice (e.g. rest.as_bytes().get(index + 1)), when
building sub-slices like let line = &rest[index + 1..] continue to use those
byte indexes, and when testing leading space/tab compare the first byte to b' '
and b'\t' or convert the small byte slice to &str only after validating
boundaries; keep variable names text_block_index, rest and context.indent_string
unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/parser/tag/node.rs`:
- Around line 121-140: The code currently allows mixed space/tab indentation and
uses indentation.len() to infer nesting depth; change the gate in the
indentation handling so you (1) explicitly reject mixed tabs and spaces by
collecting/returning a diagnostic when indentation.contains('\t') &&
indentation.contains(' '), and (2) require the new indentation to extend the
existing prefix by checking that indentation.starts_with(&context.indent_string)
and that indentation.len() > context.indent_string.len() (instead of the current
plain len() comparison) before incrementing context.nested_tag_level and
assigning context.indent_string; use the local nested_tag_level and
indent_string copies to restore state on failure if needed and ensure
diagnostics are emitted when the checks fail.

In `@src/parser/text/node.rs`:
- Around line 18-23: The current logic for normalizing a piped text block uses
text.lines().map(|line| line.trim_start()) which strips all leading whitespace
and flattens relative indentation; instead compute the minimal/common
leading-whitespace prefix across all non-empty lines of the block and remove
only that shared prefix from each line so deeper indentation (e.g., child lines)
is preserved; update the transformation around the text variable in
src/parser/text/node.rs to calculate that common indent and then map each line
to line[common_prefix_len..] (or equivalent) rather than using trim_start().

---

Outside diff comments:
In `@src/parser/text/process.rs`:
- Around line 20-52: The loop uses chars().enumerate() to compute indexes but
then uses those indexes as byte offsets when slicing rest and checking prefixes,
which is unsafe for multi-byte UTF-8; replace the char-based scan with a
byte-based scan (e.g. iterate rest.as_bytes().iter().enumerate() or
rest.bytes().enumerate()) so the numeric indexes are valid byte offsets; when
checking newline use b'\n', when getting next byte use the byte slice (e.g.
rest.as_bytes().get(index + 1)), when building sub-slices like let line =
&rest[index + 1..] continue to use those byte indexes, and when testing leading
space/tab compare the first byte to b' ' and b'\t' or convert the small byte
slice to &str only after validating boundaries; keep variable names
text_block_index, rest and context.indent_string unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3da8d46b-be8e-4f62-8a15-748042ebb644

📥 Commits

Reviewing files that changed from the base of the PR and between 04c96cb and 26df05e.

📒 Files selected for processing (6)
  • src/parser/attribute/process.rs
  • src/parser/mod.rs
  • src/parser/parse.rs
  • src/parser/tag/node.rs
  • src/parser/text/node.rs
  • src/parser/text/process.rs

Comment thread src/parser/tag/node.rs
Comment thread src/parser/text/node.rs Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/parser/text/node.rs (1)

18-18: Update the inline comment to match actual dedent behavior.

At Line 18, the comment says all leading whitespace is removed per line, but the code strips only block_indent via strip_prefix. This is misleading during maintenance.

✏️ Suggested comment tweak
-    // On every line, replace all leading spaces and tabs with an empty string
+    // Strip the first non-empty line's indentation prefix from each line.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/parser/text/node.rs` at line 18, Update the inline comment to accurately
describe the dedent behavior: instead of saying "replace all leading spaces and
tabs" explain that the code removes only the configured block_indent prefix from
each line using strip_prefix (leaving other leading whitespace intact).
Reference the dedent logic that uses block_indent and strip_prefix in
src/parser/text/node.rs so the comment matches the actual behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/parser/text/process.rs`:
- Around line 20-49: The first line after the pipe marker isn't validated before
entering the char loop, so check the initial line stored in rest before
scanning: verify rest starts with context.indent_string and that the character
immediately after that indent is a space or tab; if it fails, do not treat the
following content as a text block (leave text_block_end unchanged / exit early).
Apply this validation using the same identifiers (rest, context.indent_string,
text_block_end) prior to the for (index, c) in rest.char_indices() loop so the
loop only runs when the first piped-text line is confirmed to belong to the text
block.

---

Nitpick comments:
In `@src/parser/text/node.rs`:
- Line 18: Update the inline comment to accurately describe the dedent behavior:
instead of saying "replace all leading spaces and tabs" explain that the code
removes only the configured block_indent prefix from each line using
strip_prefix (leaving other leading whitespace intact). Reference the dedent
logic that uses block_indent and strip_prefix in src/parser/text/node.rs so the
comment matches the actual behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a7912d23-1e94-4d6c-ba3a-f6c761c5e9c8

📥 Commits

Reviewing files that changed from the base of the PR and between 26df05e and 364bc8a.

📒 Files selected for processing (3)
  • src/parser/tag/node.rs
  • src/parser/text/node.rs
  • src/parser/text/process.rs

Comment thread src/parser/text/process.rs
@Shinigami92 Shinigami92 added this pull request to the merge queue Mar 18, 2026
Merged via the queue into main with commit 839b8a1 Mar 18, 2026
5 checks passed
@Shinigami92 Shinigami92 deleted the feat-enhance-indentation-handling branch March 18, 2026 15:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

c: feature Request for new feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant