feat: enhance indentation handling by Shinigami92 · Pull Request #16 · hsml-lab/hsml

Shinigami92 · 2025-03-16T09:35:24Z

Summary by CodeRabbit

Refactor
- Unified and improved indentation handling across the parser so multiline attributes and text blocks follow consistent, line-oriented rules.
- Parsing now preserves and trims text blocks more reliably, with better handling of blank lines, multibyte/CJK text, and emojis.
- Mixed tab/space indentation is detected more strictly, preventing ambiguous parsing.
Tests
- Updated tests to reflect the new indentation and trimming behavior.

coderabbitai · 2026-03-16T21:43:11Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 203be5e3-246a-48f6-bdd4-28972d1d3c5c

📥 Commits

Reviewing files that changed from the base of the PR and between 364bc8a and 9b52bd3.

📒 Files selected for processing (2)

src/parser/text/node.rs
src/parser/text/process.rs

📝 Walkthrough

Walkthrough

Reworks HSML indentation handling: replaces indent_level with nested_tag_level, changes indent_string from Option<String> to String, computes/stores an explicit indent string during parse, and shifts text-node processing to per-line indentation validation and stripping.

Changes

Cohort / File(s)	Summary
Context Structure Definition `src/parser/mod.rs`	Public `HsmlProcessContext` now exposes `nested_tag_level: usize` and `indent_string: String` (was `indent_level: usize` and `indent_string: Option<String>`); added docs about semantics.
Parse entry / indent extraction `src/parser/parse.rs`	Capture leading whitespace up to first non-whitespace, compute concrete `indent_string` (trailing spaces/tabs after last newline), assign into `context.indent_string`, and continue parsing with remaining input.
Tag node parsing & state `src/parser/tag/node.rs`	Switches to `nested_tag_level` + concrete `indent_string`, changes indentation checks to starts_with/length-based logic, propagates/restores new state per child, and treats mixed-tabs-and-spaces as Failure. Tests updated.
Text node processing `src/parser/text/node.rs`, `src/parser/text/process.rs`	Move from block-level trimming to per-line prefix detection/stripping: compute block_indent from first non-empty line, require each line to start with `context.indent_string` (and a following space/tab), update line-end logic and extraction. Adds many indentation/encoding edge-case tests.
Attribute & other tests `src/parser/attribute/process.rs`, `src/parser/.../tests`	Tests updated to construct `HsmlProcessContext` with `nested_tag_level` and concrete `indent_string: String`; expected outputs adjusted for new per-line indentation behavior and indentation formatting changes.

Sequence Diagram(s)

mermaid
sequenceDiagram
participant Input as "Raw Input"
participant Parser as "parse.rs"
participant Context as "HsmlProcessContext"
participant TagNode as "tag/node.rs"
participant TextNode as "text/process.rs"
rect rgba(200,200,255,0.5)
Input->>Parser: feed input
Parser->>Context: capture leading whitespace\ncompute indent_string
Parser->>TagNode: attempt tag parse (uses Context)
Parser->>TextNode: attempt text parse (uses Context)
TagNode->>Context: push nested_tag_level,\nupdate indent_string
TagNode-->>Context: restore nested_tag_level\nand indent_string
TextNode->>Context: compute block_indent from\nfirst non-empty line
TextNode->>Context: validate per-line indent\nstrip prefixes accordingly
end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hopped through tabs and spaces, neat and fleet,
I found the string that marks each indented beat,
Per-line I nibble, peel each leading seam,
Nested levels tidy — parsing's a dream,
Nose twitch, carrots ready — code feels sweet! 🥕

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'feat: enhance indentation handling' directly reflects the main changes across all modified files, which consistently refactor indentation handling mechanisms.
Docstring Coverage	✅ Passed	Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat-enhance-indentation-handling

📝 Coding Plan

Generate coding plan for human review comments

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/parser/text/process.rs (1)

20-52: ⚠️ Potential issue | 🔴 Critical

Use byte offsets when scanning piped text blocks.

Line 21 counts characters with chars().enumerate(), but Line 30 and the final slices at Lines 50-52 treat those values as byte offsets. A block containing multi-byte text before a newline can therefore mis-slice or panic on valid UTF-8 input. Switch this loop to byte indices before storing offsets.

🛠️ UTF-8-safe offset handling

-    let mut text_block_index = 0;
+    let mut text_block_end = 0;
 
-    for (index, c) in rest.chars().enumerate() {
+    for (index, c) in rest.char_indices() {
         if c == '\n' {
             // if next char is also a \n, then continue
-            let next_char = rest.chars().nth(index + 1);
+            let line_start = index + 1;
+            let next_char = rest[line_start..].chars().next();
             if next_char == Some('\n') {
-                text_block_index = index + 1;
+                text_block_end = line_start + 1;
                 continue;
             }
 
-            let line = &rest[index + 1..];
+            let line = &rest[line_start..];
 
             // otherwise check the indentation and if it does not fulfill the indentation, then break
             // TODO `@Shinigami92` 2025-03-16: right now this does not support mixed indentations on tag level indentation, but only withing the text block
             if !line.starts_with(&context.indent_string) {
                 break;
@@
         } else {
-            text_block_index = index;
+            text_block_end = index + c.len_utf8();
             continue;
         }
     }
 
-    let text_block = &rest[..text_block_index + 1];
+    let text_block = &rest[..text_block_end];
 
-    let rest = &rest[text_block_index + 1..];
+    let rest = &rest[text_block_end..];

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/parser/text/process.rs` around lines 20 - 52, The loop uses
chars().enumerate() to compute indexes but then uses those indexes as byte
offsets when slicing rest and checking prefixes, which is unsafe for multi-byte
UTF-8; replace the char-based scan with a byte-based scan (e.g. iterate
rest.as_bytes().iter().enumerate() or rest.bytes().enumerate()) so the numeric
indexes are valid byte offsets; when checking newline use b'\n', when getting
next byte use the byte slice (e.g. rest.as_bytes().get(index + 1)), when
building sub-slices like let line = &rest[index + 1..] continue to use those
byte indexes, and when testing leading space/tab compare the first byte to b' '
and b'\t' or convert the small byte slice to &str only after validating
boundaries; keep variable names text_block_index, rest and context.indent_string
unchanged.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/parser/tag/node.rs`:
- Around line 121-140: The code currently allows mixed space/tab indentation and
uses indentation.len() to infer nesting depth; change the gate in the
indentation handling so you (1) explicitly reject mixed tabs and spaces by
collecting/returning a diagnostic when indentation.contains('\t') &&
indentation.contains(' '), and (2) require the new indentation to extend the
existing prefix by checking that indentation.starts_with(&context.indent_string)
and that indentation.len() > context.indent_string.len() (instead of the current
plain len() comparison) before incrementing context.nested_tag_level and
assigning context.indent_string; use the local nested_tag_level and
indent_string copies to restore state on failure if needed and ensure
diagnostics are emitted when the checks fail.

In `@src/parser/text/node.rs`:
- Around line 18-23: The current logic for normalizing a piped text block uses
text.lines().map(|line| line.trim_start()) which strips all leading whitespace
and flattens relative indentation; instead compute the minimal/common
leading-whitespace prefix across all non-empty lines of the block and remove
only that shared prefix from each line so deeper indentation (e.g., child lines)
is preserved; update the transformation around the text variable in
src/parser/text/node.rs to calculate that common indent and then map each line
to line[common_prefix_len..] (or equivalent) rather than using trim_start().

---

Outside diff comments:
In `@src/parser/text/process.rs`:
- Around line 20-52: The loop uses chars().enumerate() to compute indexes but
then uses those indexes as byte offsets when slicing rest and checking prefixes,
which is unsafe for multi-byte UTF-8; replace the char-based scan with a
byte-based scan (e.g. iterate rest.as_bytes().iter().enumerate() or
rest.bytes().enumerate()) so the numeric indexes are valid byte offsets; when
checking newline use b'\n', when getting next byte use the byte slice (e.g.
rest.as_bytes().get(index + 1)), when building sub-slices like let line =
&rest[index + 1..] continue to use those byte indexes, and when testing leading
space/tab compare the first byte to b' ' and b'\t' or convert the small byte
slice to &str only after validating boundaries; keep variable names
text_block_index, rest and context.indent_string unchanged.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3da8d46b-be8e-4f62-8a15-748042ebb644

📥 Commits

Reviewing files that changed from the base of the PR and between 04c96cb and 26df05e.

📒 Files selected for processing (6)

src/parser/attribute/process.rs
src/parser/mod.rs
src/parser/parse.rs
src/parser/tag/node.rs
src/parser/text/node.rs
src/parser/text/process.rs

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

src/parser/text/node.rs (1)
18-18: Update the inline comment to match actual dedent behavior.

At Line 18, the comment says all leading whitespace is removed per line, but the code strips only block_indent via strip_prefix. This is misleading during maintenance.
✏️ Suggested comment tweak
-    // On every line, replace all leading spaces and tabs with an empty string
+    // Strip the first non-empty line's indentation prefix from each line.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/parser/text/node.rs` at line 18, Update the inline comment to accurately
describe the dedent behavior: instead of saying "replace all leading spaces and
tabs" explain that the code removes only the configured block_indent prefix from
each line using strip_prefix (leaving other leading whitespace intact).
Reference the dedent logic that uses block_indent and strip_prefix in
src/parser/text/node.rs so the comment matches the actual behavior.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/parser/text/process.rs`:
- Around line 20-49: The first line after the pipe marker isn't validated before
entering the char loop, so check the initial line stored in rest before
scanning: verify rest starts with context.indent_string and that the character
immediately after that indent is a space or tab; if it fails, do not treat the
following content as a text block (leave text_block_end unchanged / exit early).
Apply this validation using the same identifiers (rest, context.indent_string,
text_block_end) prior to the for (index, c) in rest.char_indices() loop so the
loop only runs when the first piped-text line is confirmed to belong to the text
block.

---

Nitpick comments:
In `@src/parser/text/node.rs`:
- Line 18: Update the inline comment to accurately describe the dedent behavior:
instead of saying "replace all leading spaces and tabs" explain that the code
removes only the configured block_indent prefix from each line using
strip_prefix (leaving other leading whitespace intact). Reference the dedent
logic that uses block_indent and strip_prefix in src/parser/text/node.rs so the
comment matches the actual behavior.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a7912d23-1e94-4d6c-ba3a-f6c761c5e9c8

📥 Commits

Reviewing files that changed from the base of the PR and between 26df05e and 364bc8a.

📒 Files selected for processing (3)

src/parser/tag/node.rs
src/parser/text/node.rs
src/parser/text/process.rs

Shinigami92 added the c: feature Request for new feature label Mar 16, 2025

Shinigami92 self-assigned this Mar 16, 2025

Shinigami92 force-pushed the feat-init-lsp branch from cb33e5e to 189321e Compare March 16, 2026 21:40

Shinigami92 force-pushed the feat-enhance-indentation-handling branch from 227f789 to 7630750 Compare March 16, 2026 21:43

Shinigami92 force-pushed the feat-init-lsp branch from e435519 to c10986b Compare March 18, 2026 13:58

Base automatically changed from feat-init-lsp to main March 18, 2026 14:12

feat: enhance indentation handling

26df05e

Shinigami92 force-pushed the feat-enhance-indentation-handling branch from 7630750 to 26df05e Compare March 18, 2026 14:14

coderabbitai Bot reviewed Mar 18, 2026

View reviewed changes

Comment thread src/parser/tag/node.rs

Comment thread src/parser/text/node.rs Outdated

apply review suggestions

364bc8a

coderabbitai Bot reviewed Mar 18, 2026

View reviewed changes

Comment thread src/parser/text/process.rs

apply review suggestions

9b52bd3

Shinigami92 added this pull request to the merge queue Mar 18, 2026

Merged via the queue into main with commit 839b8a1 Mar 18, 2026
5 checks passed

Shinigami92 deleted the feat-enhance-indentation-handling branch March 18, 2026 15:36

This was referenced Mar 20, 2026

refactor: add HsmlError helper methods to reduce error boilerplate #49

Merged

refactor: improve text block line scanning #54

Merged

feat: add mixed indentation error #63

Merged

feat: convert mixed indentation from error to warning #64

Merged

This was referenced Apr 3, 2026

feat: implement hsml fmt command #125

Merged

fix: skip whitespace-only lines in parser indentation #126

Merged

fix: handle EOF with trailing whitespace in parser #127

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: enhance indentation handling#16

feat: enhance indentation handling#16
Shinigami92 merged 3 commits intomainfrom
feat-enhance-indentation-handling

Shinigami92 commented Mar 16, 2025 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Mar 16, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

Shinigami92 commented Mar 16, 2025 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Mar 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Shinigami92 commented Mar 16, 2025 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Mar 16, 2026 •

edited

Loading