[BUG] [v0.0.7] normalize_indentation() and adjust_indentation() panic on multi-byte UTF-8 due to byte-index slicing (helpers.rs) #171

@xfactor-toml

Bug Description

The normalize_indentation() and adjust_indentation() functions in src/cortex-engine/src/tools/handlers/edit_strategies/helpers.rs use byte-based string slicing, which panics when the slice index falls inside a multi-byte UTF-8 character.

Location

src/cortex-engine/src/tools/handlers/edit_strategies/helpers.rs lines 47-48 and 102-103

Code

// In normalize_indentation() - lines 47-48
let stripped = if line.len() >= min_indent {
    &line[min_indent..]  // BUG: byte-index slicing
} else {
    line
};

// In adjust_indentation() - lines 102-103
let stripped = if line.len() >= min_indent {
    &line[min_indent..]  // BUG: byte-index slicing
} else {
    line
};

Root Cause

The code calculates min_indent using l.len() - l.trim_start().len(), which gives byte counts, and takes the smallest value across lines. It then uses this byte count to slice every line with &line[min_indent..]. If a line's indentation contains multi-byte UTF-8 whitespace (e.g., the full-width space U+3000, which is 3 bytes, or other Unicode whitespace) and another line yields a smaller byte count, the slice boundary can fall in the middle of a multi-byte character, causing a panic.
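The byte/character mismatch can be checked in isolation. The following is a minimal sketch; the helper names indent_bytes, indent_chars, and can_slice_at are hypothetical and do not appear in helpers.rs.

```rust
/// Leading-whitespace width in bytes, mirroring the calculation
/// described in this issue (hypothetical helper).
fn indent_bytes(line: &str) -> usize {
    line.len() - line.trim_start().len()
}

/// Leading-whitespace width in characters (hypothetical helper).
fn indent_chars(line: &str) -> usize {
    line.chars().take_while(|c| c.is_whitespace()).count()
}

/// Whether `&line[byte_idx..]` would land on a char boundary
/// instead of panicking (hypothetical helper).
fn can_slice_at(line: &str, byte_idx: usize) -> bool {
    line.is_char_boundary(byte_idx)
}
```

For the line "\u{3000}bar", indent_bytes gives 3 while indent_chars gives 1, and byte index 2 (the indentation width of an ASCII-indented neighbour such as "  foo") is not a valid slice point.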

Steps to Reproduce

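// Mix ASCII and full-width space indentation so that the minimum byte
// count (2, from the first line) lands inside the 3-byte U+3000 on the second
let content = "  foo\n\u{3000}bar";
let result = normalize_indentation(content);
// Expected to panic given the min_indent logic above: byte index 2 is not a char boundary
// Note: a lone "\u{3000}\u{3000}hello" slices cleanly at byte 6 and does not panic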

Expected Behavior

The functions should use character-based indexing or char_indices() to find safe slice boundaries that respect UTF-8 character boundaries.

Actual Behavior

The functions panic with:

byte index X is not a char boundary; it is inside 'Y' (bytes A..B) of `...`
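To observe this panic without the full crate, the slicing logic can be approximated with a stand-in. This is a sketch under assumptions: strip_indent_bytes is a hypothetical reconstruction from the snippets quoted in this issue, not the actual normalize_indentation(), and panics_on is an illustrative wrapper around std::panic::catch_unwind.

```rust
use std::panic;

/// Hypothetical stand-in for the byte-based stripping quoted in this issue.
fn strip_indent_bytes(content: &str) -> String {
    // Smallest leading-whitespace width, measured in bytes.
    let min_indent = content
        .lines()
        .filter(|l| !l.trim().is_empty())
        .map(|l| l.len() - l.trim_start().len())
        .min()
        .unwrap_or(0);

    content
        .lines()
        .map(|line| {
            if line.len() >= min_indent {
                &line[min_indent..] // panics when not on a char boundary
            } else {
                line
            }
        })
        .collect::<Vec<_>>()
        .join("\n")
}

/// Returns true if stripping `content` panicked (illustrative wrapper).
fn panics_on(content: &str) -> bool {
    let owned = content.to_string();
    panic::catch_unwind(move || strip_indent_bytes(&owned)).is_err()
}
```

With this stand-in, panics_on("  foo\n\u{3000}bar") is true, while uniformly indented input such as "\u{3000}\u{3000}hello" passes, which suggests the crash depends on lines mixing indentation characters of different byte widths.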

Impact

  • Severity: High - causes application crash
  • Affected functionality: Edit/Patch tool when editing files containing non-ASCII whitespace in indentation
  • User impact: Users editing files with Unicode whitespace (common in some Asian language codebases or when copy-pasting from certain editors) will experience crashes

Suggested Fix

// Use char_indices to find safe boundaries
fn safe_skip_chars(s: &str, n: usize) -> &str {
    s.char_indices()
        .nth(n)
        .map(|(i, _)| &s[i..])
        .unwrap_or("")
}

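// And calculate indentation in chars instead of bytes, so the count matches
// what safe_skip_chars expects (a char count alone is still unsafe with &line[min_indent..])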
let min_indent = non_empty_lines
    .iter()
    .map(|l| l.chars().take_while(|c| c.is_whitespace()).count())
    .min()
    .unwrap_or(0);
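Putting the two pieces together, a char-based variant might look like the sketch below. It is not the project's implementation: strip_common_indent is a hypothetical name, skip_chars mirrors the safe_skip_chars idea above, and the filtering of blank lines is an assumption about how non_empty_lines is built.

```rust
/// Skip the first `n` characters of `s`, always cutting on a char boundary.
/// Returns "" when `s` has `n` or fewer characters.
fn skip_chars(s: &str, n: usize) -> &str {
    match s.char_indices().nth(n) {
        Some((i, _)) => &s[i..],
        None => "",
    }
}

/// Hypothetical char-based replacement for the byte-based stripping.
fn strip_common_indent(content: &str) -> String {
    // Smallest leading-whitespace width, measured in characters.
    let min_indent = content
        .lines()
        .filter(|l| !l.trim().is_empty())
        .map(|l| l.chars().take_while(|c| c.is_whitespace()).count())
        .min()
        .unwrap_or(0);

    content
        .lines()
        .map(|line| skip_chars(line, min_indent))
        .collect::<Vec<_>>()
        .join("\n")
}
```

Counting characters treats a space, a tab, and a full-width space as one unit each. Whether that is the right notion of indentation width for the Edit/Patch tool is a design choice this sketch does not settle, but it does remove the panic.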

Environment

  • Cortex version: 0.0.7
  • Rust version: stable
  • OS: Any
