Description
Bug Description
The normalize_indentation() and adjust_indentation() functions in src/cortex-engine/src/tools/handlers/edit_strategies/helpers.rs slice strings by byte offset, which panics when the offset falls inside a multi-byte UTF-8 character.
Location
src/cortex-engine/src/tools/handlers/edit_strategies/helpers.rs lines 47-48 and 102-103
Code
// In normalize_indentation() - lines 47-48
let stripped = if line.len() >= min_indent {
    &line[min_indent..] // BUG: byte-index slicing
} else {
    line
};
// In adjust_indentation() - lines 102-103
let stripped = if line.len() >= min_indent {
    &line[min_indent..] // BUG: byte-index slicing
} else {
    line
};
Root Cause
The code calculates min_indent using l.len() - l.trim_start().len(), which gives a byte count, and takes the minimum across the non-empty lines. It then slices every line at that byte offset with &line[min_indent..]. The offset is only guaranteed to be a character boundary in the line it came from. If another line is indented with multi-byte whitespace (e.g., full-width spaces U+3000, which are 3 bytes each, or other Unicode whitespace), the offset can land in the middle of a character, and the slice panics.
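The mismatch can be checked without calling the real functions. The sketch below mirrors the byte-based calculation described above and lists the lines where the slice would panic. The helper names byte_indent, min_byte_indent and unsafe_lines are hypothetical and are not taken from helpers.rs.

```rust
/// Byte width of a line's leading whitespace, mirroring
/// `l.len() - l.trim_start().len()` from the description above.
fn byte_indent(line: &str) -> usize {
    line.len() - line.trim_start().len()
}

/// Smallest byte indent across non-blank lines (0 if there are none).
fn min_byte_indent(content: &str) -> usize {
    content
        .lines()
        .filter(|l| !l.trim().is_empty())
        .map(byte_indent)
        .min()
        .unwrap_or(0)
}

/// Lines for which `&line[min_indent..]` would panic, i.e. lines that are
/// long enough to be sliced but where the offset is not a char boundary.
fn unsafe_lines(content: &str) -> Vec<&str> {
    let min_indent = min_byte_indent(content);
    content
        .lines()
        .filter(|l| l.len() >= min_indent && !l.is_char_boundary(min_indent))
        .collect()
}
```

For the input "  foo\n\u{3000}bar" the minimum indent is 2 bytes (from the ASCII line), and byte 2 of the second line lies inside the 3-byte U+3000, so that line is reported. A single line indented only with full-width spaces is not reported, because its own byte indent is always a valid boundary.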
Steps to Reproduce
// Mix ASCII and full-width-space indentation so the byte widths differ.
// Line 1: two ASCII spaces (2 bytes). Line 2: one full-width space (U+3000, 3 bytes).
let content = "  foo\n\u{3000}bar";
let result = normalize_indentation(content);
// Assuming min_indent is the minimum over non-empty lines (2 bytes here),
// slicing line 2 at byte 2 lands inside U+3000 and panics.
Expected Behavior
The functions should use character-based indexing or char_indices() to find safe slice boundaries that respect UTF-8 character boundaries.
Actual Behavior
The functions panic with:
byte index X is not a char boundary; it is inside 'Y' (bytes A..B) of `...`
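This message comes from the indexing operator itself. As a minimal sketch of how the same condition can be detected without crashing, the standard str::get method returns None where &line[n..] would panic. The wrapper name try_strip_bytes is hypothetical and only illustrates the call.

```rust
/// Strip `n` bytes from the front of `line`.
/// Returns None when `n` is past the end of the line or falls inside a
/// multi-byte character, which are the cases where `&line[n..]` panics.
fn try_strip_bytes(line: &str, n: usize) -> Option<&str> {
    line.get(n..)
}
```

With this helper, try_strip_bytes("\u{3000}bar", 2) is None, while try_strip_bytes("\u{3000}bar", 3) is Some("bar"). This only avoids the crash; it does not decide what the stripped line should be, so it complements a character-based fix rather than replacing it.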
Impact
- Severity: High - causes application crash
- Affected functionality: Edit/Patch tool when editing files containing non-ASCII whitespace in indentation
- User impact: Users editing files with Unicode whitespace (common in some Asian language codebases or when copy-pasting from certain editors) will experience crashes
Suggested Fix
// Use char_indices to find safe boundaries
fn safe_skip_chars(s: &str, n: usize) -> &str {
    s.char_indices()
        .nth(n)
        .map(|(i, _)| &s[i..])
        .unwrap_or("")
}
// And calculate indentation in chars instead of bytes,
// so that the n passed to safe_skip_chars is a character count
let min_indent = non_empty_lines
    .iter()
    .map(|l| l.chars().take_while(|c| c.is_whitespace()).count())
    .min()
    .unwrap_or(0);
Environment
- Cortex version: 0.0.7
- Rust version: stable
- OS: Any