Bug Report
Project
cortex
Description
In cortex-engine/src/compaction.rs, the generate_summary() function truncates user message lines using a raw byte-index slice, &first_line[..100]. The guard condition first_line.len() > 100 compares byte length, so whenever the truncation branch is taken, the index 100 is a byte offset that may land in the middle of a multi-byte UTF-8 character (non-ASCII characters are 2–4 bytes wide). When it does, Rust panics at runtime with a message like byte index 100 is not a char boundary.
Location: src/cortex-engine/src/compaction.rs, lines 201–202
Buggy code:
let truncated = if first_line.len() > 100 {
    format!("{}...", &first_line[..100]) // BUG: byte-based slice panics on multi-byte UTF-8
} else {
    first_line.to_string()
};

first_line.len() returns the byte length, not the character count. When a user message contains multi-byte characters (e.g. CJK, emoji, accented letters) and the first line exceeds 100 bytes, the slice &first_line[..100] will panic if byte 100 falls inside a multi-byte character sequence.
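To make the byte-versus-character distinction concrete, here is a minimal sketch. The helper name byte_and_char_len is hypothetical and exists only for illustration; it is not part of the cortex codebase.

```rust
/// Hypothetical helper: returns (byte length, character count) for a string.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    // `len()` counts UTF-8 bytes; `chars().count()` counts Unicode scalar values.
    (s.len(), s.chars().count())
}

fn main() {
    // ASCII, a precomposed accented letter, CJK, and an emoji.
    for s in ["hello", "h\u{e9}llo", "日本語", "🙂"] {
        let (bytes, chars) = byte_and_char_len(s);
        println!("{s:?}: {bytes} bytes, {chars} chars");
    }
}
```

Only for pure ASCII input do the two numbers agree, which is why the existing guard happens to work on English text.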
Example trigger:
// Each CJK character is 3 bytes in UTF-8.
// A line of 34 CJK characters = 102 bytes.
// first_line.len() = 102 > 100, so we try &first_line[..100]
// The 34th character occupies bytes 99..102, so byte 100 is in the middle of it → PANIC
let line = "日本語テストメッセージこれはとても長い行です日本語テストメッセージ。"; // 34 chars, 102 bytes
let _ = &line[..100]; // panics: byte index 100 is not a char boundary

This function is called during context compaction, a critical path that runs automatically when the conversation approaches the token limit. A user message containing non-ASCII text (CJK, emoji, Arabic, accented characters, etc.) whose first line exceeds 100 bytes will crash the compaction process whenever byte 100 falls inside a multi-byte character.
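The condition that triggers the panic can also be checked without crashing. The sketch below uses the standard str::is_char_boundary and str::get methods; the helper can_slice_to is a hypothetical name introduced for illustration.

```rust
/// Hypothetical helper: reports whether `&s[..idx]` would be a valid slice,
/// without panicking.
fn can_slice_to(s: &str, idx: usize) -> bool {
    // `str::get` returns None when the range is out of bounds or does not
    // fall on UTF-8 character boundaries.
    s.get(..idx).is_some()
}

fn main() {
    // 34 three-byte characters: 102 bytes in total.
    let line = "語".repeat(34);
    assert_eq!(line.len(), 102);

    // Byte 99 starts the 34th character; byte 100 is inside it.
    assert!(line.is_char_boundary(99));
    assert!(!line.is_char_boundary(100));

    // The raw slice `&line[..100]` would panic exactly where `get` returns None.
    assert!(can_slice_to(&line, 99));
    assert!(!can_slice_to(&line, 100));
    println!("byte 100 is not a char boundary; &line[..100] would panic");
}
```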
Fix
Replace the byte-based length check and slice with character-aware equivalents:
let truncated = if first_line.chars().count() > 100 {
    let truncated_chars: String = first_line.chars().take(100).collect();
    format!("{}...", truncated_chars)
} else {
    first_line.to_string()
};
Error Message
thread 'main' panicked at 'byte index 100 is not a char boundary in the string', cortex-engine/src/compaction.rs:202
Debug Logs
N/A
System Information
Linux (Ubuntu)
Source code review — no runtime needed.
Steps to Reproduce
- Start a conversation with user messages whose first line contains multi-byte UTF-8 characters (e.g. CJK text, emoji) and exceeds 100 bytes in length
- Allow the conversation to grow until context compaction is triggered (token count reaches threshold)
- generate_summary() is called, iterates over user messages, and panics at &first_line[..100]
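The character-aware fix and the reproduction steps above could be pinned down with a small regression check. This is only a sketch: truncate_first_line is a hypothetical helper name, not an existing function in cortex-engine, and it assumes the intended limit is 100 characters rather than 100 bytes. It uses char_indices().nth() as a single-pass variant of the proposed fix.

```rust
/// Hypothetical helper: keeps at most `max_chars` characters of `first_line`,
/// appending "..." only when something was cut off.
fn truncate_first_line(first_line: &str, max_chars: usize) -> String {
    // `char_indices().nth(max_chars)` yields the byte offset at which the
    // (max_chars + 1)-th character starts, which is always a char boundary.
    match first_line.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => format!("{}...", &first_line[..byte_idx]),
        None => first_line.to_string(),
    }
}

fn main() {
    // 34 CJK characters (102 bytes): within 100 characters, so unchanged.
    let cjk = "語".repeat(34);
    assert_eq!(truncate_first_line(&cjk, 100), cjk);

    // 150 CJK characters: cut to 100 characters plus "...".
    let long_cjk = "語".repeat(150);
    let out = truncate_first_line(&long_cjk, 100);
    assert_eq!(out.chars().count(), 103);
    assert!(out.ends_with("..."));

    // ASCII input is truncated at the same point as before.
    let ascii = "a".repeat(101);
    assert_eq!(
        truncate_first_line(&ascii, 100),
        format!("{}...", "a".repeat(100))
    );
    println!("all truncation checks passed");
}
```

Note that counting characters instead of bytes changes the behavior slightly: a 102-byte line of 34 CJK characters is no longer truncated at all. If a byte budget matters for the summary, an alternative would be to step back from byte 100 to the nearest position where is_char_boundary returns true.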