
[BUG] [v0.1.0] generate_summary() panics on multi-byte UTF-8 characters due to byte-based string slicing (compaction.rs:202) #161

@EnthusiasticTech


Bug Report

Project

cortex

Description

In cortex-engine/src/compaction.rs, the generate_summary() function truncates the first line of user messages with a raw byte-index slice, &first_line[..100]. The guard condition first_line.len() > 100 compares byte length, and when it is true the slice index 100 is also a byte offset, so it can land in the middle of a multi-byte UTF-8 character (2–4 bytes wide). When that happens, Rust panics at runtime with a message like "byte index 100 is not a char boundary".

Location: src/cortex-engine/src/compaction.rs, lines 201–202

Buggy code:

let truncated = if first_line.len() > 100 {
    format!("{}...", &first_line[..100])  // BUG: byte-based slice panics on multi-byte UTF-8
} else {
    first_line.to_string()
};

first_line.len() returns the byte length, not the character count. When a user message contains multi-byte characters (e.g. CJK, emoji, accented letters) and the first line exceeds 100 bytes, the slice &first_line[..100] will panic if byte 100 falls inside a multi-byte character sequence.

Example trigger:

// Each CJK character is 3 bytes in UTF-8.
// A line of 34 CJK characters = 102 bytes.
// first_line.len() = 102 > 100, so the code tries &first_line[..100].
// Byte 100 is in the middle of the 34th character (bytes 99..102) → PANIC
fn main() {
    let line = "日本語テストメッセージこれはとても長い行です日本語テストメッセージ。"; // 34 chars, 102 bytes
    assert_eq!(line.chars().count(), 34);
    assert_eq!(line.len(), 102);
    let _ = &line[..100]; // panics: byte index 100 is not a char boundary
}
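The same diagnosis can be confirmed without crashing by asking the string whether an index is a char boundary. The sketch below uses a made-up ASCII-plus-accent string rather than text from cortex; str::is_char_boundary and str::get are standard library methods, and get returns None in cases where indexing would panic.

```rust
fn main() {
    // Illustrative input (not from cortex): 99 ASCII bytes followed by
    // 'é' (U+00E9, 2 bytes in UTF-8), so 101 bytes total but only 100 chars.
    let line = format!("{}é", "a".repeat(99));

    assert_eq!(line.len(), 101); // byte length: what the buggy guard compares
    assert_eq!(line.chars().count(), 100); // number of chars (Unicode scalar values)

    // 'é' occupies bytes 99..101, so byte 100 lies inside it.
    assert!(line.is_char_boundary(99));
    assert!(!line.is_char_boundary(100));

    // str::get returns None instead of panicking on a non-boundary index.
    assert!(line.get(..100).is_none());
    assert_eq!(line.get(..99), Some("a".repeat(99).as_str()));

    println!("&line[..100] would panic: byte 100 is not a char boundary");
}
```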

This function is called during context compaction, a critical path that runs automatically when the conversation approaches the token limit. Any user message whose first line exceeds 100 bytes and has a multi-byte character (CJK, emoji, Arabic, accented letters, etc.) spanning byte offset 100 will panic and crash the compaction process.

Fix

Replace the byte-based length check and slice with character-aware equivalents:

let truncated = if first_line.chars().count() > 100 {
    let truncated_chars: String = first_line.chars().take(100).collect();
    format!("{}...", truncated_chars)
} else {
    first_line.to_string()
};
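The proposed fix walks the line twice (once for count(), once for take()) and builds an intermediate String. Below is a minimal sketch of two alternatives; truncate_chars and truncate_bytes are hypothetical names, not part of cortex. The first keeps the fix's 100-character semantics in a single pass using char_indices(). The second applies only if the limit was meant as a byte budget, and backs off to the nearest char boundary with str::is_char_boundary.

```rust
/// Hypothetical helper: keep at most `max_chars` chars, appending "..."
/// when anything was cut. Same result as the fix above, in one pass.
fn truncate_chars(line: &str, max_chars: usize) -> String {
    // nth(max_chars) yields the byte offset of the first char past the
    // limit; that offset is always a char boundary, so slicing is safe.
    match line.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => format!("{}...", &line[..byte_idx]),
        None => line.to_string(), // max_chars chars or fewer: keep whole line
    }
}

/// Hypothetical helper: enforce a byte budget instead, backing off to the
/// nearest char boundary at or below `max_bytes`.
fn truncate_bytes(line: &str, max_bytes: usize) -> String {
    if line.len() <= max_bytes {
        return line.to_string();
    }
    let mut end = max_bytes;
    // Index 0 is always a char boundary, so this loop terminates.
    while !line.is_char_boundary(end) {
        end -= 1;
    }
    format!("{}...", &line[..end])
}

fn main() {
    // 99 ASCII bytes, then 'é' (bytes 99..101), then 5 more ASCII bytes.
    let line = format!("{}é{}", "a".repeat(99), "b".repeat(5));
    println!("{}", truncate_chars(&line, 100)); // first 100 chars + "..."
    println!("{}", truncate_bytes(&line, 100)); // first 99 bytes + "..."
}
```

Both avoid the panic, but chars() counts Unicode scalar values rather than user-perceived characters, so either version can still cut through a grapheme cluster such as a flag emoji or a letter followed by a combining accent. That yields odd-looking output rather than a crash; grapheme-aware truncation would need an external crate such as unicode-segmentation.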

Error Message

thread 'main' panicked at 'byte index 100 is not a char boundary in the string', cortex-engine/src/compaction.rs:202

Debug Logs

N/A

System Information

Linux (Ubuntu)
Source code review — no runtime needed.

Steps to Reproduce

  1. Start a conversation with a user message whose first line exceeds 100 bytes and has a multi-byte UTF-8 character (e.g. CJK text, emoji) spanning byte offset 100, such as a first line of 34 CJK characters
  2. Allow the conversation to grow until context compaction is triggered (token count reaches threshold)
  3. generate_summary() is called, iterates over user messages, and panics at &first_line[..100]
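Since the report comes from source review, a function-level regression test may help confirm both the bug and the fix. The sketch below assumes the truncation logic is lifted out of generate_summary() into standalone functions; truncate_buggy and truncate_fixed are hypothetical names, and their bodies copy the buggy and fixed snippets from this issue. std::panic::catch_unwind observes the panic without aborting the program.

```rust
use std::panic;

/// Hypothetical standalone copy of the buggy logic from generate_summary().
fn truncate_buggy(first_line: &str) -> String {
    if first_line.len() > 100 {
        format!("{}...", &first_line[..100]) // byte-based slice
    } else {
        first_line.to_string()
    }
}

/// Hypothetical standalone copy of the character-aware fix.
fn truncate_fixed(first_line: &str) -> String {
    if first_line.chars().count() > 100 {
        let truncated_chars: String = first_line.chars().take(100).collect();
        format!("{}...", truncated_chars)
    } else {
        first_line.to_string()
    }
}

fn main() {
    // 34 CJK characters, 3 bytes each = 102 bytes; byte 100 is mid-character.
    let line = "日".repeat(34);

    // Silence the default panic message while deliberately triggering it,
    // then restore the default hook.
    panic::set_hook(Box::new(|_| {}));
    let buggy = panic::catch_unwind(|| truncate_buggy(&line));
    let _ = panic::take_hook();

    assert!(buggy.is_err(), "byte-based slice should panic");
    // 34 chars is under the 100-char limit, so the fix keeps the line intact.
    assert_eq!(truncate_fixed(&line), line);
    println!("buggy version panicked; fixed version returned the line unchanged");
}
```

Inside the cortex crate this would more naturally be a #[test] that calls generate_summary() directly, but its signature is not shown in this issue, so the sketch stays self-contained.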
