Fix UTF-8 boundary handling in message splitting #18

Merged
chinkan merged 1 commit into main from claude/debug-agent-skills-error-e4tcI
Feb 22, 2026

chinkan commented Feb 22, 2026

Summary

Fixed a potential panic in the `split_message` function, which occurred when a computed split position fell inside a multi-byte UTF-8 character instead of on a character boundary.

Key Changes

  • Added UTF-8 boundary validation before slicing text in the message splitting logic
  • When the calculated end position falls in the middle of a multi-byte UTF-8 character, the code now walks back to the nearest valid character boundary
  • This prevents the panic that occurs when a string is sliced at a byte offset that is not a character boundary
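
To illustrate the failure mode described above, here is a minimal sketch. The helpers `boundary_offsets` and `try_prefix` are hypothetical names introduced for this example and are not part of the PR; they only use the standard `str::is_char_boundary` and `str::get` methods.

```rust
/// Hypothetical helper: lists every byte offset in `text` that is a valid
/// UTF-8 character boundary (always includes 0 and `text.len()`).
fn boundary_offsets(text: &str) -> Vec<usize> {
    (0..=text.len())
        .filter(|&i| text.is_char_boundary(i))
        .collect()
}

/// Hypothetical helper: non-panicking prefix slice.
/// `str::get` returns `None` where `&text[..end]` would panic.
fn try_prefix(text: &str, end: usize) -> Option<&str> {
    text.get(..end)
}
```

For a string such as `"a你好"` (one 1-byte character followed by two 3-byte characters), only offsets 0, 1, 4 and 7 are boundaries, so `try_prefix("a你好", 2)` returns `None`, whereas the indexing form `&text[..2]` would panic.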

Implementation Details

The fix adds a safety check that walks backwards from the calculated end position until it reaches a valid UTF-8 character boundary (verified via `text.is_char_boundary(end)`). Slicing at the adjusted position is then safe, even when the text contains multi-byte UTF-8 characters.
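
The walk-back described above might look roughly like the sketch below. This is not the code from the PR: the signature, the `max_len` parameter (a byte limit per chunk), and the forward-stepping guard are assumptions made for illustration, and the real `split_message` may apply other splitting rules that are omitted here.

```rust
/// Illustrative sketch only: splits `text` into chunks of at most
/// `max_len` bytes without ever slicing inside a UTF-8 character.
/// The signature and `max_len` are assumptions, not the PR's actual code.
fn split_message(text: &str, max_len: usize) -> Vec<&str> {
    let mut chunks = Vec::new();
    let mut start = 0;

    while start < text.len() {
        // Tentative end as a raw byte offset, clamped to the string length.
        let mut end = (start + max_len).min(text.len());

        // Walk back to the nearest valid UTF-8 character boundary.
        while end > start && !text.is_char_boundary(end) {
            end -= 1;
        }

        // Assumed guard (not described in the PR): if `max_len` is smaller
        // than the next character, step forward to include that one
        // character so the loop always makes progress.
        if end == start {
            end = start + 1;
            while !text.is_char_boundary(end) {
                end += 1;
            }
        }

        chunks.push(&text[start..end]);
        start = end;
    }

    chunks
}
```

With this sketch, `split_message("你好世界", 4)` yields four single-character chunks, because each character occupies 3 bytes and a 4-byte limit would otherwise cut into the following character.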

https://claude.ai/code/session_01MHwScetduzBSN7R1huNoN6

`split_message` computed chunk end positions as byte offsets, then
sliced the string directly. Multi-byte UTF-8 characters (e.g. CJK)
caused a panic when the byte boundary fell inside a character.

Walk `end` back to the nearest valid UTF-8 char boundary before
slicing, so strings containing non-ASCII text are handled safely.

https://claude.ai/code/session_01MHwScetduzBSN7R1huNoN6
chinkan merged commit 9b62094 into main Feb 22, 2026
5 checks passed