fix: prevent UTF-8 slicing panic in Differ::fancy_replace#10
fix: prevent UTF-8 slicing panic in Differ::fancy_replace#10RandyMcMillan wants to merge 4 commits intoDimaKudosh:masterfrom
Conversation
Address a thread panic at src/differ.rs:271:52 caused by byte-indexed string slicing on multi-byte (UTF-8) character boundaries. The previous implementation assumed character indices matched byte offsets, leading to crashes when diffing lines containing emojis, mathematical symbols, or non-ASCII characters. Changes: - Replaced direct byte slicing with character-aware indexing using .char_indices(). - Added a regression test 'test_utf8_intraline_diff_no_panic' to ensure stability when comparing strings with multi-byte characters. Verified on: - x86_64-unknown-linux-gnu - aarch64-apple-darwin 03-000ef697
There was a problem hiding this comment.
Pull request overview
Fixes a panic in Differ::fancy_replace/qformat triggered when handling multi-byte UTF-8 characters by avoiding unsafe byte-indexed slicing, and adds a regression test covering emoji input.
Changes:
- Introduces a helper to split
&strat a character index viachar_indices()and uses it when trimming tag prefixes. - Adds a UTF-8/emoji comparison test to ensure the differ output remains stable and does not panic.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
src/differ.rs |
Adds character-aware splitting and applies it when trimming diff tag strings in qformat. |
tests/tests.rs |
Adds a regression test that diffs two strings containing emojis. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -268,8 +277,8 @@ impl Differ { | |||
| ); | |||
| common = cmp::min(common, count_leading(first_tags.split_at(common).0, ' ')); | |||
| common = cmp::min(common, count_leading(first_tags.split_at(common).0, ' ')); | |||
qformat now trims the shared prefix against both tag lines instead of checking first_tags twice, which keeps the generated ? lines aligned when the second tag line has different leading spaces. Add a regression test for the alignment case. 00-0013c970
|
#10 (comment): based in @copilot suggestion |
Drop the redundant mut binding when building the second-sequence index. The Vec returned from or_insert_with is still mutated, but the binding itself does not need to be mutable. 02-000ca690
Rename the regression test to test_utf8_intraline_diff_no_panic so it matches the PR description and stays easy to search when debugging similar UTF-8 intraline diff issues. 02-0033b485
There was a problem hiding this comment.
Pull request overview
This PR fixes a UTF-8 panic in Differ’s intraline diff formatting by removing byte-indexed slicing and replacing it with character-aware splitting, ensuring diffs work on multi-byte Unicode (e.g., emojis). It also corrects tag-line alignment logic and adds regression coverage.
Changes:
- Add
split_at_char_boundary()and use it inDiffer::qformat()to avoid slicing&strat non–UTF-8 boundaries. - Fix
qformat()alignment to consider both tag lines (previously applied the first tag line twice). - Add regression tests covering UTF-8 intraline diffs and the corrected alignment behavior.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
src/differ.rs |
Introduces char-boundary splitting in qformat() and fixes tag-line alignment; adds a unit test for alignment. |
tests/tests.rs |
Adds an integration regression test ensuring UTF-8 intraline diffs don’t panic and include both Unicode variants. |
src/sequencematcher.rs |
Minor cleanup: removes an unnecessary mut binding in chain_second_seq(). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Fixes a panic in Differ::fancy_replace/qformat when diffing lines that contain multi-byte UTF-8 characters by avoiding byte-indexed slicing, and adds regression coverage to keep UTF-8 intraline diffs stable.
Changes:
- Added
split_at_char_boundaryto convert a character index into a safe byte boundary before splitting. - Updated
qformatto align based on both tag lines and to split tags using character-aware indexing. - Added a UTF-8 regression test to ensure comparing lines with emojis does not panic.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
src/differ.rs |
Introduces character-boundary splitting and fixes tag alignment logic in qformat; adds a targeted unit test. |
tests/tests.rs |
Adds a regression test ensuring UTF-8 intraline diffs don’t panic and preserve the original Unicode content. |
src/sequencematcher.rs |
Minor cleanup in chain_second_seq (removes unnecessary mut binding). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| first_tags = Self::split_at_char_boundary(first_tags, common).1.trim_right(); | ||
| second_tags = Self::split_at_char_boundary(second_tags, common).1.trim_right(); |
Address a thread panic at src/differ.rs:271:52 caused by byte-indexed string slicing on multi-byte (UTF-8) character boundaries.
The previous implementation assumed character indices matched byte offsets, leading to crashes when diffing lines containing emojis, mathematical symbols, or non-ASCII characters.
Changes:
Verified on:
03-000ef697