fix: prevent UTF-8 slicing panic in Differ::fancy_replace by RandyMcMillan · Pull Request #10 · DimaKudosh/difflib

RandyMcMillan · 2026-04-30T15:43:58Z

Address a thread panic at src/differ.rs:271:52 caused by byte-indexed string slicing on multi-byte (UTF-8) character boundaries.

The previous implementation assumed character indices matched byte offsets, leading to crashes when diffing lines containing emojis, mathematical symbols, or non-ASCII characters.

Changes:

Replaced direct byte slicing with character-aware indexing using .char_indices().
Added a regression test 'test_utf8_intraline_diff_no_panic' to ensure stability when comparing strings with multi-byte characters.

Verified on:

x86_64-unknown-linux-gnu
aarch64-apple-darwin

03-000ef697

Address a thread panic at src/differ.rs:271:52 caused by byte-indexed string slicing on multi-byte (UTF-8) character boundaries. The previous implementation assumed character indices matched byte offsets, leading to crashes when diffing lines containing emojis, mathematical symbols, or non-ASCII characters. Changes: - Replaced direct byte slicing with character-aware indexing using .char_indices(). - Added a regression test 'test_utf8_intraline_diff_no_panic' to ensure stability when comparing strings with multi-byte characters. Verified on: - x86_64-unknown-linux-gnu - aarch64-apple-darwin 03-000ef697

Copilot

Pull request overview

Fixes a panic in Differ::fancy_replace/qformat triggered when handling multi-byte UTF-8 characters by avoiding unsafe byte-indexed slicing, and adds a regression test covering emoji input.

Changes:

Introduces a helper to split &str at a character index via char_indices() and uses it when trimming tag prefixes.
Adds a UTF-8/emoji comparison test to ensure the differ output remains stable and does not panic.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
`src/differ.rs`	Adds character-aware splitting and applies it when trimming diff tag strings in `qformat`.
`tests/tests.rs`	Adds a regression test that diffs two strings containing emojis.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@@ -268,8 +277,8 @@ impl Differ {
        );
        common = cmp::min(common, count_leading(first_tags.split_at(common).0, ' '));
        common = cmp::min(common, count_leading(first_tags.split_at(common).0, ' '));


qformat now trims the shared prefix against both tag lines instead of checking first_tags twice, which keeps the generated ? lines aligned when the second tag line has different leading spaces. Add a regression test for the alignment case. 00-0013c970

RandyMcMillan · 2026-04-30T17:22:28Z

#10 (comment): based in @copilot suggestion

Drop the redundant mut binding when building the second-sequence index. The Vec returned from or_insert_with is still mutated, but the binding itself does not need to be mutable. 02-000ca690

Rename the regression test to test_utf8_intraline_diff_no_panic so it matches the PR description and stays easy to search when debugging similar UTF-8 intraline diff issues. 02-0033b485

Copilot

Pull request overview

This PR fixes a UTF-8 panic in Differ’s intraline diff formatting by removing byte-indexed slicing and replacing it with character-aware splitting, ensuring diffs work on multi-byte Unicode (e.g., emojis). It also corrects tag-line alignment logic and adds regression coverage.

Changes:

Add split_at_char_boundary() and use it in Differ::qformat() to avoid slicing &str at non–UTF-8 boundaries.
Fix qformat() alignment to consider both tag lines (previously applied the first tag line twice).
Add regression tests covering UTF-8 intraline diffs and the corrected alignment behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File	Description
`src/differ.rs`	Introduces char-boundary splitting in `qformat()` and fixes tag-line alignment; adds a unit test for alignment.
`tests/tests.rs`	Adds an integration regression test ensuring UTF-8 intraline diffs don’t panic and include both Unicode variants.
`src/sequencematcher.rs`	Minor cleanup: removes an unnecessary `mut` binding in `chain_second_seq()`.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot

Pull request overview

Fixes a panic in Differ::fancy_replace/qformat when diffing lines that contain multi-byte UTF-8 characters by avoiding byte-indexed slicing, and adds regression coverage to keep UTF-8 intraline diffs stable.

Changes:

Added split_at_char_boundary to convert a character index into a safe byte boundary before splitting.
Updated qformat to align based on both tag lines and to split tags using character-aware indexing.
Added a UTF-8 regression test to ensure comparing lines with emojis does not panic.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File	Description
`src/differ.rs`	Introduces character-boundary splitting and fixes tag alignment logic in `qformat`; adds a targeted unit test.
`tests/tests.rs`	Adds a regression test ensuring UTF-8 intraline diffs don’t panic and preserve the original Unicode content.
`src/sequencematcher.rs`	Minor cleanup in `chain_second_seq` (removes unnecessary `mut` binding).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+        first_tags = Self::split_at_char_boundary(first_tags, common).1.trim_right();
+        second_tags = Self::split_at_char_boundary(second_tags, common).1.trim_right();


Copilot AI review requested due to automatic review settings April 30, 2026 15:43

Copilot started reviewing on behalf of RandyMcMillan April 30, 2026 15:44 View session

Copilot AI reviewed Apr 30, 2026

View reviewed changes

Comment thread tests/tests.rs

Comment thread src/differ.rs Outdated

@@ -268,8 +277,8 @@ impl Differ {

);

common = cmp::min(common, count_leading(first_tags.split_at(common).0, ' '));

common = cmp::min(common, count_leading(first_tags.split_at(common).0, ' '));

RandyMcMillan mentioned this pull request Apr 30, 2026

difflib: parsing modern rust #11

Open

RandyMcMillan added 2 commits April 30, 2026 13:25

fix: remove unnecessary mut in sequencematcher

000000b

Drop the redundant mut binding when building the second-sequence index. The Vec returned from or_insert_with is still mutated, but the binding itself does not need to be mutable. 02-000ca690

fix: rename utf8 intraline regression test

000000d

Rename the regression test to test_utf8_intraline_diff_no_panic so it matches the PR description and stays easy to search when debugging similar UTF-8 intraline diff issues. 02-0033b485

RandyMcMillan requested a review from Copilot April 30, 2026 17:31

Copilot started reviewing on behalf of RandyMcMillan April 30, 2026 17:31 View session

Copilot AI reviewed Apr 30, 2026

View reviewed changes

RandyMcMillan requested a review from Copilot April 30, 2026 17:43

Copilot started reviewing on behalf of RandyMcMillan April 30, 2026 17:43 View session

Copilot AI reviewed Apr 30, 2026

View reviewed changes

Comment thread src/differ.rs

Comment on lines +280 to +281

first_tags = Self::split_at_char_boundary(first_tags, common).1.trim_right();

second_tags = Self::split_at_char_boundary(second_tags, common).1.trim_right();

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: prevent UTF-8 slicing panic in Differ::fancy_replace#10

fix: prevent UTF-8 slicing panic in Differ::fancy_replace#10
RandyMcMillan wants to merge 4 commits intoDimaKudosh:masterfrom
gnostr-org:1876/947305/418560/f035fb8/0009f77-difflib-parsing

RandyMcMillan commented Apr 30, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

RandyMcMillan commented Apr 30, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		first_tags = Self::split_at_char_boundary(first_tags, common).1.trim_right();
		second_tags = Self::split_at_char_boundary(second_tags, common).1.trim_right();

Conversation

RandyMcMillan commented Apr 30, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

RandyMcMillan commented Apr 30, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants