difflib: parsing modern rust#11
Conversation
Address a thread panic at src/differ.rs:271:52 caused by byte-indexed string slicing on multi-byte (UTF-8) character boundaries. The previous implementation assumed character indices matched byte offsets, leading to crashes when diffing lines containing emojis, mathematical symbols, or non-ASCII characters. Changes: - Replaced direct byte slicing with character-aware indexing using .char_indices(). - Added a regression test 'test_utf8_intraline_diff_no_panic' to ensure stability when comparing strings with multi-byte characters. Verified on: - x86_64-unknown-linux-gnu - aarch64-apple-darwin 03-000ef697
07-000bc119
There was a problem hiding this comment.
Pull request overview
This PR updates the Rust difflib port to better support “modern Rust” string handling and adds coverage for UTF-8 diff output, alongside a small cleanup and a version bump.
Changes:
- Add a UTF-8/emoji regression test for
Differ::compareoutput. - Adjust
Differ::qformatto avoid byte-indexsplit_atusage and replace deprecatedtrim_right()withtrim_end(). - Minor cleanup in
SequenceMatcher::chain_second_seqand bump crate version to0.5.0.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/tests.rs | Adds a UTF-8 compare test to ensure emoji content is preserved in output. |
| src/differ.rs | Introduces split_at_char_boundary and updates qformat trimming/splitting logic. |
| src/sequencematcher.rs | Removes an unnecessary mut binding during map population. |
| Cargo.toml | Bumps crate version from 0.4.0 to 0.5.0. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
fix: make qformat UTF-8 safe and update module paths Switch internal imports to crate::... for the 2018 edition, fix the duplicated qformat prefix check to use second_tags, and keep the UTF-8-safe slicing path so intraline diffs don’t panic on multibyte text. 00-00000805
06-0023829c
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
qformat now trims the shared prefix against both tag lines instead of checking first_tags twice, which keeps the generated ? lines aligned when the second tag line has different leading spaces. Add a regression test for the alignment case. Rename the UTF-8 regression test to match the PR description. This release stays on 0.5.1 because the UTF-8 intraline diff fix changes generated diff output for some inputs. The public API is unchanged, but diff formatting behavior differs, so we treat this as a release-line bump. 07-002e4f71
qformat now trims the shared prefix against both tag lines instead of checking first_tags twice, which keeps the generated ? lines aligned when the second tag line has different leading spaces. The UTF-8 regression test now asserts ? lines and ^ markers so it exercises the intraline diff path instead of passing on a plain replace. Rename the UTF-8 regression test to match the PR description. This release stays on 0.5.1 because the UTF-8 intraline diff fix changes generated diff output for some inputs. The public API is unchanged, but diff formatting behavior differs, so we treat this as a release-line bump. 00-007b6df9
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
offered as a variation on #10
this may be a good merge for @rust-lang 's fork.