Skip to content

difflib: parsing modern rust#11

Open
RandyMcMillan wants to merge 8 commits intoDimaKudosh:masterfrom
gnostr-org:1876/947305/418767/f035fb8/000000b-difflib-parsing-modern-rust
Open

difflib: parsing modern rust#11
RandyMcMillan wants to merge 8 commits intoDimaKudosh:masterfrom
gnostr-org:1876/947305/418767/f035fb8/000000b-difflib-parsing-modern-rust

Conversation

@RandyMcMillan
Copy link
Copy Markdown

@RandyMcMillan RandyMcMillan commented Apr 30, 2026

offered as a variation on #10

this may be a good merge for @rust-lang 's fork.

Address a thread panic at src/differ.rs:271:52 caused by byte-indexed
string slicing on multi-byte (UTF-8) character boundaries.

The previous implementation assumed character indices matched byte
offsets, leading to crashes when diffing lines containing emojis,
mathematical symbols, or non-ASCII characters.

Changes:
- Replaced direct byte slicing with character-aware indexing using .char_indices().
- Added a regression test 'test_utf8_intraline_diff_no_panic' to ensure
  stability when comparing strings with multi-byte characters.

Verified on:
- x86_64-unknown-linux-gnu
- aarch64-apple-darwin

03-000ef697
Copilot AI review requested due to automatic review settings April 30, 2026 15:47
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the Rust difflib port to better support “modern Rust” string handling and adds coverage for UTF-8 diff output, alongside a small cleanup and a version bump.

Changes:

  • Add a UTF-8/emoji regression test for Differ::compare output.
  • Adjust Differ::qformat to avoid byte-index split_at usage and replace deprecated trim_right() with trim_end().
  • Minor cleanup in SequenceMatcher::chain_second_seq and bump crate version to 0.5.0.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
tests/tests.rs Adds a UTF-8 compare test to ensure emoji content is preserved in output.
src/differ.rs Introduces split_at_char_boundary and updates qformat trimming/splitting logic.
src/sequencematcher.rs Removes an unnecessary mut binding during map population.
Cargo.toml Bumps crate version from 0.4.0 to 0.5.0.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/differ.rs
@RandyMcMillan RandyMcMillan changed the title 1876/947305/418767/f035fb8/000000b difflib parsing modern rust difflib: parsing modern rust Apr 30, 2026
fix: make qformat UTF-8 safe and update module paths
Switch internal imports to crate::... for the 2018 edition, fix the
duplicated qformat prefix check to use second_tags, and keep the
UTF-8-safe slicing path so intraline diffs don’t panic on multibyte text.

00-00000805
@RandyMcMillan
Copy link
Copy Markdown
Author

000000f: update per @copilot review

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread Cargo.toml
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/tests.rs
qformat now trims the shared prefix against both tag lines instead of
 checking first_tags twice, which keeps the generated ? lines aligned when the
 second tag line has different leading spaces.
Add a regression test for the alignment case.
Rename the UTF-8 regression test to match the PR description.
This release stays on 0.5.1 because the UTF-8 intraline diff fix changes
 generated diff output for some inputs.
The public API is unchanged, but diff formatting behavior differs, so we
 treat this as a release-line bump.

07-002e4f71
qformat now trims the shared prefix against both tag lines instead of
 checking first_tags twice, which keeps the generated ? lines aligned when the
 second tag line has different leading spaces.
The UTF-8 regression test now asserts ? lines and ^ markers so it
 exercises the intraline diff path instead of passing on a plain replace.
Rename the UTF-8 regression test to match the PR description.
This release stays on 0.5.1 because the UTF-8 intraline diff fix changes
 generated diff output for some inputs.
The public API is unchanged, but diff formatting behavior differs, so we
 treat this as a release-line bump.

00-007b6df9
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants