Conversation

@dhr412
Contributor

@dhr412 dhr412 commented Jan 19, 2026

Closes #10322

@github-actions

GNU testsuite comparison:

Skipping an intermittent issue tests/shuf/shuf-reservoir (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/sort/sort-stale-thread-mem (passes in this run but fails in the 'main' branch)

@codspeed-hq

codspeed-hq bot commented Jan 19, 2026

CodSpeed Performance Report

Merging this PR will not alter performance

Comparing dhr412:wc-invalid-bytes (8b6936f) with main (00f77cc)

Summary

✅ 142 untouched benchmarks
⏩ 180 skipped benchmarks [1]

Footnotes

  1. 180 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.

@sylvestre
Contributor

Some jobs are failing.

@dhr412
Contributor Author

dhr412 commented Jan 19, 2026

They're failing because the test_utf8 test expects a word count of 2119. After this commit, wc counts invalid byte sequences as words, which raises the total to 2178.

Should I go ahead and update the test?
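
To illustrate why the count goes up, here is a minimal sketch of a word counter that treats invalid byte sequences as word constituents. It is not the code from this PR: `count_words` and `scan` are hypothetical names, and the use of `char::is_whitespace` as the word separator is an assumption that may differ from what wc and GNU actually do.

```rust
/// Counts whitespace-separated words in a byte slice.
/// Assumption for this sketch: bytes that are not valid UTF-8 start or
/// continue a word instead of being dropped.
fn count_words(input: &[u8]) -> usize {
    let mut words = 0;
    let mut in_word = false;
    let mut rest = input;

    while !rest.is_empty() {
        match std::str::from_utf8(rest) {
            Ok(valid) => {
                // Everything left is valid UTF-8, so scan it and stop.
                scan(valid, &mut words, &mut in_word);
                break;
            }
            Err(err) => {
                let (valid, after) = rest.split_at(err.valid_up_to());
                // This prefix was just reported as valid, so unwrap cannot fail.
                scan(std::str::from_utf8(valid).unwrap(), &mut words, &mut in_word);

                // error_len() is None when the input ends inside a sequence;
                // in that case the whole remainder is the invalid part.
                let bad = err.error_len().unwrap_or(after.len());

                // The invalid bytes count as word content.
                if !in_word {
                    in_word = true;
                    words += 1;
                }
                rest = &after[bad..];
            }
        }
    }
    words
}

/// Scans valid text, updating the running word count and in-word state.
fn scan(text: &str, words: &mut usize, in_word: &mut bool) {
    for ch in text.chars() {
        if ch.is_whitespace() {
            *in_word = false;
        } else if !*in_word {
            *in_word = true;
            *words += 1;
        }
    }
}
```

With this sketch, `count_words(b"a \xff b\n")` returns 3: the lone `0xff` byte between the two spaces is counted as its own word.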

@ChrisDryden
Collaborator

The general guideline: if the GNU implementation conflicts with the test, the test is wrong and needs to be updated. If GNU gives the same value as the test, something is wrong with the implementation.

@ChrisDryden
Collaborator

Validated it locally. This is tricky because the output differs between versions of GNU. Yes, the 9.9 version of GNU outputs 2178, so it should be okay to change this test.

@ChrisDryden
Collaborator

Can you clean up the linting with clippy?

error: the borrowed expression implements the required traits
   --> tests/by-util/test_wc.rs:834:18
    |
834 |         .pipe_in(&[b'a', b' ', 0xff, b' ', b'b', b'\n'])
    |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: change this to: `[b'a', b' ', 0xff, b' ', b'b', b'\n']`
    |

Everything else looks good to go; it's just failing some of the CI tests.
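
For context on the clippy error above, here is a small sketch of why the borrow is unnecessary. The `pipe_in` function below is a hypothetical stand-in, not the real test helper; it assumes the helper is generic over `Into<Vec<u8>>`, which may not match the actual signature. The message appears to come from clippy's `needless_borrows_for_generic_args` lint.

```rust
// Hypothetical stand-in for a test helper that accepts anything
// convertible into a byte vector (assumed bound, not the real signature).
fn pipe_in<T: Into<Vec<u8>>>(data: T) -> Vec<u8> {
    data.into()
}

// The shape of the call clippy flagged: the array is borrowed, although
// the array itself already satisfies the `Into<Vec<u8>>` bound.
fn with_borrow() -> Vec<u8> {
    pipe_in(&[b'a', b' ', 0xff, b' ', b'b', b'\n'])
}

// The suggested form: pass the array by value, with no `&`.
fn without_borrow() -> Vec<u8> {
    pipe_in([b'a', b' ', 0xff, b' ', b'b', b'\n'])
}
```

Both calls produce the same bytes, so dropping the `&` changes nothing about what the test feeds to wc.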

@github-actions

GNU testsuite comparison:

Note: The gnu test tests/basenc/bounded-memory is now being skipped but was previously passing.

@ChrisDryden ChrisDryden merged commit f388214 into uutils:main Jan 21, 2026
131 of 132 checks passed
mattsu2020 pushed a commit to mattsu2020/coreutils that referenced this pull request Jan 21, 2026
* wc: fix word undercount with invalid byte sequences

* wc: update utf8 test counts to account invalid byte sequences

* wc: remove unnecessary borrow in test

Development

Successfully merging this pull request may close these issues.

wc: word undercount when input contains invalid byte sequences

3 participants