wc: fix word undercount with invalid byte sequences #10348

dhr412 · 2026-01-19T10:03:01Z

github-actions · 2026-01-19T10:13:47Z

GNU testsuite comparison:

Skipping an intermittent issue tests/shuf/shuf-reservoir (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/sort/sort-stale-thread-mem (passes in this run but fails in the 'main' branch)

codspeed-hq · 2026-01-19T10:27:36Z

CodSpeed Performance Report

Merging this PR will not alter performance

Comparing dhr412:wc-invalid-bytes (8b6936f) with main (00f77cc)

Summary

✅ 142 untouched benchmarks
⏩ 180 skipped benchmarks¹

¹ 180 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, they can be archived to remove them from the performance reports.

sylvestre · 2026-01-19T12:53:26Z

some jobs are failing

dhr412 · 2026-01-19T13:20:39Z

They're failing because the test_utf8 test expects a count of 2119, but after this commit wc counts invalid byte sequences as words, which raises the total to 2178.

Should I go ahead and update the test?
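For readers following along, the counting rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `count_words` is hypothetical, and it assumes that whitespace is decided by `char::is_whitespace` and that any run of invalid bytes simply behaves like non-whitespace word content.

```rust
/// Hypothetical sketch: counts whitespace-separated words in raw bytes.
/// Invalid UTF-8 bytes are treated as non-whitespace, so they start or
/// continue a word instead of being silently skipped.
fn count_words(mut input: &[u8]) -> usize {
    let mut words = 0;
    let mut in_word = false;

    while !input.is_empty() {
        // Split off the longest valid UTF-8 prefix and the length of the
        // invalid sequence (if any) that follows it.
        let (valid, invalid_len) = match std::str::from_utf8(input) {
            Ok(s) => (s, 0),
            Err(e) => {
                let valid = std::str::from_utf8(&input[..e.valid_up_to()]).unwrap();
                // error_len() is None when the input ends mid-sequence;
                // in that case the rest of the input is the invalid part.
                let bad = e.error_len().unwrap_or(input.len() - e.valid_up_to());
                (valid, bad)
            }
        };

        for ch in valid.chars() {
            if ch.is_whitespace() {
                in_word = false;
            } else if !in_word {
                in_word = true;
                words += 1;
            }
        }

        // Assumption: an invalid sequence counts as word content.
        if invalid_len > 0 && !in_word {
            in_word = true;
            words += 1;
        }

        input = &input[valid.len() + invalid_len..];
    }
    words
}

fn main() {
    // A lone invalid byte between two spaces forms its own word here.
    println!("{}", count_words(&[b'a', b' ', 0xff, b' ', b'b', b'\n']));
}
```

Under this rule the input `a`, space, `0xff`, space, `b`, newline yields three words, whereas a counter that drops invalid bytes would report two.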

ChrisDryden · 2026-01-19T13:25:30Z

The general guideline: if the GNU implementation disagrees with the test, then the test is wrong and needs to be updated. If GNU gives the same value as the test, then something is wrong with the implementation.

ChrisDryden · 2026-01-19T13:31:27Z

Validated it locally. This is tricky because the output differs between versions of GNU. Yes, version 9.9 of GNU outputs 2178, so it should be okay to change this test.

github-actions · 2026-01-19T14:32:36Z

GNU testsuite comparison:

Skipping an intermittent issue tests/shuf/shuf-reservoir (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/sort/sort-stale-thread-mem (passes in this run but fails in the 'main' branch)

ChrisDryden · 2026-01-21T02:45:41Z

Can you clean up the linting with clippy?

error: the borrowed expression implements the required traits
   --> tests/by-util/test_wc.rs:834:18
    |
834 |         .pipe_in(&[b'a', b' ', 0xff, b' ', b'b', b'\n'])
    |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: change this to: `[b'a', b' ', 0xff, b' ', b'b', b'\n']`
    |

Looks good to go; it's just failing some of the CI tests.
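As a small illustration of why clippy flags that borrow: when a parameter is generic over a trait that the array itself already implements, passing a reference adds nothing. The helper `input_len` below is a hypothetical stand-in, not the real `pipe_in` signature, and the `AsRef<[u8]>` bound is an assumption chosen only to show the pattern.

```rust
// Hypothetical stand-in for a test helper that accepts any byte-like input.
fn input_len<T: AsRef<[u8]>>(input: T) -> usize {
    input.as_ref().len()
}

fn main() {
    // Borrowed form: compiles, but the borrow is unnecessary because the
    // array itself already satisfies the trait bound.
    let borrowed = input_len(&[b'a', b' ', 0xff, b' ', b'b', b'\n']);

    // Form suggested by clippy: pass the array by value.
    let owned = input_len([b'a', b' ', 0xff, b' ', b'b', b'\n']);

    assert_eq!(borrowed, owned);
    println!("{owned}");
}
```

Both calls behave the same, so dropping the `&` only changes how the value is passed, not the test input.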

github-actions · 2026-01-21T05:45:44Z

GNU testsuite comparison:

Note: The gnu test tests/basenc/bounded-memory is now being skipped but was previously passing.

* wc: fix word undercount with invalid byte sequences
* wc: update utf8 test counts to account invalid byte sequences
* wc: remove unnecessary borrow in test

wc: fix word undercount with invalid byte sequences

e2840d3

wc: update utf8 test counts to account invalid byte sequences

4fc50fe

wc: remove unnecessary borrow in test

8b6936f

ChrisDryden merged commit f388214 into uutils:main Jan 21, 2026
131 of 132 checks passed

3 participants
