fix: numeric sort (-n) does not recognize thousand separators #10339

Kaua-Klassmann · 2026-01-18T23:52:02Z

Description

This PR fixes the numeric sort to recognize the the locale's thousands separator

Changes

Added function to export the locale grouping separators on i18n from uucore
Changed "thousands_separator" on NumInfoParseSettings from Option to Option to properly support Unicode separators (like non-breaking spaces)
Updated to get the locale grouping separators to set on NumInfoParseSettings to sort

github-actions · 2026-01-19T06:17:25Z

GNU testsuite comparison:

GNU test failed: tests/sort/sort-debug-keys. tests/sort/sort-debug-keys is passing on 'main'. Maybe you have to rebase?
Skip an intermittent issue tests/tty/tty-eof (fails in this run but passes in the 'main' branch)
Congrats! The gnu test tests/sort/sort-h-thousands-sep is no longer failing!

codspeed-hq · 2026-01-19T06:30:49Z

Merging this PR will not alter performance

✅ 142 untouched benchmarks
⏩ 180 skipped benchmarks¹

_{Comparing Kaua-Klassmann:fix_numeric_sort_does_not_recognize_thousand_separators (e3a4b05) with main (cbbff30)}

180 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩

sylvestre · 2026-01-19T07:50:05Z

Congrats! The gnu test tests/sort/sort-h-thousands-sep is no longer failing!

well done!

But some jobs are failing

sylvestre · 2026-01-19T07:50:54Z

please also add some high level integration tests in tests/by-util/test_sort.rs

ChrisDryden · 2026-01-19T12:35:36Z

It seems that this ICU library and the libc call have differing values for the separator, For example french here is U+202F but using libc its a regular space

github-actions · 2026-01-20T07:37:08Z

GNU testsuite comparison:

GNU test failed: tests/dd/stderr. tests/dd/stderr is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-debug-keys. tests/sort/sort-debug-keys is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/tac/tac-2-nonseekable. tests/tac/tac-2-nonseekable is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/tail/follow-stdin. tests/tail/follow-stdin is passing on 'main'. Maybe you have to rebase?
Skip an intermittent issue tests/rm/rm1 (fails in this run but passes in the 'main' branch)
Congrats! The gnu test tests/sort/sort-h-thousands-sep is no longer failing!

Kaua-Klassmann · 2026-01-20T22:18:52Z

My PR and issue are correct? I world like to contribute to the repository, but I'm beginner in opensource projects.

sylvestre · 2026-01-20T22:24:55Z

yeah, it looks good. i just need more time to verify it

github-actions · 2026-01-21T02:56:44Z

GNU testsuite comparison:

GNU test failed: tests/misc/unexpand. tests/misc/unexpand is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/sort/sort-debug-keys. tests/sort/sort-debug-keys is passing on 'main'. Maybe you have to rebase?
Congrats! The gnu test tests/sort/sort-h-thousands-sep is no longer failing!

ChrisDryden · 2026-01-21T03:01:12Z

src/uu/sort/src/numeric_str_cmp.rs


        let mut first_char = true;

        for (idx, &char) in num.iter().enumerate() {


This might be outside the scope of this PR since this logic appears to have been here before, but this for loop is incompatible with thousand separators that are multi byte characters (my understanding is that the french one is?). I think the approach might have to be converting this into a while loop so that you can skip over multiple bytes when the thousand separator is matched

echo -e "100\n1\xe2\x80\xaf234\n5\xe2\x80\xaf678" | LC_ALL=fr_FR.utf8 ./target/debug/sort -n

Wow I am mistaken, it appears that this is actually a known limitation of sort, GNU sort currently checks to see if it is multi byte and it it is it disables it.

ChrisDryden · 2026-01-21T11:35:31Z

src/uu/sort/src/numeric_str_cmp.rs

 pub struct NumInfoParseSettings {
    pub accept_si_units: bool,
-    pub thousands_separator: Option<u8>,
+    pub thousands_separator: Option<char>,


This may seem really counterintuitive but this should stay a u8, if you look at the other comment its because GNU thousand seperation actually doesn't work for multi byte chars so to keep compatibility you need to check to see if its multi byte to see if it should be enabled

That's strange... but I can change it without any problem

…housand_separators

ChrisDryden · 2026-01-21T13:43:11Z

src/uu/sort/src/sort.rs

+            // Get the thousands separator from the locale, handling cases where the separator is empty or multi-character
+            let locale_thousands_separator = i18n::decimal::locale_grouping_separator().as_bytes();
+
+            let thousands_separator = match locale_thousands_separator {


Mind adding a comment here about upstream GNU not being able to handle this, for people reading the code to understand why we're doing this

github-actions · 2026-01-21T14:05:02Z

GNU testsuite comparison:

GNU test failed: tests/sort/sort-debug-keys. tests/sort/sort-debug-keys is passing on 'main'. Maybe you have to rebase?

ChrisDryden · 2026-01-21T14:07:20Z

So theres two PRs here that implement the thousands seperator for sort, the other one doesn't have the changes in the test data. I think its because this pr falls back through LC_ALL → LC_NUMERIC → LANG.

I'm still trying to reconcile the differences related to the two PR's. I think we still need the tokenization fixes that are added in #9848 trying to come up with explicit examples that show the deviation with gnu sort for that use case

ChrisDryden · 2026-01-21T14:16:27Z

echo -e "1 000\n500" | LC_ALL=fr_FR.utf8 /usr/bin/sort -n

Shows differently in gnu sort compared to this implementation. I think you need to detect if its a blank thousands seperator and also this is missing the HumanNumeric handling, for example:

echo -e "1 K\n500" | LC_ALL=fr_FR.utf8 /usr/bin/sort -h

ChrisDryden · 2026-01-21T14:18:48Z

If I was to strategize about the best way forward here, I really like the way that you are using the ICU library to get the locale separation data instead of calling libc, and that you have the correct fallback for getting the fallback data, then @mattsu2020 has the implementations for the two things that you are missing here.

@sylvestre mind if we merge this and then we re-base @mattsu2020's pr for this with the additional changes for HumanNumeric and the tokenization changes, to get the best of both?

sylvestre · 2026-01-21T20:48:54Z

sounds good!

…#10339)

Kaua-Klassmann changed the title ~~fix :numeric sort (-n) does not recognize thousand separators~~ fix: numeric sort (-n) does not recognize thousand separators Jan 19, 2026

Kaua-Klassmann force-pushed the fix_numeric_sort_does_not_recognize_thousand_separators branch from 4fa661e to a751737 Compare January 19, 2026 17:11

Kaua-Klassmann added 3 commits January 20, 2026 23:24

export the locale grouping separator from i18n decimal

03d39a1

fix: numeric sort does not recognize thousand separators

c4107c4

fix: update tests to sort

1fa6681

sylvestre force-pushed the fix_numeric_sort_does_not_recognize_thousand_separators branch from a751737 to 1fa6681 Compare January 20, 2026 22:24

ChrisDryden reviewed Jan 21, 2026

View reviewed changes

Merge branch 'uutils:main' into fix_numeric_sort_does_not_recognize_t…

49048b4

…housand_separators

ChrisDryden reviewed Jan 21, 2026

View reviewed changes

Kaua-Klassmann force-pushed the fix_numeric_sort_does_not_recognize_thousand_separators branch from 3885736 to e3a4b05 Compare January 21, 2026 13:48

fix: validate if thousands_separator is not multi byte

e3a4b05

sylvestre merged commit 1a64417 into uutils:main Jan 21, 2026
130 of 131 checks passed

ChrisDryden mentioned this pull request Jan 21, 2026

feat(sort): add locale-aware numeric sorting support(sort-h-thousands-sep.sh) #9848

Open

Kaua-Klassmann deleted the fix_numeric_sort_does_not_recognize_thousand_separators branch January 21, 2026 21:26

mattsu2020 pushed a commit to mattsu2020/coreutils that referenced this pull request Jan 21, 2026

fix: numeric sort (-n) does not recognize thousand separators (uutils…

5c459a1

…#10339)


		let mut first_char = true;

		for (idx, &char) in num.iter().enumerate() {

Uh oh!

fix: numeric sort (-n) does not recognize thousand separators #10339

fix: numeric sort (-n) does not recognize thousand separators #10339

Uh oh!

Conversation

Kaua-Klassmann commented Jan 18, 2026

Description

Changes

Uh oh!

github-actions bot commented Jan 19, 2026

Uh oh!

codspeed-hq bot commented Jan 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merging this PR will not alter performance

Footnotes

Uh oh!

sylvestre commented Jan 19, 2026

Uh oh!

sylvestre commented Jan 19, 2026

Uh oh!

ChrisDryden commented Jan 19, 2026

Uh oh!

github-actions bot commented Jan 20, 2026

Uh oh!

Kaua-Klassmann commented Jan 20, 2026

Uh oh!

sylvestre commented Jan 20, 2026

Uh oh!

github-actions bot commented Jan 21, 2026

Uh oh!

ChrisDryden Jan 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ChrisDryden Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

ChrisDryden Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

Kaua-Klassmann Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

ChrisDryden Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Jan 21, 2026

Uh oh!

ChrisDryden commented Jan 21, 2026

Uh oh!

ChrisDryden commented Jan 21, 2026

Uh oh!

ChrisDryden commented Jan 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

sylvestre commented Jan 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codspeed-hq bot commented Jan 19, 2026 •

edited

Loading

ChrisDryden Jan 21, 2026 •

edited

Loading

ChrisDryden commented Jan 21, 2026 •

edited

Loading