bug#51011: [GNU sort] Numerical sort with delimiter may be broken (bug)
From: Pádraig Brady
Subject: bug#51011: [GNU sort] Numerical sort with delimiter may be broken (bug)
Date: Fri, 8 Oct 2021 14:37:42 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:84.0) Gecko/20100101 Thunderbird/84.0
On 04/10/2021 21:01, Paul Eggert wrote:
> On 10/4/21 08:58, Pádraig Brady wrote:
>> The --debug option points out the issue:
>> $ printf '%s\n' 1,a 0,9 | sort --debug -nk1 -t ,
>> sort: key 1 is numeric and spans multiple fields
>> 1,a
>> _
>> ___
>> 0,9
>> ___
>> ___
> As Juncheng points out, it is a bit odd that -n and -g disagree here,
> even in locales where ',' is not a decimal point. For example:
> $ printf '1,a\n0,9\n' | sort -gk1 -t, --debug
> sort: text ordering performed using ‘en_US.UTF-8’ sorting rules
> sort: key 1 is numeric and spans multiple fields
> 0,9
> _
> ___
> 1,a
> _
> ___
> $ printf '1,a\n0,9\n' | sort -nk1 -t, --debug
> sort: text ordering performed using ‘en_US.UTF-8’ sorting rules
> sort: key 1 is numeric and spans multiple fields
> 1,a
> _
> ___
> 0,9
> ___
> ___
The difference here is due to ',' being treated as a thousands separator,
not a decimal point. So, Juncheng, to specifically answer your question:
with -n, 0,9 is interpreted as 9, which sorts after 1,a (interpreted as 1),
whereas -g stops parsing at the ','. For example, consider:
$ printf '%s\n' 1,a 0,900 | sort -s -k1,1g --debug
0,900
_
1,a
_
$ printf '%s\n' 1,a 0,900 | sort -s -k1,1n --debug
1,a
_
0,900
_____
Given the various groupings possible (depending on the locale,
digits can be grouped in 2s, 3s, 4s, or 5s), we effectively just
ignore the grouping separator in numeric mode, hence the difference.
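To make "ignore the grouping separator" concrete, here is a rough emulation, not how sort is actually implemented. It assumes ',' is the grouping separator (as in en_US.UTF-8), deletes it up front, and compares what is left numerically in the C locale, which has no grouping separator of its own. The variable name `stripped` is just for illustration.

```shell
# Illustration only: this is not how sort is implemented.
# Delete every ',' (the assumed grouping separator), then compare the
# remaining text numerically in the C locale.
stripped=$(printf '%s\n' 1,a 0,900 | LC_ALL=C sed 's/,//g' | LC_ALL=C sort -n)
printf '%s\n' "$stripped"
```

This prints 1a and then 0900, i.e. 1 sorts before 900, the same relative order that -n gives for 1,a and 0,900 in the transcript above.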
Note that in locales where ',' is the decimal point, we do get
a consistent order between -g and -n, as expected:
$ printf '%s\n' '1,a' '0,9' | LC_ALL=fr_FR.utf8 sort -s -k1,1n --debug
sort: tri du texte réalisé en utilisant les règles de tri « fr_FR.utf8 »
0,9
___
1,a
__
$ printf '%s\n' '1,a' '0,9' | LC_ALL=fr_FR.utf8 sort -s -k1,1g --debug
sort: tri du texte réalisé en utilisant les règles de tri « fr_FR.utf8 »
0,9
___
1,a
__
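Returning to the original comma-delimited input: the --debug warning quoted earlier says that key 1 is numeric and spans multiple fields. A minimal sketch, assuming the intent was to sort on the first comma-separated field only, is to end the key at field 1, so that the ',' never reaches the numeric parser regardless of the locale's radix or grouping characters. The variable name `first_field` is just for illustration.

```shell
# -t, makes ',' the field delimiter; -k1,1n starts and ends the
# numeric key at field 1, so only "1" and "0" are compared.
first_field=$(printf '%s\n' 1,a 0,9 | sort -t, -k1,1n)
printf '%s\n' "$first_field"
```

This prints 0,9 and then 1,a, since the keys compared are just 0 and 1.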
For completeness, there is another issue with grouping separators:
we don't handle multi-byte separators appropriately.
For example, fr_FR.utf8 uses a "narrow no-break space" as its separator,
which we don't support:
$ sep=$(LC_ALL=fr_FR.utf8 locale thousands_sep)
$ printf '%s\n' 0800 "0${sep}900" | LC_ALL=fr_FR.utf8 sort -s -k1,1n --debug
sort: tri du texte réalisé en utilisant les règles de tri « fr_FR.utf8 »
0 900
_
0800
____
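The following sketch shows why that separator is multi-byte, along with one possible workaround that is my assumption rather than anything sort provides: strip the separator before sorting. To avoid depending on fr_FR.utf8 being installed, the separator is written out as the three UTF-8 bytes of U+202F (NARROW NO-BREAK SPACE) instead of being read from `locale thousands_sep`. The variable names are hypothetical, and note that the workaround's output no longer contains the separator.

```shell
# U+202F NARROW NO-BREAK SPACE as its three UTF-8 bytes (octal escapes).
sep=$(printf '\342\200\257')

# Count the bytes in the separator; tr removes padding some wc versions add.
sep_bytes=$(printf '%s' "$sep" | wc -c | tr -d ' ')
printf 'separator is %s bytes\n' "$sep_bytes"

# Hypothetical workaround: delete the separator bytes, then sort
# numerically in the C locale so both steps work byte by byte.
cleaned=$(printf '%s\n' 0800 "0${sep}900" | LC_ALL=C sed "s/${sep}//g" | LC_ALL=C sort -n)
printf '%s\n' "$cleaned"
```

This reports a 3-byte separator and then prints 0800 followed by 0900, the order one would expect if the separator were handled.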
cheers,
Pádraig