Re: sort -V behaviour
From: Kaz Kylheku (Coreutils)
Subject: Re: sort -V behaviour
Date: Mon, 31 Jul 2017 11:41:07 -0700
User-agent: Roundcube Webmail/0.9.2
On 31.07.2017 09:23, Sven C. Dack wrote:
Hello,
I have a question about the -V option to sort, but first some examples:
$ echo -e "1\n1.2\n1.2.3\n1.2.3.4"|sort -V
1
1.2
1.2.3
1.2.3.4
$ echo -e "f1\nf1.2\nf1.2.3\nf1.2.3.4"|sort -V
f1
f1.2
f1.2.3
f1.2.3.4
$ echo -e "/1\n/1.2\n/1.2.3\n/1.2.3.4"|sort -V
/1
/1.2
/1.2.3
/1.2.3.4
$ echo -e "1f\n1.2f\n1.2.3f\n1.2.3.4f"|sort -V
1f
1.2f
1.2.3f
1.2.3.4f
Note that this also has a problem, though the behavior is what you
expect, so you don't notice.
Here, only the last three lines of input contain version numbers.
In each one, the last dot and everything after it is considered
a suffix; the versions being sorted are "1", "1.2" and "1.2.3".
$ echo -e "1/\n1.2/\n1.2.3/\n1.2.3.4/"|sort -V
1.2.3.4/
1.2.3/
1.2/
1/
My question is, why does the -V option reverse the order in the last
case?
From the info documentation:
Version-sorted strings are compared such that if VER1 and VER2 are
version numbers and PREFIX and SUFFIX (SUFFIX matching the regular
expression `(\.[A-Za-z~][A-Za-z0-9~]*)*') are strings then VER1 < VER2
implies that the name composed of "PREFIX VER1 SUFFIX" sorts before
"PREFIX VER2 SUFFIX".
It looks like the SUFFIX regex doesn't match, so these names are not
treated as version names. It doesn't match because of the trailing slash.
If the trailing slash were included in the suffix match, there would
still be the problem that .4/, .3/ and .2/ are the suffixes,
and the version numbers are "1.2.3", "1.2", and "1", with the last
"1" being a non-version-number input.
Also this is noted:
This functionality is implemented using gnulib's `filevercmp'
function, which has some caveats worth noting.
[...]
* Some suffixes will not be matched by the regular expression
mentioned above. Consequently these examples may not sort as you
expect:
abc-1.2.3.4.7z
abc-1.2.3.7z
abc-1.2.3.4.x86_64.rpm
abc-1.2.3.x86_64.rpm
Oops! And as you can see from these examples it is tricky.
Sometimes suffixes contain numeric stuff, which is why it's
specified that way.
Here the .7z files don't match the requirement for treatment as
version numbers, because the suffix, following the required period,
must begin with a letter or tilde.
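
To see which part of a name the documented regex would treat as a suffix,
here is a rough sketch that simply strips it with sed. It assumes the regex
is anchored at the end of the name, and strip_suffix is a made-up helper
name; this only applies the regex quoted from the documentation and is not
a copy of what gnulib's filevercmp actually does internally.

```shell
# Hypothetical helper: remove a trailing run of ".suffix" components that
# matches the regex quoted from the sort documentation, anchored at the end.
strip_suffix() {
  printf '%s\n' "$1" | sed -E 's/(\.[A-Za-z~][A-Za-z0-9~]*)*$//'
}

strip_suffix abc-1.2.3.tar.gz       # abc-1.2.3
strip_suffix abc-1.2.3.4.7z         # abc-1.2.3.4.7z  (".7z" begins with a digit)
strip_suffix abc-1.2.3.x86_64.rpm   # abc-1.2.3.x86_64  ("_" is not in the class)
```

Only in the first case is the whole extension removed, leaving a clean
version at the end of the name.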
This behaviour is unintuitive and seems wrong to me.
I agree that the specification is not ideal, but it's not easy to
see how it can be improved given the threat of numeric junk like 7z
which cannot be treated as part of the version.
Consider that 1.7z looks like a bigger version than 1.2.7z,
if the 7 is wrongly treated as part of the version!!!
The designers who specified the filevercmp function were clearly
alert to these cases.
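
A small illustration of the 1.7z versus 1.2.7z point, assuming a sort that
supports -V. The inputs below are bare version strings that I made up to
stand for the two possible readings of those names; they are not output
from sorting the .7z names themselves.

```shell
# Reading 1: ".7z" is dropped as a suffix, so the versions are 1 and 1.2.
printf '1.2\n1\n' | sort -V        # 1 sorts before 1.2

# Reading 2: the 7 is wrongly kept as a version component, so the
# strings being compared are in effect 1.7 and 1.2.7.
printf '1.7\n1.2.7\n' | sort -V    # 1.2.7 sorts before 1.7
```

Under the second reading the relative order of the two files flips, which
is the hazard described above.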