bug#69951: coreutils: printf formatting bug for nb_NO and nn_NO locales

bug-coreutils

From:	Pádraig Brady
Subject:	bug#69951: coreutils: printf formatting bug for nb_NO and nn_NO locales
Date:	Sat, 23 Mar 2024 14:39:04 +0000
User-agent:	Mozilla Thunderbird

tag 69951 notabug
close 69951
stop

On 22/03/2024 20:22, Thomas Dreibholz wrote:

Hi,

I just discovered a printf bug for at least the nb_NO and nn_NO locales
when printing numbers with a thousands separator. To reproduce:

#!/bin/bash
for l in de_DE nb_NO ; do
     echo "LC_NUMERIC=$l.UTF-8"
     for n in 1 100 1000 10000 100000 1000000 10000000 ; do
        LC_NUMERIC=$l.UTF-8 /usr/bin/printf "<%'10d>\n" $n
     done
done

The expected output of "%'10d" is a right-aligned number string
10 characters wide.

The output of the test script is fine for e.g. LC_NUMERIC=de_DE.UTF-8
and LC_NUMERIC=en_US.UTF-8:

LC_NUMERIC=de_DE.UTF-8
<         1>
<       100>
<     1.000>
<    10.000>
<   100.000>
< 1.000.000>
<10.000.000>

However, for LC_NUMERIC=nb_NO.UTF-8 and LC_NUMERIC=nn_NO.UTF-8, the
formatting is wrong:

LC_NUMERIC=nb_NO.UTF-8
<         1>
<       100>
<   1 000>
<  10 000>
< 100 000>
<1 000 000>
<10 000 000>

I reproduced the issue with coreutils-8.32-4.1ubuntu1.1 (Ubuntu 22.04)
as well as coreutils-9.3-5.fc39.x86_64 (Fedora 39).

Under FreeBSD 14.0-RELEASE (coreutils-9.4_1), the output looks slightly
better but is still wrong:

LC_NUMERIC=nb_NO.UTF-8
<         1>
<       100>
<    1 000>
<   10 000>
<  100 000>
<1 000 000>
<10 000 000>
LC_NUMERIC=nn_NO.UTF-8
<         1>
<       100>
<    1 000>
<   10 000>
<  100 000>
<1 000 000>
<10 000 000>

Maybe the issue is that the thousands separator for the Norwegian
locales is a space " ", while it is "."/"," for German/US English locales.


The issue looks to be that the thousands separator for Norwegian locales
is "NARROW NO-BREAK SPACE" (U+202F), or more problematically the _three_ byte
UTF-8 sequence E2 80 AF. So it looks like an issue with libc routines
counting bytes rather than characters in this case.
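To see the byte versus character mismatch directly, here is a small sketch (not from the thread). It builds the string that "%'d" would produce for 1000, assuming the separator is U+202F as described above, dumps its bytes, and pads it with the external printf(1), whose %s width is counted in bytes like C printf. The final locale(1) line is optional and only meaningful on glibc systems where nb_NO.UTF-8 is installed.

```shell
#!/bin/bash
# U+202F NARROW NO-BREAK SPACE as its three UTF-8 bytes E2 80 AF
# (octal 342 200 257, which POSIX printf understands).
sep=$(printf '\342\200\257')
s="1${sep}000"   # what "%'d" yields for 1000 if this is the separator

# 7 bytes for 5 visible characters; prints: 31 e2 80 af 30 30 30
printf '%s' "$s" | od -An -tx1

# env runs the external printf(1) instead of the shell builtin. Its %10s
# width is counted in bytes, like C printf, so only 10 - 7 = 3 spaces are
# added and the padded result is 8 columns wide rather than 10.
env printf '<%10s>\n' "$s"

# Optional: dump the locale's own separator (glibc locale(1); needs
# nb_NO.UTF-8 installed, otherwise it falls back to the C locale,
# whose separator is empty).
LC_ALL=nb_NO.UTF-8 locale thousands_sep | od -An -tx1
```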

One suggestion is to do the alignment afterwards. For example:

$ export LC_NUMERIC=nb_NO.UTF-8
$ printf "%'.f\n" $(seq -f '1E%.f' 7) | column --table-right=1 -t
        10
       100
     1 000
    10 000
   100 000
 1 000 000
10 000 000
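The same "align afterwards" idea can also be sketched without column(1), for the %'d case from the original report. The helper name ralign is hypothetical. This assumes bash (whose ${#var} counts characters rather than bytes in a UTF-8 locale), an installed nb_NO.UTF-8 locale, and one display column per character, which holds for U+202F but not for double-width or combining characters.

```shell
#!/bin/bash
# ralign WIDTH STRING (hypothetical helper): print STRING right-aligned
# in WIDTH characters. In bash, ${#2} counts characters rather than bytes
# when LC_CTYPE is a UTF-8 locale, so the multibyte separator counts once.
ralign() {
  ra_pad=$(( $1 - ${#2} ))
  while [ "$ra_pad" -gt 0 ]; do
    printf ' '
    ra_pad=$(( ra_pad - 1 ))
  done
  printf '%s\n' "$2"
}

# Assumption: nb_NO.UTF-8 is installed. LC_ALL sets both LC_NUMERIC
# (the separator) and LC_CTYPE (character counting in ${#2}).
LC_ALL=nb_NO.UTF-8
export LC_ALL
for n in 1 100 1000 10000 100000 1000000 10000000; do
  # Format without a width, then pad by characters.
  ralign 10 "$(env printf "%'d" "$n")"
done
```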

Actually I've just noticed that specifying the %'10.f format
does count characters and not bytes! So another solution is:

$ export LC_NUMERIC=nb_NO.UTF-8
$ printf "%'10.f\n" $(seq -f '1E%.f' 7)
        10
       100
     1 000
    10 000
   100 000
 1 000 000
10 000 000

The issue, if there is one, is in libc at least.
It would be worth checking existing glibc reports about this
and reporting it if it has not been mentioned.

cheers,
Pádraig.

Current Thread

bug#69951: coreutils: printf formatting bug for nb_NO and nn_NO locales, Thomas Dreibholz, 2024/03/22
- bug#69951: coreutils: printf formatting bug for nb_NO and nn_NO locales, Pádraig Brady <=
- bug#69951: coreutils: printf formatting bug for nb_NO and nn_NO locales, Thomas Dreibholz, 2024/03/23
- bug#69951: coreutils: printf formatting bug for nb_NO and nn_NO locales, Thomas Dreibholz, 2024/03/23
