bug-coreutils

bug#33775: fold: counting multi-byte utf-8 sequences as separate columns


From: Michael Siegel
Subject: bug#33775: fold: counting multi-byte utf-8 sequences as separate columns
Date: Mon, 17 Dec 2018 02:32:55 +0100
User-agent: Mozilla/5.0 (X11; Linux i686; rv:60.0) Gecko/20100101 Thunderbird/60.3.0

Hello,

I've just discovered an odd behavior of `fold' while trying to wrap a
piece of text containing phonetic characters.

Take the following line, for example:

Tcl (pronounced "tickle" or tee cee ell /ˈtiː siː ɛl/) is a high-level,

It is 71 characters long. Still, running

echo "Tcl (pronounced "tickle" or tee cee ell /ˈtiː siː ɛl/) is a high-level," | fold -w 72 -s

produces

Tcl (pronounced tickle or tee cee ell /ˈtiː siː ɛl/) is a
high-level,

I've had someone test this with FreeBSD's `fold', which didn't behave
that way. Instead, it filled out the line as expected.

Further investigation by developers of Adélie Linux revealed that GNU's
`fold' is counting each byte of a multi-byte UTF-8 sequence (in this
case, the phonetic characters) as a separate column:

awilcox on gwyn [pts/11 Sun 16 19:01] ~: cat testing.txt
1234567890 234567890 234567890 234567890 234567890 234567890 234567890
/ˈtiː siː ɛl/ Adélie en français español ¿que? ¡ay! here is 70 chars ^
yep.
awilcox on gwyn [pts/11 Sun 16 19:01] ~: fold -w 72 -s testing.txt
1234567890 234567890 234567890 234567890 234567890 234567890 234567890
/ˈtiː siː ɛl/ Adélie en français español ¿que? ¡ay! here is 70
chars ^
yep.
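
The difference between the two counting strategies can be illustrated
with a small Python sketch. This is not GNU's or FreeBSD's actual
implementation: `fold_line' and `count_bytes' are hypothetical names, and
the sketch deliberately ignores tabs, backspaces, carriage returns and
characters that occupy more than one display column. It only assumes
that `fold -s' breaks after the last blank that still fits in the width.

```python
def fold_line(line, width, count_bytes):
    """Simplified stand-in for `fold -s -w WIDTH` on one line of text.

    count_bytes=True  -> every UTF-8 byte counts as one column
    count_bytes=False -> every character (code point) counts as one column
    """
    def cols(s):
        return len(s.encode("utf-8")) if count_bytes else len(s)

    pieces = []
    while cols(line) > width:
        # Find the longest prefix that still fits into `width` columns.
        used = 0
        cut = 0
        for i, ch in enumerate(line):
            if used + cols(ch) > width:
                break
            used += cols(ch)
            cut = i + 1
        if cut == 0:
            cut = 1  # always make progress, even for a tiny width
        # Like -s: prefer to break right after the last blank in the prefix.
        blank = line.rfind(" ", 0, cut)
        if blank != -1:
            cut = blank + 1
        pieces.append(line[:cut])
        line = line[cut:]
    pieces.append(line)
    return pieces


# The first example as fold receives it (the shell removes the inner quotes).
line = "Tcl (pronounced tickle or tee cee ell /ˈtiː siː ɛl/) is a high-level,"

print(len(line), "characters,", len(line.encode("utf-8")), "bytes")
print(fold_line(line, 72, count_bytes=True))   # splits before "high-level,"
print(fold_line(line, 72, count_bytes=False))  # stays on a single line
```

With byte counting, the line is longer than 72 columns and gets split
before "high-level,", as in the GNU output above. With character
counting, it fits on one line.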



msi




