bug#33775: fold: counting multi-byte utf-8 sequences as separate columns
From: Michael Siegel
Date: Mon, 17 Dec 2018 02:32:55 +0100
User-agent: Mozilla/5.0 (X11; Linux i686; rv:60.0) Gecko/20100101 Thunderbird/60.3.0

Hello,
I've just discovered an odd behavior of `fold' while trying to wrap a
piece of text containing phonetic characters.
Take the following line, for example:
Tcl (pronounced "tickle" or tee cee ell /ˈtiː siː ɛl/) is a high-level,
It is 71 characters long. Still, running
echo "Tcl (pronounced "tickle" or tee cee ell /ˈtiː siː ɛl/) is a high-level," | fold -w 72 -s
produces
Tcl (pronounced tickle or tee cee ell /ˈtiː siː ɛl/) is a
high-level,
I've had someone test this with FreeBSD's `fold', which didn't behave
that way. Instead, it filled out the line as expected.
Further investigation by developers of Adélie Linux revealed that GNU's
`fold' is counting each byte of a multi-byte UTF-8 sequence (in this
case, the phonetic characters) as a separate column:
awilcox on gwyn [pts/11 Sun 16 19:01] ~: cat testing.txt
1234567890 234567890 234567890 234567890 234567890 234567890 234567890
/ˈtiː siː ɛl/ Adélie en français español ¿que? ¡ay! here is 70 chars ^
yep.
awilcox on gwyn [pts/11 Sun 16 19:01] ~: fold -w 72 -s testing.txt
1234567890 234567890 234567890 234567890 234567890 234567890 234567890
/ˈtiː siː ɛl/ Adélie en français español ¿que? ¡ay! here is 70
chars ^
yep.
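
For illustration, the mismatch can be checked outside of `fold'. The
following is a minimal Python sketch and not part of any fold
implementation. It assumes the text is UTF-8 encoded and that the line
reaches fold without the inner double quotes around "tickle" (the shell
consumes them, as the output above shows). The names `sample',
`count_chars' and `count_bytes' are placeholders chosen for this example.

```python
# Sketch (assumptions: input is UTF-8, and the shell has already removed
# the inner double quotes around "tickle", as in the fold output above).
sample = "Tcl (pronounced tickle or tee cee ell /ˈtiː siː ɛl/) is a high-level,"


def count_chars(text: str) -> int:
    """Return the number of Unicode code points in text."""
    return len(text)


def count_bytes(text: str) -> int:
    """Return the number of bytes text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


width = 72  # the value passed to fold -w

print("characters:", count_chars(sample))
print("bytes:", count_bytes(sample))
print("fits by characters:", count_chars(sample) <= width)
print("fits by bytes:", count_bytes(sample) <= width)
```

Each of the four phonetic characters in the line takes two bytes in
UTF-8, so the line fits in 72 columns when counted by characters but not
when counted by bytes, which would be consistent with the early break
seen above.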
msi