bug-coreutils

bug#7960: [PATCH] fmt: fix formatting multibyte text (bug #7372)


From: Eric Blake
Subject: bug#7960: [PATCH] fmt: fix formatting multibyte text (bug #7372)
Date: Wed, 02 Feb 2011 14:33:44 -0700
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.13) Gecko/20101209 Fedora/3.1.7-0.35.b3pre.fc14 Lightning/1.0b3pre Mnenhy/0.8.3 Thunderbird/3.1.7

[re-adding the list]

On 02/02/2011 02:11 PM, Kostya Stopani wrote:
> On Wed, Feb 02, 2011 at 10:15:53AM -0700, Eric Blake wrote:
> 
>> Thanks for the patch.  However, it's not trivial, so it would need
>> copyright assignment.
> 
> Oh boy... Anyway I don't mind signing papers, if you (or whoever)
> don't mind bothering with it.

OK, I'll send you those details off-list.

> 
>> Furthermore, there are already known issues where upstream coreutils
>> lacks multibyte character support, but a solution has to be
>> both maintainable and have no impact on the single-byte locale case.
> 
> I believe this patch doesn't break single-byte behavior because no
> conversion takes place. mbsnrtowcs() is used only to count
> characters. I've tested various cases (8-bit encoding was KOI8-R):
> 
> |--------+---------------+--------------------------|
> | Locale | Text encoding | Result                   |
> |--------+---------------+--------------------------|
> | UTF-8  | UTF-8         | old fmt: text too narrow |
> |        |               | new fmt: ok              |
> |--------+---------------+--------------------------|
> | UTF-8  | 8-bit         | same as old fmt          |
> |--------+---------------+--------------------------|
> | 8-bit  | UTF-8         | same as old fmt          |
> |--------+---------------+--------------------------|
> | 8-bit  | 8-bit         | same as old fmt          |
> |--------+---------------+--------------------------|
> 
> From my point of view the alternative is to convert everything to
> wchar_t, which imposes the need to keep track of conversion errors and
> gracefully fall back to single-byte.

Keeping text in its multibyte form, rather than converting it to
wchar_t, is the way to go.  That is especially true given the ongoing
discussion of how to handle Cygwin, where wchar_t is UTF-16, so a
single character can still occupy more than one wchar_t unit.  That is
an extension to POSIX, with all sorts of ramifications for programs
that expect POSIX semantics.
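The idea of using mbsnrtowcs() only to count characters, without keeping any converted text, can be sketched as follows.  This is a minimal sketch under stated assumptions, not the code from the patch: the helper name count_chars() is hypothetical, and falling back to one character per byte on an invalid sequence is just one possible reading of "gracefully fall back to single-byte".  Passing a null destination pointer asks mbsnrtowcs() to count without storing any wide characters, so the text itself stays in its multibyte form.

```c
#define _POSIX_C_SOURCE 200809L  /* for mbsnrtowcs() */
#include <locale.h>              /* callers must run setlocale() first */
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, not fmt's actual code: return the number of
   characters in the first N bytes of S under the current LC_CTYPE
   locale, without storing any wide characters.  If S contains an
   invalid multibyte sequence, fall back to the single-byte answer
   and count one character per byte.  */
static size_t
count_chars (const char *s, size_t n)
{
  mbstate_t state;
  memset (&state, 0, sizeof state);

  /* With a null destination, mbsnrtowcs() only counts; its LEN
     argument is ignored and SRC is left unchanged.  */
  const char *src = s;
  size_t count = mbsnrtowcs (NULL, &src, n, 0, &state);

  if (count == (size_t) -1)
    return n;   /* EILSEQ: treat the text as single-byte.  */
  return count;
}
```

A caller would run setlocale (LC_ALL, "") once at startup.  In a UTF-8 locale, count_chars ("\xd0\xbf\xd1\x80", 4) then yields 2, while in a single-byte locale such as KOI8-R every byte is one character, so the result equals the byte count and the single-byte behavior is unchanged.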

-- 
Eric Blake   address@hidden    +1-801-349-2682
Libvirt virtualization library http://libvirt.org
