coreutils

Re: multibyte processing - handling invalid sequences (long)


From: Eric Blake
Subject: Re: multibyte processing - handling invalid sequences (long)
Date: Wed, 20 Jul 2016 07:04:51 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0

On 07/20/2016 06:21 AM, Pádraig Brady wrote:

> It's worth considering having a separate (already existing?) util
> to fix data before processing. That could have options to:
>   drop invalid chars, replace with replacement char,
>   apply various http://unicode.org/reports/tr15/#Norm_Forms,
>   convert enclosed forms like ㊷ to 42 etc.
> I.E. we should avoid complicating each util where possible,
> and at least avoid having options on each util that could be
> hoisted to a more general util like above.
> 
> Silently dropping invalid characters probably isn't a great idea,
> and warnings to stderr is a bit messy and could be seen to contradict
> POSIX which suggests exiting with failure if anything output to stderr.
> A compromise might be to just replace invalid chars with
> the replacement character � and then include that in
> normal character processing, to make issues in input apparent.

Since there are several plausible error-handling methods (silently
discard invalid input, flag input as invalid with an error and no
further output, convert invalid input into the replacement character and
proceed with output), all of which can be considered desirable in some
circumstances, I wonder if we should give ALL utilities a common
--encoding-error=POLICY option that allows runtime selection between the
three policies, and/or an environment variable that selects the default
policy in the absence of a command-line choice.
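
To make the three policies concrete, here is a rough sketch in Python
rather than C, since its built-in decoder error handlers happen to map
one-to-one onto them. The policy names (discard, fail, replace), the
ENCODING_ERROR variable name, and the default of "replace" are all
placeholders of mine for illustration; nothing here is an existing
coreutils interface.

```python
import os

# Hypothetical policy names mapped onto Python's standard decoder error
# handlers. The names are illustrative, not a settled interface.
POLICIES = {
    "discard": "ignore",   # silently drop invalid bytes
    "fail": "strict",      # report an error and produce no further output
    "replace": "replace",  # substitute U+FFFD and keep processing
}


def select_policy(cli_value=None, environ=None, default="replace"):
    """Pick a policy: command-line choice first, then the environment."""
    if environ is None:
        environ = os.environ
    # ENCODING_ERROR is an assumed variable name, used only for this sketch.
    name = cli_value or environ.get("ENCODING_ERROR", default)
    if name not in POLICIES:
        raise ValueError("unknown encoding-error policy: " + name)
    return name


def decode_input(data, policy, encoding="utf-8"):
    """Decode bytes, handling invalid sequences per the chosen policy."""
    return data.decode(encoding, errors=POLICIES[policy])


if __name__ == "__main__":
    sample = b"caf\xff\xc3\xa9"  # 0xff can never appear in valid UTF-8
    for name in ("discard", "replace", "fail"):
        try:
            print(name, repr(decode_input(sample, name)))
        except UnicodeDecodeError as exc:
            print(name, "error:", exc.reason)
```

A command-line choice overrides the environment, and the environment
overrides the built-in default, which is the precedence I had in mind
above.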

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
