bug-gnulib

Re: mbcel module for Gnulib?, incomplete multibyte sequences


From: Paul Eggert
Subject: Re: mbcel module for Gnulib?, incomplete multibyte sequences
Date: Thu, 27 Jul 2023 22:16:52 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0

On 2023-07-27 12:19, Paul Eggert wrote:

  --- a/lib/mbcel.h
  +++ b/lib/mbcel.h
  @@ -191,3 +191,3 @@ mbcel_scan (char const *p, char const *lim)
     if (_GL_UNLIKELY ((size_t) -1 / 2 < len))
  -    return (mbcel_t) { .err = *p, .len = 1 };
  +    return (mbcel_t) { .err = *p, .len = len == (size_t) -2 ? lim - p : 1 };

Come to think of it, this would merely make mbcel compatible with mbu?iterf? (that is, mbiter, mbiterf, mbuiter and mbuiterf), by causing mbcel to return a length greater than 1 given an incomplete character at input end. But even with this change, mbcel would still not implement the multi-byte-per-encoding-error ("MEE") behavior that Kuhn and/or the Unicode standard describe. This is because mbu?iterf? doesn't implement MEE either.
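
To make the effect of that one-line change concrete, here is a minimal sketch of a scanner with the same shape. It is not the real mbcel code: cel_t and cel_scan are hypothetical stand-ins, and it calls the ISO C function mbrtowc directly, which may not be what lib/mbcel.h does. The only point is the treatment of the (size_t) -2 return value, which means "incomplete character at end of input".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for mbcel_t; the real type is defined in
   lib/mbcel.h and may differ.  */
typedef struct
{
  wchar_t ch;         /* decoded character, meaningful when err == 0 */
  unsigned char err;  /* first byte of the bad sequence, else 0 */
  size_t len;         /* number of bytes consumed, always nonzero */
} cel_t;

/* Scan one character from the nonempty range [P, LIM).  */
static cel_t
cel_scan (char const *p, char const *lim)
{
  assert (p < lim);

  mbstate_t st;
  memset (&st, 0, sizeof st);   /* start in the initial shift state */
  wchar_t wc;
  size_t len = mbrtowc (&wc, p, (size_t) (lim - p), &st);

  /* (size_t) -1 is an encoding error; (size_t) -2 is an incomplete
     character that runs up to LIM.  Both exceed (size_t) -1 / 2.  */
  if ((size_t) -1 / 2 < len)
    return (cel_t) { .err = (unsigned char) *p,
                     .len = len == (size_t) -2 ? (size_t) (lim - p) : 1 };

  if (len == 0)   /* mbrtowc returns 0 for the null character */
    len = 1;
  return (cel_t) { .ch = wc, .len = len };
}
```

With this shape, an incomplete character at input end is reported as a single error of length lim - p, while an invalid byte elsewhere is still reported with length 1. A truncated character in the middle of the input is not covered, which is why this is not MEE.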

For MEE, mbiterf would need something like the attached untested patch, and mbiter, mbcel, etc. would all need similar patches. I'm not suggesting that we make this change, though, as it would bloat the code for little benefit to many callers.

It would be better to change mbu?iterf? to use single-byte-per-encoding-error ("SEE") behavior, as this is simpler and is more consistent with how Emacs etc. behave. Any programs that need MEE can implement it themselves, or if the need is common enough we could add a Gnulib API that an app can use to support MEE when mbiter/mbcel etc. indicate an encoding error.
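
As an illustration of the difference between the two behaviors, here is a locale-independent sketch for UTF-8 only. It assumes that MEE means "treat the longest valid prefix of an ill-formed sequence as one encoding error", as in the Unicode standard's maximal-subpart substitution practice; that reading is an assumption, and utf8_step and count_errors are hypothetical names, not Gnulib APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return the number of bytes to consume from the nonempty buffer P of
   size N, setting *ERR if they form an encoding error.  With MEE false,
   every encoding error is exactly one byte long (SEE).  */
static size_t
utf8_step (unsigned char const *p, size_t n, bool mee, bool *err)
{
  assert (0 < n);
  unsigned char b = p[0];
  size_t need;                         /* length implied by lead byte */
  unsigned char lo = 0x80, hi = 0xBF;  /* valid range for second byte */

  *err = false;
  if (b < 0x80)
    return 1;
  else if (0xC2 <= b && b <= 0xDF)
    need = 2;
  else if (0xE0 <= b && b <= 0xEF)
    {
      need = 3;
      if (b == 0xE0)
        lo = 0xA0;                     /* reject overlong forms */
      else if (b == 0xED)
        hi = 0x9F;                     /* reject surrogates */
    }
  else if (0xF0 <= b && b <= 0xF4)
    {
      need = 4;
      if (b == 0xF0)
        lo = 0x90;                     /* reject overlong forms */
      else if (b == 0xF4)
        hi = 0x8F;                     /* reject values above U+10FFFF */
    }
  else
    {
      *err = true;                     /* stray trail byte, C0, C1, F5..FF */
      return 1;
    }

  /* Count the bytes that form a valid prefix of a character.  */
  size_t good = 1;
  while (good < need && good < n)
    {
      unsigned char t = p[good];
      if (good == 1 ? (t < lo || hi < t) : (t < 0x80 || 0xBF < t))
        break;
      good++;
    }
  if (good == need)
    return need;                       /* complete, well-formed character */

  *err = true;
  return mee ? good : 1;
}

/* Count the encoding errors reported while scanning P[0..N-1].  */
static size_t
count_errors (unsigned char const *p, size_t n, bool mee)
{
  size_t errors = 0;
  while (0 < n)
    {
      bool err;
      size_t len = utf8_step (p, n, mee, &err);
      errors += err;
      p += len;
      n -= len;
    }
  return errors;
}
```

For the three bytes E2 82 'A' (a truncated three-byte character followed by an ASCII letter), count_errors reports one error with mee true and two with mee false. The SEE path needs no lookahead bookkeeping in the caller, which is the simplicity argument above.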

Attachment: mbiterf-mee.diff
Description: Text Data

