Re: From wchar_t to char32_t

bug-gnulib


From:	Paul Eggert
Subject:	Re: From wchar_t to char32_t
Date:	Mon, 3 Jul 2023 16:30:04 -0700
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.11.0

On 2023-07-03 15:00, Bruno Haible wrote:

   Level 3: Behave correctly. Don't split a 2-Unicode-character sequence.
            This is what code that uses mbrtoc32() does, when it has the
            lines
                 if (bytes == (size_t) -3)
                   bytes = 0;
            and uses !mbsinit (&state) in the loop termination condition.

With diffutils even level 3 would not suffice, since diffutils truncates at input byte boundaries, so it doesn't suffice to merely treat (size_t) -3 as zero even if one also checks mbsinit. Instead, one would have to treat all the characters in the sequence ABBB... (where A is an ordinary multibyte character and the Bs all return (size_t) -3) as a single unit, because one cannot truncate in the middle of that sequence. Or wait a minute - in theory I suppose it could even be an arbitrary sequence of As and Bs, so long as the total "sizes" of the As equals the number of bytes in the original byte sequence that stands for a series of characters.
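To make the "single unit" idea concrete, here is a rough sketch of how one might find a safe place to truncate. It is not diffutils code: the function name truncation_point and its treatment of invalid and incomplete input are assumptions made only for illustration. It treats (size_t) -3 as zero bytes, keeps looping while the state is not initial, and records a boundary only when mbsinit says so, which covers the ABBB... case as well as the arbitrary mix of As and Bs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/* Hypothetical helper (not part of diffutils or Gnulib): return the
   largest byte offset <= LIMIT at which BUF[0..LEN) can be cut without
   splitting a unit, where a unit ends only when the conversion state
   is back in its initial state.  */
size_t
truncation_point (char const *buf, size_t len, size_t limit)
{
  assert (limit <= len);

  mbstate_t state;
  memset (&state, 0, sizeof state);
  size_t pos = 0;       /* input bytes consumed so far */
  size_t boundary = 0;  /* last offset known to be safe to cut at */

  /* Keep going while input remains or characters are still pending.  */
  while (pos < len || !mbsinit (&state))
    {
      char32_t ch;
      size_t bytes = mbrtoc32 (&ch, buf + pos, len - pos, &state);

      if (bytes == (size_t) -1)
        {
          /* Invalid input: assume the byte is a unit of its own.  */
          bytes = 1;
          memset (&state, 0, sizeof state);
        }
      else if (bytes == (size_t) -2)
        break;      /* incomplete trailing character: no later boundary */
      else if (bytes == (size_t) -3)
        bytes = 0;  /* a "B": one more char32_t, no new input bytes */
      else if (bytes == 0)
        bytes = 1;  /* the null character occupies one byte */

      pos += bytes;
      if (limit < pos)
        break;      /* this unit extends past the limit */
      if (mbsinit (&state))
        boundary = pos;  /* all pending Bs drained: safe to cut here */
    }

  return boundary;
}
```

For plain ASCII in the C locale every byte is its own unit, so truncation_point ("hello", 5, 3) simply yields 3. The interesting cases need a locale in which mbrtoc32 actually returns (size_t) -3.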

The diffutils truncation approach also has problems with coding systems that have shift state, but that's OK: nobody uses these coding systems with GNU apps as they're not practical. Similarly, any platform where mbrtoc32 returns (size_t) -3 won't be practical with GNU apps, so it should be OK for diffutils to not worry about this possibility either, given that it would be a hassle to support it. We don't have time to support every oddball coding system that POSIX allows.
