Re: mbcel module for Gnulib?, incomplete multibyte sequences
From: Bruno Haible
Subject: Re: mbcel module for Gnulib?, incomplete multibyte sequences
Date: Tue, 25 Jul 2023 02:07:55 +0200
Paul Eggert wrote:
> >> in UTF-8 the byte sequence E0 80 is not an incomplete character
> >> (in the sense that additional bytes may lead to a complete character),
> >> because every byte you append to E0 80 causes glibc mbrtoc32 to return
> >> (size_t) -1. Yet glibc mbrtoc32 returns (size_t) -2 for E0 80.
> >
> > And gnulib/lib/unistr/u8-mbtouc-aux.c does it wrong as well!
> > The return value for E0 {80..9F} should be (size_t) -1, because
> > U+0800 is E0 A0 80.
> >
> > I'll fix the gnulib part soon. Very good point. It looks like few people
> > understood the implications of
> > https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf page 125, table 3-7.
>
> I hope we don't need to replace mbrtoc32 merely because of this obscure
> issue.
We don't need to, because
- it's not a blatant POSIX failure,
- this bug has been in glibc for 20 years and in GNU libunistring for
  more than 10 years, and no one noticed,
- the only difference is whether mbrtowc() returns (size_t)-1 or
  (size_t)-2.
Bruno
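
For illustration, here is a minimal sketch of the distinction discussed above: whether a byte prefix is still completable (the (size_t)-2 case) or already invalid (the (size_t)-1 case), following the second-byte ranges of Unicode table 3-7. The name utf8_classify and its int return convention are hypothetical, chosen only for this example; this is not the glibc or gnulib code.

```c
#include <stddef.h>

/* Hypothetical helper, not part of glibc or gnulib.
   Classifies the bytes s[0..n-1] as the start of a UTF-8 character,
   following Unicode table 3-7 (well-formed UTF-8 byte sequences).
   Returns the character length (1..4) if a complete character is present,
   -2 if the bytes are a proper prefix of some well-formed sequence,
   -1 if no bytes appended later can make them well-formed.  */
static int
utf8_classify (const unsigned char *s, size_t n)
{
  if (n == 0)
    return -2;                  /* nothing seen yet: still completable */

  unsigned char c = s[0];
  unsigned char lo = 0x80, hi = 0xBF;   /* allowed range of the 2nd byte */
  size_t len;

  if (c < 0x80)
    return 1;                   /* ASCII */
  else if (c >= 0xC2 && c <= 0xDF)
    len = 2;
  else if (c >= 0xE0 && c <= 0xEF)
    {
      len = 3;
      if (c == 0xE0)
        lo = 0xA0;              /* E0 80..9F would be overlong */
      if (c == 0xED)
        hi = 0x9F;              /* ED A0..BF would be surrogates */
    }
  else if (c >= 0xF0 && c <= 0xF4)
    {
      len = 4;
      if (c == 0xF0)
        lo = 0x90;              /* F0 80..8F would be overlong */
      if (c == 0xF4)
        hi = 0x8F;              /* F4 90..BF would exceed U+10FFFF */
    }
  else
    return -1;                  /* 80..C1 and F5..FF never start a character */

  for (size_t i = 1; i < len; i++)
    {
      if (i >= n)
        return -2;              /* ran out of input: prefix is completable */
      unsigned char min = (i == 1 ? lo : 0x80);
      unsigned char max = (i == 1 ? hi : 0xBF);
      if (s[i] < min || s[i] > max)
        return -1;              /* this byte can never be valid here */
    }
  return (int) len;
}
```

With this sketch, E0 80 is classified as invalid (-1) as soon as the second byte is seen, whereas E0 A0 is classified as incomplete (-2) and E0 A0 80 as a complete 3-byte character (U+0800).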