bug-gnulib

Re: mbcel module for Gnulib?, incomplete multibyte sequences


From: Bruno Haible
Subject: Re: mbcel module for Gnulib?, incomplete multibyte sequences
Date: Tue, 25 Jul 2023 00:58:55 +0200

Paul Eggert wrote:
> in UTF-8 the byte sequence E0 80 is not an incomplete character 
> (in the sense that additional bytes may lead to a complete character), 
> because every byte you append to E0 80 causes glibc mbrtoc32 to return 
> (size_t) -1. Yet glibc mbrtoc32 returns (size_t) -2 for E0 80.

And gnulib/lib/unistr/u8-mbtouc-aux.c does it wrong as well!
The return value for E0 {80..9F} should be (size_t) -1, because
U+0800 is E0 A0 80.
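
To make the distinction concrete, here is a minimal sketch of a prefix classifier that follows the second-byte ranges of Table 3-7. It is not the gnulib or glibc code; the names u8_classify_prefix, U8_COMPLETE, U8_INCOMPLETE and U8_INVALID are hypothetical and exist only for this illustration. The point it shows: a truncated sequence counts as "incomplete" only if every byte seen so far is within the range the table allows at that position.

```c
#include <stddef.h>

/* Hypothetical result codes, named for this sketch only.  */
enum u8_prefix_status { U8_COMPLETE, U8_INCOMPLETE, U8_INVALID };

/* Classify the first character of s[0..n-1], n >= 1, according to the
   well-formed byte ranges of Unicode Table 3-7.  */
enum u8_prefix_status
u8_classify_prefix (const unsigned char *s, size_t n)
{
  unsigned char c = s[0];
  size_t len;                           /* expected total length */
  unsigned char lo = 0x80, hi = 0xBF;   /* allowed range of the next byte */

  if (c < 0x80)
    return U8_COMPLETE;                 /* ASCII */
  else if (c >= 0xC2 && c <= 0xDF)
    len = 2;
  else if (c >= 0xE0 && c <= 0xEF)
    {
      len = 3;
      if (c == 0xE0)
        lo = 0xA0;                      /* E0 80..9F would be overlong */
      if (c == 0xED)
        hi = 0x9F;                      /* ED A0..BF would be a surrogate */
    }
  else if (c >= 0xF0 && c <= 0xF4)
    {
      len = 4;
      if (c == 0xF0)
        lo = 0x90;                      /* F0 80..8F would be overlong */
      if (c == 0xF4)
        hi = 0x8F;                      /* F4 90..BF would exceed U+10FFFF */
    }
  else
    return U8_INVALID;                  /* 80..C1 and F5..FF never start a character */

  for (size_t i = 1; i < len; i++)
    {
      if (i >= n)
        return U8_INCOMPLETE;           /* every byte so far was acceptable */
      if (s[i] < lo || s[i] > hi)
        return U8_INVALID;
      lo = 0x80;                        /* only the second byte has a narrowed range */
      hi = 0xBF;
    }
  return U8_COMPLETE;
}
```

With this sketch, the lone byte E0 and the prefix E0 A0 are classified as incomplete, while E0 80 is classified as invalid, which corresponds to returning (size_t) -1 rather than (size_t) -2.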

I'll fix the gnulib part soon. Very good point. It looks like few people
understood the implications of
https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf page 125, table 3-7.

Bruno





