bug-gnulib

Re: From wchar_t to char32_t


From: Bruno Haible
Subject: Re: From wchar_t to char32_t
Date: Thu, 06 Jul 2023 20:34:56 +0200

Paul Eggert wrote:
> I still see a couple of problems with it. First, it mishandles the case 
> where mbrtoc32 returns 0, which ISO C allows.

I thought that we could assume that no locale encoding maps a multibyte
sequence other than "\0" to (char32_t) 0. But OK, if you don't want to
assume that, it's easy to not assume it.

> Second and more interestingly, its "fwrite (tp0, 1, bytes, out);" could 
> output a byte string that represents multiple characters where the first 
> character fits in the output column width but the remaining characters 
> do not, and this would exceed the output column width.

This is a hypothetical scenario, because in most cases, the several
Unicode characters that come out of a multibyte sequence consist of a
base character (of width > 0) and one or more non-spacing marks
(of width == 0). But OK, let's make the minimum possible number of
assumptions...

> I suppose one could fix the second problem by not outputting such a byte 
> string, just as the code already suppresses the output of a byte string 
> representing a single character that occupies multiple columns 
> straddling the column border. That is, count all the columns of all the 
> characters that the byte string represents, before deciding whether to 
> output the byte string.

Indeed, this is the solution that makes no assumptions. Please find attached
a patch that does it.
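
For readers without the attachment at hand, the idea can be sketched roughly
as follows. This is not the code from the attached patch; the names
measure_sequence and fits are hypothetical. It assumes that char32_t values
can be passed to wcwidth() after a cast to wchar_t (true where wchar_t holds
Unicode code points, as on glibc, but not everywhere), that further
characters produced by the same byte sequence can be drained with a
zero-length mbrtoc32 call that returns (size_t) -3, and, as simplifications,
that an invalid or incomplete sequence counts as one byte of one column and
that control characters count as zero columns.

```c
#define _XOPEN_SOURCE 700       /* for wcwidth */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/* Hypothetical helper, not taken from the attached patch.
   Decode one multibyte sequence starting at P, with AVAIL > 0 bytes
   available.  Return the number of bytes it occupies, and store in
   *COLUMNS the summed width of every character it represents.  */
static size_t
measure_sequence (const char *p, size_t avail, mbstate_t *state, int *columns)
{
  char32_t ch;
  int total = 0;

  assert (avail > 0);
  size_t bytes = mbrtoc32 (&ch, p, avail, state);

  if (bytes == (size_t) -1 || bytes == (size_t) -2)
    {
      /* Invalid or incomplete sequence.  Simplification for this sketch:
         reset the state and count one byte as one column.  */
      memset (state, 0, sizeof *state);
      *columns = 1;
      return 1;
    }

  /* Pending characters are drained below, so a fresh call should never
     start with one.  */
  assert (bytes != (size_t) -3);

  if (bytes == 0)
    bytes = 1;                  /* the null character occupies one byte */

  for (;;)
    {
      /* Assumption: wchar_t can represent this code point.  */
      int w = wcwidth ((wchar_t) ch);
      if (w > 0)
        total += w;             /* control characters count as 0 here */

      /* In the initial state there are no pending characters.  Otherwise
         probe, without consuming input, for another character produced
         by the same byte sequence.  */
      if (mbsinit (state))
        break;
      if (mbrtoc32 (&ch, p + bytes, 0, state) != (size_t) -3)
        break;
    }

  *columns = total;
  return bytes;
}

/* Decide whether a byte string of width COLUMNS may be written when
   OUT_POSITION columns are already used and OUT_BOUND is the limit.  */
static bool
fits (size_t out_position, int columns, size_t out_bound)
{
  return out_position + (size_t) columns <= out_bound;
}
```

A caller would invoke measure_sequence first and only then decide: if fits
returns true, write the bytes and advance the output column by the measured
width; otherwise suppress the whole byte string, as is already done for a
single wide character that straddles the column border.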

I had expected that the replacement mbrtowc -> mbrtoc32 would be purely
mechanical; I'm surprised that it requires application-specific considerations
here.

Also find attached a unit test, just to verify that my change and future
changes make no gross mistakes.

Bruno

Attachment: 0001-tests-Add-a-side-by-side-output-test.patch
Description: Text Data

Attachment: 0002-diff-Improve-handling-of-mbrtoc32-result.patch
Description: Text Data

