Re: From wchar_t to char32_t
From: Bruno Haible
Subject: Re: From wchar_t to char32_t
Date: Thu, 06 Jul 2023 20:34:56 +0200
Paul Eggert wrote:
> I still see a couple of problems with it. First, it mishandles the case
> where mbrtoc32 returns 0, which ISO C allows.
I thought that we could assume that no locale encoding maps a multibyte
sequence other than "\0" to (char32_t) 0. But OK, if you don't want to
assume that, it's easy to not assume it.
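For readers following along, here is a minimal sketch of the case analysis that ISO C prescribes for the return value of mbrtoc32. It is not the code from the patch; the enum and function names are hypothetical. The point is that a return value of 0 is its own case: it says that a null character was decoded, but not how many bytes were consumed, so it should not be folded into the "1 to n bytes" case.

```c
#include <stddef.h>

/* Hypothetical names, for illustration only.  The cases mirror the
   return values that ISO C documents for mbrtoc32.  */
enum mb_result
{
  MB_NUL,          /* 0: the null character was decoded */
  MB_CHAR,         /* 1..n: that many bytes were consumed */
  MB_MORE_OUTPUT,  /* (size_t) -3: another character from the previous
                      sequence was stored, no bytes consumed */
  MB_INCOMPLETE,   /* (size_t) -2: the bytes form an incomplete sequence */
  MB_INVALID       /* (size_t) -1: encoding error */
};

enum mb_result
classify_mbrtoc32_result (size_t ret)
{
  if (ret == (size_t) -1)
    return MB_INVALID;
  if (ret == (size_t) -2)
    return MB_INCOMPLETE;
  if (ret == (size_t) -3)
    return MB_MORE_OUTPUT;
  if (ret == 0)
    return MB_NUL;
  return MB_CHAR;
}
```

A caller would switch on the result and, in the MB_NUL case, determine the byte count by some other means rather than assume it is 1.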
> Second and more interestingly, its "fwrite (tp0, 1, bytes, out);" could
> output a byte string that represents multiple characters where the first
> character fits in the output column width but the remaining characters
> do not, and this would exceed the output column width.
This is a mostly hypothetical scenario: in most cases, when a multibyte
sequence produces several Unicode characters, they consist of a
base character (of width > 0) followed by one or more non-spacing marks
(of width == 0). But OK, let's make the minimum possible number of
assumptions...
> I suppose one could fix the second problem by not outputting such a byte
> string, just as the code already suppresses the output of a byte string
> representing a single character that occupies multiple columns
> straddling the column border. That is, count all the columns of all the
> characters that the byte string represents, before deciding whether to
> output the byte string.
Indeed, this is the solution that makes no assumptions. Please find attached
a patch that does it.
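The idea can be sketched as follows. This is an illustration under stated assumptions, not the attached patch: the function name and parameters are hypothetical, the caller is assumed to have already decoded one multibyte sequence into its characters and looked up each character's column width (for example with a wcwidth-like function), and negative widths are treated as 0.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper.  WIDTHS[0..N-1] are the column widths of all the
   characters that ONE byte string decodes to.  Return true, and store the
   advanced column in *NEW_POSITION, only if the whole byte string fits
   between OUT_POSITION and OUT_BOUND.  Otherwise return false, and the
   caller suppresses the output of the byte string as a whole.  */
bool
byte_string_fits (const int *widths, size_t n,
                  size_t out_position, size_t out_bound,
                  size_t *new_position)
{
  size_t total = 0;
  for (size_t i = 0; i < n; i++)
    if (widths[i] > 0)          /* assumption: negative widths count as 0 */
      total += (size_t) widths[i];

  size_t room = out_position < out_bound ? out_bound - out_position : 0;
  if (total > room)
    return false;

  *new_position = out_position + total;
  return true;
}
```

With this shape, a base character followed by a non-spacing mark (widths 1 and 0) still fits in the last free column, while two width-1 characters coming from the same byte string do not.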
I had expected that the replacement mbrtowc -> mbrtoc32 would be purely
mechanical; I'm surprised that it requires application-specific considerations
here.
Also find attached a unit test, to verify that my change and future changes
make no gross mistakes.
Bruno
0001-tests-Add-a-side-by-side-output-test.patch
Description: Text Data
0002-diff-Improve-handling-of-mbrtoc32-result.patch
Description: Text Data