Re: From wchar_t to char32_t
From: Bruno Haible
Subject: Re: From wchar_t to char32_t
Date: Thu, 13 Jul 2023 17:14:24 +0200
Paul Eggert wrote:
> > Based on the comments in gnulib/lib/mbrtoc16.c, I think it should better
> > clear the first 24, not 12, bytes of the struct. Otherwise it can be in
> > a state where mbsinit() returns true but the mbrto* functions have
> > undefined behaviour.
>
> For mbcel all that matters is mbrtoc32. Could you give an example of
> the undefined behavior there? I looked at the citrus implementations in
> current FreeBSD, OpenBSD and macOS and thought that 12 bytes is enough
> for mbrtoc32 on all their porting targets. NetBSD is a bit different and
> needs just a pointer width.
There's a difference between the part that mbsinit() looks at and the part
that needs to be zeroed, to avoid undefined behaviour. For example, if we
have a
    typedef struct
    {
      unsigned int count;
      unsigned int wchar;
    }
    mbstate_t;
mbsinit() can return true if state->count == 0. But that does not mean
that every state with state->count == 0 is valid. It is perfectly OK
for mbrtowc() or mbrtoc32() (or other functions) to call abort() or to
crash if
    state->count == 0 && state->wchar != 0.
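To make the distinction concrete, here is a minimal sketch in C. Everything in it is hypothetical: toy_mbstate_t mirrors the example struct above, and toy_mbsinit, toy_state_is_valid and toy_require_valid are illustrative names, not functions of any real libc.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state type mirroring the example struct above.  */
typedef struct
{
  unsigned int count;   /* bytes of the current character still pending */
  unsigned int wchar;   /* partially assembled character */
}
toy_mbstate_t;

/* Like mbsinit(): tests only the 'count' field.  */
bool
toy_mbsinit (const toy_mbstate_t *state)
{
  return state->count == 0;
}

/* The invariant a conversion function may silently rely on:
   when nothing is pending, 'wchar' must be 0 as well.  */
bool
toy_state_is_valid (const toy_mbstate_t *state)
{
  return state->count != 0 || state->wchar == 0;
}

/* Stand-in for a conversion function.  Like a real mbrtoc32(), it is
   entitled to assume the invariant and to abort when it does not hold.  */
void
toy_require_valid (const toy_mbstate_t *state)
{
  assert (toy_state_is_valid (state));
}
```

A state with count == 0 and wchar == 0x41 passes toy_mbsinit() but makes toy_require_valid() abort, which is the situation described above.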
By reading the source code of FreeBSD, NetBSD, OpenBSD, macOS, Solaris,
and so on, I can easily determine
- which parts of the mbstate_t mbsinit() tests,
- which parts of the mbstate_t the various functions use.
But in order to understand the interdependencies between the various
mbstate_t fields, and the invariants that the code assumes, I would
need to carefully read each of the mentioned files (one per OS and per
locale type). And this would not be future-proof: after changes in the
bulk of the code, the interdependencies and assumed invariants might
no longer be the same. If we have then cleared too few fields of the
mbstate_t, things might crash.
Bruno