bug-gnulib


Re: From wchar_t to char32_t


From: Bruno Haible
Subject: Re: From wchar_t to char32_t
Date: Thu, 13 Jul 2023 17:14:24 +0200

Paul Eggert wrote:
> > Based on the comments in gnulib/lib/mbrtoc16.c, I think it would be better
> > to clear the first 24, not 12, bytes of the struct. Otherwise it can be in
> > a state where mbsinit() returns true but the mbrto* functions have
> > undefined behaviour.
> 
> For mbcel, all that matters is mbrtoc32. Could you give an example of 
> the undefined behavior there? I looked at the citrus implementations in 
> current FreeBSD, OpenBSD and macOS and thought that 12 bytes is enough 
> for mbrtoc32 on all their porting targets. NetBSD is a bit different and 
> needs just a pointer width.

There's a difference between the part that mbsinit() looks at and the part
that needs to be zeroed to avoid undefined behaviour. For example, if we
have a

   typedef struct
     {
       unsigned int count;
       unsigned int wchar;
     }
   mbstate_t;

mbsinit() can return true if state->count == 0. But that does not mean
that every state with state->count == 0 is valid. It is perfectly OK
for mbrtowc() or mbrtoc32() (or other functions) to call abort() or to
crash if
    state->count == 0 && state->wchar != 0.
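To make the distinction concrete, here is a minimal C sketch built on the illustrative struct above. All names (demo_state_t, demo_sinit, demo_state_valid, clear_count_only, clear_all) are hypothetical and do not correspond to any real libc; demo_state_valid stands for an invariant that a decoder might silently assume.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical conversion state, mirroring the illustrative struct.
   This is NOT the real mbstate_t of any particular libc.  */
typedef struct
{
  unsigned int count;   /* number of pending bytes; 0 means "initial" */
  unsigned int wchar;   /* partially accumulated character */
} demo_state_t;

/* Analogue of mbsinit(): it inspects only COUNT.  */
bool
demo_sinit (const demo_state_t *st)
{
  return st->count == 0;
}

/* Hypothetical invariant a decoder may rely on: when no bytes are
   pending, no partial character may be stored either.  */
bool
demo_state_valid (const demo_state_t *st)
{
  return st->count != 0 || st->wchar == 0;
}

/* Clears only the field that demo_sinit() tests.  */
void
clear_count_only (demo_state_t *st)
{
  st->count = 0;
}

/* Clears the whole object, which always yields a valid initial state.  */
void
clear_all (demo_state_t *st)
{
  memset (st, 0, sizeof *st);
}
```

Starting from a mid-character state such as { 2, 0x41 }, clear_count_only() makes demo_sinit() return true while demo_state_valid() returns false, which is exactly the kind of state a decoder would be entitled to crash on. clear_all() makes both return true.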

By reading the source code of FreeBSD, NetBSD, OpenBSD, macOS, Solaris,
and so on, I can easily determine
  - which parts of the mbstate_t mbsinit() tests,
  - which parts of the mbstate_t the various functions use.
But in order to understand what interdependencies exist between the
various mbstate_t fields, and what invariants are assumed, I would need
to carefully read each of the mentioned files (one per OS and per locale
type). And this would not be future-proof: after changes to the bulk of
the code, the interdependencies and assumed invariants might no longer
be the same. If we had then cleared too few fields of the mbstate_t,
things might crash.
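One way to avoid depending on per-platform knowledge of which bytes matter is to zero the entire mbstate_t before use. The sketch below assumes a C11 toolchain whose libc provides <uchar.h> and mbrtoc32() (e.g. glibc); the helper name decode_first is hypothetical, and this is not a description of what gnulib's mbcel module actually does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/* Hypothetical helper: decode the first character of S (N bytes long)
   into *OUT, starting from a freshly initialized conversion state.
   Returns whatever mbrtoc32() returns.  */
size_t
decode_first (const char *s, size_t n, char32_t *out)
{
  mbstate_t st;
  /* Zero the whole object, not just the bytes that mbsinit() is known
     to inspect on some platform, so that no hidden field is left in a
     state the decoder does not expect.  */
  memset (&st, 0, sizeof st);
  assert (mbsinit (&st));
  return mbrtoc32 (out, s, n, &st);
}
```

The cost of clearing sizeof (mbstate_t) bytes instead of a hand-picked prefix is small, and it stays correct if a future libc release adds or reorganizes fields.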

Bruno
