bug-texinfo

Re: Texinfo 7.0.93 pretest available


From: Gavin Smith
Subject: Re: Texinfo 7.0.93 pretest available
Date: Mon, 9 Oct 2023 20:39:59 +0100

On Mon, Oct 09, 2023 at 05:06:39PM +0300, Eli Zaretskii wrote:
> > I wonder if for some of the non-ASCII characters wcwidth is returning 0 or
> > -1 leading the line to be longer.
> 
> Yes, quite a few characters return -1 from wcwidth, in particular the
> ȷ character above (which explains the above difference).
> 
> > It's also possible that other codepoints have inconsistent wcwidth results,
> > especially for combining accents.
> > 
> > Do you know if it is the gnulib implementation of wcwidth that is being
> > used or a MinGW one?
> 
> AFAIK, MinGW doesn't have wcwidth, so we are using the one from
> Gnulib.  But what Gnulib does in this case is not what Texinfo
> expects, I think:
> 
> int
> wcwidth (wchar_t wc)
> #undef wcwidth
> {
>   /* In UTF-8 locales, use a Unicode aware width function.  */
>   if (is_locale_utf8_cached ())
>     {
>       /* We assume that in a UTF-8 locale, a wide character is the same as a
>          Unicode character.  */
>       return uc_width (wc, "UTF-8");
>     }
>   else
>     {
>       /* Otherwise, fall back to the system's wcwidth function.  */
> #if HAVE_WCWIDTH
>       return wcwidth (wc);
> #else
>       return wc == 0 ? 0 : iswprint (wc) ? 1 : -1;
> #endif
>     }
> }
> 
> IOW, unless the locale's codeset is UTF-8, any character that is not
> printable _in_the_current_locale_ will return -1 from wcwidth.  I'm
> guessing that no one has ever tried to run the test suite in a
> non-UTF-8 locale before?

The code is supposed to try to force the locale to a UTF-8 one.  You
can see the attempt to change the locale in xspara_init.  There
is also a comment before xspara_add_text:

  "This function relies on there being a UTF-8 locale in LC_CTYPE for
  mbrtowc to work correctly."
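
As a rough illustration of what "forcing a UTF-8 locale" can look like on a
POSIX system, here is a minimal sketch.  It is not the xspara_init code: the
function name try_utf8_ctype and the list of candidate locale names are
assumptions made for this example, and it relies on nl_langinfo, which
MS-Windows does not provide.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from xspara.c): try a few common UTF-8
   locale names for LC_CTYPE.  Returns 1 if LC_CTYPE ends up with a
   UTF-8 codeset, 0 otherwise.  */
static int
try_utf8_ctype (void)
{
  /* Assumed candidate names; which ones exist depends on the system.  */
  static const char *const candidates[] =
    { "en_US.UTF-8", "en_US.utf8", "C.UTF-8" };
  size_t i;

  for (i = 0; i < sizeof candidates / sizeof candidates[0]; i++)
    {
      if (setlocale (LC_CTYPE, candidates[i]) != NULL)
        {
          /* Confirm the codeset rather than trusting the locale name.
             Some systems may spell the codeset differently.  */
          if (strcmp (nl_langinfo (CODESET), "UTF-8") == 0)
            return 1;
        }
    }
  return 0;
}
```

A caller would check the return value before relying on mbrtowc or wcwidth
for non-ASCII text; a return of 0 means none of the assumed locale names
gave a UTF-8 codeset.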

For MS-Windows there is the w32_setlocale function that may use something
different:

  /* Switch to the Windows U.S. English locale with its default
     codeset.  We will handle the non-ASCII text ourselves, so the
     codeset is unimportant, and Windows doesn't support UTF-8 as the
     codeset anyway.  */
  return setlocale (category, "ENU");

mbrtowc has its own override which handles UTF-8.

As for wcwidth, there used to be an MS-Windows-specific stub
implementation.  It was removed in commit 5a66bc49ac032 (Patrice Dumas,
2022-08-19), which added a Gnulib implementation of wcwidth:

diff --git a/tp/Texinfo/XS/xspara.c b/tp/Texinfo/XS/xspara.c
index 93924a623c..bf4ef91650 100644
--- a/tp/Texinfo/XS/xspara.c
+++ b/tp/Texinfo/XS/xspara.c
@@ -206,13 +206,6 @@ iswspace (wint_t wc)
   return 0;
 }
 
-/* FIXME: Provide a real implementation.  */
-int
-wcwidth (const wchar_t wc)
-{
-  return wc == 0 ? 0 : 1;
-}
-
 int
 iswupper (wint_t wi)
 {


If this simple stub is preferable to the Gnulib implementation on
MS-Windows (e.g. because it makes the tests pass), we could re-add it.
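
To illustrate why the stub could make a difference, here is a minimal sketch
comparing two column counts for the same string.  It is not the xspara.c
code.  The names stub_wcwidth, nonutf8_width and count_columns are made up
for this example; nonutf8_width only imitates the iswprint-based fallback
quoted above by treating every non-ASCII character as unprintable, and
count_columns assumes the caller ignores non-positive widths.

```c
#include <stddef.h>
#include <wchar.h>

/* Same logic as the stub removed in commit 5a66bc49ac032, renamed so
   that it does not clash with the real wcwidth.  */
static int
stub_wcwidth (wchar_t wc)
{
  return wc == 0 ? 0 : 1;
}

/* Hypothetical stand-in for the non-UTF-8 fallback: pretend that only
   printable ASCII characters are printable in the current locale.  */
static int
nonutf8_width (wchar_t wc)
{
  if (wc == 0)
    return 0;
  return (wc >= 0x20 && wc < 0x7f) ? 1 : -1;
}

/* Count the columns taken by S using the given width function.
   Assumption for this sketch: widths of 0 or -1 do not advance the
   column.  */
static int
count_columns (const wchar_t *s, int (*width) (wchar_t))
{
  int columns = 0;

  for (; *s; s++)
    {
      int w = width (*s);
      if (w > 0)
        columns += w;
    }
  return columns;
}
```

For the string L"a\u0237b" (with the dotless j in the middle), the stub
counts three columns, while the stand-in fallback counts two because the
-1 is dropped.  An undercount like this lets more text onto a line before
it is broken, which would give the longer lines seen in the test output.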


