bug-make

Re: [PATCH] Use UTF-8 active code page for Windows host.


From: Costas Argyris
Subject: Re: [PATCH] Use UTF-8 active code page for Windows host.
Date: Mon, 20 Mar 2023 13:45:14 +0000

> That's most probably because $(wildcard) calls a Win32 API that is
> case-insensitive.  So the jury is still out on this matter, and I
> still believe that the below is true:


In that case, are you aware of any Make construct other than $(wildcard)
that will lead to calling an API of interest?  I'd be happy to test it
against the patched UTF-8 version of Make that I have built.

In any case, do you see this as a blocking issue for the UTF-8 feature?

Is the concern that the UTF-8 feature might break existing things that
work, or that some things that we might naively expect to work with the
switch to UTF-8, won't actually work?

> This is about UCRT specifically, so I wonder whether MSVCRT will
> behave the same.


That's true.  I wonder how the examples I did so far worked, given
that (as you found out) my UTF-8-patched Make is linked against
MSVCRT.  Is it just that everything I tried so far is so simple that it
doesn't even trigger calls to sensitive functions in MSVCRT?  Because
from what I found online, MSVCRT does not support UTF-8, and yet
somehow things appear to be working, at least on the surface.

> Only on Windows versions that support this.

Yes, this whole feature makes sense only on Windows 10 Version 1903
(May 2019 Update) or later anyway.

Previous versions will simply be unaffected.  Make will still run, but
will still break when faced with UTF-8 input in any way.
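
For illustration only, here is a minimal sketch of how a program could check at run time whether the manifest actually took effect, by comparing the process ANSI code page against UTF-8 (65001). The helper name acp_is_utf8 is hypothetical and is not part of the patch under discussion; GetACP and CP_UTF8 are the standard Win32 names.

```c
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper, not taken from the Make patch.
   Returns 1 if the process ANSI code page (ACP) is UTF-8,
   0 if it is some other code page,
   and -1 when the program is not built for Windows. */
int acp_is_utf8(void)
{
#ifdef _WIN32
    /* With the UTF-8 manifest embedded and Windows 10 1903 or later,
       GetACP() is expected to report CP_UTF8 (65001). On older
       versions the manifest setting is ignored and the legacy ANSI
       code page is reported instead. */
    return GetACP() == CP_UTF8 ? 1 : 0;
#else
    return -1;
#endif
}
```

A result of 0 on Windows would correspond to the "unaffected" case described above: the program runs, but with the legacy code page.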

Given that the feature will only work on Windows 10, UCRT will also
be available, so if linking against UCRT it will be possible to call
setlocale(LC_ALL, ".UTF8") and get full UTF-8 support in the C lib as well.
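
As a rough sketch of what that call could look like, assuming a build linked against UCRT: the function name enable_crt_utf8 is hypothetical, and the only API used is the standard setlocale. On a C runtime that does not accept the ".UTF8" locale name, setlocale is expected to return NULL and leave the current locale unchanged.

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper, not taken from the Make patch.
   Tries to switch the C runtime to a UTF-8 locale.
   Returns 1 on success and 0 if the runtime rejects ".UTF8". */
int enable_crt_utf8(void)
{
    const char *loc = setlocale(LC_ALL, ".UTF8");

    if (loc == NULL) {
        /* The runtime does not know this locale name; the previous
           locale stays in effect. */
        fprintf(stderr, "setlocale(LC_ALL, \".UTF8\") was rejected\n");
        return 0;
    }

    /* Print whatever locale string the runtime reports. */
    printf("C runtime locale is now: %s\n", loc);
    return 1;
}
```

Calling this once early in main, before any locale-sensitive C library function runs, would be the natural place for it.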

If linking against MSVCRT, we are forced to face the restrictions it has anyway.
Which brings me back to my question of whether you see this as a potential
blocking issue for Make switching to UTF-8 on Windows by embedding the
UTF-8 manifest at build time.

On Mon, 20 Mar 2023 at 11:54, Eli Zaretskii <eliz@gnu.org> wrote:
> From: Costas Argyris <costas.argyris@gmail.com>
> Date: Sun, 19 Mar 2023 21:25:30 +0000
> Cc: bug-make@gnu.org, Paul Smith <psmith@gnu.org>
>
> I create a file src.β first:
>
> touch src.β
>
> and then run the following UTF-8 encoded Makefile:
>
> hello :
> @gcc ©\src.c -o ©\src.exe
>
> ifneq ("$(wildcard src.β)","")
> @echo src.β exists
> else
> @echo src.β does NOT exist
> endif
>
> ifneq ("$(wildcard src.Β)","")
> @echo src.Β exists
> else
> @echo src.Β does NOT exist
> endif
>
> ifneq ("$(wildcard src.βΒ)","")
> @echo src.βΒ exists
> else
> @echo src.βΒ does NOT exist
> endif
>
> and the output of Make is:
>
> C:\Users\cargyris\temp>make -f utf8.mk
> src.β exists
> src.Β exists
> src.βΒ does NOT exist
>
> which shows that it finds the one with the upper case extension as well,
> despite the fact that it exists in the file system as a lower case extension.

That's most probably because $(wildcard) calls a Win32 API that is
case-insensitive.  So the jury is still out on this matter, and I
still believe that the below is true:

> My guess would be that only characters within the locale, defined by
> the ANSI codepage, are supported by locale-aware functions in the C
> runtime.  That's because this is what happens even if you use "wide"
> Unicode APIs and/or functions like _wcsicmp that accept wchar_t
> characters: they all support only the characters of the current locale
> set by 'setlocale'.  I don't expect that to change just because UTF-8
> is used on the outside: internally, everything is converted to UTF-16,
> i.e. to the Windows flavor of wchar_t.
>
> But this one looks most relevant to your point:
>
> https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=msvc-170#utf-8-support
>
>
> "Starting in Windows 10 version 1803 (10.0.17134.0), the Universal C Runtime supports using a UTF-8 code
> page. The change means that char strings passed to C runtime functions can expect strings in the UTF-8
> encoding. To enable UTF-8 mode, use ".UTF8" as the code page when using setlocale. For example,
> setlocale(LC_ALL, ".UTF8") will use the current default Windows ANSI code page (ACP) for the locale and
> UTF-8 for the code page."

This is about UCRT specifically, so I wonder whether MSVCRT will
behave the same.

> My point is, with the manifest embedded at build time, ACP will be UTF-8
> already when the program (Make) runs, so no need to do anything more.

Only on Windows versions that support this.
