Re: [PATCH] Use UTF-8 active code page for Windows host.

> That's most probably because $(wildcard) calls a Win32 API that is
> case-insensitive. So the jury is still out on this matter, and I
> still believe that the below is true:

In that case, are you aware of any Make construct other than $(wildcard)
that will lead to calling an API of interest? I'd be happy to test it against
the patched UTF-8 version of Make that I have built.

In any case, do you see this as a blocking issue for the UTF-8 feature?

Is the concern that the UTF-8 feature might break existing things that
work, or that some things that we might naively expect to work with the
switch to UTF-8 won't actually work?

> This is about UCRT specifically, so I wonder whether MSVCRT will
> behave the same.

That's true. I wonder how the examples I did so far worked, given
that (as you found out) my UTF-8-patched Make is linked against
MSVCRT. Is it just that everything I tried so far is so simple that it
doesn't even trigger calls to sensitive functions in MSVCRT? Because
from what I found online, MSVCRT does not support UTF-8, and yet
somehow things appear to be working, at least on the surface.

> Only on Windows versions that support this.

Yes, this whole feature makes sense only on Windows Version 1903
(May 2019 Update) or later anyway (this is Windows 10).

Previous versions will simply be unaffected. Make will still run, but
will still break when faced with any UTF-8 input.
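
To make the "unaffected on older versions" case concrete, here is a rough
sketch of how a program could check at run time whether the manifest actually
took effect. It is an illustration only, not code from the patch: the helper
name acp_is_utf8 is made up, and the only Win32 pieces assumed are GetACP()
and the CP_UTF8 constant from <windows.h>.

```c
#ifdef _WIN32
# include <windows.h>
#endif

/* Hypothetical helper, not part of the patch: returns 1 if the
   process ANSI code page (ACP) is UTF-8, 0 otherwise.  */
int
acp_is_utf8 (void)
{
#ifdef _WIN32
  /* With the UTF-8 manifest embedded and a Windows version that
     honors it, GetACP is expected to report CP_UTF8 (65001).
     On older versions it reports the legacy ANSI code page.  */
  return GetACP () == CP_UTF8;
#else
  /* There is no ANSI code page outside Windows.  */
  return 0;
#endif
}
```

On a version older than 1903 this should simply return 0, which matches the
"Make will still run" behavior described above.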

Given that the feature will only work on Windows 10, UCRT will also
be available, so if linking against UCRT it will be possible to call
setlocale(LC_ALL, ".UTF8") and get full UTF-8 support in the C lib as well.
If linking against MSVCRT, we are forced to face the restrictions it has anyway.
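
As a minimal sketch of what that call could look like (an illustration under
assumptions, not something in the patch): the helper name try_utf8_locale is
hypothetical, and the expectation that MSVCRT rejects ".UTF8" by returning
NULL is my assumption from what I found online, not something I have verified.

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper, not part of the patch: ask the C runtime for
   a UTF-8 locale.  Returns 1 on success, 0 if the runtime refuses.  */
int
try_utf8_locale (void)
{
  /* ".UTF8" is the spelling documented for UCRT.  Assumption: a
     runtime without UTF-8 support returns NULL and leaves the
     current locale unchanged.  */
  const char *name = setlocale (LC_ALL, ".UTF8");

  if (name == NULL)
    {
      fprintf (stderr, "UTF-8 locale not available, keeping default\n");
      return 0;
    }

  fprintf (stderr, "locale is now: %s\n", name);
  return 1;
}
```

Checking the return value keeps the fallback explicit: when the request is
refused, the program carries on with whatever locale it already had.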

Which brings me back to my question of whether you see this as a potential
blocking issue for Make switching to UTF-8 on Windows by embedding the
UTF-8 manifest at build time.

On Mon, 20 Mar 2023 at 11:54, Eli Zaretskii <eliz@gnu.org> wrote:

> From: Costas Argyris <costas.argyris@gmail.com>
> Date: Sun, 19 Mar 2023 21:25:30 +0000
> Cc: bug-make@gnu.org, Paul Smith <psmith@gnu.org>
>
> I create a file src.β first:
>
> touch src.β
>
> and then run the following UTF-8 encoded Makefile:
>
> hello :
> @gcc ©\src.c -o ©\src.exe
>
> ifneq ("$(wildcard src.β)","")
> @echo src.β exists
> else
> @echo src.β does NOT exist
> endif
>
> ifneq ("$(wildcard src.Β)","")
> @echo src.Β exists
> else
> @echo src.Β does NOT exist
> endif
>
> ifneq ("$(wildcard src.βΒ)","")
> @echo src.βΒ exists
> else
> @echo src.βΒ does NOT exist
> endif
>
> and the output of Make is:
>
> C:\Users\cargyris\temp>make -f utf8.mk
> src.β exists
> src.Β exists
> src.βΒ does NOT exist
>
> which shows that it finds the one with the upper case extension as well,
> despite the fact that it exists in the file system as a lower case extension.

That's most probably because $(wildcard) calls a Win32 API that is
case-insensitive. So the jury is still out on this matter, and I
still believe that the below is true:

> My guess would be that only characters within the locale, defined by
> the ANSI codepage, are supported by locale-aware functions in the C
> runtime. That's because this is what happens even if you use "wide"
> Unicode APIs and/or functions like _wcsicmp that accept wchar_t
> characters: they all support only the characters of the current locale
> set by 'setlocale'. I don't expect that to change just because UTF-8
> is used on the outside: internally, everything is converted to UTF-16,
> i.e. to the Windows flavor of wchar_t.
>
> But this one looks most relevant to your point:
>
> https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=msvc-170#utf-8-support
>
>
> "Starting in Windows 10 version 1803 (10.0.17134.0), the Universal C Runtime supports using a UTF-8 code
> page. The change means that char strings passed to C runtime functions can expect strings in the UTF-8
> encoding. To enable UTF-8 mode, use ".UTF8" as the code page when using setlocale. For example,
> setlocale(LC_ALL, ".UTF8") will use the current default Windows ANSI code page (ACP) for the locale and
> UTF-8 for the code page."

This is about UCRT specifically, so I wonder whether MSVCRT will
behave the same.

> My point is, with the manifest embedded at build time, ACP will be UTF-8
> already when the program (Make) runs, so no need to do anything more.

Only on Windows versions that support this.

From:	Costas Argyris
Subject:	Re: [PATCH] Use UTF-8 active code page for Windows host.
Date:	Mon, 20 Mar 2023 13:45:14 +0000