From: Eli Zaretskii
Subject: Re: [Bug-wget] Patch: Make url_file_name also convert remote path to local encoded
Date: Mon, 13 Nov 2017 18:32:46 +0200
> Cc: address@hidden, address@hidden
> From: Tim Rühsen <address@hidden>
> Date: Mon, 13 Nov 2017 16:36:39 +0100
>
> > I don't think it's a Gnulib issue. The problem is that on Windows,
> > the implicit call at the beginning of Wget
> >
> > setlocale (LC_ALL, "C");
>
> Why is there an explicit call with "C"? There is an explicit call with "".
I said "implicit", not "explicit". Such an implicit call is made at
the beginning of every C program, per ANSI C Standard. Right?
The MSDN documentation says it clearly:

    At program startup, the equivalent of the following statement is executed:
    setlocale( LC_ALL, "C" );
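This startup state can be observed on any conforming C implementation by querying the locale without changing it: passing NULL as the second argument of setlocale returns the name of the current locale. The helper name below is hypothetical, chosen only for illustration.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper: returns nonzero if the program is still in the
   "C" locale that ISO C installs at startup.  Passing NULL to setlocale
   only queries the current locale; it does not change it. */
static int
still_in_c_locale (void)
{
  const char *name = setlocale (LC_ALL, NULL);
  return name != NULL && strcmp (name, "C") == 0;
}
```

Called first thing in main, before any other setlocale call, this returns nonzero.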
> From the man page:
> "If locale is an empty string, "", each part of the locale that should
> be modified is set according to the environment variables."
The call with a locale of "" is only done in a build that has
ENABLE_NLS defined. I was talking about a build which didn't define
ENABLE_NLS.
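A minimal sketch of the kind of "special call" discussed below, assuming the fix is simply to call setlocale with "" on Windows even when ENABLE_NLS is not defined. This is hypothetical and not Wget's actual code; _WIN32 is the macro predefined by Windows compilers, and on Windows setlocale(LC_ALL, "") selects the user's default ANSI codepage, which may be a double-byte one.

```c
#include <locale.h>

/* Hypothetical sketch, not Wget's code: take the locale from the
   environment when NLS is enabled, and on Windows also when it is not,
   so the C runtime stops assuming the single-byte "C" locale. */
static const char *
init_locale (void)
{
#if defined ENABLE_NLS || defined _WIN32
  return setlocale (LC_ALL, "");
#else
  /* Keep the implicit startup "C" locale; just report its name. */
  return setlocale (LC_ALL, NULL);
#endif
}
```

On a non-Windows build without ENABLE_NLS this leaves the locale untouched and returns "C".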
> > is not good enough to work in multibyte locales of the Far East,
> > because the Windows runtime assumes a single-byte locale after that
> > call. And since Wget happens to need to display text and create files
> > with non-ASCII characters, it gets hit more than other programs.
>
> I (hopefully) can understand why this doesn't work. NTFS uses UTF-16 for
> the filenames. If your environment specifies a single-byte encoding
> (e.g. C) and we use at some point a multibyte encoding (e.g.
> UTF-8), then any automatic conversion to UTF-16 filenames is likely to
> fail. For me the question is: a) does wget have a bug (e.g. creating a
> filename from a wrongly encoded name string), or b) does the Windows API
> have a problem?
>
> > The proposed solution is to add a special call to setlocale which gets
> > this right on Windows.
>
> Why can't we just convert the filename string into the correct encoding
> and then create the file? What am I missing?
I guess you are missing a short introduction to the Windows l10n/i18n
mess. Let me try.
First, the fact that NTFS uses UTF-16 is not really relevant. Wget
uses 'char *' strings, not 'wchar_t *' strings, to store file names and
to call C library functions that accept file names. So we cannot use the
UTF-16 encoding of non-ASCII file names directly. Instead, we use the
locale's codepage (the C library and the OS APIs then convert to
UTF-16 before hitting the disk, but that's not important now).
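The conversion step mentioned in parentheses can be illustrated portably with mbstowcs, which interprets a 'char *' string according to the current locale's multibyte encoding. This is only an analogy for what the Windows runtime does internally (the real CRT uses its own conversion to UTF-16), and the helper name is hypothetical.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical illustration: convert a 'char *' file name, assumed to be
   encoded in the current locale's codeset, to a wide-character string.
   Returns a malloc'ed string, or NULL if the name is not valid in the
   current locale or memory runs out. */
static wchar_t *
filename_to_wide (const char *name)
{
  /* With a NULL destination, mbstowcs only counts the wide characters
     (excluding the terminator). */
  size_t n = mbstowcs (NULL, name, 0);
  if (n == (size_t) -1)
    return NULL;                /* invalid multibyte sequence */
  wchar_t *w = malloc ((n + 1) * sizeof *w);
  if (w == NULL)
    return NULL;
  mbstowcs (w, name, n + 1);    /* copies the terminator as well */
  return w;
}
```

If the locale does not match the encoding of the bytes in NAME, the conversion either fails or yields the wrong characters, which is exactly why the locale's codepage matters.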
Next, creating and opening files is not the only problem: we also need
to display these file names and URLs, and that output must use the
encoding expected by the Windows console.
Now, in any locale which uses a single-byte encoding of non-ASCII
characters, the C locale will support those characters, both for I/O
and for functions like strcmp, strlen, strcoll, etc. But not in the
double-byte locales of the Far East: there, you must explicitly call
setlocale with the correct codepage to have the local character set
supported. This support covers manipulating file names, calling C
library functions to access files, and displaying non-ASCII text, such
as file names and URLs, on the console.
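The difference between single-byte and double-byte locales shows up as soon as code walks a string character by character. The sketch below counts characters, not bytes, using mbrlen, so its result depends on the current locale. Under the "C" locale the Windows runtime treats every byte as a character, so a double-byte character counts twice; after a call such as setlocale(LC_ALL, ".932") on Windows, the two bytes should count as one. The function name is hypothetical.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count the characters in S according to the
   current locale's multibyte encoding.  Returns (size_t) -1 if S
   contains an invalid or incomplete sequence for that encoding. */
static size_t
count_chars (const char *s)
{
  mbstate_t st;
  memset (&st, 0, sizeof st);   /* start in the initial shift state */
  size_t left = strlen (s), count = 0;
  while (left > 0)
    {
      size_t len = mbrlen (s, left, &st);
      if (len == (size_t) -1 || len == (size_t) -2)
        return (size_t) -1;     /* invalid or truncated sequence */
      if (len == 0)             /* null character; guards against looping forever */
        break;
      s += len;
      left -= len;
      count++;
    }
  return count;
}
```

Byte-wise processing of double-byte file names is also risky for path handling: in Shift_JIS (codepage 932), the second byte of some characters is 0x5C, the same byte as the backslash path separator.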
IOW, this is a Windows runtime subtlety that unfortunately needs to be
fixed in the application code.
(UTF-8 is not relevant at all here, because Windows doesn't support
UTF-8 as the locale's codeset; if you try to call setlocale to set
UTF-8 as the codeset, setlocale will simply fail. So if we have a
UTF-8 encoded URL or file name inside wget, we must convert it to the
current codepage by calling libiconv functions.)
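A minimal sketch of that conversion using the iconv API (iconv_open, iconv, iconv_close), which libiconv provides on Windows and glibc provides on GNU/Linux. The helper name is hypothetical, and the target encoding is passed in by the caller: on Windows it would be the current codepage name (for example "CP936"), which something like gnulib's locale_charset() can supply; on POSIX systems nl_langinfo(CODESET) serves the same purpose.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated UTF-8 string UTF8 to
   the encoding TOCODE.  Returns a malloc'ed NUL-terminated string, or
   NULL if the encoding is unsupported or a character cannot be
   represented in it. */
static char *
utf8_to_codepage (const char *utf8, const char *tocode)
{
  iconv_t cd = iconv_open (tocode, "UTF-8");
  if (cd == (iconv_t) -1)
    return NULL;

  size_t inleft = strlen (utf8);
  size_t outsize = 4 * inleft + 1;   /* generous bound for DBCS/MBCS targets */
  char *out = malloc (outsize);
  if (out == NULL)
    {
      iconv_close (cd);
      return NULL;
    }

  char *inp = (char *) utf8;
  char *outp = out;
  size_t outleft = outsize - 1;      /* keep room for the terminator */

  /* Convert the text, then flush any pending shift state. */
  if (iconv (cd, &inp, &inleft, &outp, &outleft) == (size_t) -1
      || iconv (cd, NULL, NULL, &outp, &outleft) == (size_t) -1)
    {
      free (out);
      iconv_close (cd);
      return NULL;
    }

  *outp = '\0';
  iconv_close (cd);
  return out;
}
```

A character that has no representation in the target codepage makes the conversion fail, so a caller still needs a fallback (for example, keeping the name percent-encoded) for such file names.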
Does the above make sense? Let me know if I have to explain some
more.