Re: [Bug-wget] Support non-ASCII URLs
From: Tim Rühsen
Subject: Re: [Bug-wget] Support non-ASCII URLs
Date: Sun, 20 Dec 2015 16:26:20 +0100
User-agent: KMail/4.14.10 (Linux/4.3.0-1-amd64; KDE/4.14.14; x86_64; ; )
On Saturday, 19 December 2015, at 14:11:20, Eli Zaretskii wrote:
> > Date: Sat, 19 Dec 2015 10:15:03 +0200
> > From: Eli Zaretskii <address@hidden>
> > Cc: address@hidden
> >
> > > 2. contrib/check-hard fails with
> > > TESTS_ENVIRONMENT="LC_ALL=tr_TR.utf8 VALGRIND_TESTS=0" make check
> > >
> > > FAIL: Test-iri-forced-remote
> > >
> > > My son has his birthday tomorrow, so I am not sure how much time I can
> > > spend on this issue over the weekend. Maybe Eli or you could have a look?
> >
> > I cannot bootstrap the Git repo (too many prerequisites I don't have).
> > Can you or someone else produce a distribution tarball out of Git that
> > I could then build "as usual"?
> >
> > Also, can you show me the log of the failed test? Turkish locales
> > have "an issue" with certain upper/lower-case characters, maybe that's
> > the problem. Or maybe it's something else; looking at the log might
> > give good clues.
>
> Tim sent me the tarball and the log off-list (thanks!). I didn't yet
> try to build Wget, but just looking at the test, I guess I don't
> understand its idea. It has an index.html page that's encoded in
> ISO-8859-15, but Wget is invoked with --remote-encoding=iso-8859-1,
> and the URLs themselves in "my %urls" are all encoded in UTF-8. How's
> this supposed to work?
According to the wget man page, --remote-encoding just sets the *default*
server encoding. It only comes into play when the HTTP header does not contain
a Content-Type with a charset *and* the HTML page does not contain a <meta
http-equiv="Content-Type"> tag with 'content=... charset=...'.
'index.html' in this test correctly has a meta tag with charset=utf-8, and its
URLs are encoded in UTF-8.
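
To make the fallback rule concrete, here is a minimal sketch in Python. It is
not Wget's code: the function name effective_charset and its parameters are
hypothetical, and the relative order of the HTTP header and the meta tag is an
assumption on my part. The only point taken from the man page is that the
--remote-encoding value is used when neither of the two names a charset.

```python
import re

def effective_charset(http_content_type, html_bytes, remote_encoding_default):
    """Return the charset to assume for a page (hypothetical helper, not Wget's API)."""
    # 1. charset parameter of the HTTP Content-Type header, if present
    if http_content_type:
        m = re.search(r'charset\s*=\s*"?([\w.:-]+)', http_content_type, re.I)
        if m:
            return m.group(1).lower()
    # 2. <meta http-equiv="Content-Type" content="...; charset=..."> in the page
    m = re.search(
        rb'<meta[^>]+http-equiv\s*=\s*["\']?content-type["\']?[^>]*'
        rb'charset\s*=\s*([\w.:-]+)',
        html_bytes,
        re.I,
    )
    if m:
        return m.group(1).decode("ascii").lower()
    # 3. neither source names a charset: fall back to the --remote-encoding value
    return remote_encoding_default

page = b'<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />'
print(effective_charset("text/html", page, "iso-8859-1"))              # utf-8
print(effective_charset(None, b"<html></html>", "iso-8859-1"))         # iso-8859-1
```

With a page like the test's index.html, the meta tag wins and the
iso-8859-1 default is never consulted.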
> Also, I'm not following the logic of overriding Content-type by the
> remote encoding: p1_fran%C3%A7ais.html states "charset=UTF-8", but
> includes a link encoded in ISO-8859-1, and the test seems to expect
> Wget to use the remote encoding in preference to what "charset=" says.
Either the test or the man page is wrong here. I would say the man page is
correct, since the behavior it describes makes the most sense to me. In that
case the test is wrong, and so is its comment.
> Does the remote encoding override the encoding for the _contents_ of
> the URL, not just for the URL itself? That seems to make little sense
> to me: the contents and the name can legitimately be encoded
> differently, I think.
The filenames in %expected_downloaded_files depend on --local-encoding.
Since this option is not given on the command line, the test behaves
differently under different LC_ALL settings ('make check' uses LC_ALL=C, and
contrib/check-hard additionally runs 'make check' with a Turkish UTF-8 locale).
To fix the test, we should set --local-encoding to UTF-8 (or to something
else, but then we have to adjust the expected filenames to match that
encoding).
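
As a rough illustration of why the expected filenames are tied to the local
encoding, the Python sketch below encodes the name behind
p1_fran%C3%A7ais.html in three charsets. It does not model what Wget actually
writes to disk; encode_name is a hypothetical helper, and treating the C
locale as ASCII is an assumption (it is the typical case, not a guarantee).

```python
name = "p1_français.html"  # decoded form of p1_fran%C3%A7ais.html from the test

def encode_name(name, charset):
    """Encode a file name in the given charset; None if it cannot be represented."""
    try:
        return name.encode(charset)
    except UnicodeEncodeError:
        return None

utf8 = encode_name(name, "utf-8")         # e.g. a UTF-8 locale such as tr_TR.utf8
latin1 = encode_name(name, "iso-8859-1")  # a Latin-1 local encoding
ascii_ = encode_name(name, "ascii")       # assumed charset of LC_ALL=C

print(utf8)    # b'p1_fran\xc3\xa7ais.html'
print(latin1)  # b'p1_fran\xe7ais.html'
print(ascii_)  # None
```

The same name yields different bytes per encoding and has no ASCII form at
all, so a fixed list of expected filenames can only match one local encoding.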
Regards, Tim