Re: wget2 | URL parser does unwanted transformations of URL (#598)
From: Nikita Ofitserov (@himikof)
Subject: Re: wget2 | URL parser does unwanted transformations of URL (#598)
Date: Tue, 23 Aug 2022 18:51:02 +0000
Nikita Ofitserov commented:
There is an even more interesting consequence of this: Metalink files with more
than one query parameter in the URL result in wrong (mangled) URLs being
downloaded, and the file names can be mangled too.
Here is a simple example (the raw URI is
`https://example.com/a&b.txt?apikey=foo&log=1`):
```xml
<?xml version='1.0' encoding='utf-8'?>
<metalink xmlns="urn:ietf:params:xml:ns:metalink">
<file name="a&amp;b.txt">
<size>42</size>
<url>https://example.com/a&amp;b.txt?apikey=foo&amp;log=1</url>
</file>
</metalink>
```
The metalink code just calls `wget_iri_parse` on the `url` element text
contents, which actually calls `wget_iri_unescape_inline` a few times inside,
but [this
code](https://gitlab.com/gnuwget/wget2/-/blob/ed80255d/libwget/iri.c#L603)
explicitly refuses to unescape the query part!
> `/* do not unescape query else we get ambiguity for chars like &, =, +, ... */`
So the ampersand in the URL path is unescaped, but the ones in the file name
and in the URL query part are not. As a result, a wrong URL is downloaded and
saved under a wrong file name.
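
For comparison, here is a minimal Python sketch (not wget2 code) of the behaviour I would expect. It assumes the ampersands in the Metalink file are written as `&amp;` entities and lets a conforming XML parser decode them before the URL is split, so the attribute, the path and the query are all treated the same way. The variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, parse_qsl

# Same document as in the example, with the ampersands XML-escaped.
METALINK = b"""<?xml version='1.0' encoding='utf-8'?>
<metalink xmlns="urn:ietf:params:xml:ns:metalink">
<file name="a&amp;b.txt">
<size>42</size>
<url>https://example.com/a&amp;b.txt?apikey=foo&amp;log=1</url>
</file>
</metalink>
"""

NS = {"ml": "urn:ietf:params:xml:ns:metalink"}

root = ET.fromstring(METALINK)
file_el = root.find("ml:file", NS)

# The XML parser decodes entities in attributes and in element text alike.
name = file_el.get("name")
url = file_el.findtext("ml:url", namespaces=NS)

# Only now is the URL split; no further entity unescaping is needed.
parts = urlsplit(url)
params = parse_qsl(parts.query)

print(name)        # a&b.txt
print(parts.path)  # /a&b.txt
print(params)      # [('apikey', 'foo'), ('log', '1')]
```

The point is only the ordering: entity decoding belongs to the XML layer, so the URL parser never sees `&amp;` and does not have to decide whether to unescape the query.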
Also, while reading the metalink code I realized that it silently (and wrongly)
assumes that the metalink XML contains only a single file, though it is
probably a separate issue.
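
To illustrate the multi-file case, here is a small Python sketch (again not wget2 code) that walks every `file` element instead of assuming there is only one. The helper name `list_files` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"ml": "urn:ietf:params:xml:ns:metalink"}


def list_files(metalink_xml):
    """Return a list of (file name, [urls]) for every file element."""
    root = ET.fromstring(metalink_xml)
    result = []
    for file_el in root.findall("ml:file", NS):
        # Collect every url child of this file, skipping empty elements.
        urls = [u.text.strip() for u in file_el.findall("ml:url", NS) if u.text]
        result.append((file_el.get("name"), urls))
    return result


# Hypothetical Metalink document describing two files.
SAMPLE = b"""<metalink xmlns="urn:ietf:params:xml:ns:metalink">
<file name="first.txt">
<url>https://example.com/first.txt</url>
</file>
<file name="second.txt">
<url>https://example.com/second.txt</url>
<url>https://mirror.example.org/second.txt</url>
</file>
</metalink>"""

for name, urls in list_files(SAMPLE):
    print(name, urls)
```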
--
Reply to this email directly or view it on GitLab:
https://gitlab.com/gnuwget/wget2/-/issues/598#note_1074999600