bug-wget

[bug #60494] Percent character in filename gets escaped twice


From: Tim Ruehsen
Subject: [bug #60494] Percent character in filename gets escaped twice
Date: Sat, 22 May 2021 08:41:04 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:90.0) Gecko/20100101 Firefox/90.0

Follow-up Comment #8, bug #60494 (project wget):

So what KodeCharlie says is correct (regarding b)).
I would reword it like this: we have arbitrary user input, which has nothing to
do with what the RFCs say. And this is the hard part, as we have to 'guess' what
the user meant.
Once the input is 'normalized' (unescaped, charset-translated into UTF-8,
protocol extended, ...), the rest is straightforward: just follow the RFCs.
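A minimal Python sketch of the normalization step described above. It assumes that "protocol extended" means prepending http:// when no scheme is given. The function name normalize_user_url is hypothetical, and this is not wget's actual code. It only shows the unescape-then-escape-once idea, with urllib's quote() encoding non-ASCII characters as UTF-8.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def normalize_user_url(raw: str) -> str:
    """Hypothetical sketch: turn arbitrary user input into an RFC 3986 URL."""
    text = raw.strip()
    # "Protocol extended": assume http:// if the user omitted the scheme.
    if "://" not in text:
        text = "http://" + text
    parts = urlsplit(text)
    # Unescape first, so escaped and unescaped input look the same ...
    path = unquote(parts.path)
    # ... then escape exactly once; non-ASCII is encoded as UTF-8.
    path = quote(path, safe="/")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(normalize_user_url("example.com/a b.txt"))           # http://example.com/a%20b.txt
print(normalize_user_url("http://example.com/a%20b.txt"))  # http://example.com/a%20b.txt
```

Decoding first makes escaped and unescaped input converge, but it is still a guess: a filename that really contains a sequence like "%41" would be decoded to "A".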

@PetrPisar Regarding the filename: it is also user input. And the problem is
that the wget author(s) made some decisions in the past about how to treat user
input. There is no black and white here, and any decision has its pros and
cons.
I think part of the problem is that URLs on web sites are often printed in
their escaped form, and wget users explicitly wanted to be able to copy & paste
them (from a web site to the console).
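To illustrate why pasted, already-escaped input forces a guess, here is a small Python example (the path is made up). Escaping the pasted text again turns each '%' into '%25', so the percent sign ends up escaped twice, whereas decoding first gives the intended result.

```python
from urllib.parse import quote, unquote

pasted = "/My%20Document.pdf"  # hypothetical path copied from a web page

# Treat the paste as a literal filename and escape it: '%' becomes '%25'.
literal = quote(pasted, safe="/")
# Guess that it was already escaped: decode first, then escape once.
guessed = quote(unquote(pasted), safe="/")

print(literal)  # /My%2520Document.pdf
print(guessed)  # /My%20Document.pdf
```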

The next aspect is that we don't want to change a long-standing (default)
behavior: doing so would break (production) scripts and command lines. What we
could do is add a new '--strict-input' option that skips the 'guessing' and
instead assumes a 100% valid URL. BTW, this is a good idea for wget2 ;-)
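A rough sketch of what a '--strict-input' mode might do, written in Python with a hypothetical prepare_path() helper. Neither the option nor the helper exists in wget; both are assumptions for illustration. In strict mode the path is only checked against the characters RFC 3986 allows in a URI (plus well-formed %XX escapes) and is passed through untouched. Otherwise it falls back to the unescape-and-escape-once guess.

```python
import re
from urllib.parse import quote, unquote

# Characters RFC 3986 allows in a URI, plus well-formed %XX escapes.
_RFC3986_URI = re.compile(r"(?:[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=]|%[0-9A-Fa-f]{2})*")


def prepare_path(path: str, strict_input: bool = False) -> str:
    if strict_input:
        # Hypothetical --strict-input: no guessing, input must already be valid.
        if not _RFC3986_URI.fullmatch(path):
            raise ValueError(f"not a valid RFC 3986 path: {path!r}")
        return path
    # Default (guessing) behavior: unescape, then escape exactly once.
    return quote(unquote(path), safe="/")
```

With strict_input=True, "/a%20b.txt" is returned unchanged, while "/a b.txt" or "/50%.txt" is rejected instead of being silently rewritten.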

I agree that "wget [option]... [URL]..." is not 100% correct in terms of the
RFCs. But wget is also a user tool, and normal users don't have the RFCs in
mind when they think about URLs.



    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?60494>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/



