bug-wget

Re: [PATCH 1/3] Support for multiple regex patterns for URL matching


From: Nekun
Subject: Re: [PATCH 1/3] Support for multiple regex patterns for URL matching
Date: Mon, 3 May 2021 15:43:35 +0000

Hi Tim,

Well, if wget is frozen, what about just failing when multiple regex
arguments are passed? Yeah, it may break poorly written scripts too,
but (imo) users shouldn't expect dirty undocumented tricks to work
forever, and it would produce an obvious error that is easy to fix,
rather than a silent behavior change. At first I wanted to do exactly
that, but after a quick look at the *grandiose* options infrastructure
I decided that adding the functionality would be easier :)
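
To make the three options concrete, here is a minimal sketch in Python
(not wget's C code). The function names and policy names are
hypothetical, and the "accept if any pattern matches" rule is an
assumption for illustration, not necessarily what the patch does.

```python
import re

def resolve_patterns(patterns, policy):
    """Return the compiled regexes that would be active under `policy`.

    The policy names are hypothetical, chosen only for this sketch:
      "last-wins"  - only the final pattern is kept (later overrides earlier)
      "error"      - passing more than one pattern is rejected explicitly
      "accumulate" - every pattern is kept
    """
    if policy == "last-wins":
        return [re.compile(p) for p in patterns[-1:]]
    if policy == "error":
        if len(patterns) > 1:
            raise ValueError("regex option given more than once")
        return [re.compile(p) for p in patterns]
    if policy == "accumulate":
        return [re.compile(p) for p in patterns]
    raise ValueError(f"unknown policy: {policy}")

def accepted(url, regexes):
    # Assumption: with several patterns, a URL passes if any one matches.
    return any(r.search(url) for r in regexes)
```

With the patterns `\.html$` and `\.pdf$`, a URL ending in `.html` passes
under "accumulate" but not under "last-wins", which is the silent
difference in question; "error" stops before any download starts.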

If wget2 aims to be a full-featured web scraper that supports current
technologies and replaces old and obscure tools like HTTrack, there's
a high chance that I will be interested in participating. However, I
think I need to try it as a user first. Thanks for your appreciation.

On Sun, 2 May 2021 14:57:34 +0200
Tim Rühsen <tim.ruehsen@gmx.de> wrote:

> Hi,
> 
> your patches look great, good work. And allowing multiple regexes
> seems to be a good idea to me.
> 
> Here comes the (small) but...
> 
> a) Wget is in maintenance mode - we try not to add any new features
> here; only bugs are fixed. New features (and I consider this a new
> feature due to b)) should go into Wget2[1] only.
> 
> b) This is a breaking change. Well, at first glance it just
> extends an already existing feature. But I see corner cases where
> this might break existing production setups. E.g. scripts that rely
> on the behavior that the last regex on the command line overrides
> any previous one would behave differently with your patch. Another
> example is when someone has a default regex in a config file and
> overrides it via the command line.
> 
> Then there are also open questions, like what is the most flexible
> way of chaining regexes with (not|and|or) operators.
> 
> So if you consider rewriting your patches for Wget2, I'll be happy to 
> support you and/or discuss this topic with you :-)
> 
> There is a little "obstacle": for non-trivial work (>15 lines of code)
> to be accepted, you have to sign the FSF copyright assignment. For
> this, follow the instructions at [2].
> 
> I would love to hear from you.
> 
> Regards, Tim
> 
> [1] https://gitlab.com/gnuwget/wget2
> [2] https://git.savannah.gnu.org/cgit/gnulib.git/tree/doc/Copyright/request-assign.future


