wget-dev

Re: wget2 | no-parent option not working as expected (#620)


From: @rockdaboot
Subject: Re: wget2 | no-parent option not working as expected (#620)
Date: Fri, 16 Dec 2022 13:51:12 +0000



Tim Rühsen commented:


> Ok, but why is it necessary to scan outside the parent directory? In my case 
> this leads to more than 500 million documents being loaded into RAM that are 
> not needed. It also took a lot of time (for nothing?!).

I see that this is unfortunate in your case.
But imagine that `2022_10_October/index.html` only linked to HTML pages 
outside `2022_10_October/`, and that those pages in turn linked to all the 
documents inside `2022_10_October/`. If wget2 did not scan pages outside the 
parent directory, it would download only `index.html` and never see the other 
documents.

There is a way to filter out URLs before they are considered for download 
(regardless of whether they contain HTML/CSS). In your case, I'd add 
`--accept='*2022_10_October*' --filter-urls`. wget2 will then only download 
URLs that contain `2022_10_October`.
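
To illustrate what such a pattern keeps and drops, here is a minimal Python 
sketch. It is not wget2 code: it uses Python's `fnmatch` as a stand-in for 
glob matching against the whole URL, and the `example.com` URLs and the 
`url_is_accepted` helper are made up for the example. wget2's own matching 
rules may differ in detail.

```python
from fnmatch import fnmatchcase

# Same glob as in --accept='*2022_10_October*'
PATTERN = "*2022_10_October*"


def url_is_accepted(url: str, pattern: str = PATTERN) -> bool:
    """Hypothetical helper: True if the whole URL matches the glob pattern."""
    # fnmatchcase is case-sensitive, and '*' also matches '/' characters,
    # so the pattern simply means "URL contains 2022_10_October".
    return fnmatchcase(url, pattern)


# Made-up candidate URLs, as a crawler might find them in scanned pages.
candidate_urls = [
    "https://example.com/archive/2022_10_October/index.html",
    "https://example.com/archive/2022_10_October/doc1.pdf",
    "https://example.com/archive/2022_09_September/index.html",
    "https://example.com/archive/index.html",
]

# Only URLs that pass the filter would be queued for download.
accepted = [url for url in candidate_urls if url_is_accepted(url)]

for url in accepted:
    print(url)
```

In this sketch only the two URLs under `2022_10_October/` pass the filter; the 
others are dropped before any download would be attempted.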

-- 
Reply to this email directly or view it on GitLab: 
https://gitlab.com/gnuwget/wget2/-/issues/620#note_1212618540