wget-dev
Re: wget2 | HTTP Response 0 flooding. (#609)


From: @rockdaboot
Subject: Re: wget2 | HTTP Response 0 flooding. (#609)
Date: Sat, 30 Jul 2022 17:36:13 +0000



Tim Rühsen commented:


Some thoughts after a first look:
- `--retry-on-http-error=*,\!404` will also retry 3xx (e.g. redirections). You may 
want to use `--retry-on-http-error=4*,5*,\!404` instead (see the sketch after this list).
- `-t 3` (short for `--tries 3`) works for me.
- Where did you get `--stats-all` from? It is not part of the `--help` output, nor 
is it documented.
- `-R "index.html*"`: Better use single quotes here as otherwise the shell may 
interpret the `*`. Recursive downloads require all pages with possble URLs 
inside to be scanned, no matter if these are excluded or not. These won't be 
stored locally, though.
- HTTP response code 0 is not a real response code. There likely is some kind 
of connection failure or early close. This would be interesting to track down, 
but I can't reproduce it. If you can reproduce, please add --debug to the 
command line and paste the relevant portion of the download. It sounds like the 
web server (or proxy) has some kind of scan protection (maybe a cookie that 
times out after a while).
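
Putting these points together, a minimal sketch of a corrected invocation could look 
like this (the target URL, the `-r`, and the `2> debug.log` redirection are 
placeholders/assumptions for illustration, not taken from the original report):

```sh
# Retry only on 4xx/5xx, but never on 404. Single quotes keep the shell
# from expanding '*' and '!'. --debug prints verbose protocol information
# that should show what happens around the "response code 0" requests
# (assuming debug output goes to stderr, hence the 2> redirection).
wget2 -r -t 3 \
  --retry-on-http-error='4*,5*,!404' \
  -R 'index.html*' \
  --stats-site=csv:out \
  --debug \
  https://example.com/ 2> debug.log
```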

> Sort the --stats-site output in ascending/descending order based on HTTP 
> code; that would greatly simplify verifying which errors occurred during the 
> operation when a high number of files is involved.

Sometimes recursive downloads take hours or even days. Updating the --stats-site 
output on-the-fly allows viewing/parsing it while wget2 is still running. The downside 
is that wget2 can't sort it.

The csv output (`--stats-site=csv:out`) can easily be sorted by the HTTP 
response code with `cat out|sort -t, -k4,4`.
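
For instance (a sketch assuming, as the `-k4,4` above implies, that the response 
code is the fourth comma-separated field and that `out` is the file written by 
`--stats-site=csv:out`):

```sh
# ascending / descending by HTTP response code
sort -t, -k4,4n  out
sort -t, -k4,4nr out

# number of URLs per response code
cut -d, -f4 out | sort | uniq -c | sort -rn

# the file is updated while wget2 runs, so the summary can be refreshed
# periodically, e.g. every 30 seconds
watch -n 30 'cut -d, -f4 out | sort | uniq -c | sort -rn'
```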

-- 
Reply to this email directly or view it on GitLab: 
https://gitlab.com/gnuwget/wget2/-/issues/609#note_1045471015
You're receiving this email because of your account on gitlab.com.



