Re: Not getting the wildcards to work in wget
From: Felix Dietrich
Subject: Re: Not getting the wildcards to work in wget
Date: Fri, 05 Feb 2021 06:25:37 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
Hello,
Cherise Haywood <Cherise.Haywood@metoffice.gov.tt> writes:
> I am trying to download specific .zip files from this website:
> https://www2.census.gov/geo/tiger/TIGER2012/ROADS/
>
> I have used several iterations of wget to yield only the folders (
> directories) being formed, but absolutely no data being downloaded.
>
> Here are copies of the code I have used:
>
> OPTION 1: wget --no-parent --relative --recursive --level=2
> --accept=zip --mirror -A .zip
> https://www2.census.gov/geo/tiger/TIGER2012/ROADS/
>
> Can you assist?
It seems that wget has problems with parsing the /robots.txt correctly:
the empty record for “User-Agent: *” appears to cause it to consider all
paths disallowed. To work around the issue you may disable honouring
the /robots.txt by adding “--execute robots=off” to your command-line.
> OPTION 2: wget --no-parent --relative --recursive --level=2
> --accept=zip --mirror -A *_72*.zip --time-stamps
> https://www2.census.gov/geo/tiger/TIGER2012/ROADS/
--time-stamps should probably have been --timestamping; --mirror already
turns that option on, so it is redundant here.
--mirror also sets an infinite recursion depth (--level=inf). You may
limit the depth when using --mirror by specifying --level after --mirror
(I believe).
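A sketch of what that ordering would look like, under the assumption
(not verified here) that a --level given after --mirror overrides the
infinite depth. The argument list is held in the positional parameters
only for illustration; the script prints the command rather than
running it.

```shell
# Sketch only: --mirror is documented as equivalent to
# -r -N -l inf --no-remove-listing, so --level is placed AFTER it in the
# hope that the later option wins (an assumption, not a tested fact).
url='https://www2.census.gov/geo/tiger/TIGER2012/ROADS/'

set -- --execute robots=off --no-parent \
       --mirror --level=1 \
       --accept '*_72*.zip' "$url"

# Print the command with every argument single-quoted, without running it.
printf 'wget'
printf " '%s'" "$@"
printf '\n'

# To run it for real:  wget "$@"
```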
> OPTION 3: wget --no-parent --relative --recursive --level=2
> --accept=zip --mirror -A _72
> https://www2.census.gov/geo/tiger/TIGER2012/ROADS
When multiple patterns are given to -A/--accept, whether as separate
arguments or as a comma-separated list, a file is accepted if *any one*
of them matches. Combining --accept=zip with a narrower pattern
therefore still accepts every .zip file.
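To illustrate the "any one" rule, here is a rough emulation of the
accept-list matching in plain shell. It is not wget's code: the function
name accepts and the two file names are made up for the example. It
relies on the documented behaviour that a list element containing one of
the wildcard characters * ? [ ] is treated as a pattern, and any other
element as a file name suffix.

```shell
# accepts NAME LIST: succeed if NAME matches any element of the
# comma-separated LIST (rough emulation, for illustration only).
accepts() {
    (
        name=$1
        set -f      # no pathname expansion while splitting the list
        IFS=,
        for pat in $2; do
            case $pat in
                *\**|*\?*|*\[*|*\]*)
                    # contains a wildcard: match as a glob pattern
                    case $name in $pat) exit 0 ;; esac ;;
                *)
                    # no wildcard: match as a suffix
                    case $name in *"$pat") exit 0 ;; esac ;;
            esac
        done
        exit 1
    )
}

for f in tl_2012_72001_roads.zip tl_2012_01001_roads.zip; do
    for list in 'zip,_72' '*_72*.zip'; do
        if accepts "$f" "$list"; then verdict=accepted; else verdict=rejected; fi
        printf '%s  %s: %s\n' "-A $list" "$f" "$verdict"
    done
done
```

With the list 'zip,_72' both names are accepted, because the suffix
"zip" alone is enough; only the single pattern '*_72*.zip' rejects the
second name.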
> I only want the files with *_72*.zip to be downloaded to a copy of the
> directories it comes from on my system.
This is the invocation I have come up with (backslash used as line
continuation marker):
    wget --execute robots=off --timestamping \
         --no-parent --recursive --level=1 \
         --accept '*_72*.zip' \
         'https://www2.census.gov/geo/tiger/TIGER2012/ROADS/'
Make sure to quote strings containing characters with special meaning to
your shell (like the ‘*’ often used for globbing). --level=1 seems to be
enough to get the .zip files: they are all in the directory your URL
points to, but you should check that.
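A small demonstration of why the quoting matters, assuming a POSIX-style
shell such as bash. (The Windows command prompt does not expand
wildcards itself and does not treat single quotes as quoting characters,
so double quotes are the safer choice there.) The file name is made up
and the demo runs in a throwaway directory.

```shell
# Work in a scratch directory that contains one matching file.
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1
touch tl_2012_72001_roads.zip

unquoted=$(printf '%s\n' *_72*.zip)     # the shell expands the glob first
quoted=$(printf '%s\n' '*_72*.zip')     # the pattern is passed on untouched

echo "unquoted: $unquoted"
echo "quoted:   $quoted"

# Clean up the scratch directory.
cd / && rm -rf "$demo_dir"
```

Unquoted, wget would receive the name of whatever file happens to match
in the current directory instead of the pattern itself.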
> I have attached error imgs, I captured!
It would have been better had you provided a log in text form. Wget
can be instructed to write its output to a log file using --output-file
or --append-output; if you still want to see the progress bar, also add
--show-progress. You may also use the redirection operator of the
Windows command prompt; since wget writes its messages to standard
error, that would be “2> path\to\file” rather than a plain “>”.
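Wget reports its messages on standard error, which decides which
redirection operator captures them. The sketch below shows the
difference with a made-up stand-in function, fake_wget, so that it runs
without network access; the file names are likewise only for
illustration.

```shell
# Hypothetical stand-in for wget: like wget, it reports on stderr.
fake_wget() { echo 'Saving to: example.zip' >&2; }

log_dir=$(mktemp -d) || exit 1

fake_wget  > "$log_dir/stdout.log" 2>/dev/null   # redirects stdout only
fake_wget 2> "$log_dir/stderr.log"               # redirects stderr

for f in stdout.log stderr.log; do
    if [ -s "$log_dir/$f" ]; then
        echo "$f: has content"
    else
        echo "$f: empty"
    fi
done

# With the real program, the options named above avoid redirection:
#   wget --output-file=wget.log --show-progress ...
```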
Happy data analysing, I presume.
--
Felix Dietrich