wget-dev

Re: [Wget-dev] wget2 | -X does not work (#365)


From: Tim Rühsen
Subject: Re: [Wget-dev] wget2 | -X does not work (#365)
Date: Fri, 06 Jul 2018 12:49:42 +0000

Let's assume the following file structure on the server:
```
drwxr-xr-x 3 oms users 4096 06-07-18 14:38:39 datenschutzhinweis
drwxr-xr-x 2 oms users 4096 06-07-18 14:38:39 files
-rw-r--r-- 1 oms users 7310 03-07-18 17:42:55 index.html
drwxr-xr-x 3 oms users 4096 06-07-18 14:38:39 rw_common
```

Without `--filter-urls`, every file is downloaded and parsed if it is HTML, CSS or RSS:
```
$ ../src/wget2 -r --reject-regex ".*/datenschutz.*" example.com|grep datenschutz
Adding URL: http://example.com/datenschutzhinweis/datenschutzhinweis.html
[1] Checking 'http://example.com/datenschutzhinweis/datenschutzhinweis.html' ...
HTTP response 200 OK [http://example.com/datenschutzhinweis/datenschutzhinweis.html]
[1] Downloading 'http://example.com/datenschutzhinweis/datenschutzhinweis.html' ...
HTTP response 200 OK [http://example.com/datenschutzhinweis/datenschutzhinweis.html]
Adding URL: http://example.com/datenschutzhinweis/files/page0_1.gif
[1] Checking 'http://example.com/datenschutzhinweis/files/page0_1.gif' ...
HTTP response 200 OK [http://example.com/datenschutzhinweis/files/page0_1.gif]
```

With `--filter-urls`, those files are filtered out before the download+parse step:
```
$ ../src/wget2 -r --filter-urls --reject-regex ".*/datenschutz.*" example.com|grep datenschutz
Adding URL: http://example.com/datenschutzhinweis/datenschutzhinweis.html
```

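`Adding URL` means the URL has been added to the 'blacklist' (the list of finished files). In the second run no `Checking` or `Downloading` line follows it, so the page is never requested and the links inside it (e.g. `page0_1.gif`) are never discovered.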
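To make the difference between the two runs concrete, here is a minimal shell sketch. It is only an illustration of the behaviour described above, not wget2's actual code: the function name `decide` and its output strings are hypothetical, and matching is done with `grep -E`, which is an assumption and may not behave exactly like the regex engine wget2 uses.

```shell
#!/bin/sh
# Hypothetical sketch, NOT wget2 source code. It models what happens to a URL
# found during recursion, depending on whether --filter-urls is given.
# The name "decide" and the printed strings are made up for illustration.

reject_regex='.*/datenschutz.*'

# decide URL FILTER_URLS
#   URL          a URL discovered while recursing
#   FILTER_URLS  "yes" if --filter-urls is given, anything else otherwise
decide() {
  url=$1
  filter_urls=$2
  # Assumption: grep -E approximates wget2's regex matching here.
  if printf '%s\n' "$url" | grep -Eq -- "$reject_regex"; then
    if [ "$filter_urls" = yes ]; then
      # With --filter-urls: rejected before any request is made.
      echo "skipped before request"
    else
      # Without --filter-urls: still requested; HTML, CSS or RSS
      # content is downloaded and parsed for further links.
      echo "requested anyway"
    fi
  else
    echo "not rejected"
  fi
}

decide 'http://example.com/datenschutzhinweis/datenschutzhinweis.html' no
decide 'http://example.com/datenschutzhinweis/datenschutzhinweis.html' yes
decide 'http://example.com/index.html' yes
```

Running it prints `requested anyway`, `skipped before request` and `not rejected`, one per line.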

-- 
Reply to this email directly or view it on GitLab: 
https://gitlab.com/gnuwget/wget2/issues/365#note_86524859