[Wget-dev] wget2 | Robot fix (!454)
From: Archit Pandey
Subject: [Wget-dev] wget2 | Robot fix (!454)
Date: Tue, 22 Oct 2019 05:46:22 +0000
Archit Pandey created a merge request:
https://gitlab.com/gnuwget/wget2/merge_requests/454
Project:Branches: archit-p/wget2:robot-fix to gnuwget/wget2:master
Author: Archit Pandey
Hello maintainers!
This merge request addresses https://gitlab.com/gnuwget/wget2/issues/456
Description of files changed:
1. `src/wget.c (add_url_to_queue, add_url)`: robots.txt is now downloaded
whenever the config.recursive option is set, regardless of the config.robots
option.
2. `src/wget.c (add_url)`: the config.robots option is now checked before
updating the list of URLs not to follow.
3. `tests/test-robots-off.c`: Most of the file was borrowed from
`tests/test-robots.c`. It tests that robots.txt is still downloaded with
robots=off, and that the URLs it disallows are followed anyway.
4. `tests/test-iri-percent.c`: changing the robots=off behavior broke the
`test-iri-percent` testcase since it wasn't expecting robots.txt to be
downloaded. Adding robots.txt to expected files ensures it passes now.
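
To make the intended behavior in items 1 and 2 concrete, here is a minimal, self-contained sketch. It is not the code from `src/wget.c`: the struct and both function names are hypothetical, and the prefix match is a simplified stand-in for real robots.txt matching. Only the two options, config.recursive and config.robots, come from the description above.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the two options named in the description. */
struct sketch_config {
	bool recursive; /* models config.recursive */
	bool robots;    /* models config.robots */
};

/* Item 1: robots.txt is fetched whenever recursion is on;
 * config.robots is deliberately not consulted here. */
static bool should_fetch_robots_txt(const struct sketch_config *cfg)
{
	return cfg->recursive;
}

/* Item 2: disallowed paths only block a URL when config.robots is on.
 * Plain prefix matching is an assumption made for brevity. */
static bool is_blocked_by_robots(const struct sketch_config *cfg,
	const char *path, const char *const *disallowed, size_t n)
{
	if (!cfg->robots)
		return false; /* robots=off: nothing is added to the "do not follow" set */

	for (size_t i = 0; i < n; i++) {
		if (strncmp(path, disallowed[i], strlen(disallowed[i])) == 0)
			return true;
	}
	return false;
}
```

With `recursive` set and `robots` cleared, `should_fetch_robots_txt` still returns true while `is_blocked_by_robots` returns false for every path, which is the combination `tests/test-robots-off.c` is described as checking.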
I did a clean install after making these changes and ran `make check`:
66/69 test cases pass and 3/69 are skipped.
This was a very quick fix, so there may be better ways to do the same thing.
Please point out any gaps in this approach or suggest how to improve it.
Thanks!
### Approver's checklist:
* [ ] The author has submitted the FSF Copyright Assignment and is listed in
AUTHORS
* [ ] There is a test suite reasonably covering new functionality or
modifications
* [ ] Function naming, parameters, return values, types, etc., are consistent
with existing code
* [ ] This feature/change has adequate documentation added (if appropriate)
* [ ] No obvious mistakes / misspelling in the code