RE: [Bug-wget] How to ignore link like "index.html?lang=ja"?
From: Tony Lewis
Subject: RE: [Bug-wget] How to ignore link like "index.html?lang=ja"?
Date: Mon, 7 Jun 2010 08:41:10 -0700

Micah Cowan wrote:

> Yeah, that was the original thinking. But I still hate it. For one
> thing, there are no longer any guarantees that recurse-able HTML files
> end in ".html"

There are a bunch of suffixes that are actively used for HTML, plus there is
no reason that one has to include a suffix at all. Furthermore, the
existence of a .html suffix is no guarantee that the file really contains
HTML.

> It's better to let you explicitly specify what files to download

I think an option that says "spider the site and save any PDF files that you
find" is useful. It's a matter of figuring out a meaningful way to implement
"spider the site" for this scenario.

I wonder if it would make more sense to look at the Content-Type header and
only parse "text/html" files. By using HEAD, you can quickly ignore files
that don't need to be parsed.

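To make the Content-Type idea concrete, here is a rough sketch in Python. It is not wget code and says nothing about how wget does or should implement this; the function names (is_html, head_content_type, should_parse) are made up for illustration. It sends a HEAD request, reads the Content-Type header, and treats the resource as parseable only if the media type is text/html, regardless of the URL's suffix or query string.

```python
from typing import Callable, Optional
from urllib.request import Request, urlopen


def is_html(content_type: Optional[str]) -> bool:
    """Return True if a Content-Type header value denotes text/html."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8"; media types are case-insensitive.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/html"


def head_content_type(url: str, timeout: float = 10.0) -> Optional[str]:
    """Issue a HEAD request and return the Content-Type header, if present."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type")


def should_parse(
    url: str,
    fetch_content_type: Callable[[str], Optional[str]] = head_content_type,
) -> bool:
    """Decide whether a crawler should parse `url` for further links.

    The fetcher is injectable (an assumption of this sketch) so the decision
    logic can be exercised without touching the network.
    """
    return is_html(fetch_content_type(url))
```

For example, should_parse('http://example.com/index.html?lang=ja') would look only at what the server reports, not at the ".html" in the name. A real crawler would probably also need a fallback for servers that answer HEAD poorly or omit the header.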
Tony