wget2 | Parsing comments in <style> content (patch attached) (#540)


From: Sergei Litvin
Subject: wget2 | Parsing comments in <style> content (patch attached) (#540)
Date: Sun, 22 Nov 2020 20:13:01 +0000


Sergei Litvin created an issue: https://gitlab.com/gnuwget/wget2/-/issues/540



Hello, wget2 currently fails to parse the content of an HTML file if "<" 
characters occur inside <style> content.

Command line to reproduce:
```
wget2 -m --max-threads=1 --content-disposition --regex-type=pcre \
  --accept-regex="www\.3gpp\.org/DynaReport/23.*?\.htm|portal\.3gpp\.org/desktopmodules/Specifications/SpecificationDetails\.aspx\?specificationId=|portal\.etsi\.org/webapp/workprogram/Report_WorkItem\.asp\?WKI_ID=|www\.etsi\.org/deliver/etsi_ts/.*?\.pdf" \
  --domains="portal.etsi.org" --span-hosts --filter-urls \
  https://www.3gpp.org/ftp/Specs/html-info/23-series.htm
```
Expected behavior: the <a ... href=23XXX.htm> links are parsed and followed.

Patch with proposed fix is attached:
[0001-Fix-parsing-comments-in-style-content.patch](/uploads/4da83b12c5b9ced80420d3ee6cec7a13/0001-Fix-parsing-comments-in-style-content.patch)
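
For readers without the patch at hand, here is a minimal Python sketch of the general idea: treat everything between `<style>` and `</style>` as raw text, so that "<" characters and comments inside it are never interpreted as markup. This is an illustration only. It is not the attached patch and not wget2's actual parser (which is written in C); the function name `extract_hrefs`, the regular expressions, and the sample file names are all hypothetical.

```python
import re

# Matches an opening or closing tag: optional "/", tag name, rest up to ">".
TAG_RE = re.compile(r'<(/?)([a-zA-Z][a-zA-Z0-9]*)([^>]*)>')
# Matches href="...", href='...' or an unquoted href=value.
HREF_RE = re.compile(r'href\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|([^\s>]+))', re.I)


def extract_hrefs(html):
    """Collect href values of <a> tags, skipping <style> bodies as raw text."""
    hrefs = []
    lower = html.lower()  # used for case-insensitive search of "</style"
    pos = 0
    while True:
        m = TAG_RE.search(html, pos)
        if m is None:
            break
        closing, name, attrs = m.group(1), m.group(2).lower(), m.group(3)
        pos = m.end()
        if closing:
            continue
        if name == "style":
            # Raw-text element: jump straight to the closing tag and ignore
            # any "<", comments, or tag-like text in between.
            end = lower.find("</style", pos)
            if end == -1:
                break  # unterminated <style>: nothing more to scan
            pos = end
            continue
        if name == "a":
            h = HREF_RE.search(attrs)
            if h:
                hrefs.append(next(g for g in h.groups() if g is not None))
    return hrefs


# Hypothetical input: the <style> body contains "<", a comment, and tag-like text.
SAMPLE = ('<style><!-- td < th { color: red } /* <a href="fake.htm"> */ --></style>'
          '<a href=23401.htm>TS</a>')
print(extract_hrefs(SAMPLE))  # ['23401.htm']
```

The link-like text inside the `<style>` body is ignored, while the real link after `</style>` is still found.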




