wget-dev



From: @kuukunen
Subject: wget2 | When mirroring, URI fragments in converted links to parent directories get removed (#623)
Date: Wed, 22 Feb 2023 04:14:29 +0000


Aki Jäntti created an issue: https://gitlab.com/gnuwget/wget2/-/issues/623



I noticed that when I mirror a site, URI fragments (the part of a link after
the `#` sign) are sometimes removed from the converted links.

Basically, I have two files on the server:

test/a.html:
`<html><body><a href="http://XXXXXXX/test/test2/b.html#bb">bb</a></body></html>`

test/test2/b.html:
`<html><body><a href="http://XXXXXXX/test/a.html#aa">aa</a></body></html>`


I run this command:

`wget2 --mirror --convert-links http://XXXXXXX/test/a.html`

The resulting files are:

XXXXXXX/test/a.html:
`<html><body><a href="test2/b.html#bb">bb</a></body></html>`

XXXXXXX/test/test2/b.html:
`<html><body><a href="../a.html">aa</a></body></html>`

In the second file, the fragment `#aa` has been removed. Notably, it seems
that only links pointing upwards in the directory hierarchy lose their fragments.
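
To make the expected result concrete, here is a minimal Python sketch of the conversion I would expect. It is not wget2's code (wget2 is written in C), and `to_relative_link` is a hypothetical helper name; it only illustrates that the fragment should survive for links going both down and up the hierarchy. `XXXXXXX` is the same redacted host as above.

```python
import posixpath
from urllib.parse import urlsplit


def to_relative_link(page_url: str, target_url: str) -> str:
    """Rewrite target_url relative to page_url, keeping any #fragment.

    Hypothetical helper for illustration only; not part of wget2.
    """
    page = urlsplit(page_url)
    target = urlsplit(target_url)

    # Only links on the same host are rewritten; others are left untouched.
    if (page.scheme, page.netloc) != (target.scheme, target.netloc):
        return target_url

    # Compute the path of the target relative to the directory of the page.
    page_dir = posixpath.dirname(page.path)
    rel = posixpath.relpath(target.path, start=page_dir)

    # Re-attach the fragment. This is the part that goes missing in the
    # report for links that point to a parent directory.
    return f"{rel}#{target.fragment}" if target.fragment else rel


# Link from test/a.html down into test/test2/
print(to_relative_link("http://XXXXXXX/test/a.html",
                       "http://XXXXXXX/test/test2/b.html#bb"))
# prints: test2/b.html#bb

# Link from test/test2/b.html up into test/
print(to_relative_link("http://XXXXXXX/test/test2/b.html",
                       "http://XXXXXXX/test/a.html#aa"))
# prints: ../a.html#aa
```

The second call shows the output I expected from wget2 for `b.html`, namely `../a.html#aa` rather than `../a.html`.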

I tried this with the latest release (2.0.1) and with the master branch from
git (after fixing a compilation error about print_error in net.c).




