Re: [Bug-wget] Hello again
From: Darshit Shah
Subject: Re: [Bug-wget] Hello again
Date: Thu, 11 Oct 2018 11:34:52 +0200
User-agent: NeoMutt/20180716
* address@hidden <address@hidden> [181009 17:12]:
>
> Hello Darshit Shah,
>
> Thank you for your welcome message. I am glad to be part of your project!
>
> I don't understand the term "javascript engine". AFAIK javascript is code that
> runs on the browser side, and we have no problem fetching it.
>
Exactly! Javascript is code that is executed on the client side and hence
requires a javascript engine which interprets the code and executes it.
However, Wget does not and will not package a javascript engine in order to run
those scripts. This means, sites where Javascript is used to create hyperlinks
won't work well when scraped through Wget.
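
To illustrate the point, here is a minimal sketch in Python (not Wget code). The
page and the names PAGE, LinkCollector and static_links are made up for this
example. It parses the HTML the way a static crawler would, without running any
scripts, so the link that the script would create is never seen:

```python
from html.parser import HTMLParser

# Hypothetical page: one link is in the markup, one is created by a script.
PAGE = """
<html><body>
<a href="/about.html">About</a>
<script>
  var a = document.createElement("a");
  a.href = "/archive/" + new Date().getFullYear() + ".html";
  document.body.appendChild(a);
</script>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collects href values from <a> start tags found in the markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def static_links(html):
    """Return the links visible without executing any Javascript."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


print(static_links(PAGE))  # prints ['/about.html']
```

Only /about.html is found. The /archive/ link exists only after a browser has
executed the script, which is the step a tool without a javascript engine
cannot perform.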
>
> There might be "ajax" issues with sites that rely on it. Ajax is used heavily
> by programmers and they will have to take some action on their site to
> incorporate the engine.
Similarly, sites that use Javascript to show menus or create AJAX requests are
usually not amenable to being scraped as a static HTML page.
>
> POST requests for comments and mail will need to be taken care of so they will
> work on a static site. One solution is to use a hosted supplier that will carry
> out the task and deliver spam removal as well.
> I think I will be able to write a howto document on that.
>
> Michael
>
> -----Original Message-----
> From: Darshit Shah <address@hidden>
> Sent: Tuesday, 9 October, 2018 2:52 PM
> To: address@hidden
> Cc: address@hidden
> Subject: Re: [Bug-wget] Hello again
>
> Hi Michael,
>
> Nice to hear from you again. I vaguely remember a mention of someone who
> wanted to work on this feature. When deciding to make this work, please
> remember that any of this can only work if the site does not rely on
> Javascript, which, given Wordpress, is a difficult thing. The reason for this
> is that we do _not_ intend to ship a javascript engine along with Wget2. It is
> too large, unwieldy and too much of a maintenance nightmare. However, if the
> site can work without Javascript, then I would assume that Wget2 can already
> handle making a static copy. If it can't handle something, please let us know
> or file a bug report about it.
>
> Of course, I welcome you to work on Wget2 as you see fit. And we would love to
> look at any contributions you can make. We will also try and help you out as
> much as possible when dealing with the codebase.
>
> About the dev setup, I only use vim and gdb to work with Wget. As Tim has
> already mentioned, he uses Netbeans and might be able to help you out.
>
> You also mentioned something about the lib/ directory. That is an
> auto-generated dir with compatibility libs that you don't need to care about.
> All the code for Wget2 is in src/ and the code for the library is in libwget/.
> Those are the two main directories you need to care about. And of course
> tests/ for the tests.
>
> * address@hidden <address@hidden> [181008 21:22]:
> >
> > Hello again,
> >
> > My name is Michael. I approached you about a year ago.
> >
> > I am interested in making wget2 a tool that can convert the output of content
> > management systems (like WordPress) to static HTML. This effectively limits
> > the content management system to generating the website each time it is
> > changed, while the pages are served by the HTTP server alone.
> >
> > This is an important feature as it prevents security risks, such as a hacker
> > breaking into the site and installing viruses or stealing data.
> > It also allows the website to be delivered much faster as no PHP code needs
> > to run in order to deliver the content. Google has already announced that
> > site download speed is a factor in its SEO evaluation.
> >
> > I will be able to work for 3 hours every week on the project. I do need some
> > guidance from you.
> >
> > I have started to configure the Netbeans IDE, as using a debugger can help me
> > delve into the code much faster. There are some issues with Netbeans. Do
> > you use an IDE? Which one?
> >
> > Best regards,
> >
> > Michael
> >
> >
> >
> >
>
> --
> Thanking You,
> Darshit Shah
> PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6
>
>
--
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6