Re: wget2 | Add support for pre/post download scripts (#80)
From: Andrew White (@awhite27)
Subject: Re: wget2 | Add support for pre/post download scripts (#80)
Date: Mon, 06 May 2024 12:47:16 +0000
Andrew White commented:
https://gitlab.com/gnuwget/wget2/-/issues/80#note_1894095767
I wrote a wget2 plugin that runs a Python script to do the filtering. I've
attached it here in case it is useful to anyone else. It likely has a few bugs,
but it is good enough for what I want to do.
[wget2-python-plugin.tgz](/uploads/98344c996c00b75ea7a1f7bd9c13a456/wget2-python-plugin.tgz)
Run wget2 with `--local-plugin=libwget-python-plugin.so`. At startup the plugin
runs a Python script from the current directory; after that, the registered
callbacks are called for each event. The script name is currently hardcoded as
`wget_plugin.py` because of a problem with the plugin options (see below). I've
included an example script. The plugin creates a Python module, `wget`, which
the script imports. This module provides the following functions:
```
log_info()
log_error()
log_debug()
register_exit_callback()
register_url_filter_callback()
register_post_processor_callback()
```
The URL filter callback is called with a `Filter` object, which has the
methods:
```
get_url()
get_local_filename()
accept()
reject()
set_alt_url()
set_local_filename()
```
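A minimal sketch of what a filtering script might look like, assuming that the callback receives a single `Filter` object, that `get_url()` returns the URL as a string, and that `accept()` and `reject()` take no arguments. None of those signatures are confirmed above. `REJECTED_SUFFIXES`, `install()`, and `FakeFilter` are hypothetical names used only for illustration; `FakeFilter` exists so the callback can be tried outside wget2.

```python
from urllib.parse import urlsplit

# Hypothetical policy for this sketch: skip large archive types.
REJECTED_SUFFIXES = (".iso", ".zip")


def url_filter(f):
    """URL filter callback; `f` is assumed to be the plugin's Filter object.

    Assumptions: get_url() returns a str; accept() and reject() take no
    arguments.
    """
    path = urlsplit(f.get_url()).path.lower()
    if path.endswith(REJECTED_SUFFIXES):
        f.reject()
    else:
        f.accept()


def install(wget_module):
    """Register the callback with the plugin-provided module.

    Inside wget_plugin.py this would be: `import wget; install(wget)`.
    """
    wget_module.register_url_filter_callback(url_filter)


class FakeFilter:
    """Hypothetical stand-in for the Filter object, for dry runs only."""

    def __init__(self, url):
        self.url = url
        self.verdict = None

    def get_url(self):
        return self.url

    def accept(self):
        self.verdict = "accept"

    def reject(self):
        self.verdict = "reject"
```

Passing the module into `install()` rather than importing `wget` at the top keeps the callback importable in a plain Python session, where the plugin-created module does not exist.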
The post processor callback is called with a `PostProcess` object, which has
the methods:
```
get_url()
get_local_filename()
get_data()
get_recurse()
add_recurse_url()
```
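A rough sketch of a post processor callback that queues extra links, assuming that `get_data()` returns the downloaded body as bytes, `get_recurse()` returns a boolean, and `add_recurse_url()` takes one absolute URL string. Those types are guesses, not confirmed above. `find_pdf_links()` and `FakePostProcess` are hypothetical helpers for illustration, and the regular expression is a deliberately naive link matcher, not a real HTML parser.

```python
import re
from urllib.parse import urljoin

# Naive matcher for double-quoted href values ending in .pdf.
HREF_RE = re.compile(rb'href="([^"]+\.pdf)"', re.IGNORECASE)


def find_pdf_links(base_url, data):
    """Return absolute URLs for every .pdf link found in `data` (bytes)."""
    links = []
    for match in HREF_RE.finditer(data):
        relative = match.group(1).decode("ascii", "replace")
        links.append(urljoin(base_url, relative))
    return links


def post_process(p):
    """Post processor callback; `p` is assumed to be a PostProcess object."""
    # add_recurse_url() is documented as a no-op when get_recurse() is
    # false, so skip the parsing work in that case.
    if not p.get_recurse():
        return
    for link in find_pdf_links(p.get_url(), p.get_data()):
        p.add_recurse_url(link)


class FakePostProcess:
    """Hypothetical stand-in for the PostProcess object, for dry runs."""

    def __init__(self, url, data, recurse=True):
        self.url, self.data, self.recurse = url, data, recurse
        self.added = []

    def get_url(self):
        return self.url

    def get_data(self):
        return self.data

    def get_recurse(self):
        return self.recurse

    def add_recurse_url(self, url):
        self.added.append(url)
```

Registration would go through `register_post_processor_callback()` in the same way as for the URL filter.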
While working on this I found the following additional problems with the wget2
plugin API.
The option callback is useless. After the callback is registered in the
initializer, it is called once for each option. The problem is that most
plugins need their options during initialization. The only workaround is to
defer initialization until the first call to url_filter, but that causes other
issues because initialization and finalization may then happen on different
threads. Also, if the plugin name does not start with "lib", the options are
ignored.
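The deferred-initialization workaround can be sketched as a pattern. This is written in Python for illustration only and is not the plugin's actual code; `LazyPlugin`, `on_option()`, and the `script` option name are all hypothetical. Options are collected as they arrive, and the real setup runs under a lock on the first filter call.

```python
import threading


class LazyPlugin:
    """Illustrative pattern: collect options first, initialize on first use."""

    def __init__(self):
        self.options = {}
        self.script = None
        self.init_thread = None
        self._ready = False
        self._lock = threading.Lock()

    def on_option(self, name, value):
        # Called once per option, after the initializer has already returned.
        self.options[name] = value

    def _ensure_ready(self):
        # The lock makes sure exactly one thread performs the setup.
        with self._lock:
            if not self._ready:
                # "script" is a hypothetical option name for this sketch.
                self.script = self.options.get("script", "wget_plugin.py")
                self.init_thread = threading.current_thread().name
                self._ready = True

    def url_filter(self, url):
        self._ensure_ready()
        return True  # accept everything in this sketch
```

Recording `init_thread` makes the drawback visible: setup runs on whichever worker thread happens to make the first filter call, which need not be the thread that later runs finalization.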
The filter local filename is of the form `dir/file`. The post processor
filename is of the form `example.com/dir/file`. The filter local filename
should also include the hostname directory. It would also be helpful if the API
provided a function to return the download directory so the full pathname of
the local file can be obtained.
It would be useful if the post processor also provided functions that returned
the content type and charset. There is a function that indicates if the
download will be recursed. It would be useful to also provide a function that
can disable recursion of the downloaded object. Also, the docs for the
add_recurse_url() function state that it has no effect if get_recurse()
returns false. I'm not sure why. It would be useful to be able to add URLs
regardless of whether the current download will be recursed.
The plugin API is multi-threaded. This increases the complexity of plugins and
makes them less portable as they need to know what threading API wget2 is
using. I put a mutex in my Python plugin so that the Python script runs
single-threaded. I wanted to keep the script simple, and I'm not even sure what
would be involved in making the plugin fully multi-threaded.
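The effect of that mutex can be illustrated with a Python analogue. This is only a sketch of the idea, not the attached plugin's implementation; `serialized`, `Counter`, `count_event`, and `run_demo` are hypothetical names. One shared lock wraps every callback, so callback bodies never overlap even when several threads call them.

```python
import functools
import threading

# One lock shared by every wrapped callback.
_callback_lock = threading.Lock()


def serialized(func):
    """Decorator: run `func` while holding the shared callback lock."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with _callback_lock:
            return func(*args, **kwargs)

    return wrapper


class Counter:
    def __init__(self):
        self.value = 0


@serialized
def count_event(state):
    # A read-modify-write that would be racy without the lock.
    current = state.value
    state.value = current + 1


def run_demo(workers=8, calls=1000):
    """Call count_event from several threads and return the final count."""
    state = Counter()

    def work():
        for _ in range(calls):
            count_event(state)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state.value
```

The trade-off is that callbacks no longer run in parallel, which keeps the script simple at the cost of throughput.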