From: pukkamustard
Subject: [bug#52555] [RFC PATCH 0/3] Decentralized substitute distribution with ERIS
Date: Thu, 23 Dec 2021 11:42:46 +0000
Hi Ludo,
Thanks for your comments!
Ludovic Courtès <ludo@gnu.org> writes:
>> StorePath: /gnu/store/81bdcd5x4v50i28h98bfkvvkx9cky63w-hello-2.10
>> URL: nar/gzip/81bdcd5x4v50i28h98bfkvvkx9cky63w-hello-2.10
>> Compression: gzip
>> FileSize: 67363
>> ERIS: urn:erisx2:BIBC2LUTIQH43S2KRIAV7TBXNUUVPZTMV6KFA2M7AL5V6FNE77VNUDDVDAGJUEEAFATVO2QQT67SMOPTO3LGWCJFU7BZVCF5VXEQQW25BE
>> URL: nar/zstd/81bdcd5x4v50i28h98bfkvvkx9cky63w-hello-2.10
>> Compression: zstd
>> FileSize: 64917
>> ERIS: urn:erisx2:BIBO7KS7SAWHDNC43DVILOSQ3F3SRRHEV6YPLDCSZ7MMD6LZVCHQMEQ6FUBTJAPSNFF7XR5XPTP4OQ72OPABNEO7UYBUN42O46ARKHBTGM
>
> Do we really need one URN per compression method? Couldn’t we leave
> compression (of individual chunks, possibly) as a “detail” handled by
> the encoding or the transport layer?
>
I agree that it would be nice to leave this to the encoding layer, as
that would allow certain optimizations (e.g. de-duplication).
Unfortunately, we haven't yet figured out what the most suitable
compression/format would be. Something like EROFS seems good, as it
aligns data to fixed block sizes [1]. But it seems a bit "clunky" for
just an archive format, and there do not seem to be any libraries that
we could use to integrate it neatly. It seems possible to block-align a
Tar archive, but that seems a bit hacky [2]. Other things to look into
might be Tarlz [3] and ZPAQ [4].
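
To illustrate why block alignment matters for de-duplication, here is a
minimal Guile sketch. It is not ERIS itself: it only splits data into
fixed-size blocks and counts identical blocks, leaving out the encryption
and hashing. The helper names (`concat`, `pad-to-block`, `split-blocks`,
`shared-block-count`) and the toy "archives" are made up for this example.
The same file is stored in two archives behind headers of different
lengths, once unaligned and once with the headers padded to a block
boundary.

```scheme
(use-modules (rnrs bytevectors)
             (srfi srfi-1))

;; Assumption for this sketch: 1 KiB blocks.
(define block-size 1024)

(define (concat . bvs)
  "Return a new bytevector holding the contents of BVS in order."
  (let ((out (make-bytevector (apply + (map bytevector-length bvs)))))
    (let loop ((bvs bvs) (offset 0))
      (if (null? bvs)
          out
          (let ((len (bytevector-length (car bvs))))
            (bytevector-copy! (car bvs) 0 out offset len)
            (loop (cdr bvs) (+ offset len)))))))

(define (pad-to-block bv)
  "Return BV zero-padded to a multiple of block-size."
  (let* ((len (bytevector-length bv))
         (blocks (quotient (+ len block-size -1) block-size))
         (out (make-bytevector (* blocks block-size) 0)))
    (bytevector-copy! bv 0 out 0 len)
    out))

(define (split-blocks bv)
  "Split BV into a list of blocks; the last one may be shorter."
  (let loop ((offset 0) (blocks '()))
    (if (>= offset (bytevector-length bv))
        (reverse blocks)
        (let* ((len (min block-size (- (bytevector-length bv) offset)))
               (block (make-bytevector len)))
          (bytevector-copy! bv offset block 0 len)
          (loop (+ offset len) (cons block blocks))))))

(define (shared-block-count a b)
  "Return the number of distinct blocks of A that also occur in B."
  (length (lset-intersection equal?
                             (delete-duplicates (split-blocks a))
                             (delete-duplicates (split-blocks b)))))

;; A toy "file" made of four distinct blocks, present in both archives.
(define file
  (apply concat (map (lambda (fill) (make-bytevector block-size fill))
                     '(10 11 12 13))))

;; Archive headers of different lengths shift the file differently.
(define header-a (make-bytevector 100 1))
(define header-b (make-bytevector 300 2))

(define unaligned-shared
  (shared-block-count (concat header-a file)
                      (concat header-b file)))

(define aligned-shared
  (shared-block-count (concat (pad-to-block header-a) file)
                      (concat (pad-to-block header-b) file)))

(format #t "unaligned: ~a shared blocks~%" unaligned-shared)
(format #t "aligned:   ~a shared blocks~%" aligned-shared)
```

Without alignment the two archives share no blocks, because the file
contents straddle block boundaries differently in each. With the headers
padded, all four blocks of the file are identical in both archives and
could be stored or transferred once.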
To get started I suggest just using one of the compressions/formats
already in Guix. zstd seems to be a reasonable choice (for the same
reasons why it makes sense to use zstd with `--discover` [5]).
Does that sound like a plan?
[1] https://inqlab.net/git/guile-eris.git/tree/examples/dedup-fs/Readme.org
[2] https://unix.stackexchange.com/questions/276908/make-tar-or-other-archive-with-data-block-aligned-like-in-original-files-for/279384#279384
[3] http://lzip.nongnu.org/tarlz.html
[4] http://mattmahoney.net/dc/zpaq.html
[5] https://guix.gnu.org/en/blog/2021/getting-bytes-to-disk-more-quickly/
>> If `--ipfs` is used for `guix publish` then the encoded blocks are also
>> uploaded to the IPFS daemon. The nar could then be retrieved from anywhere
>> like this:
>>
>> (use-modules (eris)
>>              (eris blocks ipfs))
>>
>> (eris-decode->bytevector
>>  "urn:erisx2:BIBC2LUTIQH43S2KRIAV7TBXNUUVPZTMV6KFA2M7AL5V6FNE77VNUDDVDAGJUEEAFATVO2QQT67SMOPTO3LGWCJFU7BZVCF5VXEQQW25BE"
>>  eris-blocks-ipfs-ref)
>>
>> These patches do not yet retrieve content from IPFS (TODO). But in principle,
>> anybody connected to IPFS can get the nar with the ERIS URN. This could be
>> used to reduce load on substitute servers, as they would only need to publish
>> the ERIS URN directly - substitutes could be delivered much more peer-to-peer.
>
> Nice. So adjusting ‘guix substitute’ should be relatively easy?
Yes, relatively! :)
I meant to send in a V2 that does this before going on holidays, but I'm
afraid I won't make it. V2 will come in early January!
>> Other transports that I have been looking into and am pretty sure will work
>> include: HTTP (with RFC 2169 [3]), GNUNet, OpenDHT. This is, imho, the
>> advantage of ERIS over IPFS directly or GNUNet directly. The encoding and
>> identifiers (URN) are abstracted away from specific transports (and also
>> applications). ERIS is almost exactly the same encoding as used in GNUNet
>> (ECRS).
>
> As a first step, ‘guix publish’ could implement RFC 2169, too.
>
> I gather implementing the HTTP and IPFS backends in ‘guix substitute’
> should be relatively easy, right?
Yes, those seem to be the two easiest backends to implement.
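
For reference, RFC 2169 resolves a URN over HTTP with a request of the
form `/uri-res/<service>?<urn>`, where the `N2R` service maps a URN to
the resource itself. A minimal sketch of building such a URL follows. The
helper name `urn->n2r-url`, the host and the shortened URN are
placeholders, and whether the resolver would be asked for the whole ERIS
URN or for individual blocks is left open here.

```scheme
;; Hypothetical helper: build an RFC 2169 "N2R" (URN to resource) URL.
;; BASE-URL is the resolver's base URL without a trailing slash.
(define (urn->n2r-url base-url urn)
  (string-append base-url "/uri-res/N2R?" urn))

;; Placeholder host and URN, for illustration only.
(define example-url
  (urn->n2r-url "https://substitutes.example.org"
                "urn:erisx2:EXAMPLE"))

(display example-url)
(newline)
```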
>> A tricky thing is figuring out how to multiplex all these different
>> transports and storages...
>
> Yes. We don’t know yet what performance and data availability will be
> like on IPFS, for instance, so it’s important for users to be able to
> set priorities. It’s also important to gracefully fall back to direct
> HTTP downloads when fancier p2p methods fail, regardless of how they
> fail.
Agreed.
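
A rough sketch of what such a prioritized fallback could look like in
Guile, using only core procedures. Everything here is hypothetical: the
backends are stubs, the name `fetch-with-fallback` is invented for the
example, and nothing below is part of the patches. Backends are tried in
the order the user gives them, and any failure (an exception or a #f
result) moves on to the next one. Timeouts are not handled.

```scheme
(define (fetch-with-fallback urn backends)
  "Try each (NAME . FETCH) pair in BACKENDS in order. Return
(NAME . RESULT) for the first backend that succeeds, or #f if all fail."
  (let loop ((backends backends))
    (if (null? backends)
        #f
        (let* ((name (car (car backends)))
               (fetch (cdr (car backends)))
               ;; Treat any exception as a failure of this backend.
               (result (catch #t
                         (lambda () (fetch urn))
                         (lambda (key . args) #f))))
          (if result
              (cons name result)
              (loop (cdr backends)))))))

;; Stub backends standing in for real transports.
(define (ipfs-fetch urn)
  (error "IPFS daemon not reachable"))   ; fails by raising an error

(define (gnunet-fetch urn)
  #f)                                    ; fails by returning #f

(define (http-fetch urn)
  (string-append "nar for " urn))        ; plain HTTP download succeeds

;; User-defined priority order, with direct HTTP as the last resort.
(define backends
  `((ipfs . ,ipfs-fetch)
    (gnunet . ,gnunet-fetch)
    (http . ,http-fetch)))

(define result
  (fetch-with-fallback "urn:erisx2:EXAMPLE" backends))

(format #t "fetched via ~a: ~a~%" (car result) (cdr result))
```

Here both p2p stubs fail in different ways and the download falls back
to HTTP.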
Thanks,
-pukkamustard