Re: Encoding for Robust Immutable Storage (ERIS)
From: pukkamustard
Subject: Re: Encoding for Robust Immutable Storage (ERIS)
Date: Sun, 26 Jul 2020 19:28:49 +0200
User-agent: mu4e 1.4.10; emacs 26.3
Hello Christian,
Thank you for your comments!
> For my taste, the block size is much too small. I understand 4k can
> make sense for page tables and SATA, but looking at benchmarks 4k is
> still too small to maximize SATA throughput. I would also worry
> about 4k for a request size in any database or network protocol. The
> overheads per request are still too big for modern hardware. You
> could easily go to 8k, which could be justified with 9k jumbo frames
> for Ethernet and would at least also utilize all of the bits in your
> paths. The 32k of ECRS are close to the 64k which are reportedly the
> optimum for modern M.2 media. IIRC Torrents even use 256k.
I agree that increasing block size makes sense for improving
performance in storage and transport.
> The overhead from padding may be large for very small files if you
> go beyond 4k, but you should also think in terms of absolute
> overhead: even a 3100% overhead doesn't change the fact that the
> absolute overhead is tiny for a 1k file.
The use-case I have in mind for ERIS is very small pieces of data (not
even small files). Examples include ActivityStreams objects or
OpenStreetMap nodes.

Apparently the average size of individual ActivityStreams objects is
less than 1kB (unfortunately I don't have the data to back this up).
I agree that the overhead of 3100% for a single 1kB object is
acceptable. But I would argue that an overhead of 3100% for very many
1kB objects is not. The difference might be a 32 GB database instead
of a 1 GB database.
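
For concreteness, a rough back-of-the-envelope calculation (assuming
every ~1kB object is padded to a full 32kB block and ignoring
intermediary tree nodes):

    # Rough padding-overhead estimate: each ~1kB object occupies one
    # full 32kB block (intermediary nodes ignored for simplicity).
    block_size = 32 * 1024       # 32kB blocks (ECRS-style)
    object_size = 1 * 1024       # ~1kB ActivityStreams object
    num_objects = 1_000_000      # roughly 1 GB of actual payload

    payload = num_objects * object_size
    stored = num_objects * block_size
    overhead = (stored - payload) / payload * 100

    print(f"payload:  {payload / 1e9:.2f} GB")   # ~1 GB
    print(f"stored:   {stored / 1e9:.2f} GB")    # ~32.8 GB
    print(f"overhead: {overhead:.0f} %")         # 3100 %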
> Furthermore, you should consider a trick we use in GNUnet-FS, which
> is that we share *directories*, and for small files, we simply
> _inline_ the full file data in the meta data of the file that is
> stored with the directory or search result. So you can basically
> avoid having to ever download tiny files as separate entities, so
> for files <32k we have zero overhead this way.
That makes a lot of sense.

But packing multiple objects into a single transport packet or
grouping for storage on disk/in database works for small block sizes
as well. The optimization just happens at a "different layer".
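
As an illustration of such a layer (this is not part of ERIS; the
length-prefix layout and block size are just assumptions for the
sketch):

    # Sketch: greedily pack length-prefixed small objects into
    # fixed-size blocks before handing the blocks to the encoder.
    # Assumes every object fits into a single block.
    import struct

    BLOCK_SIZE = 4096  # block size assumed for this sketch

    def pack_objects(objects):
        blocks, current = [], b""
        for obj in objects:
            entry = struct.pack(">H", len(obj)) + obj
            if len(current) + len(entry) > BLOCK_SIZE:
                blocks.append(current.ljust(BLOCK_SIZE, b"\x00"))
                current = b""
            current += entry
        if current:
            blocks.append(current.ljust(BLOCK_SIZE, b"\x00"))
        return blocks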
The key value I see in having small block sizes is that tiny pieces
of data can be individually referenced and used (securely).
> I'd be curious to see how much the two pass encoding costs in
> practice -- it might be less expensive than ECRS if you are lucky
> (hashing one big block being cheaper than many small hash
> operations), or much more expensive if you are unlucky (have to
> actually read the data twice from disk). I am not sure that it is
> worth it merely to reduce the number of hashes/keys in the non-data
> blocks. Would be good to have some data on this, for various file
> sizes and platforms (to judge IO/RAM caching effects). As I said, I
> can't tell for sure if the 2nd pass is virtually free or quite
> expensive -- and that is an important detail. Especially with a
> larger block size, the overhead of an extra key in the non-data
> blocks could be quite acceptable.
I think the cost of the two-pass encoding in ERIS is quite expensive.
Considering that the hash of the individual blocks also needs to be
computed (as reference in parent nodes), I think ECRS will always win
performance-wise.
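
A first rough way to get the kind of data you ask for would be
something like the following (only a sketch: it hashes in memory with
BLAKE2b and a 4kB block size, both my assumptions, and it does not
capture the cost of re-reading large files from disk):

    # Crude comparison of one vs. two hashing passes over the same
    # data, as a first approximation of single- vs. two-pass encoding
    # cost. A real encoder also encrypts and builds the tree, and the
    # interesting disk re-read case is not measured here.
    import hashlib, os, time

    BLOCK_SIZE = 4096  # assumed block size

    def hash_blocks(data, passes):
        for _ in range(passes):
            for i in range(0, len(data), BLOCK_SIZE):
                hashlib.blake2b(data[i:i + BLOCK_SIZE]).digest()

    for size in (1 << 20, 64 << 20, 256 << 20):  # 1, 64, 256 MiB
        data = os.urandom(size)
        for passes in (1, 2):
            t0 = time.perf_counter()
            hash_blocks(data, passes)
            print(size >> 20, "MiB,", passes, "pass(es):",
                  round(time.perf_counter() - t0, 3), "s")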
Maybe the answer is not ECRS or ERIS but ECRS and ERIS. ECRS for
large pieces of data, where it makes more sense to have a large block
size and single-pass encoding. And ERIS for (very many) small pieces
of data, where a 3100% overhead is too much but the performance
penalty is acceptable and the size of the data is much smaller than
memory.
There might be some heuristic that says: if data is larger than 2MB
use ECRS, else use ERIS and you get the verification capability.
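
Such a heuristic would be trivial to state in code; a hypothetical
sketch (the 2MB threshold is just the number suggested above):

    # Hypothetical size-based dispatch between the two encodings.
    ECRS_THRESHOLD = 2 * 1024 * 1024  # 2MB

    def choose_encoding(content: bytes) -> str:
        """Pick an encoding for a piece of content by its size."""
        return "ECRS" if len(content) > ECRS_THRESHOLD else "ERIS"

    print(choose_encoding(b"x" * 100))        # -> ERIS
    print(choose_encoding(b"x" * (3 << 20)))  # -> ECRS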
If using ECRS, you can add the verification capability by encoding a
list of all the hash references to the ECRS blocks with ERIS. The
ERIS read capability of this list of ECRS blocks is enough to verify
the integrity of the original ECRS-encoded content (without revealing
the content).
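
Roughly, the idea would be something like this (only a sketch; the
hash function is an assumption and eris_encode stands in for whatever
ERIS implementation is used, not an actual API):

    # Sketch: build a verification list for ECRS-encoded content by
    # hashing every (already encrypted) ECRS block and encoding the
    # concatenated hashes with ERIS. Sharing only the ERIS read
    # capability of this list lets a peer fetch and integrity-check
    # all blocks of the content without being able to decrypt it.
    import hashlib

    def make_verification_list(ecrs_blocks):
        return b"".join(hashlib.sha512(b).digest() for b in ecrs_blocks)

    # verification_capability = eris_encode(make_verification_list(blocks))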
What do you think?
> For 3.4 Namespaces, I would urge you to look at the GNU Name System
> (GNS). My plan is to (eventually, when I have way too much time and
> could actually re-do FS...) replace SBLOCKS and KBLOCKS of ECRS with
> basically only GNS.
I have been looking into it. It does seem to be a perfect application
of GNS.

The crypto is way above my head and using readily available and
already implemented primitives would make implementation much easier
for me. But I understand the need for "non-standard" crypto and am
following the ongoing discussions.
-pukkamustard
On 7/10/20 8:59 AM, pukkamustard wrote:
> Hello GNUnet,
>
> I'd like to request feedback, questions and comments on an encoding
> of content very much inspired by ECRS that I have been working on:
> Encoding for Robust Immutable Storage (ERIS)
>
> https://openengiadina.net/papers/eris.html
>
> The motivation is to use the encoding in social-network-like
> settings where short messages and interactions are encoded using
> ERIS (as RDF [1]).
>
> There is one major difference to ECRS (and a couple of smaller ones)
> that I would like to highlight:
> ** Verification capability
>
> ERIS adds a verification capability. Holders of the verification
> capability can enumerate all blocks required to decode the content
> and verify integrity of the blocks without being able to decode the
> content. This enables peers to cache the entire content without
> being able to read the content.
>
> The verification capability is enabled by using two keys:
>
> 1. A read key to encode the blocks holding content.
> 2. A verification key (which is deterministically derived from the
>    read key) to encode the intermediary nodes of the Merkle tree.
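
As an illustration of the kind of derivation meant here (the keyed
BLAKE2b KDF, the label and the key size are my own assumptions for
the sketch, not necessarily what the ERIS paper specifies):

    # Illustration only: derive the verification key deterministically
    # from the read key with keyed BLAKE2b used as a KDF. The ERIS
    # paper defines its own primitives and parameters.
    import hashlib

    def derive_verification_key(read_key: bytes) -> bytes:
        # Domain separation label keeps the two keys independent.
        return hashlib.blake2b(b"eris/verification-key",
                               key=read_key, digest_size=32).digest()

    verification_key = derive_verification_key(bytes(32))  # dummy read key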
> This makes the scheme slightly more complicated than ECRS and also
> requires a two-pass encoding (when using convergent encryption).
> Nevertheless I believe this is a very important feature that maybe
> results in a better privacy/complexity/availability trade-off as
> alluded to in a previous thread
> (https://lists.gnu.org/archive/html/gnunet-developers/2020-05/msg00015.html).
> ** Block size
>
> Block size is chosen to be 4kB. This is an optimization towards
> small content (short messages and social interactions).
> ** URN
>
> Encoded content can be referred to by a URN, making it usable from
> existing Web (and RDF) settings. This could be added to ECRS.
> ** No namespacing / keyword search
>
> There are currently no SBlock or KBlock like features. The idea is
> that these features can be built on top of the base encoding
> (including SBlock and KBlock).
> We have a little JavaScript demo:
> https://openengiadina.gitlab.io/js-eris/ as well as an
> implementation in Guile [2].
>
> I'd be very happy for your insight and feedback.
>
> Thanks!
> -pukkamustard
> [1] https://openengiadina.net/papers/content-addressable-rdf.html
> [2] https://gitlab.com/openengiadina/data-model/