guix-devel

Re: Identical files across subsequent package revisions


From: pukkamustard
Subject: Re: Identical files across subsequent package revisions
Date: Tue, 29 Dec 2020 21:01:33 +0100
User-agent: mu4e 1.4.13; emacs 27.1


Hi Ludo,


> Thoughts?  :-)


Super cool! :)

Your research inspired me to conduct some experiments towards
de-duplication.

For two similar packages (emacs-27.1 and emacs-26.3) I was able to
de-duplicate ~12% using EROFS and ERIS. Still far from the ~85%
similarity, but an attempt I'd like to share.

The two main ingredients:

- EROFS (Enhanced Read-Only File-System) is a read-only, compressed
 file-system comparable to SquashFS. It has some properties that make
 it more suitable than SquashFS (it aligns content to a fixed block
 size). EROFS has been in the mainline Linux kernel since v5.4.

- ERIS (Encoding for Robust Immutable Storage) is an encoding of
 content into uniformly sized blocks that I've been working on. It
 de-couples the encoding of content from the storage and transport
 layers. Transport layers can be things like IPFS, GNUnet, Named Data
 Networking or just a plain old HTTP service.
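
To illustrate why aligning content to a fixed block size helps block-level
de-duplication, here is a toy Python sketch. It is not EROFS's on-disk
layout and not the ERIS encoding: the 4096-byte block size, the plain
BLAKE2b hashing and the helper names (`pack_tight`, `pack_aligned`,
`shared_blocks`) are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed for illustration only


def blocks(data: bytes) -> set[bytes]:
    """Hash every fixed-size block of data (last block zero-padded)."""
    return {
        hashlib.blake2b(data[i:i + BLOCK_SIZE].ljust(BLOCK_SIZE, b"\0")).digest()
        for i in range(0, len(data), BLOCK_SIZE)
    }


def pack_tight(files: list[bytes]) -> bytes:
    """Concatenate files back to back, with no alignment."""
    return b"".join(files)


def pack_aligned(files: list[bytes]) -> bytes:
    """Start every file on a block boundary by zero-padding the previous one."""
    out = bytearray()
    for f in files:
        out += f
        out += b"\0" * (-len(f) % BLOCK_SIZE)
    return bytes(out)


def shared_blocks(a: bytes, b: bytes) -> int:
    """Number of distinct blocks that occur in both images."""
    return len(blocks(a) & blocks(b))


# Two "package revisions": the first file grows by one byte,
# the second file (8192 bytes of non-repeating data) is unchanged.
unchanged = b"".join(i.to_bytes(4, "big") for i in range(2048))
v1 = [b"x" * 100, unchanged]
v2 = [b"x" * 101, unchanged]

tight = shared_blocks(pack_tight(v1), pack_tight(v2))
aligned = shared_blocks(pack_aligned(v1), pack_aligned(v2))
```

In this toy case the tightly packed images share no blocks, because the
one-byte change shifts everything after it, while the aligned images
still share the two blocks of the unchanged file.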

I make EROFS images of the packages and encode them with ERIS, which
de-duplicates blocks as part of the encoding process.

With this I manage to de-duplicate between 12% and 17% (depending on
some parameters).
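
A minimal sketch of how such a percentage can be measured, assuming
plain content-addressed blocks: split each image into fixed-size blocks,
hash them, and count how many blocks are duplicates. This is not the
ERIS encoding itself; the block size, the use of BLAKE2b and the
function names are assumptions for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed; the real block size is one of the parameters


def block_ids(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split data into fixed-size blocks (zero-padding the last) and hash each."""
    ids = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size].ljust(block_size, b"\0")
        ids.append(hashlib.blake2b(block, digest_size=32).digest())
    return ids


def dedup_ratio(images: list[bytes], block_size: int = BLOCK_SIZE) -> float:
    """Fraction of all blocks that duplicate a block already seen."""
    total = 0
    unique = set()
    for image in images:
        ids = block_ids(image, block_size)
        total += len(ids)
        unique.update(ids)
    return 0.0 if total == 0 else 1 - len(unique) / total
```

Usage would look like `dedup_ratio([open(a, "rb").read(), open(b, "rb").read()])`
for two image files `a` and `b`; for large images one would stream the
blocks rather than read whole files into memory.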

This could allow:

- Directly mounting packages instead of unarchiving (a la distri)
- Peer-to-peer distribution of packages (that's what ERIS is for)
- De-duplicating common content in packages to a certain extent (topic
 of this thread)

A more in-depth write-up:
https://gitlab.com/openengiadina/eris/-/tree/main/examples/dedup-fs

Happy Hacking!
-pukkamustard



