guix-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: On raw strings in <origin> commit field


From: zimoun
Subject: Re: On raw strings in <origin> commit field
Date: Wed, 29 Dec 2021 09:39:19 +0100

Hi,

On Tue, 28 Dec 2021 at 21:55, Liliana Marie Prikler <liliana.prikler@gmail.com> 
wrote:

> Consider a package being added or updated in Guix.  At the time of
> commit, we have the tag v1.2.3 pointing towards commit deadbeef.  We
> therefore create a guix package with version "1.2.3" pointing to said
> commit (either directly or indirectly).  At this point, one of the
> following holds:
>   (1) Guix "1.2.3" -> upstream "v1.2.3" -> upstream "deadbeef"
>   (2) Guix "1.2.3" -> upstream "deadbeef" <- upstream "v1.2.3"
> From either, we can follow that Guix "1.2.3" = upstream "v1.2.3".  If
> upstream keeps their tags around, then both forms are equivalent, but
> (1) is more convenient; it allows us to derive commit from version,
> which is often done through an affine mapping.

No, tags and hash commit are not equivalent.  Hash commit is intrinsic:
it only depends on the content.  Whereas, tags are extrinsic, they
depend on external choice.

>From the content to the hash, three keys: 1) how to serialize and 2) how
to hash and 3) how to represent the hash.  For #1, Git uses their own
serializer and Guix, inheriting from Nix, uses another (Nar); although
the difference is minor.  For #2, Git uses by default SHA-1 as hash
function, although Guix uses SHA-256.  And for #3, Git uses hexadecimal
format and Guix uses nix-base32.

The subcommand “guix hash” with the options ’-S, -H’ and ’-f’ exposes
these 3 keys.  For instance:

        $ cat /tmp/foo.txt | git hash-object --stdin
        557db03de997c86a4a028e1ebd3a1ceb225be238
        $ ./pre-inst-env guix hash -S git -H sha1 -f hex /tmp/foo.txt
        557db03de997c86a4a028e1ebd3a1ceb225be238


To make it explicit, the checksum hash of ’git-reference’ could be
removed because it is somehow redundant with the commit hash.
Obviously, it cannot because security reason (SHA-1 is considered as
weak).


> Problems arise, when upstreams move or delete tags.  At this point,
> guix packages that use them break and are no longer able to fetch their
> source code.  Raw commits are in principle resilient to this kind of
> denial of service; instead upstreams would have to actually delete the
> commits themselves, including also possible backups such as SWH to
> break it.  There is certainly an argument for robustness to be made
> here, particularly concerning `guix time-machine', though as noted it
> is not infallible.  

SWH provides ’swh:id’ which is another triplet (really close to Git).
Basically, content means data and metadata and to make it short, SWH
deals their way with metadata for reason of large scale.  And SWH does
snapshots of Git repositories.

Therefore, to have something really robust, Guix has to rely on a map
from package definition to SWH.

Using Git commit hash instead of tag makes this map.  For tag, to have
something robust, we need an external map from checksum hash to SWH hash
via Git commit hash.  This “external” is done by Disarchive.


> Long-term, we might want to support having multiple <git-references> in
> git-fetch -- if the first one fails due to a hash mismatch, we would
> warn about that instead of producing an error and thereafter continue
> with the second, third, etc. similar to how we currently have mirror://
> urls for some well-known mirrored repositories.  That way, we have a
> system to warn us about naughty upstreams while also providing
> robustness for the time machine.

I think the long term is to completely remove tag and only use commit
hash; as done for ’guile-aiscm’.  But it will not happen for convenience
reasons, I guess.

What you are proposing is to mix extrinsic (tag, URL, etc.) with
intrinsic (commit hash, checksum hash, etc.).  Well, I do not know if
this proposed fallback mechanism would ease the maintenance and would
make Guix more robust.

To me, robustness means make a map from intrinsic values to content; as
Disarchive is doing for instance.


Cheers,
simon



reply via email to

[Prev in Thread] Current Thread [Next in Thread]