Re: On raw strings in <origin> commit field

From: zimoun
Subject: Re: On raw strings in <origin> commit field
Date: Thu, 30 Dec 2021 13:43:40 +0100

Hi Liliana,

On Wed, 29 Dec 2021 at 21:25, Liliana Marie Prikler <> wrote:
> On Wednesday, 29.12.2021 at 09:39 +0100, zimoun wrote:
>> On Tue, 28 Dec 2021 at 21:55, Liliana Marie Prikler
>> <> wrote:

> The notion of equivalence I am using here is the same as in the
> statement "5 ≡ 2 mod 3", wherein the ≡ symbol is ironically called
> IDENTICAL TO in Unicode despite being used very differently in
> mathematics.  Perhaps there is a language barrier here; in German we
> read that as "5 is equivalent to 2 modulo 3" and logic equivalence
> functions similarly.

I do not understand what you are arguing against, so I will skip it. :-)

> For the record, one could argue that I should have used that symbol for
> comparing Guix "1.2.3" to upstream "v1.2.3" because they are in fact
> not equal, only equivalent, but that's beside the point.  The point
> is, with an upstream behaving as we want upstreams to behave (not just
> git ones, url-fetch suffers from the same issue with moving tarballs
> for instance), you can substitute one for the other without a change in
> meaning; both will fetch the same commit.

If I understand you correctly:

 - Guix "1.2.3" means the field ’version’
 - upstream “v1.2.3” means the upstream tag used by the field ’commit’
   of ’git-reference’.

and yes, these two fields are strongly expected to match. :-)  But
IMHO it is irrelevant to your initial message «commit tags are in
principle mutable and hence can not be relied on when fetching sources.
I do have a few issues with that explanation».  That ’commit’ matches
’version’ via the upstream ’tag’ is fortunate, but it is not robust.

Because ’commit’ and ’tag’ are defined differently.

I cannot put it any other way: a Git commit hash depends only on the
content, whereas a ’tag’ does not.

Versions (or tags) are convenient names for humans.  It is easier to say
version 0.23.1 than
09rdbcr8dinzijyx9h940ann91yjlbg0fangx365llhvy354n840.  And we can deduce
that 0.22.3 is older than 0.23.1, whereas that is impossible with
commits.
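To make the ordering point concrete, here is a minimal sketch in plain
Guile, assuming purely numeric dotted versions (no "rc1" suffixes and
the like); `version<?` is just a hypothetical name for this example.
Nothing similar can be written for commit hashes: comparing two hashes
as strings gives an answer, but a meaningless one.

```scheme
;; Minimal sketch: compare dotted numeric versions such as "0.22.3".
;; Assumption: every component is a plain decimal number.
(define (version->numbers version)
  (map string->number (string-split version #\.)))

(define (numbers<? a b)
  ;; Component-wise comparison; a missing component counts as 0,
  ;; so "1.0" and "1.0.0" compare as equal.
  (cond ((and (null? a) (null? b)) #f)
        ((null? a) (numbers<? '(0) b))
        ((null? b) (numbers<? a '(0)))
        ((< (car a) (car b)) #t)
        ((> (car a) (car b)) #f)
        (else (numbers<? (cdr a) (cdr b)))))

(define (version<? v1 v2)
  (numbers<? (version->numbers v1) (version->numbers v2)))

(version<? "0.22.3" "0.23.1")   ; => #t
(version<? "1.9" "1.10")        ; => #t (numeric, not lexical)
```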

If you prefer to keep the frame «you can substitute one for the other
without a change in meaning», then, for what my opinion on that matter
is worth, and although I probably misunderstand your words, I think you
may be missing a point about content-addressability.

>> From the content to the hash, three keys: 1) how to serialize, 2)
>> how to hash, and 3) how to represent the hash.  For #1, Git uses its
>> own serializer and Guix, inheriting from Nix, uses another (Nar),
>> although the difference is minor.  For #2, Git uses SHA-1 by default
>> as the hash function, whereas Guix uses SHA-256.  And for #3, Git uses
>> hexadecimal format and Guix uses nix-base32.


>> To make it explicit, the checksum hash of ’git-reference’ could be
>> removed because it is somehow redundant with the commit hash.
>> Obviously, it cannot, for security reasons (SHA-1 is considered
>> weak).
> The other way also works.  If Git used a secure hashing function such
> as SHA-256 (or SHA-512 or Keccak) and Guix supported that hash, we
> could generate a git hash from the Guix hash (assuming also we allow
> the origin serializer to be configured, which would be required either
> way).

Yes, somehow.  To be on the same wavelength, we need to be precise when
we speak about a hash here, because a hash involves:

 - a serializer: how to deal with all the bits making up the full
   content (files, folders, tree, etc.)
 - a hash function
 - a format: how to represent the resulting hash

So yes, in principle, instead of NAR + SHA-256 + Nix-base32, the Guix
project could have chosen Git + SHA-1 + Hex, or Git + SHA-512 + Base64,
or any other combination.

(I guess this choice, inherited from Nix, is rooted in the daemon
implementation, and another triplet would have meant more changes when
starting Guix.)
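For the smallest case, a single file, here is a sketch of what the Git
serializer produces: a header ("blob", a space, the size in bytes
written in decimal, a NUL byte) followed by the content; the hash
function is then applied to these bytes.  It is plain Guile,
`git-blob-bytes` is a hypothetical name, and the hashing itself (SHA-1
for Git) is left out since it needs an extra library such as
guile-gcrypt.  Nar serializes the same file differently, which is one
reason why the two hashes differ even before choosing the hash function.

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper: the bytes Git hashes for a file ("blob" object),
;; i.e. "blob <size>\0" followed by the raw content.
(define (git-blob-bytes content)
  (let* ((body   (string->utf8 content))
         (header (string->utf8
                  (string-append "blob "
                                 (number->string (bytevector-length body))
                                 (string #\nul))))
         (hlen   (bytevector-length header))
         (blen   (bytevector-length body))
         (result (make-bytevector (+ hlen blen))))
    ;; R6RS argument order: source, source-start, target, target-start, count.
    (bytevector-copy! header 0 result 0 hlen)
    (bytevector-copy! body 0 result hlen blen)
    result))

(bytevector-length (git-blob-bytes "hello\n"))   ; => 13, i.e. 7 + 6
```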

However, knowing only the final Guix checksum hash (NAR + SHA-256 +
Nix-base32), say 09rdbcr8dinzijyx9h940ann91yjlbg0fangx365llhvy354n840,
you can easily convert it to any other format (Hex or Base64), but it is
not straightforward to compute the Git commit hash (here
c78b91edb7c17c6fbf3b294452f44e91d75e3c67) from this Guix checksum hash.
The NAR and Git serializers have minor differences, a commit hash also
covers metadata (author, date, parents) and not only the tree, and
mainly, one uses SHA-256 and the other SHA-1; it is generally not
possible to convert a hash computed with one hash function into the hash
computed with another.
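The format step is the easy one because it is only a re-encoding of
the same bytes.  As a minimal sketch in plain Guile (helper names are
hypothetical; for real use, Guix's (guix base32) module is the
reference), here are the two encodings side by side.  The nix-base32
part follows the Nix scheme as I understand it: a 32-character alphabet
without e, o, t and u, 5 bits per character, most significant character
first.

```scheme
(use-modules (rnrs bytevectors))

;; Hex: two lowercase digits per byte, as Git prints hashes.
(define (bytevector->hex bv)
  (string-concatenate
   (map (lambda (byte)
          (string-pad (number->string byte 16) 2 #\0))
        (bytevector->u8-list bv))))

;; Nix base32 alphabet: digits and letters, minus e, o, t and u.
(define %nix-base32-chars "0123456789abcdfghijklmnpqrsvwxyz")

(define (bytevector->nix-base32 bv)
  ;; Assumes BV is non-empty (a hash always is).
  (let* ((size (bytevector-length bv))
         (len  (+ 1 (quotient (- (* size 8) 1) 5))))
    (list->string
     (map (lambda (n)
            ;; Character N encodes bits 5N..5N+4, possibly across two bytes.
            (let* ((b  (* n 5))
                   (i  (quotient b 8))
                   (j  (remainder b 8))
                   (lo (ash (bytevector-u8-ref bv i) (- j)))
                   (hi (if (< (+ i 1) size)
                           (ash (bytevector-u8-ref bv (+ i 1)) (- 8 j))
                           0)))
              (string-ref %nix-base32-chars (logand (logior lo hi) #x1f))))
          ;; Emit the character for the highest bits first.
          (reverse (iota len))))))

(bytevector->hex #vu8(1 2))          ; => "0102"
(bytevector->nix-base32 #vu8(1 2))   ; => "00h1"
```

Both functions only read the same bytes; going from one format to the
other never needs the original content, unlike going from SHA-256 to
SHA-1.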

To make it short, my point is: a) a Git commit hash has the same
properties as any checksum hash, and b) a string tag obviously does not.

> I don't know too much about Disarchive here, so please enlighten me. 
> If it used a pair of origin file name + hash, whether or not the git-
> reference uses tags would be irrelevant, no?  Do we have to take values
> from the uri field?

I am not sure I understand the questions.  Maybe the thread starting
here is worth reading:


Otherwise, could you explain in more detail what you have in mind?

>> To me, robustness means making a map from intrinsic values to
>> content, as Disarchive does for instance.
> See above, I don't understand why Disarchive would need more than the
> content hash as an intrinsic value to do so.

Basically nothing more, so nothing to understand. :-)

Your initial message started with:

        when Ricardo recently added guile-aiscm to Guix, I was confused
        that both the version field of the package and the commit field
        of the git-reference used in its origin.  It turns out, that
        this is a rare pattern observed in less than 200 packages
        currently in Guix.  The reason to do so (as far as I understand
        and was explained to me in IRC) is that commit tags are in
        principle mutable and hence can not be relied on when fetching
        sources.  I do have a few issues with that explanation, but
        before that let's go a step back and discuss the relation of
        version and commit.

and my intent was to point out that the reason is not really the
“mutable” part, but rather that it is better to rely on intrinsic values
(discussed in the thread linked above).  Obviously, an intrinsic value is
immutable but, IMHO, an intrinsic value is above all the key for lookup
in content-addressed systems.  The Git commit hash is one way, the SWH-ID
is another, IPFS uses another, GNUnet yet another, etc.  The recent ERIS
[1,2] is an attempt to bridge them, IIUC.
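A toy model of why the intrinsic value is the lookup key, in plain Guile
with hypothetical names: each archive is a table keyed by the Git
commit hash, and since the key means the same thing everywhere, a lookup
can simply fall back from one archive to the next.  A tag only means
something relative to one repository URL, so such a fallback would need
the (URL, tag) pair and someone who recorded what it pointed to at the
time.

```scheme
;; Toy model (hypothetical names): archives as tables keyed by an
;; intrinsic identifier, here the Git commit hash.
(define upstream (make-hash-table))
(define mirror   (make-hash-table))

(define commit "c78b91edb7c17c6fbf3b294452f44e91d75e3c67")

;; Suppose upstream is gone and only the mirror kept the content.
(hash-set! mirror commit "source tree of guile-aiscm 0.23.1")

(define (lookup key archives)
  ;; The same KEY is valid in every archive: try them in order,
  ;; returning #f if none has it.
  (let loop ((archives archives))
    (and (pair? archives)
         (or (hash-ref (car archives) key)
             (loop (cdr archives))))))

(lookup commit (list upstream mirror))
;; => "source tree of guile-aiscm 0.23.1"
```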

Addressing ’origin’ by intrinsic values raises the question of which
ones, and The Right Thing is really hard to predict.

My opinion is that the robust long-term approach (i.e., the near future
I want) is to rely more on intrinsic values in ’source’ or ’origin’ and
less on tags, URLs, etc.  Well, I am fine if we disagree.  You asked
«What do y'all think?»; now you know what I think. :-)

Last, and sorry if I am misunderstanding you, back to your initial message.
You provided ’guile-aiscm’ as one example of something that confused
you.  Instead of the current definition, you would like this definition

--8<---------------cut here---------------start------------->8---
1 file changed, 1 insertion(+), 1 deletion(-)
gnu/packages/machine-learning.scm | 2 +-

modified   gnu/packages/machine-learning.scm
@@ -299,7 +299,7 @@ (define-public guile-aiscm
               (method git-fetch)
               (uri (git-reference
                     (url "")
-                    (commit "c78b91edb7c17c6fbf3b294452f44e91d75e3c67")))
+                    (commit (string-append "v" version))))
               (file-name (git-file-name name version))
--8<---------------cut here---------------end--------------->8---

?  Or something along these lines:

--8<---------------cut here---------------start------------->8---
(define-public guile-aiscm
  (let ((version "0.23.1")
        (commit "c78b91edb7c17c6fbf3b294452f44e91d75e3c67")
        (revision "0"))
    (package
      (name "guile-aiscm")
      (version (git-version version revision commit))
      (source (origin
                (method git-fetch)
                (uri (git-reference
                      (url "")
                      (commit commit)))
                (file-name (git-file-name name version))
--8<---------------cut here---------------end--------------->8---

?  And your point is that “0.23.1” is redundant with
“c78b91edb7c17c6fbf3b294452f44e91d75e3c67” because, with Git, the tag
points to that commit, so why not just use “0.23.1” in ’origin’.  Right?

As things currently stand, I do not think a strong case can be made
for any one of the three main possible definitions (addressing by tag,
by commit, or using ’let’).  The only, weak, justification for
addressing by commit hash is that the lookup is easier when falling back
to SWH, i.e., it is easier when the Git commit hash is known than when
only URL+tag is.
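To illustrate: for revisions archived from Git, the Software Heritage
identifier (SWHID) reuses the Git commit id, so with the commit at hand
the identifier can be computed locally, without asking anything to
anyone; with URL+tag, one first has to find out which commit the tag
pointed to.  A minimal sketch, `commit->swhid` being a hypothetical
helper:

```scheme
;; Hypothetical helper: SWHID of a revision archived from Git.
;; Assumes COMMIT is a full, lowercase, 40-character hex SHA-1.
(define %hex-chars (string->char-set "0123456789abcdef"))

(define (commit->swhid commit)
  (unless (and (= (string-length commit) 40)
               (string-every %hex-chars commit))
    (error "not a full Git commit hash:" commit))
  (string-append "swh:1:rev:" commit))

(commit->swhid "c78b91edb7c17c6fbf3b294452f44e91d75e3c67")
;; => "swh:1:rev:c78b91edb7c17c6fbf3b294452f44e91d75e3c67"
```

A tag such as "v0.23.1" is rejected here, which is precisely the point:
it carries no intrinsic information to build the identifier from.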

These 200 packages can also be seen as real-world experiments
complementing the other ways of addressing in order to find The Right
Way for robust addressing.

My personal preference, for what it is worth, is an explicit reference
to the commit, i.e., the current definition or the ’let’ one.  Note that
this was also discussed: keep convenient things such as url+tag for
’uri’ and use the checksum coupled to an external service as; but then
the definitions would no longer be self-consistent.  Heh, The Right
Thing is not obvious. :-)

In other words, version and tag are currently first-class whereas
commit is second-class, somehow.  As you said, «it allows us to derive
commit from tag» (“tag” is mine).  And I think this is inherited from
the long history of software releases, which is somewhat inadequate
these days.  Obviously, I do not know how to do it, but it should be the
other way around: commit as first-class, which allows us to derive
version as second-class.
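The ’let’ definition above already goes a bit in that direction: Guix's
`git-version`, from (guix git-download), builds the package version
from a base version, a revision number, and the commit.  As far as I
know it boils down to the following; this is a plain-Guile restatement
under the hypothetical name `git-version*`, not the Guix code itself.
Note that the base version "0.23.1" is still an input, so the version
is only partly derived from the commit.

```scheme
;; Plain-Guile restatement (hypothetical name) of what Guix's
;; git-version computes: "<version>-<revision>.<first 7 chars of commit>".
(define (git-version* version revision commit)
  (string-append version "-" revision "." (string-take commit 7)))

(git-version* "0.23.1" "0" "c78b91edb7c17c6fbf3b294452f44e91d75e3c67")
;; => "0.23.1-0.c78b91e"
```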

1: <>
2: <>


PS: You said in your initial email «(1) is more convenient; it allows us
to derive commit from version, which is often done through an affine
mapping».

I do not understand the “affine mapping”.  Why would it be an affine
mapping?  I do not see what the affine space would be here: I can
imagine the set, but what would the vector space be?  Bah, you are
probably referring to maths I have never studied. :-)
