guix-devel

Re: On raw strings in <origin> commit field


From: zimoun
Subject: Re: On raw strings in <origin> commit field
Date: Fri, 31 Dec 2021 14:15:11 +0100

Hi all,

On Fri, 31 Dec 2021 at 10:31, Ricardo Wurmus <rekado@elephly.net> wrote:

> I have no strong feelings for or against any of the proposed options.  I
> think that using raw commits might not be great for our tooling because
> we’re not reusing an existing version string and would need to remember
> to update the raw commit as well.  But other than that I don’t find the
> raw commit to introduce readability problems for humans.

By tooling, Ricardo, do you mean the ’importers’ and other ’updaters’?


Well, a general minor comment about readability and metadata.  The
anatomy of a package is:

--8<---------------cut here---------------start------------->8---
(define-public a-symbol
  (package
    (name "a-name")
    (version "1.2.3")
    (source (origin
              (method git-fetch)
              (uri (git-reference
                    (url "https://an-url.somewhere")
                    (commit ????)))
              (file-name (git-file-name name version))
              (sha256
               (base32
                "09rdbcr8dinzijyx9h940ann91yjlbg0fangx365llhvy354n840"))))
    (build-system gnu-build-system)
    (home-page "https://another-url.somewhere")
    (synopsis "Guile extension for numerical arrays and tensors")
    (description "AIscm is a Guile extension for numerical arrays and tensors.
Performance is achieved by using the LLVM JIT compiler.")
    (license license:gpl3+)))
--8<---------------cut here---------------end--------------->8---

Here, a-symbol, a-name, home-page, synopsis and description are
Guix-specific.  They are metadata added by Guix packagers.

Version is also Guix-specific.  Sometimes we patch: for security
reasons, for fixing a bug, for quickly backporting something, for
removing non-free bits, for unbundling stuff, for making it work with
the rest of the Guix packages, or for whatever other reason; or we
apply some build options specific to Guix.  In those cases, the version
“1.2.3” is not always changed, so it does not necessarily correspond to
what upstream calls “1.2.3”, or what Debian calls “1.2.3”, etc.

The ’version’ field is Guix-specific, at the same metadata level as
’name’, ’home-page’, ’synopsis’ or ’description’.  In other words, these
fields depend only on choices made by the Guix packagers.


The ’origin’ part, however, is not Guix-specific.  It is only
upstream-specific.

Obviously, as a package distributor, we make the Guix-specific
’version’ match upstream’s version as closely as possible, most of the
time through Git tags.  The tag is upstream-specific: sometimes it is
“v1.2.3”, sometimes “1.2.3”, sometimes “release-1.2.3”, sometimes
“r1.2.3”, or whatever else.  We often map ’version’ to the tag using
’string-append’.
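To make the two options discussed in this thread concrete, here is a sketch of only the ’uri’ part of a git-fetch origin, written both ways.  The URL is the placeholder from the package above, the “v” prefix is just one of the upstream tag conventions listed, and the commit hash is a made-up placeholder, not a real commit.

```scheme
;; Option 1: the commit field is derived from the Guix-side 'version'.
;; A tag is an extrinsic name: upstream can move or delete it.
(uri (git-reference
      (url "https://an-url.somewhere")
      (commit (string-append "v" version))))   ; e.g. "v1.2.3"

;; Option 2: the commit field is a raw commit hash, an intrinsic,
;; content-derived identifier.  Hypothetical placeholder hash below.
(uri (git-reference
      (url "https://an-url.somewhere")
      (commit "0123456789abcdef0123456789abcdef01234567")))
```

With option 2, nothing ties the commit to ’version’ any more, which is the tooling concern quoted above: whoever updates the package has to change both fields.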

For methods other than git-fetch, we also use a mapping, but from
’version’ to a URL, from ’version’ to a ’changeset’, from ’version’ to
a ’revision’, etc.

On a side note, I do not understand why using a commit hash is
considered an issue for ’git-fetch’, despite its content-addressing
advantages, when a raw revision does not seem to be an issue for
’svn-fetch’, as in:

--8<---------------cut here---------------start------------->8---
    (version "0.5.1")
    (source
     (origin
       (method svn-fetch)
       (uri (svn-reference
             (url (string-append
                   "https://code.call-cc.org/svn/chicken-eggs/"
                   "release/5/srfi-1/tags/"
                   version))
             (revision 39055)
             (user-name "anonymous")
             (password "")))
       (file-name (string-append "chicken-srfi-1-" version "-checkout"))
       (sha256
        (base32
         "02940zsjrmn7c34rnp1rllm2nahh9jvszlzrw8ak4pf31q09cmq1"))))
--8<---------------cut here---------------end--------------->8---

or another example:

--8<---------------cut here---------------start------------->8---
  (let ((revision 505)
        (release "1.09.01"))
    (package
      (name "fullswof-2d")
      (version release)
      (source (origin
               (method svn-fetch)
               (uri (svn-reference
                     (url (string-append "https://subversion.renater.fr/"
                                         "anonscm/svn/fullswof-2d/tags/"
                                         "release-" version))
                     (revision revision)))
               (file-name (string-append "fullswof-2d-" version "-checkout"))
               (sha256
                (base32
                 "16v08dx7h7n4wyddzbwimazwyj74ynis12mpjfkay4243npy44b8"))))
--8<---------------cut here---------------end--------------->8---


I leave aside the readability point for git-fetch or any other method,
since it is only a matter of habits, or more precisely collective
conventions, plus a bit of personal preference. :-)


When we speak about robustness and the long term, the issue is the
’uri’ field.  Having something extrinsic, i.e., something that does not
depend on the content, such as URL+tag, URL+revision or just a URL,
leads to fragile fetching methods that depend on the phase of the Moon.

What Disarchive currently does for url-fetch is, roughly, to index by
the integrity field, which depends only on the content itself (the
sha256; usually not in the nix-base32 format that ’origin’ calls
’base32’, but in ’base16’ format, whatever).  In short, Disarchive-DB
does two things, more or less: first, it maps this integrity hash to an
SWHID, which allows looking up and fetching the data from the SWH
archive; second, it stores metadata, indexed by the integrity field,
which allows reassembling the content = data + metadata.
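As a rough mental model only, the two roles of Disarchive-DB described above can be sketched in Guile as a table keyed by the integrity hash.  This is not Disarchive’s actual code or data format: the record type, the procedure names, the placeholder hash and SWHID strings, and the metadata alist are all hypothetical.

```scheme
;; Hypothetical model of a content-addressed index in the spirit of
;; Disarchive-DB.  Not Disarchive's real API or storage format.
(use-modules (srfi srfi-9))   ; define-record-type

;; One entry: where to fetch the bare data (an SWHID) plus the
;; metadata needed to reassemble the original content.
(define-record-type <entry>
  (make-entry swhid metadata)
  entry?
  (swhid    entry-swhid)
  (metadata entry-metadata))

;; Keys are integrity hashes (e.g. a base16 sha256 string).
;; 'hash-set!' and 'hash-ref' compare keys with equal?, so strings work.
(define %index (make-hash-table))

(define (index-add! integrity swhid metadata)
  (hash-set! %index integrity (make-entry swhid metadata)))

(define (index-lookup integrity)
  ;; Returns the entry, or #f when the hash is unknown.  The key comes
  ;; from the content alone: no URL, tag or revision is involved.
  (hash-ref %index integrity))

;; Placeholder values, for illustration only.
(index-add! "sha256-placeholder"
            "swh:1:dir:placeholder"
            '((compression . gzip)))

(let ((entry (index-lookup "sha256-placeholder")))
  (when entry
    (display (entry-swhid entry))
    (newline)))
```

The point of the sketch is that the lookup key never mentions where the content used to live, so a dead URL or a moved tag does not break it; only the availability of the index itself matters.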

We were discussing applying this strategy to all the fetching methods,
and potentially adding content-address systems other than SWHID.

All the robustness now relies on the availability of the Disarchive
service.  In this context, what I miss in the whole discussion is that
Git has a built-in solution (the commit hash), and the arguments for not
using it appear weak to me, considering the easy advantage it brings.

Knowing what information the ’uri’ field should contain for long-term
robustness is a difficult topic with a lot of unknowns.  Although many
solutions are around, they imply a strong change of habits; changing my
own habits is already hard, so a collective change is a big collective
challenge. :-)

For instance, SWH promotes SWHIDs instead of DOIs for referencing
publications.  I am not sure it is really popular outside a small French
subgroup. ;-)

Finding some rationale (readability, matching versions, etc.) and then
finding countermeasures to its flaws in order to keep extrinsic values
(tag, revision, etc.) is, for what my opinion is worth, not the right
level or frame for thinking about robustness and the long term.


Cheers,
simon


