Re: On raw strings in <origin> commit field

guix-devel


From:	Liliana Marie Prikler
Subject:	Re: On raw strings in <origin> commit field
Date:	Fri, 31 Dec 2021 16:19:55 +0100
User-agent:	Evolution 3.42.1

Hi,

On Friday, 31.12.2021 at 14:15 +0100, zimoun wrote:
> [...]
> Version is also Guix specific.  Sometimes, we patch; for security
> reasons, for fixing a bug, for quickly backporting something, for
> removing non-free bits, for unbundling stuff, for making work with
> the rest of Guix packages or for whatever other reasons – or we apply
> some options for building specifically for Guix.  Then, the version
> “1.2.3” is not always changed and therefore it does not necessary
> correspond to what upstream refers as “1.2.3”, or what Debian calls
> “1.2.3”, etc.
I think this is generally what you expect from a distro.  From my
personal experience with Debian-based distros and Gentoo, they do
sometimes need to bump revisions when they update their own patches,
but we do have the upper hand here with `guix describe', which records
the exact channel commits in use.

> [...]
> 
> On a side note, I miss why using commit hash is an issue for
> ’git-fetch’ – despite the fact of content-address advantages – when
> it seems not for ’svn-fetch’ as in:
> 
> --8<---------------cut here---------------start------------->8---
>     (version "0.5.1")
>     (source
>      (origin
>        (method svn-fetch)
>        (uri (svn-reference
>              (url (string-append
>                    "https://code.call-cc.org/svn/chicken-eggs/"
>                    "release/5/srfi-1/tags/"
>                    version))
>              (revision 39055)
>              (user-name "anonymous")
>              (password "")))
>        (file-name (string-append "chicken-srfi-1" version
>                                  "-checkout"))
>        (sha256
>         (base32
>          "02940zsjrmn7c34rnp1rllm2nahh9jvszlzrw8ak4pf31q09cmq1"))))
> --8<---------------cut here---------------end--------------->8---
> 
> or other example
> 
> --8<---------------cut here---------------start------------->8---
>   (let ((revision 505)
>         (release "1.09.01"))
>     (package
>       (name "fullswof-2d")
>       (version release)
>       (source (origin
>                (method svn-fetch)
>                (uri (svn-reference
>                      (url (string-append
>                            "https://subversion.renater.fr/"
>                            "anonscm/svn/fullswof-2d/tags/"
>                            "release-" version))
>                      (revision revision)))
>                (file-name (string-append "fullswof-2d-" version
>                                          "-checkout"))
>                (sha256
>                 (base32
>                  "16v08dx7h7n4wyddzbwimazwyj74ynis12mpjfkay4243npy44b8"))))
> --8<---------------cut here---------------end--------------->8---
> 
> I let aside the readability point for git-fetch or any others since
> it is only habits or more precisely collective conventions and a bit
> of personal preferences. :-).
I don't think the SVN comparison here is fair.  For one, the examples
reference the tagged revision doubly -- once by revision, once by tag.
We don't have a way of doing that for git (currently).  Plus, an SVN
revision does have more intrinsic meaning than a git hash: revision
numbers increase over time, so they at least tell you which is newer.

> When we speak about robustness and long-term, the issue is the field
> ’uri’.  Having something extrinsic, i.e., which does not depend on
> the content, as URL+tag or URL+revision or just URL leads to fragile
> fetching methods depending on the Moon phase.
> 
> What Disarchive is currently doing for url-fetch is somehow to index
> by integrity field, depending only on the content itself (sha256;
> usually not using nix-base32 format referred as ’base32’ in ’origin’
> but instead ’base16’ format, whatever).  In short and quickly said,
> Disarchive-DB does 2 things more or less, first it somehow maps from
> this integrity hash to swhid hash allowing to lookup in SWH archive
> and fetches the data, and second it stores metadata, indexed by
> integrity field, allowing to reassemble the content = data +
> metadata.
> 
> We were discussing to do this strategy for all the fetching methods.
> And potentially add more than swhid hash as content-address systems;
> somehow.
You're also missing the part in which it currently relies on a single
server to do all this, but there are plans to move it out to multiple
ones, i.e. adding fallbacks/redundancy to your fallback mechanism,
which for the record is a good idea to have.

> All the robustness now relies on the availability of the Disarchive
> service.  Based on this context, what I miss in all the discussion is
> that Git owns a built-in solution (commit hash) and the arguments for
> not using it appears to me weak considering the easy advantage it
> brings.
> 
> It is a difficult topic to know what information the ’uri’ field
> should contain for robust long-term; a topic with a lot of unknowns,
> although many solutions are around, they are a strong change of
> habits and changing my own habits is already hard, so a collective
> change is a big collective challenge. :-)
We're going back to Cantor's argument for raw commits.  I'm not opposed
to using commits as value of the commit field (let-bound commits
reflected in the version, that is), but let's not forget that this
robustness argument still presupposes that the (commit tag) binding is
the point of failure.  This probably holds to some degree for
"npm-something", but we also have a fair amount of e.g. GNOME-related
packages which we trust to have robust tags and the only reason we
don't use mirror://gnome to refer to them is because it's not in GNOME
mirrors (yet). 
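
For illustration, a let-bound commit that is reflected in the version
might look roughly like the sketch below.  This is only a sketch: the
name, URL, base version, revision, commit and hash are all made-up
placeholders (the hash is the usual all-zero dummy), so it evaluates to
an origin but does not describe any real package.

```scheme
(use-modules (guix packages)
             (guix git-download))

;; Hypothetical example: name, URL, version, commit and hash are
;; placeholders, not taken from any real package.
(define %example-source
  (let ((commit "0123456789abcdef0123456789abcdef01234567")
        (revision "1"))
    (origin
      (method git-fetch)
      (uri (git-reference
            (url "https://example.org/foo.git")
            (commit commit)))
      ;; git-version combines base version, revision and the first
      ;; seven characters of the commit: "1.2.3-1.0123456".
      (file-name (git-file-name "foo"
                                (git-version "1.2.3" revision commit)))
      (sha256
       (base32 "0000000000000000000000000000000000000000000000000000")))))
```

The raw hash appears exactly once, in the let binding, and the version
string makes it visible which commit a given build came from.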

> For instance, SWH promotes swhid instead of DOI for referencing the
> publications.  I am not sure it is really popular outside a small
> French subgroup. ;-)
Completely off-topic, but isn't part of the point of DOIs that you can
fetch the revised paper as well?  I can understand putting OpenData
behind an SWH ID rather than a DOI, but the paper itself?  Why?

> Somehow, find some rationale –readability, matching versions, etc.–
> and then find counter-measures of their flaws to keep extrinsic
> values –tag, revision, etc.– is, for what my opinion is worth, not
> the correct level or frame when thinking about robustness and long-
> term.
For what it's worth, I don't think content addressing everything
(particularly relying on a single service to do so) is robust in the
long term; it just introduces larger failure points.  The only reliable
way of increasing robustness is to add more fallbacks and redundancies
(and actually use them).
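
As a rough sketch of the fallback idea in plain Guile (this is not how
Guix's downloaders are implemented): each source is tried in turn, and
a result only counts if it passes an integrity check.  The procedure
name `fetch-with-fallbacks' and the thunks in the usage example are
hypothetical, made up for illustration.

```scheme
(use-modules (srfi srfi-1))

;; Hypothetical helper.  FETCHERS is a list of thunks that each return
;; the fetched content or raise an error; VALID? is the integrity check.
;; Returns the first content that passes VALID?, or #f if none does.
(define (fetch-with-fallbacks fetchers valid?)
  (any (lambda (fetch)
         ;; A failing source must not abort the whole chain.
         (let ((content (false-if-exception (fetch))))
           (and content (valid? content) content)))
       fetchers))

;; Usage: the first source is down, the second serves wrong content,
;; the third one succeeds.
(define result
  (fetch-with-fallbacks
   (list (lambda () (error "upstream is down"))
         (lambda () "corrupted")
         (lambda () "payload"))
   (lambda (content) (string=? content "payload"))))
;; result => "payload"
```

The order of the list is the policy: upstream first, mirrors and
archives after, with the content hash deciding whether a result is
acceptable regardless of where it came from.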

Cheers
