guix-devel

Re: On raw strings in <origin> commit field

From: Liliana Marie Prikler
Subject: Re: On raw strings in <origin> commit field
Date: Sat, 01 Jan 2022 02:33:13 +0100
User-agent: Evolution 3.42.1

```
On Friday, 31.12.2021 at 18:36 -0500, Mark H Weaver wrote:
> Hi Liliana,
>
> Liliana Marie Prikler <liliana.prikler@gmail.com> writes:
> > In my personal opinion, the version+raw commit style can be
> > discredited using Cantor's diagonal argument.
>
> You've mentioned Cantor's diagonalization argument at least twice in
> this thread so far, but although I'm familiar with that kind of
> argument for showing that certain sets are uncountable, I don't
> understand how it applies here.  Can you please elaborate?
Okay, so let's write out the full argument.  At a certain date, we
package or update P to version V through reference to tag T (at commit
C).  Because we can't trust T to remain valid, we only keep (V, C)
around for "robustness".

Now notice how version V is generated by referring to T.  Without loss
of generality, assume that T is invalidated; otherwise there is nothing
to prove.  Since V is created through reference to T, it is also
invalidated as being the canonical V, whichever it is.  A similar
argument can be made for C as well.  So both V and C are invalidated,
and the only thing we can claim is "yeah, upstream did tag that T at
some point".

Let us now assume that T is never invalidated.  In this case (V, C)
remains robust for all observable time, but so would (V, T).  Hence
there is no robustness to be gained in this scenario.

Now what if we were to instead define V' := (B, N, C') with N being a
number to order the different Cs under B and C' being the first few
bytes of C.  Since V' clearly points to C, there is a clear link
established between the two even if T is lost at some point and we
coincidentally have B := clean(T) for some cleaning function clean.

Now obviously V' is exactly what git-version does, and there are some
problems with it if we move back to the real world.  For one, I don't
think our updater would currently detect that upstream moved T to a
newer commit, whereas using the tag as the commit makes us notice
breakages loudly (too loudly, as some argue, hence the move away from
it).
However, since I'm a "people first, machines second" girl, I am willing
to ignore this minor inconvenience and take the robustness if that's
the extent of the issues it brings.

To state something that probably hasn't gotten enough attention here:
my main problem is not that we are adding robustness by using commits
in the commit field more often; my problem is that we're using raw
commits when the version field would suggest we're using a tag.  One
could object that long versions would become unreadable.  I consider
that largely a non-issue on the command line, but assuming that it is
an issue, I did provide other potential solutions.

So the main question here is: Do we really want raw strings in the
commit field?  Is there no better way of providing "robustness"?

```
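
To make the V' := (B, N, C') construction above concrete, here is a minimal sketch in Python. It only models the idea and is not Guix code: the `B-N.C'` string layout with a 7-character commit prefix is meant to resemble what git-version produces, while `clean`, `git_style_version`, `points_to`, and the sample commit hash are all hypothetical names and values chosen for illustration.

```python
def clean(tag: str) -> str:
    """Hypothetical cleaning function: derive a base version B from a tag T
    by stripping a leading 'v' (real tags may need other rules)."""
    return tag[1:] if tag.startswith("v") else tag


def git_style_version(base: str, revision: int, commit: str,
                      prefix_len: int = 7) -> str:
    """Build V' = (B, N, C') as the string 'B-N.C''.

    base     -- B, the base version
    revision -- N, a counter ordering the different commits under B
    commit   -- C, the full commit hash; C' is its first prefix_len characters
    """
    if len(commit) < prefix_len:
        raise ValueError("commit hash is shorter than the requested prefix")
    return f"{base}-{revision}.{commit[:prefix_len]}"


def points_to(version: str, commit: str) -> bool:
    """Check that the commit prefix C' embedded in V' matches the commit C."""
    prefix = version.rsplit(".", 1)[1]
    return commit.startswith(prefix)


# Made-up 40-character commit hash, for illustration only.
commit = "0123456789abcdef0123456789abcdef01234567"
version = git_style_version(clean("v1.2.3"), 0, commit)
print(version)                      # the version string carrying C'
print(points_to(version, commit))   # the link V' -> C survives without T
```

The point of the sketch is that the version string alone is enough to check which commit it belongs to, so the link between version and commit does not depend on the tag still existing upstream.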