guix-devel

Re: On raw strings in <origin> commit field


From: Liliana Marie Prikler
Subject: Re: On raw strings in <origin> commit field
Date: Sat, 01 Jan 2022 20:52:25 +0100
User-agent: Evolution 3.42.1

Hi Timothy,

On Saturday, 01.01.2022 at 12:45 -0500, Timothy Sample wrote:
> If you want a concrete example to think through, there’s ‘eclib’. 
> Our package says it’s version “20190909”, but that’s not what
> upstream calls version “20190909”.  It looks like when we packaged
> ‘eclib’, that tag pointed to commit
> 19e7e3e74268bf78bd9a1c4ba07597d5434fb166, but now
> it points to bfbbd7c414521e1bf5e718a2925ea8ad845a2e87.
> 
> If you try to build ‘eclib’, everything will work great, since we can
> grab the checkout from our servers.  If you use
> 
>     $ guix build --check -S eclib
> 
> you get a hash mismatch.  We have CI jobs for sources, but they
> aren’t checking this: <https://ci.guix.gnu.org/build/319/details>. 
> That job succeeds after downloading the checkout from our servers.
Within the robustness framework being discussed here, this is only
partially robust.  If you have substitutes disabled, then a normal
`guix build -S eclib' also fails, and if CI eventually garbage collects
the source, the same happens for everyone.

If we simply hardcoded the hash, on the other hand, none of that would
happen at all; you couldn't even use `guix build --check -S' as an
oracle.

> There are two things I can highlight from this case.
> 
> First, as expected, finding the original commit was painful.  SWH did
> not record the old version of the tag.  Comparing it with the
> checkout from our servers showed that the differences were very
> minor.  With that in mind, I moved backwards through the commit
> history with ‘guix hash’ until I found a match.  As pointed out many
> times, if I had the original commit, I could just ask SWH for it
> directly.
> 
> Second, these cases are very, very rare.  (I’ve essentially checked
> every Git origin since Guix version 1.0.0, and this problem is not
> one that worries me).  “Tricking Peer Review”-style problems seem to
> be much more prevalent.  When tracking down a “difficult” Git origin,
> the first thing I do is grep the Guix Git history for a “oops I
> committed the wrong hash” message.  I recommend we focus our energies
> there before worrying too much about replacing tags with commits or
> using both or whatever.
Since you are our expert on preservation, would you mind if I asked you
for some estimates?  How painful is it to track down such commits in
general?  Would it be easier if you recorded tag → commit
(alternatively file-name x sha256 → SWHID) maps periodically (or do you
already have such a map, and these cases arise while creating it)?  And
how many “Tricking Peer Review”-style problems do you think are
currently around?
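
To make the tag → commit idea a bit more concrete, here is a minimal
sketch of what recording such a map could look like.  It assumes a POSIX
shell with awk and the usual `git ls-remote --tags' output format (object
ID, a tab, then the ref name, with annotated tags additionally listed as
a peeled `^{}' entry).  The name `tag_commit_map' is made up for
illustration; it is not an existing Guix or SWH tool.

```shell
# Hypothetical helper: read `git ls-remote --tags` output on stdin and
# print one "TAG COMMIT" line per tag, sorted by tag name.
tag_commit_map() {
    awk -F '\t' '
        index($2, "refs/tags/") == 1 {
            name = substr($2, 11)                 # drop the "refs/tags/" prefix
            if (substr(name, length(name) - 2) == "^{}") {
                # Peeled entry: the commit an annotated tag points to.
                name = substr(name, 1, length(name) - 3)
                peeled[name] = 1
                commit[name] = $1
            } else if (!(name in peeled)) {
                # Lightweight tag, or the tag object of an annotated tag
                # (overwritten if a peeled entry follows).
                commit[name] = $1
            }
        }
        END { for (n in commit) print n, commit[n] }
    ' | sort
}
```

Saving the output of `git ls-remote --tags "$url" | tag_commit_map' on a
schedule (with `$url' standing in for the upstream repository) and
diffing successive snapshots would flag a tag that has moved, as in the
‘eclib’ case.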

> > > Regarding "Tricking Peer Review": I think it would be ideal for
> > > package definitions to include both the git tag _and_ the git
> > > commit hash, and to teach our linter to raise an alarm when the
> > > expected tags are missing or fail to match the expected commit
> > > hash.
> > 
> > That is among the solutions I've proposed here, so naturally I'd be
> > fine with it.
> 
> Given what I wrote above, maybe we could start by updating the linter
> so that ‘check-source’ actually checks that it gets the right result.
> Right now it uses a few heuristics to check that the result looks
> okay (for instance, it checks if the result is suspiciously small). 
> Maybe it should just go through the whole download process and verify
> the hash?  Alternatively (or additionally), the CI “source”
> specification could be configured to avoid using our servers as a
> fallback when checking sources.
I think substitutes should be disabled for the source download of a
"check-source".  Even if a substitute or SWH fallback exists, that's
not what we want to check here, no?

> I agree that adding more identifiers (commit hashes or whatever)
> makes things more robust, but the cost is more work when creating,
> updating, and reviewing packages.  I think we should start by
> verifying the identifiers we already have (i.e., checking that the
> URI and method of the origin produce the right output).  It would
> solve many existing problems and would serve as a nice foundation for
> future improvements.
Is this something we can reasonably expect our current CI, or CI in
general, to handle (assuming we tweaked the linter to behave as you
intend)?  Or would it make more sense to implement this as a
weekly/monthly cronjob?

> And as a bonus, if you want to be really kind to future time
> travellers, when fixing an errant hash, please include a nice hint as
> to what the original hash was for (like a commit hash).  We have
> commit ca5a791f6285b08506ccd662d5911ccf0c4d1ece in our repo, which
> says:
> 
> > The previous hash was from the "dev" branch of the repository.
> 
> I can’t find the source for the previous hash, and if I could
> actually travel through time, I would change the commit message to:
> 
> > The previous hash was from commit abcd0123..., which comes from the
> > "dev" branch of the repository.
+1 from me for useful commit messages.



