guix-devel

Preservation of Guix Report 2021-12-06


From: Timothy Sample
Subject: Preservation of Guix Report 2021-12-06
Date: Mon, 06 Dec 2021 14:59:20 -0500
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

Hi Guix,

This is an update to the preservation of Guix report.  There are no new
commits or fixed-output derivations in this report, but I spent some
time cleaning up the results, and I think the improvements are worth
sharing.  The last report generated a lot of questions.  This one
doesn’t answer all of them, but it’s a big improvement:

    <https://ngyro.com/pog-reports/2021-12-06/>

Since the last report, I added many more reference categories and moved
them to the database.  The new categories are 'hg', 'svn', 'cvs', 'bzr',
'tar-bz2', 'tar', 'zip', and 'text'.  Of these, only 'tar' and 'text'
are being processed.  The rest are currently unsupported by my scripts.
Moving the categories to the database allows me to make manual
corrections when needed.  It also encouraged me to look through the
references a bit more carefully to track down some of the weirder 'text'
sources (like Bash patches) and fix up others whose URLs do not end in a
recognizable file extension (in the style of “/tar_gz?download=yes”).
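As a rough illustration of why manual corrections help, here is a minimal Python sketch of suffix-based categorization with an override table. It is not the actual scripts: the rule list, the 'tar-gz' category name, the example.org URL, and the `categorize` function are all assumptions made up for this example, and the dictionary merely stands in for rows in the database.

```python
from urllib.parse import urlparse

# Suffix rules, checked in order (longer suffixes first).
# Hypothetical: the real scripts may classify references differently.
SUFFIX_RULES = [
    (".tar.bz2", "tar-bz2"),
    (".tar", "tar"),
    (".zip", "zip"),
    (".patch", "text"),
]

# Hand-made corrections, standing in for rows in the database.
# The URL and the 'tar-gz' category name are illustrative guesses.
MANUAL_OVERRIDES = {
    "https://example.org/project/tar_gz?download=yes": "tar-gz",
}


def categorize(url):
    """Return a category for URL, or None if nothing matches."""
    # A manual correction always wins over the automatic rules.
    if url in MANUAL_OVERRIDES:
        return MANUAL_OVERRIDES[url]
    # Only look at the path, so query strings do not hide the suffix.
    path = urlparse(url).path
    for suffix, category in SUFFIX_RULES:
        if path.endswith(suffix):
            return category
    return None
```

A URL like the “/tar_gz?download=yes” one has no usable suffix, so without the override it would come back as None.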

I also made the fetching code more tenacious.  Now it uses the
content-addressed mirrors from Guix and Nix to find regular files, and
will recover “easy” Git references from SWH (“easy” means the commit is
specified).
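The “try harder” idea can be sketched as a simple fallback chain. This is only an illustration in Python, not the real fetching code: `fetch_with_fallbacks` and the two stub fetchers are hypothetical names, and the stubs stand in for the origin URL and a content-addressed mirror.

```python
def fetch_with_fallbacks(ref, fetchers):
    """Try each (name, fetch) pair in order.

    Return (name, data) for the first fetcher that yields data,
    or None if every fetcher fails or comes back empty.
    """
    for name, fetch in fetchers:
        try:
            data = fetch(ref)
        except Exception:
            # A failing source is not fatal; move on to the next one.
            continue
        if data is not None:
            return name, data
    return None


# Stub fetchers for demonstration only (hypothetical behaviour).
def origin_fetch(ref):
    raise OSError("upstream is gone")


def mirror_fetch(ref):
    # Pretend the mirror holds exactly one file, keyed by its hash.
    return {"sha256:abc": b"hello"}.get(ref)


result = fetch_with_fallbacks(
    "sha256:abc",
    [("origin", origin_fetch), ("mirror", mirror_fetch)],
)
```

A reference that no source can provide returns None, which is roughly what the “fetch” failure category further down counts.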

Between improving the fetching code and adding 'tar' and 'text'
processing, I’ve computed another 2.5K SWHIDs.  We now have SWHIDs for
86% of our fixed-output derivations.  There are only 51 “unknown”
non-recursive Git sources now (the list is attached).
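For readers unfamiliar with SWHIDs: for a single file (which is what a 'text' source amounts to), the content SWHID is the Git blob hash of the bytes with an “swh:1:cnt:” prefix. The Python sketch below shows that one case only. Whether the scripts compute it this way is an assumption, and the `content_swhid` name is made up. Directories and tarballs need the more involved directory identifier, which is not shown.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Return the SWHID of a single file's contents.

    The identifier is the SHA-1 of a Git blob object, that is,
    the header "blob <length>\\0" followed by the raw bytes.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    digest = hashlib.sha1(header + data).hexdigest()
    return "swh:1:cnt:" + digest


# The empty file has the well-known Git blob hash e69de29b...
empty = content_swhid(b"")
```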

But that’s not all!

The scripts now categorize failures, so we have a better idea of what’s
going on with the 14% of sources that are still “unknown”:

  no-ref:        13
  disarchive:   863
  fetch:       1262
  bail:        3324
  -----------------
  total:       5462

The “bail” category is all the stuff my scripts don’t yet process, like
Mercurial repositories and bzip2 tarballs.

The “fetch” category is everything the scripts couldn’t track down.

The “disarchive” category is all the tarballs Disarchive failed to
process.  An interesting thing here is that most of them are from Cargo.
Long story short: older versions of Cargo used the “miniz”
implementation of DEFLATE (rewritten in Rust) to compress tarballs.
Disarchive doesn’t support this (yet...?).  There are 686
old-Cargo-produced tarballs in the “disarchive” category.

The “no-ref” category covers a few fixed-output derivations used in
bootstrapping that do not come from an origin record.  I will probably
just load them by hand eventually.
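The breakdown above can be double-checked with a few lines of Python. The counts are copied from this message. The variable names are only for illustration.

```python
# Failure counts as listed in the table above.
failures = {
    "no-ref": 13,
    "disarchive": 863,
    "fetch": 1262,
    "bail": 3324,
}

total = sum(failures.values())

# Print each category with its share of all "unknown" sources.
for category, count in failures.items():
    print(f"{category:<12}{count:>6}{count / total:>8.1%}")
print(f"{'total':<12}{total:>6}")

# Old-Cargo tarballs as a fraction of the "disarchive" failures.
old_cargo = 686
cargo_share = old_cargo / failures["disarchive"]
print(f"old Cargo share of disarchive failures: {cargo_share:.1%}")
```

Going by these numbers, “bail” is the largest group by a wide margin, and old-Cargo tarballs make up roughly four fifths of the “disarchive” failures.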

(In the future I hope to put some of this in the report itself.)

One last thing to add is that the SWH folks were very quick to fix the
loading error, so the increase in missing sources for recent commits is
now gone.


-- Tim

Attachment: git-missing.txt
Description: Text document

