Re: Preservation of Guix Report
From: zimoun
Subject: Re: Preservation of Guix Report
Date: Thu, 21 Oct 2021 09:39:27 +0200
Hi Timothy,
On Wed, 20 Oct 2021 at 15:48, Timothy Sample <samplet@ngyro.com> wrote:
> Early this summer I did a bunch of work trying to figure out which Guix
> sources are preserved by the SWH archive. I’m finally ready to share
> some preliminary results!
>
> https://ngyro.com/pog-reports/2021-10-20/
Cool! Really interesting.
> What’s cool is that the report is automated. Next on my list is to
> update the database and generate a new report. Then, we can compare the
> results and see if we are improving. (My read on the results so far is
> that improving “sources.json” will yield big improvements, but we might
> not be able to get to that before the next report.)
Here are two minor comments:
1. For a couple of days now, I have been running:
$ GUIX_SWH_TOKEN=$TOKEN guix lint -c archival
where $TOKEN is provided by the SWH Authentication service [1].
With the token, the rate limit is 1200 instead of 120. Therefore, more
’git-fetch’ packages are added. I am in the process of automating
that, but do not hold your breath. :-)
2. For reasons still unknown, the bridge between SWH and Disarchive has
some holes. For instance:
$ guix lint -c archival znc
gnu/packages/messaging.scm:996:12: znc@1.8.2: Disarchive entry refers
to non-existent SWH directory '33a3b509b5ff8e9039626d11b7a800281884cf2a'
$ wget https://guix.gnu.org/sources.json
$ cat sources.json | jq | grep znc
"integrity": "sha256-IwbxlQzncsWlmlf1SG1Zu5yrmEl8RfxJy8RawN7BGbs="
"integrity": "sha256-q0jatpd+j0PW//szIo0ViGX2jd5wJtEjxpPXcznc8rs="
"https://znc.in/releases/archive/znc-1.8.2.tar.gz"
$ guix download https://znc.in/releases/archive/znc-1.8.2.tar.gz
Starting download of /tmp/guix-file.hnjWTE
From https://znc.in/releases/archive/znc-1.8.2.tar.gz...
znc-1.8.2.tar.gz 2.0MiB 599KiB/s
00:03 [##################] 100.0%
/gnu/store/58khbiwp2ghhzg00gnzdy2jlfv49vajm-znc-1.8.2.tar.gz
03fyi0j44zcanj1rsdx93hkdskwfvhbywjiwd17f9q1a7yp8l8zz
Therefore, something is wrong somewhere. Thanks to #1, I am detecting
many such examples. I do not know whether the SWH-ID computed by
Disarchive is incorrect or whether SWH has not ingested the content.
Investigation required. :-)
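
A side note on the transcript above: `grep znc` matches any line that contains those three letters, and the two "integrity" lines match only because their base64 text happens to contain "znc". Below is a minimal sketch of a stricter lookup, together with a conversion from the "sha256-<base64>" form to the nix-base32 form printed by `guix download`, so the two hashes can be compared. It assumes that sources.json has a top-level "sources" list whose entries carry "urls" and "integrity" keys; that layout is an assumption, and all function names are made up for illustration.

```python
import base64
import json

# Alphabet of the nix-base32 encoding (no e, o, t, u).
NIX_BASE32_ALPHABET = "0123456789abcdfghijklmnpqrsvwxyz"


def nix_base32(digest: bytes) -> str:
    """Encode raw digest bytes in nix-base32 (5 bits per character)."""
    length = (len(digest) * 8 - 1) // 5 + 1
    chars = []
    for n in range(length - 1, -1, -1):
        i, j = divmod(n * 5, 8)
        value = digest[i] >> j
        if i + 1 < len(digest):
            value |= digest[i + 1] << (8 - j)
        chars.append(NIX_BASE32_ALPHABET[value & 0x1F])
    return "".join(chars)


def sri_to_nix_base32(integrity: str) -> str:
    """Convert 'sha256-<base64>' to the nix-base32 text of the same digest."""
    algo, _, b64 = integrity.partition("-")
    if algo != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algo!r}")
    return nix_base32(base64.b64decode(b64))


def load_sources(path: str) -> dict:
    """Read a downloaded sources.json file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def entries_for_url(sources: dict, needle: str) -> list:
    """Return entries with at least one URL containing `needle`.

    Assumes the hypothetical layout {"sources": [{"urls": [...], ...}]}.
    """
    return [
        entry
        for entry in sources.get("sources", [])
        if any(needle in url for url in entry.get("urls", []))
    ]
```

With the downloaded file, `entries_for_url(load_sources("sources.json"), "znc-1.8.2.tar.gz")` would select only the entries whose URL mentions the tarball, and `sri_to_nix_base32(entry["integrity"])` gives a string that can be compared directly with the hash printed by `guix download`.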
1: <https://archive.softwareheritage.org/api/>
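
To tell the two hypotheses apart for a given package, one can ask the API at [1] whether the directory named in the lint message exists. The sketch below assumes the `/api/1/directory/<sha1_git>/` endpoint and the "Authorization: Bearer" header described in the SWH API documentation; the function names are hypothetical, and the token is the same one passed via GUIX_SWH_TOKEN.

```python
import urllib.error
import urllib.request

SWH_API = "https://archive.softwareheritage.org/api/1"


def directory_request(sha1_git, token=None):
    """Build the request for a SWH directory, with the token if given."""
    request = urllib.request.Request(f"{SWH_API}/directory/{sha1_git}/")
    if token:
        # Authenticated requests get the higher rate limit.
        request.add_header("Authorization", f"Bearer {token}")
    return request


def directory_exists(sha1_git, token=None):
    """Return True if SWH knows the directory, False on HTTP 404."""
    try:
        with urllib.request.urlopen(directory_request(sha1_git, token)) as response:
            return response.status == 200
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return False
        raise  # rate limiting or server errors need a human look
```

For the example above, `directory_exists("33a3b509b5ff8e9039626d11b7a800281884cf2a", token)` returning False would only confirm what lint says; it does not show whether the identifier computed by Disarchive is wrong or whether SWH never ingested the tarball.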
> It’s surprising to me that SWH is not already getting these from
> “sources.json”. I picked an arbitrary one, “rust-quote-0.6”, and it’s
> simply not in “sources.json”. On the other hand, I bet SWH would like a
> crates.io (and CRAN, etc.) loader, too.
From the SWH doc, there is a CRAN lister [2], but I have not checked what
they concretely ingest. On our side, we use ’url-fetch’, so it seems
possible to me that there is a tiny mismatch between what is inside the
release tarball (what we concretely use) and what SWH ingests directly
from CRAN.
2: <https://docs.softwareheritage.org/devel/apidoc/swh.lister.cran.html?highlight=cran#module-swh.lister.cran>
And to answer your question [3] about “sources.json”: I think the
ingestion started after commit
35bb77108fc7f2339da0b5be139043a5f3f21493 in guix-artwork. In other words,
SWH started to ingest from “sources.json” after July 2020, probably
around September 2020.
3: <https://lists.gnu.org/archive/html/guix-devel/2021-10/msg00141.html>
> One other way to help would be to suggest improvements to the report. I
> don’t want to fiddle with it too much, but if there is some simple graph
> or table or list that should be there, I’m happy to give it a go.
For the Missing and Unknown fields, could you distinguish the kind of
origin? Is it mainly git-fetch, url-fetch, or something else?
That would help to spot which part needs work (sources.json, the SWH
side, Disarchive, etc.).
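
As a rough illustration of the breakdown asked for here, a sketch that tallies sources by status and fetch method. The record layout (a "status" and a "method" key per source) is purely hypothetical and says nothing about how the report's database is actually organised.

```python
from collections import Counter


def breakdown(records):
    """Count records per (status, fetch method) pair."""
    return Counter((record["status"], record["method"]) for record in records)


# Made-up records, only to show the shape of the result.
example = [
    {"status": "missing", "method": "url-fetch"},
    {"status": "missing", "method": "git-fetch"},
    {"status": "unknown", "method": "url-fetch"},
    {"status": "missing", "method": "url-fetch"},
]

for (status, method), count in sorted(breakdown(example).items()):
    print(f"{status:8} {method:10} {count}")
```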
Cheers,
simon