guix-devel

Re: How to compute SWHID? (with Guix/Disarchive)


From: Timothy Sample
Subject: Re: How to compute SWHID? (with Guix/Disarchive)
Date: Mon, 06 Dec 2021 10:18:38 -0500
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

Hi,

Ludovic Courtès <ludo@gnu.org> writes:

> zimoun <zimon.toutoune@gmail.com> skribis:
>
>> Looking at Disarchive, I found how to compute the Git-based
>> serialization hash, and the serialization methods of "guix hash"
>> need some cleaning up; '--recursive' is really 'nar' serialization,
>> which would be a better name.  Anyway, see [1]. :-)
>
> Neat!
>
>> I would like to add an SWH-based serialization hash, but I cannot find
>> whether a function already does the hard work.  Any pointer?
>
> I think it’s ‘git-hash-directory’ in (disarchive git-hash).

That’s the one.  I only know what SWH does for a few cases:

  • directory: Use their version of ‘git-hash-directory’.

  • file: Use their version of ‘git-hash-file’ (resulting in an ID like
    “swh:1:cnt:...”).  I don’t know if they ingest regular files like
    this, but if they ingested the file by some other means, it will
    have that ID.

  • git: Read the directory ID from the Git database.  This is
    essentially ‘git rev-parse HEAD:’, where the colon at the end tells
    Git to get the “tree” (directory) ID rather than the commit ID.
    (I’m not sure if guile-git supports this; so far I’ve just been
    shelling out to Git.)

  • hg: Use their version of ‘git-hash-directory’ excluding the “.hg”
    directory.
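For experimenting outside of Disarchive, the file, directory and hg
cases above can be approximated in shell.  This is a minimal sketch,
not SWH’s or Disarchive’s code: the helper names ‘object_id’,
‘hex_to_bytes’ and ‘hash_directory’ are hypothetical, and it assumes
bash, GNU coreutils, and file names without tabs or newlines.  The
optional second argument of ‘hash_directory’ skips one top-level entry
such as “.hg”, mirroring the hg case.

```shell
#!/usr/bin/env bash
# Sketch only: Git-style object IDs as used in SWHIDs (cnt/dir).
# Hypothetical helpers; assumes bash, GNU coreutils, and no tabs or
# newlines in file names.

# object_id TYPE < BYTES: SHA-1 of "TYPE SIZE\0" followed by BYTES.
object_id() {
  local tmp size
  tmp=$(mktemp)
  cat > "$tmp"
  size=$(wc -c < "$tmp" | tr -d ' ')
  { printf '%s %s\0' "$1" "$size"; cat "$tmp"; } | sha1sum | cut -d' ' -f1
  rm -f "$tmp"
}

# hex_to_bytes HEX: write a 40-digit hex ID as 20 raw bytes
# (relies on bash's printf understanding \xHH escapes).
hex_to_bytes() {
  printf "$(printf '%s' "$1" | sed 's/../\\x&/g')"
}

# hash_directory DIR [EXCLUDE]: Git tree ID of DIR, skipping the
# top-level entry named EXCLUDE (e.g. ".hg").  Empty subdirectories
# are kept and hash to the empty tree.
hash_directory() {
  local dir=$1 exclude=${2:-} entries path mode id
  entries=$(mktemp)
  # Git sorts entries bytewise, comparing directory names as if they
  # ended in "/"; build that sort key, sort in the C locale, drop it.
  ls -A "$dir" | while IFS= read -r name; do
    if [ -d "$dir/$name" ] && [ ! -L "$dir/$name" ]; then
      printf '%s/\t%s\n' "$name" "$name"
    else
      printf '%s\t%s\n' "$name" "$name"
    fi
  done | LC_ALL=C sort | cut -f2 | while IFS= read -r name; do
    [ -n "$exclude" ] && [ "$name" = "$exclude" ] && continue
    path=$dir/$name
    if [ -L "$path" ]; then
      mode=120000; id=$(readlink -n "$path" | object_id blob)
    elif [ -d "$path" ]; then
      mode=40000; id=$(hash_directory "$path")
    elif [ -x "$path" ]; then  # approximates Git's owner-exec check
      mode=100755; id=$(object_id blob < "$path")
    else
      mode=100644; id=$(object_id blob < "$path")
    fi
    # Tree entry: "MODE NAME\0" followed by the child's raw 20-byte ID.
    { printf '%s %s\0' "$mode" "$name"; hex_to_bytes "$id"; } >> "$entries"
  done
  object_id tree < "$entries"
  rm -f "$entries"
}
```

For a single file, ‘object_id blob < FILE’ gives the same hash as
‘git hash-object FILE’; prefixing it with “swh:1:cnt:” yields the
content ID, and prefixing a ‘hash_directory’ result with “swh:1:dir:”
yields the directory ID.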

In my work, I’ve been strict about keeping the Git directory IDs based
on the Git database (“.git”) rather than computing them using
‘git-hash-directory’.  Since Guix deletes the Git database before
putting a checkout in the store, that option may not be available to you
(unless you download the repository again).  I’m not sure how much of a
problem this would be in practice.  There may be a few edge cases with
submodules and “.gitattributes” to watch out for.
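Reading the directory ID from the Git database, as in the “git” case
above, fits in a small shell helper.  This is only a sketch:
‘swh_dir_id_from_git’ is a hypothetical name, it needs a clone that
still has its “.git” directory, and it simply prepends “swh:1:dir:” to
what ‘git rev-parse REV:’ prints.

```shell
#!/usr/bin/env bash
# Sketch: directory SWHID taken from the Git database rather than by
# re-hashing the checkout.  Hypothetical helper; needs a clone that
# still has its .git directory.
swh_dir_id_from_git() {  # swh_dir_id_from_git REPO [REV]
  local repo=$1 rev=${2:-HEAD} tree
  # "REV:" (trailing colon) names the root tree of REV, not the commit.
  tree=$(git -C "$repo" rev-parse "$rev:") || return 1
  printf 'swh:1:dir:%s\n' "$tree"
}
```

On a checkout already in the store, where the Git database is gone,
this helper has nothing to read, which is exactly the limitation
described above.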

My guess is that as it stands, if a repo has a “.gitattributes” file,
running ‘git-hash-directory’ on the checkout will produce a directory ID
that SWH doesn’t have (they will ignore it, but we will include it).  A
corollary of this guess is that our SWH fallback code for Git will fail
for a repo that has a “.gitattributes” file, since we include it in the
nar hash, but SWH will not provide it.  (I say “guess” because this is
based on some stuff I observed when writing that procedure several
months ago – I haven’t verified any of this.  See also
<https://issues.guix.gnu.org/48540>, which is the same problem but with
submodules instead of “.gitattributes”).

Sorry, but all I have to offer is doom and gloom on this one.  :(  You
might be able to get ‘git-hash-directory’ to work well enough on the Git
checkouts that Guix puts in the store, but you’ll have to be careful!

> The other day I learned that the Git CLI ignores empty directories, but
> the Git format itself has nothing against empty directories.  Thus SWH
> serializes in exactly the same way as Git.
>
> (Can you confirm, Timothy?)

I can confirm that a Git tree node of the form

    40000 empty-directory 4b825dc642cb6eb9a060e54bf8d69288fbee4904

theoretically represents an empty directory named “empty-directory”.
The hash is computed like this:

    $ printf 'tree 0\0' | sha1sum
    4b825dc642cb6eb9a060e54bf8d69288fbee4904  -

I don’t know anything about where Git excludes this or what would happen
if you manually constructed a Git repo with empty directories, though!
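The hash above can be reproduced, and extended to the tree that
contains such an entry, with a small shell sketch.  The helper names
are hypothetical, and it assumes bash (for ‘\xHH’ escapes in ‘printf’)
and GNU ‘sha1sum’.  A tree entry is the mode, a space, the name, a NUL
byte, and then the 20 raw bytes (not the hex digits) of the child’s ID.

```shell
#!/usr/bin/env bash
# Sketch: the empty tree, and a parent tree holding one entry that
# points at it.  Hypothetical helper names; assumes bash and coreutils.

# object_id TYPE < BYTES: SHA-1 of "TYPE SIZE\0" followed by BYTES.
object_id() {
  local tmp size
  tmp=$(mktemp)
  cat > "$tmp"
  size=$(wc -c < "$tmp" | tr -d ' ')
  { printf '%s %s\0' "$1" "$size"; cat "$tmp"; } | sha1sum | cut -d' ' -f1
  rm -f "$tmp"
}

# hex_to_bytes HEX: write a 40-digit hex ID as 20 raw bytes.
hex_to_bytes() {
  printf "$(printf '%s' "$1" | sed 's/../\\x&/g')"
}

# The empty tree: header "tree 0\0" and no entries.
empty=$(printf '' | object_id tree)
echo "$empty"   # 4b825dc642cb6eb9a060e54bf8d69288fbee4904

# Parent tree with the single entry "40000 empty-directory\0<20 bytes>".
parent=$({ printf '40000 empty-directory\0'; hex_to_bytes "$empty"; } | object_id tree)
echo "$parent"
```

When Git is available, ‘git mktree’ reads lines in ‘ls-tree’ format and
writes the corresponding tree object, so it can serve as a cross-check
for the parent tree computed here.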


-- Tim


