bug-gnulib
Re: GNULIB_REVISION


From: Simon Josefsson
Subject: Re: GNULIB_REVISION
Date: Thu, 25 Apr 2024 18:26:23 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)

Bruno Haible <bruno@clisp.org> writes:

> Hi Simon,
>
>> you can ... via
>> GNULIB_REVISION pick out exactly the gnulib git revision that libpaper
>> needs. ...
>> [1] https://blog.josefsson.org/2024/04/13/reproducible-and-minimal-source-only-tarballs/
>> [2] https://salsa.debian.org/auth-team/libntlm/-/tree/master/debian
>
> I see GNULIB_REVISION as an obsolete alternative to git submodules, and
> would therefore discourage rather than propagate its use.

I think it will be challenging for gnulib to insist on always being
used as a git submodule, and I would prefer that we continue to support
multiple ways of working.  Personally I have been migrating towards
gnulib git submodules because most other projects use gnulib like that,
but I've never really felt comfortable with them.  Some of the concerns
I have:

- git submodules lead -- in my subjective opinion -- to complexity,
  which leads to a worse user experience for developers.  I have learned
  to work with git submodules over the years, but it was a hurdle that I
  don't want to force on everyone.

- the gnulib git submodule is huge.  I fairly often get out-of-memory
  errors during 'git clone' in CI/CD jobs.  I can restart the jobs
  manually, but this indicates that there is a resource drain here.  For
  a tiny project like libntlm the imbalance between the small project
  code and the large gnulib is troubling.

- CI/CD platforms often have different ways of working with git
  submodules, which adds complexity and leads to bugs.  Allowing
  maintainers to decide if they want to work with git submodules or not
  seems like a good thing.

- we don't offer any way for people receiving tarballs to learn which
  gnulib git commit was used (you noticed this too below) but with a
  GNULIB_REVISION approach this is part of the tarball, just like any
  other versioned dependency on autoconf, automake etc

- I think gnulib could be regarded as any other external dependency,
  just like autoconf, automake, libtool etc that also generate files in
  my build tree during bootstrapping.  I don't put autoconf as a git
  submodule, why should I put gnulib as one?

Granted, these concerns are a bit vague and subjective.

> Currently libntlm has this in its bootstrap.conf:
>
>   GNULIB_REVISION=dfb71172a46ef41f8cf8ab7ca529c1dd3097a41d
>
> and GNU make has this:
>
>   GNULIB_REVISION=stable-202307

Interesting.  This suggests the GNULIB_REVISION approach isn't the
entire solution either.

I think it is useful to record the gnulib git commit used to prepare a
tarball, and have that git commit id be part of the shipped tarball, and
stored inside the git repository.  The first use above achieves this, but
the second one doesn't (branches/tags are moving targets).
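
The distinction between the two bootstrap.conf values quoted above can
be checked mechanically.  The following is only a minimal sketch: the
helper name is_pinned_revision is made up for illustration, and it
assumes SHA-1 repositories with 40-character lowercase hex commit ids.
A name that merely looks like a commit id would fool it.

```shell
# Hypothetical helper (not part of gnulib or 'bootstrap'): succeed only
# if the argument looks like a full 40-character SHA-1 commit id rather
# than a branch or tag name.
is_pinned_revision() {
  case "$1" in
    *[!0-9a-f]*) return 1 ;;   # contains a non-hex character
  esac
  [ "${#1}" -eq 40 ]           # full-length id only, no abbreviations
}

# The two values quoted from libntlm and GNU make:
for rev in dfb71172a46ef41f8cf8ab7ca529c1dd3097a41d stable-202307; do
  if is_pinned_revision "$rev"; then
    echo "$rev: pinned commit"
  else
    echo "$rev: moving target (branch or tag)"
  fi
done
```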

If I download the gzip tarball I can't find anywhere what gnulib commit
was used for bootstrapping.  It is quite cumbersome to verify that the
tarball doesn't contain any modified gnulib code.  This is even harder
when projects INTENTIONALLY modify gnulib code compared to what's in
gnulib git, which coreutils and several other projects do through
gnulib *.diff/*.patch files.

Ultimately, I think there is an important use-case to build projects
directly from source code without having tarballs with pre-generated
files that are not reproduced by the user.

> The differences between both approaches are:
>
>   - GNULIB_REVISION works only with the 'bootstrap' program. The submodules
>     approach works also without 'bootstrap'.

What use case are you thinking of?  The consumers of gnulib git commit
information that I can think of are all gnulib-aware.

>   - For GNULIB_REVISION, the user is on their own regarding tooling, aside
>     from 'bootstrap'. In the submodules approach, the 'git' suite provides
>     the tooling, and many developers are familiar with it.

Yes, but developers also like flexibility, and in some situations I
think the git approach is not the best way of working.

>   - .tar.gz files created by the gitweb "snapshot" link, by the cgit "refs >
>     Download" section, or the GitHub "Download ZIP" button contain an empty
>     directory in place of the submodule, and no information about the
>     revision.
>     Whereas they contain the file with the GNULIB_REVISION assignment.

Indeed, this was the main challenge for me.  That is critical
information for anyone who wants to avoid touching tarballs with
pre-generated content.

>> I should write a post to debian-devel describing this pattern on
>> how to use gnulib in Debian packages
>
> It feels wrong to me if, in order to get meta-information about required
> dependencies of a package, Debian tools grep a particular file for a specific
> string. This approach is simply too limited.

Meta-information about dependencies is normally hand-curated in
Debian (the Build-Depends: header).  The simplest solution is for the
Debian package maintainer to figure out which gnulib git commit version
was used for a release and pin that manually in the debian/rules
makefile.  If the information is available in bootstrap.conf via
GNULIB_REVISION that saves time.  I believe this is conceptually the
same thing as pinning version information for any other dependency.
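
As an illustration of how a packager might read that value, here is a
minimal sketch.  The helper name gnulib_revision is hypothetical, and it
assumes the assignment is written unquoted on a line of its own, as in
the libntlm and GNU make examples; a quoted or computed value would not
be found.  The file is parsed with sed rather than sourced, so no code
from bootstrap.conf is executed.

```shell
# Hypothetical helper: print the value assigned to GNULIB_REVISION in
# the given file (default: bootstrap.conf) without sourcing the file.
# If the variable is assigned more than once, the last assignment wins.
gnulib_revision() {
  sed -n 's/^[[:space:]]*GNULIB_REVISION=\([A-Za-z0-9._-]\{1,\}\).*$/\1/p' \
    "${1:-bootstrap.conf}" | tail -n 1
}

# Example use when run from a source tree that has a bootstrap.conf:
if [ -f bootstrap.conf ]; then
  rev=$(gnulib_revision bootstrap.conf)
  echo "gnulib pinned at: ${rev:-unknown}"
fi
```

The printed value could then be copied into debian/rules by hand, which
keeps the pin hand-curated like any other dependency.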

> The correct way, IMO, would be that 'git' provides this meta-information,
> either embedded in the .tar.gz generated by the web tooling, or in a
> separate .tar.gz. AFAICT, 'git' currently does not have this ability.
> Therefore we need to approach the 'git' team, in order to find a solution
> that scales across the whole set of software package — not specific to
> gnulib and not specific to 'bootstrap'.

Yes I was quite disappointed when I realized that 'git archive' doesn't
record the git submodule git commit anywhere.

Couldn't the .gitmodules file be extended to allow specifying the git
commit of the submodule?

I think GitLab/GitHub/etc use 'git archive' under the hood, so we could
ask that the .gitmodules file be extended to hold the commit (just like
it holds the branch name now).

Alternatively, get 'git archive' to somehow record the submodule commit
in some other way.

I suppose we could recommend a practice for gnulib users that use gnulib
git submodules to put this in .gitmodules:

    # GNULIB_REVISION=dfb71172a46ef41f8cf8ab7ca529c1dd3097a41d

Then this will be part of the 'git archive' output (I think?) and we
would also have to recommend adding 'EXTRA_DIST += .gitmodules' so this
information is included in the tarballs.
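
A minimal sketch of how such a comment could be kept in sync with the
submodule follows.  The helper name record_gnulib_revision and the
submodule path ./gnulib are assumptions made for illustration; nothing
in git or 'bootstrap' reads this comment today, so it would have to be
refreshed whenever the submodule is updated.

```shell
# Hypothetical helper: record a commit id as a "# GNULIB_REVISION=..."
# comment in a .gitmodules-style file, replacing any earlier marker.
record_gnulib_revision() {
  file=$1
  commit=$2
  tmp=$(mktemp) || return 1
  # Drop any previous marker line, then append the new one.
  sed '/^# GNULIB_REVISION=/d' "$file" > "$tmp" || return 1
  printf '# GNULIB_REVISION=%s\n' "$commit" >> "$tmp"
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Usage, assuming the submodule is checked out at ./gnulib; the third
# field of 'git ls-tree' output is the commit recorded for that path:
#   record_gnulib_revision .gitmodules \
#     "$(git ls-tree HEAD gnulib | awk '{print $3}')"
```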

However, I want to deploy a solution that works now while we wait for
git to add this feature (or not).

I think we may also find that requiring 'git' for building packages
causes a cyclic dependency for bootstrap people.  Requiring 'tar' is okay
because 'tar' has very few other pre-dependencies.  I don't know how to
solve this problem though: some way to ship all gnulib git commits in a
compact form that can be extracted easily without complex tools is
necessary.  Maybe a minimal git-like tool such as 'bootstrap-git' could
be written that just supports cloning from a git bundle and nothing else.
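
For reference, this is roughly what the full 'git' tool does with
bundles today, which is the behaviour such a minimal tool would need to
reproduce.  It is only a sketch: the function names and the temporary
branch name "pinned" are invented here, the output path should be
absolute because 'git -C' changes directory first, and the bundle
carries the whole history reachable from the commit, so it does not by
itself address the size concern.

```shell
# Pack one commit of a gnulib checkout into a single bundle file.
# Bundles carry refs, not bare commits, so a branch is created (or
# moved) to point at the commit; "pinned" is an arbitrary name.
make_gnulib_bundle() {
  repo=$1
  commit=$2
  out=$3
  git -C "$repo" branch -f pinned "$commit" &&
  git -C "$repo" bundle create "$out" pinned
}

# Recreate a working tree from the bundle; no network access is needed.
unpack_gnulib_bundle() {
  git clone -q -b pinned "$1" "$2"
}

# Usage (paths are placeholders):
#   make_gnulib_bundle /path/to/gnulib \
#     dfb71172a46ef41f8cf8ab7ca529c1dd3097a41d /tmp/gnulib.bundle
#   unpack_gnulib_bundle /tmp/gnulib.bundle ./gnulib
```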

/Simon


> Bruno
>
> [1] https://stackoverflow.com/questions/1777854/how-can-i-specify-a-branch-tag-when-adding-a-git-submodule
> [2] https://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=tree
> [3] https://git.savannah.gnu.org/cgit/coreutils.git/tree/
