
Re: [bug-libsigsegv] [PATCH] Optimize gnulib checkout.


From: Eric Blake
Subject: Re: [bug-libsigsegv] [PATCH] Optimize gnulib checkout.
Date: Tue, 16 Nov 2010 07:50:05 -0700
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.12) Gecko/20101103 Fedora/1.0-0.33.b2pre.fc14 Lightning/1.0b3pre Mnenhy/0.8.3 Thunderbird/3.1.6

On 11/16/2010 01:52 AM, Bruno Haible wrote:
> However, can you explain this part of "man git-clone" to me?
> 
>            NOTE: this is a possibly dangerous operation; do not use it unless
>            you understand what it does. If you clone your repository using
>            this option and then delete branches (or use any other git command
>            that makes any existing commit unreferenced) in the source
>            repository, some objects may become unreferenced (or dangling).

> 
> I can understand the NOTE for --shared. But what is the danger with
> --reference? The cloned repository contains no symlinks, no hardlinks, and
> no mention of the original referenced repository except in
> .git/objects/info/alternates. So what can go wrong when using this option?

Exactly what the man page said - the clone has fewer objects, because it
is dereferencing .git/objects/info/alternates to find the bulk of its
objects.  If you then delete the original reference repository, the
clone will be incomplete.
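
A minimal sketch of that failure, using throwaway repositories under a temporary directory. The names upstream, reference and borrower are made up for illustration, and the file:// URL is only there to force a real fetch instead of a local copy of the object directory:

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Hypothetical stand-in for upstream: a repository with one commit.
git init -q upstream
git -C upstream -c user.name=demo -c user.email=demo@example.org \
    commit -q --allow-empty -m "initial commit"

# A full clone, playing the role of the reference repository.
git clone -q "file://$work/upstream" reference

# A second clone that borrows its objects from the reference.
git clone -q --reference "$work/reference" "file://$work/upstream" borrower

# While the reference exists, the borrower can read its history.
git -C borrower log --oneline

# Delete the reference, then try again.
rm -rf reference
if git -C borrower log --oneline >/dev/null 2>&1; then
  result=complete
else
  result=incomplete
fi
echo "borrower is $result"
```

With the reference gone, the last git log is expected to fail because the commit was never stored in the borrower itself.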

In practice, this is not a problem - if you set up a reference
repository that faithfully tracks upstream, upstream never rewinds, and
you never delete your reference repository, then you cannot ever get
into the situation that 'git clone --help' was warning about.

Where you can get into problems is if you set up a reference repository
that tracks something that rewinds regularly (for example, the pu branch
of git.git); then, your clone might have a commit abc that descended
from commit def, where def was stored only in the reference repository
at the time of the clone.  Then upstream rewinds the branch, and nothing
in the reference repository refers to def any more.  If you then repack
your reference repository to do garbage collection
of unreferenced objects, the reference repository has no idea that
commit abc in the clone is referencing def in the reference, so it
deletes def.  At which point commit abc in the clone is broken, because
it did not save def locally.  But again, this can only happen if gnulib
were to ever rewind commits.
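
For anyone who does track a rewinding branch, one way to sidestep the problem (not something proposed in this thread) is to make the clone self-contained after the fact: repack so that borrowed objects are copied locally, then remove the alternates file. A sketch with the same made-up repository names as before:

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Throwaway upstream, reference, and borrowing clone (hypothetical names).
git init -q upstream
git -C upstream -c user.name=demo -c user.email=demo@example.org \
    commit -q --allow-empty -m "initial commit"
git clone -q "file://$work/upstream" reference
git clone -q --reference "$work/reference" "file://$work/upstream" borrower

# Copy every reachable object, including borrowed ones, into the
# borrower's own object store ...
git -C borrower repack -a -d -q
# ... and only then cut the link to the reference repository.
rm borrower/.git/objects/info/alternates

# The reference can now be pruned or deleted without harming the borrower.
rm -rf reference
git -C borrower fsck
subject=$(git -C borrower log -1 --format=%s)
echo "$subject"
```

The cost is that the clone then takes up as much disk space as a normal one, so this trades away the saving that --reference was meant to provide.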

> 
> And should it use an environment variable GNULIB_SRCDIR?

Because gnulib/build-aux/bootstrap uses it for the same reason.  That
is, it is well-defined among multiple GNU projects that this is the
environment variable to set for a reference repository.  And I already
have 'export GNULIB_SRCDIR=$HOME/gnulib' in my ~/.bashrc, so it worked
with minimal effort for me.
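
As a rough sketch of the idea, and not the actual patch or the code in gnulib's bootstrap, a script could pick its clone command like this. The function name gnulib_clone_command and the GNULIB_URL default are assumptions made up for the example:

```shell
# Assumed upstream URL, overridable for illustration purposes.
GNULIB_URL=${GNULIB_URL:-git://git.savannah.gnu.org/gnulib.git}

# Hypothetical helper: print the command a bootstrap script could run.
gnulib_clone_command () {
  if test -n "${GNULIB_SRCDIR:-}" && test -d "$GNULIB_SRCDIR/.git"; then
    # The user has a gnulib checkout: borrow its objects.
    echo "git clone --reference $GNULIB_SRCDIR $GNULIB_URL gnulib"
  else
    # No usable reference repository: fall back to a full clone.
    echo "git clone $GNULIB_URL gnulib"
  fi
}

gnulib_clone_command
```

Printing the command instead of running it keeps the sketch free of network access; a real script would run the clone directly.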

>   - For the purpose of libsigsegv alone, the use of the --reference option
>     is only needed the first time a user runs autogen.sh. In the subsequent
>     runs, a gnulib subdirectory is already present.
>   - The variable GNULIB_SRCDIR is a misnomer, as it makes people think that
>     the sources that it contains would matter. No, it's only the contents
>     of the .git/objects subdirectory that matter, and in a semantically
>     irrelevant way.

It's quite relevant - it's the reference sources that let
libsigsegv/gnulib/.git/objects take up MUCH less disk space, because
libsigsegv/gnulib/.git/objects/info/alternates is set up to point at
the object store under $GNULIB_SRCDIR.
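
To see this on a small scale, the sketch below builds a borrowing clone from throwaway repositories (the names are made up for illustration) and inspects where its objects live:

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Throwaway upstream and reference repositories (hypothetical names).
git init -q upstream
git -C upstream -c user.name=demo -c user.email=demo@example.org \
    commit -q --allow-empty -m "initial commit"
git clone -q "file://$work/upstream" reference
git clone -q --reference "$work/reference" "file://$work/upstream" borrower

# The alternates file names the object directory being borrowed from.
alternate=$(cat borrower/.git/objects/info/alternates)
echo "borrowing from: $alternate"

# Compare what each repository stores in its own object directory.
git -C reference count-objects -v
git -C borrower count-objects -v
```

The borrower's own counts should be small or zero, and its count-objects output should list the reference's object directory on an "alternate:" line.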

>     As a reminder, gnulib's 'bootstrap' script only handles 2 out of 5
>     reasonable use-cases, and the confusion about GNULIB_SRCDIR is part of
>     the problem.
>     <http://lists.gnu.org/archive/html/bug-gnulib/2010-03/msg00105.html>
>     <http://lists.gnu.org/archive/html/bug-gnulib/2010-03/msg00165.html>
>     <http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=5990ad9e6ab590626a42764bf21dba6d7a05f237>

If we clean up gnulib's bootstrap to handle more of those cases, it will
not be by overloading GNULIB_SRCDIR any further, but by adding a new
environment variable name (if an environment variable is even the
appropriate solution for those other cases that you are arguing for).
Meanwhile, I argue that bootstrap is the wrong place for upgrading a git
submodule of gnulib to the latest gnulib upstream; that should be an
independent action, explicitly invoked by the maintainer; so it makes
more sense as a make target in maint.mk or friends.

> An environment variable makes sense, however, when a single user checks
> out libsigsegv, libiconv, gettext, libunistring, and others. But since
> GNULIB_SRCDIR is such a bad name, and even GNULIB_REFERENCE or GNULIB_CHECKOUT
> would be ambiguous, we need a different name. What about GNULIB_ANYCHECKOUT
> or GNULIB_DOWNLOADCACHE?

Only if you can convince gnulib's bootstrap to settle on a better name.

-- 
Eric Blake   address@hidden    +1-801-349-2682
Libvirt virtualization library http://libvirt.org
