coreutils

Re: [PATCH] maint: RFC: add lzip tarballs


From: Eric Blake
Subject: Re: [PATCH] maint: RFC: add lzip tarballs
Date: Wed, 11 Jan 2017 19:13:43 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.6.0

On 01/11/2017 06:29 PM, Assaf Gordon wrote:
> Hello Eric and all,
> 
>> On Wed, Jan 11, 2017 at 3:21 PM, Eric Blake <address@hidden> wrote:
>>> there are strong arguments for including .lzip
>>> distributions, either in addition or in place of .xz:
> 
> I would ask at least to keep XZ and not switch solely to lzip.

I certainly agree with this point - it is too early to make lzip the
sole distribution format.  The question is whether shipping BOTH xz and
lzip, and letting people choose which format they prefer, is worth the
extra effort (which, as shown by this patch, is not that much - a
one-line addition to automake's options, and the possible installation
of lzip on the maintainer's machines if it was not already there).  I
know there was a window of time where we were shipping .gz, .bz2, and
.xz, while waiting for distros to catch up; now that .xz is a lot more
widely supported, we were able to easily justify dropping .bz2 due to
clear differences in levels of compression, and effort required to
uncompress.  But xz and lzip are much closer in levels of compression,
making that less of a clear winner.
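
For illustration only, the kind of one-line change meant above might
look like the sketch below.  This is not the actual coreutils
configure.ac line, and the surrounding options are assumed; dist-xz
and dist-lzip are the existing automake option names.

```m4
# configure.ac (illustrative sketch, not the real coreutils line)
# dist-xz   makes "make dist" also produce PACKAGE-VERSION.tar.xz
# dist-lzip makes "make dist" also produce PACKAGE-VERSION.tar.lz
AM_INIT_AUTOMAKE([dist-xz dist-lzip])
```

With both options listed, the tarballs are generated side by side, so
downstream users simply pick whichever one their tools can unpack.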

I don't know if there is any (easy) way to count how many downloads of
.gz vs. .xz happen, to get a feel for what percentage of users prefers
a particular format; if such metrics exist, they would also let us track
how popular .lzip turns out to be during a trial of running both xz and
lzip tarballs in parallel.  But most likely that's a
pipe dream, as GNU encourages the use of mirrors for getting tarballs,
making it harder to centrally track what got downloaded where.

> 
> While I am in no position to evaluate lzip benchmark/robustness/format claims,

I'm also in this boat.  I merely proposed the patch as an RFC to start
the discussion, so I appreciate the points being made.

> I do have some concerns about the lzip program:
> 
> First,
> It is written in C++. Not a problem by itself, but seems a bit at odds as a
> requirement for a system-level package like coreutils.

coreutils depends on gperf, which is written in C++.  Then again, the
dependency on gperf is at maintainer time (it does not have to be
present on the tarball user's machine).  Having lzip as the ONLY
distribution format is a very strong burden on ALL downstream users to
have lzip installed (unlike the gperf case); but having lzip and xz in
parallel means that users that can't build lzip can use the xz tarball.
So that strengthens my claim that this patch (if taken) is additive, and
not replacement, in nature.

> 
> Second,
> I'm not sure how portable and well-tested the program is on the large number
> of platforms that coreutils aims to cater to.
> Being a C++ program, I'm not even sure if all these systems could easily build
> it or provide it as a package.

That's certainly a valid point against a sole distribution format, but
doesn't rule it out as a parallel distribution format.

> 
> Third,
> I'm a bit wary of the closed development model: there is no public git 
> repository, only published tarballs, and not clear how active the development 
> or the community are.

For years, GNU bash had a very closed development model. Only recently
has the bash maintainer started posting weekly snapshots via a git
repository (by no means as fine-grained as most git projects are used to
having), so that is not necessarily a showstopper, but it does bear
consideration.  And yes, I concur that it is harder to work with
software that is harder to clone and tweak.  My RFC patch proposal even
highlighted the fact that lzip has a CVS repository, but not a git
repository, at savannah - I really wanted to point to a git repo but
could not quickly find one.

> 
> Lastly,
> I think the test suite is a bit lacking, especially compared to all the 
> claims about recovery and robustness of the lzip format.

I have not made any personal investigations on this front.

And everything we require of lzip should also be applied to any
consideration of whether to use zstd (with the additional hurdle that
automake does not yet have a 'dist-zstd' option), since that is another
up-and-coming compression format that may or may not have a win in
(de)compression speeds and size.

> 
> ---
> 
> I'm not saying 'xz' is perfect or that it answers all the above issues. But 
> it has a "community buy-in" which can't be denied compared to lzip.

Let's stop and consider how much of the community buy-in is a
side-effect of coreutils being one of the early adopters of dist-xz.
From personal experience, the only reason Cygwin started considering the
inclusion of xz in the distro years ago was because the coreutils
tarball came in xz; and now Cygwin uses xz for all of its distribution
files (it used to use bz2).  Then again, lessons learned from Cygwin's
switch from bzip2 to xz will help ease any future transition from xz to
(lzip/zstd/compression-of-the-day), if such a future switch is
warranted.  But at the same time, it can take years to prove whether a
new format has enough going for it to make it a primary format.

That said, if coreutils starts shipping lzip packages, wouldn't that
alone be a way to kickstart some more activity on the lzip front?

> If coreutils switches, I think it should switch to something that is provably 
> superior not only in benchmark/robustness.

I wrote this email based on an IRC conversation with Matias (selk),
mainly because I wanted the discussion archived for public consumption,
and not something done in private with just me.  At this point, I would
really love for selk and/or Antonio to chime in with the arguments I am
unable to provide.  Private conversations are not the way to instigate
change; and even if a public conversation doesn't change the status quo,
hopefully it at least raises some talking points and ideas for future
improvements.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
