From: Chris Mitchell
Subject: Re: [Bug-tar] tar -S with --use-compress-prog=pbzip2 does not keep sparseness
Date: Fri, 6 Sep 2019 15:41:42 -0300

On Thu, 5 Sep 2019 14:42:07 -0400
Nathan Stratton Treadway <address@hidden> wrote:

> On Wed, Sep 04, 2019 at 13:48:33 -0300, Chris Mitchell wrote:
> 
> I don't have any specific answer to your question, but here are some
> general comments on the topic:

As it turns out, I think the specific answer to my question is "user
error". I spent a lot of this morning trying to recreate the behaviour
(on the same system!), but with every variation, tar stubbornly kept
preserving and restoring the sparse file exactly the way it's supposed
to.

> * The tar info page (e.g. 
>     https://www.gnu.org/software/tar/manual/html_chapter/tar_8.html#SEC137
>   section "8.1.2 Archiving Sparse Files") explains that the "-S"
> option is only meaningful on archive creation (or update), not on
> extraction. <...>

That certainly makes life easier! 
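
A minimal sketch of that point, assuming GNU tar and GNU coreutils on a filesystem that supports holes. The file and directory names are made up for illustration. The archive is created with `-S` and extracted without it, and the restored file should still come back at its full apparent size from a small archive:

```shell
# Sketch: -S matters at creation time only (assumes GNU tar and GNU coreutils).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical test file: 10 MiB of hole, no data written.
truncate -s 10M demo.sparse

# Create with -S so tar records the holes instead of storing runs of zeros.
tar -cSf demo.tar demo.sparse

# Extract without -S; the sparse map stored in the archive is used anyway.
mkdir out
tar -xf demo.tar -C out

archive_size=$(stat -c %s demo.tar)
restored_size=$(stat -c %s out/demo.sparse)
restored_alloc=$(( $(stat -c %b out/demo.sparse) * $(stat -c %B out/demo.sparse) ))
echo "archive=$archive_size restored=$restored_size allocated=$restored_alloc"
```

The allocated figure depends on the filesystem, so it is printed rather than assumed.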

> * In general, tar creates the archive "file" first, and then pipes the
>   contents of the archive to the compression program -- so that
>   sparse-file-detection step should happen before the compression
> program is involved in any way.  

Yes, that certainly sounds like what one would expect to happen, and
apparently it is indeed happening the way it should.
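
A small-scale way to check this, sketched under assumptions: GNU tar is installed, gzip stands in for pbzip2 (a swapped-in compressor, chosen only because it is more widely available), and the file names are hypothetical. If tar really builds the archive stream before the compressor sees it, both routes should yield the same tar stream after decompression:

```shell
# Sketch: compare tar's compressor hook with an explicit pipe.
# Assumes GNU tar and gzip; gzip stands in for pbzip2 here.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical test file: 64 KiB of data followed by a hole up to 10 MiB.
head -c 65536 /dev/urandom > demo.img
truncate -s 10M demo.img

# Route 1: tar launches the compressor itself.
tar --use-compress-program=gzip -cSf hook.tar.gz demo.img
# Route 2: tar writes the archive to stdout ("-f -") and the shell pipes it on.
tar -cSf - demo.img | gzip -c > pipe.tar.gz

# Compare the decompressed tar streams byte for byte.
gzip -dc hook.tar.gz > hook.tar
gzip -dc pipe.tar.gz > pipe.tar
if cmp -s hook.tar pipe.tar; then
    same_stream=yes
else
    same_stream=no
fi
echo "identical tar stream: $same_stream"
```

Comparing the decompressed streams rather than the `.gz` files keeps the check independent of anything the compressor may put in its own header.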

> It might be helpful to post the exact commands  you used to test these
> various scenarios, etc.

Just for the sake of thoroughness, I'll post the results I got.

First, in order to closely reproduce what I was working with before,
here's a freshly created .qcow2 volume with a 20 GiB quota and a
minimal Debian install in it:
$ ls -ls --block-size=1
1912025088 -rw------- 1 chris chris 21478375424 Sep  6 10:05 testvol.qcow2

Then I make some compressed tar archives:
$ tar --use-compress-prog=pbzip2 -cSf pbz2archive.tar.bz2 testvol.qcow2
$ tar -cS testvol.qcow2 | pbzip2 -c > pipearchive.tar.bz2
$ tar -cSjf bz2archive.tar.bz2 testvol.qcow2
$ ls -ls --block-size=1 *.tar.*
514617344 -rw-r--r-- 1 chris chris 514615245 Sep  6 14:18 bz2archive.tar.bz2
518164480 -rw-r--r-- 1 chris chris 518163660 Sep  6 14:07 pbz2archive.tar.bz2
518164480 -rw-r--r-- 1 chris chris 518163660 Sep  6 14:12 pipearchive.tar.bz2

The bzip2 archive being a little smaller than the two pbzip2 archives
is unsurprising. The pbzip2 manpage explains that it stores the data in
chunks in the file, which bzip2 doesn't, so there's a little overhead.
The fact that the two pbzip variants are *exactly* the same size was my
first hint that I was wrong. Looking closer:
$ if cmp pbz2archive.tar.bz2 pipearchive.tar.bz2; then echo "Yup, they're identical!"; fi
Yup, they're identical!

That looks pretty conclusive to me. But, just for completeness' sake:
$ pbzip2 -dk pbz2archive.tar.bz2
$ pbzip2 -dk pipearchive.tar.bz2
$ bzip2 -dk bz2archive.tar.bz2
$ ls -ls --block-size=1 *.tar
1912057856 -rw-r--r-- 1 chris chris 1912053760 Sep  6 14:18 bz2archive.tar
1912057856 -rw-r--r-- 1 chris chris 1912053760 Sep  6 14:07 pbz2archive.tar
1912057856 -rw-r--r-- 1 chris chris 1912053760 Sep  6 14:12 pipearchive.tar
$ if cmp bz2archive.tar pbz2archive.tar; then echo "Yup, they're identical!"; fi
Yup, they're identical!
$ if cmp bz2archive.tar pipearchive.tar; then echo "Yup, they're identical!"; fi
Yup, they're identical!
$ if cmp pbz2archive.tar pipearchive.tar; then echo "Yup, they're identical!"; fi
Yup, they're identical!
$ tar -xf bz2archive.tar -C bzip2/
$ tar -xf pbz2archive.tar -C pbzip2/
$ tar -xf pipearchive.tar -C piped/
$ ls -ls --block-size=1 bzip2/* pbzip2/* piped/*
1912025088 -rw------- 1 chris chris 21478375424 Sep  6 10:05 bzip2/testvol.qcow2
1912025088 -rw------- 1 chris chris 21478375424 Sep  6 10:05 pbzip2/testvol.qcow2
1912025088 -rw------- 1 chris chris 21478375424 Sep  6 10:05 piped/testvol.qcow2
$ if cmp bzip2/testvol.qcow2 pbzip2/testvol.qcow2; then echo "Yup, they're identical!"; fi
Yup, they're identical!
$ if cmp bzip2/testvol.qcow2 piped/testvol.qcow2; then echo "Yup, they're identical!"; fi
Yup, they're identical!

And, if I do the decompress and untar in one step:
$ rm bzip2/* pbzip2/* piped/*
$ tar --use-compress-program=pbzip2 -xf pbz2archive.tar.bz2 -C pbzip2/
$ pbzip2 -dc pipearchive.tar.bz2 | tar x -C piped/
$ tar -xjf bz2archive.tar.bz2 -C bzip2/
$ ls -ls --block-size=1 bzip2/* pbzip2/* piped/*
1912025088 -rw------- 1 chris chris 21478375424 Sep  6 10:05 bzip2/testvol.qcow2
1912025088 -rw------- 1 chris chris 21478375424 Sep  6 10:05 pbzip2/testvol.qcow2
1912025088 -rw------- 1 chris chris 21478375424 Sep  6 10:05 piped/testvol.qcow2
$ if cmp bzip2/testvol.qcow2 pbzip2/testvol.qcow2; then echo "Yup, they're identical!"; fi
Yup, they're identical!
$ if cmp bzip2/testvol.qcow2 piped/testvol.qcow2; then echo "Yup, they're identical!"; fi
Yup, they're identical!
$ if cmp pbzip2/testvol.qcow2 piped/testvol.qcow2; then echo "Yup, they're identical!"; fi
Yup, they're identical!

...so, at this point I'd say I have quite definitively disproved my own
thesis, and the only mystery is whether I grabbed a fully-allocated
.qcow2 volume thinking I had grabbed a sparse one, or missed a `-S`
somewhere the last time around.

Consider my bug report withdrawn, and sorry to waste everyone's time.

> p.s. I have found that "ls -sl --block-size=1" is a handy way to see
> in one command whether a file is sparse or not:
>   $ ls -sl --block-size=1 temp.sparse
>   4096 -rw-r--r-- 1 root root 16896 Mar 30  2014 temp.sparse

Thanks for the tip!
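
The same comparison can be scripted. This is only a sketch: `is_sparse` is a hypothetical helper, and it assumes GNU stat, whose `%b`, `%B` and `%s` formats report allocated blocks, the size of each block in bytes, and the apparent size. It treats a file as sparse when it occupies less disk space than its apparent size, which can also be true for other reasons on filesystems that compress data or store tiny files inline:

```shell
# is_sparse FILE: succeed if FILE occupies less disk space than its apparent size.
# Hypothetical helper; relies on GNU stat's %b (blocks), %B (bytes per block), %s (size).
is_sparse() {
    allocated=$(( $(stat -c %b "$1") * $(stat -c %B "$1") ))
    apparent=$(stat -c %s "$1")
    [ "$allocated" -lt "$apparent" ]
}

# Demo on a throwaway file: 1 MiB of hole, nothing written.
demo=$(mktemp)
truncate -s 1M "$demo"
if is_sparse "$demo"; then
    echo "sparse"
else
    echo "not sparse"
fi
```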

Cheers!
-- 
Chris Mitchell [they/them/their]

Say hi on Matrix chat!  My handle is @radine:matrix.org
      and you can get the app at https://riot.im


