From: Vadim Zeitlin
Subject: Re: [lmi] Creating a chroot for cross-building lmi
Date: Sun, 17 May 2020 18:34:01 +0200

On Sat, 16 May 2020 22:03:21 +0000 Greg Chicares <address@hidden> wrote:

GC> I thought it would be interesting to measure the speed of both.

 Thanks, this is definitely interesting. I'm trying to become more modern
and switch to using Docker instead of chroots, but for now I still create
them quite often, so it's useful to know what the best way of doing it is.

GC> Details:
GC>  - I chose 'sid' because I know I don't have it cached anywhere.
GC>  - I specified '--variant=minbase' because it should be smallest (and
GC>      who knows, it might be as good for building lmi as the default).
GC>  - I added '--include=zsh' because that seems better than running a
GC>      separate step to fetch zsh (etc.) later.
GC> 
GC> *** The two-step tarball approach
GC> 
GC> time debootstrap --arch=amd64 --make-tarball=/var/cache/sid_bootstrap.tar \
GC>   --variant=minbase --include=zsh sid /tmp/eraseme
GC> 
GC> mkdir -p /tmp/sid-chroot
GC> time debootstrap --arch=amd64 --unpack-tarball=/var/cache/sid_bootstrap.tar \
GC>   --variant=minbase --include=zsh sid /tmp/sid-chroot
GC> 
GC> 1:11.31 total --make-tarball
GC> 45.225 total  --unpack-tarball
GC> 
GC> rm -rf /tmp/sid-chroot
GC> mkdir -p /tmp/sid-chroot
GC> [repeat --unpack-tarball command: same speed as above]
GC> 44.692 total
GC> 
GC> BTW, disk usage for tarball "cache":
GC> 
GC> du -sb /var/cache/sid_bootstrap.tar
GC> 48465802        /var/cache/sid_bootstrap.tar
GC> 
GC> *** A single-step approach, with --cache-dir
GC> 
GC> mkdir /tmp/sidcache
GC> time debootstrap --arch=amd64 --cache-dir=/tmp/sidcache --variant=minbase \
GC>   --include=zsh sid /tmp/sid-chroot http://deb.debian.org/debian/
GC> 
GC> 1:53.07 total
GC> 
GC> repeat the debootstrap command:
GC> 1:05.26 total
GC> 
GC> BTW, disk usage for '--cache-dir' (and the chroot):
GC> 
GC> du -sb /tmp/sidcache
GC> 37383120        /tmp/sidcache
GC> du -sb /tmp/sid-chroot
GC> 218032473       /tmp/sid-chroot

 I was initially surprised that this took so much less space than
/var/cache/sid_bootstrap.tar, but then I realized that that is an
uncompressed tarball (whereas the package archives in /tmp/sidcache are
compressed). It looks like compressing the ~46MiB tarball should bring it
even below the ~35MiB taken by this directory, so it should be more
disk-efficient if you care about this.
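
 If you wanted to check that, a minimal sketch (assuming xz is installed,
and assuming --unpack-tarball wants a plain .tar, so the tarball would have
to be decompressed again before reuse):

        xz -9 /var/cache/sid_bootstrap.tar      # produces sid_bootstrap.tar.xz
        du -sb /var/cache/sid_bootstrap.tar.xz  # compare with the ~35MiB cache dir
        xz -d /var/cache/sid_bootstrap.tar.xz   # restore the .tar before the next --unpack-tarball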

GC> *** A single-step approach, with no caching:
GC> 
GC> As in the last example above, but without '--cache-dir':
GC> 
GC> 1:52.12 total
GC> repeating:
GC> rm -rf /tmp/sid-chroot
GC> 1:52.75 total
GC> 
GC> ...so '--cache-dir' matters greatly.

 It certainly helps, but I find that creating a new chroot in 2 minutes
even without the cache is already pretty fast. I.e. using the cache reduces
the time by almost 50%, but the absolute saving is still less than 50s, so
it's not a huge deal.

GC> *** Summary
GC> 
GC> I recreate a tarball or --cache-dir cache rarely, but use it often,
GC> so only the final step (after caching) is important.
GC> 
GC>    44.692 total --unpack-tarball
GC>   1:05.26 total --cache-dir
GC> 
GC> It looks like '--unpack-tarball' wins, with 150% the speed of '--cache-dir'.
GC> However, a tarball is write-once-read-often, whereas a cache directory
GC> should be updated whenever needed. That shifts the balance: the true cost
GC> of the tarball approach is increased by having to run 'apt-get dist-upgrade'
GC> after unpacking (and retrieving all updates since the --make-tarball step),
GC> but the single-step '--cache-dir' approach shouldn't retrieve the same file
GC> twice.

 Yes, using the cache dir should be faster, which intuitively makes sense
because it also costs more in disk space (and progressively more over time,
although you could clean up old versions of the packages if you really
cared about reducing its footprint).
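
 If you did want to prune old versions from such a cache dir, a rough
sketch (the path is just the one from above; this is a dry run, drop the
"echo" to actually delete; sorting by mtime is only a heuristic for
"newest version"):

        cd /tmp/sidcache
        for pkg in $(ls *.deb | sed 's/_.*//' | sort -u); do
            # keep only the most recently downloaded version of each package
            ls -t ${pkg}_*.deb | tail -n +2 | while read old; do echo rm "$old"; done
        done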

GC> Vadim--Given that /var/cache/apt/archives/ and the above /tmp/sidcache/ are
GC> just collections of '.deb' files, is there any good reason to keep them
GC> separate?

 I must admit I'm puzzled by this question. I'd like to turn it around:
what good reasons do you have to merge them?

GC> It would seem simpler to throw all those '.deb' files into a
GC> single location. Certainly apt already knows which of these to use for
GC> my 'buster' base system:
GC>   /var/cache/apt/archives/base-files_10.3+deb10u1_amd64.deb
GC>   /var/cache/apt/archives/base-files_10.3+deb10u2_amd64.deb
GC>   /var/cache/apt/archives/base-files_10.3+deb10u3_amd64.deb
GC>   /var/cache/apt/archives/base-files_10.3+deb10u4_amd64.deb
GC> so would it do any harm if I moved this 'bullseye' file:
GC>   /var/cache/bullseye/base-files_11_amd64.deb
GC> into the same directory?

 I don't think so: there should never be any conflict between package
names from different distributions.

 But I still don't know why you would want to mix and match them like
this instead of using separate directories for them. I admit I don't have
any good explanation for my gut feeling that doing so is wrong; the only
thing I can think of is that running "apt clean" in either the main system
or the chroot would remove all the files, not just the ones for the
current system, but I guess you could just never run "apt clean"...

GC> I think I've stumbled upon an even tidier approach, above: just bind-mount
GC> the chroot's /var/cache/apt/archives/ to some directory in the host system

 Yes, this is definitely a nice hack (pretty similar to Docker volumes
idea, in fact).

GC> (something like /var/cache/debian-testing/ does work,

 ... And would be the perfectly logical choice for me.

GC> and above I speculate that the host's own /var/cache/apt/archives/
GC> might also work).

 It might but, again, why?
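
 For reference, a minimal sketch of that bind mount (the paths are only
illustrative; the chroot is assumed to live under /srv/chroot/sid):

        mkdir -p /var/cache/debian-testing
        mount --bind /var/cache/debian-testing /srv/chroot/sid/var/cache/apt/archives
        # or, to make it persistent, an /etc/fstab line such as:
        # /var/cache/debian-testing /srv/chroot/sid/var/cache/apt/archives none bind 0 0

This is indeed close in spirit to passing -v host_dir:/var/cache/apt/archives
to "docker run".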


On Sun, 17 May 2020 15:40:04 +0000 Greg Chicares <address@hidden> wrote:

GC> '--variant=minbase' does have some benefit: for example, it
GC> excludes cron, fdisk, and nano, none of which are really
GC> needed in a chroot.

 This is nice, but AFAICS it excludes all the packages with priority
"important", which includes things like less and readline that I
definitely want to have on any interactive system, more or less everything
network-related (iproute2, iptables, netbase), and also procps and udev,
which a number of programs might expect to exist. I'm not sure I like the
odds of having to track down some weird problem in the future just because
one of these almost-always-available system components turns out not to be
there.
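
 To see exactly what minbase would leave out, something like this should
work (assuming aptitude is installed; grep-aptavail from dctrl-tools would
be an alternative):

        % aptitude search '~pimportant' -F '%p'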

GC> Incidentally:
GC>   time some_command 2>&1 |less
GC> gives only the total time, including time spent in 'less', so
GC> it's not very useful;

 I don't know why, but I don't see this here:

        % time echo foo | less
        foo
        echo foo  0.00s user 0.01s system 78% cpu 0.009 total
        less  0.00s user 0.00s system 9% cpu 0.043 total
        % time sleep 1 | less
        sleep 1  0.00s user 0.01s system 0% cpu 1.028 total
        less  0.00s user 0.00s system 0% cpu 1.026 total

i.e. I do get time for both commands.

GC> but with
GC>   time some_command 2>&1 |tee log_file |less -S
GC> zsh prints a line with 'time' output for the command only.

 And this gives me

        % time echo foo | tee bar | less -S
        foo
        echo foo  0.00s user 0.01s system 59% cpu 0.012 total
        tee bar  0.00s user 0.00s system 9% cpu 0.053 total
        less -S  0.00s user 0.00s system 8% cpu 0.046 total

 This is with both zsh 5.7.1 (Buster) and 5.8 (Sid).

GC> I'm really glad I've stumbled across that.

 If I really wanted to measure just the time of a single pipeline
component, I'd do this:

        % {time sleep 1}|less
        sleep 1  0.00s user 0.01s system 0% cpu 1.015 total

 Regards,
VZ
