listhelper-discuss

Uncompressed mbox format storage


From: wrotycz
Subject: Uncompressed mbox format storage
Date: Wed, 11 Dec 2024 21:54:44 +0100
User-agent: GWP-Draft

Just wanted to point out that for the search option it does not matter whether these files are compressed or not.

Here is a test on rather old hardware with an HDD, where the search greps for the query 'matrix', together with the corresponding execution times.
Yes, the hardware is old, but that is deliberate: it shows the difference, or lack of it, for the given task.
`grep -i' was used because that is what the search does: it ignores letter case.
The tools used are grep, zgrep = zutils/zgrep, and gzgrep = gzip/zgrep.
'Unbuffered' below means the page cache was dropped before the run; 'buffered' means the files were already cached.

~~~
## unbuffered $ grep -i

$ sync; echo 3 > /proc/sys/vm/drop_caches
$ time grep -i matrix bug-gnuzilla/* > /dev/null

real    0m8.178s
user    0m6.477s
sys     0m0.250s

$ sync; echo 3 > /proc/sys/vm/drop_caches
$ time ./zgrep -i matrix bug-gnuzilla-gz/*.gz > /dev/null

real    0m8.882s
user    0m7.722s
sys     0m1.140s

$ sync; echo 3 > /proc/sys/vm/drop_caches
$ time ./gzgrep -i matrix bug-gnuzilla-gz/*.gz > /dev/null

real    0m9.196s
user    0m8.021s
sys     0m1.372s

## buffered $ grep -i

$ time grep -i matrix bug-gnuzilla/* > /dev/null

real    0m8.077s
user    0m6.480s
sys     0m0.056s

$ time ./zgrep -i matrix bug-gnuzilla-gz/*.gz > /dev/null

real    0m8.151s
user    0m7.692s
sys     0m0.883s

$ time ./gzgrep -i matrix bug-gnuzilla-gz/*.gz > /dev/null

real    0m8.477s
user    0m7.964s
sys     0m1.083s

~~~

As can be seen, with `-i' there is practically no difference between searching uncompressed and gz-compressed files.
This is because decompression is faster than the search itself, so the decompressed data waits in the pipe anyway.
On first use (unbuffered), compressed files can even be beneficial: there is less data to read from the slow disk and decompression is many times faster anyway, so the read bottleneck is alleviated by having less (compressed) data to read.
On an SSD the unbuffered numbers move towards the buffered case, but the overall difference stays marginal.
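
The pipe arrangement described above can be sketched roughly as follows. This is only an illustration, not the actual zutils or gzip zgrep code; `search_gz` is a hypothetical helper name, and it assumes `gzip` and `grep` are on PATH.

```shell
# Hypothetical helper: search one gzip-compressed file by streaming the
# decompressed data through a pipe into grep, case-insensitively.
search_gz() {
    pattern=$1
    file=$2
    # gzip -cd writes the decompressed data to stdout and leaves the file
    # untouched; grep consumes it from the pipe as it arrives.
    gzip -cd -- "$file" | grep -i -e "$pattern"
}
```

Called as `search_gz matrix some-archive.gz`, the decompressor and grep run concurrently, so as long as grep is the slower side the decompression cost is mostly hidden.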

For reference, the 'same' buffered grep on tmpfs:

~~~
$ time grep -i matrix bug-gnuzilla/* > /dev/null

real    0m6.607s
user    0m6.565s
sys     0m0.044s

$ time ./zgrep matrix bug-gnuzilla-gz/*.gz > /dev/null

real    0m1.977s
user    0m1.304s
sys     0m0.809s

$ time ./gzgrep matrix bug-gnuzilla-gz/*.gz > /dev/null

real    0m2.389s
user    0m1.533s
sys     0m1.064s

~~~

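This suggests that searching compressed files can be faster, even much faster, than searching uncompressed ones. Note, however, that the zgrep and gzgrep runs above were made without `-i', so they are not directly comparable with the `grep -i' run.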
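
For a like-for-like comparison, the same options need to reach grep on both sides. A minimal sketch of such a check, assuming `gzip` and `grep` are on PATH; `compare_search` is a hypothetical helper name and is not part of zutils or gzip.

```shell
# Hypothetical helper: run grep with identical options on a plain file and
# on its gzip-compressed copy, and succeed only if the matches agree.
compare_search() {
    opts=$1      # grep options, e.g. -i (left unquoted below on purpose)
    pattern=$2
    plain=$3     # uncompressed file
    packed=$4    # gzip-compressed copy of the same file
    a=$(grep $opts -e "$pattern" -- "$plain")
    b=$(gzip -cd -- "$packed" | grep $opts -e "$pattern")
    [ "$a" = "$b" ]
}
```

Once both sides are known to produce the same matches, each of the two commands inside the helper can be timed separately with the same options to compare speed.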


--

Finally, a comparison of zgrep across compressors.

~~~
$ time ./zgrep -i matrix bug-gnuzilla/* > /dev/null
     66     794    6253

real    0m8.017s
user    0m7.046s
sys     0m0.907s

$ time ./zgrep -i matrix bug-gnuzilla-gz/*.gz > /dev/null
# gzip VmPeak:     2516 kB

real    0m8.115s
user    0m7.709s
sys     0m0.853s

$ time ./zgrep -i matrix bug-gnuzilla-bz2/*.bz2 > /dev/null
# bzip2 VmPeak:     5596 kB

real    0m9.654s
user    0m10.654s
sys     0m1.009s

$ time ./zgrep -i matrix bug-gnuzilla-lz/*.lz > /dev/null
# lzip VmPeak:     4224 kB - 7204 kB

real    0m9.355s
user    0m9.215s
sys     0m1.005s

~~~

The reasons I use gzip as the showcase are these: it is a decent compressor, good enough for the job; it uses little memory, the least of the three; and it is the fastest of them.

