findutils-patches
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Add sort option to find


From: Diego Ongaro
Subject: Re: Add sort option to find
Date: Fri, 4 Sep 2020 18:41:21 -0700

On Wed, Aug 26, 2020 at 1:13 AM Bernhard Voelker
<mail@bernhard-voelker.de> wrote:
>
>   - The elapsed time for 'find -s' is by a small factor larger than
>     that of the classic 'find | sort' method.

Hi all,

As promised, I ran a more comprehensive set of benchmarks on `find -s`.

First, I've attached two more patches, stacking on my earlier set:
 - 0004-find-Set-FTS_DEFER_STAT-flag-for-sort-performance.patch fixes a
   performance issue introduced with `find -s` regarding the FTS_DEFER_STAT
   flag.
 - 0005-find-Handle-possible-errors-from-strcoll.patch handles the potential
   clobbering of errno and possible strcoll failures that Berny pointed out.
   Both patches were applied before running the benchmarks.

Here's the benchmark setup:

Modes:
 - 'unsorted' is `sudo find | wc -l`.
 - 'softflag' is `sudo find -s | wc -l`.
 - 'softcmd' is `sudo find | sort | wc -l`.

Trees:
 - 'full' is `find / -xdev', including root and home but not mnt, 1.068M files.
 - 'root' is `find / -xdev -path /home -prune -o -print', 277K files.
 - 'home' is `find /home', 791K files.
 - 'mnt' is `find /mnt/testdir` with sequentially named files on a tmpfs from
   Berny's benchmark, 8.000M files.

root and home are on the same ext4 filesystem atop a LUKS2 encrypted partition.

Caches:
 - 'warm' means a normal find run was executed before the tests.
 - 'cold' means `sync; echo 3 | sudo tee /proc/sys/vm/drop_caches` was executed
   before each test run.

The LC_ALL variable was varied between C (behaves like strcmp) and en_US.UTF-8
(requires full strcoll).

The system is about one month old with a clean OS install. My home directory
includes a folder named "archive" with 243K files that I've copied in using
rsync, containing historical files of any value. The OS is Debian 10 with
backports enabled running Linux 5.6.14. The machine is a modern desktop with an
AMD Ryzen 7 3700X CPU, 32 GB of RAM, and a Crucial P1 1TB NVMe drive

The benchmark ran with the attached `bench.py`, which produced the attached
`results.csv`. The CSV file was processed with the attached `process.py` to
produce the attached `results.png` and the table below. I don't expect for
these Python scripts to be included in the upstream project under the GPL, but
please consider them released into the public domain.

                                  elapsed  ru_utime  ru_stime  ru_maxrss_mb
caches LC_ALL      tree mode
cold   C           full unsorted    23.29      0.80      5.14          9.44
                        sortcmd     23.46      1.09      5.29         16.81
                        sortflag    18.71      0.89      5.15          9.42
                   home unsorted    18.11      0.62      4.19          9.40
                        sortcmd     18.33      0.88      4.29         15.59
                        sortflag    15.24      0.77      4.09          9.41
                   mnt  unsorted     2.83      1.26      1.88         28.59
                        sortcmd      4.50      2.82      2.15         28.55
                        sortflag     5.10      3.02      2.36       2199.64
                   root unsorted     4.27      0.19      0.98          8.73
                        sortcmd      4.34      0.25      0.99         12.87
                        sortflag     3.63      0.23      0.97          8.74
       en_US.UTF-8 full unsorted    22.29      0.73      5.22         10.00
                        sortcmd     32.05     10.71      5.29         16.32
                        sortflag    19.26      1.40      5.12          9.85
                   home unsorted    18.13      0.64      4.25          9.91
                        sortcmd     26.45      9.07      4.32         16.34
                        sortflag    15.55      1.05      4.13         10.02
                   mnt  unsorted     2.81      1.28      1.84         28.92
                        sortcmd     20.94     19.35      2.20         28.93
                        sortflag     8.78      6.71      2.36       2199.99
                   root unsorted     4.31      0.28      0.96          8.76
                        sortcmd      5.79      1.69      1.01         13.36
                        sortflag     3.95      0.42      1.04          8.77
warm   C           full unsorted     1.06      0.32      0.89          9.50
                        sortcmd      1.45      0.66      0.98         15.55
                        sortflag     1.15      0.48      0.82          9.45
                   home unsorted     0.85      0.28      0.69          9.51
                        sortcmd      1.17      0.55      0.77         16.95
                        sortflag     0.90      0.32      0.69          9.41
                   mnt  unsorted     2.79      1.26      1.86         28.52
                        sortcmd      4.49      2.82      2.17         28.57
                        sortflag     5.05      3.06      2.29       2199.62
                   root unsorted     0.23      0.09      0.16          8.36
                        sortcmd      0.32      0.16      0.19         13.62
                        sortflag     0.25      0.13      0.15          8.37
       en_US.UTF-8 full unsorted     1.07      0.38      0.83         10.03
                        sortcmd     11.04     10.34      1.01         16.08
                        sortflag     1.42      0.73      0.84          9.92
                   home unsorted     0.85      0.28      0.68          9.89
                        sortcmd      9.36      8.80      0.82         15.93
                        sortflag     1.09      0.45      0.76          9.98
                   mnt  unsorted     2.78      1.24      1.86         29.01
                        sortcmd     21.01     19.48      2.16         28.97
                        sortflag     8.69      6.69      2.31       2200.03
                   root unsorted     0.27      0.13      0.17          8.41
                        sortcmd      1.74      1.59      0.20         13.28
                        sortflag     0.38      0.23      0.18          8.42


Memory usage:

As Berny's simpler benchmark demonstrated, sortflag requires 2.2 GB of RAM to
sort the mnt tree. It does not require any more memory than sortcmd on the
other trees.


Locales:

sortcmd performance depends largely on the locale. In the C locale, it uses
memcmp (__memcmp_avx2), but in the en_US.UTF-8 locale, it must actually do the
work of strcoll. This slows the sort command down by a large factor (about 5x)
for most of the CPU-bound scenarios (warm cache or tmpfs). The effect is
smaller for the root tree, probably because the path lengths are shorter.

sortflag seems less affected by the locale. I suspect this is because the
number of comparisons it needs to do is lower and the number of bytes it
compares for each is lower (name vs path), as explored in my prior email.

So, one takeaway from this is that, if you don't mind the C locale's sort
order, you should probably set LC_COLLATE=C on your machines.


Speed:

For the mnt tree, sortflag performs slightly slower than sortcmd in the C
locale but drastically faster in the en_US.UTF-8 locale.

For the full, root, and home trees, sortflag performs modestly faster than
sortcmd in the C locale and drastically faster than sortcmd in the en_US.UTF-8
locale.


Can sorting improve performance?:

In all the scenarios that actually do I/O (cold cache and not tmpfs), sortflag
outperforms unsorted. This was surprising.

To investigate, I ran an additional mode ("revsortflag") where I simply
reversed the result of strcoll in `find -s`, so the find would traverse the
directory structure in descending order. For the home tree with a cold cache in
the C locale, revsortflag took about 19 seconds, compared to 18 for unsorted
and 15 for ascending sortflag. I ran the same under blktrace and plotted a
density plot of the block accesses, attached as `blk.png`.

This may be devolving into palm-reading, but it looks to me like the sortflag
mode gets really nice locality of access for the "/"-shaped part of the run.
The revsortflag mode is nearly the reverse image of the sortflag, as I'd
expect, but it's widened (slower).

So it seems that the order of the I/O actually matters, even on a modern NVMe
drive, and that ascending sort by name helps on my system. I don't know how
much of this is an effect of my system being relatively new or the exact tools
I used to create files on it.


Overall:

In summary, the performance of `find -s` seems pretty good. Except in the
presence of extremely large directories (with over 1M direct children), I'd
recommend `find -s` over `find | sort`, and I doubt many people would notice a
performance penalty with `find -s` compared to plain `find`.


-Diego

Attachment: 0004-find-Set-FTS_DEFER_STAT-flag-for-sort-performance.patch
Description: Text Data

Attachment: 0005-find-Handle-possible-errors-from-strcoll.patch
Description: Text Data

Attachment: bench.py
Description: Text Data

Attachment: process.py
Description: Text Data

Attachment: results.csv
Description: Text Data

Attachment: results.png
Description: PNG image

Attachment: blk.png
Description: PNG image


reply via email to

[Prev in Thread] Current Thread [Next in Thread]