|
From: | Diego Ongaro |
Subject: | Re: Add sort option to find |
Date: | Fri, 4 Sep 2020 18:41:21 -0700 |
On Wed, Aug 26, 2020 at 1:13 AM Bernhard Voelker <mail@bernhard-voelker.de> wrote: > > - The elapsed time for 'find -s' is by a small factor larger than > that of the classic 'find | sort' method. Hi all, As promised, I ran a more comprehensive set of benchmarks on `find -s`. First, I've attached two more patches, stacking on my earlier set: - 0004-find-Set-FTS_DEFER_STAT-flag-for-sort-performance.patch fixes a performance issue introduced with `find -s` regarding the FTS_DEFER_STAT flag. - 0005-find-Handle-possible-errors-from-strcoll.patch handles the potential clobbering of errno and possible strcoll failures that Berny pointed out. Both patches were applied before running the benchmarks. Here's the benchmark setup: Modes: - 'unsorted' is `sudo find | wc -l`. - 'softflag' is `sudo find -s | wc -l`. - 'softcmd' is `sudo find | sort | wc -l`. Trees: - 'full' is `find / -xdev', including root and home but not mnt, 1.068M files. - 'root' is `find / -xdev -path /home -prune -o -print', 277K files. - 'home' is `find /home', 791K files. - 'mnt' is `find /mnt/testdir` with sequentially named files on a tmpfs from Berny's benchmark, 8.000M files. root and home are on the same ext4 filesystem atop a LUKS2 encrypted partition. Caches: - 'warm' means a normal find run was executed before the tests. - 'cold' means `sync; echo 3 | sudo tee /proc/sys/vm/drop_caches` was executed before each test run. The LC_ALL variable was varied between C (behaves like strcmp) and en_US.UTF-8 (requires full strcoll). The system is about one month old with a clean OS install. My home directory includes a folder named "archive" with 243K files that I've copied in using rsync, containing historical files of any value. The OS is Debian 10 with backports enabled running Linux 5.6.14. The machine is a modern desktop with an AMD Ryzen 7 3700X CPU, 32 GB of RAM, and a Crucial P1 1TB NVMe drive The benchmark ran with the attached `bench.py`, which produced the attached `results.csv`. The CSV file was processed with the attached `process.py` to produce the attached `results.png` and the table below. I don't expect for these Python scripts to be included in the upstream project under the GPL, but please consider them released into the public domain. elapsed ru_utime ru_stime ru_maxrss_mb caches LC_ALL tree mode cold C full unsorted 23.29 0.80 5.14 9.44 sortcmd 23.46 1.09 5.29 16.81 sortflag 18.71 0.89 5.15 9.42 home unsorted 18.11 0.62 4.19 9.40 sortcmd 18.33 0.88 4.29 15.59 sortflag 15.24 0.77 4.09 9.41 mnt unsorted 2.83 1.26 1.88 28.59 sortcmd 4.50 2.82 2.15 28.55 sortflag 5.10 3.02 2.36 2199.64 root unsorted 4.27 0.19 0.98 8.73 sortcmd 4.34 0.25 0.99 12.87 sortflag 3.63 0.23 0.97 8.74 en_US.UTF-8 full unsorted 22.29 0.73 5.22 10.00 sortcmd 32.05 10.71 5.29 16.32 sortflag 19.26 1.40 5.12 9.85 home unsorted 18.13 0.64 4.25 9.91 sortcmd 26.45 9.07 4.32 16.34 sortflag 15.55 1.05 4.13 10.02 mnt unsorted 2.81 1.28 1.84 28.92 sortcmd 20.94 19.35 2.20 28.93 sortflag 8.78 6.71 2.36 2199.99 root unsorted 4.31 0.28 0.96 8.76 sortcmd 5.79 1.69 1.01 13.36 sortflag 3.95 0.42 1.04 8.77 warm C full unsorted 1.06 0.32 0.89 9.50 sortcmd 1.45 0.66 0.98 15.55 sortflag 1.15 0.48 0.82 9.45 home unsorted 0.85 0.28 0.69 9.51 sortcmd 1.17 0.55 0.77 16.95 sortflag 0.90 0.32 0.69 9.41 mnt unsorted 2.79 1.26 1.86 28.52 sortcmd 4.49 2.82 2.17 28.57 sortflag 5.05 3.06 2.29 2199.62 root unsorted 0.23 0.09 0.16 8.36 sortcmd 0.32 0.16 0.19 13.62 sortflag 0.25 0.13 0.15 8.37 en_US.UTF-8 full unsorted 1.07 0.38 0.83 10.03 sortcmd 11.04 10.34 1.01 16.08 sortflag 1.42 0.73 0.84 9.92 home unsorted 0.85 0.28 0.68 9.89 sortcmd 9.36 8.80 0.82 15.93 sortflag 1.09 0.45 0.76 9.98 mnt unsorted 2.78 1.24 1.86 29.01 sortcmd 21.01 19.48 2.16 28.97 sortflag 8.69 6.69 2.31 2200.03 root unsorted 0.27 0.13 0.17 8.41 sortcmd 1.74 1.59 0.20 13.28 sortflag 0.38 0.23 0.18 8.42 Memory usage: As Berny's simpler benchmark demonstrated, sortflag requires 2.2 GB of RAM to sort the mnt tree. It does not require any more memory than sortcmd on the other trees. Locales: sortcmd performance depends largely on the locale. In the C locale, it uses memcmp (__memcmp_avx2), but in the en_US.UTF-8 locale, it must actually do the work of strcoll. This slows the sort command down by a large factor (about 5x) for most of the CPU-bound scenarios (warm cache or tmpfs). The effect is smaller for the root tree, probably because the path lengths are shorter. sortflag seems less affected by the locale. I suspect this is because the number of comparisons it needs to do is lower and the number of bytes it compares for each is lower (name vs path), as explored in my prior email. So, one takeaway from this is that, if you don't mind the C locale's sort order, you should probably set LC_COLLATE=C on your machines. Speed: For the mnt tree, sortflag performs slightly slower than sortcmd in the C locale but drastically faster in the en_US.UTF-8 locale. For the full, root, and home trees, sortflag performs modestly faster than sortcmd in the C locale and drastically faster than sortcmd in the en_US.UTF-8 locale. Can sorting improve performance?: In all the scenarios that actually do I/O (cold cache and not tmpfs), sortflag outperforms unsorted. This was surprising. To investigate, I ran an additional mode ("revsortflag") where I simply reversed the result of strcoll in `find -s`, so the find would traverse the directory structure in descending order. For the home tree with a cold cache in the C locale, revsortflag took about 19 seconds, compared to 18 for unsorted and 15 for ascending sortflag. I ran the same under blktrace and plotted a density plot of the block accesses, attached as `blk.png`. This may be devolving into palm-reading, but it looks to me like the sortflag mode gets really nice locality of access for the "/"-shaped part of the run. The revsortflag mode is nearly the reverse image of the sortflag, as I'd expect, but it's widened (slower). So it seems that the order of the I/O actually matters, even on a modern NVMe drive, and that ascending sort by name helps on my system. I don't know how much of this is an effect of my system being relatively new or the exact tools I used to create files on it. Overall: In summary, the performance of `find -s` seems pretty good. Except in the presence of extremely large directories (with over 1M direct children), I'd recommend `find -s` over `find | sort`, and I doubt many people would notice a performance penalty with `find -s` compared to plain `find`. -Diego
0004-find-Set-FTS_DEFER_STAT-flag-for-sort-performance.patch
Description: Text Data
0005-find-Handle-possible-errors-from-strcoll.patch
Description: Text Data
bench.py
Description: Text Data
process.py
Description: Text Data
results.csv
Description: Text Data
results.png
Description: PNG image
blk.png
Description: PNG image
[Prev in Thread] | Current Thread | [Next in Thread] |