From: Alejandro Colomar
Subject: Poor performance of man -K for uncompressed pages (was: man -K finds repeated entries for each symlink page)
Date: Sun, 9 Apr 2023 17:20:43 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.9.1
Hi Colin,
On 4/9/23 16:55, Colin Watson wrote:
> On Sun, Apr 09, 2023 at 03:58:28PM +0200, Alejandro Colomar wrote:
>> $ man -Kaw RLIMIT_NOFILE | sort | uniq -c
>> 3 /opt/local/man/share/man/man2/dup.2
>> 2 /opt/local/man/share/man/man2/fcntl.2
>> 5 /opt/local/man/share/man/man2/getrlimit.2
>> 3 /opt/local/man/share/man/man2/open.2
>> 1 /opt/local/man/share/man/man2/pidfd_getfd.2
>> 1 /opt/local/man/share/man/man2/pidfd_open.2
>> 2 /opt/local/man/share/man/man2/poll.2
>> 1 /opt/local/man/share/man/man2/seccomp_unotify.2
>> 4 /opt/local/man/share/man/man2/select.2
>>
>> Those numbers coincide with 1+ the number of symlinks for each of the
>> pages. For example, see select.2:
>
> Thanks for the report. Fixed by this commit:
>
>
> https://gitlab.com/man-db/man-db/-/commit/7ef30573a7023eb78bf70a34edaa4e3906531993
Heh, that was fast :)
As a side effect of not reading too many files, performance improved
considerably for bzip2 (~3x), and for gzip (~2x).
I built man from source (tweaking with -O3, so I cheated a little bit),
and here are the results:
$ export MANPATH=/tmp/man/gz_/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
17
0.19
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do gzip -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
17
1.14
$ export MANPATH=/tmp/man/bz2/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
17
3.05
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzip2 -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
17
1.20
$ export MANPATH=/tmp/man/man/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
17
0.52
$ /bin/time -f %e dash -c "find $MANPATH -type f | xargs grep -l RLIMIT_NOFILE | wc -l"
17
0.01
Please consider this a new bug report, about performance. See the last
block of commands: man(1) takes half a second, while my pipeline with
find(1) and grep(1) is barely measurable. I could understand man(1)
having some overhead, but 52x suggests a serious performance problem,
especially since man(1) is faster reading gzip-compressed pages (0.19 s;
see the first block).
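For anyone wanting to reproduce the baseline, the three find(1) pipelines above can be folded into one helper so that the same command runs against all three trees. This is only a sketch: grep_pages is a hypothetical name (not part of man-db), and choosing the decompressor from the file extension is an assumption about how the trees are laid out.

```shell
# Hypothetical helper, not part of man-db: print every regular file under
# directory $1 whose (decompressed) contents contain the fixed string $2.
grep_pages() {
    dir=$1
    pattern=$2
    find "$dir" -type f | while IFS= read -r f; do
        # Assumption: the extension tells us how the page is compressed.
        if case $f in
               *.gz)  gzip -dc <"$f" ;;
               *.bz2) bzip2 -dc <"$f" ;;
               *)     cat <"$f" ;;
           esac | grep -qF -e "$pattern"
        then
            printf '%s\n' "$f"
        fi
    done
}
```

With the function defined in the current shell, `grep_pages "$MANPATH" RLIMIT_NOFILE | wc -l` should give a count comparable to the pipelines above. It spawns one grep(1) per file, so on the uncompressed tree it will be slower than the xargs(1) variant.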
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5