bug-findutils

Re: Why is `find -name '*.txt'` much slower than '*.txt' on glusterfs?


From: Morgan Weetman
Subject: Re: Why is `find -name '*.txt'` much slower than '*.txt' on glusterfs?
Date: Wed, 24 Jan 2018 13:09:29 +1100

Hi Peng,

IANAE, but I think this is a case of using the wrong tool for the job.

- `find` is designed to locate files based on specific attributes, and as
such performs a stat on every file in the target directory so that the
file information is available to it.
- `echo` simply outputs the list of files in the target directory that the
shell has matched against the file name glob; it has no need to obtain
information on every file, and no ability to use that information.
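One way to check how this plays out on a particular system is to count the system calls each approach makes, which is also what James suggests further down the thread. The following is a minimal sketch, assuming Linux, bash and an installed `strace`. The helper names `syscall_summary` and `interesting_rows` are made up for the illustration, and the exact syscall names (`newfstatat`, `statx`, `getdents64`, ...) depend on the kernel and C library. Whether `find` really stats every entry can also depend on the filesystem and the findutils version, so it is worth measuring rather than assuming.

```shell
#!/usr/bin/env bash
# Sketch: compare the system calls behind `echo *.txt` and `find -name`.
# Assumes Linux with strace installed; syscall names differ between
# kernels and C libraries (stat, newfstatat, statx, getdents64, ...).
set -uo pipefail

# Run a command under strace -c (following child processes with -f) and
# print the per-syscall summary table. The summary goes to a temporary
# file so it does not mix with the traced command's own output.
syscall_summary() {
  local summary
  summary=$(mktemp)
  strace -f -c -o "$summary" "$@" > /dev/null
  cat "$summary"
  rm -f "$summary"
}

# Keep only the directory-reading and stat-family rows of the table.
interesting_rows() {
  grep -E 'getdents|stat' || true
}

if command -v strace > /dev/null; then
  echo "== shell glob =="
  syscall_summary bash -c 'echo *.txt' | interesting_rows
  echo "== find =="
  syscall_summary find . -name '*.txt' | interesting_rows
else
  echo "strace is not installed; skipping the comparison" >&2
fi
```

Run it from the directory in question. Part of each count comes from process startup, so compare how the stat-family counts grow with the number of files rather than their absolute values.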

If you want to locate files based on specific attributes, use find. If
you just want to list files in a directory based on their names, use the
shell's globbing feature.
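For a flat, name-only listing, the two approaches can be made to agree. A minimal sketch, assuming bash and GNU find (for `-maxdepth` and `-printf`); the function names and the demo files are hypothetical. `-maxdepth 1` keeps `find` from descending into subdirectories, which also brings a timing comparison like the one in this thread closer to apples-to-apples.

```shell
#!/usr/bin/env bash
# Sketch: list the *.txt names in the current directory two ways and check
# that they agree. Assumes bash and GNU find (-maxdepth, -printf).
set -euo pipefail

# Name-only listing via shell globbing. nullglob makes the pattern expand
# to nothing (instead of the literal '*.txt') when no file matches.
list_by_glob() {
  shopt -s nullglob
  local files=(*.txt)
  shopt -u nullglob
  if ((${#files[@]})); then
    printf '%s\n' "${files[@]}"
  fi
}

# The find equivalent: -maxdepth 1 stops find from descending into
# subdirectories, and -printf '%f\n' prints the bare file name without
# the leading './'. find does not sort its output, so sort it here.
list_by_find() {
  find . -maxdepth 1 -name '*.txt' -printf '%f\n' | LC_ALL=C sort
}

# Demo in a throwaway directory: two matches at the top level and one
# in a subdirectory, which only an unrestricted find reports.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir"
touch a.txt b.txt notes.md
mkdir sub
touch sub/c.txt

glob_out=$(list_by_glob | LC_ALL=C sort)
find_out=$(list_by_find)
full_find=$(find . -name '*.txt' | LC_ALL=C sort)

printf 'glob:\n%s\n\n' "$glob_out"
printf 'find -maxdepth 1:\n%s\n\n' "$find_out"
printf 'find (recursive):\n%s\n' "$full_find"
```

One difference remains: GNU find's -name lets `*` match a leading dot, so `find` would also report a hidden file such as `.draft.txt`, which the default bash glob skips unless the `dotglob` option is set.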

hth


On 24 January 2018 at 11:44, Peng Yu <address@hidden> wrote:

> The attached files are the strace results for `echo` and `find`. Can
> anybody check if there is a way to improve the performance of `find`
> so that it can work as efficiently as `echo` in this test case? Thanks.
>
> $ cat main.sh
> #!/usr/bin/env bash
> # vim: set noexpandtab tabstop=2:
>
> echo *.txt
> $ strace ./main.sh 2>/tmp/echo_strace.txt
> $ strace find -name '*.txt' > /dev/null 2>/tmp/find_strace.txt
>
> On Sun, Jan 21, 2018 at 7:49 AM, James Youngman <address@hidden> wrote:
> > On Sat, Jan 20, 2018 at 10:16 AM, Peng Yu <address@hidden> wrote:
> >>
> >> Hi,
> >>
> >> There are ~7000 .txt files in a directory on glusterfs. Here are the run
> >> times of the following two commands. Does anybody know why the find
> >> command is much slower than *.txt? Is there a way to change the API that
> >> `find` uses to search files so that it can be more friendly to
> >> glusterfs?
> >>
> >> $ time echo *.txt > /dev/null
> >>
> >> real    0m2.206s
> >> user    0m0.039s
> >> sys     0m0.056s
> >> $ time find -name '*.txt' > /dev/null
> >>
> >> real    0m18.558s
> >> user    0m0.317s
> >> sys     0m0.663s
> >
> >
> >
> > Is this an apples-to-apples comparison? For example, does . contain
> > subdirectories? A comparison of the output of strace -c for both commands
> > will probably be illuminating. Perhaps stat calls are relatively expensive
> > on glusterfs (this happens on at least some other cluster filesystems,
> > because obtaining a correct value for st_size requires finding the
> > consensus answer for the current length of the file, while obtaining the
> > list of items in a directory may not require the same amount of locking
> > or consensus work).
> >
> > James.
> >
>
>
>
> --
> Regards,
> Peng
>



-- 
Morgan Weetman
Services Content Architect
M: +61 439 469 793
https://www.redhat.com/en/services/training

