[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
bug#29921: O(n^2) performance of rm -r
From: |
Niklas Hambüchen |
Subject: |
bug#29921: O(n^2) performance of rm -r |
Date: |
Mon, 1 Jan 2018 18:13:21 +0100 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:58.0) Gecko/20100101 Thunderbird/58.0 |
On 01/01/2018 17.40, Pádraig Brady wrote:
> The user and sys components of the above are largely linear,
> so the increasing delay is waiting on the device.
My understanding of commit 24412edeaf556a was that "waiting for the
device" is what's claimed to be nonlinear with the patch -- it
explicitly mentions that it fixes behaviour that's only present on
seeking devices, so that would be real time, not user/sys, right?
Also, the test added at the end of the patch measures real time, not
user/sys.
So I'm relatively sure that what Jim Meyering was measuring/fixing when
he wrote the patch and commit message was real time.
But it would be nice if he could confirm that.
> To confirm I did a quick check on a ramdisk (/dev/shm)
> to minimize the latency effects, and that showed largely
> linear behaviour in the real column also.
In the patch I linked, it says specifically
"RAM-backed file systems (tmpfs) are not affected,
since there is no seek penalty"
So I am not sure what checking on a ramdisk gains us here, unless you
think that this statement in the patch was incorrect.
> Note if you want to blow away a whole tree quite often,
> it may be more efficient to recreate the file system
> on a separate (loopback) device.
While a nice idea, this is not very practical advice, as when an
application goes crazy and writes millions of files onto the disk, I
can't just wipe the entire file system.
A O(n^2) wall-time `rm -r` is a big problem.
Of course it may not be coreutils's fault, but since coreutils was
clearly aware of the issue in the past and claimed it was fixed, I think
we should investigate to be really sure this isn't a regression anywhere
in the ecosystem.
> I presume that's coming from increased latency effects
> as the cached operations are sync'd more to the device
> with increasing numbers of operations.
> There are many Linux kernel knobs for adjusting
> caching behaviour in the io scheduler.
I don't follow this reasoning.
Should not any form of caching eventually follow linear behaviour?
Also, if it were as you said, shouldn't the `touch` also eventually
become quadratic?
It seems not so, creating the dir entries seems perfectly linear.