coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: false-positive failure of the root-removal test


From: Pádraig Brady
Subject: Re: false-positive failure of the root-removal test
Date: Thu, 15 Oct 2015 02:44:40 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0

On 14/10/15 18:43, Jim Meyering wrote:
> Running a massively parallel "make very-expensive-check"
> (-j73 on a 48-core system), the rm/r-root.sh test would fail
> about 1-in-2 or 1-in-3 trials due to expiration of the 2-second
> timeout here:
> 
> diff --git a/tests/rm/r-root.sh b/tests/rm/r-root.sh
> index c06332a..4e645e6 100755
> --- a/tests/rm/r-root.sh
> +++ b/tests/rm/r-root.sh
> @@ -88,7 +88,7 @@ exercise_rm_r_root ()
>      skip_exit='CU_TEST_SKIP_EXIT=1'
>    fi
> 
> -  timeout --signal=KILL 2 \
> +  timeout --signal=KILL 5 \
>      env LD_PRELOAD=$LD_PRELOAD:./k.so $skip_exit \
>        rm -rv --one-file-system "$@" > out 2> err
> 
> I made the above change and observed that the whole test then
> succeeded 6 times in a row. Then I read the comment above that change:
> 
> # exercise_rm_r_root: shell function to test "rm -r '/'"
> # The caller must provide the FILE to remove as well as any options
> # which should be passed to 'rm'.
> # Paranoia mode on:
> # For the worst case where both rm(1) would fail to refuse to process the "/"
> # argument (in the cases without the --no-preserve-root option), and
> # intercepting the unlinkat(1) system call would fail (which actually already
> # has been proven to work above), and the current non root user has
> # write access to "/", limit the damage to the current file system via
> # the --one-file-system option.
> # Furthermore, run rm(1) via timeout(1) that kills that process after
> # a maximum of 2 seconds.
> 
> So maybe compromise at 3 seconds (with that, it's passed 4 times so far)?
> Probably better still: I'll remember this and decrease -j's argument from
> 1+3N/2 to something slightly less abusive.

The comment above that is a bit more explicit, i.e.

# This isn't terribly expensive, but it must not be run under heavy load.
# The reason is the conservative 'timeout' setting below to limit possible
# damage in the worst case which yields a race under heavy load.
# Marking this test as "expensive" therefore is a compromise, i.e., adding
# this test to the list ensures it still gets _some_ (albeit minimal)
# coverage while not causing false-positive failures in day to day runs.

It would be better to avoid the race of course.
timeout(1) isn't great protection anyway
as 2 seconds allows for a lot of unlink calls.
Perhaps leveraging gdb to limit the number of unlink calls is better.

...

So I had a look and ended up debugging the debugger :/
https://sourceware.org/bugzilla/show_bug.cgi?id=10079

...

anyway the attached leverages gdb to add the protection
which should be better and free of races.

cheers,
Pádraig.

Attachment: coreutils-rm-root-test-race.patch
Description: Text Data


reply via email to

[Prev in Thread] Current Thread [Next in Thread]