[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: false-positive failure of the root-removal test
From: |
Pádraig Brady |
Subject: |
Re: false-positive failure of the root-removal test |
Date: |
Thu, 15 Oct 2015 10:32:41 +0100 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0 |
On 15/10/15 09:15, Bernhard Voelker wrote:
> On 10/15/2015 03:44 AM, Pádraig Brady wrote:
>> From 1686e5cb23a8b2494bc7c3096fc8447d79421f21 Mon Sep 17 00:00:00 2001
>> From: =?UTF-8?q?P=C3=A1draig=20Brady?= <address@hidden>
>> Date: Wed, 14 Oct 2015 23:26:22 +0100
>> Subject: [PATCH] tests: avoid race under load for tests/rm/r-root.sh
>
> nice work!
>
>> * tests/rm/r-root.sh: Use gdb rather than timeout(1) as the
>> last resort protection against unlinkat() calls. The timeout
>> of 2s was susceptible to false positives under load, and
>> gdb is stronger protection in any case. We remove the
>> "expensive" tag on this test also since it should be robust.
>> Reported by Jim Meyering.
>> ---
>> tests/rm/r-root.sh | 94
>> +++++++++++++++++++++++++++++++++++++++---------------
>> 1 file changed, 68 insertions(+), 26 deletions(-)
>>
>> diff --git a/tests/rm/r-root.sh b/tests/rm/r-root.sh
>> index c06332a..e0564b8 100755
>> --- a/tests/rm/r-root.sh
>> +++ b/tests/rm/r-root.sh
>> @@ -32,13 +32,19 @@ skip_if_root_
>> # LD_PRELOAD environment variable. This requires shared libraries to work.
>> require_gcc_shared_
>>
>> -# This isn't terribly expensive, but it must not be run under heavy load.
>> -# The reason is the conservative 'timeout' setting below to limit possible
>> -# damage in the worst case which yields a race under heavy load.
>> -# Marking this test as "expensive" therefore is a compromise, i.e., adding
>> -# this test to the list ensures it still gets _some_ (albeit minimal)
>> -# coverage while not causing false-positive failures in day to day runs.
>> -expensive_
>
> FWIW: the test now is really now taking longer (~3.3s instead of ~1.4s
> before).
> Do we have a threshold/guideline as when to mark tests 'expensive_'?
Anything over around 5s on a modern machine I would think.
>> +# Use gdb to provide further protection by limiting calls to unlinkat().
>> +( timeout 10s gdb --version ) > gdb.out 2>&1
>> +case $(cat gdb.out) in
>> + *'GNU gdb'*) ;;
>> + *) skip_ "can't run gdb";;
>> +esac
>> +
>> +# Break on a line rather than a symbol, to cater for inline functions
>> +break_src="$abs_top_srcdir/src/remove.c"
>> +break_line=$(grep -n ^excise "$break_src") || framework_failure_
>> +break_line=$(echo "$break_line" | cut -d: -f1) || framework_failure_
>> +break_line="$break_src:$break_line"
>> +
>
> this doesn't double-check that grep+cut above really find the source
> line (imagine someone would rename that source/function in future), and
> if $break_line is something usable. Well, GDB may complain with
> "malformed linespec error: unexpected end of input"
> when e.g. the line number is missing, but I wouldn't rely on that;
> maybe add another tiny pattern check here.
I was relying on the witness file "excise.break" to check
that the line, breakpoint, python support, etc. was working.
>
>> cat > k.c <<'EOF' || framework_failure_
>> #include <stdio.h>
>> @@ -63,6 +69,26 @@ EOF
>> gcc_shared_ k.c k.so \
>> || framework_failure_ 'failed to build shared library'
>>
>> +# Note breakpoint commands don't work in batch mode
>> +# https://sourceware.org/bugzilla/show_bug.cgi?id=10079
>> +# So we use python to script behavior upon hitting the breakpoint
>> +cat > bp.py <<'EOF.py' || framework_failure_
>> +def breakpoint_handler (event):
>> + if not isinstance(event, gdb.BreakpointEvent):
>> + return
>> + hit_count = event.breakpoints[0].hit_count
>> + if hit_count == 1:
>> + gdb.execute('shell touch excise.break')
>> + gdb.execute('continue')
>> + elif hit_count > 2:
>> + gdb.write('breakpoint hit twice already')
>
> do we see this output somewhere?
Without --batch-silent it is output.
>> + gdb -nx --batch-silent -return-child-result
>> \
>> + --eval-command="set exec-wrapper
>> \
>> + env 'LD_PRELOAD=$LD_PRELOAD:./k.so' $skip_exit"
>> \
>
> Would you mind folding the env LD_PRELOAD + $skip_exit stuff into one line?
> It a) looks a bit odd in the test source and the log, and b) maybe some
> strange GDB version may not like newlines in there (still paranoia from
> my side).
The newline is not passed to gdb.
It's more awkward to adjust while staying within the mandated line length.
>> +# rm: it is dangerous to operate recursively on 'FILE' (same as '/')
>> +# Strip that part off for the following comparison.
>> +clean_rm_err()
>> +{
>> + sed "s/.*rm: /rm: /; \
>> + s/\(rm: it is dangerous to operate recursively on\).*$/\1 '\/'/"
>> +}
>> +
>
> It seems that this is not enough. At least on openSUSE:13.2,
> I'm getting this:
>
> + compare_ exp err2
> + diff -u exp err2
> --- exp 2015-10-15 09:42:47.956590048 +0200
> +++ err2 2015-10-15 09:42:48.051591245 +0200
> @@ -1,2 +1,3 @@
> +Got object file from memory but can't read symbols: File truncated.
> rm: it is dangerous to operate recursively on '/'
> rm: use --no-preserve-root to override this failsafe
> + fail=1
>
> This GDB bug seems to be fixed meanwhile, but I can't check on a newer
> system right now.
Ugh. I'll add an initial check to skip in this case.
> BTW: I suggest moving the sed call into exercise_rm_r_root: nw, there
> are many pieces playing together (returns_, gdb, python, LD_PRELOAD,
> CU_TEST_SKIP_EXIT, the witness files, ect.), so the move would make
> reading easier.
Done
>> - grep "^rm: refusing to remove '\.' or '\.\.' directory: skipping" err \
>> + grep "rm: refusing to remove '\.' or '\.\.' directory: skipping" err \
> There's another place where to remove the '^' from the pattern (line 286):
>
> grep "^rm: it is dangerous to operate recursively on '/'" err && fail=1
Good catch.
I already pushed unfortunately.
How about the attached on top?
thanks for the review!
Pádraig
coreutils-rm-root-gdb-fix.patch
Description: Text Data
- false-positive failure of the root-removal test, Jim Meyering, 2015/10/14
- Re: false-positive failure of the root-removal test, Jim Meyering, 2015/10/15
- Re: false-positive failure of the root-removal test, Pádraig Brady, 2015/10/15
- Re: false-positive failure of the root-removal test, Bernhard Voelker, 2015/10/15
- Re: false-positive failure of the root-removal test,
Pádraig Brady <=
- Re: false-positive failure of the root-removal test, Bernhard Voelker, 2015/10/15
- Re: false-positive failure of the root-removal test, Pádraig Brady, 2015/10/15
- Re: false-positive failure of the root-removal test, Jim Meyering, 2015/10/15
- Re: false-positive failure of the root-removal test, Pádraig Brady, 2015/10/15
- Re: false-positive failure of the root-removal test, Pádraig Brady, 2015/10/15
- Re: false-positive failure of the root-removal test, Jim Meyering, 2015/10/15
- Re: false-positive failure of the root-removal test, Pádraig Brady, 2015/10/15
- Re: false-positive failure of the root-removal test, Jim Meyering, 2015/10/15
- Re: false-positive failure of the root-removal test, Bernhard Voelker, 2015/10/15
- Re: false-positive failure of the root-removal test, Jim Meyering, 2015/10/15