[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: false-positive failure of the root-removal test
From: |
Bernhard Voelker |
Subject: |
Re: false-positive failure of the root-removal test |
Date: |
Thu, 15 Oct 2015 10:15:23 +0200 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0 |
On 10/15/2015 03:44 AM, Pádraig Brady wrote:
> From 1686e5cb23a8b2494bc7c3096fc8447d79421f21 Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?P=C3=A1draig=20Brady?= <address@hidden>
> Date: Wed, 14 Oct 2015 23:26:22 +0100
> Subject: [PATCH] tests: avoid race under load for tests/rm/r-root.sh
nice work!
> * tests/rm/r-root.sh: Use gdb rather than timeout(1) as the
> last resort protection against unlinkat() calls. The timeout
> of 2s was susceptible to false positives under load, and
> gdb is stronger protection in any case. We remove the
> "expensive" tag on this test also since it should be robust.
> Reported by Jim Meyering.
> ---
> tests/rm/r-root.sh | 94
> +++++++++++++++++++++++++++++++++++++++---------------
> 1 file changed, 68 insertions(+), 26 deletions(-)
>
> diff --git a/tests/rm/r-root.sh b/tests/rm/r-root.sh
> index c06332a..e0564b8 100755
> --- a/tests/rm/r-root.sh
> +++ b/tests/rm/r-root.sh
> @@ -32,13 +32,19 @@ skip_if_root_
> # LD_PRELOAD environment variable. This requires shared libraries to work.
> require_gcc_shared_
>
> -# This isn't terribly expensive, but it must not be run under heavy load.
> -# The reason is the conservative 'timeout' setting below to limit possible
> -# damage in the worst case which yields a race under heavy load.
> -# Marking this test as "expensive" therefore is a compromise, i.e., adding
> -# this test to the list ensures it still gets _some_ (albeit minimal)
> -# coverage while not causing false-positive failures in day to day runs.
> -expensive_
FWIW: the test now is really now taking longer (~3.3s instead of ~1.4s before).
Do we have a threshold/guideline as when to mark tests 'expensive_'?
> +# Use gdb to provide further protection by limiting calls to unlinkat().
> +( timeout 10s gdb --version ) > gdb.out 2>&1
> +case $(cat gdb.out) in
> + *'GNU gdb'*) ;;
> + *) skip_ "can't run gdb";;
> +esac
> +
> +# Break on a line rather than a symbol, to cater for inline functions
> +break_src="$abs_top_srcdir/src/remove.c"
> +break_line=$(grep -n ^excise "$break_src") || framework_failure_
> +break_line=$(echo "$break_line" | cut -d: -f1) || framework_failure_
> +break_line="$break_src:$break_line"
> +
this doesn't double-check that grep+cut above really find the source
line (imagine someone would rename that source/function in future), and
if $break_line is something usable. Well, GDB may complain with
"malformed linespec error: unexpected end of input"
when e.g. the line number is missing, but I wouldn't rely on that;
maybe add another tiny pattern check here.
> cat > k.c <<'EOF' || framework_failure_
> #include <stdio.h>
> @@ -63,6 +69,26 @@ EOF
> gcc_shared_ k.c k.so \
> || framework_failure_ 'failed to build shared library'
>
> +# Note breakpoint commands don't work in batch mode
> +# https://sourceware.org/bugzilla/show_bug.cgi?id=10079
> +# So we use python to script behavior upon hitting the breakpoint
> +cat > bp.py <<'EOF.py' || framework_failure_
> +def breakpoint_handler (event):
> + if not isinstance(event, gdb.BreakpointEvent):
> + return
> + hit_count = event.breakpoints[0].hit_count
> + if hit_count == 1:
> + gdb.execute('shell touch excise.break')
> + gdb.execute('continue')
> + elif hit_count > 2:
> + gdb.write('breakpoint hit twice already')
do we see this output somewhere?
> + gdb.execute('quit 1')
> + else:
> + gdb.execute('continue')
> +
> +gdb.events.stop.connect(breakpoint_handler)
> +EOF.py
> +
hmm, python - I didn't see an explicit check for GDB python support,
yet the following test seems to be sufficient.
>
> #-------------------------------------------------------------------------------
> # exercise_rm_r_root: shell function to test "rm -r '/'"
> # The caller must provide the FILE to remove as well as any options
> @@ -74,13 +100,13 @@ gcc_shared_ k.c k.so \
> # has been proven to work above), and the current non root user has
> # write access to "/", limit the damage to the current file system via
> # the --one-file-system option.
> -# Furthermore, run rm(1) via timeout(1) that kills that process after
> -# a maximum of 2 seconds.
> +# Furthermore, run rm(1) via gdb that limits the number of unlinkat() calls.
> exercise_rm_r_root ()
> {
> - # Remove the evidence file "x"; verify that.
> - rm -f x || framework_failure_
> + # Remove the evidence files; verify that.
> + rm -f x excise.break || framework_failure_
> test -f x && framework_failure_
> + test -f excise.break && framework_failure_
>
> local skip_exit=
> if [ "$CU_TEST_SKIP_EXIT" = 1 ]; then
> @@ -88,9 +114,14 @@ exercise_rm_r_root ()
> skip_exit='CU_TEST_SKIP_EXIT=1'
> fi
>
> - timeout --signal=KILL 2 \
> - env LD_PRELOAD=$LD_PRELOAD:./k.so $skip_exit \
> - rm -rv --one-file-system "$@" > out 2> err
> + gdb -nx --batch-silent -return-child-result
> \
> + --eval-command="set exec-wrapper \
> + env 'LD_PRELOAD=$LD_PRELOAD:./k.so' $skip_exit" \
Would you mind folding the env LD_PRELOAD + $skip_exit stuff into one line?
It a) looks a bit odd in the test source and the log, and b) maybe some
strange GDB version may not like newlines in there (still paranoia from
my side).
> + --eval-command="break '$break_line'" \
> + --eval-command='source bp.py' \
> + --eval-command="run -rv --one-file-system $*" \
> + --eval-command='quit' \
> + rm < /dev/null > out 2> err
>
> return $?
> }
> @@ -111,10 +142,11 @@ for file in dir file ; do
> exercise_rm_r_root "$file" || skip=1
> test -e "$file" || skip=1
> test -f x || skip=1
> + test -f excise.break || skip=1 # gdb works and breakpoint hit
>
> - test $skip = 1 \
> + test "$skip" = 1 \
> && { cat out; cat err; \
> - skip_ "internal test failure: maybe LD_PRELOAD doesn't work?"; }
> + skip_ "internal test failure: maybe LD_PRELOAD or gdb doesn't
> work?"; }
> done
>
> # "rm -r /" without --no-preserve-root should output the following
> @@ -124,6 +156,19 @@ rm: it is dangerous to operate recursively on '/'
> rm: use --no-preserve-root to override this failsafe
> EOD
>
> +# In order of the sed expressions below, this cleans:
> +#
> +# 1. gdb uses the full path when running rm, so remove the leading dirs.
> +# 2. For some of the "/" synonyms, the error diagnostic slightly differs from
> +# that of the basic "/" case (see gnulib's fts_open' and ROOT_DEV_INO_WARN):
> +# rm: it is dangerous to operate recursively on 'FILE' (same as '/')
> +# Strip that part off for the following comparison.
> +clean_rm_err()
> +{
> + sed "s/.*rm: /rm: /; \
> + s/\(rm: it is dangerous to operate recursively on\).*$/\1 '\/'/"
> +}
> +
It seems that this is not enough. At least on openSUSE:13.2,
I'm getting this:
+ compare_ exp err2
+ diff -u exp err2
--- exp 2015-10-15 09:42:47.956590048 +0200
+++ err2 2015-10-15 09:42:48.051591245 +0200
@@ -1,2 +1,3 @@
+Got object file from memory but can't read symbols: File truncated.
rm: it is dangerous to operate recursively on '/'
rm: use --no-preserve-root to override this failsafe
+ fail=1
This GDB bug seems to be fixed meanwhile, but I can't check on a newer
system right now.
BTW: I suggest moving the sed call into exercise_rm_r_root: nw, there
are many pieces playing together (returns_, gdb, python, LD_PRELOAD,
CU_TEST_SKIP_EXIT, the witness files, ect.), so the move would make
reading easier.
>
> #-------------------------------------------------------------------------------
> # Exercise "rm -r /" without and with the --preserve-root option.
> # Exercise various synonyms of "/" including symlinks to it.
> @@ -145,12 +190,7 @@ for opts in \
>
> returns_ 1 exercise_rm_r_root $opts || fail=1
>
> - # For some of the synonyms, the error diagnostic slightly differs from that
> - # of the basic "/" case (see gnulib's fts_open' and ROOT_DEV_INO_WARN):
> - # rm: it is dangerous to operate recursively on 'FILE' (same as '/')
> - # Strip that part off for the following comparison.
> - sed "s/\(rm: it is dangerous to operate recursively on\).*$/\1 '\/'/" err \
> - > err2 || framework_failure_
> + clean_rm_err < err > err2 || framework_failure_
>
> # Expect nothing in 'out' and the above error diagnostic in 'err2'.
> # As rm(1) should have skipped the "/" argument, it does not call
> unlinkat().
> @@ -179,6 +219,8 @@ CU_TEST_SKIP_EXIT=1
>
> returns_ 1 exercise_rm_r_root --preserve-root file1 '/' file2 || fail=1
>
> +clean_rm_err < err > err2 || framework_failure_
> +
> unset CU_TEST_SKIP_EXIT
>
> cat <<EOD > out_removed
> @@ -190,9 +232,9 @@ EOD
> # Both 'file1' and 'file2' should be removed. Simply verify that in the
> # "out" file, as the replacement unlinkat() dummy did not remove them.
> # Expect the evidence file "x" to exist.
> -compare out_removed out || fail=1
> -compare exp err || fail=1
> -test -f x || fail=1
> +compare out_removed out || fail=1
> +compare exp err2 || fail=1
> +test -f x || fail=1
>
> # Do nothing more if this test failed.
> test $fail = 1 && { cat out; cat err; Exit $fail; }
> @@ -219,7 +261,7 @@ for file in \
>
> returns_ 1 exercise_rm_r_root --preserve-root "$file" || fail=1
>
> - grep "^rm: refusing to remove '\.' or '\.\.' directory: skipping" err \
> + grep "rm: refusing to remove '\.' or '\.\.' directory: skipping" err \
> || fail=1
>
> compare /dev/null out || fail=1
> -- 2.5.0
There's another place where to remove the '^' from the pattern (line 286):
grep "^rm: it is dangerous to operate recursively on '/'" err && fail=1
Thanks & have a nice day,
Berny
- false-positive failure of the root-removal test, Jim Meyering, 2015/10/14
- Re: false-positive failure of the root-removal test, Jim Meyering, 2015/10/15
- Re: false-positive failure of the root-removal test, Pádraig Brady, 2015/10/15
- Re: false-positive failure of the root-removal test,
Bernhard Voelker <=
- Re: false-positive failure of the root-removal test, Pádraig Brady, 2015/10/15
- Re: false-positive failure of the root-removal test, Bernhard Voelker, 2015/10/15
- Re: false-positive failure of the root-removal test, Pádraig Brady, 2015/10/15
- Re: false-positive failure of the root-removal test, Jim Meyering, 2015/10/15
- Re: false-positive failure of the root-removal test, Pádraig Brady, 2015/10/15
- Re: false-positive failure of the root-removal test, Pádraig Brady, 2015/10/15
- Re: false-positive failure of the root-removal test, Jim Meyering, 2015/10/15
- Re: false-positive failure of the root-removal test, Pádraig Brady, 2015/10/15
- Re: false-positive failure of the root-removal test, Jim Meyering, 2015/10/15
- Re: false-positive failure of the root-removal test, Bernhard Voelker, 2015/10/15