odd test failure: misc/sort-spinlock-abuse

coreutils

From:	Jim Meyering
Subject:	odd test failure: misc/sort-spinlock-abuse
Date:	Sat, 09 Apr 2011 12:34:58 +0200

The misc/sort-spinlock-abuse test,

  http://git.sv.gnu.org/cgit/coreutils.git/tree/tests/misc/sort-spinlock-abuse

fails regularly when run in parallel with other tests (make -j25 check,
ext4+SSD, 6/12-core, Fedora 15).  It fails because an output-restrained sort
(writing to a FIFO with a slow consumer) takes more than 1 second of CPU
time to process the regular file, "in", created by "seq 100000 > in".
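
For readers who want to try the scenario outside the test suite, here is
a minimal sketch of the setup described above.  It is not the test
script itself: the temporary directory, the five-line slow consumer and
the variable names are illustrative assumptions.  Only "seq 100000 > in",
the FIFO, the 1-second CPU limit and --parallel=1 come from this message.

```shell
#!/bin/sh
# Hypothetical reproduction sketch, NOT the actual misc/sort-spinlock-abuse test.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

seq 100000 > in            # the regular input file
lines=$(wc -l < in)        # sanity check: should be 100000
mkfifo fifo

# Slow consumer (assumed shape): read a few lines with pauses, then drain.
(for i in 1 2 3 4 5; do read -r line; sleep 0.1; done
 cat > /dev/null) < fifo &

# Limit sort to 1 second of CPU time while its output is throttled.
(ulimit -t 1; sort --parallel=1 in > fifo)
status=$?                  # non-zero means sort failed or was killed
wait

echo "input lines: $lines, sort exit status: $status"
cd / && rm -rf "$dir"
```

On an idle machine this would normally report status 0; the failure
described here shows up only under heavy parallel load.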

In fact, it may take even more than 4 seconds of CPU time.
At first I thought it was a regression.  But no:
the original bug involved a parallel-specific busy-wait
triggered by the blocked output, whereas this problem
arises even when running sort with --parallel=1.

What's going on?
I have traced it back to an fstat syscall that is consuming
lots of CPU time.

Here's strace -r output, where FD 3 refers to the regular input file, "in":

    3.263544 fstat(3, {st_mode=S_IFREG|0600, st_size=588895, ...}) = 0
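
A trace like the one above could be collected and filtered roughly as
follows.  This is only a sketch: the helper name slow_calls, the file
name trace.out and the 1-second threshold are made up for illustration,
and it assumes "in" and "fifo" already exist with a consumer reading the
FIFO.  Note that strace -r prints, for each line, a timestamp relative
to the start of the previous syscall; strace -T would additionally show
the time spent inside each call.

```shell
#!/bin/sh
# slow_calls THRESHOLD FILE (hypothetical helper): print the lines of an
# "strace -r" log whose leading relative timestamp, in seconds, exceeds
# THRESHOLD.
slow_calls() {
  awk -v t="$1" '$1 + 0 > t + 0' "$2"
}

# Collect a trace only when strace and the test files are present.
# Assumes something is already reading from "fifo"; otherwise sort
# blocks when opening it for writing.
if command -v strace >/dev/null 2>&1 && [ -f in ] && [ -p fifo ]; then
  strace -r -o trace.out sort --parallel=1 in > fifo
  slow_calls 1 trace.out
fi
```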

When I run this test in isolation, it always completes successfully.
In that case, the fstat takes 30-40 microseconds.

    make check -C tests TESTS=misc/sort-spinlock-abuse VERBOSE=yes

But when I run it via "make -j25 check", it fails ~40% of the time.

Next step is probably to see if oprofile can shed some light.
