odd test failure: misc/sort-spinlock-abuse
From: Jim Meyering
Subject: odd test failure: misc/sort-spinlock-abuse
Date: Sat, 09 Apr 2011 12:34:58 +0200
The misc/sort-spinlock-abuse test,
http://git.sv.gnu.org/cgit/coreutils.git/tree/tests/misc/sort-spinlock-abuse
fails regularly when it is run in parallel with others (make -j25 check,
ext4+SSD, 6/12-core, F15). It fails because an output-restrained sort
(writing to a FIFO with a slow consumer) takes more than 1 second of CPU
time to process the regular file, "in", created by "seq 100000 > in".
In fact, it may take even more than 4 seconds of CPU time.
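To make the failure condition concrete, here is a minimal Python sketch of the setup described above: a child process writes to a FIFO, a deliberately slow reader drains it, and the child's user+system CPU time is reported afterwards. The function name `cpu_with_slow_reader` and its `delay` and `chunk` parameters are hypothetical, chosen for illustration; this is not how the test script itself measures or enforces the limit.

```python
import os
import resource
import subprocess
import sys
import tempfile
import time


def cpu_with_slow_reader(argv, delay=0.01, chunk=4096):
    """Run argv with stdout on a FIFO that is drained slowly.

    Returns (cpu_seconds, bytes_read), where cpu_seconds is the child's
    user + system CPU time as reported by getrusage(RUSAGE_CHILDREN).
    Hypothetical helper, for illustration only.
    """
    with tempfile.TemporaryDirectory() as tmp:
        fifo = os.path.join(tmp, "fifo")
        os.mkfifo(fifo)
        # Open the read end non-blocking first, so that opening the
        # write end below cannot hang waiting for a reader.
        rfd = os.open(fifo, os.O_RDONLY | os.O_NONBLOCK)
        wfd = os.open(fifo, os.O_WRONLY)
        os.set_blocking(rfd, True)

        before = resource.getrusage(resource.RUSAGE_CHILDREN)
        proc = subprocess.Popen(argv, stdout=wfd)
        os.close(wfd)  # the child now holds the only write end

        total = 0
        while True:
            data = os.read(rfd, chunk)
            if not data:  # EOF: the child closed its end
                break
            total += len(data)
            time.sleep(delay)  # the slow consumer
        os.close(rfd)
        proc.wait()
        after = resource.getrusage(resource.RUSAGE_CHILDREN)

    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return cpu, total
```

Calling something like `cpu_with_slow_reader(["sort", "in"])` and comparing the first element of the result against 1.0 would approximate the condition the test checks, assuming `in` exists in the current directory. Since output is blocked most of the time, a well-behaved writer should accumulate very little CPU time no matter how long the run takes on the wall clock.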
At first I thought it was a regression. But no:
the problem arises even when running sort with --parallel=1.
The original bug involved a parallel-specific busy-wait
triggered by the blocked output.
What's going on?
I have traced it back to an fstat syscall that is consuming
lots of CPU time.
Here's strace -r output, where FD 3 refers to the regular input file, "in":
3.263544 fstat(3, {st_mode=S_IFREG|0600, st_size=588895, ...}) = 0
When I run this test in isolation, it always completes successfully.
In that case, the fstat takes 30-40 microseconds.

    make check -C tests TESTS=misc/sort-spinlock-abuse VERBOSE=yes

But when I run it via "make -j25 check", it fails ~40% of the time.
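
As far as I know, the timestamps printed by strace -r are wall-clock deltas between successive syscall entries, so they do not by themselves separate CPU time from time spent waiting. One way to cross-check is to time a single fstat with both a wall clock and a CPU clock. The following Python sketch does that; the helper name `time_fstat` is hypothetical, and the numbers it yields describe the calling process, not sort.

```python
import os
import tempfile
import time


def time_fstat(fd):
    """Return (wall_us, cpu_us) for one os.fstat(fd) call.

    wall_us comes from a monotonic wall clock, cpu_us from the
    process CPU clock (user + system). Hypothetical helper.
    """
    wall0 = time.perf_counter_ns()
    cpu0 = time.process_time_ns()
    os.fstat(fd)
    cpu1 = time.process_time_ns()
    wall1 = time.perf_counter_ns()
    return (wall1 - wall0) / 1000.0, (cpu1 - cpu0) / 1000.0


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        wall_us, cpu_us = time_fstat(f.fileno())
        print(f"fstat: wall={wall_us:.1f}us cpu={cpu_us:.1f}us")
```

If the wall figure is large under load while the CPU figure stays small, the call is mostly waiting rather than computing; if both are large, the time really is being burned on the CPU.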
Next step is probably to see if oprofile can shed some light.