
strange, unreproducible coreutils tests/mkdir/t-slash failure on F16


From: Jim Meyering
Subject: strange, unreproducible coreutils tests/mkdir/t-slash failure on F16
Date: Tue, 25 Oct 2011 14:26:48 +0200

Just an FYI:

On my multi-core F16 desktop (linux-3.1.0-0.rc9.git0.0.fc16.x86_64),
I ran coreutils' "make -j15 distcheck", and saw this sole test failure:

    FAIL: mkdir/t-slash (exit: 134)
    ===============================

    dispose_command: bad command type: 20
    Aborting...

I've never seen that before.  That diagnostic comes from bash,
yet I see no way in which the simple t-slash script could evoke it.

Seconds later, I reran that command and got segfaults from msgmerge and make:

    make[5]: *** [all] Segmentation fault (core dumped)

Here are the lines from dmesg:

    [519319.155222] msgmerge[25257]: segfault at 18 ip 00000038c561599e sp 00007ffffe6428b0 error 4 in libgettextsrc-0.18.1.so[38c5600000+3e000]
    [519386.303692] make[3382]: segfault at 10 ip 0000000000407fb2 sp 00007fffd74d2f80 error 4 in make[400000+29000]
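
As an aid to reading those two lines: on x86, the number after "error" is the page-fault error code, a bit mask in which bit 0 distinguishes a protection violation from a not-present page, bit 1 marks a write, and bit 2 marks user mode. The helper below is a minimal sketch of that decoding; the function name decode_pf_error is made up for illustration and only the three low bits are handled.

```shell
# Hypothetical helper: decode the low three bits of an x86 page-fault
# error code as printed after "error" in a kernel segfault message.
decode_pf_error() {
    code=$1
    # Bit 0: set means protection violation, clear means page not present.
    if [ $((code & 1)) -ne 0 ]; then cause="protection violation"; else cause="page not present"; fi
    # Bit 1: set means the faulting access was a write, clear means a read.
    if [ $((code & 2)) -ne 0 ]; then access="write"; else access="read"; fi
    # Bit 2: set means the fault happened in user mode, clear means kernel mode.
    if [ $((code & 4)) -ne 0 ]; then mode="user"; else mode="kernel"; fi
    echo "$mode-mode $access, $cause"
}
```

With that sketch, "decode_pf_error 4" reports a user-mode read of a page that is not present. The faulting addresses (0x18 and 0x10) are small offsets from zero, which would be consistent with reads through a null pointer, though that is an inference from the log lines and not something established here.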

Repeating one more time, I see this:

    ...
    esac
    tar: Skipping to next header
    xz: coreutils-8.14.15-22f3b.tar.xz: Compressed data is corrupt
    tar: Exiting with failure status due to previous errors
    gtar: This does not look like a tar archive
    gtar: Exiting with failure status due to previous errors

I reran the "make distcheck" command a few more times, and got those same
tar/xz diagnostics consistently.  Thinking that finally I might be able to
debug easily, since it's all serial...  Wrong.  I realized that I am using
a version of xz (built from git) that does multithreaded compression.

Thinking threading could be the problem, I reran it like this:

    (export OMP_NUM_THREADS=1; make distcheck)

Now, the tar/xz failures are gone, but I see this ominous error
from gcc (with nothing prior):

    comm.c:186: confused by earlier errors, bailing out
    The bug is not reproducible, so it is likely a hardware or OS problem.

Finally, one more attempt (nothing else changed), and it succeeded,
even without OMP_NUM_THREADS=1.

And a second success.
And a third success.
I'm going to put it in a loop and run "make distcheck" for a few hours...
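
A loop of that kind might look like the sketch below. It is only an illustration: the function name run_repeatedly, the LOG_DIR variable and the run-N.log file names are assumptions, not anything taken from the actual setup. It runs a command a fixed number of times, keeps one log per run so that a rare failure is not lost, and prints a failure count at the end.

```shell
# Hypothetical helper: run a command COUNT times, keep a log per run,
# and report how many runs failed.
# Usage: run_repeatedly COUNT COMMAND [ARGS...]
run_repeatedly() {
    count=$1
    shift
    log_dir=${LOG_DIR:-.}   # assumption: logs go to the current directory by default
    failures=0
    i=1
    while [ "$i" -le "$count" ]; do
        log="$log_dir/run-$i.log"
        if "$@" > "$log" 2>&1; then
            echo "run $i: ok"
        else
            status=$?
            failures=$((failures + 1))
            echo "run $i: FAILED (exit $status), see $log"
        fi
        i=$((i + 1))
    done
    echo "$failures of $count runs failed"
    # Succeed only if every run succeeded.
    [ "$failures" -eq 0 ]
}
```

For example, "run_repeatedly 20 make -j15 distcheck" would repeat the failing command twenty times. Keeping every log matters here because the failures differ from run to run.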

I see two possible explanations.
One is some system-related problem, like whatever is causing this:

    http://bugzilla.redhat.com/747377
    (but note I'm using the earlier glibc-2.14.90-10.x86_64,
     so that git threading/heap-corruption bug doesn't affect me)

Or maybe it's bad memory.  But since 747377 is reproducible (thanks,
Rich Jones), and glibc may have merely changed something to amplify the
likelihood of triggering an existing bug, I'm not going to spend hours
running memtest86+ just yet.  BTW, last week when I began investigating
747377, I successfully bootstrapped gcc from git and it passed most of
its test cases.  That's usually a good indication that RAM is ok.

I've waited a while, just in case...
Now I'm up to 9 consecutive "make bootstrap" successes.  No failure.


