bug-guix
bug#35521: Mariadb test suite failures on x86_64-linux


From: Chris Marusich
Subject: bug#35521: Mariadb test suite failures on x86_64-linux
Date: Tue, 09 Jul 2019 23:18:57 -0700
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)

Hi,

I've been encountering this failure off and on for a few weeks now, and
I'd like to help fix it.  In short, the test failures look
non-deterministic to me.  I think we should gather data and report the
issue upstream, and maybe disable the offending tests in the meantime.

Mariadb failed for me earlier today with a different error than the ones
observed in this bug report so far.  My error was the following (when
building mariadb 10.1.40 on an x86_64-linux system using Guix 9b2644c):

  Failure: Failed 1/1990 tests, 99.95% were successful.

  Failing test(s): tokudb_bugs.5733_innodb

  The log files in var/log may give you some hint of what went wrong.

  If you want to report this error, please read first the documentation
  at http://dev.mysql.com/doc/mysql/en/mysql-test-suite.html

  558 tests were skipped, 169 by the test itself

I kept the failed build directory, but there is no "var" directory to be
found there.  I guess they meant system logs; I am not sure where such
logs would go when emitted from within a derivation.

The MySQL website suggested running mysql-test-run.pl with the --force
option, which I casually tried after invoking ". environment-variables"
from the failed build directory; however, it promptly failed because it
could not find 'my_safe_process' - maybe I didn't have everything set up
just so to run the tests manually.

Curiously, on a different x86_64-linux machine, using Guix commit
6c83c48 (which is only a few commits ahead of 9b2644c), I was able to
build mariadb successfully, although I am not sure when I built it
(running "guix build mariadb" currently results in quick success for me,
so on this machine I probably built or substituted it some time ago).
The derivation (without grafts) was identical to the one that failed to
build on the other machine, which is strange because I would normally
expect the same derivation to succeed on both machines.  For the record,
this was the derivation:

  $ guix build --no-grafts -d mariadb
  /gnu/store/9yw33r8r84qrsic7fiq0lqqkbzisv1cj-mariadb-10.1.40.drv

Perhaps these tests fail non-deterministically?  Or perhaps they fail in
a way that is specific to something not isolated from the build process
by Guix, such as the kernel, the file system, or the hardware?

I tried to check the status of mariadb in Cuirass.  However, I only
found the following information:

  https://ci.guix.gnu.org/search?query=mariadb-10.1.40

For x86_64-linux, build 1304242 supposedly failed at 10 May 20:32 +0200
after about 3 hours of runtime:

  https://ci.guix.gnu.org/build/1304242/details

I say "supposedly failed" because I'm not sure why it failed.  The build
log seems to indicate no problems:

  https://ci.guix.gnu.org/build/1304242/log/raw

Has Cuirass tried to build mariadb since then?  May 10th was a long time
ago, and I am surprised there is not another build of it from master.

Mark H Weaver <address@hidden> writes:

> Mark H Weaver <address@hidden> writes:
>
>> The same build also failed twice in a row on my Thinkpad X200, and with
>> the same error each time, although it's a different error than happens
>> on hydra.gnunet.org.  On my X200, I get this instead:
>>
>>> Failure: Failed 1/1091 tests, 99.91% were successful.
>>> 
>>> Failing test(s): tokudb_bugs.mdev4533
>
> and it just failed a third time on my X200, again with the same error.

It seems like the tests may be flaky.  The test failure I saw was
different from yours.  And in my case, I actually was able to build (or
substitute) mariadb once.  So maybe what we need to do is gather enough
data to report the problem upstream, to enlist their help?
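
To make that data gathering concrete, here is a minimal sketch for
pulling the failing test names out of saved build logs and counting how
often each one shows up.  It assumes the logs have been saved as plain
text files (Guix build logs may need decompressing first) and that
mysql-test-run.pl prints a "Failing test(s):" line like the ones quoted
in this thread.  The function names are my own, not part of any tool.

```shell
#!/bin/sh
# Print the names of failing tests found in mysql-test-run output,
# one per line.  Reads the log text on standard input.
failing_tests() {
    sed -n 's/^[[:space:]]*Failing test(s):[[:space:]]*//p' \
        | tr -s ' ' '\n' \
        | sed '/^$/d'
}

# Tally failing tests across several saved log files (given as
# arguments), most frequent first.
tally_failures() {
    cat "$@" | failing_tests | sort | uniq -c | sort -rn
}
```

Running tally_failures over a directory of logs from different machines
would give a rough picture of which tests are flaky, which seems like
the kind of summary upstream would want to see.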

Platoxia <address@hidden> writes:

> This problem persists and is preventing successful completion of guix system 
> reconfigure for pre-1.0.0 systems (at least mine which is still at kernel 
> 4.20), not only for those using mariadb but also for anyone using any of the 
> 544 packages that depend on it (as per the command guix graph 
> --type=reverse-package mariadb | grep -c label).
>
> This could, potentially, be fixed by simply adding this test to the list of 
> disabled tests in the package definition:
>
> --- snip ---
> (add-after 'unpack 'adjust-tests
>            (lambda _
>              (let ((disabled-tests
>                     '(;; These fail because root@hostname == root@localhost in
>                       ;; the build environment, causing a user count mismatch.
>                       ;; See <https://jira.mariadb.org/browse/MDEV-7761>.
>                       "main.join_cache"
>                       "main.explain_non_select"
>                       "main.stat_tables_innodb"
>                       "roles.acl_statistics"
>
>                       ;; This file contains a time bomb which makes it fail
>                       ;; after 2030-12-31.  See <https://bugs.gnu.org/34351>
>                       ;; for details.
>                       "main.mysqldump"
>
>                       ;; XXX: Fails sporadically.
>                       "innodb_fts.crash_recovery"
>
>                       ;; FIXME: This test fails on i686:
>                       ;; -myisampack: Can't create/write to file (Errcode: 17 "File exists")
>                       ;; +myisampack: Can't create/write to file (Errcode: 17 "File exists)
>                       ;; When running "myisampack --join=foo/t3 foo/t1 foo/t2"
>                       ;; (all three tables must exist and be identical)
>                       ;; in a loop it produces the same error around 1/240 times.
>                       ;; montywi on #maria suggested removing the real_end check in
>                       ;; "strings/my_vsnprintf.c" on line 503, yet it still does not
>                       ;; reach the ending quote occasionally.  Disable it for now.
>                       "main.myisampack"
>                       ;; FIXME: This test fails on armhf-linux:
>                       "mroonga/storage.index_read_multiple_double"))
>
>                    ;; This file contains a list of known-flaky tests for this
>                    ;; release.  Append our own items.
>                    (unstable-tests (open-file "mysql-test/unstable-tests" "a")))
>                (for-each (lambda (test)
>                            (format unstable-tests "~a : ~a\n"
>                                    test "Disabled in Guix"))
>                          disabled-tests)
>                (close-port unstable-tests)
> --- snip ---
>
> I say "potentially" because after getting this failure I happened to notice 
> that approximately one and a half minutes after beginning the build of 
> /gnu/store/c46sn2yfllcfi86p8227wvvr1bxssgxj-mariadb-10.1.38.drv the kernel 
> throws this message: "traps: cmTC_35af5[27766] trap invalid opcode 
> ip:555555555174 sp:7fffffffcc90 error:0 in cmTC_35af5[555555555000+1000]".
>
> I have retested this several times and confirmed that this occurs each and 
> every time mariadb-10.1.38.drv tries to build and in approximately the same 
> amount of time after starting the build. I say approximately because the 
> closest I could get to a timeframe on this kernel message in relation to the 
> mariadb build is by sending the stdout from guix system reconfigure through 
> logger so that it gets printed with a timestamp to the kernel messages 
> terminal (alt-F12).
>
> Specifically, the message sequence is always as follows, without deviation 
> (other than the cmTC_#), with no related messages in between; as per the 
> command cat /dev/vcs12:
>
> --- snip ---
> May  9 16:36:35 localhost root cmd: guix system reconfigure: building /gnu/store/c46sn2yfllcfi86p8227wvvr1bxssgxj-mariadb-10.1.38.drv...
> May  9 16:38:08 localhost vmunix: [ 9169.050496] traps: cmTC_35af5[27766] trap invalid opcode ip:555555555174 sp:7fffffffcc90 error:0 in cmTC_35af5[555555555000+1000]
> --- snip ---
>
> I really suggest trying to simply add the tokudb_alter_table.hcad_all_add 
> test to the package definition before trying to solve the overall problem, 
> though. Maybe we can get this in for 1.0.1?
>
> I would be willing to do this myself and report the results here but I'm 
> baffled at how to achieve this simple task. Perhaps someone could walk me 
> through it?

I'm not sure about the kernel error.  I haven't seen an error like that
myself.  But perhaps this is yet another test which is failing
non-deterministically?
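
For anyone who wants to line up kernel messages with build output in
the way described above, but without going through logger and the
console, a small filter that prefixes each line with a timestamp might
be enough.  This is only a sketch; stamp_lines is a name I made up, and
the timestamp format simply mimics the syslog lines quoted above.

```shell
#!/bin/sh
# Prefix every line read from standard input with the current time,
# in a syslog-like "Mon DD HH:MM:SS" format.
stamp_lines() {
    # The "|| [ -n ... ]" part keeps a final line that lacks a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s %s\n' "$(date '+%b %e %H:%M:%S')" "$line"
    done
}
```

Piping the build output through it, for example
"guix build mariadb 2>&1 | stamp_lines | tee build.log", would leave a
timestamped log that can be compared against the kernel log afterwards.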

I think we need more data.  It would be nice if we could build this
repeatedly on Cuirass.  When the build is 3 hours long, it is difficult
to test it on my machine, and I often forget about it by the time it is
done running.
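
Until that is possible, repeated local builds could at least be left
running unattended.  Below is a rough sketch of a helper that runs a
given command a number of times, saves each run's output to its own
log file, and prints a summary at the end.  The name repeat_build and
the build-N.log file names are my own choices, nothing Guix provides.

```shell
#!/bin/sh
# Run a command several times and report how many runs succeeded.
# Usage: repeat_build ROUNDS COMMAND [ARGS...]
# Each run's output goes to build-N.log in the current directory.
repeat_build() {
    rounds=$1
    shift
    ok=0
    failed=0
    i=1
    while [ "$i" -le "$rounds" ]; do
        if "$@" > "build-$i.log" 2>&1; then
            ok=$((ok + 1))
        else
            failed=$((failed + 1))
        fi
        i=$((i + 1))
    done
    echo "succeeded=$ok failed=$failed"
}
```

On a machine where mariadb is already in the store, something like
"repeat_build 5 guix build --check --no-grafts --keep-failed mariadb"
should force a rebuild each time, assuming --check behaves that way in
your Guix version; on a machine where the build has never succeeded,
dropping --check is probably what you want.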

If I get more time, I will try to dig in more.  In the meantime, any
thoughts about this would be welcome.

-- 
Chris
