unofficial mirror of bug-guix@gnu.org 
From: Chris Marusich <cmmarusich@gmail.com>
To: Mark H Weaver <mhw@netris.org>, Platoxia <platoxia@protonmail.com>
Cc: 35521@debbugs.gnu.org
Subject: bug#35521: Mariadb test suite failures on x86_64-linux
Date: Tue, 09 Jul 2019 23:18:57 -0700
Message-ID: <87pnmil8dq.fsf_-_@gmail.com>
In-Reply-To: <87h8aemrow.fsf@netris.org> (Mark H. Weaver's message of "Wed, 01 May 2019 05:47:32 -0400, Fri, 10 May 2019 01:33:50 +0000")

Hi,

I've been encountering this failure off and on for a few weeks now, and
I'd like to help fix it.  In short, these look to me like
non-deterministic test failures.  I think we should gather data and
report the issue upstream, and maybe disable the offending tests in the
meantime.

Mariadb failed for me earlier today with a different error than the ones
observed in this bug report so far.  My error was the following (when
building mariadb 10.1.40 on an x86_64-linux system using Guix 9b2644c):

  Failure: Failed 1/1990 tests, 99.95% were successful.

  Failing test(s): tokudb_bugs.5733_innodb

  The log files in var/log may give you some hint of what went wrong.

  If you want to report this error, please read first the documentation
  at http://dev.mysql.com/doc/mysql/en/mysql-test-suite.html

  558 tests were skipped, 169 by the test itself

I kept the failed build directory, but there is no "var" directory to be
found there.  I guess they meant system logs; I am not sure where such
logs would go when emitted from within a derivation.
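
To compare failures across machines, it may help to pull the numbers and
test names out of the summary mechanically.  The following is only a
minimal Python sketch: the function name parse_mtr_summary and the two
regular expressions are my own assumptions, modelled on the summary text
quoted above, not on any documented output format.

```python
import re

# Patterns modelled on the summary text quoted in this message (assumed format).
SUMMARY_RE = re.compile(r"Failed (\d+)/(\d+) tests, ([\d.]+)% were successful")
FAILING_RE = re.compile(r"Failing test\(s\):\s*(.+)")


def parse_mtr_summary(log_text):
    """Return (failed, total, failing_test_names), or None if the log
    contains no failure summary."""
    summary = SUMMARY_RE.search(log_text)
    if summary is None:
        return None
    failing = FAILING_RE.search(log_text)
    # Test names are assumed to be separated by whitespace on one line.
    names = failing.group(1).split() if failing else []
    return int(summary.group(1)), int(summary.group(2)), names


sample = """\
Failure: Failed 1/1990 tests, 99.95% were successful.

Failing test(s): tokudb_bugs.5733_innodb
"""

result = parse_mtr_summary(sample)
print(result)  # (1, 1990, ['tokudb_bugs.5733_innodb'])
```

The same function also matches when the summary appears inside quoted
mail text, because the patterns are not anchored to the start of a line.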

The MySQL website suggested running mysql-test-run.pl with the --force
option, which I casually tried after invoking ". environment-variables"
from the failed build directory; however, it promptly failed because it
could not find 'my_safe_process' - maybe I didn't have everything set up
just so to run the tests manually.

Curiously, on a different x86_64-linux machine, using Guix commit
6c83c48 (which is only a few commits ahead of 9b2644c), I was able to
build mariadb successfully, although I am not sure when I built it
(running "guix build mariadb" currently results in quick success for me,
so on this machine I probably built or substituted it some time ago).
The derivation (without grafts) was identical to the one that failed to
build on the other machine, which is strange because I would normally
expect the same derivation to succeed on both machines.  For the record,
this was the derivation:

  $ guix build --no-grafts -d mariadb
  /gnu/store/9yw33r8r84qrsic7fiq0lqqkbzisv1cj-mariadb-10.1.40.drv

Perhaps these tests fail non-deterministically?  Or perhaps they fail in
a way that is specific to something not isolated from the build process
by Guix, such as the kernel, the file system, or the hardware?

I tried to check the status of mariadb in Cuirass.  However, I only
found the following information:

  https://ci.guix.gnu.org/search?query=mariadb-10.1.40

For x86_64-linux, build 1304242 supposedly failed at 10 May 20:32 +0200
after about 3 hours of runtime:

  https://ci.guix.gnu.org/build/1304242/details

I say "supposedly failed" because I'm not sure why it failed.  The build
log seems to indicate no problems:

  https://ci.guix.gnu.org/build/1304242/log/raw

Has Cuirass tried to build mariadb since then?  May 10th was a long time
ago, and I am surprised there is not another build of it from master.

Mark H Weaver <mhw@netris.org> writes:

> Mark H Weaver <mhw@netris.org> writes:
>
>> The same build also failed twice in a row on my Thinkpad X200, and with
>> the same error each time, although it's a different error than happens
>> on hydra.gnunet.org.  On my X200, I get this instead:
>>
>>> Failure: Failed 1/1091 tests, 99.91% were successful.
>>> 
>>> Failing test(s): tokudb_bugs.mdev4533
>
> and it just failed a third time on my X200, again with the same error.

It seems like the tests may be flaky.  The test failure I saw was
different from yours.  And in my case, I actually was able to build (or
substitute) mariadb once.  So maybe what we need to do is gather enough
data to report the problem upstream, to enlist their help?

Platoxia <platoxia@protonmail.com> writes:

> This problem persists and is preventing successful completion of guix system reconfigure for pre-1.0.0 systems (at least mine, which is still at kernel 4.20), not only for those using mariadb but also for anyone using any of the 544 packages that depend on it (as per the command guix graph --type=reverse-package mariadb | grep -c label).
>
> This could, potentially, be fixed by simply adding this test to the list of disabled tests in the package definition:
>
> --- snip ---
> (add-after 'unpack 'adjust-tests
>            (lambda _
>              (let ((disabled-tests
>                     '(;; These fail because root@hostname == root@localhost in
>                       ;; the build environment, causing a user count mismatch.
>                       ;; See <https://jira.mariadb.org/browse/MDEV-7761>.
>                       "main.join_cache"
>                       "main.explain_non_select"
>                       "main.stat_tables_innodb"
>                       "roles.acl_statistics"
>
>                       ;; This file contains a time bomb which makes it fail after
>                       ;; 2030-12-31.  See <https://bugs.gnu.org/34351> for details.
>                       "main.mysqldump"
>
>                       ;; XXX: Fails sporadically.
>                       "innodb_fts.crash_recovery"
>
>                       ;; FIXME: This test fails on i686:
>                       ;; -myisampack: Can't create/write to file (Errcode: 17 "File exists")
>                       ;; +myisampack: Can't create/write to file (Errcode: 17 "File exists)
>                       ;; When running "myisampack --join=foo/t3 foo/t1 foo/t2"
>                       ;; (all three tables must exist and be identical)
>                       ;; in a loop it produces the same error around 1/240 times.
>                       ;; montywi on #maria suggested removing the real_end check in
>                       ;; "strings/my_vsnprintf.c" on line 503, yet it still does not
>                       ;; reach the ending quote occasionally.  Disable it for now.
>                       "main.myisampack"
>                       ;; FIXME: This test fails on armhf-linux:
>                       "mroonga/storage.index_read_multiple_double"))
>
>                    ;; This file contains a list of known-flaky tests for this
>                    ;; release.  Append our own items.
>                    (unstable-tests (open-file "mysql-test/unstable-tests" "a")))
>                (for-each (lambda (test)
>                            (format unstable-tests "~a : ~a\n"
>                                    test "Disabled in Guix"))
>                          disabled-tests)
>                (close-port unstable-tests)
> --- snip ---
>
> I say "potentially" because after getting this failure I happened to notice that approximately one and a half minutes after beginning the build of /gnu/store/c46sn2yfllcfi86p8227wvvr1bxssgxj-mariadb-10.1.38.drv the kernel throws this message: "traps: cmTC_35af5[27766] trap invalid opcode ip:555555555174 sp:7fffffffcc90 error:0 in cmTC_35af5[555555555000+1000]".
>
> I have retested this several times and confirmed that this occurs each and every time mariadb-10.1.38.drv tries to build and in approximately the same amount of time after starting the build. I say approximately because the closest I could get to a timeframe on this kernel message in relation to the mariadb build is by sending the stdout from guix system reconfigure through logger so that it gets printed with a timestamp to the kernel messages terminal (alt-F12).
>
> Specifically, the message sequence is always as follows, without deviation (other than the cmTC_#), with no related messages in between; as per the command cat /dev/vcs12:
>
> --- snip ---
> May  9 16:36:35 localhost root cmd: guix system reconfigure: building /gnu/store/c46sn2yfllcfi86p8227wvvr1bxssgxj-mariadb-10.1.38.drv...
> May  9 16:38:08 localhost vmunix: [ 9169.050496] traps: cmTC_35af5[27766] trap invalid opcode ip:555555555174 sp:7fffffffcc90 error:0 in cmTC_35af5[555555555000+1000]
> --- snip ---
>
> I really suggest trying to simply add the tokudb_alter_table.hcad_all_add test to the package definition before trying to solve the overall problem, though. Maybe we can get this in for 1.0.1?
>
> I would be willing to do this myself and report the results here but I'm baffled at how to achieve this simple task. Perhaps someone could walk me through it?
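
For anyone unsure what the quoted adjust-tests phase actually writes,
here is a rough Python rendition of its last few lines.  It only
illustrates the "NAME : REASON" line format; a real change would have to
be made in the Guile package definition.  The helper name
unstable_test_lines and the in-memory buffer are stand-ins of mine.
Whether the test runner then skips these entries depends on how the
package's check phase invokes it, which the quoted excerpt does not show.

```python
import io


def unstable_test_lines(tests, reason="Disabled in Guix"):
    # Mirrors the quoted (format unstable-tests "~a : ~a\n" test "Disabled in Guix").
    return [f"{test} : {reason}\n" for test in tests]


# The test that Platoxia proposes to disable.
extra_tests = ["tokudb_alter_table.hcad_all_add"]

# Stand-in for mysql-test/unstable-tests opened in append mode.
port = io.StringIO()
port.writelines(unstable_test_lines(extra_tests))

print(port.getvalue(), end="")
# tokudb_alter_table.hcad_all_add : Disabled in Guix
```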

I'm not sure about the kernel error.  I haven't seen an error like that
myself.  But perhaps this is yet another test which is failing
non-deterministically?

I think we need more data.  It would be nice if we could build this
repeatedly on Cuirass.  When the build is 3 hours long, it is difficult
to test it on my machine, and I often forget about it by the time it is
done running.
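
If we do collect logs from repeated builds, a tally of which tests fail
and how often would be the obvious thing to send upstream.  Below is a
small Python sketch of such a tally.  The function name tally_failures
is hypothetical, and the sample "logs" are stand-ins containing only the
"Failing test(s):" line from each failure reported in this thread so
far, not real build logs.

```python
import re
from collections import Counter

# Pattern modelled on the summary line quoted in this thread (assumed format).
FAILING_RE = re.compile(r"Failing test\(s\):\s*(.+)")


def tally_failures(logs):
    """Count, for each test name, the number of logs that report it as failing."""
    counts = Counter()
    for text in logs:
        names = set()  # count each test at most once per log
        for match in FAILING_RE.finditer(text):
            names.update(match.group(1).split())
        counts.update(names)
    return counts


# Stand-in logs: one per failure reported in this thread.
logs = [
    "Failing test(s): tokudb_bugs.5733_innodb\n",          # my build
    "Failing test(s): tokudb_bugs.mdev4533\n",             # Mark's X200, 1st
    "Failing test(s): tokudb_bugs.mdev4533\n",             # Mark's X200, 2nd
    "Failing test(s): tokudb_bugs.mdev4533\n",             # Mark's X200, 3rd
    "Failing test(s): tokudb_alter_table.hcad_all_add\n",  # Platoxia
]

tally = tally_failures(logs)
for name, count in tally.most_common():
    print(f"{count:3d}  {name}")
```

Logs from successful builds contribute nothing to the tally, so the
total number of builds would need to be recorded separately to turn
these counts into failure rates.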

If I get more time, I will try to dig in more.  In the meantime, any
thoughts about this would be welcome.

-- 
Chris
