From: Chris Marusich
Subject: bug#35521: Mariadb test suite failures on x86_64-linux
Date: Tue, 09 Jul 2019 23:18:57 -0700
Message-ID: <87pnmil8dq.fsf_-_@gmail.com>
References: <87tveemt19.fsf@netris.org> <87h8aemrow.fsf@netris.org>
In-Reply-To: <87h8aemrow.fsf@netris.org> (Mark H. Weaver's message of "Wed, 01 May 2019 05:47:32 -0400, Fri, 10 May 2019 01:33:50 +0000")
To: Mark H Weaver, Platoxia
Cc: 35521@debbugs.gnu.org
Mime-Version: 1.0
Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha256; protocol="application/pgp-signature"

--=-=-=
Content-Type: text/plain
Content-Transfer-Encoding: quoted-printable

Hi,

I've been encountering this failure on and off for a few weeks now, and I'd like to help fix it. In short, these look to me like non-deterministic test failures. I think we should gather data and report the issue upstream, and perhaps disable the offending tests in the meantime.
Mariadb failed for me earlier today with a different error than the ones observed in this bug report so far. I was building mariadb 10.1.40 on an x86_64-linux system using Guix 9b2644c, and the error was:

  Failure: Failed 1/1990 tests, 99.95% were successful.

  Failing test(s): tokudb_bugs.5733_innodb

  The log files in var/log may give you some hint of what went wrong.

  If you want to report this error, please read first the documentation
  at http://dev.mysql.com/doc/mysql/en/mysql-test-suite.html

  558 tests were skipped, 169 by the test itself

I kept the failed build directory, but there is no "var" directory to be found there. I guess they meant system logs; I am not sure where such logs would go when emitted from within a derivation. The MySQL website suggests running mysql-test-run.pl with the --force option, which I casually tried after running ". environment-variables" in the failed build directory. However, it promptly failed because it could not find 'my_safe_process'; maybe I didn't have everything set up correctly to run the tests manually.

Curiously, on a different x86_64-linux machine using Guix commit 6c83c48 (only a few commits ahead of 9b2644c), I was able to build mariadb successfully, although I am not sure when I built it. Running "guix build mariadb" there currently succeeds quickly, so on that machine I probably built or substituted it some time ago. The derivation (without grafts) was identical to the one that failed to build on the other machine, which is strange, because I would normally expect the same derivation to succeed on both machines. For the record, this was the derivation:

  $ guix build --no-grafts -d mariadb
  /gnu/store/9yw33r8r84qrsic7fiq0lqqkbzisv1cj-mariadb-10.1.40.drv

Perhaps these tests fail non-deterministically? Or perhaps they fail in a way that is specific to something the Guix build process does not isolate, such as the kernel, the file system, or the hardware?
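One low-tech way to gather data on that question is to run the same build several times and count how often it fails. Here is a minimal POSIX shell sketch; the helper name count_failures is hypothetical (it is not part of Guix), and it only looks at exit statuses, so the logs of failed runs still have to be read to see which test broke.

```shell
#!/bin/sh
# Hypothetical helper: run a command N times and report how many runs
# exited with a non-zero status. The command's output is discarded.
count_failures() {
  runs=$1
  shift
  fails=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    if ! "$@" >/dev/null 2>&1; then
      fails=$((fails + 1))
    fi
    i=$((i + 1))
  done
  echo "$fails/$runs runs failed"
}
```

For example, on a machine where mariadb is already in the store, "count_failures 3 guix build --check --no-grafts mariadb" would rebuild it three times. Note that --check also fails when a rebuild is not bit-for-bit identical to the existing output, so a failed run does not by itself prove that a test failed.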
I tried to check the status of mariadb in Cuirass. However, I only found the following information:

  https://ci.guix.gnu.org/search?query=mariadb-10.1.40

For x86_64-linux, build 1304242 supposedly failed on 10 May at 20:32 +0200 after about 3 hours of runtime:

  https://ci.guix.gnu.org/build/1304242/details

I say "supposedly failed" because I'm not sure why it failed. The build log seems to indicate no problems:

  https://ci.guix.gnu.org/build/1304242/log/raw

Has Cuirass tried to build mariadb since then? May 10th was a long time ago, and I am surprised there has not been another build of it from master.

Mark H Weaver writes:

> Mark H Weaver writes:
>
>> The same build also failed twice in a row on my Thinkpad X200, and with
>> the same error each time, although it's a different error than happens
>> on hydra.gnunet.org.  On my X200, I get this instead:
>>
>>> Failure: Failed 1/1091 tests, 99.91% were successful.
>>>
>>> Failing test(s): tokudb_bugs.mdev4533
>
> and it just failed a third time on my X200, again with the same error.

It seems like the tests may be flaky. The test failure I saw was different from yours, and in my case I was actually able to build (or substitute) mariadb once. So maybe what we need to do is gather enough data to report the problem upstream and enlist their help?

Platoxia writes:

> This problem persists and is preventing successful completion of guix system reconfigure for pre-1.0.0 systems (at least mine, which is still at kernel 4.20), not only for those using mariadb but also for anyone using any of the 544 packages that depend on it (as per the command guix graph --type=reverse-package mariadb | grep -c label).
>
> This could, potentially, be fixed by simply adding this test to the list of disabled tests in the package definition:
>
> --- snip ---
> (add-after 'unpack 'adjust-tests
>   (lambda _
>     (let ((disabled-tests
>            '(;; These fail because root@hostname == root@localhost in
>              ;; the build environment, causing a user count mismatch.
>              ;; See .
>              "main.join_cache"
>              "main.explain_non_select"
>              "main.stat_tables_innodb"
>              "roles.acl_statistics"
>
>              ;; This file contains a time bomb which makes it fail after
>              ;; 2030-12-31.  See for details.
>              "main.mysqldump"
>
>              ;; XXX: Fails sporadically.
>              "innodb_fts.crash_recovery"
>
>              ;; FIXME: This test fails on i686:
>              ;; -myisampack: Can't create/write to file (Errcode: 17 "File exists")
>              ;; +myisampack: Can't create/write to file (Errcode: 17 "File exists)
>              ;; When running "myisampack --join=foo/t3 foo/t1 foo/t2"
>              ;; (all three tables must exist and be identical)
>              ;; in a loop it produces the same error around 1/240 times.
>              ;; montywi on #maria suggested removing the real_end check in
>              ;; "strings/my_vsnprintf.c" on line 503, yet it still does not
>              ;; reach the ending quote occasionally.  Disable it for now.
>              "main.myisampack"
>              ;; FIXME: This test fails on armhf-linux:
>              "mroonga/storage.index_read_multiple_double"))
>
>           ;; This file contains a list of known-flaky tests for this
>           ;; release.  Append our own items.
>           (unstable-tests (open-file "mysql-test/unstable-tests" "a")))
>       (for-each (lambda (test)
>                   (format unstable-tests "~a : ~a\n"
>                           test "Disabled in Guix"))
>                 disabled-tests)
>       (close-port unstable-tests)
> --- snip ---
>
> I say "potentially" because after getting this failure I happened to notice that approximately one and a half minutes after beginning the build of /gnu/store/c46sn2yfllcfi86p8227wvvr1bxssgxj-mariadb-10.1.38.drv the kernel throws this message: "traps: cmTC_35af5[27766] trap invalid opcode ip:555555555174 sp:7fffffffcc90 error:0 in cmTC_35af5[555555555000+1000]".
>
> I have retested this several times and confirmed that this occurs each and every time mariadb-10.1.38.drv tries to build and in approximately the same amount of time after starting the build. I say approximately because the closest I could get to a timeframe on this kernel message in relation to the mariadb build is by sending the stdout from guix system reconfigure through logger so that it gets printed with a timestamp to the kernel messages terminal (alt-F12).
>
> Specifically, the message sequence is always as follows, without deviation (other than the cmTC_#), with no related messages in between; as per the command cat /dev/vcs12:
>
> --- snip ---
> May 9 16:36:35 localhost root cmd: guix system reconfigure: building /gnu/store/c46sn2yfllcfi86p8227wvvr1bxssgxj-mariadb-10.1.38.drv...
> May 9 16:38:08 localhost vmunix: [ 9169.050496] traps: cmTC_35af5[27766] trap invalid opcode ip:555555555174 sp:7fffffffcc90 error:0 in cmTC_35af5[555555555000+1000]
> --- snip ---
>
> I really suggest trying to simply add the tokudb_alter_table.hcad_all_add test to the package definition before trying to solve the overall problem, though. Maybe we can get this in for 1.0.1?
>
> I would be willing to do this myself and report the results here but I'm baffled at how to achieve this simple task. Perhaps someone could walk me through it?
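Regarding how to add a test to that list: the phase quoted above writes one "name : reason" line per entry of disabled-tests into mysql-test/unstable-tests, so in the package definition it should amount to one more string in that list. To try the same thing by hand in a kept build directory, here is a minimal POSIX shell sketch of that append. The helper name append_disabled_tests is hypothetical, and it assumes, as the comment in the phase suggests, that the test harness skips the tests listed in that file; I have not verified that.

```shell
#!/bin/sh
# Hypothetical helper mirroring the quoted adjust-tests phase: append one
# "name : reason" line per test name to the given skip-list file, the same
# format that (format unstable-tests "~a : ~a\n" test "Disabled in Guix")
# produces.
append_disabled_tests() {
  list=$1
  shift
  for t in "$@"; do
    printf '%s : %s\n' "$t" "Disabled in Guix" >> "$list"
  done
}
```

Run from the top of the unpacked source tree, "append_disabled_tests mysql-test/unstable-tests tokudb_alter_table.hcad_all_add" would add the test Platoxia suggests disabling.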
I'm not sure about the kernel error; I haven't seen an error like that myself. But perhaps this is yet another test that fails non-deterministically? I think we need more data. It would be nice if we could build this repeatedly on Cuirass. When the build takes 3 hours, it is difficult to test it on my machine, and I often forget about it by the time it is done running.

If I get more time, I will try to dig in more. In the meantime, any thoughts about this would be welcome.

-- 
Chris

--=-=-=
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEy/WXVcvn5+/vGD+x3UCaFdgiRp0FAl0lg1EACgkQ3UCaFdgi
Rp0DZg/+MeOEmURNkPsIrkLyrgjG/okg06VfOkPezht2CnXEwIy+cSjccKN/6l9H
eRGUQLEXyqhMNueYGIiMXYGgVixknNIJ97/Twx8qVc+0mrY4h7t2lDif4QXF5ytV
IRmOd3eRB/tN5eA3CoNAV/VSsEGvqXpvvfa3XJPvYjKso78WvZP4qGlwyhmPyeBc
W77MInNSod7pdpcvy1BceB5vORrAsQmFixrnZb/mb239JyOLq448C7sqC501kt3Q
1SS0qVUc1yf1vdRqxyZek5XKSAJaS3Y+EfgZRkpHjzkoSaeLf0kbaz+9DfIqqszk
y0PtrLSPt5rMqGxsWHni6YzdkNcP8v7As9ZDN60HCH0SdJLBeSQ6R3zrtOd1T2Qj
CTArMfzyKFQPHe62WdrLYLo4a8NFGKQobbPLJKP5ipnmmGCu1mbhvmi258NkQjul
0IXlx6STUzfywwDp1rCXz/nrM0g+FRESxAr2ejOWuhgluxE7xRAaE1qzP5wLC/yX
RpndTqkkVu8Qiy3RJMVkvsQvfD8xksNALeSToYX6qoBfaF0brwCTgfcf1g+ktLTa
MlK5FnvS4YBg9dtOHOYzzQwiw7gzuEh604eBzC+Mn7Q5B+FBg4p4Q14Ec0s4kO5y
HVeQWTLLgUQmMsmbBUd+7sxKyYk8LEqJ9Gkx20Z8MS1x3WV/fCI=
=9zjG
-----END PGP SIGNATURE-----
--=-=-=--