* bug#35521: Mariadb test suite failures on x86_64-linux
@ 2019-05-01 9:18 Mark H Weaver
  2019-05-01 9:47 ` Mark H Weaver
  2019-05-10 1:33 ` bug#35521: /gnu/store/c46sn2yfllcfi86p8227wvvr1bxssgxj-mariadb-10.1.38.drv - Failing test(s): tokudb_alter_table.hcad_all_add Platoxia
  0 siblings, 2 replies; 21+ messages in thread
From: Mark H Weaver @ 2019-05-01 9:18 UTC (permalink / raw)
  To: 35521

hydra.gnunet.org has failed to build mariadb on x86_64-linux twice in a
row:

  https://hydra.gnu.org/build/3475081#tabs-buildsteps

The same test failed both times:

> Failure: Failed 1/5075 tests, 99.98% were successful.
>
> Failing test(s): tokudb_alter_table.hcad_all_add

The same build also failed twice in a row on my Thinkpad X200, and with
the same error each time, although it's a different error than happens
on hydra.gnunet.org. On my X200, I get this instead:

> Failure: Failed 1/1091 tests, 99.91% were successful.
>
> Failing test(s): tokudb_bugs.mdev4533

hydra.gnunet.org successfully built mariadb for i686-linux on its first
attempt:

  https://hydra.gnu.org/build/3473640

Here's the corresponding armhf-linux build, which has not yet been
attempted as I write this:

  https://hydra.gnu.org/build/3481309

     Mark

^ permalink raw reply [flat|nested] 21+ messages in thread
* bug#35521: Mariadb test suite failures on x86_64-linux 2019-05-01 9:18 bug#35521: Mariadb test suite failures on x86_64-linux Mark H Weaver @ 2019-05-01 9:47 ` Mark H Weaver 2019-07-10 6:18 ` Chris Marusich 2019-05-10 1:33 ` bug#35521: /gnu/store/c46sn2yfllcfi86p8227wvvr1bxssgxj-mariadb-10.1.38.drv - Failing test(s): tokudb_alter_table.hcad_all_add Platoxia 1 sibling, 1 reply; 21+ messages in thread From: Mark H Weaver @ 2019-05-01 9:47 UTC (permalink / raw) To: 35521 Mark H Weaver <mhw@netris.org> writes: > The same build also failed twice in a row on my Thinkpad X200, and with > the same error each time, although it's a different error than happens > on hydra.gnunet.org. On my X200, I get this instead: > >> Failure: Failed 1/1091 tests, 99.91% were successful. >> >> Failing test(s): tokudb_bugs.mdev4533 and it just failed a third time on my X200, again with the same error. Mark ^ permalink raw reply [flat|nested] 21+ messages in thread
* bug#35521: Mariadb test suite failures on x86_64-linux 2019-05-01 9:47 ` Mark H Weaver @ 2019-07-10 6:18 ` Chris Marusich 2019-07-10 17:30 ` Marius Bakke 0 siblings, 1 reply; 21+ messages in thread From: Chris Marusich @ 2019-07-10 6:18 UTC (permalink / raw) To: Mark H Weaver, Platoxia; +Cc: 35521 [-- Attachment #1: Type: text/plain, Size: 8801 bytes --] Hi, I've been encountering this failure off and on for a few weeks now, and I'd like to help fix it. In short, it seems like non-deterministic test failures, to me. I think we should gather data and report the issue upstream, and maybe disable the offending tests in the meantime. Mariadb failed for me earlier today with a different error than the ones observed in this bug report so far. My error was the following (when building mariadb 10.1.40 on an x86_64-linux system using Guix 9b2644c): Failure: Failed 1/1990 tests, 99.95% were successful. Failing test(s): tokudb_bugs.5733_innodb The log files in var/log may give you some hint of what went wrong. If you want to report this error, please read first the documentation at http://dev.mysql.com/doc/mysql/en/mysql-test-suite.html 558 tests were skipped, 169 by the test itself I kept the failed build directory, but there is no "var" directory to be found there. I guess they meant system logs; I am not sure where such logs would go when emitted from within a derivation. The MySQL website suggested running mysql-test-run.pl with the --force option, which I casually tried after invoking ". environment-variables" from the failed build directory; however, it promptly failed because it could not find 'my_safe_process' - maybe I didn't have everything set up just so to run the tests manually. 
Curiously, on a different x86_64-linux machine, using Guix commit
6c83c48 (which is only a few commits ahead of 9b2644c), I was able to
build mariadb successfully, although I am not sure when I built it
(running "guix build mariadb" currently results in quick success for me,
so on this machine I probably built or substituted it some time ago).
The derivation (without grafts) was identical to the one that failed to
build on the other machine, which is strange because I would normally
expect the same derivation to succeed on both machines. For the record,
this was the derivation:

  $ guix build --no-grafts -d mariadb
  /gnu/store/9yw33r8r84qrsic7fiq0lqqkbzisv1cj-mariadb-10.1.40.drv

Perhaps these tests fail non-deterministically? Or perhaps they fail in
a way that is specific to something not isolated from the build process
by Guix, such as the kernel, the file system, or the hardware?

I tried to check the status of mariadb in Cuirass. However, I only found
the following information:

  https://ci.guix.gnu.org/search?query=mariadb-10.1.40

For x86_64-linux, build 1304242 supposedly failed at 10 May 20:32 +0200
after about 3 hours of runtime:

  https://ci.guix.gnu.org/build/1304242/details

I say "supposedly failed" because I'm not sure why it failed. The build
log seems to indicate no problems:

  https://ci.guix.gnu.org/build/1304242/log/raw

Has Cuirass tried to build mariadb since then? May 10th was a long time
ago, and I am surprised there is not another build of it from master.

Mark H Weaver <mhw@netris.org> writes:

> Mark H Weaver <mhw@netris.org> writes:
>
>> The same build also failed twice in a row on my Thinkpad X200, and with
>> the same error each time, although it's a different error than happens
>> on hydra.gnunet.org. On my X200, I get this instead:
>>
>>> Failure: Failed 1/1091 tests, 99.91% were successful.
>>>
>>> Failing test(s): tokudb_bugs.mdev4533
>
> and it just failed a third time on my X200, again with the same error.

It seems like the tests may be flaky.
The test failure I saw was different from yours. And in my case, I actually was able to build (or substitute) mariadb once. So maybe what we need to do is gather enough data to report the problem upstream, to enlist their help? Platoxia <platoxia@protonmail.com> writes: > This problem persists and is preventing sucessful completion of guix system reconfigure for pre-1.0.0 systems (at least mine which is still at kernel 4.20), not only for those using mariadb but also for anyone using any of the 544 packages that depend on it; as per the command guix graph --type=reverse-package mariadb | grep -c label). > > This could, potentially, be fixed by simply adding this test to the list of disabled tests in the package definition: > > --- snip --- > (add-after 'unpack 'adjust-tests > (lambda _ > (let ((disabled-tests > '(;; These fail because root@hostname == root@localhost in > ;; the build environment, causing a user count mismatch. > ;; See <https://jira.mariadb.org/browse/MDEV-7761>. > "main.join_cache" > "main.explain_non_select" > "main.stat_tables_innodb" > "roles.acl_statistics" > > ;; This file contains a time bomb which makes it fail after > ;; 2030-12-31. See <https://bugs.gnu.org/34351> for details. > "main.mysqldump" > > ;; XXX: Fails sporadically. > "innodb_fts.crash_recovery" > > ;; FIXME: This test fails on i686: > ;; -myisampack: Can't create/write to file (Errcode: 17 "File exists") > ;; +myisampack: Can't create/write to file (Errcode: 17 "File exists) > ;; When running "myisampack --join=foo/t3 foo/t1 foo/t2" > ;; (all three tables must exist and be identical) > ;; in a loop it produces the same error around 1/240 times. > ;; montywi on #maria suggested removing the real_end check in > ;; "strings/my_vsnprintf.c" on line 503, yet it still does not > ;; reach the ending quote occasionally. Disable it for now. 
> "main.myisampack" > ;; FIXME: This test fails on armhf-linux: > "mroonga/storage.index_read_multiple_double")) > > ;; This file contains a list of known-flaky tests for this > ;; release. Append our own items. > (unstable-tests (open-file "mysql-test/unstable-tests" "a"))) > (for-each (lambda (test) > (format unstable-tests "~a : ~a\n" > test "Disabled in Guix")) > disabled-tests) > (close-port unstable-tests) > --- snip --- > > I say "potentially" because after getting this failure I happened to notice that approximately one and a half minutes after beginning the build of /gnu/store/c46sn2yfllcfi86p8227wvvr1bxssgxj-mariadb-10.1.38.drv the kernel throws this message: "traps: cmTC_35af5[27766] trap invalid opcode ip:555555555174 sp:7fffffffcc90 error:0 in cmTC_35af5[555555555000+1000]". > > I have retested this several times and confirmed that this occurs each and every time mariadb-10.1.38.drv tries to build and in approximately the same amount of time after starting the build. I say approximately because the closest I could get to a timeframe on this kernel message in relation to the mariadb build is by sending the stdout from guix system reconfigure through logger so that it gets printed with a timestamp to the kernel messages terminal (alt-F12). > > Specifically, the message sequence is always as follows, without deviation (other than the cmTC_#), with no related messages in between; as per the command cat /dev/vcs12: > > --- snip --- > May 9 16:36:35 localhost root cmd: guix system reconfigure: building /gnu/store/c46sn2yfllcfi86p8227wvvr1bxssgxj-mariadb-10.1.38.drv... > May 9 16:38:08 localhost vmunix: [ 9169.050496] traps: cmTC_35af5[27766] trap invalid opcode ip:555555555174 sp:7fffffffcc90 error:0 in cmTC_35af5[555555555000+1000] > --- snip --- > > I really suggest trying to simply add the tokudb_alter_table.hcad_all_add test to the package definition before trying to solve the overall problem, though. Maybe we can get this in for 1.0.1? 
> > I would be willing to do this myself and report the results here but I'm baffled at how to achieve this simple task. Perhaps someone could walk me through it? I'm not sure about the kernel error. I haven't seen an error like that myself. But perhaps this is yet another test which is failing non-deterministically? I think we need more data. It would be nice if we could build this repeatedly on Cuirass. When the build is 3 hours long, it is difficult to test it on my machine, and I often forget about it by the time it is done running. If I get more time, I will try to dig in more. In the meantime, any thoughts about this would be welcome. -- Chris [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 832 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
* bug#35521: Mariadb test suite failures on x86_64-linux
  2019-07-10 6:18 ` Chris Marusich
@ 2019-07-10 17:30 ` Marius Bakke
  2019-07-10 21:32 ` Mark H Weaver
  0 siblings, 1 reply; 21+ messages in thread
From: Marius Bakke @ 2019-07-10 17:30 UTC (permalink / raw)
  To: Chris Marusich, Mark H Weaver, Platoxia; +Cc: 35521

[-- Attachment #1.1: Type: text/plain, Size: 593 bytes --]

Chris Marusich <cmmarusich@gmail.com> writes:

> Hi,
>
> I've been encountering this failure off and on for a few weeks now, and
> I'd like to help fix it. In short, it seems like non-deterministic test
> failures, to me. I think we should gather data and report the issue
> upstream, and maybe disable the offending tests in the meantime.

I agree. I notice many of these failing tests are for the TokuDB
backend, which I doubt anyone is using in Guix anyway.

Here is a patch that disables all tests mentioned in this report. I
would like to push it to core-updates. Are there others?

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.2: mariadb.diff --]
[-- Type: text/x-patch, Size: 908 bytes --]

diff --git a/gnu/packages/databases.scm b/gnu/packages/databases.scm
index 578670e3c1..778c70eed0 100644
--- a/gnu/packages/databases.scm
+++ b/gnu/packages/databases.scm
@@ -704,8 +704,12 @@ Language.")
                  ;; 2030-12-31. See <https://bugs.gnu.org/34351> for details.
                  "main.mysqldump"

-                 ;; XXX: Fails sporadically.
+                 ;; XXX: These tests may fail on some hardware configurations,
+                 ;; see <https://bugs.gnu.org/35521> et al.
                  "innodb_fts.crash_recovery"
+                 "tokudb_alter_table.hcad_all_add"
+                 "tokudb_bugs.mdev4533"
+                 "tokudb_bugs.5733_innodb"

                  ;; FIXME: This test fails on i686:
                  ;; -myisampack: Can't create/write to file (Errcode: 17 "File exists")

[-- Attachment #1.3: Type: text/plain, Size: 101 bytes --]

WDYT?

Note that the latest MariaDB is 10.4.x, and these tests may well be
fixed in later versions.
[-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 487 bytes --] ^ permalink raw reply related [flat|nested] 21+ messages in thread
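Editorial sketch: the patch above only adds names to the package's `disabled-tests` list; the `adjust-tests` phase quoted earlier in the thread then appends one `name : comment` line per entry to `mysql-test/unstable-tests`, which `mtr` reads through `--skip-test-list=unstable-tests`. The shell fragment below imitates that append step for the three tests named in this report. It is only an illustration under assumptions: it writes to a scratch file created with `mktemp` rather than to the real source tree, and the variable names are hypothetical.

```shell
#!/bin/sh
# Sketch only: imitate the append step of the quoted 'adjust-tests' phase,
# using a scratch file instead of the real mysql-test/unstable-tests.
set -eu

workdir=$(mktemp -d)                      # hypothetical scratch location
unstable_tests="$workdir/unstable-tests"
: > "$unstable_tests"

# The three TokuDB tests reported as failing in this thread.
for test_name in \
    tokudb_alter_table.hcad_all_add \
    tokudb_bugs.mdev4533 \
    tokudb_bugs.5733_innodb
do
    # Same "<suite>.<test> : <comment>" shape as the Scheme format call.
    printf '%s : %s\n' "$test_name" "Disabled in Guix" >> "$unstable_tests"
done

cat "$unstable_tests"
```

Appending to upstream's own list, instead of patching the test files, keeps the change small and makes every Guix-specific skip easy to find by its comment.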
* bug#35521: Mariadb test suite failures on x86_64-linux 2019-07-10 17:30 ` Marius Bakke @ 2019-07-10 21:32 ` Mark H Weaver 2019-07-11 20:18 ` Ludovic Courtès 2019-07-11 22:01 ` Marius Bakke 0 siblings, 2 replies; 21+ messages in thread From: Mark H Weaver @ 2019-07-10 21:32 UTC (permalink / raw) To: Marius Bakke; +Cc: Platoxia, 35521 Hi, Marius Bakke <mbakke@fastmail.com> writes: > Chris Marusich <cmmarusich@gmail.com> writes: > >> Hi, >> >> I've been encountering this failure off and on for a few weeks now, and >> I'd like to help fix it. In short, it seems like non-deterministic test >> failures, to me. I think we should gather data and report the issue >> upstream, and maybe disable the offending tests in the meantime. > > I agree. I notice many of these failing tests are for the TokuDB > backend, which I doubt anyone is using in Guix anyway. > > Here is a patch that disables all tests mentioned in this report. I > would like to push it to core-updates. Are there others? I'm concerned by how frequently and casually we simply disable failing tests. What is the utility of running test suites at all, if this is how we respond? It makes me wonder how many programs are subtly broken on my Guix system because of this widespread practice. Mark ^ permalink raw reply [flat|nested] 21+ messages in thread
* bug#35521: Mariadb test suite failures on x86_64-linux 2019-07-10 21:32 ` Mark H Weaver @ 2019-07-11 20:18 ` Ludovic Courtès 2019-07-12 8:02 ` Giovanni Biscuolo 2019-07-11 22:01 ` Marius Bakke 1 sibling, 1 reply; 21+ messages in thread From: Ludovic Courtès @ 2019-07-11 20:18 UTC (permalink / raw) To: Mark H Weaver; +Cc: Platoxia, 35521 Hi Mark, Mark H Weaver <mhw@netris.org> skribis: > Marius Bakke <mbakke@fastmail.com> writes: > >> Chris Marusich <cmmarusich@gmail.com> writes: >> >>> Hi, >>> >>> I've been encountering this failure off and on for a few weeks now, and >>> I'd like to help fix it. In short, it seems like non-deterministic test >>> failures, to me. I think we should gather data and report the issue >>> upstream, and maybe disable the offending tests in the meantime. >> >> I agree. I notice many of these failing tests are for the TokuDB >> backend, which I doubt anyone is using in Guix anyway. >> >> Here is a patch that disables all tests mentioned in this report. I >> would like to push it to core-updates. Are there others? > > I'm concerned by how frequently and casually we simply disable failing > tests. What is the utility of running test suites at all, if this is > how we respond? I don’t think anyone is happy with that. The alternative seems to be: keeping an older version that perhaps didn’t have these problems but may have known bugs and security issues, or keeping a package that fails to build for a possibly long time. I think disabling specific tests is the least bad of these options. In this case, we know that the offending tests relate to a specific backend, and one can at least assume that potential issues are in that area. So I do think that this is an appropriate response. Of course, in any such case, we should report the issue upstream, even if we all too well know that non-deterministic test failures are hard to address… Ludo’. ^ permalink raw reply [flat|nested] 21+ messages in thread
* bug#35521: Mariadb test suite failures on x86_64-linux
  2019-07-11 20:18 ` Ludovic Courtès
@ 2019-07-12 8:02 ` Giovanni Biscuolo
  0 siblings, 0 replies; 21+ messages in thread
From: Giovanni Biscuolo @ 2019-07-12 8:02 UTC (permalink / raw)
  To: Ludovic Courtès, Mark H Weaver; +Cc: 35521

[-- Attachment #1: Type: text/plain, Size: 932 bytes --]

Hi all,

for what it's worth:

Ludovic Courtès <ludo@gnu.org> writes:

> Mark H Weaver <mhw@netris.org> skribis:

[...]

>> I'm concerned by how frequently and casually we simply disable failing
>> tests.

I disagree here: disabling tests in Guix is _never_ done casually AFAIS
(as far as I see) but is always pondered and discussed, like in this
case ;-)

[...]

> I think disabling specific tests is the least bad of these options.

Also: automated software testing is better than nothing but... who tests
the tests? *Sometimes* tests introduce "collateral test bugs" that have
nothing to do with actual software issues, including security ones.

So IMHO neither upstream nor we should "blindly obey" tests; we should
disable the ones proven unreliable :-D

More on this specific issue in my next reply... :-)

[...]

Happy hacking! Gio'

-- 
Giovanni Biscuolo

Xelera IT Infrastructures

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply [flat|nested] 21+ messages in thread
* bug#35521: Mariadb test suite failures on x86_64-linux
  2019-07-10 21:32 ` Mark H Weaver
  2019-07-11 20:18 ` Ludovic Courtès
@ 2019-07-11 22:01 ` Marius Bakke
  2019-07-12 8:24 ` Giovanni Biscuolo
  2019-07-12 14:58 ` Marius Bakke
  1 sibling, 2 replies; 21+ messages in thread
From: Marius Bakke @ 2019-07-11 22:01 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: Platoxia, 35521

[-- Attachment #1: Type: text/plain, Size: 1917 bytes --]

Mark H Weaver <mhw@netris.org> writes:

> Hi,
>
> Marius Bakke <mbakke@fastmail.com> writes:
>
>> Chris Marusich <cmmarusich@gmail.com> writes:
>>
>>> Hi,
>>>
>>> I've been encountering this failure off and on for a few weeks now, and
>>> I'd like to help fix it. In short, it seems like non-deterministic test
>>> failures, to me. I think we should gather data and report the issue
>>> upstream, and maybe disable the offending tests in the meantime.
>>
>> I agree. I notice many of these failing tests are for the TokuDB
>> backend, which I doubt anyone is using in Guix anyway.
>>
>> Here is a patch that disables all tests mentioned in this report. I
>> would like to push it to core-updates. Are there others?
>
> I'm concerned by how frequently and casually we simply disable failing
> tests. What is the utility of running test suites at all, if this is
> how we respond?

I had no idea this issue was so widespread until I noticed Berlin's
builders hit it more often than not. I have not been able to reproduce
these failures on my machines. So it was kind of a panic reaction,
being the person responsible for running these tests and all.

Looking further into the changes between 10.1.37 and 10.1.38, I notice
the 'tokudb.*' tests were enabled:

  https://github.com/MariaDB/server/commit/4c490d6df63695dc97b2c808e59954e6877d3a51

Watching the build on Berlin in real time, I also see that the test
output grinds nearly to a halt while running those.
'tokudb.hotindex-insert-2' took 2700439 milliseconds, or 45 minutes, if
I'm reading the test output correctly.
The default test case timeout is 40 minutes (as specified in the Guix package), but I'm using 80 for this build (60 was insufficient). I suspect the problem is that the 'tokudb.*' tests put a lot of strain on the file system, which causes these other tests to fail. It's interesting that disabling parallel build was insufficient though. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 487 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
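Editorial sketch: the timing argument in the message above is plain arithmetic, and it can be checked in a few lines of shell. The fragment converts the reported duration of 'tokudb.hotindex-insert-2' to whole minutes and compares it with the `--testcase-timeout` values discussed in this thread (the option takes minutes). It covers only that one measured test; the message itself reports that 60 was still insufficient for the build as a whole, so other test cases evidently ran longer. Variable names are hypothetical.

```shell
#!/bin/sh
# Sketch only: compare one measured test duration with the per-test
# timeouts (in minutes) mentioned in this thread.
set -eu

duration_ms=2700439                     # 'tokudb.hotindex-insert-2' on Berlin
duration_min=$((duration_ms / 60000))   # whole minutes, rounded down

for timeout_min in 40 60 80; do
    if [ "$duration_min" -ge "$timeout_min" ]; then
        verdict="too short"
    else
        verdict="long enough"
    fi
    echo "--testcase-timeout=$timeout_min: $verdict for this ${duration_min}-minute test"
done
```

The integer division rounds down, so a test that finishes a few seconds past a timeout boundary would still be reported as fitting; for a real decision some headroom is advisable.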
* bug#35521: Mariadb test suite failures on x86_64-linux
  2019-07-11 22:01 ` Marius Bakke
@ 2019-07-12 8:24 ` Giovanni Biscuolo
  2019-07-12 14:58 ` Marius Bakke
  1 sibling, 0 replies; 21+ messages in thread
From: Giovanni Biscuolo @ 2019-07-12 8:24 UTC (permalink / raw)
  To: Marius Bakke; +Cc: Platoxia, 35521

[-- Attachment #1: Type: text/plain, Size: 2243 bytes --]

Hi Marius,

Marius Bakke <mbakke@fastmail.com> writes:

[...]

> Looking further into the changes between 10.1.37 and 10.1.38, I notice
> the 'tokudb.*' tests were enabled:
>
> https://github.com/MariaDB/server/commit/4c490d6df63695dc97b2c808e59954e6877d3a51

The very first thing I noticed looking at that commit is its subject:
"Updated list of unstable tests for 10.1.38 release"

The first comments of that file state:

--8<---------------cut here---------------start------------->8---
# List the test cases which, unlike tests from disabled.def files,
# can still be run on the current tree meaningfully, but are known
# or suspected to fail sporadically on different reasons.
#
# Most common reasons are either test failures observed in buildbot,
# or recent modifications to the tests which make their stability
# unknown.
#
# Tests included due to recent modifications are later removed from the
# list, if during a certain period they do not fail (and are not
# modified again). Tests included due to intermittent failures are
# removed when corresponding bug reports are closed.
#
# Separate the test case name and the comment with ':'.
#
# <suitename>.<testcasename> : MDEV-xxxxx - <comment>
#
# '*' wildcard in testcase names is supported.
#
# To use the list, run MTR with --skip-test-list=unstable-tests option.
--8<---------------cut here---------------end--------------->8---

So *all* those tests _are_ considered unstable upstream. IMHO they
should be *selectively* skipped when they cause build problems in Guix,
including frequent non-deterministic ones like in this case.

> Watching the build on Berlin in real time, I also see that the test
> output grind nearly to a halt while running those.
> 'tokudb.hotindex-insert-2' took 2700439 milliseconds, or 45 minutes, if
> I'm reading the test output correctly.

The same is happening upstream:

  https://jira.mariadb.org/browse/MDEV-15198
  https://jira.mariadb.org/browse/MDEV-16040 (duplicate of the above)
  https://jira.mariadb.org/browse/MDEV-15271

Those bugs (and all others related to unstable tests) are currently
unresolved.

HTH! Gio'

-- 
Giovanni Biscuolo

Xelera IT Infrastructures

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply [flat|nested] 21+ messages in thread
* bug#35521: Mariadb test suite failures on x86_64-linux 2019-07-11 22:01 ` Marius Bakke 2019-07-12 8:24 ` Giovanni Biscuolo @ 2019-07-12 14:58 ` Marius Bakke 2019-07-13 17:29 ` Mark H Weaver 2019-07-13 18:38 ` Mark H Weaver 1 sibling, 2 replies; 21+ messages in thread From: Marius Bakke @ 2019-07-12 14:58 UTC (permalink / raw) To: Mark H Weaver; +Cc: Platoxia, 35521 [-- Attachment #1.1: Type: text/plain, Size: 2094 bytes --] Marius Bakke <mbakke@fastmail.com> writes: > Mark H Weaver <mhw@netris.org> writes: > >> Hi, >> >> Marius Bakke <mbakke@fastmail.com> writes: >> >>> Chris Marusich <cmmarusich@gmail.com> writes: >>> >>>> Hi, >>>> >>>> I've been encountering this failure off and on for a few weeks now, and >>>> I'd like to help fix it. In short, it seems like non-deterministic test >>>> failures, to me. I think we should gather data and report the issue >>>> upstream, and maybe disable the offending tests in the meantime. >>> >>> I agree. I notice many of these failing tests are for the TokuDB >>> backend, which I doubt anyone is using in Guix anyway. >>> >>> Here is a patch that disables all tests mentioned in this report. I >>> would like to push it to core-updates. Are there others? >> >> I'm concerned by how frequently and casually we simply disable failing >> tests. What is the utility of running test suites at all, if this is >> how we respond? > > I had no idea this issue was so widespread until I noticed Berlins > builders hit it more often than not. I have not been able to reproduce > these failures on my machines. So it was kind of a panic reaction, > being the person responsible for running these tests and all. > > Looking further into the changes between 10.1.37 and 10.1.38, I notice > the 'tokudb.*' tests were enabled: > > https://github.com/MariaDB/server/commit/4c490d6df63695dc97b2c808e59954e6877d3a51 > > Watching the build on Berlin in real time, I also see that the test > output grind nearly to a halt while running those. 
> 'tokudb.hotindex-insert-2' took 2700439 milliseconds, or 45 minutes, if > I'm reading the test output correctly. > > The default test case timeout is 40 minutes (as specified in the Guix > package), but I'm using 80 for this build (60 was insufficient). > > I suspect the problem is that the 'tokudb.*' tests put a lot of strain > on the file system, which causes these other tests to fail. It's > interesting that disabling parallel build was insufficient though. Update: Berlin built mariadb twice on core-updates with this patch: [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1.2: db.diff --] [-- Type: text/x-patch, Size: 645 bytes --] diff --git a/gnu/packages/databases.scm b/gnu/packages/databases.scm index 6bfeaad9a2..64bc0938b6 100644 --- a/gnu/packages/databases.scm +++ b/gnu/packages/databases.scm @@ -753,7 +753,7 @@ Language.") (with-directory-excursion "mysql-test" (invoke "./mtr" "--verbose" "--retry=3" - "--testcase-timeout=40" + "--testcase-timeout=80" "--suite-timeout=600" "--parallel" (number->string (parallel-job-count)) "--skip-test-list=unstable-tests")) [-- Attachment #1.3: Type: text/plain, Size: 88 bytes --] Mark, Chris: Can you try this change with MariaDB 10.1.40 and see if it works for you? [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 487 bytes --] ^ permalink raw reply related [flat|nested] 21+ messages in thread
* bug#35521: Mariadb test suite failures on x86_64-linux 2019-07-12 14:58 ` Marius Bakke @ 2019-07-13 17:29 ` Mark H Weaver 2019-07-13 22:42 ` Marius Bakke 2019-07-13 18:38 ` Mark H Weaver 1 sibling, 1 reply; 21+ messages in thread From: Mark H Weaver @ 2019-07-13 17:29 UTC (permalink / raw) To: Marius Bakke; +Cc: Platoxia, 35521 Hi Marius, > Update: Berlin built mariadb twice on core-updates with this patch: > > --8<---------------cut here---------------start------------->8--- > diff --git a/gnu/packages/databases.scm b/gnu/packages/databases.scm > index 6bfeaad9a2..64bc0938b6 100644 > --- a/gnu/packages/databases.scm > +++ b/gnu/packages/databases.scm > @@ -753,7 +753,7 @@ Language.") > (with-directory-excursion "mysql-test" > (invoke "./mtr" "--verbose" > "--retry=3" > - "--testcase-timeout=40" > + "--testcase-timeout=80" > "--suite-timeout=600" > "--parallel" (number->string (parallel-job-count)) > "--skip-test-list=unstable-tests")) > --8<---------------cut here---------------end--------------->8--- > > Mark, Chris: Can you try this change with MariaDB 10.1.40 and see if it > works for you? I tried it, but it made no difference on my Thinkpad X200, which still fails the same way as before with 10.1.38: Failing test(s): tokudb_bugs.mdev4533 Anyway, based on Giovanni's observations, https://debbugs.gnu.org/cgi/bugreport.cgi?bug=35521#32 I'm now inclined to agree that these are likely to be flaky tests, so I withdraw my objections to disabling them, in this specific case. Having said that, I disagree with Giovanni's dismissal of my concerns in general, here: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=35521#29 I will respond to that dismissal in a later message. Thanks, Mark ^ permalink raw reply [flat|nested] 21+ messages in thread
* bug#35521: Mariadb test suite failures on x86_64-linux
  2019-07-13 17:29 ` Mark H Weaver
@ 2019-07-13 22:42 ` Marius Bakke
  2019-07-14 2:35 ` Mark H Weaver
  2019-07-14 4:42 ` Mark H Weaver
  0 siblings, 2 replies; 21+ messages in thread
From: Marius Bakke @ 2019-07-13 22:42 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: Platoxia, 35521

[-- Attachment #1.1: Type: text/plain, Size: 1261 bytes --]

Mark H Weaver <mhw@netris.org> writes:

> Hi Marius,
>
>> Update: Berlin built mariadb twice on core-updates with this patch:
>>
>> --8<---------------cut here---------------start------------->8---
>> diff --git a/gnu/packages/databases.scm b/gnu/packages/databases.scm
>> index 6bfeaad9a2..64bc0938b6 100644
>> --- a/gnu/packages/databases.scm
>> +++ b/gnu/packages/databases.scm
>> @@ -753,7 +753,7 @@ Language.")
>>                 (with-directory-excursion "mysql-test"
>>                   (invoke "./mtr" "--verbose"
>>                           "--retry=3"
>> -                         "--testcase-timeout=40"
>> +                         "--testcase-timeout=80"
>>                           "--suite-timeout=600"
>>                           "--parallel" (number->string (parallel-job-count))
>>                           "--skip-test-list=unstable-tests"))
>> --8<---------------cut here---------------end--------------->8---
>>
>> Mark, Chris: Can you try this change with MariaDB 10.1.40 and see if it
>> works for you?
>
> I tried it, but it made no difference on my Thinkpad X200, which still
> fails the same way as before with 10.1.38:
>
> Failing test(s): tokudb_bugs.mdev4533

I was about to push this patch to core-updates:

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.2: mariadb.patch --]
[-- Type: text/x-patch, Size: 1422 bytes --]

diff --git a/gnu/packages/databases.scm b/gnu/packages/databases.scm
index 6bfeaad9a2..5d256b1af2 100644
--- a/gnu/packages/databases.scm
+++ b/gnu/packages/databases.scm
@@ -706,9 +706,6 @@ Language.")
                  ;; 2030-12-31. See <https://bugs.gnu.org/34351> for details.
                  "main.mysqldump"

-                 ;; XXX: Fails sporadically.
-                 "innodb_fts.crash_recovery"
-
                  ;; FIXME: This test fails on i686:
                  ;; -myisampack: Can't create/write to file (Errcode: 17 "File exists")
                  ;; +myisampack: Can't create/write to file (Errcode: 17 "File exists)
@@ -753,7 +750,10 @@ Language.")
                (with-directory-excursion "mysql-test"
                  (invoke "./mtr" "--verbose"
                          "--retry=3"
-                         "--testcase-timeout=40"
+                         ;; On x86_64 we need a long timeout because of the
+                         ;; TokuDB engine, whose individual test cases often
+                         ;; require more than 1 hour to complete on busy hosts.
+                         "--testcase-timeout=90"
                          "--suite-timeout=600"
                          "--parallel" (number->string (parallel-job-count))
                          "--skip-test-list=unstable-tests"))

[-- Attachment #1.3: Type: text/plain, Size: 1388 bytes --]

Lo and behold, tokudb_bugs.mdev4533 failed when I tried it on Berlin.

A couple of lines above "Failing test(s):" is the test output:

--8<---------------cut here---------------start------------->8---
CURRENT_TEST: tokudb_bugs.mdev4533
safe_process[29262]: parent_pid: 23338
safe_process[29262]: Started child 29263, terminated: 0
mysqltest: At line 6: query 'CREATE TABLE t1 (a INT(11), b CHAR(8)) ENGINE=TokuDB' failed: 1005: Can't create table `test`.`t1` (errno: 28 "No space left on device")

The result from queries just before the failure was:
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a INT(11), b CHAR(8)) ENGINE=TokuDB;
safe_process[29262]: Got signal 17, child_pid: 29263
safe_process[29262]: Killing child: 29263
safe_process[29262]: Child exit: 1
--8<---------------cut here---------------end--------------->8---

Could it be that you don't have enough disk space for this test? Do you
have the log file available still?

Here is the test in question:

  https://github.com/MariaDB/server/blob/10.1/storage/tokudb/mysql-test/tokudb_bugs/t/mdev4533.test

As a side note, MariaDB is ~30 MiB bigger on x86_64 because of TokuDB.
It would be great to move it to a separate output.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 487 bytes --]

^ permalink raw reply related [flat|nested] 21+ messages in thread
* bug#35521: Mariadb test suite failures on x86_64-linux 2019-07-13 22:42 ` Marius Bakke @ 2019-07-14 2:35 ` Mark H Weaver 2019-07-14 11:10 ` Arne Babenhauserheide 2019-07-14 4:42 ` Mark H Weaver 1 sibling, 1 reply; 21+ messages in thread From: Mark H Weaver @ 2019-07-14 2:35 UTC (permalink / raw) To: Marius Bakke; +Cc: Platoxia, 35521 Hi Marius, > Could it be that you don't have enough disk space for this test? Do you > have the log file available still? Yes, I have not only the log file, but also the failed build directory. My log file contains the same error in the 'tokudb_bugs.mdev4533' test: mysqltest: At line 6: query 'CREATE TABLE t1 (a INT(11), b CHAR(8)) ENGINE=TokuDB' failed: 1005: Can't create table `test`.`t1` (errno: 28 "No space left on device") After the build attempt, the failed build directory is ~3.4 GB, and I still have ~7.4 GB. That seems to imply that I had over 10 GB free before starting the build, which sounds about right. I don't have a separate /tmp partition. I will make another build attempt, and this time I will watch the disk utilization over time while the test suite is in progress. I should mention that I'm using Btrfs. Thanks, Mark ^ permalink raw reply [flat|nested] 21+ messages in thread
* bug#35521: Mariadb test suite failures on x86_64-linux 2019-07-14 2:35 ` Mark H Weaver @ 2019-07-14 11:10 ` Arne Babenhauserheide 2019-07-14 15:58 ` Gábor Boskovits 0 siblings, 1 reply; 21+ messages in thread From: Arne Babenhauserheide @ 2019-07-14 11:10 UTC (permalink / raw) To: 35521; +Cc: platoxia [-- Attachment #1: Type: text/plain, Size: 879 bytes --] Hi Mark, Mark H Weaver <mhw@netris.org> writes: > My log file contains the same error in the 'tokudb_bugs.mdev4533' test: > > mysqltest: At line 6: query 'CREATE TABLE t1 (a INT(11), b CHAR(8)) ENGINE=TokuDB' failed: 1005: Can't create table `test`.`t1` (errno: 28 "No space left on device") > > After the build attempt, the failed build directory is ~3.4 GB, and I > still have ~7.4 GB. That seems to imply that I had over 10 GB free > before starting the build, which sounds about right. I don't have a > separate /tmp partition. … > I should mention that I'm using Btrfs. I use ext4, but I saw "No space left on device" errors when running guix lint. Since I had 700 GiB free, that does not sound like a real lack of disk space, but rather like something else is wrong. Best wishes, Arne -- To be apolitical means to be political without noticing it [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 1076 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
* bug#35521: Mariadb test suite failures on x86_64-linux 2019-07-14 11:10 ` Arne Babenhauserheide @ 2019-07-14 15:58 ` Gábor Boskovits 0 siblings, 0 replies; 21+ messages in thread From: Gábor Boskovits @ 2019-07-14 15:58 UTC (permalink / raw) To: Arne Babenhauserheide; +Cc: platoxia, 35521 [-- Attachment #1: Type: text/plain, Size: 1222 bytes --] Hello, Arne Babenhauserheide <arne_bab@web.de> wrote (on Sun, 14 Jul 2019, 13:11): > Hi Mark, > > Mark H Weaver <mhw@netris.org> writes: > > My log file contains the same error in the 'tokudb_bugs.mdev4533' test: > > > > mysqltest: At line 6: query 'CREATE TABLE t1 (a INT(11), b CHAR(8)) ENGINE=TokuDB' failed: 1005: Can't create table `test`.`t1` (errno: 28 "No space left on device") > Could you check with df -i whether the file system is running out of inodes? That is another situation in which the "No space left on device" error is reported. > > > After the build attempt, the failed build directory is ~3.4 GB, and I > > still have ~7.4 GB. That seems to imply that I had over 10 GB free > > before starting the build, which sounds about right. I don't have a > > separate /tmp partition. > … > > I should mention that I'm using Btrfs. > > I use ext4, but I saw "No space left on device" errors when running guix > lint. Since I had 700GiB free, that does not sound like real missing > disk space, but rather that something else is wrong. > > Best wishes, > Arne > -- > To be apolitical > means to be political > without noticing it Best regards, g_bor [-- Attachment #2: Type: text/html, Size: 2145 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
* bug#35521: Mariadb test suite failures on x86_64-linux 2019-07-13 22:42 ` Marius Bakke 2019-07-14 2:35 ` Mark H Weaver @ 2019-07-14 4:42 ` Mark H Weaver 2019-07-14 17:17 ` Marius Bakke 1 sibling, 1 reply; 21+ messages in thread From: Mark H Weaver @ 2019-07-14 4:42 UTC (permalink / raw) To: Marius Bakke; +Cc: Platoxia, 35521 Hello again, > Could it be that you don't have enough disk space for this test? Do you > have the log file available still? I made another build attempt on my X200, this time logging the output of "df --si" every 10 seconds. The free space started at ~11 GB free and never went below 7 GB, but the 'tokudb_bugs.mdev4533' test failed as before: "No space left on device" while trying to create the 'test' table. Mark ^ permalink raw reply [flat|nested] 21+ messages in thread
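The measurement described above can be repeated with inode usage recorded next to block usage, since errno 28 ("No space left on device") can also be reported when a file system runs out of inodes. What follows is only a minimal sketch, assuming GNU df and a POSIX shell; the function name fs_usage and the log format are made up for illustration and are not taken from this thread. On file systems that do not report a fixed inode count (Btrfs is typically one), df prints "-" in the inode column.

```shell
#!/bin/sh
# Hypothetical helper: print one timestamped line with the block and
# inode usage of the file system that holds DIR (default: current
# directory).  Assumes GNU df; "-P" keeps each entry on a single line.
fs_usage() {
    dir=${1:-.}
    # Column 5 of "df -P" is the percentage of blocks in use.
    blocks=$(df -P "$dir" | awk 'NR == 2 { print $5 }')
    # Column 5 of "df -Pi" is the percentage of inodes in use, or "-"
    # when the file system reports no inode total.
    inodes=$(df -Pi "$dir" | awk 'NR == 2 { print $5 }')
    printf '%s blocks=%s inodes=%s\n' \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$blocks" "$inodes"
}
```

Run as `while sleep 10; do fs_usage /tmp; done >> usage.log`, this would give a record comparable to the "df --si" log, with the inode figure added. The /tmp argument is an assumption about where the build directory lives.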
* bug#35521: Mariadb test suite failures on x86_64-linux 2019-07-14 4:42 ` Mark H Weaver @ 2019-07-14 17:17 ` Marius Bakke 2019-07-14 18:34 ` Mark H Weaver 0 siblings, 1 reply; 21+ messages in thread From: Marius Bakke @ 2019-07-14 17:17 UTC (permalink / raw) To: Mark H Weaver; +Cc: Platoxia, 35521

[-- Attachment #1.1: Type: text/plain, Size: 859 bytes --]

Mark H Weaver <mhw@netris.org> writes:

> Hello again,
>
>> Could it be that you don't have enough disk space for this test? Do you
>> have the log file available still?
>
> I made another build attempt on my X200, this time logging the output of
> "df --si" every 10 seconds. The free space started at ~11 GB free and
> never went below 7 GB, but the 'tokudb_bugs.mdev4533' test failed as
> before: "No space left on device" while trying to create the 'test'
> table.

Thanks for testing. Out of curiosity I tried to enable TokuDB on my server:

MariaDB [(none)]> INSTALL PLUGIN tokudb SONAME 'ha_tokudb';
ERROR 2006 (HY000): MySQL server has gone away

Ouch. Unfortunately the Guix service does not seem to enable any kind of logging, so I haven't dug further. Loading other plugins seems to work though.

I am currently trying this patch on Berlin:

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.2: tokudb.diff --]
[-- Type: text/x-patch, Size: 1381 bytes --]

diff --git a/gnu/packages/databases.scm b/gnu/packages/databases.scm
index 6bfeaad9a2..c17031bb2c 100644
--- a/gnu/packages/databases.scm
+++ b/gnu/packages/databases.scm
@@ -659,6 +659,10 @@ Language.")
          ;; For now, disable the features that that use libarchive (xtrabackup).
          "-DWITH_LIBARCHIVE=OFF"
 
+         ;; FIXME: Disable the TokuDB engine, because its test suite frequently
+         ;; fails, and loading it crashes the server: <https://bugs.gnu.org/35521>.
+         "-DTOKUDB_OK=OFF"
+
          ;; Ensure the system libraries are used.
          "-DWITH_JEMALLOC=yes"
          "-DWITH_PCRE=system"
@@ -706,9 +710,6 @@ Language.")
                            ;; 2030-12-31. See <https://bugs.gnu.org/34351> for details.
                            "main.mysqldump"
 
-                           ;; XXX: Fails sporadically.
-                           "innodb_fts.crash_recovery"
-
                            ;; FIXME: This test fails on i686:
                            ;; -myisampack: Can't create/write to file (Errcode: 17 "File exists")
                            ;; +myisampack: Can't create/write to file (Errcode: 17 "File exists)
@@ -786,7 +787,6 @@ Language.")
        ("libxml2" ,libxml2)
        ("ncurses" ,ncurses)
        ("pcre" ,pcre)
-       ("snappy" ,snappy)
        ("xz" ,xz)
        ("zlib" ,zlib)))
     (propagated-inputs

[-- Attachment #1.3: Type: text/plain, Size: 166 bytes --]

WDYT?

There has been some activity around TokuDB in later versions of MariaDB, maybe we can enable it again with 10.4. For now, I think we should just disable it.

[-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 487 bytes --] ^ permalink raw reply related [flat|nested] 21+ messages in thread
* bug#35521: Mariadb test suite failures on x86_64-linux 2019-07-14 17:17 ` Marius Bakke @ 2019-07-14 18:34 ` Mark H Weaver 2019-07-14 21:15 ` Marius Bakke 0 siblings, 1 reply; 21+ messages in thread From: Mark H Weaver @ 2019-07-14 18:34 UTC (permalink / raw) To: Marius Bakke; +Cc: Platoxia, 35521 Hi Marius, Marius wrote: > There has been some activity around TokuDB in later versions of MariaDB, > maybe we can enable it again with 10.4. For now, I think we should just > disable it. Disabling TokuDB for now sounds like a fine option. Thanks very much for looking into it. Mark ^ permalink raw reply [flat|nested] 21+ messages in thread
* bug#35521: Mariadb test suite failures on x86_64-linux 2019-07-14 18:34 ` Mark H Weaver @ 2019-07-14 21:15 ` Marius Bakke 0 siblings, 0 replies; 21+ messages in thread From: Marius Bakke @ 2019-07-14 21:15 UTC (permalink / raw) To: Mark H Weaver; +Cc: Platoxia, 35521-done [-- Attachment #1: Type: text/plain, Size: 593 bytes --] Mark H Weaver <mhw@netris.org> writes: > Hi Marius, > > Marius wrote: >> There has been some activity around TokuDB in later versions of MariaDB, >> maybe we can enable it again with 10.4. For now, I think we should just >> disable it. > > Disabling TokuDB for now sounds like a fine option. > Thanks very much for looking into it. Done in bba7a77ed9ad826bcdc6d9b8a183d66a23229501. Thanks for reporting the issue. Thinking forward, to trim the mariadb package, maybe it's possible to build all plugins as separate derivations, and let the user choose a union when setting up the service. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 487 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
* bug#35521: Mariadb test suite failures on x86_64-linux 2019-07-12 14:58 ` Marius Bakke 2019-07-13 17:29 ` Mark H Weaver @ 2019-07-13 18:38 ` Mark H Weaver 1 sibling, 0 replies; 21+ messages in thread From: Mark H Weaver @ 2019-07-13 18:38 UTC (permalink / raw) To: Marius Bakke; +Cc: Platoxia, 35521 Earlier, I wrote: >> Mark, Chris: Can you try this change with MariaDB 10.1.40 and see if it >> works for you? > > I tried it, but it made no difference on my Thinkpad X200, which still > fails the same way as before with 10.1.38: > > Failing test(s): tokudb_bugs.mdev4533 I should clarify that I tested 10.1.40 this time, and it failed in the same way that 10.1.38 failed for me before. Mark ^ permalink raw reply [flat|nested] 21+ messages in thread
* bug#35521: /gnu/store/c46sn2yfllcfi86p8227wvvr1bxssgxj-mariadb-10.1.38.drv - Failing test(s): tokudb_alter_table.hcad_all_add 2019-05-01 9:18 bug#35521: Mariadb test suite failures on x86_64-linux Mark H Weaver 2019-05-01 9:47 ` Mark H Weaver @ 2019-05-10 1:33 ` Platoxia 1 sibling, 0 replies; 21+ messages in thread From: Platoxia @ 2019-05-10 1:33 UTC (permalink / raw) To: 35521@debbugs.gnu.org

This problem persists and is preventing successful completion of guix system reconfigure for pre-1.0.0 systems (at least mine, which is still at kernel 4.20), not only for those using mariadb but also for anyone using any of the 544 packages that depend on it (as per the command guix graph --type=reverse-package mariadb | grep -c label).

This could, potentially, be fixed by simply adding this test to the list of disabled tests in the package definition:

--- snip ---
         (add-after 'unpack 'adjust-tests
           (lambda _
             (let ((disabled-tests
                    '(;; These fail because root@hostname == root@localhost in
                      ;; the build environment, causing a user count mismatch.
                      ;; See <https://jira.mariadb.org/browse/MDEV-7761>.
                      "main.join_cache"
                      "main.explain_non_select"
                      "main.stat_tables_innodb"
                      "roles.acl_statistics"

                      ;; This file contains a time bomb which makes it fail after
                      ;; 2030-12-31. See <https://bugs.gnu.org/34351> for details.
                      "main.mysqldump"

                      ;; XXX: Fails sporadically.
                      "innodb_fts.crash_recovery"

                      ;; FIXME: This test fails on i686:
                      ;; -myisampack: Can't create/write to file (Errcode: 17 "File exists")
                      ;; +myisampack: Can't create/write to file (Errcode: 17 "File exists)
                      ;; When running "myisampack --join=foo/t3 foo/t1 foo/t2"
                      ;; (all three tables must exist and be identical)
                      ;; in a loop it produces the same error around 1/240 times.
                      ;; montywi on #maria suggested removing the real_end check in
                      ;; "strings/my_vsnprintf.c" on line 503, yet it still does not
                      ;; reach the ending quote occasionally. Disable it for now.
                      "main.myisampack"
                      ;; FIXME: This test fails on armhf-linux:
                      "mroonga/storage.index_read_multiple_double"))

                   ;; This file contains a list of known-flaky tests for this
                   ;; release. Append our own items.
                   (unstable-tests (open-file "mysql-test/unstable-tests" "a")))
               (for-each (lambda (test)
                           (format unstable-tests "~a : ~a\n"
                                   test "Disabled in Guix"))
                         disabled-tests)
               (close-port unstable-tests)
--- snip ---

I say "potentially" because after getting this failure I happened to notice that approximately one and a half minutes after beginning the build of /gnu/store/c46sn2yfllcfi86p8227wvvr1bxssgxj-mariadb-10.1.38.drv the kernel throws this message: "traps: cmTC_35af5[27766] trap invalid opcode ip:555555555174 sp:7fffffffcc90 error:0 in cmTC_35af5[555555555000+1000]".

I have retested this several times and confirmed that this occurs each and every time mariadb-10.1.38.drv tries to build, and in approximately the same amount of time after starting the build. I say approximately because the closest I could get to a timeframe on this kernel message in relation to the mariadb build is by sending the stdout from guix system reconfigure through logger so that it gets printed with a timestamp to the kernel messages terminal (alt-F12).

Specifically, the message sequence is always as follows, without deviation (other than the cmTC_#), with no related messages in between, as per the command cat /dev/vcs12:

--- snip ---
May 9 16:36:35 localhost root cmd: guix system reconfigure: building /gnu/store/c46sn2yfllcfi86p8227wvvr1bxssgxj-mariadb-10.1.38.drv...
May 9 16:38:08 localhost vmunix: [ 9169.050496] traps: cmTC_35af5[27766] trap invalid opcode ip:555555555174 sp:7fffffffcc90 error:0 in cmTC_35af5[555555555000+1000]
--- snip ---

I really suggest trying to simply add the tokudb_alter_table.hcad_all_add test to the list of disabled tests in the package definition before trying to solve the overall problem, though. Maybe we can get this in for 1.0.1?

I would be willing to do this myself and report the results here but I'm baffled at how to achieve this simple task. Perhaps someone could walk me through it?

^ permalink raw reply [flat|nested] 21+ messages in thread
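For readers unfamiliar with the quoted phase: it appends one "name : reason" line per disabled test to mysql-test/unstable-tests, and mtr is run with --skip-test-list=unstable-tests. The same effect can be sketched in shell. This is only an illustration under assumptions: disable_test is a hypothetical helper, not part of Guix or MariaDB, and in Guix the change itself would be made by adding the test name to disabled-tests in the package definition.

```shell
#!/bin/sh
# Hypothetical helper: append TEST to the skip list FILE using the
# "name : reason" form that the quoted Scheme phase writes.
disable_test() {
    file=$1
    test_name=$2
    printf '%s : %s\n' "$test_name" "Disabled in Guix" >> "$file"
}
```

Called as `disable_test mysql-test/unstable-tests tokudb_alter_table.hcad_all_add` from an unpacked source tree, it would add the failing test to the list that mtr is told to skip.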
Thread overview: 21+ messages 2019-05-01 9:18 bug#35521: Mariadb test suite failures on x86_64-linux Mark H Weaver 2019-05-01 9:47 ` Mark H Weaver 2019-07-10 6:18 ` Chris Marusich 2019-07-10 17:30 ` Marius Bakke 2019-07-10 21:32 ` Mark H Weaver 2019-07-11 20:18 ` Ludovic Courtès 2019-07-12 8:02 ` Giovanni Biscuolo 2019-07-11 22:01 ` Marius Bakke 2019-07-12 8:24 ` Giovanni Biscuolo 2019-07-12 14:58 ` Marius Bakke 2019-07-13 17:29 ` Mark H Weaver 2019-07-13 22:42 ` Marius Bakke 2019-07-14 2:35 ` Mark H Weaver 2019-07-14 11:10 ` Arne Babenhauserheide 2019-07-14 15:58 ` Gábor Boskovits 2019-07-14 4:42 ` Mark H Weaver 2019-07-14 17:17 ` Marius Bakke 2019-07-14 18:34 ` Mark H Weaver 2019-07-14 21:15 ` Marius Bakke 2019-07-13 18:38 ` Mark H Weaver 2019-05-10 1:33 ` bug#35521: /gnu/store/c46sn2yfllcfi86p8227wvvr1bxssgxj-mariadb-10.1.38.drv - Failing test(s): tokudb_alter_table.hcad_all_add Platoxia