* Test suite timing issues? @ 2022-02-11 13:55 Michael J Gruber 2022-02-11 14:04 ` Michael J Gruber 2022-02-12 0:03 ` David Bremner 0 siblings, 2 replies; 25+ messages in thread From: Michael J Gruber @ 2022-02-11 13:55 UTC (permalink / raw) To: Notmuch mailing list Hi there, I'm trying to package notmuch for Redhat's enterprise linux and clones (EPEL, extra packages for enterprise linux). This looks mostly fine, including the tests, except for intermittent failures on epel-8-s390x. They look like the below, or in tests following those, and apparantly all have to do with "db not synced yet" or such, so that a message one subtest creates only shows up in the db for the next subtest. I've also had completely fine test runs on epel-8-s390x, but I'm starting to wonder whether I've just been lucky so far on other platforms ... Could it be possible that generate_message(), even though adding the message, still returns false and therefore add_message() does not call "notmuch new"? One might want to drop the "&&" there in test-lib-common, I dunno. This shouldn't be "intermittent". Michael T210-raw: Testing notmuch show --format=raw FAIL Attempt to show multiple raw messages --- T210-raw.1.expected 2022-02-11 13:22:28.011556841 +0000 +++ T210-raw.1.output 2022-02-11 13:22:28.011556841 +0000 @@ -1 +1,7 @@ -Error: search term did not match precisely one message (matched 2 messages). +From: Notmuch Test Suite <test_suite@notmuchmail.org> +To: Notmuch Test Suite <test_suite@notmuchmail.org> +Message-Id: <msg-001@notmuch-test-suite> +Subject: Test message #1 +Date: Fri, 05 Jan 2001 15:43:56 +0000 + +This is just a test message (#1) PASS Show a raw message FAIL Show another raw message --- T210-raw.3.expected 2022-02-11 13:22:28.031556841 +0000 +++ T210-raw.3.output 2022-02-11 13:22:28.031556841 +0000 @@ -1,7 +1 @@ -From: Notmuch Test Suite <test_suite@notmuchmail.org> -To: Notmuch Test Suite <test_suite@notmuchmail.org> -Message-Id: <msg-002@notmuch-test-suite> -Subject: Test message #2 -Date: GENERATED_DATE -This is just a test message (#2) Error: search term did not match precisely one message (matched 0 messages). ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Test suite timing issues? 2022-02-11 13:55 Test suite timing issues? Michael J Gruber @ 2022-02-11 14:04 ` Michael J Gruber 2022-02-12 0:03 ` David Bremner 1 sibling, 0 replies; 25+ messages in thread From: Michael J Gruber @ 2022-02-11 14:04 UTC (permalink / raw) To: Notmuch mailing list Michael J Gruber venit, vidit, dixit 2022-02-11 14:55:32: > Hi there, > > I'm trying to package notmuch for Redhat's enterprise linux and clones > (EPEL, extra packages for enterprise linux). > > This looks mostly fine, including the tests, except for intermittent > failures on epel-8-s390x. They look like the below, or in tests > following those, and apparantly all have to do with "db not synced yet" > or such, so that a message one subtest creates only shows up in the db > for the next subtest. > > I've also had completely fine test runs on epel-8-s390x, but I'm > starting to wonder whether I've just been lucky so far on other > platforms ... Could it be possible that generate_message(), even though > adding the message, still returns false and therefore add_message() does > not call "notmuch new"? One might want to drop the "&&" there in > test-lib-common, I dunno. This shouldn't be "intermittent". > > Michael > > T210-raw: Testing notmuch show --format=raw > FAIL Attempt to show multiple raw messages > --- T210-raw.1.expected 2022-02-11 13:22:28.011556841 +0000 > +++ T210-raw.1.output 2022-02-11 13:22:28.011556841 +0000 > @@ -1 +1,7 @@ > -Error: search term did not match precisely one message (matched 2 messages). > +From: Notmuch Test Suite <test_suite@notmuchmail.org> > +To: Notmuch Test Suite <test_suite@notmuchmail.org> > +Message-Id: <msg-001@notmuch-test-suite> > +Subject: Test message #1 > +Date: Fri, 05 Jan 2001 15:43:56 +0000 > + > +This is just a test message (#1) > PASS Show a raw message > FAIL Show another raw message > --- T210-raw.3.expected 2022-02-11 13:22:28.031556841 +0000 > +++ T210-raw.3.output 2022-02-11 13:22:28.031556841 +0000 > @@ -1,7 +1 @@ > -From: Notmuch Test Suite <test_suite@notmuchmail.org> > -To: Notmuch Test Suite <test_suite@notmuchmail.org> > -Message-Id: <msg-002@notmuch-test-suite> > -Subject: Test message #2 > -Date: GENERATED_DATE > > -This is just a test message (#2) > Error: search term did not match precisely one message (matched 0 messages). Similar apparent timing issue on epel-8-aarch64: FAIL Adding message with long ID --- T290-long-id.2.expected 2022-02-11 13:55:17.028415326 +0000 +++ T290-long-id.2.output 2022-02-11 13:55:17.028415326 +0000 @@ -1 +1 @@ -Added 1 new message to the database. +No new mail. FAIL Referencing long ID after adding --- T290-long-id.3.expected 2022-02-11 13:55:17.078415753 +0000 +++ T290-long-id.3.output 2022-02-11 13:55:17.078415753 +0000 @@ -1 +1 @@ -Added 1 new message to the database. +Added 2 new messages to the database. Note how that message from the 1st subtest shows up only in the second. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Test suite timing issues? 2022-02-11 13:55 Test suite timing issues? Michael J Gruber 2022-02-11 14:04 ` Michael J Gruber @ 2022-02-12 0:03 ` David Bremner 2022-02-12 13:32 ` Michael J Gruber 1 sibling, 1 reply; 25+ messages in thread From: David Bremner @ 2022-02-12 0:03 UTC (permalink / raw) To: Michael J Gruber; +Cc: notmuch Michael J Gruber <git@grubix.eu> writes: > Hi there, > > I'm trying to package notmuch for Redhat's enterprise linux and clones > (EPEL, extra packages for enterprise linux). > > This looks mostly fine, including the tests, except for intermittent > failures on epel-8-s390x. They look like the below, or in tests > following those, and apparantly all have to do with "db not synced yet" > or such, so that a message one subtest creates only shows up in the db > for the next subtest. > > I've also had completely fine test runs on epel-8-s390x, but I'm > starting to wonder whether I've just been lucky so far on other > platforms ... Could it be possible that generate_message(), even though > adding the message, still returns false and therefore add_message() does > not call "notmuch new"? One might want to drop the "&&" there in > test-lib-common, I dunno. This shouldn't be "intermittent". It's hard to see what can go wrong (maybe perl doesn't work?), but failing to generate a message should be a fatal error. Maybe try something like diff --git a/test/test-lib-common.sh b/test/test-lib-common.sh index ebbf4cdf..076777ca 100644 --- a/test/test-lib-common.sh +++ b/test/test-lib-common.sh @@ -225,7 +225,7 @@ EOF # All of the arguments and return values supported by generate_message # are also supported here, so see that function for details. add_message () { - generate_message "$@" && + generate_message "$@" || error "failed to generate message" notmuch new > /dev/null } to see if you get more information ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: Test suite timing issues? 2022-02-12 0:03 ` David Bremner @ 2022-02-12 13:32 ` Michael J Gruber 2022-02-12 14:30 ` David Bremner 2022-02-12 15:06 ` Tomi Ollila 0 siblings, 2 replies; 25+ messages in thread From: Michael J Gruber @ 2022-02-12 13:32 UTC (permalink / raw) To: David Bremner; +Cc: notmuch David Bremner venit, vidit, dixit 2022-02-12 01:03:00: > Michael J Gruber <git@grubix.eu> writes: > > > Hi there, > > > > I'm trying to package notmuch for Redhat's enterprise linux and clones > > (EPEL, extra packages for enterprise linux). > > > > This looks mostly fine, including the tests, except for intermittent > > failures on epel-8-s390x. They look like the below, or in tests > > following those, and apparantly all have to do with "db not synced yet" > > or such, so that a message one subtest creates only shows up in the db > > for the next subtest. > > > > I've also had completely fine test runs on epel-8-s390x, but I'm > > starting to wonder whether I've just been lucky so far on other > > platforms ... Could it be possible that generate_message(), even though > > adding the message, still returns false and therefore add_message() does > > not call "notmuch new"? One might want to drop the "&&" there in > > test-lib-common, I dunno. This shouldn't be "intermittent". > > It's hard to see what can go wrong (maybe perl doesn't work?), but > failing to generate a message should be a fatal error. Maybe try > something like > Well, as you can see in both reports, the pattern is as follows: The first subttest sees 1 less message than expected. The second sees 1 more message than expected. So in summary, they are generated but "notmuch new" is not run or does not pick them up. Though it's a Heisenbug on a specific arch, I'll try your suggestion and hammer our infra with it ... > diff --git a/test/test-lib-common.sh b/test/test-lib-common.sh > index ebbf4cdf..076777ca 100644 > --- a/test/test-lib-common.sh > +++ b/test/test-lib-common.sh > @@ -225,7 +225,7 @@ EOF > # All of the arguments and return values supported by generate_message > # are also supported here, so see that function for details. > add_message () { > - generate_message "$@" && > + generate_message "$@" || error "failed to generate message" > notmuch new > /dev/null > } > > > to see if you get more information ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Test suite timing issues? 2022-02-12 13:32 ` Michael J Gruber @ 2022-02-12 14:30 ` David Bremner 2022-02-12 15:06 ` Tomi Ollila 1 sibling, 0 replies; 25+ messages in thread From: David Bremner @ 2022-02-12 14:30 UTC (permalink / raw) To: Michael J Gruber; +Cc: notmuch Michael J Gruber <git@grubix.eu> writes: >> >> It's hard to see what can go wrong (maybe perl doesn't work?), but >> failing to generate a message should be a fatal error. Maybe try >> something like I was unclear. I meant "should" as in "in a perfect world", not as a prediction about the current code. So we should probably try to add some error checking either where the patch put it, or in generate_message itself. >> > > Well, as you can see in both reports, the pattern is as follows: > > The first subttest sees 1 less message than expected. > The second sees 1 more message than expected. > > So in summary, they are generated but "notmuch new" is not run or does > not pick them up. Though it's a Heisenbug on a specific arch, I'll try > your suggestion and hammer our infra with it ... > Yeah, debugging by batch submission is no fun, no matter what our grandparents tell us. I do have interactive access to s390x systems running Debian and Ubuntu, but the problems you report do not occur in Debian (probably not in Ubuntu either, or I would have heard about it, but I didn't check if e.g. Ubuntu disables the tests). There is not (supposed to be) anything asynchronous about the process of generating a message and indexing it. You could also try adding --full-scan to the relevant "notmuch new" invocation. This will disable the heuristics based on directory mtimes that notmuch uses to speed up indexing. d ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Test suite timing issues? 2022-02-12 13:32 ` Michael J Gruber 2022-02-12 14:30 ` David Bremner @ 2022-02-12 15:06 ` Tomi Ollila 2022-02-12 15:29 ` David Bremner 1 sibling, 1 reply; 25+ messages in thread From: Tomi Ollila @ 2022-02-12 15:06 UTC (permalink / raw) To: Michael J Gruber, David Bremner; +Cc: notmuch On Sat, Feb 12 2022, Michael J. Gruber wrote: > David Bremner venit, vidit, dixit 2022-02-12 01:03:00: >> Michael J Gruber <git@grubix.eu> writes: >> >> > Hi there, >> > >> > I'm trying to package notmuch for Redhat's enterprise linux and clones >> > (EPEL, extra packages for enterprise linux). >> > >> > This looks mostly fine, including the tests, except for intermittent >> > failures on epel-8-s390x. They look like the below, or in tests >> > following those, and apparantly all have to do with "db not synced yet" >> > or such, so that a message one subtest creates only shows up in the db >> > for the next subtest. >> > >> > I've also had completely fine test runs on epel-8-s390x, but I'm >> > starting to wonder whether I've just been lucky so far on other >> > platforms ... Could it be possible that generate_message(), even though >> > adding the message, still returns false and therefore add_message() does >> > not call "notmuch new"? One might want to drop the "&&" there in >> > test-lib-common, I dunno. This shouldn't be "intermittent". >> >> It's hard to see what can go wrong (maybe perl doesn't work?), but >> failing to generate a message should be a fatal error. Maybe try >> something like >> > > Well, as you can see in both reports, the pattern is as follows: > > The first subttest sees 1 less message than expected. > The second sees 1 more message than expected. Only thing that came into mind are directory timestamps... if directory (m)time is same as before notmuch will not scan it for files... ... following that if the granularity of directory timestamp were 1 second, then it could easily happen than first one new message is not seen, and next time there is one extra message to be see... Just wild guesses... Tomi > > So in summary, they are generated but "notmuch new" is not run or does > not pick them up. Though it's a Heisenbug on a specific arch, I'll try > your suggestion and hammer our infra with it ... ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Test suite timing issues? 2022-02-12 15:06 ` Tomi Ollila @ 2022-02-12 15:29 ` David Bremner 2022-02-12 19:40 ` David Bremner 2022-02-12 20:45 ` Tomi Ollila 0 siblings, 2 replies; 25+ messages in thread From: David Bremner @ 2022-02-12 15:29 UTC (permalink / raw) To: Tomi Ollila, Michael J Gruber; +Cc: notmuch Tomi Ollila <tomi.ollila@iki.fi> writes: > On Sat, Feb 12 2022, Michael J. Gruber wrote: > > Only thing that came into mind are directory timestamps... if directory > (m)time is same as before notmuch will not scan it for files... > > ... following that if the granularity of directory timestamp were 1 second, > then it could easily happen than first one new message is not seen, and > next time there is one extra message to be see... What do you think about adding --full-scan to the notmuch-new invocation in add_message? It doesn't make any tests fail and is about the same speed. I need to do a few more trials, but first time through it was actually faster (!), maybe because the cache is hot ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Test suite timing issues? 2022-02-12 15:29 ` David Bremner @ 2022-02-12 19:40 ` David Bremner 2022-02-12 20:45 ` Tomi Ollila 1 sibling, 0 replies; 25+ messages in thread From: David Bremner @ 2022-02-12 19:40 UTC (permalink / raw) To: Tomi Ollila, Michael J Gruber; +Cc: notmuch David Bremner <david@tethera.net> writes: > Tomi Ollila <tomi.ollila@iki.fi> writes: > >> On Sat, Feb 12 2022, Michael J. Gruber wrote: >> >> Only thing that came into mind are directory timestamps... if directory >> (m)time is same as before notmuch will not scan it for files... >> >> ... following that if the granularity of directory timestamp were 1 second, >> then it could easily happen than first one new message is not seen, and >> next time there is one extra message to be see... > > What do you think about adding --full-scan to the notmuch-new invocation > in add_message? It doesn't make any tests fail and is about the same > speed. I need to do a few more trials, but first time through it was > actually faster (!), maybe because the cache is hot OK, I did 10 repetitions, and for the sequential version of the test suit it is about 2% slower in elapsed time. I would expect it to have more impact on slower disks (this SSD), but still not too bad of a slowdown. d ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Test suite timing issues? 2022-02-12 15:29 ` David Bremner 2022-02-12 19:40 ` David Bremner @ 2022-02-12 20:45 ` Tomi Ollila 2022-02-12 20:54 ` Michael J Gruber 2022-02-12 21:09 ` David Bremner 1 sibling, 2 replies; 25+ messages in thread From: Tomi Ollila @ 2022-02-12 20:45 UTC (permalink / raw) To: David Bremner, Michael J Gruber; +Cc: notmuch On Sat, Feb 12 2022, David Bremner wrote: > Tomi Ollila <tomi.ollila@iki.fi> writes: > >> On Sat, Feb 12 2022, Michael J. Gruber wrote: >> >> Only thing that came into mind are directory timestamps... if directory >> (m)time is same as before notmuch will not scan it for files... >> >> ... following that if the granularity of directory timestamp were 1 second, >> then it could easily happen than first one new message is not seen, and >> next time there is one extra message to be see... > > What do you think about adding --full-scan to the notmuch-new invocation > in add_message? It doesn't make any tests fail and is about the same > speed. I need to do a few more trials, but first time through it was > actually faster (!), maybe because the cache is hot Does such a change hide "buggy" functionality ? Or do we consider notmuch new buggy if it does not notice all new messages arrived every time ? Tomi ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Test suite timing issues? 2022-02-12 20:45 ` Tomi Ollila @ 2022-02-12 20:54 ` Michael J Gruber 2022-02-12 21:09 ` David Bremner 1 sibling, 0 replies; 25+ messages in thread From: Michael J Gruber @ 2022-02-12 20:54 UTC (permalink / raw) To: Tomi Ollila; +Cc: notmuch [-- Attachment #1.1: Type: text/plain, Size: 1577 bytes --] Am Sa., 12. Feb. 2022 um 21:45 Uhr schrieb Tomi Ollila <tomi.ollila@iki.fi>: > On Sat, Feb 12 2022, David Bremner wrote: > > > Tomi Ollila <tomi.ollila@iki.fi> writes: > > > >> On Sat, Feb 12 2022, Michael J. Gruber wrote: > >> > >> Only thing that came into mind are directory timestamps... if directory > >> (m)time is same as before notmuch will not scan it for files... > >> > >> ... following that if the granularity of directory timestamp were 1 > second, > >> then it could easily happen than first one new message is not seen, and > >> next time there is one extra message to be see... > > > > What do you think about adding --full-scan to the notmuch-new invocation > > in add_message? It doesn't make any tests fail and is about the same > > speed. I need to do a few more trials, but first time through it was > > actually faster (!), maybe because the cache is hot > > Does such a change hide "buggy" functionality ? > > Or do we consider notmuch new buggy if it does not notice all new messages > arrived every time ? > > The timestamping sounds like a perfect explanation of what I've been seeing. Unfortunately, I can't reproduce the issue "reliably" (with a certain probability), and so if everything succeeds with --full-scan 10 times it still does not mean much. As I understand, notmuch new without --full-sync may have issues when the time resolution is too low (or operations too fast) and will pick a message on the next run, so it's not really buggy - it uses a shortcut that may be too quick but does not loose messages in the long run. Michael > > [-- Attachment #1.2: Type: text/html, Size: 2321 bytes --] [-- Attachment #2: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Test suite timing issues? 2022-02-12 20:45 ` Tomi Ollila 2022-02-12 20:54 ` Michael J Gruber @ 2022-02-12 21:09 ` David Bremner 2022-02-12 21:53 ` Michael J Gruber 2022-02-12 22:14 ` Tomi Ollila 1 sibling, 2 replies; 25+ messages in thread From: David Bremner @ 2022-02-12 21:09 UTC (permalink / raw) To: Tomi Ollila, Michael J Gruber; +Cc: notmuch Tomi Ollila <tomi.ollila@iki.fi> writes: > > Does such a change hide "buggy" functionality ? We mostly don't use add_message, call notmuch new via NOTMUCH_NEW in T050-new.sh. So I think it would mostly not hide bugs in notmuch new. OTOH, I'm surprised the same issues with timestamps don't show up there, if that is really the problem. > Or do we consider notmuch new buggy if it does not notice all new messages > arrived every time ? That's a harder question. Maybe? But I don't know how serious a bug it is for actual users (who often get mail delivered in various concurrent ways). ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Test suite timing issues? 2022-02-12 21:09 ` David Bremner @ 2022-02-12 21:53 ` Michael J Gruber 2022-02-12 22:14 ` Tomi Ollila 1 sibling, 0 replies; 25+ messages in thread From: Michael J Gruber @ 2022-02-12 21:53 UTC (permalink / raw) To: David Bremner; +Cc: notmuch [-- Attachment #1.1: Type: text/plain, Size: 652 bytes --] Am Sa., 12. Feb. 2022 um 22:10 Uhr schrieb David Bremner <david@tethera.net >: > Tomi Ollila <tomi.ollila@iki.fi> writes: > > > > > Does such a change hide "buggy" functionality ? > > We mostly don't use add_message, call notmuch new via NOTMUCH_NEW in > T050-new.sh. So I think it would mostly not hide bugs in notmuch > new. OTOH, I'm surprised the same issues with timestamps don't show up > there, if that is really the problem. > > And sure enough, I got a FAIL from NOTMUCH_NEW in T750-gzip just a few minutes ago. Trying with --full-scan there, too now. Alternatively, I'll just turn off tests again for package builds (like we had before)... [-- Attachment #1.2: Type: text/html, Size: 1129 bytes --] [-- Attachment #2: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Test suite timing issues? 2022-02-12 21:09 ` David Bremner 2022-02-12 21:53 ` Michael J Gruber @ 2022-02-12 22:14 ` Tomi Ollila 2022-02-13 8:29 ` Tomi Ollila 1 sibling, 1 reply; 25+ messages in thread From: Tomi Ollila @ 2022-02-12 22:14 UTC (permalink / raw) To: David Bremner, Michael J Gruber; +Cc: notmuch On Sat, Feb 12 2022, David Bremner wrote: > Tomi Ollila <tomi.ollila@iki.fi> writes: > >> >> Does such a change hide "buggy" functionality ? > > We mostly don't use add_message, call notmuch new via NOTMUCH_NEW in > T050-new.sh. So I think it would mostly not hide bugs in notmuch > new. OTOH, I'm surprised the same issues with timestamps don't show up > there, if that is really the problem. > >> Or do we consider notmuch new buggy if it does not notice all new messages >> arrived every time ? > > That's a harder question. Maybe? But I don't know how serious a bug it > is for actual users (who often get mail delivered in various concurrent > ways). The bug could be that the filesystem mtime resolution is 1 sec (hmm what is the resolution in database?), and 2 emails are delivered to one directory during one second -- and at the same time notmuch new is scanning emails and found only one of those. then no more emails are delivered to that one directory for a while -- so notmuch new does not find those. (now that I think of it that could happen to me :O -- just very improbable (or not, `stat .` outputs Modify: 2022-02-12 22:47:19.550838056 +0200) -- have to check that timestamp resolution in database). Tomi ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Test suite timing issues? 2022-02-12 22:14 ` Tomi Ollila @ 2022-02-13 8:29 ` Tomi Ollila 2022-02-14 14:01 ` David Bremner 0 siblings, 1 reply; 25+ messages in thread From: Tomi Ollila @ 2022-02-13 8:29 UTC (permalink / raw) To: David Bremner, Michael J Gruber; +Cc: notmuch On Sun, Feb 13 2022, Tomi Ollila wrote: > On Sat, Feb 12 2022, David Bremner wrote: > >> Tomi Ollila <tomi.ollila@iki.fi> writes: >> >>> >>> Does such a change hide "buggy" functionality ? >> >> We mostly don't use add_message, call notmuch new via NOTMUCH_NEW in >> T050-new.sh. So I think it would mostly not hide bugs in notmuch >> new. OTOH, I'm surprised the same issues with timestamps don't show up >> there, if that is really the problem. >> >>> Or do we consider notmuch new buggy if it does not notice all new messages >>> arrived every time ? >> >> That's a harder question. Maybe? But I don't know how serious a bug it >> is for actual users (who often get mail delivered in various concurrent >> ways). > > The bug could be that the filesystem mtime resolution is 1 sec (hmm what > is the resolution in database?), and 2 emails are delivered to one > directory during one second -- and at the same time notmuch new is scanning > emails and found only one of those. then no more emails are delivered to > that one directory for a while -- so notmuch new does not find those. > > (now that I think of it that could happen to me :O -- just very improbable > (or not, `stat .` outputs Modify: 2022-02-12 22:47:19.550838056 +0200) -- > have to check that timestamp resolution in database). Looked notmuch-new.c -- time_t (seconds since epoch) is used as timestamp comparisons (which would indicate the subsecond resolution most fs' provide is not used)... ... and if so, I wonder why some of our tests are not failing all the time for everyone...? Tomi > > Tomi ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Test suite timing issues? 2022-02-13 8:29 ` Tomi Ollila @ 2022-02-14 14:01 ` David Bremner 2022-02-14 14:48 ` Tomi Ollila 2022-02-14 22:53 ` Tomi Ollila 0 siblings, 2 replies; 25+ messages in thread From: David Bremner @ 2022-02-14 14:01 UTC (permalink / raw) To: Tomi Ollila, Michael J Gruber; +Cc: notmuch Tomi Ollila <tomi.ollila@iki.fi> writes: > > Looked notmuch-new.c -- time_t (seconds since epoch) is used as timestamp > comparisons (which would indicate the subsecond resolution most fs' provide > is not used)... > > ... and if so, I wonder why some of our tests are not failing all the time > for everyone...? Not claiming everything is fine, but there is code there targetted at the failure mode you mentioned: /* If the directory's mtime is the same as the wall-clock time * when we stat'ed the directory, we skip updating the mtime in * the database because a message could be delivered later in this * same second. This may lead to unnecessary re-scans, but it * avoids overlooking messages. */ if (fs_mtime != stat_time) _filename_list_add (state->directory_mtimes, path)->mtime = fs_mtime; BTW, I have so far run the test suite 68 times in a row without failures on a Debian s390x host. The file system is ext4, mounted relatime. It would be interesting to know what file system is yielding the failures Michael is seeing. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Test suite timing issues? 2022-02-14 14:01 ` David Bremner @ 2022-02-14 14:48 ` Tomi Ollila 2022-02-14 22:53 ` Tomi Ollila 1 sibling, 0 replies; 25+ messages in thread From: Tomi Ollila @ 2022-02-14 14:48 UTC (permalink / raw) To: David Bremner, Michael J Gruber; +Cc: notmuch On Mon, Feb 14 2022, David Bremner wrote: > Tomi Ollila <tomi.ollila@iki.fi> writes: > >> >> Looked notmuch-new.c -- time_t (seconds since epoch) is used as timestamp >> comparisons (which would indicate the subsecond resolution most fs' provide >> is not used)... >> >> ... and if so, I wonder why some of our tests are not failing all the time >> for everyone...? > > Not claiming everything is fine, but there is code there targetted at > the failure mode you mentioned: > > /* If the directory's mtime is the same as the wall-clock time > * when we stat'ed the directory, we skip updating the mtime in > * the database because a message could be delivered later in this > * same second. This may lead to unnecessary re-scans, but it > * avoids overlooking messages. */ > if (fs_mtime != stat_time) > _filename_list_add (state->directory_mtimes, path)->mtime = > fs_mtime; Right. Very much explains why we don't see test failures... > BTW, I have so far run the test suite 68 times in a row without failures > on a Debian s390x host. The file system is ext4, mounted relatime. It > would be interesting to know what file system is yielding the failures > Michael is seeing. indeed. Tomi ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Test suite timing issues? 2022-02-14 14:01 ` David Bremner 2022-02-14 14:48 ` Tomi Ollila @ 2022-02-14 22:53 ` Tomi Ollila 2022-02-15 8:51 ` Michael J Gruber 1 sibling, 1 reply; 25+ messages in thread From: Tomi Ollila @ 2022-02-14 22:53 UTC (permalink / raw) To: David Bremner, Michael J Gruber; +Cc: notmuch On Mon, Feb 14 2022, David Bremner wrote: > Tomi Ollila <tomi.ollila@iki.fi> writes: > >> >> Looked notmuch-new.c -- time_t (seconds since epoch) is used as timestamp >> comparisons (which would indicate the subsecond resolution most fs' provide >> is not used)... >> >> ... and if so, I wonder why some of our tests are not failing all the time >> for everyone...? > > Not claiming everything is fine, but there is code there targetted at > the failure mode you mentioned: > > /* If the directory's mtime is the same as the wall-clock time > * when we stat'ed the directory, we skip updating the mtime in > * the database because a message could be delivered later in this > * same second. This may lead to unnecessary re-scans, but it > * avoids overlooking messages. */ > if (fs_mtime != stat_time) > _filename_list_add (state->directory_mtimes, path)->mtime = fs_mtime; This sure had to be tested... :D so I outcommented the line as // if (fs_mtime != stat_time) and then build and run test -- a lot of failures... also ./test/T750-gzip.sh 2>&1 | grep -e PASS -e FAIL PASS Single new gzipped message PASS Single new gzipped message (full-scan) FAIL Multiple new messages, one gzipped FAIL Multiple new messages, one gzipped (full-scan) FAIL Renamed (gzipped) message PASS notmuch search with partially gzipped mail store FAIL notmuch search --output=files with partially gzipped mail store PASS show un-gzipped message PASS show un-gzipped message (format mbox) PASS show un-gzipped message (format raw) FAIL show gzipped message FAIL show gzipped message (mbox) FAIL show gzipped message (raw) PASS new doesn't run out of file descriptors with many gzipped files (above was "lucky" run, usually that 6th test, ...partially gzipped... test also FAILed (I'd guess second happened to change there)). then restored the fs_mtime != stat_time line -- then all of 750 passed. (finally, run that 750-gzip in a loop (dropped last, slow test), hundreds of times already -- no FAILures... (ecryptfs on ext4)) Tomi > BTW, I have so far run the test suite 68 times in a row without failures > on a Debian s390x host. The file system is ext4, mounted relatime. It > would be interesting to know what file system is yielding the failures > Michael is seeing. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Test suite timing issues? 2022-02-14 22:53 ` Tomi Ollila @ 2022-02-15 8:51 ` Michael J Gruber 2022-02-15 15:18 ` Michael J Gruber 0 siblings, 1 reply; 25+ messages in thread From: Michael J Gruber @ 2022-02-15 8:51 UTC (permalink / raw) To: David Bremner, Tomi Ollila; +Cc: notmuch Tomi Ollila venit, vidit, dixit 2022-02-14 23:53:54: > On Mon, Feb 14 2022, David Bremner wrote: > > > Tomi Ollila <tomi.ollila@iki.fi> writes: > > > >> > >> Looked notmuch-new.c -- time_t (seconds since epoch) is used as timestamp > >> comparisons (which would indicate the subsecond resolution most fs' provide > >> is not used)... > >> > >> ... and if so, I wonder why some of our tests are not failing all the time > >> for everyone...? > > > > Not claiming everything is fine, but there is code there targetted at > > the failure mode you mentioned: > > > > /* If the directory's mtime is the same as the wall-clock time > > * when we stat'ed the directory, we skip updating the mtime in > > * the database because a message could be delivered later in this > > * same second. This may lead to unnecessary re-scans, but it > > * avoids overlooking messages. */ > > if (fs_mtime != stat_time) > > _filename_list_add (state->directory_mtimes, path)->mtime = fs_mtime; > > This sure had to be tested... :D > > so I outcommented the line as // if (fs_mtime != stat_time) > > and then build and run test -- a lot of failures... > > also > > ./test/T750-gzip.sh 2>&1 | grep -e PASS -e FAIL > > PASS Single new gzipped message > PASS Single new gzipped message (full-scan) > FAIL Multiple new messages, one gzipped > FAIL Multiple new messages, one gzipped (full-scan) > FAIL Renamed (gzipped) message > PASS notmuch search with partially gzipped mail store > FAIL notmuch search --output=files with partially gzipped mail store > PASS show un-gzipped message > PASS show un-gzipped message (format mbox) > PASS show un-gzipped message (format raw) > FAIL show gzipped message > FAIL show gzipped message (mbox) > FAIL show gzipped message (raw) > PASS new doesn't run out of file descriptors with many gzipped files > > (above was "lucky" run, usually that 6th test, ...partially gzipped... > test also FAILed (I'd guess second happened to change there)). > > then restored the fs_mtime != stat_time line -- then all of 750 passed. > > (finally, run that 750-gzip in a loop (dropped last, slow test), hundreds > of times already -- no FAILures... (ecryptfs on ext4)) > > Tomi > > > BTW, I have so far run the test suite 68 times in a row without failures > > on a Debian s390x host. The file system is ext4, mounted relatime. It > > would be interesting to know what file system is yielding the failures > > Michael is seeing. Thanks for the detailed analysis. It convinces me that on notmuch's side everything is OK. This very much boils down to fs and stat issues. If I remember correctly, I've seen this isse once on a different release frpm epel-8 but only on epel-8 otherwise. This is on a build infrastructure where you may end up getting different hosts on each run, primed from the same "chroot". I'll try to find out more. Until now, we have been building notmuch release packages on Fedora without running the tests during the release build, because there used to be issues with the test suite (and before that that the separate test corpus download). With everything looking robust for a while, I'm switching tests on for Fedora release builds now, and at the same start packaging those extra packages for RedHat's enterprise platform. I want to avoid spurious release build failures, of course - otoh having those tests run is a good thing. So I'll try out a bit more, and if everything else fails, then running the test suite with --full-scan is still better than not running it at all. (I would have suggested a make variable, but it's really a corner case. given the stat/mtime check). Michael ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Test suite timing issues? 2022-02-15 8:51 ` Michael J Gruber @ 2022-02-15 15:18 ` Michael J Gruber 2022-02-16 9:52 ` [PATCH] test: allow to use --full-sync Michael J Gruber 0 siblings, 1 reply; 25+ messages in thread From: Michael J Gruber @ 2022-02-15 15:18 UTC (permalink / raw) To: David Bremner, Tomi Ollila; +Cc: notmuch > > > BTW, I have so far run the test suite 68 times in a row without failures > > > on a Debian s390x host. The file system is ext4, mounted relatime. It > > > would be interesting to know what file system is yielding the failures > > > Michael is seeing. > ... > So I'll try out a bit more, and if everything else fails, then running > the test suite with --full-scan is still better than not running it at > all. (I would have suggested a make variable, but it's really a corner > case. given the stat/mtime check). Last update for now: It's still happening sporadically only, and the archs where it does happen use xfs (relatime,seclabel,attr2) resp. tmpfs (nosuid,nodev,seclabel). It has to be something weird on our infra or specifically el8. So, either I implement `make test FULLSCAN=1` or patch in `--full-scan` or disable the tests on epel. Thanks for your patience :) Cheers, Michael ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH] test: allow to use --full-sync 2022-02-15 15:18 ` Michael J Gruber @ 2022-02-16 9:52 ` Michael J Gruber 2022-02-16 13:04 ` David Bremner 0 siblings, 1 reply; 25+ messages in thread From: Michael J Gruber @ 2022-02-16 9:52 UTC (permalink / raw) To: notmuch Some build infrastructure appears to habe problems with mtime/stat, leading to spurious failures (noticed on s390x and aarch64 with epel8). Allow the test suite to be run with --full-sync so that release builds can use the test suite while avoiding spurious failures. Signed-off-by: Michael J Gruber <git@grubix.eu> --- test/README | 8 ++++++++ test/test-lib-common.sh | 7 ++++++- test/test-lib.sh | 2 +- 3 files changed, 15 insertions(+), 2 deletions(-) diff --git a/test/README b/test/README index 10f127cb..1de79b78 100644 --- a/test/README +++ b/test/README @@ -110,6 +110,14 @@ printed on screen. This printing can be disabled by setting the NOTMUCH_TEST_QUIET variable to a non-null value. Message on test failures and skips are still printed. +File system/mtime issues +------------------------ +Normally, `notmuch new` uses directory modification times, and the test +suite checks that this works. If you encounter spurious test failures +where messages added by one subtest appear to show up in the next subtest +only (indicating mtime/stat problems) you can set NOTMUCH_TEST_FULLSYNC +so that the test suite uses the --full-sync option of notmuch new. + Skipping Tests -------------- If, for any reason, you need to skip one or more tests, you can do so diff --git a/test/test-lib-common.sh b/test/test-lib-common.sh index ebbf4cdf..91039478 100644 --- a/test/test-lib-common.sh +++ b/test/test-lib-common.sh @@ -29,6 +29,11 @@ if [[ -z "$NOTMUCH_SRCDIR" ]] || [[ -z "$NOTMUCH_BUILDDIR" ]]; then exit 1 fi +if test -n "$NOTMUCH_TEST_FULLSCAN" +then + NOTMUCH_NEW_OPTIONS="--full-scan" +fi + backup_database () { test_name=$(basename $0 .sh) rm -rf $TMP_DIRECTORY/notmuch-dir-backup."$test_name" @@ -226,7 +231,7 @@ EOF # are also supported here, so see that function for details. add_message () { generate_message "$@" && - notmuch new > /dev/null + notmuch new $NOTMUCH_NEW_OPTIONS > /dev/null } if test -n "$valgrind" diff --git a/test/test-lib.sh b/test/test-lib.sh index 833bf5fe..0167b95f 100644 --- a/test/test-lib.sh +++ b/test/test-lib.sh @@ -521,7 +521,7 @@ test_json_nodes () { } NOTMUCH_NEW () { - notmuch new "${@}" | grep -v -E -e '^Processed [0-9]*( total)? file|Found [0-9]* total file' + notmuch new $NOTMUCH_NEW_OPTIONS "${@}" | grep -v -E -e '^Processed [0-9]*( total)? file|Found [0-9]* total file' } NOTMUCH_DUMP_TAGS () { -- 2.35.1.476.gacce7e407c ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH] test: allow to use --full-sync 2022-02-16 9:52 ` [PATCH] test: allow to use --full-sync Michael J Gruber @ 2022-02-16 13:04 ` David Bremner 2022-02-16 15:53 ` Michael J Gruber 0 siblings, 1 reply; 25+ messages in thread From: David Bremner @ 2022-02-16 13:04 UTC (permalink / raw) To: Michael J Gruber, notmuch Michael J Gruber <git@grubix.eu> writes: > Some build infrastructure appears to habe problems with mtime/stat, have > leading to spurious failures (noticed on s390x and aarch64 with epel8). > Allow the test suite to be run with --full-sync so that release builds > can use the test suite while avoiding spurious failures. > > Signed-off-by: Michael J Gruber <git@grubix.eu> > --- > test/README | 8 ++++++++ > test/test-lib-common.sh | 7 ++++++- > test/test-lib.sh | 2 +- > 3 files changed, 15 insertions(+), 2 deletions(-) > > diff --git a/test/README b/test/README > index 10f127cb..1de79b78 100644 > --- a/test/README > +++ b/test/README > @@ -110,6 +110,14 @@ printed on screen. This printing can be disabled by setting the > NOTMUCH_TEST_QUIET variable to a non-null value. Message on test > failures and skips are still printed. typo in variable name. How confident are you that this actually fixes your intermittent failures? There are quite a few literal invocations of "notmuch new" in the test suite. Would those need to be changed as well? d ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] test: allow to use --full-sync 2022-02-16 13:04 ` David Bremner @ 2022-02-16 15:53 ` Michael J Gruber 2022-02-16 21:17 ` Tomi Ollila 2022-02-17 12:26 ` David Bremner 0 siblings, 2 replies; 25+ messages in thread From: Michael J Gruber @ 2022-02-16 15:53 UTC (permalink / raw) To: David Bremner, notmuch David Bremner venit, vidit, dixit 2022-02-16 14:04:17: > Michael J Gruber <git@grubix.eu> writes: > > > Some build infrastructure appears to habe problems with mtime/stat, > > have Huh, this looks as if my mother tongue slipped in, while it's just key proximity. > > leading to spurious failures (noticed on s390x and aarch64 with epel8). > > Allow the test suite to be run with --full-sync so that release builds > > can use the test suite while avoiding spurious failures. > > > > Signed-off-by: Michael J Gruber <git@grubix.eu> > > --- > > test/README | 8 ++++++++ > > test/test-lib-common.sh | 7 ++++++- > > test/test-lib.sh | 2 +- > > 3 files changed, 15 insertions(+), 2 deletions(-) > > > > diff --git a/test/README b/test/README > > index 10f127cb..1de79b78 100644 > > --- a/test/README > > +++ b/test/README > > @@ -110,6 +110,14 @@ printed on screen. This printing can be disabled by setting the > > NOTMUCH_TEST_QUIET variable to a non-null value. Message on test > > failures and skips are still printed. > > typo in variable name. Not here, but in fact in the actual patch ;) > How confident are you that this actually fixes your intermittent > failures? There are quite a few literal invocations of "notmuch new" in > the test suite. Would those need to be changed as well? About 80% confident ... The patch covers all areas in which I've ever witnessed the spurious FAILs. I have not had a single failure with the patch. Many of the other invocations are not immediately after message creation. Some of them are immediately after, and - given that notmuch new catches the case of same time - I do not understand why this happens at all. That accounts for the other 20%. I don't mind carrying this locally and retrying without for the next round of notmuch updates. Having the patch here on-list may help someone else in the future in any case. Michael ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] test: allow to use --full-sync 2022-02-16 15:53 ` Michael J Gruber @ 2022-02-16 21:17 ` Tomi Ollila 2022-02-16 22:01 ` Tomi Ollila 2022-02-17 12:26 ` David Bremner 1 sibling, 1 reply; 25+ messages in thread From: Tomi Ollila @ 2022-02-16 21:17 UTC (permalink / raw) To: Michael J Gruber, David Bremner, notmuch On Wed, Feb 16 2022, Michael J. Gruber wrote: > David Bremner venit, vidit, dixit 2022-02-16 14:04:17: >> Michael J Gruber <git@grubix.eu> writes: >> >> > Some build infrastructure appears to habe problems with mtime/stat, >> >> have > > Huh, this looks as if my mother tongue slipped in, while it's just key > proximity. > >> > leading to spurious failures (noticed on s390x and aarch64 with epel8). >> > Allow the test suite to be run with --full-sync so that release builds >> > can use the test suite while avoiding spurious failures. >> > >> > Signed-off-by: Michael J Gruber <git@grubix.eu> >> > --- >> > test/README | 8 ++++++++ >> > test/test-lib-common.sh | 7 ++++++- >> > test/test-lib.sh | 2 +- >> > 3 files changed, 15 insertions(+), 2 deletions(-) >> > >> > diff --git a/test/README b/test/README >> > index 10f127cb..1de79b78 100644 >> > --- a/test/README >> > +++ b/test/README >> > @@ -110,6 +110,14 @@ printed on screen. This printing can be disabled by setting the >> > NOTMUCH_TEST_QUIET variable to a non-null value. Message on test >> > failures and skips are still printed. >> >> typo in variable name. > > Not here, but in fact in the actual patch ;) > > >> How confident are you that this actually fixes your intermittent >> failures? There are quite a few literal invocations of "notmuch new" in >> the test suite. Would those need to be changed as well? > > About 80% confident ... The patch covers all areas in which I've ever > witnessed the spurious FAILs. I have not had a single failure with the > patch. Many of the other invocations are not immediately after message > creation. > > Some of them are immediately after, and - given that notmuch new catches > the case of same time - I do not understand why this happens at all. > That accounts for the other 20%. > > I don't mind carrying this locally and retrying without for the next > round of notmuch updates. Having the patch here on-list may help > someone else in the future in any case. I'd not merge this NOTMUCH_NEW_OPTIONS=--full-scan ;D too eagerly ... I'd like we really knew the reason (which is then written in commit message) > > Michael Tomi ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] test: allow to use --full-sync 2022-02-16 21:17 ` Tomi Ollila @ 2022-02-16 22:01 ` Tomi Ollila 0 siblings, 0 replies; 25+ messages in thread From: Tomi Ollila @ 2022-02-16 22:01 UTC (permalink / raw) To: Michael J Gruber, David Bremner, notmuch On Wed, Feb 16 2022, Tomi Ollila wrote: > > I'd not merge this NOTMUCH_NEW_OPTIONS=--full-scan ;D too eagerly > ... I'd like we really knew the reason (which is then written in > commit message) I seem I did not get my message above (or why the ;D is there) expressed clear enough -- the idea there was that also NOTMUCH_NEW_OPTIONS=--full-scan make test would have worked (or, NOTMUCH_NEW_OPTIONS=--verbose, to get noisy output -- and lot of failed tests ;D) (to disable such a leakage of environment variable into test run NOTMUCH_NEW_OPTIONS= should have been there before checking that variable -- OTOH just adding NOTMUCH_NEW_OPTIONS=... as way to provide any options to notmuch new (for users to experiment) could be an option (or something...)) Anyway, I am not proposing adding such...now... but better concentrate figuring out what is happening there and why... > Tomi ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] test: allow to use --full-sync 2022-02-16 15:53 ` Michael J Gruber 2022-02-16 21:17 ` Tomi Ollila @ 2022-02-17 12:26 ` David Bremner 1 sibling, 0 replies; 25+ messages in thread From: David Bremner @ 2022-02-17 12:26 UTC (permalink / raw) To: Michael J Gruber, notmuch Michael J Gruber <git@grubix.eu> writes: > I don't mind carrying this locally and retrying without for the next > round of notmuch updates. Having the patch here on-list may help > someone else in the future in any case. OK, I suggest you do that for now. I'm not categorically opposed to (some version of) the change, but like Tomi I would like to understand what problem you are solving a bit better. I did try running the test suite on xfs (without bigtime set). This was amd64 so I couldn't exactly duplicate the conditions where you see failure. How much work is it to set up an environment that duplicates the failures? d ^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2022-02-17 12:26 UTC | newest] Thread overview: 25+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-02-11 13:55 Test suite timing issues? Michael J Gruber 2022-02-11 14:04 ` Michael J Gruber 2022-02-12 0:03 ` David Bremner 2022-02-12 13:32 ` Michael J Gruber 2022-02-12 14:30 ` David Bremner 2022-02-12 15:06 ` Tomi Ollila 2022-02-12 15:29 ` David Bremner 2022-02-12 19:40 ` David Bremner 2022-02-12 20:45 ` Tomi Ollila 2022-02-12 20:54 ` Michael J Gruber 2022-02-12 21:09 ` David Bremner 2022-02-12 21:53 ` Michael J Gruber 2022-02-12 22:14 ` Tomi Ollila 2022-02-13 8:29 ` Tomi Ollila 2022-02-14 14:01 ` David Bremner 2022-02-14 14:48 ` Tomi Ollila 2022-02-14 22:53 ` Tomi Ollila 2022-02-15 8:51 ` Michael J Gruber 2022-02-15 15:18 ` Michael J Gruber 2022-02-16 9:52 ` [PATCH] test: allow to use --full-sync Michael J Gruber 2022-02-16 13:04 ` David Bremner 2022-02-16 15:53 ` Michael J Gruber 2022-02-16 21:17 ` Tomi Ollila 2022-02-16 22:01 ` Tomi Ollila 2022-02-17 12:26 ` David Bremner
Code repositories for project(s) associated with this public inbox https://yhetil.org/notmuch.git/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).