unofficial mirror of notmuch@notmuchmail.org
* T460: new sporadic failures with emacs 29
@ 2023-08-24 11:04 Michael J Gruber
  2023-08-24 13:15 ` David Bremner
  0 siblings, 1 reply; 15+ messages in thread
From: Michael J Gruber @ 2023-08-24 11:04 UTC (permalink / raw)
  To: notmuch

Hi there,

I'm sorry to report that I'm getting new sporadic (some archs
sometimes) failures on Fedora 39+ only, i.e. with emacs 29, on COPR.

```
T460-emacs-tree: Testing emacs tree view interface
 PASS   Basic notmuch-tree view in emacs
...
 PASS   Tree view of a single thread (from show)
 FAIL   Message window of tree view
--- T460-emacs-tree.14.notmuch-tree-show-window 2023-08-24 10:41:07.938464748 +0000
+++ T460-emacs-tree.14.OUTPUT 2023-08-24 10:41:07.938464748 +0000
@@ -1,41 +0,0 @@
-Lars Kellogg-Stedman <lars@seas.harvard.edu> (2009-11-17) (inbox signed)
...
-notmuch@notmuchmail.org
-http://notmuchmail.org/mailman/listinfo/notmuch
*ERROR*: Opening output file: Permission denied, /usr/bin/OUTPUT
 PASS   Stash id
```

That "/usr/bin/OUTPUT" looks strange and smells like a mis-expanded
variable. Why sporadically, though? The emacs tests wait 0.1 before
writing - I dunno why, but those waits are fragile and make me nervous
about even keeping the tests for release builds.

I guess due to its load, COPR is prone to exposing timing issues.

Michael


* Re: T460: new sporadic failures with emacs 29
  2023-08-24 11:04 T460: new sporadic failures with emacs 29 Michael J Gruber
@ 2023-08-24 13:15 ` David Bremner
  2023-08-24 14:01   ` David Bremner
  0 siblings, 1 reply; 15+ messages in thread
From: David Bremner @ 2023-08-24 13:15 UTC (permalink / raw)
  To: Michael J Gruber, notmuch

Michael J Gruber <michaeljgruber+grubix+git@gmail.com> writes:

> -notmuch@notmuchmail.org
> -http://notmuchmail.org/mailman/listinfo/notmuch
> *ERROR*: Opening output file: Permission denied, /usr/bin/OUTPUT
>  PASS   Stash id
> ```
>
> That "/usr/bin/OUTPUT" looks strange and smells like a mis-expanded
> variable.

Yes, that's pretty weird. The only writes to "OUTPUT" are relative to
emacs default-directory. Not sure how that could be set to /usr/bin;
possibly some weird script involved with starting emacs?
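
As a rough illustration in shell terms (an analogy only, not how the
notmuch tests are written): a relative name such as "OUTPUT" resolves
against whatever the current directory happens to be, so with /usr/bin
as the current directory the write would target /usr/bin/OUTPUT. The
helper name resolve_in is made up for this sketch.

```shell
# resolve_in is a hypothetical helper, used only for illustration.
# Print the absolute path that the relative file name $2 refers to
# when the working directory is $1. The subshell keeps the caller's
# own working directory unchanged.
resolve_in () {
    ( cd "$1" && printf '%s/%s\n' "$(pwd)" "$2" )
}

# A process whose working directory is /usr/bin and which writes to
# "OUTPUT" is really writing to the path printed here:
resolve_in /usr/bin OUTPUT
```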

> Why sporadically, though? The emacs tests wait 0.1 before writing - I
> dunno why, but those waits are fragile and make me nervous about even
> keeping the tests for release builds.

One thing that might help is to make the wait a globally configurable
amount of time, so that various CI/build scenarios could set it to some
generous length.
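
A minimal shell sketch of that idea, assuming a hypothetical
environment variable NOTMUCH_TEST_WAIT (an invented name, not an
existing notmuch setting) that the test harness would read and hand on
to emacs:

```shell
# Hypothetical: NOTMUCH_TEST_WAIT and test_wait are assumed names for
# this sketch, not part of the notmuch test suite.

# Print the wait the emacs tests should use: the value of
# NOTMUCH_TEST_WAIT if it is set and non-empty, otherwise the
# current default of 0.1.
test_wait () {
    printf '%s\n' "${NOTMUCH_TEST_WAIT:-0.1}"
}
```

A slow or heavily loaded builder could then export something like
NOTMUCH_TEST_WAIT=2 before running the tests, while local runs keep
the short default.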


>
> I guess due to its load, COPR is prone to exposing timing issues.
>

There are some very slow architectures (e.g. mipsel) on the Debian
buildds, so I'm a bit surprised we don't see similar issues there.
Maybe you are just doing more builds (which is great, obviously).


* Re: T460: new sporadic failures with emacs 29
  2023-08-24 13:15 ` David Bremner
@ 2023-08-24 14:01   ` David Bremner
  2023-08-24 14:09     ` Michael J Gruber
  2023-08-24 15:10     ` David Bremner
  0 siblings, 2 replies; 15+ messages in thread
From: David Bremner @ 2023-08-24 14:01 UTC (permalink / raw)
  To: Michael J Gruber, notmuch

David Bremner <david@tethera.net> writes:

> Michael J Gruber <michaeljgruber+grubix+git@gmail.com> writes:
>
>> -notmuch@notmuchmail.org
>> -http://notmuchmail.org/mailman/listinfo/notmuch
>> *ERROR*: Opening output file: Permission denied, /usr/bin/OUTPUT
>>  PASS   Stash id
>> ```
>>
>> That "/usr/bin/OUTPUT" looks strange and smells like a mis-expanded
>> variable.
>
> Yes, that's pretty weird. The only writes to "OUTPUT" are relative to
> emacs default-directory. Not sure how that could be set to /usr/bin;
> possibly some weird script involved with starting emacs?

I just saw this when running in Debian's "sbuild" isolated build
environment. So my current guess is that this has to do with HOME
pointing somewhere nonexistent. Is that also the case in COPR?

d


* Re: T460: new sporadic failures with emacs 29
  2023-08-24 14:01   ` David Bremner
@ 2023-08-24 14:09     ` Michael J Gruber
  2023-08-24 15:10     ` David Bremner
  1 sibling, 0 replies; 15+ messages in thread
From: Michael J Gruber @ 2023-08-24 14:09 UTC (permalink / raw)
  To: David Bremner; +Cc: notmuch

On Thu, 24 Aug 2023 at 16:01, David Bremner <david@tethera.net> wrote:
>
> David Bremner <david@tethera.net> writes:
>
> > Michael J Gruber <michaeljgruber+grubix+git@gmail.com> writes:
> >
> >> -notmuch@notmuchmail.org
> >> -http://notmuchmail.org/mailman/listinfo/notmuch
> >> *ERROR*: Opening output file: Permission denied, /usr/bin/OUTPUT
> >>  PASS   Stash id
> >> ```
> >>
> >> That "/usr/bin/OUTPUT" looks strange and smells like a mis-expanded
> >> variable.
> >
> > Yes, that's pretty weird. The only writes to "OUTPUT" are relative to
> > emacs default-directory. Not sure how that could be set to /usr/bin;
> > possibly some weird script involved with starting emacs?
>
> I just saw this when running in debian's "sbuild" isolated build
> environment. So my current guess is that this has to do with HOME
> pointing somewhere nonexistent. Is that also the case in COPR?
>

I encountered this on koji (the main Fedora infra), too, and am trying
with an increased wait (1 rather than 0.1) right now. Dunno by how
much this increases test suite run times.

HOME could be an issue only if some builder VMs are set up
differently, I guess? They shouldn't be (which does not necessarily
mean they aren't).

Michael


* Re: T460: new sporadic failures with emacs 29
  2023-08-24 14:01   ` David Bremner
  2023-08-24 14:09     ` Michael J Gruber
@ 2023-08-24 15:10     ` David Bremner
  2023-08-24 16:10       ` Michael J Gruber
  1 sibling, 1 reply; 15+ messages in thread
From: David Bremner @ 2023-08-24 15:10 UTC (permalink / raw)
  To: Michael J Gruber, notmuch

David Bremner <david@tethera.net> writes:

> I just saw this when running in debian's "sbuild" isolated build
> environment. So my current guess is that this has to do with HOME
> pointing somewhere nonexistent. Is that also the case in COPR?
>
> d

I realized that we override HOME inside the tests anyway, so emacs
should think there is some writable HOME in any case. I did notice that
the tests trigger a bunch of emacs native compilation (because the
caching happens in the temporary $HOME, which gets blown away every
time). 


* Re: T460: new sporadic failures with emacs 29
  2023-08-24 15:10     ` David Bremner
@ 2023-08-24 16:10       ` Michael J Gruber
  2023-08-25 22:28         ` David Bremner
  0 siblings, 1 reply; 15+ messages in thread
From: Michael J Gruber @ 2023-08-24 16:10 UTC (permalink / raw)
  To: David Bremner; +Cc: notmuch

On Thu, 24 Aug 2023 at 17:10, David Bremner <david@tethera.net> wrote:
>
> David Bremner <david@tethera.net> writes:
>
> > I just saw this when running in debian's "sbuild" isolated build
> > environment. So my current guess is that this has to do with HOME
> > pointing somewhere nonexistent. Is that also the case in COPR?
> >
> > d
>
> I realized that we override HOME inside the tests anyway, so emacs
> should think there is some writable HOME in any case. I did notice that
> the tests trigger a bunch of emacs native compilation (because the
> caching happens in the temporary $HOME, which gets blown away every
> time).

Also, $HOME is set in all my build envs (pass or fail), and
permissions are the same. Bummer.

It took more runs to get some fails now, and archs vary, so I still
think it's a timeout. And no way to get it locally so far.

ENOLISP (for me) but could it be the case that notmuch-test-wait can
abort its while loop too early if the first buffer write takes longer
than the timeout, or if some other process writes (because the process
parameter is nil)? Is something different for emacs 29 in this regard?
Any clues from sbuild?

Michael


* Re: T460: new sporadic failures with emacs 29
  2023-08-24 16:10       ` Michael J Gruber
@ 2023-08-25 22:28         ` David Bremner
  2023-08-26 14:20           ` Michael J Gruber
  0 siblings, 1 reply; 15+ messages in thread
From: David Bremner @ 2023-08-25 22:28 UTC (permalink / raw)
  To: Michael J Gruber; +Cc: notmuch

[-- Attachment #1: Type: text/plain, Size: 708 bytes --]

Michael J Gruber <michaeljgruber+grubix+git@gmail.com> writes:

> It took more runs to get some fails now, and archs vary, so I still
> think it's a timeout. And no way to get it locally so far.

I can duplicate it locally about once every 40 runs of the complete test
suite.
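
For a failure this rare, a small loop that repeats the run and counts
failures saves a lot of manual work. The sketch below is one way to do
that in plain sh; count_failures is a made-up helper name, and the
command passed to it is whatever runs the tests in a given setup.

```shell
# count_failures is a hypothetical helper for this sketch.
# Usage: count_failures RUNS COMMAND [ARGS...]
# Run COMMAND RUNS times, discard its output, and print how many of
# the runs exited with a non-zero status.
# Plain POSIX sh has no local variables, so runs, failures and i are
# ordinary globals.
count_failures () {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    printf '%s/%s runs failed\n' "$failures" "$runs"
}
```

Something like `count_failures 40 make test` would then give a rough
failure rate, assuming `make test` is how the suite is started in that
tree.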

> ENOLISP (for me) but could it be the case that notmuch-test-wait can
> abort its while loop too early if the first buffer write takes longer
> than the timeout, or if some other process writes (because the process
> parameter is nil)? Is something different for emacs 29 in this regard?
> Any clues from sbuild?

Can you try the attached patch? It needs more testing, but I did get 140
runs of the test suite without an error. 


[-- Attachment #2: native-comp.diff --]
[-- Type: text/x-diff, Size: 364 bytes --]

diff --git a/test/test-lib.el b/test/test-lib.el
index 236dd99e..709c3b36 100644
--- a/test/test-lib.el
+++ b/test/test-lib.el
@@ -22,6 +22,10 @@
 
 ;;; Code:
 
+(setq native-comp-jit-compilation nil)
+(setq native-comp-speed -1)
+(setq native-comp-async-jobs-number 1)
+
 (require 'cl-lib)
 
 ;; Ensure that the dynamic variables that are defined by this library



* Re: T460: new sporadic failures with emacs 29
  2023-08-25 22:28         ` David Bremner
@ 2023-08-26 14:20           ` Michael J Gruber
  2023-08-26 14:41             ` David Bremner
  0 siblings, 1 reply; 15+ messages in thread
From: Michael J Gruber @ 2023-08-26 14:20 UTC (permalink / raw)
  To: David Bremner; +Cc: notmuch

On Sat, 26 Aug 2023 at 00:28, David Bremner <david@tethera.net> wrote:
>
> Michael J Gruber <michaeljgruber+grubix+git@gmail.com> writes:
>
> > It took more runs to get some fails now, and archs vary, so I still
> > think it's a timeout. And no way to get it locally so far.
>
> I can duplicate it locally about once every 40 runs of the complete test
> suite.
>
> > ENOLISP (for me) but could it be the case that notmuch-test-wait can
> > abort its while loop too early if the first buffer write takes longer
> > than the timeout, or if some other process writes (because the process
> > parameter is nil)? Is something different for emacs 29 in this regard?
> > Any clues from sbuild?
>
> Can you try the attached patch? It needs more testing, but I did get 140
> runs of the test suite without an error.

I tried the current 0.38rc1 on COPR, and unfortunately I get the same
T460 failure (fedora-eln-aarch64 and fedora-rawhide-x86_64 this time,
out of 35 buildroots).
Did you get your fails with emacs 29 only, or with earlier emacs?

Trying with 0.37+patches on KOJI right now.

There's also one patch I want to send out before release, hopefully in
a minute or two ;-)

Michael


* Re: T460: new sporadic failures with emacs 29
  2023-08-26 14:20           ` Michael J Gruber
@ 2023-08-26 14:41             ` David Bremner
  2023-08-26 19:22               ` Michael J Gruber
  0 siblings, 1 reply; 15+ messages in thread
From: David Bremner @ 2023-08-26 14:41 UTC (permalink / raw)
  To: Michael J Gruber; +Cc: notmuch

Michael J Gruber <michaeljgruber+grubix+git@gmail.com> writes:

>
> I tried the current 0.38rc1 on COPR, and unfortunately I get the same
> T460 failure (fedora-eln-aarch64 and fedora-rawhide-x86_64 this time,
> out of 35 buildroots).
> Did you get your fails with emacs 29 only, or with earlier emacs?

I only tested emacs 29; it would be some different incantation to
semi-disable native compilation for emacs 28.x. Are you seeing those
same failures (where emacs is attempting to write into /usr/bin) on older
emacs?

> There's also one patch I want to send out before release, hopefully in
> a minute or two ;-)

OK


* Re: T460: new sporadic failures with emacs 29
  2023-08-26 14:41             ` David Bremner
@ 2023-08-26 19:22               ` Michael J Gruber
  2023-08-31 13:15                 ` David Bremner
  0 siblings, 1 reply; 15+ messages in thread
From: Michael J Gruber @ 2023-08-26 19:22 UTC (permalink / raw)
  To: David Bremner; +Cc: notmuch

On Sat, 26 Aug 2023 at 16:41, David Bremner <david@tethera.net> wrote:
>
> Michael J Gruber <michaeljgruber+grubix+git@gmail.com> writes:
>
> >
> > I tried the current 0.38rc1 on COPR, and unfortunately I get the same
> > T460 failure (fedora-eln-aarch64 and fedora-rawhide-x86_64 this time,
> > out of 35 buildroots).
> > Did you get your fails with emacs 29 only, or with earlier emacs?
>
> I only tested emacs 29; it would be some different incantation to
> semi-disable native compilation for emacs 28.x. Are you seeing those
> same failures (where emacs is attempting to write into /usr/bin) on older
> emacs?

No, I see them only on Fedora rawhide/ELN and Fedora 39, but not on
the current release 38 or earlier. Emacs 29/28 is one difference and
an obvious guess as the cause, but it could be dtach or whatnot.

I get the same failures with notmuch 0.37+your patch on koji now
(rawhide, f39; not f38), sporadically.

I'm confident it's only in the test suite, so I can disable that test
on Fedora for the release build. (Will have to test whether the
failures creep up somewhere else then.)

Michael


* Re: T460: new sporadic failures with emacs 29
  2023-08-26 19:22               ` Michael J Gruber
@ 2023-08-31 13:15                 ` David Bremner
  2023-08-31 14:54                   ` Michael J Gruber
  0 siblings, 1 reply; 15+ messages in thread
From: David Bremner @ 2023-08-31 13:15 UTC (permalink / raw)
  To: Michael J Gruber, notmuch@notmuchmail.org; +Cc: notmuch@notmuchmail.org

Michael J Gruber <michaeljgruber+grubix+git@gmail.com> writes:

> On Sat, 26 Aug 2023 at 16:41, David Bremner <david@tethera.net> wrote:
>>
>> Michael J Gruber <michaeljgruber+grubix+git@gmail.com> writes:
>>
>> >
>> > I tried the current 0.38rc1 on COPR, and unfortunately I get the same
>> > T460 failure (fedora-eln-aarch64 and fedora-rawhide-x86_64 this time,
>> > out of 35 buildroots).
>> > Did you get your fails with emacs 29 only, or with earlier emacs?
>>
>> I only tested emacs 29; it would be some different incantation to
>> semi-disable native compilation for emacs 28.x. Are you seeing those
>> same failures (where emacs is attempting to write into /usr/bin) on older
>> emacs?
>
> No, I see them only on Fedora rawhide/ELN and Fedora 39, but not on
> the current release 38 or earlier. Emacs 29/28 is one difference and
> an obvious guess as the cause, but it could be dtach or whatnot.

Hmm. I just built 200 times in sbuild (chroot) so I guess I am no longer
able to reproduce the issue on Debian, fwiw. 


* Re: T460: new sporadic failures with emacs 29
  2023-08-31 13:15                 ` David Bremner
@ 2023-08-31 14:54                   ` Michael J Gruber
  2023-08-31 15:17                     ` David Bremner
  0 siblings, 1 reply; 15+ messages in thread
From: Michael J Gruber @ 2023-08-31 14:54 UTC (permalink / raw)
  To: David Bremner; +Cc: notmuch@notmuchmail.org

On Thu, 31 Aug 2023 at 15:16, David Bremner <bremner@unb.ca> wrote:
>
> Michael J Gruber <michaeljgruber+grubix+git@gmail.com> writes:
>
> > On Sat, 26 Aug 2023 at 16:41, David Bremner <david@tethera.net> wrote:
> >>
> >> Michael J Gruber <michaeljgruber+grubix+git@gmail.com> writes:
> >>
> >> >
> >> > I tried the current 0.38rc1 on COPR, and unfortunately I get the same
> >> > T460 failure (fedora-eln-aarch64 and fedora-rawhide-x86_64 this time,
> >> > out of 35 buildroots).
> >> > Did you get your fails with emacs 29 only, or with earlier emacs?
> >>
> >> I only tested emacs 29; it would be some different incantation to
> >> semi-disable native compilation for emacs 28.x. Are you seeing those
> >> same failures (where emacs is attempting to write into /usr/bin) on older
> >> emacs?
> >
> > No, I see them only on Fedora rawhide/ELN and Fedora 39, but not on
> > the current release 38 or earlier. Emacs 29/28 is one difference and
> > an obvious guess as the cause, but it could be dtach or whatnot.
>
> Hmm. I just built 200 times in sbuild (chroot) so I guess I am no longer
> able to reproduce the issue on Debian, fwiw.

I still get those issues. OTOH, skipping T460.14 did not show any
adverse side effects. So I'll do that for emacs29.
It might be nice to mark some tests ignored rather than skipped so that
we notice when they do not fail sporadically any more. That is, *if*
we look at the output of a passing test suite ...

Michael


* Re: T460: new sporadic failures with emacs 29
  2023-08-31 14:54                   ` Michael J Gruber
@ 2023-08-31 15:17                     ` David Bremner
  2023-09-01  7:31                       ` Michael J Gruber
  0 siblings, 1 reply; 15+ messages in thread
From: David Bremner @ 2023-08-31 15:17 UTC (permalink / raw)
  To: Michael J Gruber; +Cc: notmuch@notmuchmail.org

Michael J Gruber <michaeljgruber+grubix+git@gmail.com> writes:

>
> I still get those issues. OTOH, skipping T460.14 did not show any
> adverse side effects. So I'll do that for emacs29.
> It might be nice to mark some tests ignored rather than skipped so that
> we notice when they do not fail sporadically any more. That is, *if*
> we look at the output of a passing test suite ...
>

It is possible to selectively mark tests as broken, but it requires
patching the test suite, and it sets a failing exit code if those tests
start passing. 


* Re: T460: new sporadic failures with emacs 29
  2023-08-31 15:17                     ` David Bremner
@ 2023-09-01  7:31                       ` Michael J Gruber
  2023-09-01 10:27                         ` David Bremner
  0 siblings, 1 reply; 15+ messages in thread
From: Michael J Gruber @ 2023-09-01  7:31 UTC (permalink / raw)
  To: David Bremner; +Cc: notmuch@notmuchmail.org

On Thu, 31 Aug 2023 at 17:17, David Bremner <bremner@unb.ca> wrote:
>
> Michael J Gruber <michaeljgruber+grubix+git@gmail.com> writes:
>
> >
> > I still get those issues. OTOH, skipping T460.14 did not show any
> > adverse side effects. So I'll do that for emacs29.
> > It might be nice to mark some tests ignored rather than skipped so that
> > we notice when they do not fail sporadically any more. That is, *if*
> > we look at the output of a passing test suite ...
> >
>
> It is possible to selectively mark tests as broken, but it requires
> patching the test suite, and it sets a failing exit code if those tests
> start passing.

Yes, that's why I wrote "ignore". Something like NOTMUCH_IGNORE_TESTS
which runs the test, outputs the diff on fail, but "succeeds" without
counting towards pass/fail, and reports the number of ignored
pass/fail separately - basically "known_broken" without the
"known/expectation".
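
To make the intended semantics concrete, here is a rough sketch in
plain sh. It is not based on the actual notmuch test library:
NOTMUCH_IGNORE_TESTS is only the name proposed here, and run_test,
is_ignored and suite_status are invented for illustration.

```shell
# Hypothetical sketch: none of these names exist in the notmuch test
# suite. NOTMUCH_IGNORE_TESTS is a space-separated list of test names.
NOTMUCH_IGNORE_TESTS=${NOTMUCH_IGNORE_TESTS:-}
passed=0
failed=0
ignored_pass=0
ignored_fail=0

# Succeed if the test name $1 is listed in NOTMUCH_IGNORE_TESTS.
is_ignored () {
    case " $NOTMUCH_IGNORE_TESTS " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# run_test NAME COMMAND [ARGS...]
# Always run the command (so its diff output is still shown), then
# tally the result. Ignored tests are counted separately and never
# add to the regular pass/fail totals.
run_test () {
    name=$1
    shift
    if "$@"; then result=pass; else result=fail; fi
    if is_ignored "$name"; then
        if [ "$result" = pass ]; then
            ignored_pass=$((ignored_pass + 1))
        else
            ignored_fail=$((ignored_fail + 1))
        fi
        printf ' IGNORED (%s)   %s\n' "$result" "$name"
    elif [ "$result" = pass ]; then
        passed=$((passed + 1))
        printf ' PASS   %s\n' "$name"
    else
        failed=$((failed + 1))
        printf ' FAIL   %s\n' "$name"
    fi
}

# Print the totals; the exit status reflects non-ignored failures only.
suite_status () {
    printf 'passed=%s failed=%s ignored(pass/fail)=%s/%s\n' \
        "$passed" "$failed" "$ignored_pass" "$ignored_fail"
    [ "$failed" -eq 0 ]
}
```

With NOTMUCH_IGNORE_TESTS="T460.14", a failing T460.14 would still
print its output but leave the suite's exit status untouched, and the
ignored pass/fail counts would show when it stops failing.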

I just don't know whether it's worth it. Other folks disable a whole
test suite when they want to get a package update going ...

Michael


* Re: T460: new sporadic failures with emacs 29
  2023-09-01  7:31                       ` Michael J Gruber
@ 2023-09-01 10:27                         ` David Bremner
  0 siblings, 0 replies; 15+ messages in thread
From: David Bremner @ 2023-09-01 10:27 UTC (permalink / raw)
  To: Michael J Gruber; +Cc: notmuch@notmuchmail.org, Tomi Ollila

Michael J Gruber <michaeljgruber+grubix+git@gmail.com> writes:

>
> Yes, that's why I wrote "ignore". Something like NOTMUCH_IGNORE_TESTS
> which runs the test, outputs the diff on fail, but "succeeds" without
> counting towards pass/fail, and reports the number of ignored
> pass/fail separately - basically "known_broken" without the
> "known/expectation".
>
> I just don't know whether it's worth it. Other folks disable a whole
> test suite when they want to get a package update going ...

I guess it would be mainly interesting for distro packagers, or I guess
people who wanted to run some kind of CI.

Without looking at the code, I think just ignoring the return value
would be relatively easy, while keeping track of ignored tests might be
a bit more work. Maybe Tomi has a clearer idea / finds this a fun
problem.

d

