unofficial mirror of meta@public-inbox.org
* imapd.t failing on OpenBSD, bisects to 13a2088c74fd (kqnotify: drop EV_CLEAR (edge triggering))
@ 2023-10-18 15:01 Štěpán Němec
  2023-10-18 19:06 ` Eric Wong
  0 siblings, 1 reply; 8+ messages in thread
From: Štěpán Němec @ 2023-10-18 15:01 UTC (permalink / raw)
  To: meta


(Bisected with HEAD at 042662948d804d24 (bad) and good at
62d50411dcc92cd (hadn't updated/run the tests for a few weeks.))

It's a bit of a mess, though: it doesn't fail completely
reliably and not always in quite the same way.  Here's a log
of a few examples:

http://smrk.net/tmp/imapd.t.failures

Reverting 13a2088c74fd (re-adding EV_CLEAR) I got 12 passing
imapd.t runs in a row (as well as a full passing `make test`
run), removing EV_CLEAR again I got 3 passes, 1 fail, and in
the 5th run it just hung (another time it managed to
complete 10 imapd.t runs with just 2 fails and no hang).
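
A minimal sketch of how pass/fail/hang counts like the ones above could be
gathered automatically.  The helper name `tally_runs` and the timeout value
are assumptions for illustration, not part of the public-inbox test suite;
a run that exceeds the timeout is counted as a hang.

```python
import subprocess
import sys
from collections import Counter


def tally_runs(cmd, runs=10, timeout=600):
    """Run cmd repeatedly and count passes, failures and hangs.

    tally_runs is a hypothetical helper; a "hang" here simply means
    the command did not finish within `timeout` seconds.
    """
    results = Counter()
    for _ in range(runs):
        try:
            proc = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run() kills the direct child on timeout;
            # grandchildren (e.g. leftover perl/git processes) may survive.
            results["hang"] += 1
            continue
        results["pass" if proc.returncode == 0 else "fail"] += 1
    return results


# Smoke check with a trivially passing command; substitute the real
# test invocation (for example a serial prove run of imapd.t).
print(dict(tally_runs([sys.executable, "-c", "pass"], runs=3, timeout=30)))
```

Running the test serially, one invocation at a time, keeps the tally
comparable between the EV_CLEAR and non-EV_CLEAR builds.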

It also leaves behind funny processes like this:

ooo# ps -f -U pi
  PID TT  STAT        TIME COMMAND
60002 p1  I+p      0:00.05 sh
69047 p1  R/1     20:06.56 perl: -watch quitting quitting (perl)
27951 p1  R/1     33:06.54 - perl:  (perl)
 5020 p1  I        0:00.09 `-- /usr/local/bin/git --git-dir=watchimap/all.git -c core.abbrev=no
83900 p1  R/0     14:18.88 perl: -watch quitting quitting (perl)
85123 p1  R/0     27:28.68 - perl: UID:4 inbox.i1.0 imap://[::1]:21825 quitting quitting quitting quitting quitting quitting quitting quitting quitting quitting quitting qui
78162 p1  I        0:00.17 `-- /usr/local/bin/git --git-dir=watchimap/all.git -c core.abbrev=no

which, given the hangs, makes me wonder if it's bumping into
some kind of resource limit?

(These tests were run on an OpenBSD development snapshot,
not a release, but given that reverting the change makes the
problem disappear I hope that doesn't matter.)

-- 
Štěpán


* Re: imapd.t failing on OpenBSD, bisects to 13a2088c74fd (kqnotify: drop EV_CLEAR (edge triggering))
  2023-10-18 15:01 imapd.t failing on OpenBSD, bisects to 13a2088c74fd (kqnotify: drop EV_CLEAR (edge triggering)) Štěpán Němec
@ 2023-10-18 19:06 ` Eric Wong
  2023-10-18 19:18   ` Štěpán Němec
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Wong @ 2023-10-18 19:06 UTC (permalink / raw)
  To: Štěpán Němec; +Cc: meta

Štěpán Němec <stepnem@smrk.net> wrote:
> (Bisected with HEAD at 042662948d804d24 (bad) and good at
> 62d50411dcc92cd (hadn't updated/run the tests for a few weeks.))
> 
> It's a bit of a mess, though: it doesn't fail completely
> reliably and not always in quite the same way.  Here's a log
> of a few examples:
> 
> http://smrk.net/tmp/imapd.t.failures
> 
> Reverting 13a2088c74fd (re-adding EV_CLEAR) I got 12 passing
> imapd.t runs in a row (as well as a full passing `make test`
> run), removing EV_CLEAR again I got 3 passes, 1 fail, and in
> the 5th run it just hung (another time it managed to
> complete 10 imapd.t runs with just 2 fails and no hang).

Odd, can you confirm this is with p5-IO-KQueue installed?
(it's really slow w/o since it needs to sleep).

I saw some similar failures the other week on NetBSD, couldn't
reproduce it, and I lost power at my VM host so later forgot
about it :x

Never seen such failures on FreeBSD, though.

> It also leaves behind funny processes like this:
> 
> ooo# ps -f -U pi
>   PID TT  STAT        TIME COMMAND
> 60002 p1  I+p      0:00.05 sh
> 69047 p1  R/1     20:06.56 perl: -watch quitting quitting (perl)
> 27951 p1  R/1     33:06.54 - perl:  (perl)
>  5020 p1  I        0:00.09 `-- /usr/local/bin/git --git-dir=watchimap/all.git -c core.abbrev=no
> 83900 p1  R/0     14:18.88 perl: -watch quitting quitting (perl)
> 85123 p1  R/0     27:28.68 - perl: UID:4 inbox.i1.0 imap://[::1]:21825 quitting quitting quitting quitting quitting quitting quitting quitting quitting quitting quitting qui
> 78162 p1  I        0:00.17 `-- /usr/local/bin/git --git-dir=watchimap/all.git -c core.abbrev=no
> 
> which, given the hangs, makes me wonder if it's bumping into
> some kind of resource limit?

Parallel tests would increase the likelihood of limits being hit
(make check, make check-run, prove -j$N)

`make test' and `prove -lwv' (w/o -j) are serial,
and `make check-run N=1' can force serial tests while
saving loading overhead.

> (These tests were run on an OpenBSD development snapshot,
> not a release, but given that reverting the change makes the
> problem disappear I hope that doesn't matter.)

I can't reproduce it on 7.3 (amd64), right now.
Haven't gotten around to 7.4...


* Re: imapd.t failing on OpenBSD, bisects to 13a2088c74fd (kqnotify: drop EV_CLEAR (edge triggering))
  2023-10-18 19:06 ` Eric Wong
@ 2023-10-18 19:18   ` Štěpán Němec
  2023-10-18 21:23     ` Eric Wong
  0 siblings, 1 reply; 8+ messages in thread
From: Štěpán Němec @ 2023-10-18 19:18 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

On Wed, 18 Oct 2023 19:06:34 +0000
Eric Wong wrote:

> Odd, can you confirm this is with p5-IO-KQueue installed?

Yes, that's with p5-IO-KQueue-0.39.

> Parallel tests would increase the likelihood of limits being hit
> (make check, make check-run, prove -j$N)
>
> `make test' and `prove -lwv' (w/o -j) are serial,
> and `make check-run N=1' can force serial tests while
> saving loading overhead.

Hm, thanks.  I've only been using 'make test' and prove
(no -j) so far, so that's not it.

-- 
Štěpán


* Re: imapd.t failing on OpenBSD, bisects to 13a2088c74fd (kqnotify: drop EV_CLEAR (edge triggering))
  2023-10-18 19:18   ` Štěpán Němec
@ 2023-10-18 21:23     ` Eric Wong
  2023-10-19  8:43       ` Štěpán Němec
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Wong @ 2023-10-18 21:23 UTC (permalink / raw)
  To: Štěpán Němec; +Cc: meta

Štěpán Němec <stepnem@smrk.net> wrote:
> Hm, thanks.  I've only been using 'make test' and prove
> (no -j) so far, so that's not it.

Alright, I've reverted it and reinstated EV_CLEAR use
(commit cbb4498df289f9874fc9475b86310958826360e8).

In my experience, EV_CLEAR and EPOLLET tend to exacerbate
sporadic problems like these, not cause them...
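
A small sketch of the level- vs. edge-triggered difference, using Linux
epoll via Python's `select` module as a stand-in (this is not the kqnotify
code, and it assumes a Linux host; `poll_counts` is a hypothetical helper).
kqueue's EV_CLEAR similarly resets the event state once the event has been
retrieved, so a consumer that doesn't fully drain its source gets no
further wakeups until something new happens.

```python
import os
import select


def poll_counts(extra_flags, polls=3):
    """Write one byte to a pipe, never read it, and poll repeatedly.

    Returns the number of ready events seen by each successive poll.
    """
    r, w = os.pipe()
    ep = select.epoll()
    try:
        ep.register(r, select.EPOLLIN | extra_flags)
        os.write(w, b"x")  # one state change: pipe becomes readable
        # timeout=0 makes each poll return immediately
        return [len(ep.poll(0)) for _ in range(polls)]
    finally:
        ep.close()
        os.close(r)
        os.close(w)


# Level-triggered: readiness is reported for as long as data is pending.
print("level-triggered:", poll_counts(0))
# Edge-triggered: reported once per state change, then silence.
print("edge-triggered: ", poll_counts(select.EPOLLET))
```

With the unread byte left in the pipe, the level-triggered registration
keeps reporting readiness on every poll, while the edge-triggered one
reports it only on the first.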

Though I am curious if it's a red herring or not...  If you have
spare cycles to test on 7.3 or 7.4, it'd be greatly appreciated
(but no obligations to do so)

I know there is some wonkiness in signal handling in NetReader +
(Mail::IMAPClient|Net::NNTP) code that needs to be resolved.
NetBSD had sporadic failures with EINTR in tests which needs to
be fixed.  But I also don't know why it'd even see EINTR on
some tests...

AFAIK none of these problems affected FreeBSD.  I test and do
occasional development on FreeBSD significantly more than the
other BSDs, though.


* Re: imapd.t failing on OpenBSD, bisects to 13a2088c74fd (kqnotify: drop EV_CLEAR (edge triggering))
  2023-10-18 21:23     ` Eric Wong
@ 2023-10-19  8:43       ` Štěpán Němec
  2023-10-23 19:58         ` Eric Wong
  0 siblings, 1 reply; 8+ messages in thread
From: Štěpán Němec @ 2023-10-19  8:43 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

On Wed, 18 Oct 2023 21:23:18 +0000
Eric Wong wrote:

> Though I am curious if it's a red herring or not...  If you have
> spare cycles to test on 7.3 or 7.4, it'd be greatly appreciated
> (but no obligations to do so)

I downgraded the VM to 7.3, ran tests (this time updated to
848dedde919 (lei: simplify startq/au_done wakeup
notifications), just with the EV_CLEAR re-removal on top),
then upgraded to 7.4, ran tests.  I see the same failure
pattern everywhere, so I really don't think the OpenBSD
version is a factor here.

(And again, if you want to have a look yourself, I'd be
happy to give you access to the machine; still the same
testing OC VM.)

-- 
Štěpán


* Re: imapd.t failing on OpenBSD, bisects to 13a2088c74fd (kqnotify: drop EV_CLEAR (edge triggering))
  2023-10-19  8:43       ` Štěpán Němec
@ 2023-10-23 19:58         ` Eric Wong
  2023-11-27 11:20           ` OpenBSD debugging Štěpán Němec
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Wong @ 2023-10-23 19:58 UTC (permalink / raw)
  To: Štěpán Němec; +Cc: meta

Štěpán Němec <stepnem@smrk.net> wrote:
> On Wed, 18 Oct 2023 21:23:18 +0000
> Eric Wong wrote:
> 
> > Though I am curious if it's a red herring or not...  If you have
> > spare cycles to test on 7.3 or 7.4, it'd be greatly appreciated
> > (but no obligations to do so)
> 
> I downgraded the VM to 7.3, ran tests (this time updated to
> 848dedde919 (lei: simplify startq/au_done wakeup
> notifications), just with the EV_CLEAR re-removal on top),
> then upgraded to 7.4, ran tests.  I see the same failure
> pattern everywhere, so I really don't think the OpenBSD
> version is a factor here.

Thanks for the info.  Just curious, what HW specs (ncpus, RAM)
is available on that system?  I wonder if that affects timing
somehow...

> (And again, if you want to have a look yourself, I'd be
> happy to give you access to the machine; still the same
> testing OC VM.)

Unfortunately, my *BSD debugging knowledge is far behind my
Linux; so I'm not sure how much help it'd be...

Some examples of things I miss on OpenBSD:

* /proc/$PID/fdinfo/$FD_OF_EPOLL on Linux is immensely helpful
  for knowing what and how epoll is watching target FDs.  I'm
  not sure if there's a way to introspect kqueue like that

* Linux strace decodes more struct args info than kdump

* ability to control pathname of core dumps

... probably a few other things, but been sick a few days and
brain still foggy :<


* OpenBSD debugging
  2023-10-23 19:58         ` Eric Wong
@ 2023-11-27 11:20           ` Štěpán Němec
  2023-11-29 22:38             ` Eric Wong
  0 siblings, 1 reply; 8+ messages in thread
From: Štěpán Němec @ 2023-11-27 11:20 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta


I apologize for the late response.

On Mon, 23 Oct 2023 19:58:18 +0000
Eric Wong wrote:

> Thanks for the info.  Just curious, what HW specs (ncpus, RAM)
> is available on that system?  I wonder if that affects timing
> somehow...

dmesg:
https://dmesgd.nycbug.org/index.cgi?do=view&id=7357
ncpus = 2 (so it's running the MP (multiprocessor) kernel), 1GB RAM

>> (And again, if you want to have a look yourself, I'd be
>> happy to give you access to the machine; still the same
>> testing OC VM.)
>
> Unfortunately, my *BSD debugging knowledge is far behind my
> Linux; so I'm not sure how much help it'd be...

If nothing else, you could do some tests on an otherwise
idle machine with good Internet connectivity (unless the
connection issues you keep mentioning are mainly on your
end, that is).

> Some examples of things I miss on OpenBSD:
>
> * /proc/$PID/fdinfo/$FD_OF_EPOLL on Linux is immensely helpful
>   for knowing what and how epoll is watching target FDs.  I'm
>   not sure if there's a way to introspect kqueue like that

Yeah, most likely there isn't, though I'm not quite sure
what exactly "like that" entails.  Care to expand a bit upon
the immense usefulness mentioned, i.e., how this helps you
specifically?

OpenBSD fstat(1) prints the kqueue memory addresses, so I
suppose a sufficiently determined individual could get
arbitrary info from the running kernel based on that,
although at that point there are probably better ways to get
the address than running fstat...

As for existing tools I'm aware of, there's ddb(4) which can
dump structures etc. (it has access to kernel symbols), but
it's not very convenient for casual debugging/introspection,
as it stops everything until you continue from the kernel
debugger, so it will mess up the clock etc.

Then there's bt(5)/btrace(8), which is a bpftrace clone.
It's a work in progress and nowhere near Linux
feature-/coverage-wise, but when it works it's nice.

AFAIK the most you can currently get from it by default is
entry and return for syscalls and a couple dozen static
tracepoints.  It's possible to enable entry/return for all
kernel functions with a custom kernel (which, depending on
circumstances, isn't as bad as it sounds: compiling an
OpenBSD kernel from scratch is a matter of (tens of) minutes
even on a weak machine; it took about 40 minutes in the
above VM, single-threaded).

Unfortunately there's no support for arbitrary argument
access (though it seems to be on TODO), you need to add a
custom tracepoint for that (which can be easy enough,
e.g. <https://flak.tedunangst.com/post/probing-my-ssds-latency>,
but again requires a kernel compile).

> * Linux strace decodes more struct args info than kdump

strace is certainly more featureful, though in the specific
case of kqueue/kevent I think kdump does show everything one
would expect to see?

> * ability to control pathname of core dumps

Yeah, the way the OpenBSD knobs have evolved (i.e., sane
behavior attainable only for processes with altered U/GID)
seems pretty weird, and even though I suspect at least some
developers would be able to entertain the thought that the
situation isn't optimal (despite the way the recent misc@
thread you participated in turned out), I don't see a good
way to improve it without some redesign, i.e. breaking
backwards compatibility (adding further knobs on top of
kern.nosuidcoredump would make matters even messier IMO).

That said, if it's critical for your use case, I think we could
patch it locally in the VM easily enough, now that I've set it
up for kernel compilation anyway.

Same for any custom bt tracepoints or other adjustments I'd
be able to help with.

In summary, if you ever feel the VM could be of use, just
let me know; if you consider your resources better spent
elsewhere I certainly understand.

-- 
Štěpán


* Re: OpenBSD debugging
  2023-11-27 11:20           ` OpenBSD debugging Štěpán Němec
@ 2023-11-29 22:38             ` Eric Wong
  0 siblings, 0 replies; 8+ messages in thread
From: Eric Wong @ 2023-11-29 22:38 UTC (permalink / raw)
  To: Štěpán Němec; +Cc: meta

Štěpán Němec <stepnem@smrk.net> wrote:
> 
> I apologize for the late response.

No worries, I still have mails in other places from months
ago I've been meaning to get to :x

> On Mon, 23 Oct 2023 19:58:18 +0000
> Eric Wong wrote:
> 
> > Thanks for the info.  Just curious, what HW specs (ncpus, RAM)
> > is available on that system?  I wonder if that affects timing
> > somehow...
> 
> dmesg:
> https://dmesgd.nycbug.org/index.cgi?do=view&id=7357
> ncpus = 2 (so it's running the MP (multiprocessor) kernel), 1GB RAM

Alright, will keep that in mind.  OpenBSD doesn't seem to
benefit from having many cores and I stress out about the
test suite taking ~30s on my fastest HW.

> > Unfortunately, my *BSD debugging knowledge is far behind my
> > Linux; so I'm not sure how much help it'd be...
> 
> If nothing else, you could do some tests on an otherwise
> idle machine with good Internet connectivity (unless the
> connection issues you keep mentioning are mainly on your
> end, that is).

Yeah, it's mainly on my end, but seems improved in the past
2 weeks or so.

> > Some examples of things I miss on OpenBSD:
> >
> > * /proc/$PID/fdinfo/$FD_OF_EPOLL on Linux is immensely helpful
> >   for knowing what and how epoll is watching target FDs.  I'm
> >   not sure if there's a way to introspect kqueue like that
> 
> Yeah, most likely there isn't, though I'm not quite sure
> what exactly "like that" entails.  Care to expand a bit upon
> the immense usefulness mentioned, i.e., how this helps you
> specifically?

Knowing which EVFILT_* and EV_* flags are in use for a
given target FD would be useful (analogous to the single
events: field printed in /proc/$pid/fdinfo/$epfd that
corresponds to struct epoll_event.events)
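
A rough sketch of reading that information on Linux, assuming the
`tfd: ... events: ... data: ...` line layout documented in proc(5), where
the events mask is printed in hexadecimal.  The function name
`parse_epoll_fdinfo` and the sample text are made up for illustration.

```python
import re

# Bit values from the Linux epoll UAPI header; not an exhaustive list.
EPOLL_BITS = {
    0x001: "EPOLLIN",
    0x002: "EPOLLPRI",
    0x004: "EPOLLOUT",
    0x008: "EPOLLERR",
    0x010: "EPOLLHUP",
    0x2000: "EPOLLRDHUP",
    1 << 30: "EPOLLONESHOT",
    1 << 31: "EPOLLET",
}

# One such line appears per watched target FD in /proc/$pid/fdinfo/$epfd.
_TFD_RE = re.compile(r"^tfd:\s*(\d+)\s+events:\s*([0-9a-fA-F]+)\s+data:")


def parse_epoll_fdinfo(text):
    """Map each target FD to the names of the epoll flags set on it."""
    watches = {}
    for line in text.splitlines():
        m = _TFD_RE.match(line)
        if not m:
            continue  # skip pos:, flags:, mnt_id:, ino: lines
        mask = int(m.group(2), 16)
        watches[int(m.group(1))] = [
            name for bit, name in EPOLL_BITS.items() if mask & bit
        ]
    return watches


# Hypothetical sample; real input would come from
# open(f"/proc/{pid}/fdinfo/{epfd}").read()
sample = (
    "pos:\t0\nflags:\t02\nmnt_id:\t15\n"
    "tfd:        5 events:       19 data:                5"
    "  pos:0 ino:61af sdev:7\n"
    "tfd:        7 events: 80000001 data:                7"
    "  pos:0 ino:3 sdev:8\n"
)
print(parse_epoll_fdinfo(sample))
```

The second sample entry has EPOLLET set, which is the kind of detail
that is hard to confirm from outside the process without such a view.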

> OpenBSD fstat(1) prints the kqueue memory addresses, so I
> suppose a sufficiently determined individual could get
> arbitrary info from the running kernel based on that,
> although at that point there are probably better ways to get
> the address than running fstat...

I'll have to remember that next time I need it, and RTFM for it.
I didn't know about the fstat(1) command until a few weeks ago
(horrible naming conflict with the fstat(2) syscall didn't
help with discovery)

<snip>  I'll keep the rest in mind next time I need it.

> > * Linux strace decodes more struct args info than kdump
> 
> strace is certainly more featureful, though in the specific
> case of kqueue/kevent I think kdump does show everything one
> would expect to see?

Ah, I think I was going off my FreeBSD experience, there;
OpenBSD does seem to decode sendmsg/recvmsg args well.
FreeBSD doesn't tell me which FDs are being sent/received
via SCM_RIGHTS, maybe that's improved in FreeBSD 14...

But yeah, still lots of work to do elsewhere; but OpenBSD seems
like an important driver in keeping Perl5 stable and
widely-installed.  *BSDs in general have been great at finding
bugs that might eventually impact my GNU/Linux systems.
