* imapd.t failing on OpenBSD, bisects to 13a2088c74fd (kqnotify: drop EV_CLEAR (edge triggering))
From: Štěpán Němec @ 2023-10-18 15:01 UTC
To: meta

(Bisected with HEAD at 042662948d804d24 (bad) and good at
62d50411dcc92cd (hadn't updated/run the tests for a few weeks.))

It's a bit of a mess, though, it doesn't fail completely
reliably and not always quite the same way.  Here's a log
of a few examples:

http://smrk.net/tmp/imapd.t.failures

Reverting 13a2088c74fd (readding EV_CLEAR) I got 12 passing
imapd.t runs in a row (as well as a full passing `make test`
run), removing EV_CLEAR again I got 3 passes, 1 fail, and in
the 5th run it just hung (another time it managed to
complete 10 imapd.t runs with just 2 fails and no hang).

It also leaves behind funny processes like this:

ooo# ps -f -U pi
  PID TT  STAT        TIME COMMAND
60002 p1  I+p      0:00.05 sh
69047 p1  R/1     20:06.56 perl: -watch quitting quitting (perl)
27951 p1  R/1     33:06.54 - perl: (perl)
 5020 p1  I        0:00.09 `-- /usr/local/bin/git --git-dir=watchimap/all.git -c core.abbrev=no
83900 p1  R/0     14:18.88 perl: -watch quitting quitting (perl)
85123 p1  R/0     27:28.68 - perl: UID:4 inbox.i1.0 imap://[::1]:21825 quitting quitting quitting quitting quitting quitting quitting quitting quitting quitting quitting qui
78162 p1  I        0:00.17 `-- /usr/local/bin/git --git-dir=watchimap/all.git -c core.abbrev=no

which, given the hangs, makes me wonder if it's bumping into
some kind of resource limit?

(These tests were run on an OpenBSD development snapshot,
not a release, but given that reverting the change makes the
problem disappear I hope that doesn't matter.)

-- 
Štěpán

* Re: imapd.t failing on OpenBSD, bisects to 13a2088c74fd (kqnotify: drop EV_CLEAR (edge triggering))
From: Eric Wong @ 2023-10-18 19:06 UTC
To: Štěpán Němec; +Cc: meta

Štěpán Němec <stepnem@smrk.net> wrote:
> (Bisected with HEAD at 042662948d804d24 (bad) and good at
> 62d50411dcc92cd (hadn't updated/run the tests for a few weeks.))
>
> It's a bit of a mess, though, it doesn't fail completely
> reliably and not always quite the same way.  Here's a log
> of a few examples:
>
> http://smrk.net/tmp/imapd.t.failures
>
> Reverting 13a2088c74fd (readding EV_CLEAR) I got 12 passing
> imapd.t runs in a row (as well as a full passing `make test`
> run), removing EV_CLEAR again I got 3 passes, 1 fail, and in
> the 5th run it just hung (another time it managed to
> complete 10 imapd.t runs with just 2 fails and no hang).

Odd, can you confirm this is with p5-IO-KQueue installed?
(it's really slow w/o since it needs to sleep).

I saw some similar failures the other week on NetBSD, couldn't
reproduce it, and I lost power at my VM host so later forgot
about it :x  Never seen such failures on FreeBSD, though.

> It also leaves behind funny processes like this:
>
> ooo# ps -f -U pi
>   PID TT  STAT        TIME COMMAND
> 60002 p1  I+p      0:00.05 sh
> 69047 p1  R/1     20:06.56 perl: -watch quitting quitting (perl)
> 27951 p1  R/1     33:06.54 - perl: (perl)
>  5020 p1  I        0:00.09 `-- /usr/local/bin/git --git-dir=watchimap/all.git -c core.abbrev=no
> 83900 p1  R/0     14:18.88 perl: -watch quitting quitting (perl)
> 85123 p1  R/0     27:28.68 - perl: UID:4 inbox.i1.0 imap://[::1]:21825 quitting quitting quitting quitting quitting quitting quitting quitting quitting quitting quitting qui
> 78162 p1  I        0:00.17 `-- /usr/local/bin/git --git-dir=watchimap/all.git -c core.abbrev=no
>
> which, given the hangs, makes me wonder if it's bumping into
> some kind of resource limit?

Parallel tests would increase the likelihood of limits being hit
(make check, make check-run, prove -j$N)

`make test' and 'prove -lwv' (w/o -j) are serial,
and `make check-run N=1' can force serial tests while
saving loading overhead.

> (These tests were run on an OpenBSD development snapshot,
> not a release, but given that reverting the change makes the
> problem disappear I hope that doesn't matter.)

I can't reproduce it on 7.3 (amd64), right now.
Haven't gotten around to 7.4...

* Re: imapd.t failing on OpenBSD, bisects to 13a2088c74fd (kqnotify: drop EV_CLEAR (edge triggering))
From: Štěpán Němec @ 2023-10-18 19:18 UTC
To: Eric Wong; +Cc: meta

On Wed, 18 Oct 2023 19:06:34 +0000
Eric Wong wrote:

> Odd, can you confirm this is with p5-IO-KQueue installed?

Yes, that's with p5-IO-KQueue-0.39.

> Parallel tests would increase the likelihood of limits being hit
> (make check, make check-run, prove -j$N)
>
> `make test' and 'prove -lwv' (w/o -j) are serial,
> and `make check-run N=1' can force serial tests while
> saving loading overhead.

Hm, thanks.  I've been only using 'make test' and prove
(no -j) so far, so that's not it.

-- 
Štěpán

* Re: imapd.t failing on OpenBSD, bisects to 13a2088c74fd (kqnotify: drop EV_CLEAR (edge triggering))
From: Eric Wong @ 2023-10-18 21:23 UTC
To: Štěpán Němec; +Cc: meta

Štěpán Němec <stepnem@smrk.net> wrote:
> Hm, thanks.  I've been only using 'make test' and prove
> (no -j) so far, so that's not it.

Alright, I've reverted it and reinstated EV_CLEAR use
(commit cbb4498df289f9874fc9475b86310958826360e8).

In my experience, EV_CLEAR and EPOLLET tend to exacerbate
sporadic problems like these, not cause them...

Though I am curious if it's a red herring or not...  If you have
spare cycles to test on 7.3 or 7.4, it'd be greatly appreciated
(but no obligations to do so)

I know there is some wonkiness in signal handling in
NetReader + (Mail::IMAPClient|Net::NNTP) code that needs to be
resolved.  NetBSD had sporadic failures with EINTR in tests
which needs to be fixed.  But I also don't know why it'd even
see EINTR on some tests...

AFAIK none of these problems affected FreeBSD.  I test and do
occasional development on FreeBSD significantly more than the
other BSDs, though.

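[Editorial note] For readers unfamiliar with the distinction being discussed: the commit title equates EV_CLEAR with edge triggering, and EPOLLET is the epoll flag that requests the same behaviour on Linux. The following is a minimal sketch of the difference, assuming Linux and Python's select.epoll as a stand-in for kqueue. It is not code from this project, and the helper name count_wakeups is made up for illustration. It leaves one unread byte in a pipe and counts how many consecutive non-blocking polls report it.

```python
import os
import select


def count_wakeups(edge_triggered: bool, polls: int = 3) -> int:
    """Count how many of `polls` consecutive epoll polls report the pipe
    as readable while one pending byte is never consumed."""
    r, w = os.pipe()
    ep = select.epoll()
    try:
        flags = select.EPOLLIN
        if edge_triggered:
            flags |= select.EPOLLET  # report only readiness *changes*
        ep.register(r, flags)
        os.write(w, b"x")  # a single readiness transition; data stays unread
        # timeout 0: return immediately with whatever is currently reported
        return sum(1 for _ in range(polls) if ep.poll(0))
    finally:
        ep.close()
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    print("level-triggered wakeups:", count_wakeups(False))
    print("edge-triggered wakeups: ", count_wakeups(True))
```

With level triggering every poll keeps reporting the unread byte; with edge triggering only the first poll does, so a consumer that fails to drain the descriptor after that one notification is never woken again. That is the general reason edge triggering tends to expose latent bugs rather than introduce them, though whether that is what happened in imapd.t is not established in this thread.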
* Re: imapd.t failing on OpenBSD, bisects to 13a2088c74fd (kqnotify: drop EV_CLEAR (edge triggering))
From: Štěpán Němec @ 2023-10-19 8:43 UTC
To: Eric Wong; +Cc: meta

On Wed, 18 Oct 2023 21:23:18 +0000
Eric Wong wrote:

> Though I am curious if it's a red herring or not...  If you have
> spare cycles to test on 7.3 or 7.4, it'd be greatly appreciated
> (but no obligations to do so)

I downgraded the VM to 7.3, ran tests (this time updated to
848dedde919 (lei: simplify startq/au_done wakeup
notifications), just with the EV_CLEAR re-removal on top),
then upgraded to 7.4, ran tests.  I see the same failure
pattern everywhere, so I really don't think the OpenBSD
version is a factor here.

(And again, if you want to have a look yourself, I'd be
happy to give you access to the machine; still the same
testing OC VM.)

-- 
Štěpán

* Re: imapd.t failing on OpenBSD, bisects to 13a2088c74fd (kqnotify: drop EV_CLEAR (edge triggering))
From: Eric Wong @ 2023-10-23 19:58 UTC
To: Štěpán Němec; +Cc: meta

Štěpán Němec <stepnem@smrk.net> wrote:
> On Wed, 18 Oct 2023 21:23:18 +0000
> Eric Wong wrote:
>
> > Though I am curious if it's a red herring or not...  If you have
> > spare cycles to test on 7.3 or 7.4, it'd be greatly appreciated
> > (but no obligations to do so)
>
> I downgraded the VM to 7.3, ran tests (this time updated to
> 848dedde919 (lei: simplify startq/au_done wakeup
> notifications), just with the EV_CLEAR re-removal on top),
> then upgraded to 7.4, ran tests.  I see the same failure
> pattern everywhere, so I really don't think the OpenBSD
> version is a factor here.

Thanks for the info.  Just curious, what HW specs (ncpus, RAM)
is available on that system?  I wonder if that affects timing
somehow...

> (And again, if you want to have a look yourself, I'd be
> happy to give you access to the machine; still the same
> testing OC VM.)

Unfortunately, my *BSD debugging knowledge is far behind my
Linux; so I'm not sure how much help it'd be...

Some examples of things I miss on OpenBSD:

* /proc/$PID/fdinfo/$FD_OF_EPOLL on Linux is immensely helpful
  for knowing what and how epoll is watching target FDs.  I'm
  not sure if there's a way to introspect kqueue like that

* Linux strace decodes more struct args info than kdump

* ability to control pathname of core dumps

... probably a few other things, but been sick a few days
and brain still foggy :<

* OpenBSD debugging
From: Štěpán Němec @ 2023-11-27 11:20 UTC
To: Eric Wong; +Cc: meta

I apologize for the late response.

On Mon, 23 Oct 2023 19:58:18 +0000
Eric Wong wrote:

> Thanks for the info.  Just curious, what HW specs (ncpus, RAM)
> is available on that system?  I wonder if that affects timing
> somehow...

dmesg:
https://dmesgd.nycbug.org/index.cgi?do=view&id=7357
ncpus = 2 (so it's running the MP (multiprocessor) kernel), 1GB RAM

>> (And again, if you want to have a look yourself, I'd be
>> happy to give you access to the machine; still the same
>> testing OC VM.)
>
> Unfortunately, my *BSD debugging knowledge is far behind my
> Linux; so I'm not sure how much help it'd be...

If nothing else, you could do some tests on an otherwise
idle machine with good Internet connectivity (unless the
connection issues you keep mentioning are mainly on your
end, that is).

> Some examples of things I miss on OpenBSD:
>
> * /proc/$PID/fdinfo/$FD_OF_EPOLL on Linux is immensely helpful
>   for knowing what and how epoll is watching target FDs.  I'm
>   not sure if there's a way to introspect kqueue like that

Yeah, most likely there isn't, though I'm not quite sure
what exactly "like that" entails.  Care to expand a bit upon
the immense usefulness mentioned, i.e., how this helps you
specifically?

OpenBSD fstat(1) prints the kqueue memory addresses, so I
suppose a sufficiently determined individual could get
arbitrary info from the running kernel based on that,
although at that point there are probably better ways to get
the address than running fstat...

As for existing tools I'm aware of, there's ddb(4) which can
dump structures etc. (it has access to kernel symbols), but
it's not very convenient for casual debugging/introspection,
as it stops everything until you continue from the kernel
debugger, so it will mess up the clock etc.

Then there's bt(5)/btrace(8), which is a bpftrace clone.
It's a work in progress and nowhere near Linux
feature-/coverage-wise, but when it works it's nice.  AFAIK
the most you can currently get from it by default is entry
and return for syscalls and a couple dozen static
tracepoints.  It's possible to enable entry/return for all
kernel functions with a custom kernel (which, depending on
circumstances, isn't as bad as it sounds: compiling an
OpenBSD kernel from scratch is a matter of (tens of) minutes
even on a weak machine; it took about 40 minutes in the
above VM, single-threaded).  Unfortunately there's no
support for arbitrary argument access (though it seems to be
on TODO), you need to add a custom tracepoint for that
(which can be easy enough, e.g.
<https://flak.tedunangst.com/post/probing-my-ssds-latency>,
but again requires a kernel compile).

> * Linux strace decodes more struct args info than kdump

strace is certainly more featureful, though in the specific
case of kqueue/kevent I think kdump does show everything one
would expect to see?

> * ability to control pathname of core dumps

Yeah, the way the OpenBSD knobs have evolved (i.e., sane
behavior attainable only for processes with altered U/GID)
seems pretty weird, and even though I suspect at least some
developers would be able to entertain the thought that the
situation isn't optimal (despite the way the recent misc@
thread you participated in turned out), I don't see a good
way to improve it without some redesign, i.e. breaking
backwards compatibility (adding further knobs on top of
kern.nosuidcoredump would make matters even messier IMO).

That said, if it's critical for your use case, I think we
could patch it locally in the VM easily enough, now that
I've set it up for kernel compilation anyway.  Same for any
custom bt tracepoints or other adjustments I'd be able to
help with.

In summary, if you ever feel the VM could be of use, just
let me know; if you consider your resources better spent
elsewhere I certainly understand.

-- 
Štěpán

* Re: OpenBSD debugging
From: Eric Wong @ 2023-11-29 22:38 UTC
To: Štěpán Němec; +Cc: meta

Štěpán Němec <stepnem@smrk.net> wrote:
>
> I apologize for the late response.

No worries, I still have mails in other places from months ago
I've been meaning to get to :x

> On Mon, 23 Oct 2023 19:58:18 +0000
> Eric Wong wrote:
>
> > Thanks for the info.  Just curious, what HW specs (ncpus, RAM)
> > is available on that system?  I wonder if that affects timing
> > somehow...
>
> dmesg:
> https://dmesgd.nycbug.org/index.cgi?do=view&id=7357
> ncpus = 2 (so it's running the MP (multiprocessor) kernel), 1GB RAM

Alright, will keep that in mind.  OpenBSD doesn't seem to
benefit from having many cores and I stress out about the test
suite taking ~30s on my fastest HW.

> > Unfortunately, my *BSD debugging knowledge is far behind my
> > Linux; so I'm not sure how much help it'd be...
>
> If nothing else, you could do some tests on an otherwise
> idle machine with good Internet connectivity (unless the
> connection issues you keep mentioning are mainly on your
> end, that is).

Yeah, it's mainly on my end, but seems improved in the past 2
weeks or so.

> > Some examples of things I miss on OpenBSD:
> >
> > * /proc/$PID/fdinfo/$FD_OF_EPOLL on Linux is immensely helpful
> >   for knowing what and how epoll is watching target FDs.  I'm
> >   not sure if there's a way to introspect kqueue like that
>
> Yeah, most likely there isn't, though I'm not quite sure
> what exactly "like that" entails.  Care to expand a bit upon
> the immense usefulness mentioned, i.e., how this helps you
> specifically?

Knowing which EVFILT_* and EV_* flags are in use for a given
target FD would be useful (analogous to the single events: field
printed in /proc/$pid/fdinfo/$epfd that corresponds to
struct epoll_event.events)

> OpenBSD fstat(1) prints the kqueue memory addresses, so I
> suppose a sufficiently determined individual could get
> arbitrary info from the running kernel based on that,
> although at that point there are probably better ways to get
> the address than running fstat...

I'll have to remember that next time I need to and RTFM for it.
I didn't know about the fstat(1) command until a few weeks ago
(horrible naming conflict with the fstat(2) syscall didn't help
with discovery)

<snip>

I'll keep the rest in mind next time I need it.

> > * Linux strace decodes more struct args info than kdump
>
> strace is certainly more featureful, though in the specific
> case of kqueue/kevent I think kdump does show everything one
> would expect to see?

Ah, I think I was going off my FreeBSD experience, there;
OpenBSD does seem to decode sendmsg/recvmsg args well.  FreeBSD
doesn't tell me which FDs are being sent/received via
SCM_RIGHTS, maybe that's improved in FreeBSD 14...

But yeah, still lots of work to do elsewhere; but OpenBSD seems
like an important driver in keeping Perl5 stable and
widely-installed.  *BSDs in general have been great at finding
bugs that might eventually impact my GNU/Linux systems.

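[Editorial note] To make the Linux side of the comparison concrete, here is a minimal sketch of reading the per-target "events:" field that the message above refers to. It assumes the "tfd: ... events: ... data: ..." line layout documented in proc(5), with the mask printed in hexadecimal. The function names and the SAMPLE text are hypothetical, made up for illustration and not captured from a real process. The bit values are copied from Linux's <sys/epoll.h> so the sketch also runs where Python lacks select.EPOLL*. Nothing here applies to kqueue; no equivalent OpenBSD interface is claimed.

```python
import re

# Bit values from Linux's <sys/epoll.h>, listed in ascending order.
EPOLL_FLAGS = {
    0x001: "EPOLLIN",
    0x002: "EPOLLPRI",
    0x004: "EPOLLOUT",
    0x008: "EPOLLERR",
    0x010: "EPOLLHUP",
    0x2000: "EPOLLRDHUP",
    1 << 28: "EPOLLEXCLUSIVE",
    1 << 29: "EPOLLWAKEUP",
    1 << 30: "EPOLLONESHOT",
    1 << 31: "EPOLLET",
}

# Matches e.g. "tfd:        9 events:       19 data: ..."
TFD_LINE = re.compile(r"^tfd:\s*(\d+)\s+events:\s*([0-9a-fA-F]+)\s")


def decode_events(mask):
    """Return the names of the epoll bits set in `mask`."""
    return [name for bit, name in EPOLL_FLAGS.items() if mask & bit]


def parse_epoll_fdinfo(text):
    """Map each watched target FD to the names of its epoll event bits."""
    watched = {}
    for line in text.splitlines():
        m = TFD_LINE.match(line)
        if m:
            watched[int(m.group(1))] = decode_events(int(m.group(2), 16))
    return watched


# Hypothetical fdinfo contents for an epoll FD watching two targets.
SAMPLE = """\
pos:\t0
flags:\t02
mnt_id:\t15
tfd:        9 events:       19 data:                9
tfd:        7 events: 8000001d data:                7
"""

if __name__ == "__main__":
    for fd, names in sorted(parse_epoll_fdinfo(SAMPLE).items()):
        print(fd, "|".join(names))
```

On a Linux system the same parser could be fed the contents of /proc/$pid/fdinfo/$epfd for a live process. The second sample line shows how an edge-triggered registration (EPOLLET) becomes visible this way, which is the kind of per-FD visibility described as missing for kqueue.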