From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: X-Spam-Status: No, score=-3.7 required=3.0 tests=ALL_TRUSTED,AWL,BAYES_00, T_SCC_BODY_TEXT_LINE,URI_NOVOWEL shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from localhost (dcvr.yhbt.net [127.0.0.1]) by dcvr.yhbt.net (Postfix) with ESMTP id 6ED771F670; Thu, 17 Feb 2022 21:02:33 +0000 (UTC) Date: Thu, 17 Feb 2022 21:02:33 +0000 From: Eric Wong To: Julien Moutinho Cc: Dominique Martinet , meta@public-inbox.org Subject: [PATCH] t/lei-sigpipe: attempt to improve diagnostics for stuck test Message-ID: <20220217210233.GA31923@dcvr> References: <20211208010730.f47xxgzj53nwgvja@sourcephile.fr> <20211208040836.GA27368@dcvr> <20211209013743.okzgim7bbrpahks7@sourcephile.fr> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable In-Reply-To: <20211209013743.okzgim7bbrpahks7@sourcephile.fr> List-Id: Julien Moutinho wrote: > With Inline::C, and with Nix's sandbox > -------------------------------------- >=20 > Using (in pkgs/servers/mail/public-inbox/default.nix): > > preCheck =3D '' > > perl certs/create-certs.perl > > export HOME=3D$(mktemp -d) > > mkdir -p $HOME/.cache/public-inbox/inline-c > > ''; >=20 > $ nix -L build -f . public-inbox > > [...] > > perl5.34.0-public-inbox> t/lei-refresh-mail-sync.t .... ok > Hangs like forever after this line, while on t/lei-sigpipe.t: >=20 > $ ps auxwwf > > root 1755997 0.0 0.0 453172 2800 ? Ssl d=C3=A9c.04 0:0= 3 nix-daemon --daemon > > root 1811429 0.0 0.0 453652 1928 ? Ssl d=C3=A9c.04 0:0= 0 \_ nix-daemon 1757028 > > root 1811455 0.0 0.0 453652 2004 ? Ssl d=C3=A9c.04 0:0= 0 \_ nix-daemon 1757031 > > root 1839837 0.0 0.0 453652 1832 ? Ssl d=C3=A9c.04 0:0= 0 \_ nix-daemon 1757027 > > root 1839841 0.0 0.0 453652 1996 ? Ssl d=C3=A9c.04 0:0= 0 \_ nix-daemon 1757030 > > root 1839856 0.0 0.0 453652 1832 ? Ssl d=C3=A9c.04 0:0= 0 \_ nix-daemon 1757029 > > root 3135741 0.0 0.2 603352 17080 ? Ssl 01:05 0:00 \_ = nix-daemon 3135672 > > nixbld1 3137110 0.0 0.0 5200 4072 ? Ss 01:05 0:01 = \_ bash -e /nix/store/qf3mzpvsmkrw963xchbivcci06078n13-builder.sh > > nixbld1 3140619 0.0 0.0 4996 4012 ? S 01:05 0:00 = \_ make SHELL=3D/nix/store/l0wlqpbsvh1pgvhcdhw7qkka3d31si7k-bash-5.1-p= 8/bin/bash VERBOSE=3Dy test > > nixbld1 3140633 0.0 0.1 17188 13352 ? S 01:05 0:01 = | \_ /nix/store/vslsa0l17xjcrdgm2knwj0z5hlvf73m7-perl-5.34.0/bin/per= l -MExtUtils::Command::MM -MTest::Harness -e undef *Test::Harness::Switches= ; test_harness(0, 'blib/lib', 'blib/arch') t/admin.t t/altid.t t/altid_v2.t= t/cgi.t t/cmd_ipc.t t/content_hash.t t/dir_idle.t t/edit.t t/eml_content_d= isposition.t t/eml_content_type.t t/epoll.t t/extindex-psgi.t t/extsearch.t= t/fake_inotify.t t/feed.t t/gcf2.t t/gcf2_client.t t/httpd.t t/idx_stack.t= t/imap.t t/imap_searchqp.t t/imapd-tls.t t/imapd.t t/inbox_idle.t t/index-= git-times.t t/indexlevels-mirror.t t/ipc.t t/kqnotify.t t/lei-auto-watch.t = t/lei-convert.t t/lei-daemon.t t/lei-export-kw.t t/lei-externals.t t/lei-im= port-http.t t/lei-import-imap.t t/lei-import-maildir.t t/lei-import-nntp.t = t/lei-import.t t/lei-index.t t/lei-inspect.t t/lei-lcat.t t/lei-p2q.t t/lei= -q-kw.t t/lei-q-remote-import.t t/lei-q-save.t t/lei-q-thread.t t/lei-refre= sh-mail-sync.t t/lei-sigpipe.t t/lei-tag.t t/lei-up.t t/lei-watch.t t/lei.t= t/lei_dedupe.t t/lei_external.t t/lei_lcat.t t/lei_mail_sync.t t/lei_overv= iew.t t/lei_saved_search.t t/lei_store.t t/lei_to_mail.t t/lei_xsearch.t t/= mbox_lock.t t/mbox_reader.t t/mdir_reader.t t/mime.t t/miscsearch.t t/net_r= eader-imap.t t/nntpd-tls.t t/nntpd.t t/nodatacow.t t/on_destroy.t t/plack.t= t/psgi_attach.t t/psgi_bad_mids.t t/psgi_mount.t t/psgi_multipart_not.t t/= psgi_scan_all.t t/psgi_search.t t/psgi_v2.t t/rename_noreplace.t t/reply.t = t/search-thr-index.t t/shared_kv.t t/solver_git.t t/thread-index-gap.t t/ur= i_imap.t t/uri_nntps.t t/v2dupindex.t t/www_altid.t t/xcpdb-reshard.t t/www= _static.t t/watch_multiple_headers.t t/watch_maildir_v2.t t/watch_maildir.t= t/watch_imap.t t/watch_filter_rubylang.t t/view.t t/v2reindex.t t/v2mirror= =2Et t/v2mda.t t/v2index-late-dupe.t t/v2-add-remove-add.t t/v1reindex.t t/= v1-add-remove-add.t t/time.t t/thread-cycle.t t/spamcheck_spamc.t t/sigfd.t= t/replace.t t/reindex-time-range.t t/qspawn.t t/purge.t t/psgi_text.t t/pr= echeck.t t/over.t t/nulsubject.t t/nntpd-v2.t t/nntp.t t/multi-mid.t t/msgt= ime.t t/msgmap.t t/msg_iter.t t/mid.t t/mda_filter_rubylang.t t/mda.t t/lin= kify.t t/init.t t/indexlevels-mirror-v1.t t/inbox.t t/import.t t/imap_track= er.t t/hval.t t/httpd-unix.t t/httpd-https.t t/httpd-corner.t t/gzip_filter= =2Et t/git.t t/filter_vger.t t/filter_subjecttag.t t/filter_rubylang.t t/fi= lter_mirror.t t/filter_base.t t/emergency.t t/ds-poll.t t/ds-leak.t t/ds-kq= xs.t t/config_limiter.t t/config.t t/address.t > > nixbld1 3145719 0.0 0.4 44576 35640 ? S 01:09 0:01 = | \_ t/lei-sigpipe.t > > nixbld1 3145743 0.0 0.1 14388 10388 ? S 01:09 0:00 = | \_ /nix/store/vslsa0l17xjcrdgm2knwj0z5hlvf73m7-perl-5.34.0= /bin/perl -w -I/nix/store/vslsa0l17xjcrdgm2knwj0z5hlvf73m7-perl-5.34.0/lib/= perl5/site_perl -I/nix/store/1hdx7bxjwqrfnalalkgbwi32l45h8z7b-perl5.34.0-Ma= il-IMAPClient-3.42/lib/perl5/site_perl -I/nix/st blib/script/lei q -q -t z:= 1.. > > nixbld1 3145724 0.0 0.5 50504 42160 ? S 01:09 0:00 = \_ lei-daemon /build/pi-lei-sigpipe-7522-WoHO/lei-daemon/xdg_run/lei/5= =2Eseq.sock > > nixbld1 3145730 0.0 0.4 50092 39736 ? S 01:09 0:00 = \_ lei/store /build/pi-lei-sigpipe-7522-WoHO/lei-daemon/.local/sha= re >=20 > t/lei-sigpipe.t is on: > > 01:30:29.508334 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=3D1, tv_nsec= =3D0}, 0x7fffffff5090) =3D 0 <1.000360> > > 01:30:30.509837 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=3D1, tv_nsec= =3D0}, 0x7fffffff5090) =3D 0 <1.000186> > > 01:30:31.510259 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=3D1, tv_nsec= =3D0}, 0x7fffffff5090) =3D 0 <1.000175> > > [...] Huh? but it's stuck in sysread() as noted below... I'm not sure where clock_nanosleep(2) calls are coming from with a 1s interval. I doubt I'd ever loop clock_nanosleep(CLOCK_REALTIME no matter how messed up my brain gets :P I do use nanosleep(2) with a 100ms interval on sendmsg ENOBUFS|ENOMEM|ETOOMANYREFS in Spawn.pm and that's a different syscall. From what I can tell, Mail::IMAPClient doesn't clock_nanosleep, nor does Perl itself... Sleeping on CLOCK_REALTIME seems wrong outside of a cron-like scheduler... > lei-daemon store is on: > > 01:31:03.834708 epoll_wait(5, OK, that's normal. > Looks like it is waiting on sysread() in > https://public-inbox.org/public-inbox.git/tree/t/lei-sigpipe.t#n35 OK, that makes more sense... > This test does succeed outside Nix's sandbox: > $ (cd public-inbox-1.7.0; export PERL_INLINE_DIRECTORY=3D$PWD/inline-c; r= m -rf $PERL_INLINE_DIRECTORY; mkdir $PERL_INLINE_DIRECTORY; prove -bvw t/le= i-sigpipe.t ) OK, > More surprisingly, it even succeeds when run manually > inside the hanging Nix sandbox: OK, > Even more strange, Dominique was able to reproduce > the hang this morning, but no longer tonight.. It's been a while, and I can't reproduce it. Maybe this patch will help with diagnosis: ------------8<--------- Subject: [PATCH] t/lei-sigpipe: attempt to improve diagnostics for stuck te= st This may help diagnose a difficult-to-reproduce test failure on NixOS. Link: https://public-inbox/meta/20211209013743.okzgim7bbrpahks7@sourcephile= =2Efr/ --- t/lei-sigpipe.t | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/t/lei-sigpipe.t b/t/lei-sigpipe.t index d9738b07..6b2772a6 100644 --- a/t/lei-sigpipe.t +++ b/t/lei-sigpipe.t @@ -27,11 +27,17 @@ EOM } =20 lei_ok(qw(import), $f) if $imported++ =3D=3D 0; - open my $errfh, '+>', "$ENV{HOME}/stderr.log" or xbail $!; + open my $errfh, '+>>', "$ENV{HOME}/stderr.log" or xbail $!; my $opt =3D { run_mode =3D> 0, 2 =3D> $errfh, 1 =3D> $w }; my $cmd =3D [qw(lei q -q -t), @$out, 'z:1..']; my $tp =3D start_script($cmd, undef, $opt); close $w; + vec(my $rvec =3D '', fileno($r), 1) =3D 1; + if (!select($rvec, undef, undef, 30)) { + seek($errfh, 0, 0) or xbail $!; + my $s =3D do { local $/; <$errfh> }; + xbail "lei q had no output after 30s, stderr=3D$s"; + } is(sysread($r, my $buf, 1), 1, 'read one byte'); close $r; # trigger SIGPIPE $tp->join;