* bug#24201: 25.1.50; TLS connections sometimes hang @ 2016-08-11 13:20 Lars Ingebrigtsen 2016-08-15 2:35 ` Paul Eggert 0 siblings, 1 reply; 46+ messages in thread From: Lars Ingebrigtsen @ 2016-08-11 13:20 UTC (permalink / raw) To: 24201 A couple of weeks ago, TLS connections started making Emacs hang. I've yet to find a way to reproduce it reliably, but it seems to involve opening an async TLS connection while opening a synchronous TLS connection, or something. strace in these instances shows an infinite loop like this: [pid 28250] pselect6(27, [4 6 8 9 10 14 15 16 17 18 19 20 22 24 26], [], NULL, {0, 0}, {NULL, 8}) = 1 (in [20], left {0, 0}) [pid 28250] pselect6(27, [4 6 8 9 10 14 15 16 17 18 19 20 22 24 26], [], NULL, {0, 0}, {NULL, 8}) = 1 (in [20], left {0, 0}) [pid 28250] pselect6(27, [4 6 8 9 10 14 15 16 17 18 19 20 22 24 26], [], NULL, {0, 0}, {NULL, 8}) = 1 (in [20], left {0, 0}) [pid 28250] pselect6(27, [4 6 8 9 10 14 15 16 17 18 19 20 22 24 26], [], NULL, {0, 0}, {NULL, 8}) = 1 (in [20], left {0, 0}) Does this ring a bell? If not, I'll try to debug it further. I think it started happening like two or three weeks ago. In GNU Emacs 25.1.50.2 (x86_64-unknown-linux-gnu, GTK+ Version 3.14.5) of 2016-07-20 built on stories Repository revision: 60dd094a8c7bdbbff121c99f56f42910534e7cc1 Windowing system distributor 'The X.Org Foundation', version 11.0.11604000 System Description: Debian GNU/Linux 8.5 (jessie) Recent messages: Quit opening connection to news.gmane.org Quit [2 times] Mark set Opening nntp server on news.gmane.org...done Mark set Opening nntp server on news.gmane.org...done Quit Quit opening connection to news.gmane.org Quit Making completion list... Configured features: XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND GPM DBUS GCONF GSETTINGS NOTIFY ACL LIBSELINUX GNUTLS LIBXML2 FREETYPE M17N_FLT LIBOTF XFT ZLIB TOOLKIT_SCROLL_BARS GTK3 X11 Important settings: value of $LANG: en_US.UTF-8 locale-coding-system: utf-8-unix Major mode: Group Minor modes in effect: gnus-agent-group-mode: t shell-dirtrack-mode: t diff-auto-refine-mode: t gnus-topic-mode: t gnus-undo-mode: t global-whitespace-mode: t tooltip-mode: t global-eldoc-mode: t electric-indent-mode: t mouse-wheel-mode: t file-name-shadow-mode: t global-font-lock-mode: t font-lock-mode: t blink-cursor-mode: t auto-composition-mode: t auto-encryption-mode: t auto-compression-mode: t buffer-read-only: t line-number-mode: t Load-path shadows: /home/larsi/src/cddb.el/expect hides /home/larsi/lisp/expect /home/larsi/src/cddb.el/captitle hides /home/larsi/lisp/captitle /home/larsi/src/clock.el/clock hides /home/larsi/lisp/clock ~/pgnus/contrib/vcard hides /home/larsi/lisp/vcard /home/larsi/src/pvr.el/pvr hides /home/larsi/lisp/pvr ~/lisp/zenirc-2.112/src/zenirc-example hides /home/larsi/lisp/zenirc-example ~/pgnus/contrib/compface hides /home/larsi/src/emacs/trunk/lisp/image/compface Features: (shadow emacsbug perl-mode org-element org-rmail org-mhe org-irc org-info org-gnus org-docview org-bibtex bibtex org-bbdb org-w3m org org-macro org-footnote org-pcomplete org-list org-faces org-entities org-version ob-emacs-lisp ob ob-tangle ob-ref ob-lob ob-table ob-exp org-src ob-keys ob-comint ob-core ob-eval org-compat org-macs org-loaddefs cal-menu calendar cal-loaddefs ffap tabify imenu man sh-script smie executable imap cc-mode cc-fonts cc-guess cc-menus cc-cmds cc-styles cc-align cc-engine cc-vars cc-defs smerge-mode crm macros eieio-opt speedbar sb-image ezimage dframe etags texinfo conf-mode nndir nnspool dired-aux grep compile vc-bzr vc-src vc-sccs vc-svn vc-rcs vc-dir ewoc bug-reference noutline outline edebug log-edit previews json shell pcomplete comint canlock flow-fill vc-annotate log-view pcvs-util vc vc-dispatcher eww pp vc-git diff-mode map pulse find-func thingatpt xref project ring misearch multi-isearch mailalias smtpmail sendmail ecomplete copyright vc-cvs shr-color color gnus-html help-fns radix-tree sort gnus-cite smiley ansi-color url-queue url-cache mm-archive gnus-async gnus-dup qp gnus-ml gmane spam-gmane dns mm-url disp-table gnus-fun gnus-mdrtn pop3 nndoc nnmbox utf-7 gnus-topic nnml nnfolder network-stream starttls nnir spam-report spam spam-stat gnus-uu yenc gnus-delay gnus-draft gnus-agent gnus-srvr gnus-score score-mode nnvirtual nntp gnus-cache gnus-msg gnus-art mm-uu mml2015 mm-view mml-smime smime dig gnus-sum nndraft nnmh gnus-group gnus-undo gnus-start gnus-cloud nnimap utf7 netrc nnoo parse-time gnus-spec gnus-win nnmail gnus-int gnus-range mail-source message format-spec rfc822 mml mml-sec epa epg mailabbrev gmm-utils mailheader gnus nnheader gnus-util rmail rmail-loaddefs mail-utils whitespace movie mkv shr svg imdb dom pvr debug debbugs-gnu easy-mmode derived debbugs soap-client mm-decode mm-bodies mm-encode url-http tls gnutls url-auth mail-parse rfc2231 rfc2047 rfc2045 mm-util ietf-drums mail-prsvr url-gw nsm puny url url-proxy url-privacy url-expand url-methods url-history url-cookie url-domsuf url-util mailcap warnings rng-xsd rng-dt rng-util xsd-regexp xml ido flyspell ispell benchmark w3m browse-url doc-view subr-x dired dired-loaddefs image-mode timezone w3m-hist w3m-fb w3m-ems wid-edit w3m-ccl ccl w3m-favicon w3m-image w3m-proc w3m-util add-log mail-extr jka-compr cl finder-inf package epg-config url-handlers url-parse auth-source cl-seq eieio eieio-core cl-macs eieio-loaddefs password-cache url-vars seq byte-opt gv bytecomp byte-compile cl-extra help-mode easymenu cconv cl-loaddefs pcase cl-lib time-date mule-util tooltip eldoc electric uniquify ediff-hook vc-hooks lisp-float-type mwheel term/x-win x-win term/common-win x-dnd tool-bar dnd fontset image regexp-opt fringe tabulated-list newcomment elisp-mode lisp-mode prog-mode register page menu-bar rfn-eshadow timer select scroll-bar mouse jit-lock font-lock syntax facemenu font-core term/tty-colors frame cl-generic cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech european ethiopic indian cyrillic chinese charscript case-table epa-hook jka-cmpr-hook help simple abbrev obarray minibuffer cl-preloaded nadvice loaddefs button faces cus-face macroexp files text-properties overlay sha1 md5 base64 format env code-pages mule custom widget hashtable-print-readable backquote dbusbind inotify dynamic-setting system-font-setting font-render-setting move-toolbar gtk x-toolkit x multi-tty make-network-process emacs) Memory information: ((conses 16 3259686 974810) (symbols 48 216414 408) (miscs 40 3192 5495) (strings 32 609506 170950) (string-bytes 1 57242968) (vectors 16 95117) (vector-slots 8 2432568 99890) (floats 8 10530 14727) (intervals 56 472696 10642) (buffers 976 311) (heap 1024 591462 804639)) -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2016-08-11 13:20 bug#24201: 25.1.50; TLS connections sometimes hang Lars Ingebrigtsen @ 2016-08-15 2:35 ` Paul Eggert 2016-10-10 10:36 ` Eli Zaretskii 0 siblings, 1 reply; 46+ messages in thread From: Paul Eggert @ 2016-08-15 2:35 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: 24201 > A couple of weeks ago, TLS connections started making Emacs hang. I installed several changes on August 3. They were in the area of nonblocking GnuTLS connections (Bug#23982) and perhaps could explain your problem. See the following commits: 967e2ef61dab8d2046f3285eefa71f3dcb9d9b60 f0b5d4bd3e17bc0a464b9ec69507413824568f9e 744cc9fa17d20450fc3cf1ca6bb71869abeeb77b 1a8d31123698ccf6f165e49fcfe16631d07a7aea 64edd14ca9cf6d1b957316a9a547e482c9ba811b On August 2 I also installed a fix for a use-after-close problem that could be relevant. See: 53917616b0aca43b9ca85531d4dcb426885bdb73 a066fb1ceee373c982214c28206108c5fba01bf7 ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2016-08-15 2:35 ` Paul Eggert @ 2016-10-10 10:36 ` Eli Zaretskii 2016-10-10 10:54 ` Lars Ingebrigtsen 0 siblings, 1 reply; 46+ messages in thread From: Eli Zaretskii @ 2016-10-10 10:36 UTC (permalink / raw) To: Paul Eggert; +Cc: larsi, 24201 > From: Paul Eggert <eggert@cs.ucla.edu> > Date: Sun, 14 Aug 2016 19:35:45 -0700 > Cc: 24201@debbugs.gnu.org > > > A couple of weeks ago, TLS connections started making Emacs hang. > > I installed several changes on August 3. They were in the area of nonblocking > GnuTLS connections (Bug#23982) and perhaps could explain your problem. See the > following commits: > > 967e2ef61dab8d2046f3285eefa71f3dcb9d9b60 > f0b5d4bd3e17bc0a464b9ec69507413824568f9e > 744cc9fa17d20450fc3cf1ca6bb71869abeeb77b > 1a8d31123698ccf6f165e49fcfe16631d07a7aea > 64edd14ca9cf6d1b957316a9a547e482c9ba811b > > On August 2 I also installed a fix for a use-after-close problem that could be > relevant. See: > > 53917616b0aca43b9ca85531d4dcb426885bdb73 > a066fb1ceee373c982214c28206108c5fba01bf7 Should this bug be closed? Lars, do you still see the problem, and if you do, on what branch? Thanks. ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2016-10-10 10:36 ` Eli Zaretskii @ 2016-10-10 10:54 ` Lars Ingebrigtsen 2016-10-10 11:23 ` Eli Zaretskii 0 siblings, 1 reply; 46+ messages in thread From: Lars Ingebrigtsen @ 2016-10-10 10:54 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 24201, Paul Eggert Eli Zaretskii <eliz@gnu.org> writes: > Should this bug be closed? Lars, do you still see the problem, and if > you do, on what branch? I still see this problem, and it's on the master branch. -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2016-10-10 10:54 ` Lars Ingebrigtsen @ 2016-10-10 11:23 ` Eli Zaretskii 2017-09-02 12:40 ` Eli Zaretskii 0 siblings, 1 reply; 46+ messages in thread From: Eli Zaretskii @ 2016-10-10 11:23 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: 24201, eggert > From: Lars Ingebrigtsen <larsi@gnus.org> > Cc: Paul Eggert <eggert@cs.ucla.edu>, 24201@debbugs.gnu.org > Date: Mon, 10 Oct 2016 12:54:36 +0200 > > Eli Zaretskii <eliz@gnu.org> writes: > > > Should this bug be closed? Lars, do you still see the problem, and if > > you do, on what branch? > > I still see this problem, and it's on the master branch. OK, thanks. I moved the bug to block the release of 26.1, which will be the first version released from what is now master. ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2016-10-10 11:23 ` Eli Zaretskii @ 2017-09-02 12:40 ` Eli Zaretskii 2017-09-02 13:04 ` Lars Ingebrigtsen 0 siblings, 1 reply; 46+ messages in thread From: Eli Zaretskii @ 2017-09-02 12:40 UTC (permalink / raw) To: larsi; +Cc: 24201, eggert > Date: Mon, 10 Oct 2016 14:23:43 +0300 > From: Eli Zaretskii <eliz@gnu.org> > Cc: 24201@debbugs.gnu.org, eggert@cs.ucla.edu > > > From: Lars Ingebrigtsen <larsi@gnus.org> > > Cc: Paul Eggert <eggert@cs.ucla.edu>, 24201@debbugs.gnu.org > > Date: Mon, 10 Oct 2016 12:54:36 +0200 > > > > Eli Zaretskii <eliz@gnu.org> writes: > > > > > Should this bug be closed? Lars, do you still see the problem, and if > > > you do, on what branch? > > > > I still see this problem, and it's on the master branch. > > OK, thanks. I moved the bug to block the release of 26.1, which will > be the first version released from what is now master. Any news on this one? It's been almost a year since the last communication about this issue; if we still have the problem, why is no one else seeing it? Thanks. ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2017-09-02 12:40 ` Eli Zaretskii @ 2017-09-02 13:04 ` Lars Ingebrigtsen 2017-09-02 14:21 ` Eli Zaretskii 0 siblings, 1 reply; 46+ messages in thread From: Lars Ingebrigtsen @ 2017-09-02 13:04 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 24201, eggert Eli Zaretskii <eliz@gnu.org> writes: >> Date: Mon, 10 Oct 2016 14:23:43 +0300 >> From: Eli Zaretskii <eliz@gnu.org> >> Cc: 24201@debbugs.gnu.org, eggert@cs.ucla.edu >> >> > From: Lars Ingebrigtsen <larsi@gnus.org> >> > Cc: Paul Eggert <eggert@cs.ucla.edu>, 24201@debbugs.gnu.org >> > Date: Mon, 10 Oct 2016 12:54:36 +0200 >> > >> > Eli Zaretskii <eliz@gnu.org> writes: >> > >> > > Should this bug be closed? Lars, do you still see the problem, and if >> > > you do, on what branch? >> > >> > I still see this problem, and it's on the master branch. >> >> OK, thanks. I moved the bug to block the release of 26.1, which will >> be the first version released from what is now master. > > Any news on this one? It's been almost a year since the last > communication about this issue; if we still have the problem, why is > no one else seeing it? I still haven't found time to debug it, but I see the hangs daily (on builds on different machines). I'd guess few other people see this because they don't use the network as intensely from Emacs (you need to have two TLS connections being set up "simultaneously" to see the hang). But I don't think this should be a release blocker, precisely because so few people see the bug. -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2017-09-02 13:04 ` Lars Ingebrigtsen @ 2017-09-02 14:21 ` Eli Zaretskii 2018-02-18 17:57 ` Lars Ingebrigtsen 0 siblings, 1 reply; 46+ messages in thread From: Eli Zaretskii @ 2017-09-02 14:21 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: 24201, eggert unblock 24655 by 24201 thanks > From: Lars Ingebrigtsen <larsi@gnus.org> > Cc: 24201@debbugs.gnu.org, eggert@cs.ucla.edu > Date: Sat, 02 Sep 2017 15:04:31 +0200 > > > Any news on this one? It's been almost a year since the last > > communication about this issue; if we still have the problem, why is > > no one else seeing it? > > I still haven't found time to debug it, but I see the hangs daily (on > builds on different machines). I'd guess few other people see this > because they don't use the network as intensely from Emacs (you need to > have two TLS connections being set up "simultaneously" to see the hang). > > But I don't think this should be a release blocker, precisely because so > few people see the bug. OK, I'm unblocking Emacs 26.1 due to this. Hopefully, you will find tome to debug this at some point. Thanks. ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2017-09-02 14:21 ` Eli Zaretskii @ 2018-02-18 17:57 ` Lars Ingebrigtsen 2018-02-19 16:52 ` Lars Ingebrigtsen 0 siblings, 1 reply; 46+ messages in thread From: Lars Ingebrigtsen @ 2018-02-18 17:57 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 24201, eggert OK, I finally found a way to semi-reliably reproduce one of these hangs I've been seeing. The symptoms are that I move from one Gnus newsgroup to another, and in certain circumstances Emacs will then hang forever. `C-g' does work, but only after hitting it a number of times -- it varies how many, but 5-10 `C-g's usually allows Emacs to proceed. I've included the xbacktrace and the backtrace from gdb below. If I interpret this correctly, it looks like we're in wait_reading_process_output, and that code decides to let a timer run, and the timer then tries to establish a TLS connection, then we'll hang indefinitely. I think. Should I try to run this in an unoptimized build to get more values filled in? (gdb) xbacktrace "accept-process-output" (0xffffb138) "network-stream-get-response" (0xffffb3d8) "network-stream-open-starttls" (0xffffb888) "open-network-stream" (0xffffbb90) "nntp-open-connection" (0xffffc090) "nntp-open-server" (0xffffc3a0) "nntp-possibly-change-group" (0xffffca80) "nntp-with-open-group-function" (0xffffcd30) "nntp-request-article" (0xffffcfb0) "gnus-request-article" (0xffffd250) "gnus-async-prefetch-article" (0xffffd5e8) "apply" (0xffffd6f8) "timer-event-handler" (0xffffd9b8) #0 pthread_sigmask (how=how@entry=0, newmask=<optimized out>, newmask@entry=0x7fffffffa4f0, oldmask=oldmask@entry=0x7fffffffa590) at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:50 #1 0x000000000050b41e in block_interrupt_signal (oldset=oldset@entry=0x7fffffffa590) at sysdep.c:843 #2 0x00000000005c821d in really_call_select (arg=0x7fffffffa630) at thread.c:567 #3 0x00000000005c8789 in thread_select (func=<optimized out>, max_fds=max_fds@entry=16, rfds=rfds@entry=0x7fffffffa700, wfds=wfds@entry=0x7fffffffa780, efds=efds@entry=0x0, timeout=timeout@entry=0x7fffffffad50, sigmask=0x0) at thread.c:602 #4 0x00000000005e5a35 in xg_select (fds_lim=16, rfds=rfds@entry=0x7fffffffae40, wfds=wfds@entry=0x7fffffffaec0, efds=efds@entry=0x0, timeout=timeout@entry=0x7fffffffad50, sigmask=sigmask@entry=0x0) at xgselect.c:117 #5 0x00000000005a9f29 in wait_reading_process_output (time_limit=<optimized out>, nsecs=<optimized out>, read_kbd=read_kbd@entry=0, do_display=do_display@entry=false, wait_for_cell=wait_for_cell@entry=XIL(0), wait_proc=wait_proc@entry=0x40c3ba0, just_wait_proc=0) at process.c:5379 #6 0x00000000005ab808 in Faccept_process_output (process=XIL(0x40c3ba5), seconds=<optimized out>, millisec=<optimized out>, just_this_one=<optimized out>) at process.c:4667 #7 0x0000000000568404 in Ffuncall (nargs=<optimized out>, args=args@entry=0x7fffffffb130) at eval.c:2818 #8 0x000000000059eea8 in exec_byte_code (bytestr=<optimized out>, vector=XIL(0x4201a65), maxdepth=<optimized out>, args_template=<optimized out>, nargs=nargs@entry=59, args=<optimized out>, args@entry=0x3b) at bytecode.c:632 #9 0x000000000056809f in funcall_lambda (fun=XIL(0x7fffffffb150), nargs=59, nargs@entry=3, arg_vector=0x3b, arg_vector@entry=0x7fffffffb3d8) at eval.c:3019 #10 0x000000000056834b in Ffuncall (nargs=4, args=args@entry=0x7fffffffb3d0) at eval.c:2832 #11 0x000000000059eea8 in exec_byte_code (bytestr=<optimized out>, vector=XIL(0x4201805), maxdepth=<optimized out>, args_template=<optimized out>, nargs=nargs@entry=58, args=<optimized out>, args@entry=0x3a) at bytecode.c:632 #12 0x000000000056809f in funcall_lambda (fun=XIL(0x7fffffffb448), nargs=58, nargs@entry=5, arg_vector=0x3a, arg_vector@entry=0x7fffffffb888) at eval.c:3019 #13 0x000000000056834b in Ffuncall (nargs=6, args=args@entry=0x7fffffffb880) at eval.c:2832 #14 0x000000000059eea8 in exec_byte_code (bytestr=<optimized out>, vector=XIL(0x42014f5), maxdepth=<optimized out>, args_template=<optimized out>, nargs=nargs@entry=56, args=<optimized out>, args@entry=0x38) at bytecode.c:632 #15 0x000000000056809f in funcall_lambda (fun=XIL(0x7fffffffb8d8), nargs=56, nargs@entry=14, arg_vector=0x38, arg_vector@entry=0x7fffffffbb90) at eval.c:3019 #16 0x000000000056834b in Ffuncall (nargs=15, args=args@entry=0x7fffffffbb88) at eval.c:2832 #17 0x000000000059eea8 in exec_byte_code (bytestr=<optimized out>, vector=XIL(0x416a9c5), maxdepth=<optimized out>, args_template=args_template@entry=XIL(0), nargs=nargs@entry=0, args=<optimized out>, args@entry=0x0) at bytecode.c:632 #18 0x0000000000567f69 in funcall_lambda (fun=XIL(0x416ad05), nargs=nargs@entry=1, arg_vector=arg_vector@entry=0x7fffffffc090) at eval.c:3101 #19 0x000000000056834b in Ffuncall (nargs=2, args=args@entry=0x7fffffffc088) at eval.c:2832 #20 0x000000000059eea8 in exec_byte_code (bytestr=<optimized out>, vector=XIL(0x4164515), maxdepth=<optimized out>, args_template=args_template@entry=XIL(0), nargs=nargs@entry=0, args=<optimized out>, args@entry=0x0) at bytecode.c:632 #21 0x0000000000567f69 in funcall_lambda (fun=XIL(0x4161925), nargs=nargs@entry=3, arg_vector=arg_vector@entry=0x7fffffffc3a0) at eval.c:3101 #22 0x000000000056834b in Ffuncall (nargs=4, args=args@entry=0x7fffffffc398) at eval.c:2832 #23 0x000000000059eea8 in exec_byte_code (bytestr=<optimized out>, vector=XIL(0x416b7d5), maxdepth=<optimized out>, args_template=args_template@entry=XIL(0), nargs=nargs@entry=0, args=<optimized out>, args@entry=0x0) at bytecode.c:632 #24 0x0000000000567f69 in funcall_lambda (fun=XIL(0x416bb25), nargs=nargs@entry=3, arg_vector=arg_vector@entry=0x7fffffffca80) at eval.c:3101 #25 0x000000000056834b in Ffuncall (nargs=4, args=args@entry=0x7fffffffca78) at eval.c:2832 #26 0x000000000059eea8 in exec_byte_code (bytestr=<optimized out>, vector=XIL(0x415c965), maxdepth=<optimized out>, args_template=args_template@entry=XIL(0), nargs=nargs@entry=0, args=<optimized out>, args@entry=0x0) at bytecode.c:632 #27 0x0000000000567f69 in funcall_lambda (fun=XIL(0x415ca15), nargs=nargs@entry=4, arg_vector=arg_vector@entry=0x7fffffffcd30) at eval.c:3101 #28 0x000000000056834b in Ffuncall (nargs=5, args=args@entry=0x7fffffffcd28) at eval.c:2832 #29 0x000000000059eea8 in exec_byte_code (bytestr=<optimized out>, vector=XIL(0x4161285), maxdepth=<optimized out>, args_template=args_template@entry=XIL(0), nargs=nargs@entry=0, args=<optimized out>, args@entry=0x0) at bytecode.c:632 #30 0x0000000000567f69 in funcall_lambda (fun=XIL(0x41612b5), nargs=nargs@entry=4, arg_vector=arg_vector@entry=0x7fffffffcfb0) at eval.c:3101 #31 0x000000000056834b in Ffuncall (nargs=5, args=args@entry=0x7fffffffcfa8) at eval.c:2832 #32 0x000000000059eea8 in exec_byte_code (bytestr=<optimized out>, vector=XIL(0x3e57535), maxdepth=<optimized out>, args_template=args_template@entry=XIL(0), nargs=nargs@entry=0, args=<optimized out>, args@entry=0x0) at bytecode.c:632 #33 0x0000000000567f69 in funcall_lambda (fun=XIL(0x3e1d7f5), nargs=nargs@entry=2, arg_vector=arg_vector@entry=0x7fffffffd250) at eval.c:3101 #34 0x000000000056834b in Ffuncall (nargs=3, args=args@entry=0x7fffffffd248) at eval.c:2832 #35 0x000000000059eea8 in exec_byte_code (bytestr=<optimized out>, vector=XIL(0x41b5ba5), maxdepth=<optimized out>, args_template=args_template@entry=XIL(0), nargs=nargs@entry=0, args=<optimized out>, args@entry=0x0) at bytecode.c:632 #36 0x0000000000567f69 in funcall_lambda (fun=XIL(0x424d585), nargs=nargs@entry=3, arg_vector=arg_vector@entry=0x7fffffffd5e8) at eval.c:3101 #37 0x000000000056834b in Ffuncall (nargs=nargs@entry=4, args=0x7fffffffd5e0) at eval.c:2832 #38 0x0000000000569d70 in Fapply (nargs=2, args=<optimized out>) at eval.c:2438 #39 0x0000000000568404 in Ffuncall (nargs=<optimized out>, args=args@entry=0x7fffffffd6f0) at eval.c:2818 #40 0x000000000059eea8 in exec_byte_code (bytestr=<optimized out>, vector=XIL(0x99b3dd), maxdepth=<optimized out>, args_template=<optimized out>, nargs=nargs@entry=6, args=<optimized out>, args@entry=0x6) at bytecode.c:632 #41 0x000000000056809f in funcall_lambda (fun=XIL(0x7fffffffd728), nargs=6, nargs@entry=1, arg_vector=0x6, arg_vector@entry=0x7fffffffd9b8) at eval.c:3019 #42 0x000000000056834b in Ffuncall (nargs=nargs@entry=2, args=args@entry=0x7fffffffd9b0) at eval.c:2832 #43 0x00000000005684ba in call1 (fn=fn@entry=XIL(0xc480), arg1=arg1@entry=XIL(0x4b72725)) at eval.c:2669 #44 0x00000000004f8ebb in timer_check_2 (idle_timers=<optimized out>, timers=<optimized out>) at keyboard.c:4330 #45 timer_check () at keyboard.c:4392 #46 0x00000000005a9367 in wait_reading_process_output (time_limit=time_limit@entry=30, nsecs=nsecs@entry=0, read_kbd=-1, do_display=do_display@entry=true, wait_for_cell=wait_for_cell@entry=XIL(0), wait_proc=wait_proc@entry=0x0, just_wait_proc=0) at process.c:5144 #47 0x0000000000424060 in sit_for (timeout=<optimized out>, reading=reading@entry=true, display_option=display_option@entry=1) at dispnew.c:5804 #48 0x00000000004feae2 in read_char (commandflag=commandflag@entry=1, map=map@entry=XIL(0x4dca1c3), prev_event=<optimized out>, used_mouse_menu=used_mouse_menu@entry=0x7fffffffe5ab, end_time=end_time@entry=0x0) at keyboard.c:2723 #49 0x00000000004ff7b7 in read_key_sequence (keybuf=keybuf@entry=0x7fffffffe6a0, prompt=prompt@entry=XIL(0), dont_downcase_last=dont_downcase_last@entry=false, can_return_switch_frame=can_return_switch_frame@entry=true, fix_current_buffer=fix_current_buffer@entry=true, prevent_redisplay=prevent_redisplay@entry=false, bufsize=30) at keyboard.c:9137 #50 0x00000000005013be in command_loop_1 () at keyboard.c:1370 #51 0x00000000005669ae in internal_condition_case ( bfun=bfun@entry=0x5011a0 <command_loop_1>, handlers=handlers@entry=XIL(0x5280), hfun=hfun@entry=0x4f8210 <cmd_error>) at eval.c:1332 #52 0x00000000004f2b94 in command_loop_2 (ignore=ignore@entry=XIL(0)) at keyboard.c:1111 #53 0x000000000056694c in internal_catch (tag=tag@entry=XIL(0xc750), func=func@entry=0x4f2b70 <command_loop_2>, arg=arg@entry=XIL(0)) at eval.c:1097 #54 0x00000000004f2b50 in command_loop () at keyboard.c:1090 #55 0x00000000004f7df3 in recursive_edit_1 () at keyboard.c:696 #56 0x00000000004f8158 in Frecursive_edit () at keyboard.c:767 #57 0x000000000041a5c3 in main (argc=<optimized out>, argv=0x7fffffffea18) at emacs.c:1724 -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2018-02-18 17:57 ` Lars Ingebrigtsen @ 2018-02-19 16:52 ` Lars Ingebrigtsen 2018-02-19 17:56 ` Paul Eggert 0 siblings, 1 reply; 46+ messages in thread From: Lars Ingebrigtsen @ 2018-02-19 16:52 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 24201, eggert It seems more difficult to reproduce the bug in a non-optimised build, but it finally happened: #0 pthread_sigmask (how=0, newmask=<optimized out>, oldmask=0x7fffffff8a60) at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:50 #1 0x00000000005800ec in block_interrupt_signal (oldset=0x7fffffff8a60) at sysdep.c:843 #2 0x00000000006898fc in really_call_select (arg=0x7fffffff8bb0) at thread.c:575 #3 0x00000000005e08ed in flush_stack_call_func (func=0x689853 <really_call_select>, arg=0x7fffffff8bb0) at alloc.c:5218 #4 0x000000000068999e in thread_select (func=0x7fffefb02420 <__pselect>, max_fds=16, rfds=0x7fffffff90e0, wfds=0x7fffffff9060, efds=0x0, timeout=0x7fffffff9380, sigmask=0x0) at thread.c:602 #5 0x00000000006b2ce1 in xg_select (fds_lim=16, rfds=0x7fffffff9420, wfds=0x7fffffff93a0, efds=0x0, timeout=0x7fffffff9380, sigmask=0x0) at xgselect.c:117 #6 0x000000000065f6c5 in wait_reading_process_output (time_limit=0, nsecs=50000000, read_kbd=0, do_display=false, wait_for_cell=XIL(0), wait_proc=0x150084d0, just_wait_proc=0) at process.c:5379 #7 0x000000000065da81 in Faccept_process_output (process=XIL(0x150084d5), seconds=XIL(0xd44b7f), millisec=make_number(50), just_this_one=XIL(0)) at process.c:4667 #8 0x00000000006081b1 in funcall_subr (subr=0xc46580 <Saccept_process_output>, numargs=3, args=0x7fffffff9768) at eval.c:2901 #9 0x0000000000607d46 in Ffuncall (nargs=4, args=0x7fffffff9760) at eval.c:2818 #10 0x000000000065096e in exec_byte_code (bytestr=XIL(0x43d1f04), vector=XIL(0x414c3a5), maxdepth=make_number(7), args_template=make_number(771), nargs=3, args=0x7fffffff9cb0) at bytecode.c:632 #11 0x000000000060877c in funcall_lambda (fun=XIL(0x414c3f5), nargs=3, arg_vector=0x7fffffff9c98) at eval.c:3019 #12 0x0000000000607d8a in Ffuncall (nargs=4, args=0x7fffffff9c90) at eval.c:2820 #13 0x000000000065096e in exec_byte_code (bytestr=XIL(0x43d1d04), vector=XIL(0x42e7be5), maxdepth=make_number(28), args_template=make_number(1285), nargs=5, args=0x7fffffffa400) at bytecode.c:632 #14 0x000000000060877c in funcall_lambda (fun=XIL(0x414c325), nargs=5, arg_vector=0x7fffffffa3d8) at eval.c:3019 #15 0x0000000000607d8a in Ffuncall (nargs=6, args=0x7fffffffa3d0) at eval.c:2820 #16 0x000000000065096e in exec_byte_code (bytestr=XIL(0x43d6684), vector=XIL(0x414c015), maxdepth=make_number(22), args_template=make_number(1156), nargs=14, args=0x7fffffffa990) at bytecode.c:632 #17 0x000000000060877c in funcall_lambda (fun=XIL(0x414c175), nargs=14, arg_vector=0x7fffffffa970) at eval.c:3019 #18 0x0000000000607d8a in Ffuncall (nargs=15, args=0x7fffffffa968) at eval.c:2820 #19 0x000000000065096e in exec_byte_code (bytestr=XIL(0x4235df4), vector=XIL(0x42361a5), maxdepth=make_number(15), args_template=XIL(0), nargs=0, args=0x0) at bytecode.c:632 #20 0x0000000000608b04 in funcall_lambda (fun=XIL(0x4235b35), nargs=1, arg_vector=0x42361a5) at eval.c:3101 #21 0x0000000000607d8a in Ffuncall (nargs=2, args=0x7fffffffb088) at eval.c:2820 #22 0x000000000065096e in exec_byte_code (bytestr=XIL(0x4230074), vector=XIL(0x422f5b5), maxdepth=make_number(4), args_template=XIL(0), nargs=0, args=0x0) at bytecode.c:632 #23 0x0000000000608b04 in funcall_lambda (fun=XIL(0x422f685), nargs=3, arg_vector=0x422f5b5) at eval.c:3101 #24 0x0000000000607d8a in Ffuncall (nargs=4, args=0x7fffffffb5b8) at eval.c:2820 #25 0x000000000065096e in exec_byte_code (bytestr=XIL(0x42373c4), vector=XIL(0x4236965), maxdepth=make_number(8), args_template=XIL(0), nargs=0, args=0x0) at bytecode.c:632 #26 0x0000000000608b04 in funcall_lambda (fun=XIL(0x4236cb5), nargs=3, arg_vector=0x4236965) at eval.c:3101 #27 0x0000000000607d8a in Ffuncall (nargs=4, args=0x7fffffffbeb8) at eval.c:2820 #28 0x000000000065096e in exec_byte_code (bytestr=XIL(0x4226f24), vector=XIL(0x4228785), maxdepth=make_number(4), args_template=XIL(0), nargs=0, args=0x0) at bytecode.c:632 #29 0x0000000000608b04 in funcall_lambda (fun=XIL(0x4227ee5), nargs=4, arg_vector=0x4228785) at eval.c:3101 #30 0x0000000000607d8a in Ffuncall (nargs=5, args=0x7fffffffc388) at eval.c:2820 #31 0x000000000065096e in exec_byte_code (bytestr=XIL(0x422e994), vector=XIL(0x422db75), maxdepth=make_number(5), args_template=XIL(0), nargs=0, args=0x0) at bytecode.c:632 #32 0x0000000000608b04 in funcall_lambda (fun=XIL(0x422dba5), nargs=4, arg_vector=0x422db75) at eval.c:3101 #33 0x0000000000607d8a in Ffuncall (nargs=5, args=0x7fffffffc828) at eval.c:2820 #34 0x000000000065096e in exec_byte_code (bytestr=XIL(0x3b09454), vector=XIL(0x3f7ec65), maxdepth=make_number(5), args_template=XIL(0), nargs=0, args=0x0) at bytecode.c:632 #35 0x0000000000608b04 in funcall_lambda (fun=XIL(0x3eef535), nargs=2, arg_vector=0x3f7ec65) at eval.c:3101 #36 0x0000000000607d8a in Ffuncall (nargs=3, args=0x7fffffffcce8) at eval.c:2820 #37 0x000000000065096e in exec_byte_code (bytestr=XIL(0x4e49874), vector=XIL(0x42f70d5), maxdepth=make_number(7), args_template=XIL(0), nargs=0, args=0x0) at bytecode.c:632 #38 0x0000000000608b04 in funcall_lambda (fun=XIL(0x42d4fd5), nargs=3, arg_vector=0x42f70d5) at eval.c:3101 #39 0x0000000000607d8a in Ffuncall (nargs=4, args=0x7fffffffd2a0) at eval.c:2820 #40 0x000000000060716c in Fapply (nargs=2, args=0x7fffffffd488) at eval.c:2438 #41 0x0000000000608049 in funcall_subr (subr=0xc42a00 <Sapply>, numargs=2, args=0x7fffffffd488) at eval.c:2873 #42 0x0000000000607d46 in Ffuncall (nargs=3, args=0x7fffffffd480) at eval.c:2818 #43 0x000000000065096e in exec_byte_code (bytestr=XIL(0xa6864c), vector=XIL(0xa6866d), maxdepth=make_number(10), args_template=make_number(257), nargs=1, args=0x7fffffffd9f0) at bytecode.c:632 #44 0x000000000060877c in funcall_lambda (fun=XIL(0xa6861d), nargs=1, arg_vector=0x7fffffffd9e8) at eval.c:3019 #45 0x0000000000607d8a in Ffuncall (nargs=2, args=0x7fffffffd9e0) at eval.c:2820 #46 0x0000000000607744 in call1 (fn=XIL(0xc480), arg1=XIL(0x11d06665)) at eval.c:2669 #47 0x000000000056763e in timer_check_2 (timers=XIL(0x9cd82a3), idle_timers=XIL(0x9cd82d3)) at keyboard.c:4330 #48 0x000000000056776c in timer_check () at keyboard.c:4392 #49 0x0000000000565989 in readable_events (flags=1) at keyboard.c:3349 #50 0x000000000056bed8 in get_input_pending (flags=1) at keyboard.c:6805 #51 0x00000000005723a2 in detect_input_pending_run_timers (do_display=true) at keyboard.c:9943 #52 0x000000000065fb9d in wait_reading_process_output (time_limit=30, nsecs=0, read_kbd=-1, do_display=true, wait_for_cell=XIL(0), wait_proc=0x0, just_wait_proc=0) at process.c:5516 #53 0x0000000000425fd4 in sit_for (timeout=make_number(30), reading=true, display_option=1) at dispnew.c:5804 #54 0x00000000005640ac in read_char (commandflag=1, map=XIL(0xb4357a3), prev_event=XIL(0), used_mouse_menu=0x7fffffffe39f, end_time=0x0) at keyboard.c:2723 #55 0x0000000000570b12 in read_key_sequence (keybuf=0x7fffffffe560, bufsize=30, prompt=XIL(0), dont_downcase_last=false, can_return_switch_frame=true, fix_current_buffer=true, prevent_redisplay=false) at keyboard.c:9137 #56 0x0000000000560d38 in command_loop_1 () at keyboard.c:1370 #57 0x00000000006046ce in internal_condition_case (bfun=0x560902 <command_loop_1>, handlers=XIL(0x5280), hfun=0x5600b8 <cmd_error>) at eval.c:1332 #58 0x00000000005605ec in command_loop_2 (ignore=XIL(0)) at keyboard.c:1111 #59 0x0000000000603fa0 in internal_catch (tag=XIL(0xc750), func=0x5605bf <command_loop_2>, arg=XIL(0)) at eval.c:1097 #60 0x000000000056058a in command_loop () at keyboard.c:1090 #61 0x000000000055fc87 in recursive_edit_1 () at keyboard.c:696 #62 0x000000000055fe0b in Frecursive_edit () at keyboard.c:767 #63 0x000000000055da7c in main (argc=1, argv=0x7fffffffea18) at emacs.c:1724 Lisp Backtrace: "accept-process-output" (0xffff9768) "network-stream-get-response" (0xffff9c98) "network-stream-open-starttls" (0xffffa3d8) "open-network-stream" (0xffffa970) "nntp-open-connection" (0xffffb090) "nntp-open-server" (0xffffb5c0) "nntp-possibly-change-group" (0xffffbec0) "nntp-with-open-group-function" (0xffffc390) "nntp-request-article" (0xffffc830) "gnus-request-article" (0xffffccf0) "gnus-async-prefetch-article" (0xffffd2a8) "apply" (0xffffd488) "timer-event-handler" (0xffffd9e8) -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2018-02-19 16:52 ` Lars Ingebrigtsen @ 2018-02-19 17:56 ` Paul Eggert 2018-02-19 18:16 ` Lars Ingebrigtsen 0 siblings, 1 reply; 46+ messages in thread From: Paul Eggert @ 2018-02-19 17:56 UTC (permalink / raw) To: Lars Ingebrigtsen, Eli Zaretskii; +Cc: 24201 Lars Ingebrigtsen wrote: > #0 pthread_sigmask (how=0, newmask=<optimized out>, oldmask=0x7fffffff8a60) > at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:50 When you say that Emacs "hangs" surely you don't mean that the call to pthread_sigmask is hanging. So, can we assume that Emacs is busy-waiting for something? Is the CPU pegged? What does the busy-wait loop look like? ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2018-02-19 17:56 ` Paul Eggert @ 2018-02-19 18:16 ` Lars Ingebrigtsen 2018-02-19 18:32 ` Eli Zaretskii 2018-02-19 18:37 ` Andreas Schwab 0 siblings, 2 replies; 46+ messages in thread From: Lars Ingebrigtsen @ 2018-02-19 18:16 UTC (permalink / raw) To: Paul Eggert; +Cc: 24201 Paul Eggert <eggert@cs.ucla.edu> writes: > Lars Ingebrigtsen wrote: >> #0 pthread_sigmask (how=0, newmask=<optimized out>, oldmask=0x7fffffff8a60) >> at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:50 > > When you say that Emacs "hangs" surely you don't mean that the call to > pthread_sigmask is hanging. So, can we assume that Emacs is > busy-waiting for something? Is the CPU pegged? What does the busy-wait > loop look like? It's using 100% CPU, so it's busy-waiting, but every time I stop it, it's in that pthread_sigmask, as far as I'm able to tell... -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2018-02-19 18:16 ` Lars Ingebrigtsen @ 2018-02-19 18:32 ` Eli Zaretskii 2018-02-19 19:06 ` Lars Ingebrigtsen 2018-02-19 18:37 ` Andreas Schwab 1 sibling, 1 reply; 46+ messages in thread From: Eli Zaretskii @ 2018-02-19 18:32 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: 24201, eggert > From: Lars Ingebrigtsen <larsi@gnus.org> > Cc: Eli Zaretskii <eliz@gnu.org>, 24201@debbugs.gnu.org > Date: Mon, 19 Feb 2018 19:16:50 +0100 > > Paul Eggert <eggert@cs.ucla.edu> writes: > > > Lars Ingebrigtsen wrote: > >> #0 pthread_sigmask (how=0, newmask=<optimized out>, oldmask=0x7fffffff8a60) > >> at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:50 > > > > When you say that Emacs "hangs" surely you don't mean that the call to > > pthread_sigmask is hanging. So, can we assume that Emacs is > > busy-waiting for something? Is the CPU pegged? What does the busy-wait > > loop look like? > > It's using 100% CPU, so it's busy-waiting, but every time I stop it, > it's in that pthread_sigmask, as far as I'm able to tell... And if you then say "continue" or "finish", does it exit from the pthread_sigmask call? If not, some other thread is spinning. ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2018-02-19 18:32 ` Eli Zaretskii @ 2018-02-19 19:06 ` Lars Ingebrigtsen 2018-02-19 19:57 ` Eli Zaretskii 0 siblings, 1 reply; 46+ messages in thread From: Lars Ingebrigtsen @ 2018-02-19 19:06 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 24201, eggert Eli Zaretskii <eliz@gnu.org> writes: >> It's using 100% CPU, so it's busy-waiting, but every time I stop it, >> it's in that pthread_sigmask, as far as I'm able to tell... > > And if you then say "continue" or "finish", does it exit from the > pthread_sigmask call? If not, some other thread is spinning. Hm... actually, I don't know. I say "cont" and then break it again, and it says that it's in a pthread_sigmask call, but I don't know whether it's the same one... The process has unfortunately exited now, but the next time this happens, what's the best way to debug what's going on? I'm not very well versed in gdb usage. -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2018-02-19 19:06 ` Lars Ingebrigtsen @ 2018-02-19 19:57 ` Eli Zaretskii 2018-02-19 20:39 ` Lars Ingebrigtsen 0 siblings, 1 reply; 46+ messages in thread From: Eli Zaretskii @ 2018-02-19 19:57 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: 24201, eggert > From: Lars Ingebrigtsen <larsi@gnus.org> > Cc: eggert@cs.ucla.edu, 24201@debbugs.gnu.org > Date: Mon, 19 Feb 2018 20:06:46 +0100 > > Hm... actually, I don't know. I say "cont" and then break it again, > and it says that it's in a pthread_sigmask call, but I don't know > whether it's the same one... > > The process has unfortunately exited now, but the next time this > happens, what's the best way to debug what's going on? I'm not very > well versed in gdb usage. The best way is to start typing "finish" until it doesn't return from the frame. See the instructions in etc/DEBUG starting at "If Emacs is in an infinite loop". If something in the instructions is unclear, please don't hesitate asking. ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2018-02-19 19:57 ` Eli Zaretskii @ 2018-02-19 20:39 ` Lars Ingebrigtsen 2019-06-24 13:25 ` Lars Ingebrigtsen 0 siblings, 1 reply; 46+ messages in thread From: Lars Ingebrigtsen @ 2018-02-19 20:39 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 24201, eggert Eli Zaretskii <eliz@gnu.org> writes: > The best way is to start typing "finish" until it doesn't return from > the frame. See the instructions in etc/DEBUG starting at "If Emacs is > in an infinite loop". If something in the instructions is unclear, > please don't hesitate asking. Thanks; I'll give it a try (and read the document) the next time this happens. :-) -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2018-02-19 20:39 ` Lars Ingebrigtsen @ 2019-06-24 13:25 ` Lars Ingebrigtsen 2019-06-24 19:20 ` Eli Zaretskii 0 siblings, 1 reply; 46+ messages in thread From: Lars Ingebrigtsen @ 2019-06-24 13:25 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 24201, eggert I've now started looking at this again, and is running Emacs under gdb. Last night, I got the hang again. I C-z'd in gdb (i.e., SIGTSTP), and then did "finish" until it returned sufficiently that we got back to the hang, and the I repeated, the procedure a couple of times, until the file "finish" returned to Emacs and Emacs was un-hung. These hangs usually last for about a minute or so, with intervention with gdb or not, so I would guess that the code stops hanging when the peer finally gives up on the connection and hangs up or something. Does the output from gdb have any smoking gun about what's making this semi-infloop in accept-process-output? (In case nobody remembers what this bug report is about: Sometimes, in rare circumstances (like once every other day), when opening a TLS connection from a timer (or probably any other asynchronous context), and there's other TLS traffic already in happening, opening a new TLS connection will hang. I have no repeatable test case, though.) Thread 1 "emacs" received signal SIGTSTP, Stopped (user). pthread_sigmask (how=how@entry=0, newmask=<optimized out>, newmask@entry=0x7fffffffa500, oldmask=oldmask@entry=0x7fffffffa5a0) at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:50 50 ../sysdeps/unix/sysv/linux/pthread_sigmask.c: No such file or directory. (gdb) xbacktrace "accept-process-output" (0xffffb198) "network-stream-get-response" (0xffffb428) "network-stream-open-starttls" (0xffffb8d8) "open-network-stream" (0xffffbbc0) "nntp-open-connection" (0xffffc0a0) "nntp-open-server" (0xffffc3a0) "nntp-possibly-change-group" (0xffffca50) "nntp-with-open-group-function" (0xffffccf0) "nntp-request-article" (0xffffcf60) "gnus-request-article" (0xffffd228) "gnus-async-prefetch-article" (0xffffd538) "apply" (0xffffd648) "timer-event-handler" (0xffffd8f8) (gdb) bt #0 pthread_sigmask (how=how@entry=0, newmask=<optimized out>, newmask@entry=0x7fffffffa500, oldmask=oldmask@entry=0x7fffffffa5a0) at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:50 #1 0x00005555556858be in block_interrupt_signal ( oldset=oldset@entry=0x7fffffffa5a0) at sysdep.c:873 #2 0x000055555574cbed in really_call_select (arg=0x7fffffffa640) at thread.c:581 #3 0x000055555574d1d9 in thread_select (func=<optimized out>, max_fds=max_fds@entry=23, rfds=rfds@entry=0x7fffffffa710, wfds=wfds@entry=0x7fffffffa790, efds=efds@entry=0x0, timeout=timeout@entry=0x7fffffffad60, sigmask=0x0) at thread.c:616 #4 0x000055555576aa75 in xg_select (fds_lim=23, rfds=rfds@entry=0x7fffffffae90, wfds=wfds@entry=0x7fffffffaf10, efds=efds@entry=0x0, timeout=timeout@entry=0x7fffffffad60, sigmask=sigmask@entry=0x0) at xgselect.c:117 #5 0x000055555572bb97 in wait_reading_process_output ( time_limit=<optimized out>, nsecs=<optimized out>, read_kbd=read_kbd@entry=0, do_display=do_display@entry=false, wait_for_cell=wait_for_cell@entry=XIL(0), wait_proc=wait_proc@entry=0x555556830280, just_wait_proc=0) at process.c:5427 #6 0x000055555572d45b in Faccept_process_output (process=XIL(0x555556830285), seconds=<optimized out>, millisec=<optimized out>, ---Type <return> to continue, or q <return> to quit---q just_this_one=<optimized Quit (gdb) finish Run till exit from #0 pthread_sigmask (how=how@entry=0, newmask=<optimized out>, newmask@entry=0x7fffffffa500, oldmask=oldmask@entry=0x7fffffffa5a0) at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:50 block_interrupt_signal (oldset=oldset@entry=0x7fffffffa5a0) at sysdep.c:874 warning: Source file is more recent than executable. 874 pthread_sigmask (SIG_SETMASK, oldset, 0); Value returned is $6 = 0 (gdb) finish Run till exit from #0 block_interrupt_signal ( oldset=oldset@entry=0x7fffffffa5a0) at sysdep.c:874 really_call_select (arg=0x7fffffffa640) at thread.c:583 583 release_global_lock (); (gdb) finish Run till exit from #0 really_call_select (arg=0x7fffffffa640) at thread.c:583 thread_select (func=<optimized out>, max_fds=max_fds@entry=23, rfds=rfds@entry=0x7fffffffa710, wfds=wfds@entry=0x7fffffffa790, efds=efds@entry=0x0, timeout=timeout@entry=0x7fffffffad60, sigmask=0x0) at thread.c:618 618 } (gdb) finish Run till exit from #0 thread_select (func=<optimized out>, max_fds=max_fds@entry=23, rfds=rfds@entry=0x7fffffffa710, wfds=wfds@entry=0x7fffffffa790, efds=efds@entry=0x0, timeout=timeout@entry=0x7fffffffad60, sigmask=0x0) at thread.c:618 xg_select (fds_lim=23, rfds=rfds@entry=0x7fffffffae90, wfds=wfds@entry=0x7fffffffaf10, efds=efds@entry=0x0, timeout=timeout@entry=0x7fffffffad60, sigmask=sigmask@entry=0x0) at xgselect.c:120 120 if (nfds < 0) Value returned is $7 = 3 (gdb) finish Run till exit from #0 xg_select (fds_lim=23, rfds=rfds@entry=0x7fffffffae90, wfds=wfds@entry=0x7fffffffaf10, efds=efds@entry=0x0, timeout=timeout@entry=0x7fffffffad60, sigmask=sigmask@entry=0x0) at xgselect.c:120 wait_reading_process_output (time_limit=<optimized out>, nsecs=<optimized out>, read_kbd=read_kbd@entry=0, do_display=do_display@entry=false, wait_for_cell=wait_for_cell@entry=XIL(0), wait_proc=wait_proc@entry=0x555556830280, just_wait_proc=0) at process.c:5447 5447 if (nfds == 0) Value returned is $8 = 2 (gdb) finish Run till exit from #0 wait_reading_process_output ( time_limit=<optimized out>, nsecs=<optimized out>, read_kbd=read_kbd@entry=0, do_display=do_display@entry=false, wait_for_cell=wait_for_cell@entry=XIL(0), wait_proc=wait_proc@entry=0x555556830280, just_wait_proc=0) at process.c:5447 Faccept_process_output (process=XIL(0x555556830285), seconds=<optimized out>, millisec=<optimized out>, just_this_one=<optimized out>) at process.c:4715 4715 ? Qnil : Qt); Value returned is $9 = 74 (gdb) finish Run till exit from #0 Faccept_process_output (process=XIL(0x555556830285), seconds=<optimized out>, millisec=<optimized out>, just_this_one=<optimized out>) at process.c:4715 0x00005555556ea3d9 in Ffuncall (nargs=3, args=args@entry=0x7fffffffb190) at eval.c:2803 2803 val = funcall_subr (XSUBR (fun), numargs, args + 1); Value returned is $10 = (union Lisp_X *) 0xc5a0 (gdb) finish Run till exit from #0 0x00005555556ea3d9 in Ffuncall (nargs=3, args=args@entry=0x7fffffffb190) at eval.c:2803 0x0000555555720e60 in exec_byte_code (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=nargs@entry=3, args=<optimized out>, args@entry=0x35) at bytecode.c:633 warning: Source file is more recent than executable. 633 TOP = Ffuncall (op + 1, &TOP); Value returned is $11 = (union Lisp_X *) 0xc5a0 (gdb) finish Run till exit from #0 0x0000555555720e60 in exec_byte_code ( bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=nargs@entry=3, args=<optimized out>, args@entry=0x35) at bytecode.c:633 Ffuncall (nargs=4, args=args@entry=0x7fffffffb420) at eval.c:2826 2826 lisp_eval_depth--; Value returned is $12 = (union Lisp_X *) 0x0 (gdb) finish Run till exit from #0 Ffuncall (nargs=4, args=args@entry=0x7fffffffb420) at eval.c:2826 0x0000555555720e60 in exec_byte_code (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=nargs@entry=5, args=<optimized out>, args@entry=0x34) at bytecode.c:633 633 TOP = Ffuncall (op + 1, &TOP); Value returned is $13 = (union Lisp_X *) 0x0 (gdb) finish Run till exit from #0 0x0000555555720e60 in exec_byte_code ( bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=nargs@entry=5, args=<optimized out>, args@entry=0x34) at bytecode.c:633 ^Z Thread 1 "emacs" received signal SIGTSTP, Stopped (user). pthread_sigmask (how=how@entry=0, newmask=<optimized out>, newmask@entry=0x7fffffffa800, oldmask=oldmask@entry=0x7fffffffa8a0) at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:50 50 ../sysdeps/unix/sysv/linux/pthread_sigmask.c: No such file or directory. (gdb) xbacktrace "accept-process-output" (0xffffb498) "network-stream-get-response" (0xffffb728) "network-stream-open-starttls" (0xffffbbd8) "open-network-stream" (0xffffbec0) "nntp-open-connection" (0xffffc3a0) "nntp-possibly-change-group" (0xffffca50) "nntp-with-open-group-function" (0xffffccf0) "nntp-request-article" (0xffffcf60) "gnus-request-article" (0xffffd228) "gnus-async-prefetch-article" (0xffffd538) "apply" (0xffffd648) "timer-event-handler" (0xffffd8f8) (gdb) finish Run till exit from #0 pthread_sigmask (how=how@entry=0, newmask=<optimized out>, newmask@entry=0x7fffffffa800, oldmask=oldmask@entry=0x7fffffffa8a0) at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:50 block_interrupt_signal (oldset=oldset@entry=0x7fffffffa8a0) at sysdep.c:874 874 pthread_sigmask (SIG_SETMASK, oldset, 0); Value returned is $14 = 0 (gdb) finish Run till exit from #0 block_interrupt_signal ( oldset=oldset@entry=0x7fffffffa8a0) at sysdep.c:874 really_call_select (arg=0x7fffffffa940) at thread.c:583 583 release_global_lock (); (gdb) finish Run till exit from #0 really_call_select (arg=0x7fffffffa940) at thread.c:583 thread_select (func=<optimized out>, max_fds=max_fds@entry=23, rfds=rfds@entry=0x7fffffffaa10, wfds=wfds@entry=0x7fffffffaa90, efds=efds@entry=0x0, timeout=timeout@entry=0x7fffffffb060, sigmask=0x0) at thread.c:618 618 } (gdb) finish Run till exit from #0 thread_select (func=<optimized out>, max_fds=max_fds@entry=23, rfds=rfds@entry=0x7fffffffaa10, wfds=wfds@entry=0x7fffffffaa90, efds=efds@entry=0x0, timeout=timeout@entry=0x7fffffffb060, sigmask=0x0) at thread.c:618 xg_select (fds_lim=23, rfds=rfds@entry=0x7fffffffb190, wfds=wfds@entry=0x7fffffffb210, efds=efds@entry=0x0, timeout=timeout@entry=0x7fffffffb060, sigmask=sigmask@entry=0x0) at xgselect.c:120 120 if (nfds < 0) Value returned is $15 = 3 (gdb) finish Run till exit from #0 xg_select (fds_lim=23, rfds=rfds@entry=0x7fffffffb190, wfds=wfds@entry=0x7fffffffb210, efds=efds@entry=0x0, timeout=timeout@entry=0x7fffffffb060, sigmask=sigmask@entry=0x0) at xgselect.c:120 wait_reading_process_output (time_limit=<optimized out>, nsecs=<optimized out>, read_kbd=read_kbd@entry=0, do_display=do_display@entry=false, wait_for_cell=wait_for_cell@entry=XIL(0), wait_proc=wait_proc@entry=0x55555b23b750, just_wait_proc=0) at process.c:5447 5447 if (nfds == 0) Value returned is $16 = 2 (gdb) finish Run till exit from #0 wait_reading_process_output ( time_limit=<optimized out>, nsecs=<optimized out>, read_kbd=read_kbd@entry=0, do_display=do_display@entry=false, wait_for_cell=wait_for_cell@entry=XIL(0), wait_proc=wait_proc@entry=0x55555b23b750, just_wait_proc=0) at process.c:5447 Faccept_process_output (process=XIL(0x55555b23b755), seconds=<optimized out>, millisec=<optimized out>, just_this_one=<optimized out>) at process.c:4715 4715 ? Qnil : Qt); Value returned is $17 = 74 (gdb) finish Run till exit from #0 Faccept_process_output (process=XIL(0x55555b23b755), seconds=<optimized out>, millisec=<optimized out>, just_this_one=<optimized out>) at process.c:4715 0x00005555556ea3d9 in Ffuncall (nargs=3, args=args@entry=0x7fffffffb490) at eval.c:2803 2803 val = funcall_subr (XSUBR (fun), numargs, args + 1); Value returned is $18 = (union Lisp_X *) 0xc5a0 (gdb) finish Run till exit from #0 0x00005555556ea3d9 in Ffuncall (nargs=3, args=args@entry=0x7fffffffb490) at eval.c:2803 0x0000555555720e60 in exec_byte_code (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=nargs@entry=3, args=<optimized out>, args@entry=0x31) at bytecode.c:633 633 TOP = Ffuncall (op + 1, &TOP); Value returned is $19 = (union Lisp_X *) 0xc5a0 (gdb) finish Run till exit from #0 0x0000555555720e60 in exec_byte_code ( bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=nargs@entry=3, args=<optimized out>, args@entry=0x31) at bytecode.c:633 Ffuncall (nargs=4, args=args@entry=0x7fffffffb720) at eval.c:2826 2826 lisp_eval_depth--; Value returned is $20 = (union Lisp_X *) 0x0 (gdb) finish Run till exit from #0 Ffuncall (nargs=4, args=args@entry=0x7fffffffb720) at eval.c:2826 0x0000555555720e60 in exec_byte_code (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=nargs@entry=5, args=<optimized out>, args@entry=0x30) at bytecode.c:633 633 TOP = Ffuncall (op + 1, &TOP); Value returned is $21 = (union Lisp_X *) 0x0 (gdb) finish Run till exit from #0 0x0000555555720e60 in exec_byte_code ( bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=nargs@entry=5, args=<optimized out>, args@entry=0x30) at bytecode.c:633 ^Z Thread 1 "emacs" received signal SIGTSTP, Stopped (user). 0x00005555556c6c53 in vectorlike_nbytes (hdr=0x555556d1dd20) at alloc.c:2964 2964 ptrdiff_t size = hdr->size & ~ARRAY_MARK_FLAG; (gdb) finish Run till exit from #0 0x00005555556c6c53 in vectorlike_nbytes ( hdr=0x555556d1dd20) at alloc.c:2964 sweep_vectors () at alloc.c:3073 3073 gcstat.total_vector_slots += nbytes / word_size; Value returned is $22 = 1040 (gdb) finish Run till exit from #0 sweep_vectors () at alloc.c:3073 gc_sweep () at alloc.c:6973 6973 pdumper_clear_marks (); (gdb) finish Run till exit from #0 gc_sweep () at alloc.c:6973 garbage_collect_1 (gcst=0x7fffffffaa30) at alloc.c:5917 5917 unmark_main_thread (); (gdb) finish Run till exit from #0 garbage_collect_1 (gcst=0x7fffffffaa30) at alloc.c:5917 garbage_collect () at alloc.c:5990 5990 } Value returned is $23 = true (gdb) finish Run till exit from #0 garbage_collect () at alloc.c:5990 Ffuncall (nargs=nargs@entry=2, args=args@entry=0x7fffffffab20) at eval.c:2789 2789 if (debug_on_next_call) (gdb) finish Run till exit from #0 Ffuncall (nargs=nargs@entry=2, args=args@entry=0x7fffffffab20) at eval.c:2789 call1 (fn=fn@entry=XIL(0xc960), arg1=arg1@entry=XIL(0x5555570ee915)) at eval.c:2657 2657 } Value returned is $24 = (union Lisp_X *) 0x0 (gdb) finish Run till exit from #0 call1 (fn=fn@entry=XIL(0xc960), arg1=arg1@entry=XIL(0x5555570ee915)) at eval.c:2657 timer_check_2 (idle_timers=XIL(0), timers=XIL(0x555571081263)) at keyboard.c:4309 4309 unbind_to (count, Qnil); Value returned is $25 = (union Lisp_X *) 0x0 (gdb) finish Run till exit from #0 timer_check_2 (idle_timers=XIL(0), timers=XIL(0x555571081263)) at keyboard.c:4309 timer_check () at keyboard.c:4368 4368 nexttime = timer_check_2 (timers, idle_timers); (gdb) finish Run till exit from #0 timer_check () at keyboard.c:4368 0x0000555555673bf9 in readable_events (flags=flags@entry=1) at keyboard.c:3367 3367 timer_check (); Value returned is $26 = {tv_sec = 0, tv_nsec = 58418057} (gdb) finish Run till exit from #0 0x0000555555673bf9 in readable_events ( flags=flags@entry=1) at keyboard.c:3367 0x0000555555674638 in get_input_pending (flags=flags@entry=1) at keyboard.c:6769 6769 input_pending = (!NILP (Vquit_flag) || readable_events (flags)); Value returned is $27 = true (gdb) finish Run till exit from #0 0x0000555555674638 in get_input_pending ( flags=flags@entry=1) at keyboard.c:6769 swallow_events (do_display=do_display@entry=false) at keyboard.c:4115 4115 if (!input_pending && timers_run != old_timers_run && do_display) Value returned is $28 = true (gdb) finish Run till exit from #0 swallow_events (do_display=do_display@entry=false) at keyboard.c:4115 wait_reading_process_output (time_limit=<optimized out>, nsecs=<optimized out>, read_kbd=read_kbd@entry=0, do_display=do_display@entry=false, wait_for_cell=wait_for_cell@entry=XIL(0), wait_proc=wait_proc@entry=0x555556853170, just_wait_proc=0) at process.c:5605 5605 if (! NILP (wait_for_cell) && ! NILP (XCAR (wait_for_cell))) (gdb) finish Run till exit from #0 wait_reading_process_output ( time_limit=<optimized out>, nsecs=<optimized out>, read_kbd=read_kbd@entry=0, do_display=do_display@entry=false, wait_for_cell=wait_for_cell@entry=XIL(0), wait_proc=wait_proc@entry=0x555556853170, just_wait_proc=0) at process.c:5605 Faccept_process_output (process=XIL(0x555556853175), seconds=<optimized out>, millisec=<optimized out>, just_this_one=<optimized out>) at process.c:4715 4715 ? Qnil : Qt); Value returned is $29 = 74 (gdb) finish Run till exit from #0 Faccept_process_output (process=XIL(0x555556853175), seconds=<optimized out>, millisec=<optimized out>, just_this_one=<optimized out>) at process.c:4715 0x00005555556ea3d9 in Ffuncall (nargs=3, args=args@entry=0x7fffffffb190) at eval.c:2803 2803 val = funcall_subr (XSUBR (fun), numargs, args + 1); Value returned is $30 = (union Lisp_X *) 0xc5a0 (gdb) finish Run till exit from #0 0x00005555556ea3d9 in Ffuncall (nargs=3, args=args@entry=0x7fffffffb190) at eval.c:2803 0x0000555555720e60 in exec_byte_code (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=nargs@entry=3, args=<optimized out>, args@entry=0x35) at bytecode.c:633 633 TOP = Ffuncall (op + 1, &TOP); Value returned is $31 = (union Lisp_X *) 0xc5a0 (gdb) finish Run till exit from #0 0x0000555555720e60 in exec_byte_code ( bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=nargs@entry=3, args=<optimized out>, args@entry=0x35) at bytecode.c:633 Ffuncall (nargs=4, args=args@entry=0x7fffffffb420) at eval.c:2826 2826 lisp_eval_depth--; Value returned is $32 = (union Lisp_X *) 0x0 (gdb) finish Run till exit from #0 Ffuncall (nargs=4, args=args@entry=0x7fffffffb420) at eval.c:2826 0x0000555555720e60 in exec_byte_code (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=nargs@entry=5, args=<optimized out>, args@entry=0x34) at bytecode.c:633 633 TOP = Ffuncall (op + 1, &TOP); Value returned is $33 = (union Lisp_X *) 0x0 (gdb) finish Run till exit from #0 0x0000555555720e60 in exec_byte_code ( bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=nargs@entry=5, args=<optimized out>, args@entry=0x34) at bytecode.c:633 ^Z Thread 1 "emacs" received signal SIGTSTP, Stopped (user). 0x00005555556ca268 in mark_object (arg=<optimized out>) at alloc.c:6571 6571 if (NILP (ptr->u.s.u.cdr)) (gdb) -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2019-06-24 13:25 ` Lars Ingebrigtsen @ 2019-06-24 19:20 ` Eli Zaretskii 2019-06-24 20:46 ` Lars Ingebrigtsen 0 siblings, 1 reply; 46+ messages in thread From: Eli Zaretskii @ 2019-06-24 19:20 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: 24201, eggert > From: Lars Ingebrigtsen <larsi@gnus.org> > Cc: 24201@debbugs.gnu.org, eggert@cs.ucla.edu > Date: Mon, 24 Jun 2019 15:25:52 +0200 > > I've now started looking at this again, and is running Emacs under gdb. Thanks. > Last night, I got the hang again. I C-z'd in gdb (i.e., SIGTSTP), and > then did "finish" until it returned sufficiently that we got back to the > hang, and the I repeated, the procedure a couple of times, until the > file "finish" returned to Emacs and Emacs was un-hung. > > These hangs usually last for about a minute or so, with intervention > with gdb or not, so I would guess that the code stops hanging when the > peer finally gives up on the connection and hangs up or something. > > Does the output from gdb have any smoking gun about what's making this > semi-infloop in accept-process-output? It doesn't loop in accept-process-output, it loops several stack frames above that. accept-process-output returns when you type "finish", so it by itself is not the culprit. > (gdb) finish > Run till exit from #0 Ffuncall (nargs=4, args=args@entry=0x7fffffffb420) > at eval.c:2826 > 0x0000555555720e60 in exec_byte_code (bytestr=<optimized out>, > vector=<optimized out>, maxdepth=<optimized out>, > args_template=<optimized out>, nargs=nargs@entry=5, args=<optimized out>, > args@entry=0x34) at bytecode.c:633 > 633 TOP = Ffuncall (op + 1, &TOP); > Value returned is $13 = (union Lisp_X *) 0x0 > (gdb) finish > Run till exit from #0 0x0000555555720e60 in exec_byte_code ( > bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, > args_template=<optimized out>, nargs=nargs@entry=5, args=<optimized out>, > args@entry=0x34) at bytecode.c:633 > > ^Z Each time "finish" doesn't return you have a call to funcall and 2 calls to exec_byte_code above it. Can you tell what function is being called by funcall? It's easy: after you get to that frame, and see this: Ffuncall (nargs=4, args=args@entry=0x7fffffffb420) at eval.c:2826 2826 lisp_eval_depth--; Value returned is $12 = (union Lisp_X *) 0x0 don't type "finish", but instead do this: (gdb) p args[0] (gdb) xsymbol You should see one of the Lisp functions you see in the Lisp backtrace: "accept-process-output" (0xffffb498) "network-stream-get-response" (0xffffb728) "network-stream-open-starttls" (0xffffbbd8) "open-network-stream" (0xffffbec0) "nntp-open-connection" (0xffffc3a0) "nntp-possibly-change-group" (0xffffca50) "nntp-with-open-group-function" (0xffffccf0) "nntp-request-article" (0xffffcf60) "gnus-request-article" (0xffffd228) "gnus-async-prefetch-article" (0xffffd538) "apply" (0xffffd648) "timer-event-handler" (0xffffd8f8) Once you see the name of that function, look at its caller's code, and try to figure out what are the two levels of exec_byte_code above Ffuncall doing, and through this why they might loop. ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2019-06-24 19:20 ` Eli Zaretskii @ 2019-06-24 20:46 ` Lars Ingebrigtsen 2019-06-25 19:15 ` Lars Ingebrigtsen 0 siblings, 1 reply; 46+ messages in thread From: Lars Ingebrigtsen @ 2019-06-24 20:46 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 24201, eggert Eli Zaretskii <eliz@gnu.org> writes: > Each time "finish" doesn't return you have a call to funcall and 2 > calls to exec_byte_code above it. Can you tell what function is being > called by funcall? It's easy: after you get to that frame, and see > this: > > Ffuncall (nargs=4, args=args@entry=0x7fffffffb420) at eval.c:2826 > 2826 lisp_eval_depth--; > Value returned is $12 = (union Lisp_X *) 0x0 > > don't type "finish", but instead do this: > > (gdb) p args[0] > (gdb) xsymbol [...] > Once you see the name of that function, look at its caller's code, and > try to figure out what are the two levels of exec_byte_code above > Ffuncall doing, and through this why they might loop. Ah, OK, I'll try that the next time this happens. Thanks; I'm a total novice at using gdb to debug Emacs... -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2019-06-24 20:46 ` Lars Ingebrigtsen @ 2019-06-25 19:15 ` Lars Ingebrigtsen 2019-06-25 21:57 ` Lars Ingebrigtsen 2019-06-26 16:29 ` Eli Zaretskii 0 siblings, 2 replies; 46+ messages in thread From: Lars Ingebrigtsen @ 2019-06-25 19:15 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 24201, eggert Lars Ingebrigtsen <larsi@gnus.org> writes: >> Ffuncall (nargs=4, args=args@entry=0x7fffffffb420) at eval.c:2826 >> 2826 lisp_eval_depth--; >> Value returned is $12 = (union Lisp_X *) 0x0 >> >> don't type "finish", but instead do this: >> >> (gdb) p args[0] >> (gdb) xsymbol It's frustrating -- when running under gdb, this hang seems to happen a lot less often, and it hasn't happen against since last time. But I was just looking at the code to try to understand the previous backtrace. So, the "finish" stuff seemed to show that we weren't hanging in accept-process-output at all, but were instead just calling functions as normal, and looping on the Lisp level? And, for instance, Ffuncall has a maybe_quit, so C-g should work in these instances? But they don't, which is confusing. Is it even possible for the Lisp-ier functions in the backtrace to inhibit C-g? -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2019-06-25 19:15 ` Lars Ingebrigtsen @ 2019-06-25 21:57 ` Lars Ingebrigtsen 2019-06-26 16:32 ` Eli Zaretskii 2019-06-26 16:29 ` Eli Zaretskii 1 sibling, 1 reply; 46+ messages in thread From: Lars Ingebrigtsen @ 2019-06-25 21:57 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 24201, eggert OK, got another one. I hit C-g a bunch in Emacs first, but got no response, and then TSTP-d the process: Thread 1 "emacs" received signal SIGTSTP, Stopped (user). pthread_sigmask (how=how@entry=0, newmask=<optimized out>, newmask@entry=0x7fffffffa550, oldmask=oldmask@entry=0x7fffffffa5f0) at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:50 50 ../sysdeps/unix/sysv/linux/pthread_sigmask.c: No such file or directory. (gdb) finish Run till exit from #0 pthread_sigmask (how=how@entry=0, newmask=<optimized out>, newmask@entry=0x7fffffffa550, oldmask=oldmask@entry=0x7fffffffa5f0) at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:50 block_interrupt_signal (oldset=oldset@entry=0x7fffffffa5f0) at sysdep.c:868 868 } Value returned is $21 = 0 (gdb) finish Run till exit from #0 block_interrupt_signal ( oldset=oldset@entry=0x7fffffffa5f0) at sysdep.c:868 really_call_select (arg=0x7fffffffa690) at thread.c:594 594 if (self->not_holding_lock) (gdb) finish Run till exit from #0 really_call_select (arg=0x7fffffffa690) at thread.c:594 thread_select (func=<optimized out>, max_fds=max_fds@entry=22, rfds=rfds@entry=0x7fffffffa760, wfds=wfds@entry=0x7fffffffa7e0, efds=efds@entry=0x0, timeout=timeout@entry=0x7fffffffadb0, sigmask=0x0) at thread.c:618 618 } (gdb) finish Run till exit from #0 thread_select (func=<optimized out>, max_fds=max_fds@entry=22, rfds=rfds@entry=0x7fffffffa760, wfds=wfds@entry=0x7fffffffa7e0, efds=efds@entry=0x0, timeout=timeout@entry=0x7fffffffadb0, sigmask=0x0) at thread.c:618 xg_select (fds_lim=22, rfds=rfds@entry=0x7fffffffaee0, wfds=wfds@entry=0x7fffffffaf60, efds=efds@entry=0x0, timeout=timeout@entry=0x7fffffffadb0, sigmask=sigmask@entry=0x0) at xgselect.c:120 120 if (nfds < 0) Value returned is $22 = 1 (gdb) finish Run till exit from #0 xg_select (fds_lim=22, rfds=rfds@entry=0x7fffffffaee0, wfds=wfds@entry=0x7fffffffaf60, efds=efds@entry=0x0, timeout=timeout@entry=0x7fffffffadb0, sigmask=sigmask@entry=0x0) at xgselect.c:120 wait_reading_process_output (time_limit=<optimized out>, nsecs=<optimized out>, read_kbd=read_kbd@entry=0, do_display=do_display@entry=false, wait_for_cell=wait_for_cell@entry=XIL(0), wait_proc=wait_proc@entry=0x55555c0176e0, just_wait_proc=0) at process.c:5447 5447 if (nfds == 0) Value returned is $23 = 1 (gdb) finish Run till exit from #0 wait_reading_process_output ( time_limit=<optimized out>, nsecs=<optimized out>, read_kbd=read_kbd@entry=0, do_display=do_display@entry=false, wait_for_cell=wait_for_cell@entry=XIL(0), wait_proc=wait_proc@entry=0x55555c0176e0, just_wait_proc=0) at process.c:5447 Faccept_process_output (process=XIL(0x55555c0176e5), seconds=<optimized out>, millisec=<optimized out>, just_this_one=<optimized out>) at process.c:4715 4715 ? Qnil : Qt); Value returned is $24 = 74 (gdb) finish Run till exit from #0 Faccept_process_output (process=XIL(0x55555c0176e5), seconds=<optimized out>, millisec=<optimized out>, just_this_one=<optimized out>) at process.c:4715 0x00005555556ea7c9 in Ffuncall (nargs=3, args=args@entry=0x7fffffffb1e0) at eval.c:2803 2803 val = funcall_subr (XSUBR (fun), numargs, args + 1); Value returned is $25 = (union Lisp_X *) 0xc5a0 (gdb) finish Run till exit from #0 0x00005555556ea7c9 in Ffuncall (nargs=3, args=args@entry=0x7fffffffb1e0) at eval.c:2803 0x0000555555721250 in exec_byte_code (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=nargs@entry=3, args=<optimized out>, args@entry=0x35) at bytecode.c:633 633 TOP = Ffuncall (op + 1, &TOP); Value returned is $26 = (union Lisp_X *) 0xc5a0 (gdb) finish Run till exit from #0 0x0000555555721250 in exec_byte_code ( bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=nargs@entry=3, args=<optimized out>, args@entry=0x35) at bytecode.c:633 Ffuncall (nargs=4, args=args@entry=0x7fffffffb470) at eval.c:2826 2826 lisp_eval_depth--; Value returned is $27 = (union Lisp_X *) 0x0 --- OK, so here's I'm at the lisp_eval_depth--. --- (gdb) p args[0] $28 = XIL(0x1401750) (gdb) xsymbol $29 = (struct Lisp_Symbol *) 0x555557199570 "network-stream-get-response" (gdb) xbacktrace "network-stream-get-response" (0xffffb478) "network-stream-open-starttls" (0xffffb928) "open-network-stream" (0xffffbc10) "nntp-open-connection" (0xffffc0f0) "nntp-open-server" (0xffffc3f0) "nntp-possibly-change-group" (0xffffcaa0) "nntp-with-open-group-function" (0xffffcd40) "nntp-request-article" (0xffffcfb0) "gnus-request-article" (0xffffd278) "gnus-async-prefetch-article" (0xffffd588) "apply" (0xffffd698) "timer-event-handler" (0xffffd948) And then there's a couple more of the same things: Emacs continues after a few more finishes, and I TSTP it: (gdb) finish Run till exit from #0 Ffuncall (nargs=4, args=args@entry=0x7fffffffb470) at eval.c:2826 0x0000555555721250 in exec_byte_code (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=nargs@entry=5, args=<optimized out>, args@entry=0x34) at bytecode.c:633 633 TOP = Ffuncall (op + 1, &TOP); Value returned is $30 = (union Lisp_X *) 0x0 (gdb) finish Run till exit from #0 0x0000555555721250 in exec_byte_code ( bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=nargs@entry=5, args=<optimized out>, args@entry=0x34) at bytecode.c:633 ^Z Thread 1 "emacs" received signal SIGTSTP, Stopped (user). 0x00005555556ca609 in cons_marked_p (c=0x55556da37da0) at alloc.c:3774 3774 : XCONS_MARKED_P (c); (gdb) xbacktrace "Automatic GC" (0x0) "timer-event-handler" (0xffffaec8) "accept-process-output" (0xffffb4e8) "network-stream-get-response" (0xffffb778) "network-stream-open-starttls" (0xffffbc28) "open-network-stream" (0xffffbf10) "nntp-open-connection" (0xffffc3f0) "nntp-possibly-change-group" (0xffffcaa0) "nntp-with-open-group-function" (0xffffcd40) "nntp-request-article" (0xffffcfb0) "gnus-request-article" (0xffffd278) "gnus-async-prefetch-article" (0xffffd588) "apply" (0xffffd698) "timer-event-handler" (0xffffd948) (gdb) finish Run till exit from #0 0x00005555556ca609 in cons_marked_p (c=0x55556da37da0) at alloc.c:3774 mark_object (arg=<optimized out>) at alloc.c:6566 6566 if (cons_marked_p (ptr)) (gdb) finish Run till exit from #0 mark_object (arg=<optimized out>) at alloc.c:6566 mark_object (arg=<optimized out>) at alloc.c:6580 6580 if (cdr_count == mark_object_loop_halt) (gdb) finish Run till exit from #0 mark_object (arg=<optimized out>) at alloc.c:6580 mark_object (arg=<optimized out>) at alloc.c:6580 6580 if (cdr_count == mark_object_loop_halt) (gdb) finish Run till exit from #0 mark_object (arg=<optimized out>) at alloc.c:6580 mark_object (arg=<optimized out>) at alloc.c:6580 6580 if (cdr_count == mark_object_loop_halt) (gdb) finish Run till exit from #0 mark_object (arg=<optimized out>) at alloc.c:6580 mark_object (arg=<optimized out>) at alloc.c:6580 6580 if (cdr_count == mark_object_loop_halt) (gdb) finish Run till exit from #0 mark_object (arg=<optimized out>) at alloc.c:6580 mark_localized_symbol (ptr=<optimized out>) at alloc.c:6228 6228 mark_object (blv->defcell); (gdb) finish Run till exit from #0 mark_localized_symbol (ptr=<optimized out>) at alloc.c:6228 mark_object (arg=<optimized out>) at alloc.c:6544 6544 break; (gdb) finish Run till exit from #0 mark_object (arg=<optimized out>) at alloc.c:6544 mark_vectorlike (header=0x555556e35a20) at alloc.c:6105 6105 for (i = 0; i < size; i++) /* ...and then mark its elements. */ (gdb) finish Run till exit from #0 mark_vectorlike (header=0x555556e35a20) at alloc.c:6105 mark_object (arg=<optimized out>) at alloc.c:6531 6531 mark_object (ptr->u.s.plist); (gdb) finish Run till exit from #0 mark_object (arg=<optimized out>) at alloc.c:6531 mark_vectorlike (header=0x555556e51a40) at alloc.c:6105 6105 for (i = 0; i < size; i++) /* ...and then mark its elements. */ (gdb) finish Run till exit from #0 mark_vectorlike (header=0x555556e51a40) at alloc.c:6105 mark_object (arg=<optimized out>) at alloc.c:6531 6531 mark_object (ptr->u.s.plist); (gdb) finish Run till exit from #0 mark_object (arg=<optimized out>) at alloc.c:6531 mark_vectorlike (header=0x555556e5a7b0) at alloc.c:6105 6105 for (i = 0; i < size; i++) /* ...and then mark its elements. */ (gdb) finish Run till exit from #0 mark_vectorlike (header=0x555556e5a7b0) at alloc.c:6105 mark_object (arg=<optimized out>) at alloc.c:6531 6531 mark_object (ptr->u.s.plist); But I landed inside a gc, so I just cont: (gdb) cont Continuing. ^Z Thread 1 "emacs" received signal SIGTSTP, Stopped (user). pthread_sigmask (how=how@entry=0, newmask=<optimized out>, newmask@entry=0x7fffffffa550, oldmask=oldmask@entry=0x7fffffffa5f0) at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:50 50 ../sysdeps/unix/sysv/linux/pthread_sigmask.c: No such file or directory. (gdb) finish Run till exit from #0 pthread_sigmask (how=how@entry=0, newmask=<optimized out>, newmask@entry=0x7fffffffa550, oldmask=oldmask@entry=0x7fffffffa5f0) at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:50 block_interrupt_signal (oldset=oldset@entry=0x7fffffffa5f0) at sysdep.c:868 868 } Value returned is $31 = 0 (gdb) cont Continuing. ^Z And then a new TSTP: Thread 1 "emacs" received signal SIGTSTP, Stopped (user). pthread_sigmask (how=how@entry=0, newmask=<optimized out>, newmask@entry=0x7fffffffa850, oldmask=oldmask@entry=0x7fffffffa8f0) at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:50 50 ../sysdeps/unix/sysv/linux/pthread_sigmask.c: No such file or directory. (gdb) xbacktrace "accept-process-output" (0xffffb4e8) "network-stream-get-response" (0xffffb778) "network-stream-open-starttls" (0xffffbc28) "open-network-stream" (0xffffbf10) "nntp-open-connection" (0xffffc3f0) "nntp-possibly-change-group" (0xffffcaa0) "nntp-with-open-group-function" (0xffffcd40) "nntp-request-article" (0xffffcfb0) "gnus-request-article" (0xffffd278) "gnus-async-prefetch-article" (0xffffd588) "apply" (0xffffd698) "timer-event-handler" (0xffffd948) (gdb) finish Run till exit from #0 pthread_sigmask (how=how@entry=0, newmask=<optimized out>, newmask@entry=0x7fffffffa850, oldmask=oldmask@entry=0x7fffffffa8f0) at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:50 block_interrupt_signal (oldset=oldset@entry=0x7fffffffa8f0) at sysdep.c:868 868 } Value returned is $32 = 0 (gdb) finish Run till exit from #0 block_interrupt_signal ( oldset=oldset@entry=0x7fffffffa8f0) at sysdep.c:868 really_call_select (arg=0x7fffffffa990) at thread.c:583 583 release_global_lock (); (gdb) finish Run till exit from #0 really_call_select (arg=0x7fffffffa990) at thread.c:583 thread_select (func=<optimized out>, max_fds=max_fds@entry=22, rfds=rfds@entry=0x7fffffffaa60, wfds=wfds@entry=0x7fffffffaae0, efds=efds@entry=0x0, timeout=timeout@entry=0x7fffffffb0b0, sigmask=0x0) at thread.c:618 618 } (gdb) finish Run till exit from #0 thread_select (func=<optimized out>, max_fds=max_fds@entry=22, rfds=rfds@entry=0x7fffffffaa60, wfds=wfds@entry=0x7fffffffaae0, efds=efds@entry=0x0, timeout=timeout@entry=0x7fffffffb0b0, sigmask=0x0) at thread.c:618 xg_select (fds_lim=22, rfds=rfds@entry=0x7fffffffb1e0, wfds=wfds@entry=0x7fffffffb260, efds=efds@entry=0x0, timeout=timeout@entry=0x7fffffffb0b0, sigmask=sigmask@entry=0x0) at xgselect.c:120 120 if (nfds < 0) Value returned is $33 = 3 (gdb) finish Run till exit from #0 xg_select (fds_lim=22, rfds=rfds@entry=0x7fffffffb1e0, wfds=wfds@entry=0x7fffffffb260, efds=efds@entry=0x0, timeout=timeout@entry=0x7fffffffb0b0, sigmask=sigmask@entry=0x0) at xgselect.c:120 wait_reading_process_output (time_limit=<optimized out>, nsecs=<optimized out>, read_kbd=read_kbd@entry=0, do_display=do_display@entry=false, wait_for_cell=wait_for_cell@entry=XIL(0), wait_proc=wait_proc@entry=0x555560255dc0, just_wait_proc=0) at process.c:5447 5447 if (nfds == 0) Value returned is $34 = 2 (gdb) finish Run till exit from #0 wait_reading_process_output ( time_limit=<optimized out>, nsecs=<optimized out>, read_kbd=read_kbd@entry=0, do_display=do_display@entry=false, wait_for_cell=wait_for_cell@entry=XIL(0), wait_proc=wait_proc@entry=0x555560255dc0, just_wait_proc=0) at process.c:5447 Faccept_process_output (process=XIL(0x555560255dc5), seconds=<optimized out>, millisec=<optimized out>, just_this_one=<optimized out>) at process.c:4715 4715 ? Qnil : Qt); Value returned is $35 = 74 (gdb) finish Run till exit from #0 Faccept_process_output (process=XIL(0x555560255dc5), seconds=<optimized out>, millisec=<optimized out>, just_this_one=<optimized out>) at process.c:4715 0x00005555556ea7c9 in Ffuncall (nargs=3, args=args@entry=0x7fffffffb4e0) at eval.c:2803 2803 val = funcall_subr (XSUBR (fun), numargs, args + 1); Value returned is $36 = (union Lisp_X *) 0xc5a0 (gdb) finish Run till exit from #0 0x00005555556ea7c9 in Ffuncall (nargs=3, args=args@entry=0x7fffffffb4e0) at eval.c:2803 0x0000555555721250 in exec_byte_code (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=nargs@entry=3, args=<optimized out>, args@entry=0x31) at bytecode.c:633 633 TOP = Ffuncall (op + 1, &TOP); Value returned is $37 = (union Lisp_X *) 0xc5a0 (gdb) finish Run till exit from #0 0x0000555555721250 in exec_byte_code ( bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=nargs@entry=3, args=<optimized out>, args@entry=0x31) at bytecode.c:633 Ffuncall (nargs=4, args=args@entry=0x7fffffffb770) at eval.c:2826 2826 lisp_eval_depth--; Value returned is $38 = (union Lisp_X *) 0x0 (gdb) p args[0] $39 = XIL(0x1401750) (gdb) xsymbol $40 = (struct Lisp_Symbol *) 0x555557199570 "network-stream-get-response" (gdb) xbacktrace "network-stream-get-response" (0xffffb778) "network-stream-open-starttls" (0xffffbc28) "open-network-stream" (0xffffbf10) "nntp-open-connection" (0xffffc3f0) "nntp-possibly-change-group" (0xffffcaa0) "nntp-with-open-group-function" (0xffffcd40) "nntp-request-article" (0xffffcfb0) "gnus-request-article" (0xffffd278) "gnus-async-prefetch-article" (0xffffd588) "apply" (0xffffd698) "timer-event-handler" (0xffffd948) So this looks very similar to the first one. (gdb) finish Run till exit from #0 Ffuncall (nargs=4, args=args@entry=0x7fffffffb770) at eval.c:2826 0x0000555555721250 in exec_byte_code (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=nargs@entry=5, args=<optimized out>, args@entry=0x30) at bytecode.c:633 633 TOP = Ffuncall (op + 1, &TOP); Value returned is $41 = (union Lisp_X *) 0x0 (gdb) finish Run till exit from #0 0x0000555555721250 in exec_byte_code ( bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=nargs@entry=5, args=<optimized out>, args@entry=0x30) at bytecode.c:633 Ffuncall (nargs=2, args=args@entry=0x7fffffffbbd8) at eval.c:2826 2826 lisp_eval_depth--; Value returned is $42 = (union Lisp_X *) 0x555559d0a544 (gdb) cont Continuing. And upon continuing, Emacs un-stuck with the following message: Error running timer ‘gnus-async-prefetch-article’: (error "Server closed connection") -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2019-06-25 21:57 ` Lars Ingebrigtsen @ 2019-06-26 16:32 ` Eli Zaretskii 2019-06-27 10:34 ` Lars Ingebrigtsen 0 siblings, 1 reply; 46+ messages in thread From: Eli Zaretskii @ 2019-06-26 16:32 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: 24201, eggert > From: Lars Ingebrigtsen <larsi@gnus.org> > Cc: 24201@debbugs.gnu.org, eggert@cs.ucla.edu > Date: Tue, 25 Jun 2019 23:57:35 +0200 > > (gdb) finish > Run till exit from #0 0x0000555555721250 in exec_byte_code ( > bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, > args_template=<optimized out>, nargs=nargs@entry=3, args=<optimized out>, > args@entry=0x31) at bytecode.c:633 > Ffuncall (nargs=4, args=args@entry=0x7fffffffb770) at eval.c:2826 > 2826 lisp_eval_depth--; > Value returned is $38 = (union Lisp_X *) 0x0 > (gdb) p args[0] > $39 = XIL(0x1401750) > (gdb) xsymbol > $40 = (struct Lisp_Symbol *) 0x555557199570 > "network-stream-get-response" OK, so that Ffuncall calls network-stream-get-response. the next question is what Lisp code runs 2 stack frames above, i.e. which Lisp code calls some other Lisp code, which calls network-stream-get-response? Can you spot that? The offending code should be somewhere in network-stream-get-response or functions it calls. Once we identify the Lisp code which loops, we need to try to figure out why it loops. > And upon continuing, Emacs un-stuck with the following message: > > Error running timer ‘gnus-async-prefetch-article’: (error "Server closed connection") So probably we loop until the other side closes the connection? ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2019-06-26 16:32 ` Eli Zaretskii @ 2019-06-27 10:34 ` Lars Ingebrigtsen 2019-06-27 13:25 ` Eli Zaretskii 2019-06-28 14:39 ` Basil L. Contovounesios 0 siblings, 2 replies; 46+ messages in thread From: Lars Ingebrigtsen @ 2019-06-27 10:34 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 24201, eggert Eli Zaretskii <eliz@gnu.org> writes: > OK, so that Ffuncall calls network-stream-get-response. the next > question is what Lisp code runs 2 stack frames above, i.e. which Lisp > code calls some other Lisp code, which calls > network-stream-get-response? Can you spot that? The offending code > should be somewhere in network-stream-get-response or functions it > calls. > > Once we identify the Lisp code which loops, we need to try to figure > out why it loops. I'm pretty sure it's this: (while (and (memq (process-status stream) '(open run)) (not (re-search-forward end-of-command nil t))) (accept-process-output stream 0.05) (goto-char start)) But I do not understand why C-g doesn't work: I've tried to follow the call chain, and I can see no instances of binding `inhibit-quit'... If C-g worked, it'd be easier to debug... >> And upon continuing, Emacs un-stuck with the following message: >> >> Error running timer ‘gnus-async-prefetch-article’: (error "Server >> closed connection") > > So probably we loop until the other side closes the connection? Yup. -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2019-06-27 10:34 ` Lars Ingebrigtsen @ 2019-06-27 13:25 ` Eli Zaretskii 2019-06-27 19:28 ` Lars Ingebrigtsen 2019-06-28 14:39 ` Basil L. Contovounesios 1 sibling, 1 reply; 46+ messages in thread From: Eli Zaretskii @ 2019-06-27 13:25 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: 24201, eggert > From: Lars Ingebrigtsen <larsi@gnus.org> > Cc: 24201@debbugs.gnu.org, eggert@cs.ucla.edu > Date: Thu, 27 Jun 2019 12:34:28 +0200 > > > Once we identify the Lisp code which loops, we need to try to figure > > out why it loops. > > I'm pretty sure it's this: > > (while (and (memq (process-status stream) '(open run)) > (not (re-search-forward end-of-command nil t))) > (accept-process-output stream 0.05) > (goto-char start)) Any idea why this loops? Does the other end not respond or something? > But I do not understand why C-g doesn't work: I've tried to follow the > call chain, and I can see no instances of binding `inhibit-quit'... Is inhibit-quit in fact nil or not inside this loop? > If C-g worked, it'd be easier to debug... If you are running this under GDB, you can interrupt by "GDB ways", they are described in etc/DEBUG. ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2019-06-27 13:25 ` Eli Zaretskii @ 2019-06-27 19:28 ` Lars Ingebrigtsen 2019-06-28 6:19 ` Eli Zaretskii 0 siblings, 1 reply; 46+ messages in thread From: Lars Ingebrigtsen @ 2019-06-27 19:28 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 24201, eggert Eli Zaretskii <eliz@gnu.org> writes: >> I'm pretty sure it's this: >> >> (while (and (memq (process-status stream) '(open run)) >> (not (re-search-forward end-of-command nil t))) >> (accept-process-output stream 0.05) >> (goto-char start)) > > Any idea why this loops? Does the other end not respond or something? I'm not sure -- I haven't been able to debug that at all. Since C-g doesn't work (and it happens so rarely and I have to catch it before the peer hangs up), it makes it awkward... >> But I do not understand why C-g doesn't work: I've tried to follow the >> call chain, and I can see no instances of binding `inhibit-quit'... > > Is inhibit-quit in fact nil or not inside this loop? Looking at the code, I can't see any signs of it being inhibited here, but I'm waiting for the next time it happens so that I can inspect with gdb... -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2019-06-27 19:28 ` Lars Ingebrigtsen @ 2019-06-28 6:19 ` Eli Zaretskii 2019-06-28 8:25 ` Lars Ingebrigtsen 0 siblings, 1 reply; 46+ messages in thread From: Eli Zaretskii @ 2019-06-28 6:19 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: 24201, eggert > From: Lars Ingebrigtsen <larsi@gnus.org> > Cc: 24201@debbugs.gnu.org, eggert@cs.ucla.edu > Date: Thu, 27 Jun 2019 21:28:34 +0200 > > >> (while (and (memq (process-status stream) '(open run)) > >> (not (re-search-forward end-of-command nil t))) > >> (accept-process-output stream 0.05) > >> (goto-char start)) > > > > Any idea why this loops? Does the other end not respond or something? > > I'm not sure -- I haven't been able to debug that at all. Since C-g > doesn't work (and it happens so rarely and I have to catch it before the > peer hangs up), it makes it awkward... You could try using debug-on-event in conjunction with SIGUSR1/SIGUSR2 to enter the debugger when you need. > > Is inhibit-quit in fact nil or not inside this loop? > > Looking at the code, I can't see any signs of it being inhibited here, > but I'm waiting for the next time it happens so that I can inspect with > gdb... You could also add a call to 'message' in that loop to print the value of the flag. ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2019-06-28 6:19 ` Eli Zaretskii @ 2019-06-28 8:25 ` Lars Ingebrigtsen 2019-06-28 8:34 ` Eli Zaretskii 0 siblings, 1 reply; 46+ messages in thread From: Lars Ingebrigtsen @ 2019-06-28 8:25 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 24201, eggert Eli Zaretskii <eliz@gnu.org> writes: >> I'm not sure -- I haven't been able to debug that at all. Since C-g >> doesn't work (and it happens so rarely and I have to catch it before the >> peer hangs up), it makes it awkward... > > You could try using debug-on-event in conjunction with SIGUSR1/SIGUSR2 > to enter the debugger when you need. Oh, that seems very useful; yes. I'll set that up and try that the next time it happens. >> > Is inhibit-quit in fact nil or not inside this loop? >> >> Looking at the code, I can't see any signs of it being inhibited here, >> but I'm waiting for the next time it happens so that I can inspect with >> gdb... > > You could also add a call to 'message' in that loop to print the value > of the flag. `inhibit-quit' is not set normally when that function is called (I just checked), but perhaps... Oh, I just thought something -- this function (when it hangs) is always called from a timer. Does the timer machinery set `inhibit-quit'? -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2019-06-28 8:25 ` Lars Ingebrigtsen @ 2019-06-28 8:34 ` Eli Zaretskii 2019-06-28 9:55 ` Lars Ingebrigtsen 0 siblings, 1 reply; 46+ messages in thread From: Eli Zaretskii @ 2019-06-28 8:34 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: 24201, eggert > From: Lars Ingebrigtsen <larsi@gnus.org> > Cc: 24201@debbugs.gnu.org, eggert@cs.ucla.edu > Date: Fri, 28 Jun 2019 10:25:41 +0200 > > Oh, I just thought something -- this function (when it hangs) is > always called from a timer. Does the timer machinery set > `inhibit-quit'? Yes, it does: see timer_check_2. ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2019-06-28 8:34 ` Eli Zaretskii @ 2019-06-28 9:55 ` Lars Ingebrigtsen 2019-06-28 12:26 ` Eli Zaretskii 0 siblings, 1 reply; 46+ messages in thread From: Lars Ingebrigtsen @ 2019-06-28 9:55 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 24201, eggert Eli Zaretskii <eliz@gnu.org> writes: >> From: Lars Ingebrigtsen <larsi@gnus.org> >> Cc: 24201@debbugs.gnu.org, eggert@cs.ucla.edu >> Date: Fri, 28 Jun 2019 10:25:41 +0200 >> >> Oh, I just thought something -- this function (when it hangs) is >> always called from a timer. Does the timer machinery set >> `inhibit-quit'? > > Yes, it does: see timer_check_2. Great; now at least I know where to disable the `inhibit-quit' so that I can debug the infloop in network-stream more conveniently. Thanks for all the help with this, Eli. -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2019-06-28 9:55 ` Lars Ingebrigtsen @ 2019-06-28 12:26 ` Eli Zaretskii 0 siblings, 0 replies; 46+ messages in thread From: Eli Zaretskii @ 2019-06-28 12:26 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: 24201, eggert > From: Lars Ingebrigtsen <larsi@gnus.org> > Cc: 24201@debbugs.gnu.org, eggert@cs.ucla.edu > Date: Fri, 28 Jun 2019 11:55:09 +0200 > > Thanks for all the help with this, Eli. I'm glad to be of any help I can extend. Thanks for working on this tricky problem. ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2019-06-27 10:34 ` Lars Ingebrigtsen 2019-06-27 13:25 ` Eli Zaretskii @ 2019-06-28 14:39 ` Basil L. Contovounesios 2019-06-28 14:50 ` Eli Zaretskii 1 sibling, 1 reply; 46+ messages in thread From: Basil L. Contovounesios @ 2019-06-28 14:39 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: eggert, 24201 Lars Ingebrigtsen <larsi@gnus.org> writes: > Eli Zaretskii <eliz@gnu.org> writes: > >> OK, so that Ffuncall calls network-stream-get-response. the next >> question is what Lisp code runs 2 stack frames above, i.e. which Lisp >> code calls some other Lisp code, which calls >> network-stream-get-response? Can you spot that? The offending code >> should be somewhere in network-stream-get-response or functions it >> calls. >> >> Once we identify the Lisp code which loops, we need to try to figure >> out why it loops. > > I'm pretty sure it's this: > > (while (and (memq (process-status stream) '(open run)) > (not (re-search-forward end-of-command nil t))) > (accept-process-output stream 0.05) > (goto-char start)) This is probably unrelated, but isn't there a race condition between checking process-status and accepting output, as described in (info "(elisp) Accepting Output")? Thanks, -- Basil ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2019-06-28 14:39 ` Basil L. Contovounesios @ 2019-06-28 14:50 ` Eli Zaretskii 2019-06-30 19:02 ` Lars Ingebrigtsen 0 siblings, 1 reply; 46+ messages in thread From: Eli Zaretskii @ 2019-06-28 14:50 UTC (permalink / raw) To: Basil L. Contovounesios; +Cc: larsi, eggert, 24201 > From: "Basil L. Contovounesios" <contovob@tcd.ie> > Cc: Eli Zaretskii <eliz@gnu.org>, 24201@debbugs.gnu.org, eggert@cs.ucla.edu > Date: Fri, 28 Jun 2019 15:39:04 +0100 > > > (while (and (memq (process-status stream) '(open run)) > > (not (re-search-forward end-of-command nil t))) > > (accept-process-output stream 0.05) > > (goto-char start)) > > This is probably unrelated, but isn't there a race condition between > checking process-status and accepting output, as described in > (info "(elisp) Accepting Output")? Yes. But the issue here is why it loops for such a long time, not why it continues looping after the connection is closed. ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2019-06-28 14:50 ` Eli Zaretskii @ 2019-06-30 19:02 ` Lars Ingebrigtsen 2019-07-04 13:04 ` Lars Ingebrigtsen 0 siblings, 1 reply; 46+ messages in thread From: Lars Ingebrigtsen @ 2019-06-30 19:02 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Basil L. Contovounesios, 24201, eggert I'm able to reproduce the hang more often now by artificially closing the connection a lot and then letting it be reopened, so I'm now getting the hangs about once an hour when using Gnus. I changed the inflooping function like thus to record what's in it: (defun network-stream-get-response (stream start end-of-command) (when end-of-command (with-current-buffer (process-buffer stream) (save-excursion (goto-char start) (while (and (memq (process-status stream) '(open run)) (not (re-search-forward end-of-command nil t))) (push (list (format-time-string "%FT%T") (point) (current-buffer) (buffer-string)) lars-buffers) (accept-process-output stream 0.05) (goto-char start)) After making this change, I've only seen the hang twice, so this is all very preliminary, but in both hangs, it didn't really loop there. That is, in both instances, I got something like this as the top-level element in `lars-buffers': ("2019-06-30T20:50:37" 1 #<buffer *server news.gmane.org nntp *Async Prefetch Article**> "") and the next entries were from a successful connection. And this is after letting it hang for, say, ten seconds before `C-g'-ing it, so there should be hundreds of elements there, since accept-process-output is supposed to return after 0.05s. What's in the buffer after `C-g' was, both times: ---- 200 news.gmane.org InterNetNews NNRP server INN 2.6.1 ready (posting ok)^M Process nntpd<3> connection broken by remote peer ---- I'll keep poking at this, but this is pointing at `accept-process-output' having some kind of problem... But we'll see; perhaps these two hangs weren't typical... -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2019-06-30 19:02 ` Lars Ingebrigtsen @ 2019-07-04 13:04 ` Lars Ingebrigtsen 2019-07-04 19:05 ` Eli Zaretskii 2019-07-05 8:21 ` Eli Zaretskii 0 siblings, 2 replies; 46+ messages in thread From: Lars Ingebrigtsen @ 2019-07-04 13:04 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Basil L. Contovounesios, 24201, eggert I've been poking more at this, and I think there really must be a bug somewhere in accept-process-output. With the following patch, connection never hangs: diff --git a/lisp/net/network-stream.el b/lisp/net/network-stream.el index 2b3292b71b..cdb33a59f1 100644 --- a/lisp/net/network-stream.el +++ b/lisp/net/network-stream.el @@ -376,7 +376,7 @@ network-stream-get-response (goto-char start) (while (and (memq (process-status stream) '(open run)) (not (re-search-forward end-of-command nil t))) - (accept-process-output stream 0.05) + (accept-process-output stream 0.05 nil t) (goto-char start)) ;; Return the data we got back, or nil if the process died. (unless (= start (point)) It's the JUST-THIS-ONE parameter: If that's non-nil, then accept-process-output returns after the timeout... and we get the data. Now, tracing the logic in wait_reading_process_output is rather... difficult. It's an 800 line function with lots of inputs from everywhere. But I think this code looks suspicious: if (NILP (wait_for_cell) && just_wait_proc >= 0 && timespec_valid_p (timer_delay) && timespec_cmp (timer_delay, timeout) < 0) { if (!timespec_valid_p (now)) now = current_timespec (); struct timespec timeout_abs = timespec_add (now, timeout); if (!timespec_valid_p (got_output_end_time) || timespec_cmp (timeout_abs, got_output_end_time) < 0) got_output_end_time = timeout_abs; timeout = timer_delay; } else got_output_end_time = invalid_timespec (); This is done only if JUST-THIS-ONE is set. But... the timeout stuff is implemented in a very convoluted manner, with reuse of certain variables that makes what this is trying to achieve rather opaque. -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply related [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2019-07-04 13:04 ` Lars Ingebrigtsen @ 2019-07-04 19:05 ` Eli Zaretskii 2019-07-05 12:59 ` Lars Ingebrigtsen 2019-07-05 8:21 ` Eli Zaretskii 1 sibling, 1 reply; 46+ messages in thread From: Eli Zaretskii @ 2019-07-04 19:05 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: contovob, 24201, eggert > From: Lars Ingebrigtsen <larsi@gnus.org> > Cc: "Basil L. Contovounesios" <contovob@tcd.ie>, 24201@debbugs.gnu.org, eggert@cs.ucla.edu > Date: Thu, 04 Jul 2019 15:04:54 +0200 > > With the following patch, connection never hangs: > > diff --git a/lisp/net/network-stream.el b/lisp/net/network-stream.el > index 2b3292b71b..cdb33a59f1 100644 > --- a/lisp/net/network-stream.el > +++ b/lisp/net/network-stream.el > @@ -376,7 +376,7 @@ network-stream-get-response > (goto-char start) > (while (and (memq (process-status stream) '(open run)) > (not (re-search-forward end-of-command nil t))) > - (accept-process-output stream 0.05) > + (accept-process-output stream 0.05 nil t) > (goto-char start)) > ;; Return the data we got back, or nil if the process died. > (unless (= start (point)) Does this mean you have other process objects active at that time? > It's the JUST-THIS-ONE parameter: If that's non-nil, then > accept-process-output returns after the timeout... and we get the data. I don't understand: accept-process-output is supposed to hit the timeout only when there's no data. But you say "and we get the data". So what am I missing here? ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2019-07-04 19:05 ` Eli Zaretskii @ 2019-07-05 12:59 ` Lars Ingebrigtsen 2019-07-05 18:03 ` Eli Zaretskii 0 siblings, 1 reply; 46+ messages in thread From: Lars Ingebrigtsen @ 2019-07-05 12:59 UTC (permalink / raw) To: Eli Zaretskii; +Cc: contovob, 24201, eggert Eli Zaretskii <eliz@gnu.org> writes: >> - (accept-process-output stream 0.05) >> + (accept-process-output stream 0.05 nil t) >> (goto-char start)) >> ;; Return the data we got back, or nil if the process died. >> (unless (= start (point)) > > Does this mean you have other process objects active at that time? Yes, normally when this code is triggered by the timer, there's other networking happening more or less at the same time. >> It's the JUST-THIS-ONE parameter: If that's non-nil, then >> accept-process-output returns after the timeout... and we get the data. > > I don't understand: accept-process-output is supposed to hit the > timeout only when there's no data. That's not what the doc string says, I think? -- Optional second argument SECONDS and third argument MILLISEC specify a timeout; return after that much time even if there is no subprocess output. -- "even if"... > But you say "and we get the data". So what am I missing here? Without the JUST-THIS-ONE parameter, accept-process-output seems to loop until the peer closes the connection. And then control is returned to Lisp world and the data is in the buffer. The gdb-ing I did earlier seemed to say that this was not the case (i.e., that it was looping in that loop above there), but Emacs apparently behaves differently when stepping through in gdb than when it's not... -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2019-07-05 12:59 ` Lars Ingebrigtsen @ 2019-07-05 18:03 ` Eli Zaretskii 2019-07-06 12:16 ` Lars Ingebrigtsen 0 siblings, 1 reply; 46+ messages in thread From: Eli Zaretskii @ 2019-07-05 18:03 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: contovob, 24201, eggert > From: Lars Ingebrigtsen <larsi@gnus.org> > Cc: contovob@tcd.ie, 24201@debbugs.gnu.org, eggert@cs.ucla.edu > Date: Fri, 05 Jul 2019 14:59:52 +0200 > > Eli Zaretskii <eliz@gnu.org> writes: > > >> - (accept-process-output stream 0.05) > >> + (accept-process-output stream 0.05 nil t) > >> (goto-char start)) > >> ;; Return the data we got back, or nil if the process died. > >> (unless (= start (point)) > > > > Does this mean you have other process objects active at that time? > > Yes, normally when this code is triggered by the timer, there's other > networking happening more or less at the same time. So maybe another process steals the response? > >> It's the JUST-THIS-ONE parameter: If that's non-nil, then > >> accept-process-output returns after the timeout... and we get the data. > > > > I don't understand: accept-process-output is supposed to hit the > > timeout only when there's no data. > > That's not what the doc string says, I think? > > -- > Optional second argument SECONDS and third argument MILLISEC > specify a timeout; return after that much time even if there is > no subprocess output. > -- > > "even if"... But if output is available, accept-process-output will return immediately, so "even if" really means "if". > Without the JUST-THIS-ONE parameter, accept-process-output seems to loop > until the peer closes the connection. And then control is returned to > Lisp world and the data is in the buffer. Since you are saying that the remote does respond, this would mean the responses get lost somehow, or are consumed by other filter functions. The question is how can that happen? ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2019-07-05 18:03 ` Eli Zaretskii @ 2019-07-06 12:16 ` Lars Ingebrigtsen 2019-07-07 18:13 ` Lars Ingebrigtsen 0 siblings, 1 reply; 46+ messages in thread From: Lars Ingebrigtsen @ 2019-07-06 12:16 UTC (permalink / raw) To: Eli Zaretskii; +Cc: contovob, 24201, eggert Eli Zaretskii <eliz@gnu.org> writes: >> Yes, normally when this code is triggered by the timer, there's other >> networking happening more or less at the same time. > > So maybe another process steals the response? That's possible... >> That's not what the doc string says, I think? >> >> -- >> Optional second argument SECONDS and third argument MILLISEC >> specify a timeout; return after that much time even if there is >> no subprocess output. >> -- >> >> "even if"... > > But if output is available, accept-process-output will return > immediately, so "even if" really means "if". Yeah, that's true. But it should, in an case, not keep spinning (much) longer than the timeout, whether there's any output or not... >> Without the JUST-THIS-ONE parameter, accept-process-output seems to loop >> until the peer closes the connection. And then control is returned to >> Lisp world and the data is in the buffer. > > Since you are saying that the remote does respond, this would mean the > responses get lost somehow, or are consumed by other filter functions. > The question is how can that happen? What I'm seeing happening if I instrument the loop in the Lisp part with: (while (and (memq (process-status stream) '(open run)) (not (re-search-forward end-of-command nil t))) (push (buffer-string) my-output-list) (accept-process-output stream 0.05) (goto-char start)) is that when Emacs hangs, and I `C-g' after half a minute, I will get 1 "" (empty string) entry in that list, so the timeout isn't heeded. But -- in the buffer where this is happening, the data this is waiting for is present, so Emacs got the response and put it into the buffer -- but then just kept on spinning in wait_reading_process_output, apparently. I'll add some instrumentation to that function now and try to figure out why it's not heeding the timeout. It's so annoying that I haven't been able to make a simple test case to reproduce the error, though, because the hang only happens once every few hours... -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2019-07-06 12:16 ` Lars Ingebrigtsen @ 2019-07-07 18:13 ` Lars Ingebrigtsen 2019-07-07 18:18 ` Eli Zaretskii 0 siblings, 1 reply; 46+ messages in thread From: Lars Ingebrigtsen @ 2019-07-07 18:13 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 24201 Lars Ingebrigtsen <larsi@gnus.org> writes: > I'll add some instrumentation to that function now and try to figure out > why it's not heeding the timeout. It's so annoying that I haven't been > able to make a simple test case to reproduce the error, though, because > the hang only happens once every few hours... Amusingly enough, after adding some fprintfs to the function, I haven't seen a single hang. :-/ So this is even more timing dependent than I imagined, and I'll have to instrument this in a different way, I guess... -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2019-07-07 18:13 ` Lars Ingebrigtsen @ 2019-07-07 18:18 ` Eli Zaretskii 2019-07-08 16:38 ` Lars Ingebrigtsen 0 siblings, 1 reply; 46+ messages in thread From: Eli Zaretskii @ 2019-07-07 18:18 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: 24201 > From: Lars Ingebrigtsen <larsi@gnus.org> > Cc: 24201@debbugs.gnu.org > Date: Sun, 07 Jul 2019 20:13:13 +0200 > > Lars Ingebrigtsen <larsi@gnus.org> writes: > > > I'll add some instrumentation to that function now and try to figure out > > why it's not heeding the timeout. It's so annoying that I haven't been > > able to make a simple test case to reproduce the error, though, because > > the hang only happens once every few hours... > > Amusingly enough, after adding some fprintfs to the function, I haven't > seen a single hang. :-/ Maybe add only one fprintf (assuming doing that doesn't make the problem go away), then move it around to see the control flow. It's less convenient and more time-consuming, but maybe there's no better alternative. ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2019-07-07 18:18 ` Eli Zaretskii @ 2019-07-08 16:38 ` Lars Ingebrigtsen 2020-08-03 6:00 ` Lars Ingebrigtsen 0 siblings, 1 reply; 46+ messages in thread From: Lars Ingebrigtsen @ 2019-07-08 16:38 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 24201 Eli Zaretskii <eliz@gnu.org> writes: > Maybe add only one fprintf (assuming doing that doesn't make the > problem go away), then move it around to see the control flow. It's > less convenient and more time-consuming, but maybe there's no better > alternative. Yup; I'll try different things to see if I can get it to break more reliably... -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2019-07-08 16:38 ` Lars Ingebrigtsen @ 2020-08-03 6:00 ` Lars Ingebrigtsen 0 siblings, 0 replies; 46+ messages in thread From: Lars Ingebrigtsen @ 2020-08-03 6:00 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 24201 This was now apparently fixed by bug#40665 -- at least, I haven't seen a single of these hangs after running with that patch for a day. So I'm closing this bug report. -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2019-07-04 13:04 ` Lars Ingebrigtsen 2019-07-04 19:05 ` Eli Zaretskii @ 2019-07-05 8:21 ` Eli Zaretskii 2019-07-05 13:13 ` Lars Ingebrigtsen 1 sibling, 1 reply; 46+ messages in thread From: Eli Zaretskii @ 2019-07-05 8:21 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: contovob, 24201, eggert > From: Lars Ingebrigtsen <larsi@gnus.org> > Cc: "Basil L. Contovounesios" <contovob@tcd.ie>, 24201@debbugs.gnu.org, eggert@cs.ucla.edu > Date: Thu, 04 Jul 2019 15:04:54 +0200 > > I think this code looks suspicious: > > if (NILP (wait_for_cell) && just_wait_proc >= 0 > && timespec_valid_p (timer_delay) > && timespec_cmp (timer_delay, timeout) < 0) > { > if (!timespec_valid_p (now)) > now = current_timespec (); > struct timespec timeout_abs = timespec_add (now, timeout); > if (!timespec_valid_p (got_output_end_time) > || timespec_cmp (timeout_abs, got_output_end_time) < 0) > got_output_end_time = timeout_abs; > timeout = timer_delay; > } > else > got_output_end_time = invalid_timespec (); > > This is done only if JUST-THIS-ONE is set. There's also this: /* Wait till there is something to do. */ if (wait_proc && just_wait_proc) { if (wait_proc->infd < 0) /* Terminated. */ break; FD_SET (wait_proc->infd, &Available); check_delay = 0; check_write = 0; } In your case, wait_proc is non-NULL. Can you tell which of the various conditions cause accept-process-output to return after a timeout when you apply the patch to network-stream.el? Then we'd know for sure which part helps avoid the loop when JUST-THIS-ONE is non-nil. But what I'm missing is some description of the traffic that happens on the wire. Is the remote responding or isn't it, and if not, then do we have any idea why? ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2019-07-05 8:21 ` Eli Zaretskii @ 2019-07-05 13:13 ` Lars Ingebrigtsen 0 siblings, 0 replies; 46+ messages in thread From: Lars Ingebrigtsen @ 2019-07-05 13:13 UTC (permalink / raw) To: Eli Zaretskii; +Cc: contovob, 24201, eggert Eli Zaretskii <eliz@gnu.org> writes: > Can you tell which of the various conditions cause > accept-process-output to return after a timeout when you apply the > patch to network-stream.el? Then we'd know for sure which part helps > avoid the loop when JUST-THIS-ONE is non-nil. No, I'll add some debugging code to the function and the try to get it to hang to identify what makes JUST-THIS-ONE respect the timeout. > But what I'm missing is some description of the traffic that happens > on the wire. Is the remote responding or isn't it, and if not, then > do we have any idea why? Remote is indeed responding. -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2019-06-25 19:15 ` Lars Ingebrigtsen 2019-06-25 21:57 ` Lars Ingebrigtsen @ 2019-06-26 16:29 ` Eli Zaretskii 1 sibling, 0 replies; 46+ messages in thread From: Eli Zaretskii @ 2019-06-26 16:29 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: 24201, eggert > From: Lars Ingebrigtsen <larsi@gnus.org> > Cc: 24201@debbugs.gnu.org, eggert@cs.ucla.edu > Date: Tue, 25 Jun 2019 21:15:17 +0200 > > So, the "finish" stuff seemed to show that we weren't hanging in > accept-process-output at all, but were instead just calling functions as > normal, and looping on the Lisp level? That's my current hypothesis, yes. > And, for instance, Ffuncall has a maybe_quit, so C-g should work in > these instances? But they don't, which is confusing. The loop is not in Ffuncall, it's in some Lisp that calls Ffuncall. Those are the two stack frames running bytecode above the frame with Ffuncall. > Is it even possible for the Lisp-ier functions in the backtrace to > inhibit C-g? Yes, they can set inhibit-quit. ^ permalink raw reply [flat|nested] 46+ messages in thread
* bug#24201: 25.1.50; TLS connections sometimes hang 2018-02-19 18:16 ` Lars Ingebrigtsen 2018-02-19 18:32 ` Eli Zaretskii @ 2018-02-19 18:37 ` Andreas Schwab 1 sibling, 0 replies; 46+ messages in thread From: Andreas Schwab @ 2018-02-19 18:37 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: 24201, Paul Eggert On Feb 19 2018, Lars Ingebrigtsen <larsi@gnus.org> wrote: > Paul Eggert <eggert@cs.ucla.edu> writes: > >> Lars Ingebrigtsen wrote: >>> #0 pthread_sigmask (how=0, newmask=<optimized out>, oldmask=0x7fffffff8a60) >>> at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:50 >> >> When you say that Emacs "hangs" surely you don't mean that the call to >> pthread_sigmask is hanging. So, can we assume that Emacs is >> busy-waiting for something? Is the CPU pegged? What does the busy-wait >> loop look like? > > It's using 100% CPU, so it's busy-waiting, but every time I stop it, > it's in that pthread_sigmask, as far as I'm able to tell... Since it's making a syscall the loop is likely spending a significant part of the time there. That doesn't mean it's reponsible for the busy wait. Andreas. -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." ^ permalink raw reply [flat|nested] 46+ messages in thread
end of thread, other threads:[~2020-08-03 6:00 UTC | newest] Thread overview: 46+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-08-11 13:20 bug#24201: 25.1.50; TLS connections sometimes hang Lars Ingebrigtsen 2016-08-15 2:35 ` Paul Eggert 2016-10-10 10:36 ` Eli Zaretskii 2016-10-10 10:54 ` Lars Ingebrigtsen 2016-10-10 11:23 ` Eli Zaretskii 2017-09-02 12:40 ` Eli Zaretskii 2017-09-02 13:04 ` Lars Ingebrigtsen 2017-09-02 14:21 ` Eli Zaretskii 2018-02-18 17:57 ` Lars Ingebrigtsen 2018-02-19 16:52 ` Lars Ingebrigtsen 2018-02-19 17:56 ` Paul Eggert 2018-02-19 18:16 ` Lars Ingebrigtsen 2018-02-19 18:32 ` Eli Zaretskii 2018-02-19 19:06 ` Lars Ingebrigtsen 2018-02-19 19:57 ` Eli Zaretskii 2018-02-19 20:39 ` Lars Ingebrigtsen 2019-06-24 13:25 ` Lars Ingebrigtsen 2019-06-24 19:20 ` Eli Zaretskii 2019-06-24 20:46 ` Lars Ingebrigtsen 2019-06-25 19:15 ` Lars Ingebrigtsen 2019-06-25 21:57 ` Lars Ingebrigtsen 2019-06-26 16:32 ` Eli Zaretskii 2019-06-27 10:34 ` Lars Ingebrigtsen 2019-06-27 13:25 ` Eli Zaretskii 2019-06-27 19:28 ` Lars Ingebrigtsen 2019-06-28 6:19 ` Eli Zaretskii 2019-06-28 8:25 ` Lars Ingebrigtsen 2019-06-28 8:34 ` Eli Zaretskii 2019-06-28 9:55 ` Lars Ingebrigtsen 2019-06-28 12:26 ` Eli Zaretskii 2019-06-28 14:39 ` Basil L. Contovounesios 2019-06-28 14:50 ` Eli Zaretskii 2019-06-30 19:02 ` Lars Ingebrigtsen 2019-07-04 13:04 ` Lars Ingebrigtsen 2019-07-04 19:05 ` Eli Zaretskii 2019-07-05 12:59 ` Lars Ingebrigtsen 2019-07-05 18:03 ` Eli Zaretskii 2019-07-06 12:16 ` Lars Ingebrigtsen 2019-07-07 18:13 ` Lars Ingebrigtsen 2019-07-07 18:18 ` Eli Zaretskii 2019-07-08 16:38 ` Lars Ingebrigtsen 2020-08-03 6:00 ` Lars Ingebrigtsen 2019-07-05 8:21 ` Eli Zaretskii 2019-07-05 13:13 ` Lars Ingebrigtsen 2019-06-26 16:29 ` Eli Zaretskii 2018-02-19 18:37 ` Andreas Schwab
Code repositories for project(s) associated with this public inbox https://git.savannah.gnu.org/cgit/emacs.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).