all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
* bug#24201: 25.1.50; TLS connections sometimes hang
@ 2016-08-11 13:20 Lars Ingebrigtsen
  2016-08-15  2:35 ` Paul Eggert
  0 siblings, 1 reply; 46+ messages in thread
From: Lars Ingebrigtsen @ 2016-08-11 13:20 UTC (permalink / raw)
  To: 24201


A couple of weeks ago, TLS connections started making Emacs hang.  I've
yet to find a way to reproduce it reliably, but it seems to involve
opening an async TLS connection while opening a synchronous TLS
connection, or something.

strace in these instances shows an infinite loop like this:

[pid 28250] pselect6(27, [4 6 8 9 10 14 15 16 17 18 19 20 22 24 26], [], NULL, {0, 0}, {NULL, 8}) = 1 (in [20], left {0, 0})
[pid 28250] pselect6(27, [4 6 8 9 10 14 15 16 17 18 19 20 22 24 26], [], NULL, {0, 0}, {NULL, 8}) = 1 (in [20], left {0, 0})
[pid 28250] pselect6(27, [4 6 8 9 10 14 15 16 17 18 19 20 22 24 26], [], NULL, {0, 0}, {NULL, 8}) = 1 (in [20], left {0, 0})
[pid 28250] pselect6(27, [4 6 8 9 10 14 15 16 17 18 19 20 22 24 26], [], NULL, {0, 0}, {NULL, 8}) = 1 (in [20], left {0, 0})

Does this ring a bell?  If not, I'll try to debug it further.

I think it started happening like two or three weeks ago.


In GNU Emacs 25.1.50.2 (x86_64-unknown-linux-gnu, GTK+ Version 3.14.5)
 of 2016-07-20 built on stories
Repository revision: 60dd094a8c7bdbbff121c99f56f42910534e7cc1
Windowing system distributor 'The X.Org Foundation', version 11.0.11604000
System Description:	Debian GNU/Linux 8.5 (jessie)

Recent messages:
Quit opening connection to news.gmane.org
Quit [2 times]
Mark set
Opening nntp server on news.gmane.org...done
Mark set
Opening nntp server on news.gmane.org...done
Quit
Quit opening connection to news.gmane.org
Quit
Making completion list...

Configured features:
XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND GPM DBUS GCONF GSETTINGS
NOTIFY ACL LIBSELINUX GNUTLS LIBXML2 FREETYPE M17N_FLT LIBOTF XFT ZLIB
TOOLKIT_SCROLL_BARS GTK3 X11

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Group

Minor modes in effect:
  gnus-agent-group-mode: t
  shell-dirtrack-mode: t
  diff-auto-refine-mode: t
  gnus-topic-mode: t
  gnus-undo-mode: t
  global-whitespace-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  buffer-read-only: t
  line-number-mode: t

Load-path shadows:
/home/larsi/src/cddb.el/expect hides /home/larsi/lisp/expect
/home/larsi/src/cddb.el/captitle hides /home/larsi/lisp/captitle
/home/larsi/src/clock.el/clock hides /home/larsi/lisp/clock
~/pgnus/contrib/vcard hides /home/larsi/lisp/vcard
/home/larsi/src/pvr.el/pvr hides /home/larsi/lisp/pvr
~/lisp/zenirc-2.112/src/zenirc-example hides /home/larsi/lisp/zenirc-example
~/pgnus/contrib/compface hides /home/larsi/src/emacs/trunk/lisp/image/compface

Features:
(shadow emacsbug perl-mode org-element org-rmail org-mhe org-irc
org-info org-gnus org-docview org-bibtex bibtex org-bbdb org-w3m org
org-macro org-footnote org-pcomplete org-list org-faces org-entities
org-version ob-emacs-lisp ob ob-tangle ob-ref ob-lob ob-table ob-exp
org-src ob-keys ob-comint ob-core ob-eval org-compat org-macs
org-loaddefs cal-menu calendar cal-loaddefs ffap tabify imenu man
sh-script smie executable imap cc-mode cc-fonts cc-guess cc-menus
cc-cmds cc-styles cc-align cc-engine cc-vars cc-defs smerge-mode crm
macros eieio-opt speedbar sb-image ezimage dframe etags texinfo
conf-mode nndir nnspool dired-aux grep compile vc-bzr vc-src vc-sccs
vc-svn vc-rcs vc-dir ewoc bug-reference noutline outline edebug log-edit
previews json shell pcomplete comint canlock flow-fill vc-annotate
log-view pcvs-util vc vc-dispatcher eww pp vc-git diff-mode map pulse
find-func thingatpt xref project ring misearch multi-isearch mailalias
smtpmail sendmail ecomplete copyright vc-cvs shr-color color gnus-html
help-fns radix-tree sort gnus-cite smiley ansi-color url-queue url-cache
mm-archive gnus-async gnus-dup qp gnus-ml gmane spam-gmane dns mm-url
disp-table gnus-fun gnus-mdrtn pop3 nndoc nnmbox utf-7 gnus-topic nnml
nnfolder network-stream starttls nnir spam-report spam spam-stat gnus-uu
yenc gnus-delay gnus-draft gnus-agent gnus-srvr gnus-score score-mode
nnvirtual nntp gnus-cache gnus-msg gnus-art mm-uu mml2015 mm-view
mml-smime smime dig gnus-sum nndraft nnmh gnus-group gnus-undo
gnus-start gnus-cloud nnimap utf7 netrc nnoo parse-time gnus-spec
gnus-win nnmail gnus-int gnus-range mail-source message format-spec
rfc822 mml mml-sec epa epg mailabbrev gmm-utils mailheader gnus nnheader
gnus-util rmail rmail-loaddefs mail-utils whitespace movie mkv shr svg
imdb dom pvr debug debbugs-gnu easy-mmode derived debbugs soap-client
mm-decode mm-bodies mm-encode url-http tls gnutls url-auth mail-parse
rfc2231 rfc2047 rfc2045 mm-util ietf-drums mail-prsvr url-gw nsm puny
url url-proxy url-privacy url-expand url-methods url-history url-cookie
url-domsuf url-util mailcap warnings rng-xsd rng-dt rng-util xsd-regexp
xml ido flyspell ispell benchmark w3m browse-url doc-view subr-x dired
dired-loaddefs image-mode timezone w3m-hist w3m-fb w3m-ems wid-edit
w3m-ccl ccl w3m-favicon w3m-image w3m-proc w3m-util add-log mail-extr
jka-compr cl finder-inf package epg-config url-handlers url-parse
auth-source cl-seq eieio eieio-core cl-macs eieio-loaddefs
password-cache url-vars seq byte-opt gv bytecomp byte-compile cl-extra
help-mode easymenu cconv cl-loaddefs pcase cl-lib time-date mule-util
tooltip eldoc electric uniquify ediff-hook vc-hooks lisp-float-type
mwheel term/x-win x-win term/common-win x-dnd tool-bar dnd fontset image
regexp-opt fringe tabulated-list newcomment elisp-mode lisp-mode
prog-mode register page menu-bar rfn-eshadow timer select scroll-bar
mouse jit-lock font-lock syntax facemenu font-core term/tty-colors frame
cl-generic cham georgian utf-8-lang misc-lang vietnamese tibetan thai
tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek romanian
slovak czech european ethiopic indian cyrillic chinese charscript
case-table epa-hook jka-cmpr-hook help simple abbrev obarray minibuffer
cl-preloaded nadvice loaddefs button faces cus-face macroexp files
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote dbusbind inotify
dynamic-setting system-font-setting font-render-setting move-toolbar gtk
x-toolkit x multi-tty make-network-process emacs)

Memory information:
((conses 16 3259686 974810)
 (symbols 48 216414 408)
 (miscs 40 3192 5495)
 (strings 32 609506 170950)
 (string-bytes 1 57242968)
 (vectors 16 95117)
 (vector-slots 8 2432568 99890)
 (floats 8 10530 14727)
 (intervals 56 472696 10642)
 (buffers 976 311)
 (heap 1024 591462 804639))

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2016-08-11 13:20 bug#24201: 25.1.50; TLS connections sometimes hang Lars Ingebrigtsen
@ 2016-08-15  2:35 ` Paul Eggert
  2016-10-10 10:36   ` Eli Zaretskii
  0 siblings, 1 reply; 46+ messages in thread
From: Paul Eggert @ 2016-08-15  2:35 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 24201

> A couple of weeks ago, TLS connections started making Emacs hang.

I installed several changes on August 3. They were in the area of nonblocking 
GnuTLS connections (Bug#23982) and perhaps could explain your problem. See the 
following commits:

967e2ef61dab8d2046f3285eefa71f3dcb9d9b60
f0b5d4bd3e17bc0a464b9ec69507413824568f9e
744cc9fa17d20450fc3cf1ca6bb71869abeeb77b
1a8d31123698ccf6f165e49fcfe16631d07a7aea
64edd14ca9cf6d1b957316a9a547e482c9ba811b

On August 2 I also installed a fix for a use-after-close problem that could be 
relevant. See:

53917616b0aca43b9ca85531d4dcb426885bdb73
a066fb1ceee373c982214c28206108c5fba01bf7





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2016-08-15  2:35 ` Paul Eggert
@ 2016-10-10 10:36   ` Eli Zaretskii
  2016-10-10 10:54     ` Lars Ingebrigtsen
  0 siblings, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2016-10-10 10:36 UTC (permalink / raw)
  To: Paul Eggert; +Cc: larsi, 24201

> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Sun, 14 Aug 2016 19:35:45 -0700
> Cc: 24201@debbugs.gnu.org
> 
> > A couple of weeks ago, TLS connections started making Emacs hang.
> 
> I installed several changes on August 3. They were in the area of nonblocking 
> GnuTLS connections (Bug#23982) and perhaps could explain your problem. See the 
> following commits:
> 
> 967e2ef61dab8d2046f3285eefa71f3dcb9d9b60
> f0b5d4bd3e17bc0a464b9ec69507413824568f9e
> 744cc9fa17d20450fc3cf1ca6bb71869abeeb77b
> 1a8d31123698ccf6f165e49fcfe16631d07a7aea
> 64edd14ca9cf6d1b957316a9a547e482c9ba811b
> 
> On August 2 I also installed a fix for a use-after-close problem that could be 
> relevant. See:
> 
> 53917616b0aca43b9ca85531d4dcb426885bdb73
> a066fb1ceee373c982214c28206108c5fba01bf7

Should this bug be closed?  Lars, do you still see the problem, and if
you do, on what branch?

Thanks.





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2016-10-10 10:36   ` Eli Zaretskii
@ 2016-10-10 10:54     ` Lars Ingebrigtsen
  2016-10-10 11:23       ` Eli Zaretskii
  0 siblings, 1 reply; 46+ messages in thread
From: Lars Ingebrigtsen @ 2016-10-10 10:54 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 24201, Paul Eggert

Eli Zaretskii <eliz@gnu.org> writes:

> Should this bug be closed?  Lars, do you still see the problem, and if
> you do, on what branch?

I still see this problem, and it's on the master branch.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2016-10-10 10:54     ` Lars Ingebrigtsen
@ 2016-10-10 11:23       ` Eli Zaretskii
  2017-09-02 12:40         ` Eli Zaretskii
  0 siblings, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2016-10-10 11:23 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 24201, eggert

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Paul Eggert <eggert@cs.ucla.edu>,  24201@debbugs.gnu.org
> Date: Mon, 10 Oct 2016 12:54:36 +0200
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Should this bug be closed?  Lars, do you still see the problem, and if
> > you do, on what branch?
> 
> I still see this problem, and it's on the master branch.

OK, thanks.  I moved the bug to block the release of 26.1, which will
be the first version released from what is now master.





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2016-10-10 11:23       ` Eli Zaretskii
@ 2017-09-02 12:40         ` Eli Zaretskii
  2017-09-02 13:04           ` Lars Ingebrigtsen
  0 siblings, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2017-09-02 12:40 UTC (permalink / raw)
  To: larsi; +Cc: 24201, eggert

> Date: Mon, 10 Oct 2016 14:23:43 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 24201@debbugs.gnu.org, eggert@cs.ucla.edu
> 
> > From: Lars Ingebrigtsen <larsi@gnus.org>
> > Cc: Paul Eggert <eggert@cs.ucla.edu>,  24201@debbugs.gnu.org
> > Date: Mon, 10 Oct 2016 12:54:36 +0200
> > 
> > Eli Zaretskii <eliz@gnu.org> writes:
> > 
> > > Should this bug be closed?  Lars, do you still see the problem, and if
> > > you do, on what branch?
> > 
> > I still see this problem, and it's on the master branch.
> 
> OK, thanks.  I moved the bug to block the release of 26.1, which will
> be the first version released from what is now master.

Any news on this one?  It's been almost a year since the last
communication about this issue; if we still have the problem, why is
no one else seeing it?

Thanks.





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2017-09-02 12:40         ` Eli Zaretskii
@ 2017-09-02 13:04           ` Lars Ingebrigtsen
  2017-09-02 14:21             ` Eli Zaretskii
  0 siblings, 1 reply; 46+ messages in thread
From: Lars Ingebrigtsen @ 2017-09-02 13:04 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 24201, eggert

Eli Zaretskii <eliz@gnu.org> writes:

>> Date: Mon, 10 Oct 2016 14:23:43 +0300
>> From: Eli Zaretskii <eliz@gnu.org>
>> Cc: 24201@debbugs.gnu.org, eggert@cs.ucla.edu
>> 
>> > From: Lars Ingebrigtsen <larsi@gnus.org>
>> > Cc: Paul Eggert <eggert@cs.ucla.edu>,  24201@debbugs.gnu.org
>> > Date: Mon, 10 Oct 2016 12:54:36 +0200
>> > 
>> > Eli Zaretskii <eliz@gnu.org> writes:
>> > 
>> > > Should this bug be closed?  Lars, do you still see the problem, and if
>> > > you do, on what branch?
>> > 
>> > I still see this problem, and it's on the master branch.
>> 
>> OK, thanks.  I moved the bug to block the release of 26.1, which will
>> be the first version released from what is now master.
>
> Any news on this one?  It's been almost a year since the last
> communication about this issue; if we still have the problem, why is
> no one else seeing it?

I still haven't found time to debug it, but I see the hangs daily (on
builds on different machines).  I'd guess few other people see this
because they don't use the network as intensely from Emacs (you need to
have two TLS connections being set up "simultaneously" to see the hang).

But I don't think this should be a release blocker, precisely because so
few people see the bug.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2017-09-02 13:04           ` Lars Ingebrigtsen
@ 2017-09-02 14:21             ` Eli Zaretskii
  2018-02-18 17:57               ` Lars Ingebrigtsen
  0 siblings, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2017-09-02 14:21 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 24201, eggert

unblock 24655 by 24201
thanks

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: 24201@debbugs.gnu.org,  eggert@cs.ucla.edu
> Date: Sat, 02 Sep 2017 15:04:31 +0200
> 
> > Any news on this one?  It's been almost a year since the last
> > communication about this issue; if we still have the problem, why is
> > no one else seeing it?
> 
> I still haven't found time to debug it, but I see the hangs daily (on
> builds on different machines).  I'd guess few other people see this
> because they don't use the network as intensely from Emacs (you need to
> have two TLS connections being set up "simultaneously" to see the hang).
> 
> But I don't think this should be a release blocker, precisely because so
> few people see the bug.

OK, I'm unblocking Emacs 26.1 due to this.

Hopefully, you will find tome to debug this at some point.

Thanks.





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2017-09-02 14:21             ` Eli Zaretskii
@ 2018-02-18 17:57               ` Lars Ingebrigtsen
  2018-02-19 16:52                 ` Lars Ingebrigtsen
  0 siblings, 1 reply; 46+ messages in thread
From: Lars Ingebrigtsen @ 2018-02-18 17:57 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 24201, eggert

OK, I finally found a way to semi-reliably reproduce one of these hangs
I've been seeing.

The symptoms are that I move from one Gnus newsgroup to another, and in
certain circumstances Emacs will then hang forever.  `C-g' does work,
but only after hitting it a number of times -- it varies how many, but
5-10 `C-g's usually allows Emacs to proceed.

I've included the xbacktrace and the backtrace from gdb below.  If I
interpret this correctly, it looks like we're in
wait_reading_process_output, and that code decides to let a timer run,
and the timer then tries to establish a TLS connection, then we'll hang
indefinitely.

I think.

Should I try to run this in an unoptimized build to get more values
filled in?

(gdb) xbacktrace
"accept-process-output" (0xffffb138)
"network-stream-get-response" (0xffffb3d8)
"network-stream-open-starttls" (0xffffb888)
"open-network-stream" (0xffffbb90)
"nntp-open-connection" (0xffffc090)
"nntp-open-server" (0xffffc3a0)
"nntp-possibly-change-group" (0xffffca80)
"nntp-with-open-group-function" (0xffffcd30)
"nntp-request-article" (0xffffcfb0)
"gnus-request-article" (0xffffd250)
"gnus-async-prefetch-article" (0xffffd5e8)
"apply" (0xffffd6f8)
"timer-event-handler" (0xffffd9b8)

#0  pthread_sigmask (how=how@entry=0, newmask=<optimized out>, 
    newmask@entry=0x7fffffffa4f0, oldmask=oldmask@entry=0x7fffffffa590)
    at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:50
#1  0x000000000050b41e in block_interrupt_signal (oldset=oldset@entry=0x7fffffffa590)
    at sysdep.c:843
#2  0x00000000005c821d in really_call_select (arg=0x7fffffffa630) at thread.c:567
#3  0x00000000005c8789 in thread_select (func=<optimized out>, max_fds=max_fds@entry=16, 
    rfds=rfds@entry=0x7fffffffa700, wfds=wfds@entry=0x7fffffffa780, efds=efds@entry=0x0, 
    timeout=timeout@entry=0x7fffffffad50, sigmask=0x0) at thread.c:602
#4  0x00000000005e5a35 in xg_select (fds_lim=16, rfds=rfds@entry=0x7fffffffae40, 
    wfds=wfds@entry=0x7fffffffaec0, efds=efds@entry=0x0, 
    timeout=timeout@entry=0x7fffffffad50, sigmask=sigmask@entry=0x0) at xgselect.c:117
#5  0x00000000005a9f29 in wait_reading_process_output (time_limit=<optimized out>, 
    nsecs=<optimized out>, read_kbd=read_kbd@entry=0, do_display=do_display@entry=false, 
    wait_for_cell=wait_for_cell@entry=XIL(0), wait_proc=wait_proc@entry=0x40c3ba0, 
    just_wait_proc=0) at process.c:5379
#6  0x00000000005ab808 in Faccept_process_output (process=XIL(0x40c3ba5), 
    seconds=<optimized out>, millisec=<optimized out>, just_this_one=<optimized out>)
    at process.c:4667
#7  0x0000000000568404 in Ffuncall (nargs=<optimized out>, 
    args=args@entry=0x7fffffffb130) at eval.c:2818
#8  0x000000000059eea8 in exec_byte_code (bytestr=<optimized out>, 
    vector=XIL(0x4201a65), maxdepth=<optimized out>, args_template=<optimized out>, 
    nargs=nargs@entry=59, args=<optimized out>, args@entry=0x3b) at bytecode.c:632
#9  0x000000000056809f in funcall_lambda (fun=XIL(0x7fffffffb150), nargs=59, 
    nargs@entry=3, arg_vector=0x3b, arg_vector@entry=0x7fffffffb3d8) at eval.c:3019
#10 0x000000000056834b in Ffuncall (nargs=4, args=args@entry=0x7fffffffb3d0)
    at eval.c:2832
#11 0x000000000059eea8 in exec_byte_code (bytestr=<optimized out>, 
    vector=XIL(0x4201805), maxdepth=<optimized out>, args_template=<optimized out>, 
    nargs=nargs@entry=58, args=<optimized out>, args@entry=0x3a) at bytecode.c:632
#12 0x000000000056809f in funcall_lambda (fun=XIL(0x7fffffffb448), nargs=58, 
    nargs@entry=5, arg_vector=0x3a, arg_vector@entry=0x7fffffffb888) at eval.c:3019
#13 0x000000000056834b in Ffuncall (nargs=6, args=args@entry=0x7fffffffb880)
    at eval.c:2832
#14 0x000000000059eea8 in exec_byte_code (bytestr=<optimized out>, 
    vector=XIL(0x42014f5), maxdepth=<optimized out>, args_template=<optimized out>, 
    nargs=nargs@entry=56, args=<optimized out>, args@entry=0x38) at bytecode.c:632
#15 0x000000000056809f in funcall_lambda (fun=XIL(0x7fffffffb8d8), nargs=56, 
    nargs@entry=14, arg_vector=0x38, arg_vector@entry=0x7fffffffbb90) at eval.c:3019
#16 0x000000000056834b in Ffuncall (nargs=15, args=args@entry=0x7fffffffbb88)
    at eval.c:2832
#17 0x000000000059eea8 in exec_byte_code (bytestr=<optimized out>, 
    vector=XIL(0x416a9c5), maxdepth=<optimized out>, 
    args_template=args_template@entry=XIL(0), nargs=nargs@entry=0, args=<optimized out>, 
    args@entry=0x0) at bytecode.c:632
#18 0x0000000000567f69 in funcall_lambda (fun=XIL(0x416ad05), nargs=nargs@entry=1, 
    arg_vector=arg_vector@entry=0x7fffffffc090) at eval.c:3101
#19 0x000000000056834b in Ffuncall (nargs=2, args=args@entry=0x7fffffffc088)
    at eval.c:2832
#20 0x000000000059eea8 in exec_byte_code (bytestr=<optimized out>, 
    vector=XIL(0x4164515), maxdepth=<optimized out>, 
    args_template=args_template@entry=XIL(0), nargs=nargs@entry=0, args=<optimized out>, 
    args@entry=0x0) at bytecode.c:632
#21 0x0000000000567f69 in funcall_lambda (fun=XIL(0x4161925), nargs=nargs@entry=3, 
    arg_vector=arg_vector@entry=0x7fffffffc3a0) at eval.c:3101
#22 0x000000000056834b in Ffuncall (nargs=4, args=args@entry=0x7fffffffc398)
    at eval.c:2832
#23 0x000000000059eea8 in exec_byte_code (bytestr=<optimized out>, 
    vector=XIL(0x416b7d5), maxdepth=<optimized out>, 
    args_template=args_template@entry=XIL(0), nargs=nargs@entry=0, args=<optimized out>, 
    args@entry=0x0) at bytecode.c:632
#24 0x0000000000567f69 in funcall_lambda (fun=XIL(0x416bb25), nargs=nargs@entry=3, 
    arg_vector=arg_vector@entry=0x7fffffffca80) at eval.c:3101
#25 0x000000000056834b in Ffuncall (nargs=4, args=args@entry=0x7fffffffca78)
    at eval.c:2832
#26 0x000000000059eea8 in exec_byte_code (bytestr=<optimized out>, 
    vector=XIL(0x415c965), maxdepth=<optimized out>, 
    args_template=args_template@entry=XIL(0), nargs=nargs@entry=0, args=<optimized out>, 
    args@entry=0x0) at bytecode.c:632
#27 0x0000000000567f69 in funcall_lambda (fun=XIL(0x415ca15), nargs=nargs@entry=4, 
    arg_vector=arg_vector@entry=0x7fffffffcd30) at eval.c:3101
#28 0x000000000056834b in Ffuncall (nargs=5, args=args@entry=0x7fffffffcd28)
    at eval.c:2832
#29 0x000000000059eea8 in exec_byte_code (bytestr=<optimized out>, 
    vector=XIL(0x4161285), maxdepth=<optimized out>, 
    args_template=args_template@entry=XIL(0), nargs=nargs@entry=0, args=<optimized out>, 
    args@entry=0x0) at bytecode.c:632
#30 0x0000000000567f69 in funcall_lambda (fun=XIL(0x41612b5), nargs=nargs@entry=4, 
    arg_vector=arg_vector@entry=0x7fffffffcfb0) at eval.c:3101
#31 0x000000000056834b in Ffuncall (nargs=5, args=args@entry=0x7fffffffcfa8)
    at eval.c:2832
#32 0x000000000059eea8 in exec_byte_code (bytestr=<optimized out>, 
    vector=XIL(0x3e57535), maxdepth=<optimized out>, 
    args_template=args_template@entry=XIL(0), nargs=nargs@entry=0, args=<optimized out>, 
    args@entry=0x0) at bytecode.c:632
#33 0x0000000000567f69 in funcall_lambda (fun=XIL(0x3e1d7f5), nargs=nargs@entry=2, 
    arg_vector=arg_vector@entry=0x7fffffffd250) at eval.c:3101
#34 0x000000000056834b in Ffuncall (nargs=3, args=args@entry=0x7fffffffd248)
    at eval.c:2832
#35 0x000000000059eea8 in exec_byte_code (bytestr=<optimized out>, 
    vector=XIL(0x41b5ba5), maxdepth=<optimized out>, 
    args_template=args_template@entry=XIL(0), nargs=nargs@entry=0, args=<optimized out>, 
    args@entry=0x0) at bytecode.c:632
#36 0x0000000000567f69 in funcall_lambda (fun=XIL(0x424d585), nargs=nargs@entry=3, 
    arg_vector=arg_vector@entry=0x7fffffffd5e8) at eval.c:3101
#37 0x000000000056834b in Ffuncall (nargs=nargs@entry=4, args=0x7fffffffd5e0)
    at eval.c:2832
#38 0x0000000000569d70 in Fapply (nargs=2, args=<optimized out>) at eval.c:2438
#39 0x0000000000568404 in Ffuncall (nargs=<optimized out>, 
    args=args@entry=0x7fffffffd6f0) at eval.c:2818
#40 0x000000000059eea8 in exec_byte_code (bytestr=<optimized out>, vector=XIL(0x99b3dd), 
    maxdepth=<optimized out>, args_template=<optimized out>, nargs=nargs@entry=6, 
    args=<optimized out>, args@entry=0x6) at bytecode.c:632
#41 0x000000000056809f in funcall_lambda (fun=XIL(0x7fffffffd728), nargs=6, 
    nargs@entry=1, arg_vector=0x6, arg_vector@entry=0x7fffffffd9b8) at eval.c:3019
#42 0x000000000056834b in Ffuncall (nargs=nargs@entry=2, args=args@entry=0x7fffffffd9b0)
    at eval.c:2832
#43 0x00000000005684ba in call1 (fn=fn@entry=XIL(0xc480), arg1=arg1@entry=XIL(0x4b72725))
    at eval.c:2669
#44 0x00000000004f8ebb in timer_check_2 (idle_timers=<optimized out>, 
    timers=<optimized out>) at keyboard.c:4330
#45 timer_check () at keyboard.c:4392
#46 0x00000000005a9367 in wait_reading_process_output (time_limit=time_limit@entry=30, 
    nsecs=nsecs@entry=0, read_kbd=-1, do_display=do_display@entry=true, 
    wait_for_cell=wait_for_cell@entry=XIL(0), wait_proc=wait_proc@entry=0x0, 
    just_wait_proc=0) at process.c:5144
#47 0x0000000000424060 in sit_for (timeout=<optimized out>, reading=reading@entry=true, 
    display_option=display_option@entry=1) at dispnew.c:5804
#48 0x00000000004feae2 in read_char (commandflag=commandflag@entry=1, 
    map=map@entry=XIL(0x4dca1c3), prev_event=<optimized out>, 
    used_mouse_menu=used_mouse_menu@entry=0x7fffffffe5ab, end_time=end_time@entry=0x0)
    at keyboard.c:2723
#49 0x00000000004ff7b7 in read_key_sequence (keybuf=keybuf@entry=0x7fffffffe6a0, 
    prompt=prompt@entry=XIL(0), dont_downcase_last=dont_downcase_last@entry=false, 
    can_return_switch_frame=can_return_switch_frame@entry=true, 
    fix_current_buffer=fix_current_buffer@entry=true, 
    prevent_redisplay=prevent_redisplay@entry=false, bufsize=30) at keyboard.c:9137
#50 0x00000000005013be in command_loop_1 () at keyboard.c:1370
#51 0x00000000005669ae in internal_condition_case (
    bfun=bfun@entry=0x5011a0 <command_loop_1>, handlers=handlers@entry=XIL(0x5280), 
    hfun=hfun@entry=0x4f8210 <cmd_error>) at eval.c:1332
#52 0x00000000004f2b94 in command_loop_2 (ignore=ignore@entry=XIL(0)) at keyboard.c:1111
#53 0x000000000056694c in internal_catch (tag=tag@entry=XIL(0xc750), 
    func=func@entry=0x4f2b70 <command_loop_2>, arg=arg@entry=XIL(0)) at eval.c:1097
#54 0x00000000004f2b50 in command_loop () at keyboard.c:1090
#55 0x00000000004f7df3 in recursive_edit_1 () at keyboard.c:696
#56 0x00000000004f8158 in Frecursive_edit () at keyboard.c:767
#57 0x000000000041a5c3 in main (argc=<optimized out>, argv=0x7fffffffea18)
    at emacs.c:1724

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2018-02-18 17:57               ` Lars Ingebrigtsen
@ 2018-02-19 16:52                 ` Lars Ingebrigtsen
  2018-02-19 17:56                   ` Paul Eggert
  0 siblings, 1 reply; 46+ messages in thread
From: Lars Ingebrigtsen @ 2018-02-19 16:52 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 24201, eggert

It seems more difficult to reproduce the bug in a non-optimised build,
but it finally happened:

#0  pthread_sigmask (how=0, newmask=<optimized out>, oldmask=0x7fffffff8a60)
    at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:50
#1  0x00000000005800ec in block_interrupt_signal (oldset=0x7fffffff8a60) at sysdep.c:843
#2  0x00000000006898fc in really_call_select (arg=0x7fffffff8bb0) at thread.c:575
#3  0x00000000005e08ed in flush_stack_call_func (func=0x689853 <really_call_select>, 
    arg=0x7fffffff8bb0) at alloc.c:5218
#4  0x000000000068999e in thread_select (func=0x7fffefb02420 <__pselect>, max_fds=16, 
    rfds=0x7fffffff90e0, wfds=0x7fffffff9060, efds=0x0, timeout=0x7fffffff9380, 
    sigmask=0x0) at thread.c:602
#5  0x00000000006b2ce1 in xg_select (fds_lim=16, rfds=0x7fffffff9420, 
    wfds=0x7fffffff93a0, efds=0x0, timeout=0x7fffffff9380, sigmask=0x0) at xgselect.c:117
#6  0x000000000065f6c5 in wait_reading_process_output (time_limit=0, nsecs=50000000, 
    read_kbd=0, do_display=false, wait_for_cell=XIL(0), wait_proc=0x150084d0, 
    just_wait_proc=0) at process.c:5379
#7  0x000000000065da81 in Faccept_process_output (process=XIL(0x150084d5), 
    seconds=XIL(0xd44b7f), millisec=make_number(50), just_this_one=XIL(0))
    at process.c:4667
#8  0x00000000006081b1 in funcall_subr (subr=0xc46580 <Saccept_process_output>, 
    numargs=3, args=0x7fffffff9768) at eval.c:2901
#9  0x0000000000607d46 in Ffuncall (nargs=4, args=0x7fffffff9760) at eval.c:2818
#10 0x000000000065096e in exec_byte_code (bytestr=XIL(0x43d1f04), vector=XIL(0x414c3a5), 
    maxdepth=make_number(7), args_template=make_number(771), nargs=3, 
    args=0x7fffffff9cb0) at bytecode.c:632
#11 0x000000000060877c in funcall_lambda (fun=XIL(0x414c3f5), nargs=3, 
    arg_vector=0x7fffffff9c98) at eval.c:3019
#12 0x0000000000607d8a in Ffuncall (nargs=4, args=0x7fffffff9c90) at eval.c:2820
#13 0x000000000065096e in exec_byte_code (bytestr=XIL(0x43d1d04), vector=XIL(0x42e7be5), 
    maxdepth=make_number(28), args_template=make_number(1285), nargs=5, 
    args=0x7fffffffa400) at bytecode.c:632
#14 0x000000000060877c in funcall_lambda (fun=XIL(0x414c325), nargs=5, 
    arg_vector=0x7fffffffa3d8) at eval.c:3019
#15 0x0000000000607d8a in Ffuncall (nargs=6, args=0x7fffffffa3d0) at eval.c:2820
#16 0x000000000065096e in exec_byte_code (bytestr=XIL(0x43d6684), vector=XIL(0x414c015), 
    maxdepth=make_number(22), args_template=make_number(1156), nargs=14, 
    args=0x7fffffffa990) at bytecode.c:632
#17 0x000000000060877c in funcall_lambda (fun=XIL(0x414c175), nargs=14, 
    arg_vector=0x7fffffffa970) at eval.c:3019
#18 0x0000000000607d8a in Ffuncall (nargs=15, args=0x7fffffffa968) at eval.c:2820
#19 0x000000000065096e in exec_byte_code (bytestr=XIL(0x4235df4), vector=XIL(0x42361a5), 
    maxdepth=make_number(15), args_template=XIL(0), nargs=0, args=0x0) at bytecode.c:632
#20 0x0000000000608b04 in funcall_lambda (fun=XIL(0x4235b35), nargs=1, 
    arg_vector=0x42361a5) at eval.c:3101
#21 0x0000000000607d8a in Ffuncall (nargs=2, args=0x7fffffffb088) at eval.c:2820
#22 0x000000000065096e in exec_byte_code (bytestr=XIL(0x4230074), vector=XIL(0x422f5b5), 
    maxdepth=make_number(4), args_template=XIL(0), nargs=0, args=0x0) at bytecode.c:632
#23 0x0000000000608b04 in funcall_lambda (fun=XIL(0x422f685), nargs=3, 
    arg_vector=0x422f5b5) at eval.c:3101
#24 0x0000000000607d8a in Ffuncall (nargs=4, args=0x7fffffffb5b8) at eval.c:2820
#25 0x000000000065096e in exec_byte_code (bytestr=XIL(0x42373c4), vector=XIL(0x4236965), 
    maxdepth=make_number(8), args_template=XIL(0), nargs=0, args=0x0) at bytecode.c:632
#26 0x0000000000608b04 in funcall_lambda (fun=XIL(0x4236cb5), nargs=3, 
    arg_vector=0x4236965) at eval.c:3101
#27 0x0000000000607d8a in Ffuncall (nargs=4, args=0x7fffffffbeb8) at eval.c:2820
#28 0x000000000065096e in exec_byte_code (bytestr=XIL(0x4226f24), vector=XIL(0x4228785), 
    maxdepth=make_number(4), args_template=XIL(0), nargs=0, args=0x0) at bytecode.c:632
#29 0x0000000000608b04 in funcall_lambda (fun=XIL(0x4227ee5), nargs=4, 
    arg_vector=0x4228785) at eval.c:3101
#30 0x0000000000607d8a in Ffuncall (nargs=5, args=0x7fffffffc388) at eval.c:2820
#31 0x000000000065096e in exec_byte_code (bytestr=XIL(0x422e994), vector=XIL(0x422db75), 
    maxdepth=make_number(5), args_template=XIL(0), nargs=0, args=0x0) at bytecode.c:632
#32 0x0000000000608b04 in funcall_lambda (fun=XIL(0x422dba5), nargs=4, 
    arg_vector=0x422db75) at eval.c:3101
#33 0x0000000000607d8a in Ffuncall (nargs=5, args=0x7fffffffc828) at eval.c:2820
#34 0x000000000065096e in exec_byte_code (bytestr=XIL(0x3b09454), vector=XIL(0x3f7ec65), 
    maxdepth=make_number(5), args_template=XIL(0), nargs=0, args=0x0) at bytecode.c:632
#35 0x0000000000608b04 in funcall_lambda (fun=XIL(0x3eef535), nargs=2, 
    arg_vector=0x3f7ec65) at eval.c:3101
#36 0x0000000000607d8a in Ffuncall (nargs=3, args=0x7fffffffcce8) at eval.c:2820
#37 0x000000000065096e in exec_byte_code (bytestr=XIL(0x4e49874), vector=XIL(0x42f70d5), 
    maxdepth=make_number(7), args_template=XIL(0), nargs=0, args=0x0) at bytecode.c:632
#38 0x0000000000608b04 in funcall_lambda (fun=XIL(0x42d4fd5), nargs=3, 
    arg_vector=0x42f70d5) at eval.c:3101
#39 0x0000000000607d8a in Ffuncall (nargs=4, args=0x7fffffffd2a0) at eval.c:2820
#40 0x000000000060716c in Fapply (nargs=2, args=0x7fffffffd488) at eval.c:2438
#41 0x0000000000608049 in funcall_subr (subr=0xc42a00 <Sapply>, numargs=2, 
    args=0x7fffffffd488) at eval.c:2873
#42 0x0000000000607d46 in Ffuncall (nargs=3, args=0x7fffffffd480) at eval.c:2818
#43 0x000000000065096e in exec_byte_code (bytestr=XIL(0xa6864c), vector=XIL(0xa6866d), 
    maxdepth=make_number(10), args_template=make_number(257), nargs=1, 
    args=0x7fffffffd9f0) at bytecode.c:632
#44 0x000000000060877c in funcall_lambda (fun=XIL(0xa6861d), nargs=1, 
    arg_vector=0x7fffffffd9e8) at eval.c:3019
#45 0x0000000000607d8a in Ffuncall (nargs=2, args=0x7fffffffd9e0) at eval.c:2820
#46 0x0000000000607744 in call1 (fn=XIL(0xc480), arg1=XIL(0x11d06665)) at eval.c:2669
#47 0x000000000056763e in timer_check_2 (timers=XIL(0x9cd82a3), 
    idle_timers=XIL(0x9cd82d3)) at keyboard.c:4330
#48 0x000000000056776c in timer_check () at keyboard.c:4392
#49 0x0000000000565989 in readable_events (flags=1) at keyboard.c:3349
#50 0x000000000056bed8 in get_input_pending (flags=1) at keyboard.c:6805
#51 0x00000000005723a2 in detect_input_pending_run_timers (do_display=true)
    at keyboard.c:9943
#52 0x000000000065fb9d in wait_reading_process_output (time_limit=30, nsecs=0, 
    read_kbd=-1, do_display=true, wait_for_cell=XIL(0), wait_proc=0x0, just_wait_proc=0)
    at process.c:5516
#53 0x0000000000425fd4 in sit_for (timeout=make_number(30), reading=true, 
    display_option=1) at dispnew.c:5804
#54 0x00000000005640ac in read_char (commandflag=1, map=XIL(0xb4357a3), 
    prev_event=XIL(0), used_mouse_menu=0x7fffffffe39f, end_time=0x0) at keyboard.c:2723
#55 0x0000000000570b12 in read_key_sequence (keybuf=0x7fffffffe560, bufsize=30, 
    prompt=XIL(0), dont_downcase_last=false, can_return_switch_frame=true, 
    fix_current_buffer=true, prevent_redisplay=false) at keyboard.c:9137
#56 0x0000000000560d38 in command_loop_1 () at keyboard.c:1370
#57 0x00000000006046ce in internal_condition_case (bfun=0x560902 <command_loop_1>, 
    handlers=XIL(0x5280), hfun=0x5600b8 <cmd_error>) at eval.c:1332
#58 0x00000000005605ec in command_loop_2 (ignore=XIL(0)) at keyboard.c:1111
#59 0x0000000000603fa0 in internal_catch (tag=XIL(0xc750), 
    func=0x5605bf <command_loop_2>, arg=XIL(0)) at eval.c:1097
#60 0x000000000056058a in command_loop () at keyboard.c:1090
#61 0x000000000055fc87 in recursive_edit_1 () at keyboard.c:696
#62 0x000000000055fe0b in Frecursive_edit () at keyboard.c:767
#63 0x000000000055da7c in main (argc=1, argv=0x7fffffffea18) at emacs.c:1724

Lisp Backtrace:
"accept-process-output" (0xffff9768)
"network-stream-get-response" (0xffff9c98)
"network-stream-open-starttls" (0xffffa3d8)
"open-network-stream" (0xffffa970)
"nntp-open-connection" (0xffffb090)
"nntp-open-server" (0xffffb5c0)
"nntp-possibly-change-group" (0xffffbec0)
"nntp-with-open-group-function" (0xffffc390)
"nntp-request-article" (0xffffc830)
"gnus-request-article" (0xffffccf0)
"gnus-async-prefetch-article" (0xffffd2a8)
"apply" (0xffffd488)
"timer-event-handler" (0xffffd9e8)


-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2018-02-19 16:52                 ` Lars Ingebrigtsen
@ 2018-02-19 17:56                   ` Paul Eggert
  2018-02-19 18:16                     ` Lars Ingebrigtsen
  0 siblings, 1 reply; 46+ messages in thread
From: Paul Eggert @ 2018-02-19 17:56 UTC (permalink / raw)
  To: Lars Ingebrigtsen, Eli Zaretskii; +Cc: 24201

Lars Ingebrigtsen wrote:
> #0  pthread_sigmask (how=0, newmask=<optimized out>, oldmask=0x7fffffff8a60)
>      at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:50

When you say that Emacs "hangs" surely you don't mean that the call to 
pthread_sigmask is hanging. So, can we assume that Emacs is busy-waiting for 
something? Is the CPU pegged? What does the busy-wait loop look like?





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2018-02-19 17:56                   ` Paul Eggert
@ 2018-02-19 18:16                     ` Lars Ingebrigtsen
  2018-02-19 18:32                       ` Eli Zaretskii
  2018-02-19 18:37                       ` Andreas Schwab
  0 siblings, 2 replies; 46+ messages in thread
From: Lars Ingebrigtsen @ 2018-02-19 18:16 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 24201

Paul Eggert <eggert@cs.ucla.edu> writes:

> Lars Ingebrigtsen wrote:
>> #0  pthread_sigmask (how=0, newmask=<optimized out>, oldmask=0x7fffffff8a60)
>>      at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:50
>
> When you say that Emacs "hangs" surely you don't mean that the call to
> pthread_sigmask is hanging. So, can we assume that Emacs is
> busy-waiting for something? Is the CPU pegged? What does the busy-wait
> loop look like?

It's using 100% CPU, so it's busy-waiting, but every time I stop it,
it's in that pthread_sigmask, as far as I'm able to tell...

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2018-02-19 18:16                     ` Lars Ingebrigtsen
@ 2018-02-19 18:32                       ` Eli Zaretskii
  2018-02-19 19:06                         ` Lars Ingebrigtsen
  2018-02-19 18:37                       ` Andreas Schwab
  1 sibling, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2018-02-19 18:32 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 24201, eggert

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Eli Zaretskii <eliz@gnu.org>,  24201@debbugs.gnu.org
> Date: Mon, 19 Feb 2018 19:16:50 +0100
> 
> Paul Eggert <eggert@cs.ucla.edu> writes:
> 
> > Lars Ingebrigtsen wrote:
> >> #0  pthread_sigmask (how=0, newmask=<optimized out>, oldmask=0x7fffffff8a60)
> >>      at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:50
> >
> > When you say that Emacs "hangs" surely you don't mean that the call to
> > pthread_sigmask is hanging. So, can we assume that Emacs is
> > busy-waiting for something? Is the CPU pegged? What does the busy-wait
> > loop look like?
> 
> It's using 100% CPU, so it's busy-waiting, but every time I stop it,
> it's in that pthread_sigmask, as far as I'm able to tell...

And if you then say "continue" or "finish", does it exit from the
pthread_sigmask call?  If not, some other thread is spinning.





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2018-02-19 18:16                     ` Lars Ingebrigtsen
  2018-02-19 18:32                       ` Eli Zaretskii
@ 2018-02-19 18:37                       ` Andreas Schwab
  1 sibling, 0 replies; 46+ messages in thread
From: Andreas Schwab @ 2018-02-19 18:37 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 24201, Paul Eggert

On Feb 19 2018, Lars Ingebrigtsen <larsi@gnus.org> wrote:

> Paul Eggert <eggert@cs.ucla.edu> writes:
>
>> Lars Ingebrigtsen wrote:
>>> #0  pthread_sigmask (how=0, newmask=<optimized out>, oldmask=0x7fffffff8a60)
>>>      at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:50
>>
>> When you say that Emacs "hangs" surely you don't mean that the call to
>> pthread_sigmask is hanging. So, can we assume that Emacs is
>> busy-waiting for something? Is the CPU pegged? What does the busy-wait
>> loop look like?
>
> It's using 100% CPU, so it's busy-waiting, but every time I stop it,
> it's in that pthread_sigmask, as far as I'm able to tell...

Since it's making a syscall the loop is likely spending a significant
part of the time there.  That doesn't mean it's reponsible for the busy
wait.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2018-02-19 18:32                       ` Eli Zaretskii
@ 2018-02-19 19:06                         ` Lars Ingebrigtsen
  2018-02-19 19:57                           ` Eli Zaretskii
  0 siblings, 1 reply; 46+ messages in thread
From: Lars Ingebrigtsen @ 2018-02-19 19:06 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 24201, eggert

Eli Zaretskii <eliz@gnu.org> writes:

>> It's using 100% CPU, so it's busy-waiting, but every time I stop it,
>> it's in that pthread_sigmask, as far as I'm able to tell...
>
> And if you then say "continue" or "finish", does it exit from the
> pthread_sigmask call?  If not, some other thread is spinning.

Hm...  actually, I don't know.  I say "cont" and then break it again,
and it says that it's in a pthread_sigmask call, but I don't know
whether it's the same one...

The process has unfortunately exited now, but the next time this
happens, what's the best way to debug what's going on?  I'm not very
well versed in gdb usage.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2018-02-19 19:06                         ` Lars Ingebrigtsen
@ 2018-02-19 19:57                           ` Eli Zaretskii
  2018-02-19 20:39                             ` Lars Ingebrigtsen
  0 siblings, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2018-02-19 19:57 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 24201, eggert

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: eggert@cs.ucla.edu,  24201@debbugs.gnu.org
> Date: Mon, 19 Feb 2018 20:06:46 +0100
> 
> Hm...  actually, I don't know.  I say "cont" and then break it again,
> and it says that it's in a pthread_sigmask call, but I don't know
> whether it's the same one...
> 
> The process has unfortunately exited now, but the next time this
> happens, what's the best way to debug what's going on?  I'm not very
> well versed in gdb usage.

The best way is to start typing "finish" until it doesn't return from
the frame.  See the instructions in etc/DEBUG starting at "If Emacs is
in an infinite loop".  If something in the instructions is unclear,
please don't hesitate asking.





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2018-02-19 19:57                           ` Eli Zaretskii
@ 2018-02-19 20:39                             ` Lars Ingebrigtsen
  2019-06-24 13:25                               ` Lars Ingebrigtsen
  0 siblings, 1 reply; 46+ messages in thread
From: Lars Ingebrigtsen @ 2018-02-19 20:39 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 24201, eggert

Eli Zaretskii <eliz@gnu.org> writes:

> The best way is to start typing "finish" until it doesn't return from
> the frame.  See the instructions in etc/DEBUG starting at "If Emacs is
> in an infinite loop".  If something in the instructions is unclear,
> please don't hesitate asking.

Thanks; I'll give it a try (and read the document) the next time this
happens.  :-)  

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2018-02-19 20:39                             ` Lars Ingebrigtsen
@ 2019-06-24 13:25                               ` Lars Ingebrigtsen
  2019-06-24 19:20                                 ` Eli Zaretskii
  0 siblings, 1 reply; 46+ messages in thread
From: Lars Ingebrigtsen @ 2019-06-24 13:25 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 24201, eggert

I've now started looking at this again, and is running Emacs under gdb.

Last night, I got the hang again.  I C-z'd in gdb (i.e., SIGTSTP), and
then did "finish" until it returned sufficiently that we got back to the
hang, and the I repeated, the procedure a couple of times, until the
file "finish" returned to Emacs and Emacs was un-hung.

These hangs usually last for about a minute or so, with intervention
with gdb or not, so I would guess that the code stops hanging when the
peer finally gives up on the connection and hangs up or something.

Does the output from gdb have any smoking gun about what's making this
semi-infloop in accept-process-output?

(In case nobody remembers what this bug report is about: Sometimes, in
rare circumstances (like once every other day), when opening a TLS
connection from a timer (or probably any other asynchronous context),
and there's other TLS traffic already in happening, opening a new TLS
connection will hang.  I have no repeatable test case, though.)

Thread 1 "emacs" received signal SIGTSTP, Stopped (user).
pthread_sigmask (how=how@entry=0, newmask=<optimized out>, 
    newmask@entry=0x7fffffffa500, oldmask=oldmask@entry=0x7fffffffa5a0)
    at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:50
50	../sysdeps/unix/sysv/linux/pthread_sigmask.c: No such file or directory.
(gdb) xbacktrace 
"accept-process-output" (0xffffb198)
"network-stream-get-response" (0xffffb428)
"network-stream-open-starttls" (0xffffb8d8)
"open-network-stream" (0xffffbbc0)
"nntp-open-connection" (0xffffc0a0)
"nntp-open-server" (0xffffc3a0)
"nntp-possibly-change-group" (0xffffca50)
"nntp-with-open-group-function" (0xffffccf0)
"nntp-request-article" (0xffffcf60)
"gnus-request-article" (0xffffd228)
"gnus-async-prefetch-article" (0xffffd538)
"apply" (0xffffd648)
"timer-event-handler" (0xffffd8f8)
(gdb) bt
#0  pthread_sigmask (how=how@entry=0, newmask=<optimized out>, 
    newmask@entry=0x7fffffffa500, oldmask=oldmask@entry=0x7fffffffa5a0)
    at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:50
#1  0x00005555556858be in block_interrupt_signal (
    oldset=oldset@entry=0x7fffffffa5a0) at sysdep.c:873
#2  0x000055555574cbed in really_call_select (arg=0x7fffffffa640)
    at thread.c:581
#3  0x000055555574d1d9 in thread_select (func=<optimized out>, 
    max_fds=max_fds@entry=23, rfds=rfds@entry=0x7fffffffa710, 
    wfds=wfds@entry=0x7fffffffa790, efds=efds@entry=0x0, 
    timeout=timeout@entry=0x7fffffffad60, sigmask=0x0) at thread.c:616
#4  0x000055555576aa75 in xg_select (fds_lim=23, 
    rfds=rfds@entry=0x7fffffffae90, wfds=wfds@entry=0x7fffffffaf10, 
    efds=efds@entry=0x0, timeout=timeout@entry=0x7fffffffad60, 
    sigmask=sigmask@entry=0x0) at xgselect.c:117
#5  0x000055555572bb97 in wait_reading_process_output (
    time_limit=<optimized out>, nsecs=<optimized out>, 
    read_kbd=read_kbd@entry=0, do_display=do_display@entry=false, 
    wait_for_cell=wait_for_cell@entry=XIL(0), 
    wait_proc=wait_proc@entry=0x555556830280, just_wait_proc=0)
    at process.c:5427
#6  0x000055555572d45b in Faccept_process_output (process=XIL(0x555556830285), 
    seconds=<optimized out>, millisec=<optimized out>, 
---Type <return> to continue, or q <return> to quit---q
just_this_one=<optimized Quit
(gdb) finish
Run till exit from #0  pthread_sigmask (how=how@entry=0, 
    newmask=<optimized out>, newmask@entry=0x7fffffffa500, 
    oldmask=oldmask@entry=0x7fffffffa5a0)
    at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:50
block_interrupt_signal (oldset=oldset@entry=0x7fffffffa5a0) at sysdep.c:874
warning: Source file is more recent than executable.
874	  pthread_sigmask (SIG_SETMASK, oldset, 0);
Value returned is $6 = 0
(gdb) finish
Run till exit from #0  block_interrupt_signal (
    oldset=oldset@entry=0x7fffffffa5a0) at sysdep.c:874
really_call_select (arg=0x7fffffffa640) at thread.c:583
583	  release_global_lock ();
(gdb) finish
Run till exit from #0  really_call_select (arg=0x7fffffffa640) at thread.c:583
thread_select (func=<optimized out>, max_fds=max_fds@entry=23, 
    rfds=rfds@entry=0x7fffffffa710, wfds=wfds@entry=0x7fffffffa790, 
    efds=efds@entry=0x0, timeout=timeout@entry=0x7fffffffad60, sigmask=0x0)
    at thread.c:618
618	}
(gdb) finish
Run till exit from #0  thread_select (func=<optimized out>, 
    max_fds=max_fds@entry=23, rfds=rfds@entry=0x7fffffffa710, 
    wfds=wfds@entry=0x7fffffffa790, efds=efds@entry=0x0, 
    timeout=timeout@entry=0x7fffffffad60, sigmask=0x0) at thread.c:618
xg_select (fds_lim=23, rfds=rfds@entry=0x7fffffffae90, 
    wfds=wfds@entry=0x7fffffffaf10, efds=efds@entry=0x0, 
    timeout=timeout@entry=0x7fffffffad60, sigmask=sigmask@entry=0x0)
    at xgselect.c:120
120	  if (nfds < 0)
Value returned is $7 = 3
(gdb) finish
Run till exit from #0  xg_select (fds_lim=23, rfds=rfds@entry=0x7fffffffae90, 
    wfds=wfds@entry=0x7fffffffaf10, efds=efds@entry=0x0, 
    timeout=timeout@entry=0x7fffffffad60, sigmask=sigmask@entry=0x0)
    at xgselect.c:120
wait_reading_process_output (time_limit=<optimized out>, 
    nsecs=<optimized out>, read_kbd=read_kbd@entry=0, 
    do_display=do_display@entry=false, 
    wait_for_cell=wait_for_cell@entry=XIL(0), 
    wait_proc=wait_proc@entry=0x555556830280, just_wait_proc=0)
    at process.c:5447
5447	          if (nfds == 0)
Value returned is $8 = 2
(gdb) finish
Run till exit from #0  wait_reading_process_output (
    time_limit=<optimized out>, nsecs=<optimized out>, 
    read_kbd=read_kbd@entry=0, do_display=do_display@entry=false, 
    wait_for_cell=wait_for_cell@entry=XIL(0), 
    wait_proc=wait_proc@entry=0x555556830280, just_wait_proc=0)
    at process.c:5447
Faccept_process_output (process=XIL(0x555556830285), seconds=<optimized out>, 
    millisec=<optimized out>, just_this_one=<optimized out>) at process.c:4715
4715	     ? Qnil : Qt);
Value returned is $9 = 74
(gdb) finish
Run till exit from #0  Faccept_process_output (process=XIL(0x555556830285), 
    seconds=<optimized out>, millisec=<optimized out>, 
    just_this_one=<optimized out>) at process.c:4715
0x00005555556ea3d9 in Ffuncall (nargs=3, args=args@entry=0x7fffffffb190)
    at eval.c:2803
2803	    val = funcall_subr (XSUBR (fun), numargs, args + 1);
Value returned is $10 = (union Lisp_X *) 0xc5a0
(gdb) finish
Run till exit from #0  0x00005555556ea3d9 in Ffuncall (nargs=3, 
    args=args@entry=0x7fffffffb190) at eval.c:2803
0x0000555555720e60 in exec_byte_code (bytestr=<optimized out>, 
    vector=<optimized out>, maxdepth=<optimized out>, 
    args_template=<optimized out>, nargs=nargs@entry=3, args=<optimized out>, 
    args@entry=0x35) at bytecode.c:633
warning: Source file is more recent than executable.
633		    TOP = Ffuncall (op + 1, &TOP);
Value returned is $11 = (union Lisp_X *) 0xc5a0
(gdb) finish
Run till exit from #0  0x0000555555720e60 in exec_byte_code (
    bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, 
    args_template=<optimized out>, nargs=nargs@entry=3, args=<optimized out>, 
    args@entry=0x35) at bytecode.c:633
Ffuncall (nargs=4, args=args@entry=0x7fffffffb420) at eval.c:2826
2826	  lisp_eval_depth--;
Value returned is $12 = (union Lisp_X *) 0x0
(gdb) finish
Run till exit from #0  Ffuncall (nargs=4, args=args@entry=0x7fffffffb420)
    at eval.c:2826
0x0000555555720e60 in exec_byte_code (bytestr=<optimized out>, 
    vector=<optimized out>, maxdepth=<optimized out>, 
    args_template=<optimized out>, nargs=nargs@entry=5, args=<optimized out>, 
    args@entry=0x34) at bytecode.c:633
633		    TOP = Ffuncall (op + 1, &TOP);
Value returned is $13 = (union Lisp_X *) 0x0
(gdb) finish
Run till exit from #0  0x0000555555720e60 in exec_byte_code (
    bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, 
    args_template=<optimized out>, nargs=nargs@entry=5, args=<optimized out>, 
    args@entry=0x34) at bytecode.c:633

^Z
Thread 1 "emacs" received signal SIGTSTP, Stopped (user).
pthread_sigmask (how=how@entry=0, newmask=<optimized out>, 
    newmask@entry=0x7fffffffa800, oldmask=oldmask@entry=0x7fffffffa8a0)
    at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:50
50	../sysdeps/unix/sysv/linux/pthread_sigmask.c: No such file or directory.
(gdb) xbacktrace 
"accept-process-output" (0xffffb498)
"network-stream-get-response" (0xffffb728)
"network-stream-open-starttls" (0xffffbbd8)
"open-network-stream" (0xffffbec0)
"nntp-open-connection" (0xffffc3a0)
"nntp-possibly-change-group" (0xffffca50)
"nntp-with-open-group-function" (0xffffccf0)
"nntp-request-article" (0xffffcf60)
"gnus-request-article" (0xffffd228)
"gnus-async-prefetch-article" (0xffffd538)
"apply" (0xffffd648)
"timer-event-handler" (0xffffd8f8)
(gdb) finish
Run till exit from #0  pthread_sigmask (how=how@entry=0, 
    newmask=<optimized out>, newmask@entry=0x7fffffffa800, 
    oldmask=oldmask@entry=0x7fffffffa8a0)
    at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:50
block_interrupt_signal (oldset=oldset@entry=0x7fffffffa8a0) at sysdep.c:874
874	  pthread_sigmask (SIG_SETMASK, oldset, 0);
Value returned is $14 = 0
(gdb) finish
Run till exit from #0  block_interrupt_signal (
    oldset=oldset@entry=0x7fffffffa8a0) at sysdep.c:874
really_call_select (arg=0x7fffffffa940) at thread.c:583
583	  release_global_lock ();
(gdb) finish
Run till exit from #0  really_call_select (arg=0x7fffffffa940) at thread.c:583
thread_select (func=<optimized out>, max_fds=max_fds@entry=23, 
    rfds=rfds@entry=0x7fffffffaa10, wfds=wfds@entry=0x7fffffffaa90, 
    efds=efds@entry=0x0, timeout=timeout@entry=0x7fffffffb060, sigmask=0x0)
    at thread.c:618
618	}
(gdb) finish
Run till exit from #0  thread_select (func=<optimized out>, 
    max_fds=max_fds@entry=23, rfds=rfds@entry=0x7fffffffaa10, 
    wfds=wfds@entry=0x7fffffffaa90, efds=efds@entry=0x0, 
    timeout=timeout@entry=0x7fffffffb060, sigmask=0x0) at thread.c:618
xg_select (fds_lim=23, rfds=rfds@entry=0x7fffffffb190, 
    wfds=wfds@entry=0x7fffffffb210, efds=efds@entry=0x0, 
    timeout=timeout@entry=0x7fffffffb060, sigmask=sigmask@entry=0x0)
    at xgselect.c:120
120	  if (nfds < 0)
Value returned is $15 = 3
(gdb) finish
Run till exit from #0  xg_select (fds_lim=23, rfds=rfds@entry=0x7fffffffb190, 
    wfds=wfds@entry=0x7fffffffb210, efds=efds@entry=0x0, 
    timeout=timeout@entry=0x7fffffffb060, sigmask=sigmask@entry=0x0)
    at xgselect.c:120
wait_reading_process_output (time_limit=<optimized out>, 
    nsecs=<optimized out>, read_kbd=read_kbd@entry=0, 
    do_display=do_display@entry=false, 
    wait_for_cell=wait_for_cell@entry=XIL(0), 
    wait_proc=wait_proc@entry=0x55555b23b750, just_wait_proc=0)
    at process.c:5447
5447	          if (nfds == 0)
Value returned is $16 = 2
(gdb) finish
Run till exit from #0  wait_reading_process_output (
    time_limit=<optimized out>, nsecs=<optimized out>, 
    read_kbd=read_kbd@entry=0, do_display=do_display@entry=false, 
    wait_for_cell=wait_for_cell@entry=XIL(0), 
    wait_proc=wait_proc@entry=0x55555b23b750, just_wait_proc=0)
    at process.c:5447
Faccept_process_output (process=XIL(0x55555b23b755), seconds=<optimized out>, 
    millisec=<optimized out>, just_this_one=<optimized out>) at process.c:4715
4715	     ? Qnil : Qt);
Value returned is $17 = 74
(gdb) finish
Run till exit from #0  Faccept_process_output (process=XIL(0x55555b23b755), 
    seconds=<optimized out>, millisec=<optimized out>, 
    just_this_one=<optimized out>) at process.c:4715
0x00005555556ea3d9 in Ffuncall (nargs=3, args=args@entry=0x7fffffffb490)
    at eval.c:2803
2803	    val = funcall_subr (XSUBR (fun), numargs, args + 1);
Value returned is $18 = (union Lisp_X *) 0xc5a0
(gdb) finish
Run till exit from #0  0x00005555556ea3d9 in Ffuncall (nargs=3, 
    args=args@entry=0x7fffffffb490) at eval.c:2803
0x0000555555720e60 in exec_byte_code (bytestr=<optimized out>, 
    vector=<optimized out>, maxdepth=<optimized out>, 
    args_template=<optimized out>, nargs=nargs@entry=3, args=<optimized out>, 
    args@entry=0x31) at bytecode.c:633
633		    TOP = Ffuncall (op + 1, &TOP);
Value returned is $19 = (union Lisp_X *) 0xc5a0
(gdb) finish
Run till exit from #0  0x0000555555720e60 in exec_byte_code (
    bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, 
    args_template=<optimized out>, nargs=nargs@entry=3, args=<optimized out>, 
    args@entry=0x31) at bytecode.c:633
Ffuncall (nargs=4, args=args@entry=0x7fffffffb720) at eval.c:2826
2826	  lisp_eval_depth--;
Value returned is $20 = (union Lisp_X *) 0x0
(gdb) finish
Run till exit from #0  Ffuncall (nargs=4, args=args@entry=0x7fffffffb720)
    at eval.c:2826
0x0000555555720e60 in exec_byte_code (bytestr=<optimized out>, 
    vector=<optimized out>, maxdepth=<optimized out>, 
    args_template=<optimized out>, nargs=nargs@entry=5, args=<optimized out>, 
    args@entry=0x30) at bytecode.c:633
633		    TOP = Ffuncall (op + 1, &TOP);
Value returned is $21 = (union Lisp_X *) 0x0
(gdb) finish
Run till exit from #0  0x0000555555720e60 in exec_byte_code (
    bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, 
    args_template=<optimized out>, nargs=nargs@entry=5, args=<optimized out>, 
    args@entry=0x30) at bytecode.c:633

^Z
Thread 1 "emacs" received signal SIGTSTP, Stopped (user).
0x00005555556c6c53 in vectorlike_nbytes (hdr=0x555556d1dd20) at alloc.c:2964
2964	  ptrdiff_t size = hdr->size & ~ARRAY_MARK_FLAG;
(gdb) finish
Run till exit from #0  0x00005555556c6c53 in vectorlike_nbytes (
    hdr=0x555556d1dd20) at alloc.c:2964
sweep_vectors () at alloc.c:3073
3073		      gcstat.total_vector_slots += nbytes / word_size;
Value returned is $22 = 1040
(gdb) finish
Run till exit from #0  sweep_vectors () at alloc.c:3073
gc_sweep () at alloc.c:6973
6973	  pdumper_clear_marks ();
(gdb) finish
Run till exit from #0  gc_sweep () at alloc.c:6973
garbage_collect_1 (gcst=0x7fffffffaa30) at alloc.c:5917
5917	  unmark_main_thread ();
(gdb) finish
Run till exit from #0  garbage_collect_1 (gcst=0x7fffffffaa30) at alloc.c:5917
garbage_collect () at alloc.c:5990
5990	}
Value returned is $23 = true
(gdb) finish
Run till exit from #0  garbage_collect () at alloc.c:5990
Ffuncall (nargs=nargs@entry=2, args=args@entry=0x7fffffffab20) at eval.c:2789
2789	  if (debug_on_next_call)
(gdb) finish
Run till exit from #0  Ffuncall (nargs=nargs@entry=2, 
    args=args@entry=0x7fffffffab20) at eval.c:2789
call1 (fn=fn@entry=XIL(0xc960), arg1=arg1@entry=XIL(0x5555570ee915))
    at eval.c:2657
2657	}
Value returned is $24 = (union Lisp_X *) 0x0
(gdb) finish
Run till exit from #0  call1 (fn=fn@entry=XIL(0xc960), 
    arg1=arg1@entry=XIL(0x5555570ee915)) at eval.c:2657
timer_check_2 (idle_timers=XIL(0), timers=XIL(0x555571081263))
    at keyboard.c:4309
4309		      unbind_to (count, Qnil);
Value returned is $25 = (union Lisp_X *) 0x0
(gdb) finish
Run till exit from #0  timer_check_2 (idle_timers=XIL(0), 
    timers=XIL(0x555571081263)) at keyboard.c:4309
timer_check () at keyboard.c:4368
4368	      nexttime = timer_check_2 (timers, idle_timers);
(gdb) finish
Run till exit from #0  timer_check () at keyboard.c:4368
0x0000555555673bf9 in readable_events (flags=flags@entry=1) at keyboard.c:3367
3367	    timer_check ();
Value returned is $26 = {tv_sec = 0, tv_nsec = 58418057}
(gdb) finish
Run till exit from #0  0x0000555555673bf9 in readable_events (
    flags=flags@entry=1) at keyboard.c:3367
0x0000555555674638 in get_input_pending (flags=flags@entry=1)
    at keyboard.c:6769
6769	  input_pending = (!NILP (Vquit_flag) || readable_events (flags));
Value returned is $27 = true
(gdb) finish
Run till exit from #0  0x0000555555674638 in get_input_pending (
    flags=flags@entry=1) at keyboard.c:6769
swallow_events (do_display=do_display@entry=false) at keyboard.c:4115
4115	  if (!input_pending && timers_run != old_timers_run && do_display)
Value returned is $28 = true
(gdb) finish
Run till exit from #0  swallow_events (do_display=do_display@entry=false)
    at keyboard.c:4115
wait_reading_process_output (time_limit=<optimized out>, 
    nsecs=<optimized out>, read_kbd=read_kbd@entry=0, 
    do_display=do_display@entry=false, 
    wait_for_cell=wait_for_cell@entry=XIL(0), 
    wait_proc=wait_proc@entry=0x555556853170, just_wait_proc=0)
    at process.c:5605
5605	      if (! NILP (wait_for_cell) && ! NILP (XCAR (wait_for_cell)))
(gdb) finish
Run till exit from #0  wait_reading_process_output (
    time_limit=<optimized out>, nsecs=<optimized out>, 
    read_kbd=read_kbd@entry=0, do_display=do_display@entry=false, 
    wait_for_cell=wait_for_cell@entry=XIL(0), 
    wait_proc=wait_proc@entry=0x555556853170, just_wait_proc=0)
    at process.c:5605
Faccept_process_output (process=XIL(0x555556853175), seconds=<optimized out>, 
    millisec=<optimized out>, just_this_one=<optimized out>) at process.c:4715
4715	     ? Qnil : Qt);
Value returned is $29 = 74
(gdb) finish
Run till exit from #0  Faccept_process_output (process=XIL(0x555556853175), 
    seconds=<optimized out>, millisec=<optimized out>, 
    just_this_one=<optimized out>) at process.c:4715
0x00005555556ea3d9 in Ffuncall (nargs=3, args=args@entry=0x7fffffffb190)
    at eval.c:2803
2803	    val = funcall_subr (XSUBR (fun), numargs, args + 1);
Value returned is $30 = (union Lisp_X *) 0xc5a0
(gdb) finish
Run till exit from #0  0x00005555556ea3d9 in Ffuncall (nargs=3, 
    args=args@entry=0x7fffffffb190) at eval.c:2803
0x0000555555720e60 in exec_byte_code (bytestr=<optimized out>, 
    vector=<optimized out>, maxdepth=<optimized out>, 
    args_template=<optimized out>, nargs=nargs@entry=3, args=<optimized out>, 
    args@entry=0x35) at bytecode.c:633
633		    TOP = Ffuncall (op + 1, &TOP);
Value returned is $31 = (union Lisp_X *) 0xc5a0
(gdb) finish
Run till exit from #0  0x0000555555720e60 in exec_byte_code (
    bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, 
    args_template=<optimized out>, nargs=nargs@entry=3, args=<optimized out>, 
    args@entry=0x35) at bytecode.c:633
Ffuncall (nargs=4, args=args@entry=0x7fffffffb420) at eval.c:2826
2826	  lisp_eval_depth--;
Value returned is $32 = (union Lisp_X *) 0x0
(gdb) finish
Run till exit from #0  Ffuncall (nargs=4, args=args@entry=0x7fffffffb420)
    at eval.c:2826
0x0000555555720e60 in exec_byte_code (bytestr=<optimized out>, 
    vector=<optimized out>, maxdepth=<optimized out>, 
    args_template=<optimized out>, nargs=nargs@entry=5, args=<optimized out>, 
    args@entry=0x34) at bytecode.c:633
633		    TOP = Ffuncall (op + 1, &TOP);
Value returned is $33 = (union Lisp_X *) 0x0
(gdb) finish
Run till exit from #0  0x0000555555720e60 in exec_byte_code (
    bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, 
    args_template=<optimized out>, nargs=nargs@entry=5, args=<optimized out>, 
    args@entry=0x34) at bytecode.c:633

^Z
Thread 1 "emacs" received signal SIGTSTP, Stopped (user).
0x00005555556ca268 in mark_object (arg=<optimized out>) at alloc.c:6571
6571		if (NILP (ptr->u.s.u.cdr))
(gdb) 



-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2019-06-24 13:25                               ` Lars Ingebrigtsen
@ 2019-06-24 19:20                                 ` Eli Zaretskii
  2019-06-24 20:46                                   ` Lars Ingebrigtsen
  0 siblings, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2019-06-24 19:20 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 24201, eggert

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: 24201@debbugs.gnu.org,  eggert@cs.ucla.edu
> Date: Mon, 24 Jun 2019 15:25:52 +0200
> 
> I've now started looking at this again, and is running Emacs under gdb.

Thanks.

> Last night, I got the hang again.  I C-z'd in gdb (i.e., SIGTSTP), and
> then did "finish" until it returned sufficiently that we got back to the
> hang, and the I repeated, the procedure a couple of times, until the
> file "finish" returned to Emacs and Emacs was un-hung.
> 
> These hangs usually last for about a minute or so, with intervention
> with gdb or not, so I would guess that the code stops hanging when the
> peer finally gives up on the connection and hangs up or something.
> 
> Does the output from gdb have any smoking gun about what's making this
> semi-infloop in accept-process-output?

It doesn't loop in accept-process-output, it loops several stack
frames above that.  accept-process-output returns when you type
"finish", so it by itself is not the culprit.

> (gdb) finish
> Run till exit from #0  Ffuncall (nargs=4, args=args@entry=0x7fffffffb420)
>     at eval.c:2826
> 0x0000555555720e60 in exec_byte_code (bytestr=<optimized out>, 
>     vector=<optimized out>, maxdepth=<optimized out>, 
>     args_template=<optimized out>, nargs=nargs@entry=5, args=<optimized out>, 
>     args@entry=0x34) at bytecode.c:633
> 633		    TOP = Ffuncall (op + 1, &TOP);
> Value returned is $13 = (union Lisp_X *) 0x0
> (gdb) finish
> Run till exit from #0  0x0000555555720e60 in exec_byte_code (
>     bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, 
>     args_template=<optimized out>, nargs=nargs@entry=5, args=<optimized out>, 
>     args@entry=0x34) at bytecode.c:633
> 
> ^Z

Each time "finish" doesn't return you have a call to funcall and 2
calls to exec_byte_code above it.  Can you tell what function is being
called by funcall?  It's easy: after you get to that frame, and see
this:

  Ffuncall (nargs=4, args=args@entry=0x7fffffffb420) at eval.c:2826
  2826	  lisp_eval_depth--;
  Value returned is $12 = (union Lisp_X *) 0x0

don't type "finish", but instead do this:

  (gdb) p args[0]
  (gdb) xsymbol

You should see one of the Lisp functions you see in the Lisp
backtrace:

  "accept-process-output" (0xffffb498)
  "network-stream-get-response" (0xffffb728)
  "network-stream-open-starttls" (0xffffbbd8)
  "open-network-stream" (0xffffbec0)
  "nntp-open-connection" (0xffffc3a0)
  "nntp-possibly-change-group" (0xffffca50)
  "nntp-with-open-group-function" (0xffffccf0)
  "nntp-request-article" (0xffffcf60)
  "gnus-request-article" (0xffffd228)
  "gnus-async-prefetch-article" (0xffffd538)
  "apply" (0xffffd648)
  "timer-event-handler" (0xffffd8f8)

Once you see the name of that function, look at its caller's code, and
try to figure out what are the two levels of exec_byte_code above
Ffuncall doing, and through this why they might loop.





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2019-06-24 19:20                                 ` Eli Zaretskii
@ 2019-06-24 20:46                                   ` Lars Ingebrigtsen
  2019-06-25 19:15                                     ` Lars Ingebrigtsen
  0 siblings, 1 reply; 46+ messages in thread
From: Lars Ingebrigtsen @ 2019-06-24 20:46 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 24201, eggert

Eli Zaretskii <eliz@gnu.org> writes:

> Each time "finish" doesn't return you have a call to funcall and 2
> calls to exec_byte_code above it.  Can you tell what function is being
> called by funcall?  It's easy: after you get to that frame, and see
> this:
>
>   Ffuncall (nargs=4, args=args@entry=0x7fffffffb420) at eval.c:2826
>   2826	  lisp_eval_depth--;
>   Value returned is $12 = (union Lisp_X *) 0x0
>
> don't type "finish", but instead do this:
>
>   (gdb) p args[0]
>   (gdb) xsymbol

[...]

> Once you see the name of that function, look at its caller's code, and
> try to figure out what are the two levels of exec_byte_code above
> Ffuncall doing, and through this why they might loop.

Ah, OK, I'll try that the next time this happens.

Thanks; I'm a total novice at using gdb to debug Emacs...

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2019-06-24 20:46                                   ` Lars Ingebrigtsen
@ 2019-06-25 19:15                                     ` Lars Ingebrigtsen
  2019-06-25 21:57                                       ` Lars Ingebrigtsen
  2019-06-26 16:29                                       ` Eli Zaretskii
  0 siblings, 2 replies; 46+ messages in thread
From: Lars Ingebrigtsen @ 2019-06-25 19:15 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 24201, eggert

Lars Ingebrigtsen <larsi@gnus.org> writes:

>>   Ffuncall (nargs=4, args=args@entry=0x7fffffffb420) at eval.c:2826
>>   2826	  lisp_eval_depth--;
>>   Value returned is $12 = (union Lisp_X *) 0x0
>>
>> don't type "finish", but instead do this:
>>
>>   (gdb) p args[0]
>>   (gdb) xsymbol

It's frustrating -- when running under gdb, this hang seems to happen a
lot less often, and it hasn't happen against since last time.  But I was
just looking at the code to try to understand the previous backtrace.

So, the "finish" stuff seemed to show that we weren't hanging in
accept-process-output at all, but were instead just calling functions as
normal, and looping on the Lisp level?  And, for instance, Ffuncall has
a maybe_quit, so C-g should work in these instances?  But they don't,
which is confusing.

Is it even possible for the Lisp-ier functions in the backtrace to
inhibit C-g?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2019-06-25 19:15                                     ` Lars Ingebrigtsen
@ 2019-06-25 21:57                                       ` Lars Ingebrigtsen
  2019-06-26 16:32                                         ` Eli Zaretskii
  2019-06-26 16:29                                       ` Eli Zaretskii
  1 sibling, 1 reply; 46+ messages in thread
From: Lars Ingebrigtsen @ 2019-06-25 21:57 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 24201, eggert

OK, got another one.  I hit C-g a bunch in Emacs first, but got no
response, and then TSTP-d the process:

Thread 1 "emacs" received signal SIGTSTP, Stopped (user).
pthread_sigmask (how=how@entry=0, newmask=<optimized out>, 
    newmask@entry=0x7fffffffa550, oldmask=oldmask@entry=0x7fffffffa5f0)
    at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:50
50	../sysdeps/unix/sysv/linux/pthread_sigmask.c: No such file or directory.
(gdb) finish
Run till exit from #0  pthread_sigmask (how=how@entry=0, 
    newmask=<optimized out>, newmask@entry=0x7fffffffa550, 
    oldmask=oldmask@entry=0x7fffffffa5f0)
    at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:50
block_interrupt_signal (oldset=oldset@entry=0x7fffffffa5f0) at sysdep.c:868
868	}
Value returned is $21 = 0
(gdb) finish
Run till exit from #0  block_interrupt_signal (
    oldset=oldset@entry=0x7fffffffa5f0) at sysdep.c:868
really_call_select (arg=0x7fffffffa690) at thread.c:594
594	  if (self->not_holding_lock)
(gdb) finish
Run till exit from #0  really_call_select (arg=0x7fffffffa690) at thread.c:594
thread_select (func=<optimized out>, max_fds=max_fds@entry=22, 
    rfds=rfds@entry=0x7fffffffa760, wfds=wfds@entry=0x7fffffffa7e0, 
    efds=efds@entry=0x0, timeout=timeout@entry=0x7fffffffadb0, sigmask=0x0)
    at thread.c:618
618	}
(gdb) finish
Run till exit from #0  thread_select (func=<optimized out>, 
    max_fds=max_fds@entry=22, rfds=rfds@entry=0x7fffffffa760, 
    wfds=wfds@entry=0x7fffffffa7e0, efds=efds@entry=0x0, 
    timeout=timeout@entry=0x7fffffffadb0, sigmask=0x0) at thread.c:618
xg_select (fds_lim=22, rfds=rfds@entry=0x7fffffffaee0, 
    wfds=wfds@entry=0x7fffffffaf60, efds=efds@entry=0x0, 
    timeout=timeout@entry=0x7fffffffadb0, sigmask=sigmask@entry=0x0)
    at xgselect.c:120
120	  if (nfds < 0)
Value returned is $22 = 1
(gdb) finish
Run till exit from #0  xg_select (fds_lim=22, rfds=rfds@entry=0x7fffffffaee0, 
    wfds=wfds@entry=0x7fffffffaf60, efds=efds@entry=0x0, 
    timeout=timeout@entry=0x7fffffffadb0, sigmask=sigmask@entry=0x0)
    at xgselect.c:120
wait_reading_process_output (time_limit=<optimized out>, 
    nsecs=<optimized out>, read_kbd=read_kbd@entry=0, 
    do_display=do_display@entry=false, 
    wait_for_cell=wait_for_cell@entry=XIL(0), 
    wait_proc=wait_proc@entry=0x55555c0176e0, just_wait_proc=0)
    at process.c:5447
5447	          if (nfds == 0)
Value returned is $23 = 1
(gdb) finish
Run till exit from #0  wait_reading_process_output (
    time_limit=<optimized out>, nsecs=<optimized out>, 
    read_kbd=read_kbd@entry=0, do_display=do_display@entry=false, 
    wait_for_cell=wait_for_cell@entry=XIL(0), 
    wait_proc=wait_proc@entry=0x55555c0176e0, just_wait_proc=0)
    at process.c:5447
Faccept_process_output (process=XIL(0x55555c0176e5), seconds=<optimized out>, 
    millisec=<optimized out>, just_this_one=<optimized out>) at process.c:4715
4715	     ? Qnil : Qt);
Value returned is $24 = 74
(gdb) finish
Run till exit from #0  Faccept_process_output (process=XIL(0x55555c0176e5), 
    seconds=<optimized out>, millisec=<optimized out>, 
    just_this_one=<optimized out>) at process.c:4715
0x00005555556ea7c9 in Ffuncall (nargs=3, args=args@entry=0x7fffffffb1e0)
    at eval.c:2803
2803	    val = funcall_subr (XSUBR (fun), numargs, args + 1);
Value returned is $25 = (union Lisp_X *) 0xc5a0
(gdb) finish
Run till exit from #0  0x00005555556ea7c9 in Ffuncall (nargs=3, 
    args=args@entry=0x7fffffffb1e0) at eval.c:2803
0x0000555555721250 in exec_byte_code (bytestr=<optimized out>, 
    vector=<optimized out>, maxdepth=<optimized out>, 
    args_template=<optimized out>, nargs=nargs@entry=3, args=<optimized out>, 
    args@entry=0x35) at bytecode.c:633
633		    TOP = Ffuncall (op + 1, &TOP);
Value returned is $26 = (union Lisp_X *) 0xc5a0
(gdb) finish
Run till exit from #0  0x0000555555721250 in exec_byte_code (
    bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, 
    args_template=<optimized out>, nargs=nargs@entry=3, args=<optimized out>, 
    args@entry=0x35) at bytecode.c:633
Ffuncall (nargs=4, args=args@entry=0x7fffffffb470) at eval.c:2826
2826	  lisp_eval_depth--;
Value returned is $27 = (union Lisp_X *) 0x0

---
OK, so here's I'm at the lisp_eval_depth--.
---

(gdb) p args[0]
$28 = XIL(0x1401750)
(gdb) xsymbol 
$29 = (struct Lisp_Symbol *) 0x555557199570
"network-stream-get-response"
(gdb) xbacktrace 
"network-stream-get-response" (0xffffb478)
"network-stream-open-starttls" (0xffffb928)
"open-network-stream" (0xffffbc10)
"nntp-open-connection" (0xffffc0f0)
"nntp-open-server" (0xffffc3f0)
"nntp-possibly-change-group" (0xffffcaa0)
"nntp-with-open-group-function" (0xffffcd40)
"nntp-request-article" (0xffffcfb0)
"gnus-request-article" (0xffffd278)
"gnus-async-prefetch-article" (0xffffd588)
"apply" (0xffffd698)
"timer-event-handler" (0xffffd948)


And then there's a couple more of the same things: Emacs continues after
a few more finishes, and I TSTP it:


(gdb) finish
Run till exit from #0  Ffuncall (nargs=4, args=args@entry=0x7fffffffb470)
    at eval.c:2826
0x0000555555721250 in exec_byte_code (bytestr=<optimized out>, 
    vector=<optimized out>, maxdepth=<optimized out>, 
    args_template=<optimized out>, nargs=nargs@entry=5, args=<optimized out>, 
    args@entry=0x34) at bytecode.c:633
633		    TOP = Ffuncall (op + 1, &TOP);
Value returned is $30 = (union Lisp_X *) 0x0
(gdb) finish
Run till exit from #0  0x0000555555721250 in exec_byte_code (
    bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, 
    args_template=<optimized out>, nargs=nargs@entry=5, args=<optimized out>, 
    args@entry=0x34) at bytecode.c:633

^Z
Thread 1 "emacs" received signal SIGTSTP, Stopped (user).
0x00005555556ca609 in cons_marked_p (c=0x55556da37da0) at alloc.c:3774
3774	    : XCONS_MARKED_P (c);
(gdb) xbacktrace 
"Automatic GC" (0x0)
"timer-event-handler" (0xffffaec8)
"accept-process-output" (0xffffb4e8)
"network-stream-get-response" (0xffffb778)
"network-stream-open-starttls" (0xffffbc28)
"open-network-stream" (0xffffbf10)
"nntp-open-connection" (0xffffc3f0)
"nntp-possibly-change-group" (0xffffcaa0)
"nntp-with-open-group-function" (0xffffcd40)
"nntp-request-article" (0xffffcfb0)
"gnus-request-article" (0xffffd278)
"gnus-async-prefetch-article" (0xffffd588)
"apply" (0xffffd698)
"timer-event-handler" (0xffffd948)
(gdb) finish
Run till exit from #0  0x00005555556ca609 in cons_marked_p (c=0x55556da37da0)
    at alloc.c:3774
mark_object (arg=<optimized out>) at alloc.c:6566
6566		if (cons_marked_p (ptr))
(gdb) finish
Run till exit from #0  mark_object (arg=<optimized out>) at alloc.c:6566
mark_object (arg=<optimized out>) at alloc.c:6580
6580		if (cdr_count == mark_object_loop_halt)
(gdb) finish
Run till exit from #0  mark_object (arg=<optimized out>) at alloc.c:6580
mark_object (arg=<optimized out>) at alloc.c:6580
6580		if (cdr_count == mark_object_loop_halt)
(gdb) finish
Run till exit from #0  mark_object (arg=<optimized out>) at alloc.c:6580
mark_object (arg=<optimized out>) at alloc.c:6580
6580		if (cdr_count == mark_object_loop_halt)
(gdb) finish
Run till exit from #0  mark_object (arg=<optimized out>) at alloc.c:6580
mark_object (arg=<optimized out>) at alloc.c:6580
6580		if (cdr_count == mark_object_loop_halt)
(gdb) finish
Run till exit from #0  mark_object (arg=<optimized out>) at alloc.c:6580
mark_localized_symbol (ptr=<optimized out>) at alloc.c:6228
6228	  mark_object (blv->defcell);
(gdb) finish
Run till exit from #0  mark_localized_symbol (ptr=<optimized out>)
    at alloc.c:6228
mark_object (arg=<optimized out>) at alloc.c:6544
6544		    break;
(gdb) finish
Run till exit from #0  mark_object (arg=<optimized out>) at alloc.c:6544
mark_vectorlike (header=0x555556e35a20) at alloc.c:6105
6105	  for (i = 0; i < size; i++) /* ...and then mark its elements.  */
(gdb) finish
Run till exit from #0  mark_vectorlike (header=0x555556e35a20) at alloc.c:6105
mark_object (arg=<optimized out>) at alloc.c:6531
6531		mark_object (ptr->u.s.plist);
(gdb) finish
Run till exit from #0  mark_object (arg=<optimized out>) at alloc.c:6531
mark_vectorlike (header=0x555556e51a40) at alloc.c:6105
6105	  for (i = 0; i < size; i++) /* ...and then mark its elements.  */
(gdb) finish
Run till exit from #0  mark_vectorlike (header=0x555556e51a40) at alloc.c:6105
mark_object (arg=<optimized out>) at alloc.c:6531
6531		mark_object (ptr->u.s.plist);
(gdb) finish
Run till exit from #0  mark_object (arg=<optimized out>) at alloc.c:6531
mark_vectorlike (header=0x555556e5a7b0) at alloc.c:6105
6105	  for (i = 0; i < size; i++) /* ...and then mark its elements.  */
(gdb) finish
Run till exit from #0  mark_vectorlike (header=0x555556e5a7b0) at alloc.c:6105
mark_object (arg=<optimized out>) at alloc.c:6531
6531		mark_object (ptr->u.s.plist);

But I landed inside a gc, so I just cont:

(gdb) cont
Continuing.
^Z
Thread 1 "emacs" received signal SIGTSTP, Stopped (user).
pthread_sigmask (how=how@entry=0, newmask=<optimized out>, 
    newmask@entry=0x7fffffffa550, oldmask=oldmask@entry=0x7fffffffa5f0)
    at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:50
50	../sysdeps/unix/sysv/linux/pthread_sigmask.c: No such file or directory.
(gdb) finish
Run till exit from #0  pthread_sigmask (how=how@entry=0, 
    newmask=<optimized out>, newmask@entry=0x7fffffffa550, 
    oldmask=oldmask@entry=0x7fffffffa5f0)
    at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:50
block_interrupt_signal (oldset=oldset@entry=0x7fffffffa5f0) at sysdep.c:868
868	}
Value returned is $31 = 0
(gdb) cont
Continuing.
^Z

And then a new TSTP:

Thread 1 "emacs" received signal SIGTSTP, Stopped (user).
pthread_sigmask (how=how@entry=0, newmask=<optimized out>, 
    newmask@entry=0x7fffffffa850, oldmask=oldmask@entry=0x7fffffffa8f0)
    at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:50
50	../sysdeps/unix/sysv/linux/pthread_sigmask.c: No such file or directory.
(gdb) xbacktrace 
"accept-process-output" (0xffffb4e8)
"network-stream-get-response" (0xffffb778)
"network-stream-open-starttls" (0xffffbc28)
"open-network-stream" (0xffffbf10)
"nntp-open-connection" (0xffffc3f0)
"nntp-possibly-change-group" (0xffffcaa0)
"nntp-with-open-group-function" (0xffffcd40)
"nntp-request-article" (0xffffcfb0)
"gnus-request-article" (0xffffd278)
"gnus-async-prefetch-article" (0xffffd588)
"apply" (0xffffd698)
"timer-event-handler" (0xffffd948)
(gdb) finish
Run till exit from #0  pthread_sigmask (how=how@entry=0, 
    newmask=<optimized out>, newmask@entry=0x7fffffffa850, 
    oldmask=oldmask@entry=0x7fffffffa8f0)
    at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:50
block_interrupt_signal (oldset=oldset@entry=0x7fffffffa8f0) at sysdep.c:868
868	}
Value returned is $32 = 0
(gdb) finish
Run till exit from #0  block_interrupt_signal (
    oldset=oldset@entry=0x7fffffffa8f0) at sysdep.c:868
really_call_select (arg=0x7fffffffa990) at thread.c:583
583	  release_global_lock ();
(gdb) finish
Run till exit from #0  really_call_select (arg=0x7fffffffa990) at thread.c:583
thread_select (func=<optimized out>, max_fds=max_fds@entry=22, 
    rfds=rfds@entry=0x7fffffffaa60, wfds=wfds@entry=0x7fffffffaae0, 
    efds=efds@entry=0x0, timeout=timeout@entry=0x7fffffffb0b0, sigmask=0x0)
    at thread.c:618
618	}
(gdb) finish
Run till exit from #0  thread_select (func=<optimized out>, 
    max_fds=max_fds@entry=22, rfds=rfds@entry=0x7fffffffaa60, 
    wfds=wfds@entry=0x7fffffffaae0, efds=efds@entry=0x0, 
    timeout=timeout@entry=0x7fffffffb0b0, sigmask=0x0) at thread.c:618
xg_select (fds_lim=22, rfds=rfds@entry=0x7fffffffb1e0, 
    wfds=wfds@entry=0x7fffffffb260, efds=efds@entry=0x0, 
    timeout=timeout@entry=0x7fffffffb0b0, sigmask=sigmask@entry=0x0)
    at xgselect.c:120
120	  if (nfds < 0)
Value returned is $33 = 3
(gdb) finish
Run till exit from #0  xg_select (fds_lim=22, rfds=rfds@entry=0x7fffffffb1e0, 
    wfds=wfds@entry=0x7fffffffb260, efds=efds@entry=0x0, 
    timeout=timeout@entry=0x7fffffffb0b0, sigmask=sigmask@entry=0x0)
    at xgselect.c:120
wait_reading_process_output (time_limit=<optimized out>, 
    nsecs=<optimized out>, read_kbd=read_kbd@entry=0, 
    do_display=do_display@entry=false, 
    wait_for_cell=wait_for_cell@entry=XIL(0), 
    wait_proc=wait_proc@entry=0x555560255dc0, just_wait_proc=0)
    at process.c:5447
5447	          if (nfds == 0)
Value returned is $34 = 2
(gdb) finish
Run till exit from #0  wait_reading_process_output (
    time_limit=<optimized out>, nsecs=<optimized out>, 
    read_kbd=read_kbd@entry=0, do_display=do_display@entry=false, 
    wait_for_cell=wait_for_cell@entry=XIL(0), 
    wait_proc=wait_proc@entry=0x555560255dc0, just_wait_proc=0)
    at process.c:5447
Faccept_process_output (process=XIL(0x555560255dc5), seconds=<optimized out>, 
    millisec=<optimized out>, just_this_one=<optimized out>) at process.c:4715
4715	     ? Qnil : Qt);
Value returned is $35 = 74
(gdb) finish
Run till exit from #0  Faccept_process_output (process=XIL(0x555560255dc5), 
    seconds=<optimized out>, millisec=<optimized out>, 
    just_this_one=<optimized out>) at process.c:4715
0x00005555556ea7c9 in Ffuncall (nargs=3, args=args@entry=0x7fffffffb4e0)
    at eval.c:2803
2803	    val = funcall_subr (XSUBR (fun), numargs, args + 1);
Value returned is $36 = (union Lisp_X *) 0xc5a0
(gdb) finish
Run till exit from #0  0x00005555556ea7c9 in Ffuncall (nargs=3, 
    args=args@entry=0x7fffffffb4e0) at eval.c:2803
0x0000555555721250 in exec_byte_code (bytestr=<optimized out>, 
    vector=<optimized out>, maxdepth=<optimized out>, 
    args_template=<optimized out>, nargs=nargs@entry=3, args=<optimized out>, 
    args@entry=0x31) at bytecode.c:633
633		    TOP = Ffuncall (op + 1, &TOP);
Value returned is $37 = (union Lisp_X *) 0xc5a0
(gdb) finish
Run till exit from #0  0x0000555555721250 in exec_byte_code (
    bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, 
    args_template=<optimized out>, nargs=nargs@entry=3, args=<optimized out>, 
    args@entry=0x31) at bytecode.c:633
Ffuncall (nargs=4, args=args@entry=0x7fffffffb770) at eval.c:2826
2826	  lisp_eval_depth--;
Value returned is $38 = (union Lisp_X *) 0x0
(gdb) p args[0]
$39 = XIL(0x1401750)
(gdb) xsymbol 
$40 = (struct Lisp_Symbol *) 0x555557199570
"network-stream-get-response"
(gdb) xbacktrace 
"network-stream-get-response" (0xffffb778)
"network-stream-open-starttls" (0xffffbc28)
"open-network-stream" (0xffffbf10)
"nntp-open-connection" (0xffffc3f0)
"nntp-possibly-change-group" (0xffffcaa0)
"nntp-with-open-group-function" (0xffffcd40)
"nntp-request-article" (0xffffcfb0)
"gnus-request-article" (0xffffd278)
"gnus-async-prefetch-article" (0xffffd588)
"apply" (0xffffd698)
"timer-event-handler" (0xffffd948)

So this looks very similar to the first one.

(gdb) finish
Run till exit from #0  Ffuncall (nargs=4, args=args@entry=0x7fffffffb770)
    at eval.c:2826
0x0000555555721250 in exec_byte_code (bytestr=<optimized out>, 
    vector=<optimized out>, maxdepth=<optimized out>, 
    args_template=<optimized out>, nargs=nargs@entry=5, args=<optimized out>, 
    args@entry=0x30) at bytecode.c:633
633		    TOP = Ffuncall (op + 1, &TOP);
Value returned is $41 = (union Lisp_X *) 0x0
(gdb) finish
Run till exit from #0  0x0000555555721250 in exec_byte_code (
    bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, 
    args_template=<optimized out>, nargs=nargs@entry=5, args=<optimized out>, 
    args@entry=0x30) at bytecode.c:633
Ffuncall (nargs=2, args=args@entry=0x7fffffffbbd8) at eval.c:2826
2826	  lisp_eval_depth--;
Value returned is $42 = (union Lisp_X *) 0x555559d0a544
(gdb) cont
Continuing.

And upon continuing, Emacs un-stuck with the following message:

Error running timer ‘gnus-async-prefetch-article’: (error "Server closed connection")


-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2019-06-25 19:15                                     ` Lars Ingebrigtsen
  2019-06-25 21:57                                       ` Lars Ingebrigtsen
@ 2019-06-26 16:29                                       ` Eli Zaretskii
  1 sibling, 0 replies; 46+ messages in thread
From: Eli Zaretskii @ 2019-06-26 16:29 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 24201, eggert

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: 24201@debbugs.gnu.org,  eggert@cs.ucla.edu
> Date: Tue, 25 Jun 2019 21:15:17 +0200
> 
> So, the "finish" stuff seemed to show that we weren't hanging in
> accept-process-output at all, but were instead just calling functions as
> normal, and looping on the Lisp level?

That's my current hypothesis, yes.

> And, for instance, Ffuncall has a maybe_quit, so C-g should work in
> these instances?  But they don't, which is confusing.

The loop is not in Ffuncall, it's in some Lisp that calls Ffuncall.
Those are the two stack frames running bytecode above the frame with
Ffuncall.

> Is it even possible for the Lisp-ier functions in the backtrace to
> inhibit C-g?

Yes, they can set inhibit-quit.





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2019-06-25 21:57                                       ` Lars Ingebrigtsen
@ 2019-06-26 16:32                                         ` Eli Zaretskii
  2019-06-27 10:34                                           ` Lars Ingebrigtsen
  0 siblings, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2019-06-26 16:32 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 24201, eggert

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: 24201@debbugs.gnu.org,  eggert@cs.ucla.edu
> Date: Tue, 25 Jun 2019 23:57:35 +0200
> 
> (gdb) finish
> Run till exit from #0  0x0000555555721250 in exec_byte_code (
>     bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, 
>     args_template=<optimized out>, nargs=nargs@entry=3, args=<optimized out>, 
>     args@entry=0x31) at bytecode.c:633
> Ffuncall (nargs=4, args=args@entry=0x7fffffffb770) at eval.c:2826
> 2826	  lisp_eval_depth--;
> Value returned is $38 = (union Lisp_X *) 0x0
> (gdb) p args[0]
> $39 = XIL(0x1401750)
> (gdb) xsymbol 
> $40 = (struct Lisp_Symbol *) 0x555557199570
> "network-stream-get-response"

OK, so that Ffuncall calls network-stream-get-response.  the next
question is what Lisp code runs 2 stack frames above, i.e. which Lisp
code calls some other Lisp code, which calls
network-stream-get-response?  Can you spot that?  The offending code
should be somewhere in network-stream-get-response or functions it
calls.

Once we identify the Lisp code which loops, we need to try to figure
out why it loops.

> And upon continuing, Emacs un-stuck with the following message:
> 
> Error running timer ‘gnus-async-prefetch-article’: (error "Server closed connection")

So probably we loop until the other side closes the connection?





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2019-06-26 16:32                                         ` Eli Zaretskii
@ 2019-06-27 10:34                                           ` Lars Ingebrigtsen
  2019-06-27 13:25                                             ` Eli Zaretskii
  2019-06-28 14:39                                             ` Basil L. Contovounesios
  0 siblings, 2 replies; 46+ messages in thread
From: Lars Ingebrigtsen @ 2019-06-27 10:34 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 24201, eggert

Eli Zaretskii <eliz@gnu.org> writes:

> OK, so that Ffuncall calls network-stream-get-response.  the next
> question is what Lisp code runs 2 stack frames above, i.e. which Lisp
> code calls some other Lisp code, which calls
> network-stream-get-response?  Can you spot that?  The offending code
> should be somewhere in network-stream-get-response or functions it
> calls.
>
> Once we identify the Lisp code which loops, we need to try to figure
> out why it loops.

I'm pretty sure it's this:

	(while (and (memq (process-status stream) '(open run))
		    (not (re-search-forward end-of-command nil t)))
	  (accept-process-output stream 0.05)
	  (goto-char start))

But I do not understand why C-g doesn't work: I've tried to follow the
call chain, and I can see no instances of binding `inhibit-quit'...

If C-g worked, it'd be easier to debug...

>> And upon continuing, Emacs un-stuck with the following message:
>> 
>> Error running timer ‘gnus-async-prefetch-article’: (error "Server
>> closed connection")
>
> So probably we loop until the other side closes the connection?

Yup.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2019-06-27 10:34                                           ` Lars Ingebrigtsen
@ 2019-06-27 13:25                                             ` Eli Zaretskii
  2019-06-27 19:28                                               ` Lars Ingebrigtsen
  2019-06-28 14:39                                             ` Basil L. Contovounesios
  1 sibling, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2019-06-27 13:25 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 24201, eggert

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: 24201@debbugs.gnu.org,  eggert@cs.ucla.edu
> Date: Thu, 27 Jun 2019 12:34:28 +0200
> 
> > Once we identify the Lisp code which loops, we need to try to figure
> > out why it loops.
> 
> I'm pretty sure it's this:
> 
> 	(while (and (memq (process-status stream) '(open run))
> 		    (not (re-search-forward end-of-command nil t)))
> 	  (accept-process-output stream 0.05)
> 	  (goto-char start))

Any idea why this loops?  Does the other end not respond or something?

> But I do not understand why C-g doesn't work: I've tried to follow the
> call chain, and I can see no instances of binding `inhibit-quit'...

Is inhibit-quit in fact nil or not inside this loop?

> If C-g worked, it'd be easier to debug...

If you are running this under GDB, you can interrupt by "GDB ways",
they are described in etc/DEBUG.





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2019-06-27 13:25                                             ` Eli Zaretskii
@ 2019-06-27 19:28                                               ` Lars Ingebrigtsen
  2019-06-28  6:19                                                 ` Eli Zaretskii
  0 siblings, 1 reply; 46+ messages in thread
From: Lars Ingebrigtsen @ 2019-06-27 19:28 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 24201, eggert

Eli Zaretskii <eliz@gnu.org> writes:

>> I'm pretty sure it's this:
>> 
>> 	(while (and (memq (process-status stream) '(open run))
>> 		    (not (re-search-forward end-of-command nil t)))
>> 	  (accept-process-output stream 0.05)
>> 	  (goto-char start))
>
> Any idea why this loops?  Does the other end not respond or something?

I'm not sure -- I haven't been able to debug that at all.  Since C-g
doesn't work (and it happens so rarely and I have to catch it before the
peer hangs up), it makes it awkward...

>> But I do not understand why C-g doesn't work: I've tried to follow the
>> call chain, and I can see no instances of binding `inhibit-quit'...
>
> Is inhibit-quit in fact nil or not inside this loop?

Looking at the code, I can't see any signs of it being inhibited here,
but I'm waiting for the next time it happens so that I can inspect with
gdb...

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2019-06-27 19:28                                               ` Lars Ingebrigtsen
@ 2019-06-28  6:19                                                 ` Eli Zaretskii
  2019-06-28  8:25                                                   ` Lars Ingebrigtsen
  0 siblings, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2019-06-28  6:19 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 24201, eggert

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: 24201@debbugs.gnu.org,  eggert@cs.ucla.edu
> Date: Thu, 27 Jun 2019 21:28:34 +0200
> 
> >> 	(while (and (memq (process-status stream) '(open run))
> >> 		    (not (re-search-forward end-of-command nil t)))
> >> 	  (accept-process-output stream 0.05)
> >> 	  (goto-char start))
> >
> > Any idea why this loops?  Does the other end not respond or something?
> 
> I'm not sure -- I haven't been able to debug that at all.  Since C-g
> doesn't work (and it happens so rarely and I have to catch it before the
> peer hangs up), it makes it awkward...

You could try using debug-on-event in conjunction with SIGUSR1/SIGUSR2
to enter the debugger when you need.

> > Is inhibit-quit in fact nil or not inside this loop?
> 
> Looking at the code, I can't see any signs of it being inhibited here,
> but I'm waiting for the next time it happens so that I can inspect with
> gdb...

You could also add a call to 'message' in that loop to print the value
of the flag.





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2019-06-28  6:19                                                 ` Eli Zaretskii
@ 2019-06-28  8:25                                                   ` Lars Ingebrigtsen
  2019-06-28  8:34                                                     ` Eli Zaretskii
  0 siblings, 1 reply; 46+ messages in thread
From: Lars Ingebrigtsen @ 2019-06-28  8:25 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 24201, eggert

Eli Zaretskii <eliz@gnu.org> writes:

>> I'm not sure -- I haven't been able to debug that at all.  Since C-g
>> doesn't work (and it happens so rarely and I have to catch it before the
>> peer hangs up), it makes it awkward...
>
> You could try using debug-on-event in conjunction with SIGUSR1/SIGUSR2
> to enter the debugger when you need.

Oh, that seems very useful; yes.  I'll set that up and try that the next
time it happens.

>> > Is inhibit-quit in fact nil or not inside this loop?
>> 
>> Looking at the code, I can't see any signs of it being inhibited here,
>> but I'm waiting for the next time it happens so that I can inspect with
>> gdb...
>
> You could also add a call to 'message' in that loop to print the value
> of the flag.

`inhibit-quit' is not set normally when that function is called (I just
checked), but perhaps...  Oh, I just thought something -- this function
(when it hangs) is always called from a timer.  Does the timer machinery
set `inhibit-quit'?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2019-06-28  8:25                                                   ` Lars Ingebrigtsen
@ 2019-06-28  8:34                                                     ` Eli Zaretskii
  2019-06-28  9:55                                                       ` Lars Ingebrigtsen
  0 siblings, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2019-06-28  8:34 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 24201, eggert

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: 24201@debbugs.gnu.org,  eggert@cs.ucla.edu
> Date: Fri, 28 Jun 2019 10:25:41 +0200
> 
> Oh, I just thought something -- this function (when it hangs) is
> always called from a timer.  Does the timer machinery set
> `inhibit-quit'?

Yes, it does: see timer_check_2.





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2019-06-28  8:34                                                     ` Eli Zaretskii
@ 2019-06-28  9:55                                                       ` Lars Ingebrigtsen
  2019-06-28 12:26                                                         ` Eli Zaretskii
  0 siblings, 1 reply; 46+ messages in thread
From: Lars Ingebrigtsen @ 2019-06-28  9:55 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 24201, eggert

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Lars Ingebrigtsen <larsi@gnus.org>
>> Cc: 24201@debbugs.gnu.org,  eggert@cs.ucla.edu
>> Date: Fri, 28 Jun 2019 10:25:41 +0200
>> 
>> Oh, I just thought something -- this function (when it hangs) is
>> always called from a timer.  Does the timer machinery set
>> `inhibit-quit'?
>
> Yes, it does: see timer_check_2.

Great; now at least I know where to disable the `inhibit-quit' so that I
can debug the infloop in network-stream more conveniently.

Thanks for all the help with this, Eli.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2019-06-28  9:55                                                       ` Lars Ingebrigtsen
@ 2019-06-28 12:26                                                         ` Eli Zaretskii
  0 siblings, 0 replies; 46+ messages in thread
From: Eli Zaretskii @ 2019-06-28 12:26 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 24201, eggert

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: 24201@debbugs.gnu.org,  eggert@cs.ucla.edu
> Date: Fri, 28 Jun 2019 11:55:09 +0200
> 
> Thanks for all the help with this, Eli.

I'm glad to be of any help I can extend.  Thanks for working on this
tricky problem.





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2019-06-27 10:34                                           ` Lars Ingebrigtsen
  2019-06-27 13:25                                             ` Eli Zaretskii
@ 2019-06-28 14:39                                             ` Basil L. Contovounesios
  2019-06-28 14:50                                               ` Eli Zaretskii
  1 sibling, 1 reply; 46+ messages in thread
From: Basil L. Contovounesios @ 2019-06-28 14:39 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: eggert, 24201

Lars Ingebrigtsen <larsi@gnus.org> writes:

> Eli Zaretskii <eliz@gnu.org> writes:
>
>> OK, so that Ffuncall calls network-stream-get-response.  the next
>> question is what Lisp code runs 2 stack frames above, i.e. which Lisp
>> code calls some other Lisp code, which calls
>> network-stream-get-response?  Can you spot that?  The offending code
>> should be somewhere in network-stream-get-response or functions it
>> calls.
>>
>> Once we identify the Lisp code which loops, we need to try to figure
>> out why it loops.
>
> I'm pretty sure it's this:
>
> 	(while (and (memq (process-status stream) '(open run))
> 		    (not (re-search-forward end-of-command nil t)))
> 	  (accept-process-output stream 0.05)
> 	  (goto-char start))

This is probably unrelated, but isn't there a race condition between
checking process-status and accepting output, as described in
(info "(elisp) Accepting Output")?

Thanks,

-- 
Basil





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2019-06-28 14:39                                             ` Basil L. Contovounesios
@ 2019-06-28 14:50                                               ` Eli Zaretskii
  2019-06-30 19:02                                                 ` Lars Ingebrigtsen
  0 siblings, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2019-06-28 14:50 UTC (permalink / raw)
  To: Basil L. Contovounesios; +Cc: larsi, eggert, 24201

> From: "Basil L. Contovounesios" <contovob@tcd.ie>
> Cc: Eli Zaretskii <eliz@gnu.org>,  24201@debbugs.gnu.org,  eggert@cs.ucla.edu
> Date: Fri, 28 Jun 2019 15:39:04 +0100
> 
> > 	(while (and (memq (process-status stream) '(open run))
> > 		    (not (re-search-forward end-of-command nil t)))
> > 	  (accept-process-output stream 0.05)
> > 	  (goto-char start))
> 
> This is probably unrelated, but isn't there a race condition between
> checking process-status and accepting output, as described in
> (info "(elisp) Accepting Output")?

Yes.  But the issue here is why it loops for such a long time, not why
it continues looping after the connection is closed.





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2019-06-28 14:50                                               ` Eli Zaretskii
@ 2019-06-30 19:02                                                 ` Lars Ingebrigtsen
  2019-07-04 13:04                                                   ` Lars Ingebrigtsen
  0 siblings, 1 reply; 46+ messages in thread
From: Lars Ingebrigtsen @ 2019-06-30 19:02 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Basil L. Contovounesios, 24201, eggert

I'm able to reproduce the hang more often now by artificially closing
the connection a lot and then letting it be reopened, so I'm now getting
the hangs about once an hour when using Gnus.

I changed the inflooping function like thus to record what's in it:

(defun network-stream-get-response (stream start end-of-command)
  (when end-of-command
    (with-current-buffer (process-buffer stream)
      (save-excursion
	(goto-char start)
	(while (and (memq (process-status stream) '(open run))
		    (not (re-search-forward end-of-command nil t)))
          (push (list (format-time-string "%FT%T")
                      (point)
                      (current-buffer)
                      (buffer-string))
                lars-buffers)
	  (accept-process-output stream 0.05)
	  (goto-char start))

After making this change, I've only seen the hang twice, so this is all
very preliminary, but in both hangs, it didn't really loop there.

That is, in both instances, I got something like this as the top-level
element in `lars-buffers':

("2019-06-30T20:50:37" 1 #<buffer  *server news.gmane.org nntp  *Async Prefetch Article**> "")

and the next entries were from a successful connection.  And this is
after letting it hang for, say, ten seconds before `C-g'-ing it, so
there should be hundreds of elements there, since accept-process-output
is supposed to return after 0.05s.

What's in the buffer after `C-g' was, both times:

----

200 news.gmane.org InterNetNews NNRP server INN 2.6.1 ready (posting ok)^M

Process nntpd<3> connection broken by remote peer

----

I'll keep poking at this, but this is pointing at
`accept-process-output' having some kind of problem...  But we'll see;
perhaps these two hangs weren't typical...

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2019-06-30 19:02                                                 ` Lars Ingebrigtsen
@ 2019-07-04 13:04                                                   ` Lars Ingebrigtsen
  2019-07-04 19:05                                                     ` Eli Zaretskii
  2019-07-05  8:21                                                     ` Eli Zaretskii
  0 siblings, 2 replies; 46+ messages in thread
From: Lars Ingebrigtsen @ 2019-07-04 13:04 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Basil L. Contovounesios, 24201, eggert

I've been poking more at this, and I think there really must be a bug
somewhere in accept-process-output.

With the following patch, connection never hangs:

diff --git a/lisp/net/network-stream.el b/lisp/net/network-stream.el
index 2b3292b71b..cdb33a59f1 100644
--- a/lisp/net/network-stream.el
+++ b/lisp/net/network-stream.el
@@ -376,7 +376,7 @@ network-stream-get-response
 	(goto-char start)
 	(while (and (memq (process-status stream) '(open run))
 		    (not (re-search-forward end-of-command nil t)))
-	  (accept-process-output stream 0.05)
+	  (accept-process-output stream 0.05 nil t)
 	  (goto-char start))
 	;; Return the data we got back, or nil if the process died.
 	(unless (= start (point))

It's the JUST-THIS-ONE parameter: If that's non-nil, then
accept-process-output returns after the timeout...  and we get the data.

Now, tracing the logic in wait_reading_process_output is rather...
difficult.  It's an 800 line function with lots of inputs from
everywhere.  But I think this code looks suspicious:

	  if (NILP (wait_for_cell) && just_wait_proc >= 0
	      && timespec_valid_p (timer_delay)
	      && timespec_cmp (timer_delay, timeout) < 0)
	    {
	      if (!timespec_valid_p (now))
		now = current_timespec ();
	      struct timespec timeout_abs = timespec_add (now, timeout);
	      if (!timespec_valid_p (got_output_end_time)
		  || timespec_cmp (timeout_abs, got_output_end_time) < 0)
		got_output_end_time = timeout_abs;
	      timeout = timer_delay;
	    }
	  else
	    got_output_end_time = invalid_timespec ();

This is done only if JUST-THIS-ONE is set.

But...  the timeout stuff is implemented in a very convoluted manner,
with reuse of certain variables that makes what this is trying to
achieve rather opaque. 

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






^ permalink raw reply related	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2019-07-04 13:04                                                   ` Lars Ingebrigtsen
@ 2019-07-04 19:05                                                     ` Eli Zaretskii
  2019-07-05 12:59                                                       ` Lars Ingebrigtsen
  2019-07-05  8:21                                                     ` Eli Zaretskii
  1 sibling, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2019-07-04 19:05 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: contovob, 24201, eggert

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: "Basil L. Contovounesios" <contovob@tcd.ie>,  24201@debbugs.gnu.org,  eggert@cs.ucla.edu
> Date: Thu, 04 Jul 2019 15:04:54 +0200
> 
> With the following patch, connection never hangs:
> 
> diff --git a/lisp/net/network-stream.el b/lisp/net/network-stream.el
> index 2b3292b71b..cdb33a59f1 100644
> --- a/lisp/net/network-stream.el
> +++ b/lisp/net/network-stream.el
> @@ -376,7 +376,7 @@ network-stream-get-response
>  	(goto-char start)
>  	(while (and (memq (process-status stream) '(open run))
>  		    (not (re-search-forward end-of-command nil t)))
> -	  (accept-process-output stream 0.05)
> +	  (accept-process-output stream 0.05 nil t)
>  	  (goto-char start))
>  	;; Return the data we got back, or nil if the process died.
>  	(unless (= start (point))

Does this mean you have other process objects active at that time?

> It's the JUST-THIS-ONE parameter: If that's non-nil, then
> accept-process-output returns after the timeout...  and we get the data.

I don't understand: accept-process-output is supposed to hit the
timeout only when there's no data.  But you say "and we get the
data".  So what am I missing here?





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2019-07-04 13:04                                                   ` Lars Ingebrigtsen
  2019-07-04 19:05                                                     ` Eli Zaretskii
@ 2019-07-05  8:21                                                     ` Eli Zaretskii
  2019-07-05 13:13                                                       ` Lars Ingebrigtsen
  1 sibling, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2019-07-05  8:21 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: contovob, 24201, eggert

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: "Basil L. Contovounesios" <contovob@tcd.ie>,  24201@debbugs.gnu.org,  eggert@cs.ucla.edu
> Date: Thu, 04 Jul 2019 15:04:54 +0200
> 
> I think this code looks suspicious:
> 
> 	  if (NILP (wait_for_cell) && just_wait_proc >= 0
> 	      && timespec_valid_p (timer_delay)
> 	      && timespec_cmp (timer_delay, timeout) < 0)
> 	    {
> 	      if (!timespec_valid_p (now))
> 		now = current_timespec ();
> 	      struct timespec timeout_abs = timespec_add (now, timeout);
> 	      if (!timespec_valid_p (got_output_end_time)
> 		  || timespec_cmp (timeout_abs, got_output_end_time) < 0)
> 		got_output_end_time = timeout_abs;
> 	      timeout = timer_delay;
> 	    }
> 	  else
> 	    got_output_end_time = invalid_timespec ();
> 
> This is done only if JUST-THIS-ONE is set.

There's also this:

      /* Wait till there is something to do.  */

      if (wait_proc && just_wait_proc)
	{
	  if (wait_proc->infd < 0)  /* Terminated.  */
	    break;
	  FD_SET (wait_proc->infd, &Available);
	  check_delay = 0;
          check_write = 0;
	}

In your case, wait_proc is non-NULL.

Can you tell which of the various conditions cause
accept-process-output to return after a timeout when you apply the
patch to network-stream.el?  Then we'd know for sure which part helps
avoid the loop when JUST-THIS-ONE is non-nil.

But what I'm missing is some description of the traffic that happens
on the wire.  Is the remote responding or isn't it, and if not, then
do we have any idea why?





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2019-07-04 19:05                                                     ` Eli Zaretskii
@ 2019-07-05 12:59                                                       ` Lars Ingebrigtsen
  2019-07-05 18:03                                                         ` Eli Zaretskii
  0 siblings, 1 reply; 46+ messages in thread
From: Lars Ingebrigtsen @ 2019-07-05 12:59 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: contovob, 24201, eggert

Eli Zaretskii <eliz@gnu.org> writes:

>> -	  (accept-process-output stream 0.05)
>> +	  (accept-process-output stream 0.05 nil t)
>>  	  (goto-char start))
>>  	;; Return the data we got back, or nil if the process died.
>>  	(unless (= start (point))
>
> Does this mean you have other process objects active at that time?

Yes, normally when this code is triggered by the timer, there's other
networking happening more or less at the same time.

>> It's the JUST-THIS-ONE parameter: If that's non-nil, then
>> accept-process-output returns after the timeout...  and we get the data.
>
> I don't understand: accept-process-output is supposed to hit the
> timeout only when there's no data. 

That's not what the doc string says, I think?

--
Optional second argument SECONDS and third argument MILLISEC
specify a timeout; return after that much time even if there is
no subprocess output.
--

"even if"...  

> But you say "and we get the data".  So what am I missing here?

Without the JUST-THIS-ONE parameter, accept-process-output seems to loop
until the peer closes the connection.  And then control is returned to
Lisp world and the data is in the buffer.

The gdb-ing I did earlier seemed to say that this was not the case
(i.e., that it was looping in that loop above there), but Emacs
apparently behaves differently when stepping through in gdb than when
it's not...

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2019-07-05  8:21                                                     ` Eli Zaretskii
@ 2019-07-05 13:13                                                       ` Lars Ingebrigtsen
  0 siblings, 0 replies; 46+ messages in thread
From: Lars Ingebrigtsen @ 2019-07-05 13:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: contovob, 24201, eggert

Eli Zaretskii <eliz@gnu.org> writes:

> Can you tell which of the various conditions cause
> accept-process-output to return after a timeout when you apply the
> patch to network-stream.el?  Then we'd know for sure which part helps
> avoid the loop when JUST-THIS-ONE is non-nil.

No, I'll add some debugging code to the function and the try to get it
to hang to identify what makes JUST-THIS-ONE respect the timeout.

> But what I'm missing is some description of the traffic that happens
> on the wire.  Is the remote responding or isn't it, and if not, then
> do we have any idea why?

Remote is indeed responding.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2019-07-05 12:59                                                       ` Lars Ingebrigtsen
@ 2019-07-05 18:03                                                         ` Eli Zaretskii
  2019-07-06 12:16                                                           ` Lars Ingebrigtsen
  0 siblings, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2019-07-05 18:03 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: contovob, 24201, eggert

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: contovob@tcd.ie,  24201@debbugs.gnu.org,  eggert@cs.ucla.edu
> Date: Fri, 05 Jul 2019 14:59:52 +0200
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> -	  (accept-process-output stream 0.05)
> >> +	  (accept-process-output stream 0.05 nil t)
> >>  	  (goto-char start))
> >>  	;; Return the data we got back, or nil if the process died.
> >>  	(unless (= start (point))
> >
> > Does this mean you have other process objects active at that time?
> 
> Yes, normally when this code is triggered by the timer, there's other
> networking happening more or less at the same time.

So maybe another process steals the response?

> >> It's the JUST-THIS-ONE parameter: If that's non-nil, then
> >> accept-process-output returns after the timeout...  and we get the data.
> >
> > I don't understand: accept-process-output is supposed to hit the
> > timeout only when there's no data. 
> 
> That's not what the doc string says, I think?
> 
> --
> Optional second argument SECONDS and third argument MILLISEC
> specify a timeout; return after that much time even if there is
> no subprocess output.
> --
> 
> "even if"...  

But if output is available, accept-process-output will return
immediately, so "even if" really means "if".

> Without the JUST-THIS-ONE parameter, accept-process-output seems to loop
> until the peer closes the connection.  And then control is returned to
> Lisp world and the data is in the buffer.

Since you are saying that the remote does respond, this would mean the
responses get lost somehow, or are consumed by other filter functions.
The question is how can that happen?





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2019-07-05 18:03                                                         ` Eli Zaretskii
@ 2019-07-06 12:16                                                           ` Lars Ingebrigtsen
  2019-07-07 18:13                                                             ` Lars Ingebrigtsen
  0 siblings, 1 reply; 46+ messages in thread
From: Lars Ingebrigtsen @ 2019-07-06 12:16 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: contovob, 24201, eggert

Eli Zaretskii <eliz@gnu.org> writes:

>> Yes, normally when this code is triggered by the timer, there's other
>> networking happening more or less at the same time.
>
> So maybe another process steals the response?

That's possible...

>> That's not what the doc string says, I think?
>> 
>> --
>> Optional second argument SECONDS and third argument MILLISEC
>> specify a timeout; return after that much time even if there is
>> no subprocess output.
>> --
>> 
>> "even if"...  
>
> But if output is available, accept-process-output will return
> immediately, so "even if" really means "if".

Yeah, that's true.  But it should, in an case, not keep spinning (much)
longer than the timeout, whether there's any output or not...

>> Without the JUST-THIS-ONE parameter, accept-process-output seems to loop
>> until the peer closes the connection.  And then control is returned to
>> Lisp world and the data is in the buffer.
>
> Since you are saying that the remote does respond, this would mean the
> responses get lost somehow, or are consumed by other filter functions.
> The question is how can that happen?

What I'm seeing happening if I instrument the loop in the Lisp part with:

	(while (and (memq (process-status stream) '(open run))
		    (not (re-search-forward end-of-command nil t)))
          (push (buffer-string) my-output-list)
	  (accept-process-output stream 0.05)
	  (goto-char start))

is that when Emacs hangs, and I `C-g' after half a minute, I will get 1
"" (empty string) entry in that list, so the timeout isn't heeded.
But -- in the buffer where this is happening, the data this is waiting
for is present, so Emacs got the response and put it into the buffer --
but then just kept on spinning in wait_reading_process_output,
apparently.

I'll add some instrumentation to that function now and try to figure out
why it's not heeding the timeout.  It's so annoying that I haven't been
able to make a simple test case to reproduce the error, though, because
the hang only happens once every few hours...

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2019-07-06 12:16                                                           ` Lars Ingebrigtsen
@ 2019-07-07 18:13                                                             ` Lars Ingebrigtsen
  2019-07-07 18:18                                                               ` Eli Zaretskii
  0 siblings, 1 reply; 46+ messages in thread
From: Lars Ingebrigtsen @ 2019-07-07 18:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 24201

Lars Ingebrigtsen <larsi@gnus.org> writes:

> I'll add some instrumentation to that function now and try to figure out
> why it's not heeding the timeout.  It's so annoying that I haven't been
> able to make a simple test case to reproduce the error, though, because
> the hang only happens once every few hours...

Amusingly enough, after adding some fprintfs to the function, I haven't
seen a single hang.  :-/

So this is even more timing dependent than I imagined, and I'll have to
instrument this in a different way, I guess...

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2019-07-07 18:13                                                             ` Lars Ingebrigtsen
@ 2019-07-07 18:18                                                               ` Eli Zaretskii
  2019-07-08 16:38                                                                 ` Lars Ingebrigtsen
  0 siblings, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2019-07-07 18:18 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 24201

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: 24201@debbugs.gnu.org
> Date: Sun, 07 Jul 2019 20:13:13 +0200
> 
> Lars Ingebrigtsen <larsi@gnus.org> writes:
> 
> > I'll add some instrumentation to that function now and try to figure out
> > why it's not heeding the timeout.  It's so annoying that I haven't been
> > able to make a simple test case to reproduce the error, though, because
> > the hang only happens once every few hours...
> 
> Amusingly enough, after adding some fprintfs to the function, I haven't
> seen a single hang.  :-/

Maybe add only one fprintf (assuming doing that doesn't make the
problem go away), then move it around to see the control flow.  It's
less convenient and more time-consuming, but maybe there's no better
alternative.





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2019-07-07 18:18                                                               ` Eli Zaretskii
@ 2019-07-08 16:38                                                                 ` Lars Ingebrigtsen
  2020-08-03  6:00                                                                   ` Lars Ingebrigtsen
  0 siblings, 1 reply; 46+ messages in thread
From: Lars Ingebrigtsen @ 2019-07-08 16:38 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 24201

Eli Zaretskii <eliz@gnu.org> writes:

> Maybe add only one fprintf (assuming doing that doesn't make the
> problem go away), then move it around to see the control flow.  It's
> less convenient and more time-consuming, but maybe there's no better
> alternative.

Yup; I'll try different things to see if I can get it to break more
reliably...

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#24201: 25.1.50; TLS connections sometimes hang
  2019-07-08 16:38                                                                 ` Lars Ingebrigtsen
@ 2020-08-03  6:00                                                                   ` Lars Ingebrigtsen
  0 siblings, 0 replies; 46+ messages in thread
From: Lars Ingebrigtsen @ 2020-08-03  6:00 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 24201

This was now apparently fixed by bug#40665 -- at least, I haven't seen a
single of these hangs after running with that patch for a day.

So I'm closing this bug report.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2020-08-03  6:00 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-08-11 13:20 bug#24201: 25.1.50; TLS connections sometimes hang Lars Ingebrigtsen
2016-08-15  2:35 ` Paul Eggert
2016-10-10 10:36   ` Eli Zaretskii
2016-10-10 10:54     ` Lars Ingebrigtsen
2016-10-10 11:23       ` Eli Zaretskii
2017-09-02 12:40         ` Eli Zaretskii
2017-09-02 13:04           ` Lars Ingebrigtsen
2017-09-02 14:21             ` Eli Zaretskii
2018-02-18 17:57               ` Lars Ingebrigtsen
2018-02-19 16:52                 ` Lars Ingebrigtsen
2018-02-19 17:56                   ` Paul Eggert
2018-02-19 18:16                     ` Lars Ingebrigtsen
2018-02-19 18:32                       ` Eli Zaretskii
2018-02-19 19:06                         ` Lars Ingebrigtsen
2018-02-19 19:57                           ` Eli Zaretskii
2018-02-19 20:39                             ` Lars Ingebrigtsen
2019-06-24 13:25                               ` Lars Ingebrigtsen
2019-06-24 19:20                                 ` Eli Zaretskii
2019-06-24 20:46                                   ` Lars Ingebrigtsen
2019-06-25 19:15                                     ` Lars Ingebrigtsen
2019-06-25 21:57                                       ` Lars Ingebrigtsen
2019-06-26 16:32                                         ` Eli Zaretskii
2019-06-27 10:34                                           ` Lars Ingebrigtsen
2019-06-27 13:25                                             ` Eli Zaretskii
2019-06-27 19:28                                               ` Lars Ingebrigtsen
2019-06-28  6:19                                                 ` Eli Zaretskii
2019-06-28  8:25                                                   ` Lars Ingebrigtsen
2019-06-28  8:34                                                     ` Eli Zaretskii
2019-06-28  9:55                                                       ` Lars Ingebrigtsen
2019-06-28 12:26                                                         ` Eli Zaretskii
2019-06-28 14:39                                             ` Basil L. Contovounesios
2019-06-28 14:50                                               ` Eli Zaretskii
2019-06-30 19:02                                                 ` Lars Ingebrigtsen
2019-07-04 13:04                                                   ` Lars Ingebrigtsen
2019-07-04 19:05                                                     ` Eli Zaretskii
2019-07-05 12:59                                                       ` Lars Ingebrigtsen
2019-07-05 18:03                                                         ` Eli Zaretskii
2019-07-06 12:16                                                           ` Lars Ingebrigtsen
2019-07-07 18:13                                                             ` Lars Ingebrigtsen
2019-07-07 18:18                                                               ` Eli Zaretskii
2019-07-08 16:38                                                                 ` Lars Ingebrigtsen
2020-08-03  6:00                                                                   ` Lars Ingebrigtsen
2019-07-05  8:21                                                     ` Eli Zaretskii
2019-07-05 13:13                                                       ` Lars Ingebrigtsen
2019-06-26 16:29                                       ` Eli Zaretskii
2018-02-19 18:37                       ` Andreas Schwab

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.