* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output @ 2018-10-11 14:57 Basil L. Contovounesios 2018-10-12 8:07 ` Eli Zaretskii 2018-10-12 12:02 ` Michael Albinus 0 siblings, 2 replies; 22+ messages in thread From: Basil L. Contovounesios @ 2018-10-11 14:57 UTC (permalink / raw) To: 33018 [-- Attachment #1: test.el --] [-- Type: application/emacs-lisp, Size: 1892 bytes --] [-- Attachment #2: Type: text/plain, Size: 3547 bytes --] I attach a sample program test.el whose central function, test-slave, invokes wget asynchronously before waiting for the process to exit. The issue I'm facing is that running test-slave twice in succession, each time in a new thread, causes accept-process-output to hang with no output (unless a timeout argument is given, in which case the function returns nil) the second time around. When this happens, the process sentinel is never called, which is why I'm assuming accept-process-output is indeed "hanging" in some sense and it's not just that the process has already exited and so has no further output. I could very well be doing or assuming something incorrectly, but what baffles me is that the "hang" does not occur either when Emacs is run non-interactively, or when "https://en.wikipedia.org/wiki/Emacs" is replaced with "https://www.gnu.org/software/emacs/", or when test-slave is run in the current thread (and not in make-thread). Since I can reliably reproduce this on both an optimised build of master and a non-optimised build of emacs-26, I hope to be able to provide further insights using gdb as time allows. Please let me know if there are any specific details/output you would like me to provide. As a relatively inexperienced gdb user I welcome any tips and tricks for debugging threads and processes. Here are some ways test.el can be run to illustrate the issue: # All five processes exit successfully. emacs -batch -l test.el -f test-no-threads # All five processes (and threads) exit successfully. emacs -batch -l test.el -f test-threads # All five processes exit successfully. emacs -Q -l test.el -f test-no-threads # First process and thread exit successfully, # but accept-process-output starts timing out in second thread. # Warning: may leave empty wget-log files lying around. emacs -Q -l test.el -f test-threads Details of the two Emacs versions I'm using follow: In GNU Emacs 26.1.50 (build 2, x86_64-pc-linux-gnu, X toolkit, Xaw3d scroll bars) of 2018-10-11 built on thunk Repository revision: a7ebc6bf633bd3849ccab032dad6b1fd31b1ef43 Windowing system distributor 'The X.Org Foundation', version 11.0.12001000 System Description: Debian GNU/Linux testing (buster) Configured using: 'configure 'CC=ccache gcc' 'CFLAGS=-O0 -g3 -ggdb -gdwarf-4 -pipe' --config-cache --prefix=/home/blc/.local --program-suffix=26 --enable-checking=yes,glyphs --enable-check-lisp-object-type --with-mailutils --with-x-toolkit=lucid --with-modules --with-file-notification=yes --with-x' Configured features: XAW3D XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND GPM DBUS GSETTINGS GLIB NOTIFY ACL LIBSELINUX GNUTLS LIBXML2 FREETYPE M17N_FLT LIBOTF XFT ZLIB TOOLKIT_SCROLL_BARS LUCID X11 XDBE XIM MODULES THREADS LIBSYSTEMD LCMS2 In GNU Emacs 27.0.50 (build 21, x86_64-pc-linux-gnu, X toolkit, Xaw3d scroll bars) of 2018-10-11 built on thunk Repository revision: 5bd8cfc14d4b0c78c07e65a583f42a10c4cbc06d Windowing system distributor 'The X.Org Foundation', version 11.0.12001000 System Description: Debian GNU/Linux buster/sid Configured using: 'configure --config-cache --prefix=/home/blc/.local --with-mailutils --with-x-toolkit=lucid --with-modules --with-file-notification=yes --with-x 'CC=ccache gcc' 'CFLAGS=-O2 -march=native -pipe'' Configured features: XAW3D XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND GPM DBUS GSETTINGS GLIB NOTIFY ACL LIBSELINUX GNUTLS LIBXML2 FREETYPE M17N_FLT LIBOTF XFT ZLIB TOOLKIT_SCROLL_BARS LUCID X11 XDBE XIM MODULES THREADS LIBSYSTEMD JSON LCMS2 GMP Thanks, -- Basil ^ permalink raw reply [flat|nested] 22+ messages in thread
* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output 2018-10-11 14:57 bug#33018: 26.1.50; thread starvation with async processes and accept-process-output Basil L. Contovounesios @ 2018-10-12 8:07 ` Eli Zaretskii 2018-10-14 15:00 ` Basil L. Contovounesios 2018-10-12 12:02 ` Michael Albinus 1 sibling, 1 reply; 22+ messages in thread From: Eli Zaretskii @ 2018-10-12 8:07 UTC (permalink / raw) To: Basil L. Contovounesios; +Cc: 33018 > From: "Basil L. Contovounesios" <contovob@tcd.ie> > Date: Thu, 11 Oct 2018 15:57:50 +0100 > > I attach a sample program test.el whose central function, test-slave, > invokes wget asynchronously before waiting for the process to exit. > > The issue I'm facing is that running test-slave twice in succession, > each time in a new thread, causes accept-process-output to hang with no > output (unless a timeout argument is given, in which case the function > returns nil) the second time around. When the hang happens, is there any wget process still alive, or did they all exit? Please use OS tools to find that out, don't rely on what Emacs thinks. > I could very well be doing or assuming something incorrectly, but what > baffles me is that the "hang" does not occur either when Emacs is run > non-interactively, or when "https://en.wikipedia.org/wiki/Emacs" is > replaced with "https://www.gnu.org/software/emacs/" Could be different properties of the servers related to async connections, like TLS handshake or even async getaddrinfo. ^ permalink raw reply [flat|nested] 22+ messages in thread
* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output 2018-10-12 8:07 ` Eli Zaretskii @ 2018-10-14 15:00 ` Basil L. Contovounesios 2018-10-14 15:17 ` Eli Zaretskii 0 siblings, 1 reply; 22+ messages in thread From: Basil L. Contovounesios @ 2018-10-14 15:00 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 33018 Eli Zaretskii <eliz@gnu.org> writes: >> From: "Basil L. Contovounesios" <contovob@tcd.ie> >> Date: Thu, 11 Oct 2018 15:57:50 +0100 >> >> I attach a sample program test.el whose central function, test-slave, >> invokes wget asynchronously before waiting for the process to exit. >> >> The issue I'm facing is that running test-slave twice in succession, >> each time in a new thread, causes accept-process-output to hang with no >> output (unless a timeout argument is given, in which case the function >> returns nil) the second time around. > > When the hang happens, is there any wget process still alive, or did > they all exit? Please use OS tools to find that out, don't rely on > what Emacs thinks. When the hang happens, the wget process launched by the waiting thread is still alive but asleep (idle), as reported by ps and top. -- Basil ^ permalink raw reply [flat|nested] 22+ messages in thread
* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output 2018-10-14 15:00 ` Basil L. Contovounesios @ 2018-10-14 15:17 ` Eli Zaretskii 2018-10-14 15:36 ` Basil L. Contovounesios 0 siblings, 1 reply; 22+ messages in thread From: Eli Zaretskii @ 2018-10-14 15:17 UTC (permalink / raw) To: Basil L. Contovounesios; +Cc: 33018 > From: "Basil L. Contovounesios" <contovob@tcd.ie> > Cc: <33018@debbugs.gnu.org> > Date: Sun, 14 Oct 2018 16:00:56 +0100 > > > When the hang happens, is there any wget process still alive, or did > > they all exit? Please use OS tools to find that out, don't rely on > > what Emacs thinks. > > When the hang happens, the wget process launched by the waiting thread > is still alive but asleep (idle), as reported by ps and top. Any idea why is that? And if this is the situation, doesn't it explain why accept-process-output times out? ^ permalink raw reply [flat|nested] 22+ messages in thread
* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output 2018-10-14 15:17 ` Eli Zaretskii @ 2018-10-14 15:36 ` Basil L. Contovounesios 0 siblings, 0 replies; 22+ messages in thread From: Basil L. Contovounesios @ 2018-10-14 15:36 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 33018 Eli Zaretskii <eliz@gnu.org> writes: >> From: "Basil L. Contovounesios" <contovob@tcd.ie> >> Cc: <33018@debbugs.gnu.org> >> Date: Sun, 14 Oct 2018 16:00:56 +0100 >> >> > When the hang happens, is there any wget process still alive, or did >> > they all exit? Please use OS tools to find that out, don't rely on >> > what Emacs thinks. >> >> When the hang happens, the wget process launched by the waiting thread >> is still alive but asleep (idle), as reported by ps and top. > > Any idea why is that? No idea, sorry. I haven't yet found the time to look into this deeper. > And if this is the situation, doesn't it explain why > accept-process-output times out? Possibly, but the question is why do we enter either of these situations (process sleeping and accept-process-output timing out) exclusively when using Emacs threads, no? By the way, the problem isn't specific to wget; the same thing happens with curl. I also find it strange that the first thread doesn't suffer from the same problem as the second thread, especially given each thread is created only after the last thread exited. -- Basil ^ permalink raw reply [flat|nested] 22+ messages in thread
* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output 2018-10-11 14:57 bug#33018: 26.1.50; thread starvation with async processes and accept-process-output Basil L. Contovounesios 2018-10-12 8:07 ` Eli Zaretskii @ 2018-10-12 12:02 ` Michael Albinus 2018-10-12 12:43 ` Eli Zaretskii 2018-10-14 15:17 ` Basil L. Contovounesios 1 sibling, 2 replies; 22+ messages in thread From: Michael Albinus @ 2018-10-12 12:02 UTC (permalink / raw) To: Basil L. Contovounesios; +Cc: 33018 "Basil L. Contovounesios" <contovob@tcd.ie> writes: Hi Basil, > I attach a sample program test.el whose central function, test-slave, > invokes wget asynchronously before waiting for the process to exit. > > The issue I'm facing is that running test-slave twice in succession, > each time in a new thread, causes accept-process-output to hang with no > output (unless a timeout argument is given, in which case the function > returns nil) the second time around. If you want a process to communicate in a given thread, you must call `set-process-thread'. See the elisp manual. > Thanks, Best regards, Michael. ^ permalink raw reply [flat|nested] 22+ messages in thread
* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output 2018-10-12 12:02 ` Michael Albinus @ 2018-10-12 12:43 ` Eli Zaretskii 2018-10-12 12:49 ` Michael Albinus 2018-10-14 15:17 ` Basil L. Contovounesios 1 sibling, 1 reply; 22+ messages in thread From: Eli Zaretskii @ 2018-10-12 12:43 UTC (permalink / raw) To: Michael Albinus; +Cc: contovob, 33018 > From: Michael Albinus <michael.albinus@gmx.de> > Date: Fri, 12 Oct 2018 14:02:46 +0200 > Cc: 33018@debbugs.gnu.org > > If you want a process to communicate in a given thread, you must call > `set-process-thread'. See the elisp manual. But the default is that the process is locked to the thread that created it, so it sounds like this should have just worked (if that is the problem). ^ permalink raw reply [flat|nested] 22+ messages in thread
* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output 2018-10-12 12:43 ` Eli Zaretskii @ 2018-10-12 12:49 ` Michael Albinus 0 siblings, 0 replies; 22+ messages in thread From: Michael Albinus @ 2018-10-12 12:49 UTC (permalink / raw) To: Eli Zaretskii; +Cc: contovob, 33018 Eli Zaretskii <eliz@gnu.org> writes: >> If you want a process to communicate in a given thread, you must call >> `set-process-thread'. See the elisp manual. > > But the default is that the process is locked to the thread that > created it, so it sounds like this should have just worked (if that is > the problem). I'm not sure. In the branch feature/tramp-thread-safe there were also mysterious blockings in accept-process-output, until I've applied set-process-thread explicitly. At least it is worth a try. Best regards, Michael. ^ permalink raw reply [flat|nested] 22+ messages in thread
* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output 2018-10-12 12:02 ` Michael Albinus 2018-10-12 12:43 ` Eli Zaretskii @ 2018-10-14 15:17 ` Basil L. Contovounesios 2018-10-15 8:02 ` Michael Albinus 1 sibling, 1 reply; 22+ messages in thread From: Basil L. Contovounesios @ 2018-10-14 15:17 UTC (permalink / raw) To: Michael Albinus; +Cc: 33018 [-- Attachment #1: Type: text/plain, Size: 775 bytes --] Michael Albinus <michael.albinus@gmx.de> writes: > "Basil L. Contovounesios" <contovob@tcd.ie> writes: > >> I attach a sample program test.el whose central function, test-slave, >> invokes wget asynchronously before waiting for the process to exit. >> >> The issue I'm facing is that running test-slave twice in succession, >> each time in a new thread, causes accept-process-output to hang with no >> output (unless a timeout argument is given, in which case the function >> returns nil) the second time around. > > If you want a process to communicate in a given thread, you must call > `set-process-thread'. See the elisp manual. Thanks, this is the first thing I tried when earlier experiments started to hang. I tried both the following redundant but explicit call: [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: test.diff --] [-- Type: text/x-diff, Size: 616 bytes --] diff -u --label /tmp/test.el --label \#\<buffer\ /tmp/test.el\> /tmp/test.el /tmp/buffer-content-hY0EDf --- /tmp/test.el +++ #<buffer /tmp/test.el> @@ -24,6 +24,7 @@ :command '("wget" "-qO-" "https://en.wikipedia.org/wiki/Emacs") :connection-type 'pipe :sentinel #'test-sentinel))) + (set-process-thread proc (current-thread)) (while (eq (process-status proc) 'run) (test-debug proc 'accept-output (accept-process-output proc 5))) (test-debug proc 'exit (process-status proc) (process-exit-status proc)))) Diff finished. Sun Oct 14 16:04:48 2018 [-- Attachment #3: Type: text/plain, Size: 295 bytes --] as well as replacing (current-thread) with nil, to unlock the process. Adding test-debug calls before and after accept-process-output revealed nothing out of the ordinary, and explicitly un/locking the process didn't fix the hang. Should I be calling set-process-thread elsewhere? -- Basil ^ permalink raw reply [flat|nested] 22+ messages in thread
* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output 2018-10-14 15:17 ` Basil L. Contovounesios @ 2018-10-15 8:02 ` Michael Albinus 2018-10-16 1:15 ` Basil L. Contovounesios 0 siblings, 1 reply; 22+ messages in thread From: Michael Albinus @ 2018-10-15 8:02 UTC (permalink / raw) To: Basil L. Contovounesios; +Cc: 33018 [-- Attachment #1: Type: text/plain, Size: 862 bytes --] "Basil L. Contovounesios" <contovob@tcd.ie> writes: Hi Basil, >> If you want a process to communicate in a given thread, you must call >> `set-process-thread'. See the elisp manual. > > Thanks, this is the first thing I tried when earlier experiments started > to hang. I tried both the following redundant but explicit call: Well, I've played with your example. As Eli said, `set-process-thread' is not needed here. With your original example, I could reproduce the problem, However, if I call emacs -l test.el -f test-threads the problem does NOT happen. My .emacs is quite long, so I didn't bisect in order to find out what makes the difference. I have changed your example a little bit wrt `thread-join', see appended. This version runs w/o any problem even if emacs is called with -Q. Maybe this helps you to debug further. Best regards, Michael. [-- Attachment #2: Type: text/plain, Size: 1919 bytes --] ;;; test.el --- test -*- lexical-binding: t -*- ;;; Code: (defun test-debug (proc &rest args) "Print `current-thread' and ARGS in timestamped `message'." (message "[%s] %s" (format-time-string "%T:%3N") (mapconcat #'prin1-to-string `(,(current-thread) ,(process-thread proc) ,@args) " "))) (defun test-sentinel (proc msg) "Kill buffer of PROC when it exits." (test-debug proc 'sentinel (process-status proc) (substring msg 0 -1)) (and (not (eq (process-status proc) 'run)) (buffer-live-p (process-buffer proc)) (kill-buffer (process-buffer proc)))) (defun test-slave () "Create an asynchronous process and wait for it to exit." (let ((proc (make-process :name "test" :buffer (generate-new-buffer " *test*") :command '("wget" "-qO-" "https://en.wikipedia.org/wiki/Emacs") :connection-type 'pipe :sentinel #'test-sentinel))) (while (eq (process-status proc) 'run) (test-debug proc 'accept-output (accept-process-output proc 5))) (test-debug proc 'exit (process-status proc) (process-exit-status proc)))) (defun test-master (async) "Successively run `test-slave'. With non-nil ASYNC, use a new thread for each invocation." (if async (progn (dotimes (i 5) (make-thread #'test-slave (format "test-%d" i))) (while (cdr (all-threads) (thread-join (cadr (all-threads)))))) (dotimes (i 5) (test-slave)))) (defun test-job (async) "Successively run `test-slave'. Like `test-master', but avoid blocking interactive sessions by running `test-master' in its own thread." (if noninteractive (test-master async) (make-thread (apply-partially #'test-master async) "test-master"))) (defun test-threads () "Run `test-master' with threads." (test-job t)) (defun test-no-threads () "Run `test-master' without threads." (test-job nil)) ;;; test.el ends here ^ permalink raw reply [flat|nested] 22+ messages in thread
* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output 2018-10-15 8:02 ` Michael Albinus @ 2018-10-16 1:15 ` Basil L. Contovounesios 2018-10-16 13:54 ` Michael Albinus 2018-10-16 14:58 ` Eli Zaretskii 0 siblings, 2 replies; 22+ messages in thread From: Basil L. Contovounesios @ 2018-10-16 1:15 UTC (permalink / raw) To: Michael Albinus; +Cc: 33018 [-- Attachment #1: test.el --] [-- Type: application/emacs-lisp, Size: 2012 bytes --] [-- Attachment #2: Type: text/plain, Size: 912 bytes --] Michael Albinus <michael.albinus@gmx.de> writes: > Well, I've played with your example. As Eli said, `set-process-thread' > is not needed here. > > With your original example, I could reproduce the problem, However, if I > call > > emacs -l test.el -f test-threads > > the problem does NOT happen. My .emacs is quite long, so I didn't bisect > in order to find out what makes the difference. > > I have changed your example a little bit wrt `thread-join', see > appended. This version runs w/o any problem even if emacs is called with > -Q. Maybe this helps you to debug further. Thanks, creating all threads before waiting for any of them to exit indeed does not suffer from the same hang. Doing this twice (see attached update), however, still hangs. There's something about going through a complete create-join cycle more than once within a non-main-thread which is triggering this behaviour. -- Basil ^ permalink raw reply [flat|nested] 22+ messages in thread
* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output 2018-10-16 1:15 ` Basil L. Contovounesios @ 2018-10-16 13:54 ` Michael Albinus 2018-10-16 14:55 ` Basil L. Contovounesios 2018-10-16 14:58 ` Eli Zaretskii 1 sibling, 1 reply; 22+ messages in thread From: Michael Albinus @ 2018-10-16 13:54 UTC (permalink / raw) To: Basil L. Contovounesios; +Cc: 33018 [-- Attachment #1: Type: text/plain, Size: 518 bytes --] "Basil L. Contovounesios" <contovob@tcd.ie> writes: > There's something about going through a complete create-join cycle more > than once within a non-main-thread which is triggering this behaviour. I'm not sure that it is related to threads. It looks, like some of your processes do not exit properly, and then thread-join is blocked. I've modified your example, again. It runs perfectly. And during its work, you could call "M-x list-threads" and see how the threads are created and die. Best regards, Michael. [-- Attachment #2: Type: text/plain, Size: 1241 bytes --] diff -u /home/albinus/tmp/test.el.\~1\~ /home/albinus/tmp/test.el --- /home/albinus/tmp/test.el.~1~ 2018-10-16 14:46:47.000000000 +0200 +++ /home/albinus/tmp/test.el 2018-10-16 15:47:05.581276541 +0200 @@ -19,9 +19,9 @@ (defun test-slave () "Create an asynchronous process and wait for it to exit." (let ((proc (make-process - :name "test" + :name (thread-name (current-thread)) :buffer (generate-new-buffer " *test*") - :command '("wget" "-qO-" "https://en.wikipedia.org/wiki/Emacs") + :command `("echo" ,(thread-name (current-thread))) :connection-type 'pipe :sentinel #'test-sentinel))) (while (eq (process-status proc) 'run) @@ -32,10 +32,10 @@ "Successively run `test-slave'. With non-nil ASYNC, use a new thread for each invocation." (if async - (dotimes (_ 2) + (dotimes (i 100) (let (threads) - (dotimes (i 2) - (push (make-thread #'test-slave (format "test-%d" i)) threads)) + (dotimes (j 100) + (push (make-thread #'test-slave (format "test-%d-%d" i j)) threads)) (mapc #'thread-join threads))) (dotimes (_ 2) (test-slave)))) ^ permalink raw reply [flat|nested] 22+ messages in thread
* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output 2018-10-16 13:54 ` Michael Albinus @ 2018-10-16 14:55 ` Basil L. Contovounesios 0 siblings, 0 replies; 22+ messages in thread From: Basil L. Contovounesios @ 2018-10-16 14:55 UTC (permalink / raw) To: Michael Albinus; +Cc: 33018 Michael Albinus <michael.albinus@gmx.de> writes: > "Basil L. Contovounesios" <contovob@tcd.ie> writes: > >> There's something about going through a complete create-join cycle more >> than once within a non-main-thread which is triggering this behaviour. > > I'm not sure that it is related to threads. It has to be, because there is never an issue when I run the same asynchronous wget processes without threads, and with threads the hang reliably occurs 100% of the time. > It looks, like some of your processes do not exit properly, and then > thread-join is blocked. Indeed, but there is something about the interaction of Emacs threads and subprocesses which is causing unsuccessful process termination. Note that I am not ruling out pilot error; I simply haven't debugged this issue any deeper yet. The fact that no-one has yet pointed out any obvious blunders on my part gives me more confidence that there is indeed some ghost in the wire. > I've modified your example, again. It runs perfectly. Indeed, there are many subprocess-within-a-thread examples which don't suffer from a hang, e.g. by using a different URL. I would like to get to the bottom of why network programs like wget/curl in particular eventually hang, though. Thanks, -- Basil ^ permalink raw reply [flat|nested] 22+ messages in thread
* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output 2018-10-16 1:15 ` Basil L. Contovounesios 2018-10-16 13:54 ` Michael Albinus @ 2018-10-16 14:58 ` Eli Zaretskii 2018-10-17 17:37 ` Basil L. Contovounesios 1 sibling, 1 reply; 22+ messages in thread From: Eli Zaretskii @ 2018-10-16 14:58 UTC (permalink / raw) To: Basil L. Contovounesios; +Cc: michael.albinus, 33018 > From: "Basil L. Contovounesios" <contovob@tcd.ie> > Date: Tue, 16 Oct 2018 02:15:27 +0100 > Cc: 33018@debbugs.gnu.org > > Thanks, creating all threads before waiting for any of them to exit > indeed does not suffer from the same hang. Doing this twice (see > attached update), however, still hangs. > > There's something about going through a complete create-join cycle more > than once within a non-main-thread which is triggering this behaviour. Can you attach a debugger to the wget process that's stuck, and see where it is stuck? You will probably need to rebuild wget with debug info, or install one from your package repository (if they offer such). This could give us hints for where to look further. Thanks. ^ permalink raw reply [flat|nested] 22+ messages in thread
* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output 2018-10-16 14:58 ` Eli Zaretskii @ 2018-10-17 17:37 ` Basil L. Contovounesios 2018-10-17 17:56 ` Eli Zaretskii 0 siblings, 1 reply; 22+ messages in thread From: Basil L. Contovounesios @ 2018-10-17 17:37 UTC (permalink / raw) To: Eli Zaretskii; +Cc: michael.albinus, 33018 [-- Attachment #1: gdb.txt --] [-- Type: text/plain, Size: 3050 bytes --] #0 0x00007ffff766a2a4 in __GI___libc_write (fd=1, buf=0x5555559c1b30, nbytes=4096) at ../sysdeps/unix/sysv/linux/write.c:27 #1 0x00007ffff75fb56d in _IO_new_file_write (f=0x7ffff7739760 <_IO_2_1_stdout_>, data=0x5555559c1b30, n=4096) at fileops.c:1203 #2 0x00007ffff75fa88f in new_do_write (fp=0x7ffff7739760 <_IO_2_1_stdout_>, data=0x5555559c1b30 "pan></span></h2>\n<p>Emacs is primarily a <a href=\"/wiki/Text_editor\" title=\"Text editor\">text editor</a> and is designed for manipulating pieces of text, although it is capable of formatting and print"..., to_do=to_do@entry=4096) at fileops.c:457 #3 0x00007ffff75fc6f9 in _IO_new_do_write (fp=<optimized out>, data=<optimized out>, to_do=4096) at fileops.c:433 #4 0x00007ffff75fa6d8 in _IO_new_file_sync (fp=0x7ffff7739760 <_IO_2_1_stdout_>) at fileops.c:813 #5 0x00007ffff75ef6ed in __GI__IO_fflush (fp=0x7ffff7739760 <_IO_2_1_stdout_>) at iofflush.c:40 #6 0x00005555555918bb in write_data (out=0x7ffff7739760 <_IO_2_1_stdout_>, out2=0x0, buf=0x5555559a9590 "pan></span></h2>\n<p>Emacs is primarily a <a href=\"/wiki/Text_editor\" title=\"Text editor\">text editor</a> and is designed for manipulating pieces of text, although it is capable of formatting and print"..., bufsize=4096, skip=0x7fffffffd0e8, written=0x7fffffffd0e0) at retr.c:207 #7 0x000055555559204f in fd_read_body (downloaded_filename=0x5555555e18a0 "-", fd=4, out=0x7ffff7739760 <_IO_2_1_stdout_>, toread=198224, startpos=0, qtyread=0x7fffffffda20, qtywritten=0x7fffffffd9d0, elapsed=0x7fffffffda28, flags=1, out2=0x0) at retr.c:498 #8 0x000055555557d09d in read_response_body (hs=0x7fffffffd9d0, sock=4, fp=0x7ffff7739760 <_IO_2_1_stdout_>, contlen=198224, contrange=0, chunked_transfer_encoding=false, url=0x5555555ff920 "https://en.wikipedia.org/wiki/Emacs", warc_timestamp_str=0x7fffffffd470 "\f\335\377\377\377\177", warc_request_uuid=0x7fffffffd440 ",", warc_ip=0x0, type=0x5555559c8fc0 "text/html", statcode=200, head=0x5555559b35d0 "HTTP/1.1 200 OK\r\nDate: Wed, 17 Oct 2018 17:29:35 GMT\r\nContent-Type: text/html; charset=UTF-8\r\nContent-Length: 198224\r\nConnection: keep-alive\r\nServer: mw1325.eqiad.wmnet\r\nVary: Accept-Encoding,Cookie,A"...) at http.c:1717 #9 0x0000555555581b06 in gethttp (u=0x5555555ff8b0, original_url=0x5555555ff8b0, hs=0x7fffffffd9d0, dt=0x7fffffffdd0c, proxy=0x0, iri=0x5555555ff880, count=1) at http.c:4129 #10 0x0000555555582239 in http_loop (u=0x5555555ff8b0, original_url=0x5555555ff8b0, newloc=0x7fffffffdb68, local_file=0x7fffffffdb58, referer=0x0, dt=0x7fffffffdd0c, proxy=0x0, iri=0x5555555ff880) at http.c:4363 #11 0x0000555555592ca4 in retrieve_url (orig_parsed=0x5555555ff8b0, origurl=0x5555555ff490 "https://en.wikipedia.org/wiki/Emacs", file=0x7fffffffdd18, newloc=0x7fffffffdd10, refurl=0x0, dt=0x7fffffffdd0c, recursive=false, iri=0x5555555ff880, register_status=true) at retr.c:957 #12 0x000055555558b33e in main (argc=3, argv=0x7fffffffdef8) at main.c:2158 [-- Attachment #2: Type: text/plain, Size: 789 bytes --] Eli Zaretskii <eliz@gnu.org> writes: > Can you attach a debugger to the wget process that's stuck, and see > where it is stuck? You will probably need to rebuild wget with debug > info, or install one from your package repository (if they offer > such). This could give us hints for where to look further. I did the following: 0. Build wget 1.19.5 from Debian's repositories: apt-get build-dep wget apt-get source wget configure CC='ccache gcc' CFLAGS='-O0 -g3 -ggdb -gdwarf-4 -pipe' --config-cache --enable-assert --with-gnu-ld make 1. Substitute resulting wget file name in original test.el program 2. emacs26 -Q -l test.el -f test-threads 3. gdb -p <pid of stuck wget> 4. set logging on 5. bt I attach the resulting gdb.txt log file. Thanks, -- Basil ^ permalink raw reply [flat|nested] 22+ messages in thread
* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output 2018-10-17 17:37 ` Basil L. Contovounesios @ 2018-10-17 17:56 ` Eli Zaretskii 2018-10-17 18:05 ` Basil L. Contovounesios 0 siblings, 1 reply; 22+ messages in thread From: Eli Zaretskii @ 2018-10-17 17:56 UTC (permalink / raw) To: Basil L. Contovounesios; +Cc: michael.albinus, 33018 > From: "Basil L. Contovounesios" <contovob@tcd.ie> > Cc: <michael.albinus@gmx.de>, <33018@debbugs.gnu.org> > Date: Wed, 17 Oct 2018 18:37:00 +0100 > > #0 0x00007ffff766a2a4 in __GI___libc_write (fd=1, buf=0x5555559c1b30, nbytes=4096) > at ../sysdeps/unix/sysv/linux/write.c:27 > #1 0x00007ffff75fb56d in _IO_new_file_write (f=0x7ffff7739760 <_IO_2_1_stdout_>, > data=0x5555559c1b30, n=4096) at fileops.c:1203 > #2 0x00007ffff75fa88f in new_do_write (fp=0x7ffff7739760 <_IO_2_1_stdout_>, > data=0x5555559c1b30 "pan></span></h2>\n<p>Emacs is primarily a <a href=\"/wiki/Text_editor\" title=\"Text editor\">text editor</a> and is designed for manipulating pieces of text, although it is capable of formatting and print"..., to_do=to_do@entry=4096) at fileops.c:457 > #3 0x00007ffff75fc6f9 in _IO_new_do_write (fp=<optimized out>, data=<optimized out>, > to_do=4096) at fileops.c:433 > #4 0x00007ffff75fa6d8 in _IO_new_file_sync (fp=0x7ffff7739760 <_IO_2_1_stdout_>) > at fileops.c:813 > #5 0x00007ffff75ef6ed in __GI__IO_fflush (fp=0x7ffff7739760 <_IO_2_1_stdout_>) at iofflush.c:40 > #6 0x00005555555918bb in write_data (out=0x7ffff7739760 <_IO_2_1_stdout_>, out2=0x0, > buf=0x5555559a9590 "pan></span></h2>\n<p>Emacs is primarily a <a href=\"/wiki/Text_editor\" title=\"Text editor\">text editor</a> and is designed for manipulating pieces of text, although it is capable of formatting and print"..., bufsize=4096, skip=0x7fffffffd0e8, > written=0x7fffffffd0e0) at retr.c:207 > #7 0x000055555559204f in fd_read_body (downloaded_filename=0x5555555e18a0 "-", fd=4, > out=0x7ffff7739760 <_IO_2_1_stdout_>, toread=198224, startpos=0, qtyread=0x7fffffffda20, > qtywritten=0x7fffffffd9d0, elapsed=0x7fffffffda28, flags=1, out2=0x0) at retr.c:498 Looks like the buffer of the pipe through which Emacs reads the stuff is full, and wget waits for some space there? ^ permalink raw reply [flat|nested] 22+ messages in thread
* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output 2018-10-17 17:56 ` Eli Zaretskii @ 2018-10-17 18:05 ` Basil L. Contovounesios 2018-10-17 18:20 ` Eli Zaretskii 0 siblings, 1 reply; 22+ messages in thread From: Basil L. Contovounesios @ 2018-10-17 18:05 UTC (permalink / raw) To: Eli Zaretskii; +Cc: michael.albinus, 33018 Eli Zaretskii <eliz@gnu.org> writes: >> From: "Basil L. Contovounesios" <contovob@tcd.ie> >> Cc: <michael.albinus@gmx.de>, <33018@debbugs.gnu.org> >> Date: Wed, 17 Oct 2018 18:37:00 +0100 >> >> #0 0x00007ffff766a2a4 in __GI___libc_write (fd=1, buf=0x5555559c1b30, nbytes=4096) >> at ../sysdeps/unix/sysv/linux/write.c:27 >> #1 0x00007ffff75fb56d in _IO_new_file_write (f=0x7ffff7739760 <_IO_2_1_stdout_>, >> data=0x5555559c1b30, n=4096) at fileops.c:1203 >> #2 0x00007ffff75fa88f in new_do_write (fp=0x7ffff7739760 <_IO_2_1_stdout_>, >> data=0x5555559c1b30 "pan></span></h2>\n<p>Emacs is primarily a <a >> href=\"/wiki/Text_editor\" title=\"Text editor\">text editor</a> and is >> designed for manipulating pieces of text, although it is capable of formatting >> and print"..., to_do=to_do@entry=4096) at fileops.c:457 >> #3 0x00007ffff75fc6f9 in _IO_new_do_write (fp=<optimized out>, data=<optimized out>, >> to_do=4096) at fileops.c:433 >> #4 0x00007ffff75fa6d8 in _IO_new_file_sync (fp=0x7ffff7739760 <_IO_2_1_stdout_>) >> at fileops.c:813 >> #5 0x00007ffff75ef6ed in __GI__IO_fflush (fp=0x7ffff7739760 <_IO_2_1_stdout_>) >> at iofflush.c:40 >> #6 0x00005555555918bb in write_data (out=0x7ffff7739760 <_IO_2_1_stdout_>, out2=0x0, >> buf=0x5555559a9590 "pan></span></h2>\n<p>Emacs is primarily a <a >> href=\"/wiki/Text_editor\" title=\"Text editor\">text editor</a> and is >> designed for manipulating pieces of text, although it is capable of formatting >> and print"..., bufsize=4096, skip=0x7fffffffd0e8, >> written=0x7fffffffd0e0) at retr.c:207 >> #7 0x000055555559204f in fd_read_body (downloaded_filename=0x5555555e18a0 "-", fd=4, >> out=0x7ffff7739760 <_IO_2_1_stdout_>, toread=198224, startpos=0, qtyread=0x7fffffffda20, >> qtywritten=0x7fffffffd9d0, elapsed=0x7fffffffda28, flags=1, out2=0x0) at retr.c:498 > > Looks like the buffer of the pipe through which Emacs reads the stuff > is full, and wget waits for some space there? Would that imply that different threads/processes are (re)using the same buffer/pipe? FWIW, strace -p <pip of stuck emacs> gives: pselect6(14, [6 7], [], NULL, {tv_sec=99975, tv_nsec=320947003}, {NULL, 8} and strace -p <pip of stuck wget> gives: write(1, ">]</span></span></h2>\n<p>Emacs i"..., 4096 The former reminded me of bug#24201: https://debbugs.gnu.org/24201 -- Basil ^ permalink raw reply [flat|nested] 22+ messages in thread
* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output 2018-10-17 18:05 ` Basil L. Contovounesios @ 2018-10-17 18:20 ` Eli Zaretskii 2018-10-17 18:25 ` Eli Zaretskii 2018-10-17 20:02 ` Basil L. Contovounesios 0 siblings, 2 replies; 22+ messages in thread From: Eli Zaretskii @ 2018-10-17 18:20 UTC (permalink / raw) To: Basil L. Contovounesios; +Cc: michael.albinus, 33018 > From: "Basil L. Contovounesios" <contovob@tcd.ie> > Cc: <michael.albinus@gmx.de>, <33018@debbugs.gnu.org> > Date: Wed, 17 Oct 2018 19:05:59 +0100 > > > Looks like the buffer of the pipe through which Emacs reads the stuff > > is full, and wget waits for some space there? > > Would that imply that different threads/processes are (re)using the same > buffer/pipe? Could be, but it's more likely that Emacs simply doesn't read the output from wget. > pselect6(14, [6 7], [], NULL, {tv_sec=99975, tv_nsec=320947003}, {NULL, 8} > > and strace -p <pip of stuck wget> gives: > > write(1, ">]</span></span></h2>\n<p>Emacs i"..., 4096 > > The former reminded me of bug#24201: https://debbugs.gnu.org/24201 I could be wrong, but it doesn't look similar to me. In that bug, the CPU was pegged, whereas you said that CPU is idle. ^ permalink raw reply [flat|nested] 22+ messages in thread
* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output 2018-10-17 18:20 ` Eli Zaretskii @ 2018-10-17 18:25 ` Eli Zaretskii 2018-10-17 20:46 ` Basil L. Contovounesios 2018-10-17 20:02 ` Basil L. Contovounesios 1 sibling, 1 reply; 22+ messages in thread From: Eli Zaretskii @ 2018-10-17 18:25 UTC (permalink / raw) To: contovob; +Cc: michael.albinus, 33018 > Date: Wed, 17 Oct 2018 21:20:22 +0300 > From: Eli Zaretskii <eliz@gnu.org> > Cc: michael.albinus@gmx.de, 33018@debbugs.gnu.org > > > From: "Basil L. Contovounesios" <contovob@tcd.ie> > > Cc: <michael.albinus@gmx.de>, <33018@debbugs.gnu.org> > > Date: Wed, 17 Oct 2018 19:05:59 +0100 > > > > > Looks like the buffer of the pipe through which Emacs reads the stuff > > > is full, and wget waits for some space there? > > > > Would that imply that different threads/processes are (re)using the same > > buffer/pipe? > > Could be, but it's more likely that Emacs simply doesn't read the > output from wget. I think the relevant code should be instrumented to show which thread waits for what process(es). Btw, are you sure this is not a bug in your program? Michael caused your program to work twice by simple changes, AFAIU, no? ^ permalink raw reply [flat|nested] 22+ messages in thread
* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output 2018-10-17 18:25 ` Eli Zaretskii @ 2018-10-17 20:46 ` Basil L. Contovounesios 2018-10-20 8:34 ` Eli Zaretskii 0 siblings, 1 reply; 22+ messages in thread From: Basil L. Contovounesios @ 2018-10-17 20:46 UTC (permalink / raw) To: Eli Zaretskii; +Cc: michael.albinus, 33018 Eli Zaretskii <eliz@gnu.org> writes: >> Date: Wed, 17 Oct 2018 21:20:22 +0300 >> From: Eli Zaretskii <eliz@gnu.org> >> Cc: michael.albinus@gmx.de, 33018@debbugs.gnu.org >> >> > From: "Basil L. Contovounesios" <contovob@tcd.ie> >> > Cc: <michael.albinus@gmx.de>, <33018@debbugs.gnu.org> >> > Date: Wed, 17 Oct 2018 19:05:59 +0100 >> > >> > > Looks like the buffer of the pipe through which Emacs reads the stuff >> > > is full, and wget waits for some space there? >> > >> > Would that imply that different threads/processes are (re)using the same >> > buffer/pipe? >> >> Could be, but it's more likely that Emacs simply doesn't read the >> output from wget. > > I think the relevant code should be instrumented to show which thread > waits for what process(es). Each thread launches a single wget process which it then waits for before dying, and current-thread is always eq to that thread around calls to accept-process-output. Or are you talking about some other type of thread? Either way, I'll report back when I've had a deeper look into what Emacs is doing, unless someone beats me to it. > Btw, are you sure this is not a bug in your program? No, but the fact that it reliably works when Emacs is run with -batch, and reliably hangs when run with -Q is at least somewhat intriguing. > Michael caused your program to work twice by simple changes, AFAIU, > no? Michael avoided the hang by rewriting the create-join-...-create-join sequence as create-create-...-join-join. But if the latter is done twice from within the same master thread, the hang still occurs. As I said: > There's something about going through a complete create-join cycle more > than once within a non-main-thread which is triggering this behaviour. (Actually, I haven't checked whether the hang occurs when two create-join cycles are completed within main-thread; I was just specifically describing my sample program.) In his second rewrite, Michael replaced wget with echo, which does not suffer from any hangs. As I said: > Indeed, there are many subprocess-within-a-thread examples which don't > suffer from a hang, e.g. by using a different URL. I would like to get > to the bottom of why network programs like wget/curl in particular > eventually hang, though. In other words, I don't (yet) see why my recipe shouldn't work, and I'm curious to eventually get to the bottom of this. I'm sorry I've been talking more than doing, but university and house hunting will dominate my free time for the next few weeks. Thanks, -- Basil ^ permalink raw reply [flat|nested] 22+ messages in thread
* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output 2018-10-17 20:46 ` Basil L. Contovounesios @ 2018-10-20 8:34 ` Eli Zaretskii 0 siblings, 0 replies; 22+ messages in thread From: Eli Zaretskii @ 2018-10-20 8:34 UTC (permalink / raw) To: Basil L. Contovounesios; +Cc: michael.albinus, 33018 > From: "Basil L. Contovounesios" <contovob@tcd.ie> > Cc: <michael.albinus@gmx.de>, <33018@debbugs.gnu.org> > Date: Wed, 17 Oct 2018 21:46:39 +0100 > > > I think the relevant code should be instrumented to show which thread > > waits for what process(es). > > Each thread launches a single wget process which it then waits for > before dying, and current-thread is always eq to that thread around > calls to accept-process-output. Or are you talking about some other > type of thread? I was talking about low-level details: we set up the file-descriptor mask passed to pselect, to tell it which descriptors to wait on. The call to accept-process-output is supposed to arrange for the descriptor where the corresponding process will write to be one of those on which the corresponding pselect will wait. I was thinking that perhaps we become confused and don't ask pselect called by a thread to wait on the process which was launched by that thread. > Either way, I'll report back when I've had a deeper look into what Emacs > is doing, unless someone beats me to it. Thanks. > I'm sorry I've been talking more than doing, but university and house > hunting will dominate my free time for the next few weeks. No need to be sorry, we all have our lives. ^ permalink raw reply [flat|nested] 22+ messages in thread
* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output 2018-10-17 18:20 ` Eli Zaretskii 2018-10-17 18:25 ` Eli Zaretskii @ 2018-10-17 20:02 ` Basil L. Contovounesios 1 sibling, 0 replies; 22+ messages in thread From: Basil L. Contovounesios @ 2018-10-17 20:02 UTC (permalink / raw) To: Eli Zaretskii; +Cc: michael.albinus, 33018 Eli Zaretskii <eliz@gnu.org> writes: >> From: "Basil L. Contovounesios" <contovob@tcd.ie> >> Cc: <michael.albinus@gmx.de>, <33018@debbugs.gnu.org> >> Date: Wed, 17 Oct 2018 19:05:59 +0100 >> >> pselect6(14, [6 7], [], NULL, {tv_sec=99975, tv_nsec=320947003}, {NULL, 8} >> >> and strace -p <pip of stuck wget> gives: >> >> write(1, ">]</span></span></h2>\n<p>Emacs i"..., 4096 >> >> The former reminded me of bug#24201: https://debbugs.gnu.org/24201 > > I could be wrong, but it doesn't look similar to me. In that bug, the > CPU was pegged, whereas you said that CPU is idle. Right, I was only reminded of that bug because of the common accept-process-output/pselect hang in the context of a network-related process. I'm not suggesting the underlying cause is the same. -- Basil ^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2018-10-20 8:34 UTC | newest] Thread overview: 22+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-10-11 14:57 bug#33018: 26.1.50; thread starvation with async processes and accept-process-output Basil L. Contovounesios 2018-10-12 8:07 ` Eli Zaretskii 2018-10-14 15:00 ` Basil L. Contovounesios 2018-10-14 15:17 ` Eli Zaretskii 2018-10-14 15:36 ` Basil L. Contovounesios 2018-10-12 12:02 ` Michael Albinus 2018-10-12 12:43 ` Eli Zaretskii 2018-10-12 12:49 ` Michael Albinus 2018-10-14 15:17 ` Basil L. Contovounesios 2018-10-15 8:02 ` Michael Albinus 2018-10-16 1:15 ` Basil L. Contovounesios 2018-10-16 13:54 ` Michael Albinus 2018-10-16 14:55 ` Basil L. Contovounesios 2018-10-16 14:58 ` Eli Zaretskii 2018-10-17 17:37 ` Basil L. Contovounesios 2018-10-17 17:56 ` Eli Zaretskii 2018-10-17 18:05 ` Basil L. Contovounesios 2018-10-17 18:20 ` Eli Zaretskii 2018-10-17 18:25 ` Eli Zaretskii 2018-10-17 20:46 ` Basil L. Contovounesios 2018-10-20 8:34 ` Eli Zaretskii 2018-10-17 20:02 ` Basil L. Contovounesios
Code repositories for project(s) associated with this public inbox https://git.savannah.gnu.org/cgit/emacs.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).