unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output
@ 2018-10-11 14:57 Basil L. Contovounesios
  2018-10-12  8:07 ` Eli Zaretskii
  2018-10-12 12:02 ` Michael Albinus
  0 siblings, 2 replies; 22+ messages in thread
From: Basil L. Contovounesios @ 2018-10-11 14:57 UTC (permalink / raw)
  To: 33018

[-- Attachment #1: test.el --]
[-- Type: application/emacs-lisp, Size: 1892 bytes --]

[-- Attachment #2: Type: text/plain, Size: 3547 bytes --]


I attach a sample program test.el whose central function, test-slave,
invokes wget asynchronously before waiting for the process to exit.

The issue I'm facing is that running test-slave twice in succession,
each time in a new thread, causes accept-process-output to hang with no
output (unless a timeout argument is given, in which case the function
returns nil) the second time around.

When this happens, the process sentinel is never called, which is why
I'm assuming accept-process-output is indeed "hanging" in some sense and
it's not just that the process has already exited and so has no further
output.

I could very well be doing or assuming something incorrectly, but what
baffles me is that the "hang" does not occur either when Emacs is run
non-interactively, or when "https://en.wikipedia.org/wiki/Emacs" is
replaced with "https://www.gnu.org/software/emacs/", or when test-slave
is run in the current thread (and not in make-thread).

Since I can reliably reproduce this on both an optimised build of master
and a non-optimised build of emacs-26, I hope to be able to provide
further insights using gdb as time allows.  Please let me know if there
are any specific details/output you would like me to provide.  As a
relatively inexperienced gdb user I welcome any tips and tricks for
debugging threads and processes.

Here are some ways test.el can be run to illustrate the issue:

# All five processes exit successfully.
emacs -batch -l test.el -f test-no-threads

# All five processes (and threads) exit successfully.
emacs -batch -l test.el -f test-threads

# All five processes exit successfully.
emacs -Q -l test.el -f test-no-threads

# First process and thread exit successfully,
# but accept-process-output starts timing out in second thread.
# Warning: may leave empty wget-log files lying around.
emacs -Q -l test.el -f test-threads

Details of the two Emacs versions I'm using follow:

In GNU Emacs 26.1.50 (build 2, x86_64-pc-linux-gnu, X toolkit, Xaw3d scroll bars)
 of 2018-10-11 built on thunk
Repository revision: a7ebc6bf633bd3849ccab032dad6b1fd31b1ef43
Windowing system distributor 'The X.Org Foundation', version 11.0.12001000
System Description:	Debian GNU/Linux testing (buster)

Configured using:
 'configure 'CC=ccache gcc' 'CFLAGS=-O0 -g3 -ggdb -gdwarf-4 -pipe'
 --config-cache --prefix=/home/blc/.local --program-suffix=26
 --enable-checking=yes,glyphs --enable-check-lisp-object-type
 --with-mailutils --with-x-toolkit=lucid --with-modules
 --with-file-notification=yes --with-x'

Configured features:
XAW3D XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND GPM DBUS GSETTINGS
GLIB NOTIFY ACL LIBSELINUX GNUTLS LIBXML2 FREETYPE M17N_FLT LIBOTF XFT
ZLIB TOOLKIT_SCROLL_BARS LUCID X11 XDBE XIM MODULES THREADS LIBSYSTEMD
LCMS2

In GNU Emacs 27.0.50 (build 21, x86_64-pc-linux-gnu, X toolkit, Xaw3d scroll bars)
 of 2018-10-11 built on thunk
Repository revision: 5bd8cfc14d4b0c78c07e65a583f42a10c4cbc06d
Windowing system distributor 'The X.Org Foundation', version 11.0.12001000
System Description: Debian GNU/Linux buster/sid

Configured using:
 'configure --config-cache --prefix=/home/blc/.local --with-mailutils
 --with-x-toolkit=lucid --with-modules --with-file-notification=yes
 --with-x 'CC=ccache gcc' 'CFLAGS=-O2 -march=native -pipe''

Configured features:
XAW3D XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND GPM DBUS GSETTINGS
GLIB NOTIFY ACL LIBSELINUX GNUTLS LIBXML2 FREETYPE M17N_FLT LIBOTF XFT
ZLIB TOOLKIT_SCROLL_BARS LUCID X11 XDBE XIM MODULES THREADS LIBSYSTEMD
JSON LCMS2 GMP

Thanks,

-- 
Basil

^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output
  2018-10-11 14:57 bug#33018: 26.1.50; thread starvation with async processes and accept-process-output Basil L. Contovounesios
@ 2018-10-12  8:07 ` Eli Zaretskii
  2018-10-14 15:00   ` Basil L. Contovounesios
  2018-10-12 12:02 ` Michael Albinus
  1 sibling, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2018-10-12  8:07 UTC (permalink / raw)
  To: Basil L. Contovounesios; +Cc: 33018

> From: "Basil L. Contovounesios" <contovob@tcd.ie>
> Date: Thu, 11 Oct 2018 15:57:50 +0100
> 
> I attach a sample program test.el whose central function, test-slave,
> invokes wget asynchronously before waiting for the process to exit.
> 
> The issue I'm facing is that running test-slave twice in succession,
> each time in a new thread, causes accept-process-output to hang with no
> output (unless a timeout argument is given, in which case the function
> returns nil) the second time around.

When the hang happens, is there any wget process still alive, or did
they all exit?  Please use OS tools to find that out, don't rely on
what Emacs thinks.

> I could very well be doing or assuming something incorrectly, but what
> baffles me is that the "hang" does not occur either when Emacs is run
> non-interactively, or when "https://en.wikipedia.org/wiki/Emacs" is
> replaced with "https://www.gnu.org/software/emacs/"

Could be different properties of the servers related to async
connections, like TLS handshake or even async getaddrinfo.





^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output
  2018-10-11 14:57 bug#33018: 26.1.50; thread starvation with async processes and accept-process-output Basil L. Contovounesios
  2018-10-12  8:07 ` Eli Zaretskii
@ 2018-10-12 12:02 ` Michael Albinus
  2018-10-12 12:43   ` Eli Zaretskii
  2018-10-14 15:17   ` Basil L. Contovounesios
  1 sibling, 2 replies; 22+ messages in thread
From: Michael Albinus @ 2018-10-12 12:02 UTC (permalink / raw)
  To: Basil L. Contovounesios; +Cc: 33018

"Basil L. Contovounesios" <contovob@tcd.ie> writes:

Hi Basil,

> I attach a sample program test.el whose central function, test-slave,
> invokes wget asynchronously before waiting for the process to exit.
>
> The issue I'm facing is that running test-slave twice in succession,
> each time in a new thread, causes accept-process-output to hang with no
> output (unless a timeout argument is given, in which case the function
> returns nil) the second time around.

If you want a process to communicate in a given thread, you must call
`set-process-thread'. See the elisp manual.

> Thanks,

Best regards, Michael.





^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output
  2018-10-12 12:02 ` Michael Albinus
@ 2018-10-12 12:43   ` Eli Zaretskii
  2018-10-12 12:49     ` Michael Albinus
  2018-10-14 15:17   ` Basil L. Contovounesios
  1 sibling, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2018-10-12 12:43 UTC (permalink / raw)
  To: Michael Albinus; +Cc: contovob, 33018

> From: Michael Albinus <michael.albinus@gmx.de>
> Date: Fri, 12 Oct 2018 14:02:46 +0200
> Cc: 33018@debbugs.gnu.org
> 
> If you want a process to communicate in a given thread, you must call
> `set-process-thread'. See the elisp manual.

But the default is that the process is locked to the thread that
created it, so it sounds like this should have just worked (if that is
the problem).





^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output
  2018-10-12 12:43   ` Eli Zaretskii
@ 2018-10-12 12:49     ` Michael Albinus
  0 siblings, 0 replies; 22+ messages in thread
From: Michael Albinus @ 2018-10-12 12:49 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: contovob, 33018

Eli Zaretskii <eliz@gnu.org> writes:

>> If you want a process to communicate in a given thread, you must call
>> `set-process-thread'. See the elisp manual.
>
> But the default is that the process is locked to the thread that
> created it, so it sounds like this should have just worked (if that is
> the problem).

I'm not sure. In the branch feature/tramp-thread-safe there were also
mysterious blockings in accept-process-output, until I've applied
set-process-thread explicitly.

At least it is worth a try.

Best regards, Michael.





^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output
  2018-10-12  8:07 ` Eli Zaretskii
@ 2018-10-14 15:00   ` Basil L. Contovounesios
  2018-10-14 15:17     ` Eli Zaretskii
  0 siblings, 1 reply; 22+ messages in thread
From: Basil L. Contovounesios @ 2018-10-14 15:00 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 33018

Eli Zaretskii <eliz@gnu.org> writes:

>> From: "Basil L. Contovounesios" <contovob@tcd.ie>
>> Date: Thu, 11 Oct 2018 15:57:50 +0100
>> 
>> I attach a sample program test.el whose central function, test-slave,
>> invokes wget asynchronously before waiting for the process to exit.
>> 
>> The issue I'm facing is that running test-slave twice in succession,
>> each time in a new thread, causes accept-process-output to hang with no
>> output (unless a timeout argument is given, in which case the function
>> returns nil) the second time around.
>
> When the hang happens, is there any wget process still alive, or did
> they all exit?  Please use OS tools to find that out, don't rely on
> what Emacs thinks.

When the hang happens, the wget process launched by the waiting thread
is still alive but asleep (idle), as reported by ps and top.

-- 
Basil





^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output
  2018-10-12 12:02 ` Michael Albinus
  2018-10-12 12:43   ` Eli Zaretskii
@ 2018-10-14 15:17   ` Basil L. Contovounesios
  2018-10-15  8:02     ` Michael Albinus
  1 sibling, 1 reply; 22+ messages in thread
From: Basil L. Contovounesios @ 2018-10-14 15:17 UTC (permalink / raw)
  To: Michael Albinus; +Cc: 33018

[-- Attachment #1: Type: text/plain, Size: 775 bytes --]

Michael Albinus <michael.albinus@gmx.de> writes:

> "Basil L. Contovounesios" <contovob@tcd.ie> writes:
>
>> I attach a sample program test.el whose central function, test-slave,
>> invokes wget asynchronously before waiting for the process to exit.
>>
>> The issue I'm facing is that running test-slave twice in succession,
>> each time in a new thread, causes accept-process-output to hang with no
>> output (unless a timeout argument is given, in which case the function
>> returns nil) the second time around.
>
> If you want a process to communicate in a given thread, you must call
> `set-process-thread'. See the elisp manual.

Thanks, this is the first thing I tried when earlier experiments started
to hang.  I tried both the following redundant but explicit call:


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: test.diff --]
[-- Type: text/x-diff, Size: 616 bytes --]

diff -u --label /tmp/test.el --label \#\<buffer\ /tmp/test.el\> /tmp/test.el /tmp/buffer-content-hY0EDf
--- /tmp/test.el
+++ #<buffer /tmp/test.el>
@@ -24,6 +24,7 @@
                :command '("wget" "-qO-" "https://en.wikipedia.org/wiki/Emacs")
                :connection-type 'pipe
                :sentinel #'test-sentinel)))
+    (set-process-thread proc (current-thread))
     (while (eq (process-status proc) 'run)
       (test-debug proc 'accept-output (accept-process-output proc 5)))
     (test-debug proc 'exit (process-status proc) (process-exit-status proc))))

Diff finished.  Sun Oct 14 16:04:48 2018

[-- Attachment #3: Type: text/plain, Size: 295 bytes --]


as well as replacing (current-thread) with nil, to unlock the process.

Adding test-debug calls before and after accept-process-output revealed
nothing out of the ordinary, and explicitly un/locking the process
didn't fix the hang.  Should I be calling set-process-thread elsewhere?

-- 
Basil

^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output
  2018-10-14 15:00   ` Basil L. Contovounesios
@ 2018-10-14 15:17     ` Eli Zaretskii
  2018-10-14 15:36       ` Basil L. Contovounesios
  0 siblings, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2018-10-14 15:17 UTC (permalink / raw)
  To: Basil L. Contovounesios; +Cc: 33018

> From: "Basil L. Contovounesios" <contovob@tcd.ie>
> Cc: <33018@debbugs.gnu.org>
> Date: Sun, 14 Oct 2018 16:00:56 +0100
> 
> > When the hang happens, is there any wget process still alive, or did
> > they all exit?  Please use OS tools to find that out, don't rely on
> > what Emacs thinks.
> 
> When the hang happens, the wget process launched by the waiting thread
> is still alive but asleep (idle), as reported by ps and top.

Any idea why is that?

And if this is the situation, doesn't it explain why
accept-process-output times out?





^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output
  2018-10-14 15:17     ` Eli Zaretskii
@ 2018-10-14 15:36       ` Basil L. Contovounesios
  0 siblings, 0 replies; 22+ messages in thread
From: Basil L. Contovounesios @ 2018-10-14 15:36 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 33018

Eli Zaretskii <eliz@gnu.org> writes:

>> From: "Basil L. Contovounesios" <contovob@tcd.ie>
>> Cc: <33018@debbugs.gnu.org>
>> Date: Sun, 14 Oct 2018 16:00:56 +0100
>> 
>> > When the hang happens, is there any wget process still alive, or did
>> > they all exit?  Please use OS tools to find that out, don't rely on
>> > what Emacs thinks.
>> 
>> When the hang happens, the wget process launched by the waiting thread
>> is still alive but asleep (idle), as reported by ps and top.
>
> Any idea why is that?

No idea, sorry.  I haven't yet found the time to look into this deeper.

> And if this is the situation, doesn't it explain why
> accept-process-output times out?

Possibly, but the question is why do we enter either of these situations
(process sleeping and accept-process-output timing out) exclusively when
using Emacs threads, no?

By the way, the problem isn't specific to wget; the same thing happens
with curl.  I also find it strange that the first thread doesn't suffer
from the same problem as the second thread, especially given each thread
is created only after the last thread exited.

-- 
Basil





^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output
  2018-10-14 15:17   ` Basil L. Contovounesios
@ 2018-10-15  8:02     ` Michael Albinus
  2018-10-16  1:15       ` Basil L. Contovounesios
  0 siblings, 1 reply; 22+ messages in thread
From: Michael Albinus @ 2018-10-15  8:02 UTC (permalink / raw)
  To: Basil L. Contovounesios; +Cc: 33018

[-- Attachment #1: Type: text/plain, Size: 862 bytes --]

"Basil L. Contovounesios" <contovob@tcd.ie> writes:

Hi Basil,

>> If you want a process to communicate in a given thread, you must call
>> `set-process-thread'. See the elisp manual.
>
> Thanks, this is the first thing I tried when earlier experiments started
> to hang.  I tried both the following redundant but explicit call:

Well, I've played with your example. As Eli said, `set-process-thread'
is not needed here.

With your original example, I could reproduce the problem, However, if I
call

emacs -l test.el -f test-threads

the problem does NOT happen. My .emacs is quite long, so I didn't bisect
in order to find out what makes the difference.

I have changed your example a little bit wrt `thread-join', see
appended. This version runs w/o any problem even if emacs is called with
-Q. Maybe this helps you to debug further.

Best regards, Michael.


[-- Attachment #2: Type: text/plain, Size: 1919 bytes --]

;;; test.el --- test -*- lexical-binding: t -*-

;;; Code:

(defun test-debug (proc &rest args)
  "Print `current-thread' and ARGS in timestamped `message'."
  (message
   "[%s] %s"
   (format-time-string "%T:%3N")
   (mapconcat
    #'prin1-to-string `(,(current-thread) ,(process-thread proc) ,@args) " ")))

(defun test-sentinel (proc msg)
  "Kill buffer of PROC when it exits."
  (test-debug proc 'sentinel (process-status proc) (substring msg 0 -1))
  (and (not (eq (process-status proc) 'run))
       (buffer-live-p (process-buffer proc))
       (kill-buffer   (process-buffer proc))))

(defun test-slave ()
  "Create an asynchronous process and wait for it to exit."
  (let ((proc (make-process
               :name "test"
               :buffer (generate-new-buffer " *test*")
               :command '("wget" "-qO-" "https://en.wikipedia.org/wiki/Emacs")
               :connection-type 'pipe
               :sentinel #'test-sentinel)))
    (while (eq (process-status proc) 'run)
      (test-debug proc 'accept-output (accept-process-output proc 5)))
    (test-debug proc 'exit (process-status proc) (process-exit-status proc))))

(defun test-master (async)
  "Successively run `test-slave'.
With non-nil ASYNC, use a new thread for each invocation."
    (if async
	(progn
	  (dotimes (i 5) (make-thread #'test-slave (format "test-%d" i)))
	  (while (cdr (all-threads) (thread-join (cadr (all-threads))))))
      (dotimes (i 5)
	(test-slave))))

(defun test-job (async)
  "Successively run `test-slave'.
Like `test-master', but avoid blocking interactive sessions by
running `test-master' in its own thread."
  (if noninteractive
      (test-master async)
    (make-thread (apply-partially #'test-master async) "test-master")))

(defun test-threads ()
  "Run `test-master' with threads."
  (test-job t))

(defun test-no-threads ()
  "Run `test-master' without threads."
  (test-job nil))

;;; test.el ends here

^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output
  2018-10-15  8:02     ` Michael Albinus
@ 2018-10-16  1:15       ` Basil L. Contovounesios
  2018-10-16 13:54         ` Michael Albinus
  2018-10-16 14:58         ` Eli Zaretskii
  0 siblings, 2 replies; 22+ messages in thread
From: Basil L. Contovounesios @ 2018-10-16  1:15 UTC (permalink / raw)
  To: Michael Albinus; +Cc: 33018

[-- Attachment #1: test.el --]
[-- Type: application/emacs-lisp, Size: 2012 bytes --]

[-- Attachment #2: Type: text/plain, Size: 912 bytes --]


Michael Albinus <michael.albinus@gmx.de> writes:

> Well, I've played with your example. As Eli said, `set-process-thread'
> is not needed here.
>
> With your original example, I could reproduce the problem, However, if I
> call
>
> emacs -l test.el -f test-threads
>
> the problem does NOT happen. My .emacs is quite long, so I didn't bisect
> in order to find out what makes the difference.
>
> I have changed your example a little bit wrt `thread-join', see
> appended. This version runs w/o any problem even if emacs is called with
> -Q. Maybe this helps you to debug further.

Thanks, creating all threads before waiting for any of them to exit
indeed does not suffer from the same hang.  Doing this twice (see
attached update), however, still hangs.

There's something about going through a complete create-join cycle more
than once within a non-main-thread which is triggering this behaviour.

-- 
Basil

^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output
  2018-10-16  1:15       ` Basil L. Contovounesios
@ 2018-10-16 13:54         ` Michael Albinus
  2018-10-16 14:55           ` Basil L. Contovounesios
  2018-10-16 14:58         ` Eli Zaretskii
  1 sibling, 1 reply; 22+ messages in thread
From: Michael Albinus @ 2018-10-16 13:54 UTC (permalink / raw)
  To: Basil L. Contovounesios; +Cc: 33018

[-- Attachment #1: Type: text/plain, Size: 518 bytes --]

"Basil L. Contovounesios" <contovob@tcd.ie> writes:

> There's something about going through a complete create-join cycle more
> than once within a non-main-thread which is triggering this behaviour.

I'm not sure that it is related to threads. It looks, like some of your
processes do not exit properly, and then thread-join is blocked.

I've modified your example, again. It runs perfectly. And during its
work, you could call "M-x list-threads" and see how the threads are
created and die.

Best regards, Michael.


[-- Attachment #2: Type: text/plain, Size: 1241 bytes --]

diff -u /home/albinus/tmp/test.el.\~1\~ /home/albinus/tmp/test.el
--- /home/albinus/tmp/test.el.~1~	2018-10-16 14:46:47.000000000 +0200
+++ /home/albinus/tmp/test.el	2018-10-16 15:47:05.581276541 +0200
@@ -19,9 +19,9 @@
 (defun test-slave ()
   "Create an asynchronous process and wait for it to exit."
   (let ((proc (make-process
-               :name "test"
+               :name (thread-name (current-thread))
                :buffer (generate-new-buffer " *test*")
-               :command '("wget" "-qO-" "https://en.wikipedia.org/wiki/Emacs")
+               :command `("echo" ,(thread-name (current-thread)))
                :connection-type 'pipe
                :sentinel #'test-sentinel)))
     (while (eq (process-status proc) 'run)
@@ -32,10 +32,10 @@
   "Successively run `test-slave'.
 With non-nil ASYNC, use a new thread for each invocation."
   (if async
-      (dotimes (_ 2)
+      (dotimes (i 100)
         (let (threads)
-          (dotimes (i 2)
-            (push (make-thread #'test-slave (format "test-%d" i)) threads))
+          (dotimes (j 100)
+            (push (make-thread #'test-slave (format "test-%d-%d" i j)) threads))
           (mapc #'thread-join threads)))
     (dotimes (_ 2)
       (test-slave))))

^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output
  2018-10-16 13:54         ` Michael Albinus
@ 2018-10-16 14:55           ` Basil L. Contovounesios
  0 siblings, 0 replies; 22+ messages in thread
From: Basil L. Contovounesios @ 2018-10-16 14:55 UTC (permalink / raw)
  To: Michael Albinus; +Cc: 33018

Michael Albinus <michael.albinus@gmx.de> writes:

> "Basil L. Contovounesios" <contovob@tcd.ie> writes:
>
>> There's something about going through a complete create-join cycle more
>> than once within a non-main-thread which is triggering this behaviour.
>
> I'm not sure that it is related to threads.

It has to be, because there is never an issue when I run the same
asynchronous wget processes without threads, and with threads the hang
reliably occurs 100% of the time.

> It looks, like some of your processes do not exit properly, and then
> thread-join is blocked.

Indeed, but there is something about the interaction of Emacs threads
and subprocesses which is causing unsuccessful process termination.

Note that I am not ruling out pilot error; I simply haven't debugged
this issue any deeper yet.  The fact that no-one has yet pointed out any
obvious blunders on my part gives me more confidence that there is
indeed some ghost in the wire.

> I've modified your example, again. It runs perfectly.

Indeed, there are many subprocess-within-a-thread examples which don't
suffer from a hang, e.g. by using a different URL.  I would like to get
to the bottom of why network programs like wget/curl in particular
eventually hang, though.

Thanks,

-- 
Basil





^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output
  2018-10-16  1:15       ` Basil L. Contovounesios
  2018-10-16 13:54         ` Michael Albinus
@ 2018-10-16 14:58         ` Eli Zaretskii
  2018-10-17 17:37           ` Basil L. Contovounesios
  1 sibling, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2018-10-16 14:58 UTC (permalink / raw)
  To: Basil L. Contovounesios; +Cc: michael.albinus, 33018

> From: "Basil L. Contovounesios" <contovob@tcd.ie>
> Date: Tue, 16 Oct 2018 02:15:27 +0100
> Cc: 33018@debbugs.gnu.org
> 
> Thanks, creating all threads before waiting for any of them to exit
> indeed does not suffer from the same hang.  Doing this twice (see
> attached update), however, still hangs.
> 
> There's something about going through a complete create-join cycle more
> than once within a non-main-thread which is triggering this behaviour.

Can you attach a debugger to the wget process that's stuck, and see
where it is stuck?  You will probably need to rebuild wget with debug
info, or install one from your package repository (if they offer
such).  This could give us hints for where to look further.

Thanks.





^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output
  2018-10-16 14:58         ` Eli Zaretskii
@ 2018-10-17 17:37           ` Basil L. Contovounesios
  2018-10-17 17:56             ` Eli Zaretskii
  0 siblings, 1 reply; 22+ messages in thread
From: Basil L. Contovounesios @ 2018-10-17 17:37 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: michael.albinus, 33018

[-- Attachment #1: gdb.txt --]
[-- Type: text/plain, Size: 3050 bytes --]

#0  0x00007ffff766a2a4 in __GI___libc_write (fd=1, buf=0x5555559c1b30, nbytes=4096)
    at ../sysdeps/unix/sysv/linux/write.c:27
#1  0x00007ffff75fb56d in _IO_new_file_write (f=0x7ffff7739760 <_IO_2_1_stdout_>, 
    data=0x5555559c1b30, n=4096) at fileops.c:1203
#2  0x00007ffff75fa88f in new_do_write (fp=0x7ffff7739760 <_IO_2_1_stdout_>, 
    data=0x5555559c1b30 "pan></span></h2>\n<p>Emacs is primarily a <a href=\"/wiki/Text_editor\" title=\"Text editor\">text editor</a> and is designed for manipulating pieces of text, although it is capable of formatting and print"..., to_do=to_do@entry=4096) at fileops.c:457
#3  0x00007ffff75fc6f9 in _IO_new_do_write (fp=<optimized out>, data=<optimized out>, 
    to_do=4096) at fileops.c:433
#4  0x00007ffff75fa6d8 in _IO_new_file_sync (fp=0x7ffff7739760 <_IO_2_1_stdout_>)
    at fileops.c:813
#5  0x00007ffff75ef6ed in __GI__IO_fflush (fp=0x7ffff7739760 <_IO_2_1_stdout_>) at iofflush.c:40
#6  0x00005555555918bb in write_data (out=0x7ffff7739760 <_IO_2_1_stdout_>, out2=0x0, 
    buf=0x5555559a9590 "pan></span></h2>\n<p>Emacs is primarily a <a href=\"/wiki/Text_editor\" title=\"Text editor\">text editor</a> and is designed for manipulating pieces of text, although it is capable of formatting and print"..., bufsize=4096, skip=0x7fffffffd0e8, 
    written=0x7fffffffd0e0) at retr.c:207
#7  0x000055555559204f in fd_read_body (downloaded_filename=0x5555555e18a0 "-", fd=4, 
    out=0x7ffff7739760 <_IO_2_1_stdout_>, toread=198224, startpos=0, qtyread=0x7fffffffda20, 
    qtywritten=0x7fffffffd9d0, elapsed=0x7fffffffda28, flags=1, out2=0x0) at retr.c:498
#8  0x000055555557d09d in read_response_body (hs=0x7fffffffd9d0, sock=4, 
    fp=0x7ffff7739760 <_IO_2_1_stdout_>, contlen=198224, contrange=0, 
    chunked_transfer_encoding=false, url=0x5555555ff920 "https://en.wikipedia.org/wiki/Emacs", 
    warc_timestamp_str=0x7fffffffd470 "\f\335\377\377\377\177", 
    warc_request_uuid=0x7fffffffd440 ",", warc_ip=0x0, type=0x5555559c8fc0 "text/html", 
    statcode=200, 
    head=0x5555559b35d0 "HTTP/1.1 200 OK\r\nDate: Wed, 17 Oct 2018 17:29:35 GMT\r\nContent-Type: text/html; charset=UTF-8\r\nContent-Length: 198224\r\nConnection: keep-alive\r\nServer: mw1325.eqiad.wmnet\r\nVary: Accept-Encoding,Cookie,A"...) at http.c:1717
#9  0x0000555555581b06 in gethttp (u=0x5555555ff8b0, original_url=0x5555555ff8b0, 
    hs=0x7fffffffd9d0, dt=0x7fffffffdd0c, proxy=0x0, iri=0x5555555ff880, count=1) at http.c:4129
#10 0x0000555555582239 in http_loop (u=0x5555555ff8b0, original_url=0x5555555ff8b0, 
    newloc=0x7fffffffdb68, local_file=0x7fffffffdb58, referer=0x0, dt=0x7fffffffdd0c, 
    proxy=0x0, iri=0x5555555ff880) at http.c:4363
#11 0x0000555555592ca4 in retrieve_url (orig_parsed=0x5555555ff8b0, 
    origurl=0x5555555ff490 "https://en.wikipedia.org/wiki/Emacs", file=0x7fffffffdd18, 
    newloc=0x7fffffffdd10, refurl=0x0, dt=0x7fffffffdd0c, recursive=false, iri=0x5555555ff880, 
    register_status=true) at retr.c:957
#12 0x000055555558b33e in main (argc=3, argv=0x7fffffffdef8) at main.c:2158

[-- Attachment #2: Type: text/plain, Size: 789 bytes --]


Eli Zaretskii <eliz@gnu.org> writes:

> Can you attach a debugger to the wget process that's stuck, and see
> where it is stuck?  You will probably need to rebuild wget with debug
> info, or install one from your package repository (if they offer
> such).  This could give us hints for where to look further.

I did the following:

0. Build wget 1.19.5 from Debian's repositories:
   apt-get build-dep wget
   apt-get source wget
   configure CC='ccache gcc' CFLAGS='-O0 -g3 -ggdb -gdwarf-4 -pipe'
             --config-cache --enable-assert --with-gnu-ld
   make
1. Substitute resulting wget file name in original test.el program
2. emacs26 -Q -l test.el -f test-threads
3. gdb -p <pid of stuck wget>
4. set logging on
5. bt

I attach the resulting gdb.txt log file.

Thanks,

-- 
Basil

^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output
  2018-10-17 17:37           ` Basil L. Contovounesios
@ 2018-10-17 17:56             ` Eli Zaretskii
  2018-10-17 18:05               ` Basil L. Contovounesios
  0 siblings, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2018-10-17 17:56 UTC (permalink / raw)
  To: Basil L. Contovounesios; +Cc: michael.albinus, 33018

> From: "Basil L. Contovounesios" <contovob@tcd.ie>
> Cc: <michael.albinus@gmx.de>,  <33018@debbugs.gnu.org>
> Date: Wed, 17 Oct 2018 18:37:00 +0100
> 
> #0  0x00007ffff766a2a4 in __GI___libc_write (fd=1, buf=0x5555559c1b30, nbytes=4096)
>     at ../sysdeps/unix/sysv/linux/write.c:27
> #1  0x00007ffff75fb56d in _IO_new_file_write (f=0x7ffff7739760 <_IO_2_1_stdout_>, 
>     data=0x5555559c1b30, n=4096) at fileops.c:1203
> #2  0x00007ffff75fa88f in new_do_write (fp=0x7ffff7739760 <_IO_2_1_stdout_>, 
>     data=0x5555559c1b30 "pan></span></h2>\n<p>Emacs is primarily a <a href=\"/wiki/Text_editor\" title=\"Text editor\">text editor</a> and is designed for manipulating pieces of text, although it is capable of formatting and print"..., to_do=to_do@entry=4096) at fileops.c:457
> #3  0x00007ffff75fc6f9 in _IO_new_do_write (fp=<optimized out>, data=<optimized out>, 
>     to_do=4096) at fileops.c:433
> #4  0x00007ffff75fa6d8 in _IO_new_file_sync (fp=0x7ffff7739760 <_IO_2_1_stdout_>)
>     at fileops.c:813
> #5  0x00007ffff75ef6ed in __GI__IO_fflush (fp=0x7ffff7739760 <_IO_2_1_stdout_>) at iofflush.c:40
> #6  0x00005555555918bb in write_data (out=0x7ffff7739760 <_IO_2_1_stdout_>, out2=0x0, 
>     buf=0x5555559a9590 "pan></span></h2>\n<p>Emacs is primarily a <a href=\"/wiki/Text_editor\" title=\"Text editor\">text editor</a> and is designed for manipulating pieces of text, although it is capable of formatting and print"..., bufsize=4096, skip=0x7fffffffd0e8, 
>     written=0x7fffffffd0e0) at retr.c:207
> #7  0x000055555559204f in fd_read_body (downloaded_filename=0x5555555e18a0 "-", fd=4, 
>     out=0x7ffff7739760 <_IO_2_1_stdout_>, toread=198224, startpos=0, qtyread=0x7fffffffda20, 
>     qtywritten=0x7fffffffd9d0, elapsed=0x7fffffffda28, flags=1, out2=0x0) at retr.c:498

Looks like the buffer of the pipe through which Emacs reads the stuff
is full, and wget waits for some space there?





^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output
  2018-10-17 17:56             ` Eli Zaretskii
@ 2018-10-17 18:05               ` Basil L. Contovounesios
  2018-10-17 18:20                 ` Eli Zaretskii
  0 siblings, 1 reply; 22+ messages in thread
From: Basil L. Contovounesios @ 2018-10-17 18:05 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: michael.albinus, 33018

Eli Zaretskii <eliz@gnu.org> writes:

>> From: "Basil L. Contovounesios" <contovob@tcd.ie>
>> Cc: <michael.albinus@gmx.de>,  <33018@debbugs.gnu.org>
>> Date: Wed, 17 Oct 2018 18:37:00 +0100
>> 
>> #0  0x00007ffff766a2a4 in __GI___libc_write (fd=1, buf=0x5555559c1b30, nbytes=4096)
>>     at ../sysdeps/unix/sysv/linux/write.c:27
>> #1  0x00007ffff75fb56d in _IO_new_file_write (f=0x7ffff7739760 <_IO_2_1_stdout_>, 
>>     data=0x5555559c1b30, n=4096) at fileops.c:1203
>> #2  0x00007ffff75fa88f in new_do_write (fp=0x7ffff7739760 <_IO_2_1_stdout_>, 
>>     data=0x5555559c1b30 "pan></span></h2>\n<p>Emacs is primarily a <a
>> href=\"/wiki/Text_editor\" title=\"Text editor\">text editor</a> and is
>> designed for manipulating pieces of text, although it is capable of formatting
>> and print"..., to_do=to_do@entry=4096) at fileops.c:457
>> #3  0x00007ffff75fc6f9 in _IO_new_do_write (fp=<optimized out>, data=<optimized out>, 
>>     to_do=4096) at fileops.c:433
>> #4  0x00007ffff75fa6d8 in _IO_new_file_sync (fp=0x7ffff7739760 <_IO_2_1_stdout_>)
>>     at fileops.c:813
>> #5 0x00007ffff75ef6ed in __GI__IO_fflush (fp=0x7ffff7739760 <_IO_2_1_stdout_>)
>> at iofflush.c:40
>> #6  0x00005555555918bb in write_data (out=0x7ffff7739760 <_IO_2_1_stdout_>, out2=0x0, 
>>     buf=0x5555559a9590 "pan></span></h2>\n<p>Emacs is primarily a <a
>> href=\"/wiki/Text_editor\" title=\"Text editor\">text editor</a> and is
>> designed for manipulating pieces of text, although it is capable of formatting
>> and print"..., bufsize=4096, skip=0x7fffffffd0e8,
>>     written=0x7fffffffd0e0) at retr.c:207
>> #7  0x000055555559204f in fd_read_body (downloaded_filename=0x5555555e18a0 "-", fd=4, 
>>     out=0x7ffff7739760 <_IO_2_1_stdout_>, toread=198224, startpos=0, qtyread=0x7fffffffda20, 
>>     qtywritten=0x7fffffffd9d0, elapsed=0x7fffffffda28, flags=1, out2=0x0) at retr.c:498
>
> Looks like the buffer of the pipe through which Emacs reads the stuff
> is full, and wget waits for some space there?

Would that imply that different threads/processes are (re)using the same
buffer/pipe?

FWIW, strace -p <pip of stuck emacs> gives:

pselect6(14, [6 7], [], NULL, {tv_sec=99975, tv_nsec=320947003}, {NULL, 8}

and strace -p <pip of stuck wget> gives:

write(1, ">]</span></span></h2>\n<p>Emacs i"..., 4096

The former reminded me of bug#24201: https://debbugs.gnu.org/24201

-- 
Basil





^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output
  2018-10-17 18:05               ` Basil L. Contovounesios
@ 2018-10-17 18:20                 ` Eli Zaretskii
  2018-10-17 18:25                   ` Eli Zaretskii
  2018-10-17 20:02                   ` Basil L. Contovounesios
  0 siblings, 2 replies; 22+ messages in thread
From: Eli Zaretskii @ 2018-10-17 18:20 UTC (permalink / raw)
  To: Basil L. Contovounesios; +Cc: michael.albinus, 33018

> From: "Basil L. Contovounesios" <contovob@tcd.ie>
> Cc: <michael.albinus@gmx.de>,  <33018@debbugs.gnu.org>
> Date: Wed, 17 Oct 2018 19:05:59 +0100
> 
> > Looks like the buffer of the pipe through which Emacs reads the stuff
> > is full, and wget waits for some space there?
> 
> Would that imply that different threads/processes are (re)using the same
> buffer/pipe?

Could be, but it's more likely that Emacs simply doesn't read the
output from wget.

> pselect6(14, [6 7], [], NULL, {tv_sec=99975, tv_nsec=320947003}, {NULL, 8}
> 
> and strace -p <pip of stuck wget> gives:
> 
> write(1, ">]</span></span></h2>\n<p>Emacs i"..., 4096
> 
> The former reminded me of bug#24201: https://debbugs.gnu.org/24201

I could be wrong, but it doesn't look similar to me.  In that bug, the
CPU was pegged, whereas you said that CPU is idle.





^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output
  2018-10-17 18:20                 ` Eli Zaretskii
@ 2018-10-17 18:25                   ` Eli Zaretskii
  2018-10-17 20:46                     ` Basil L. Contovounesios
  2018-10-17 20:02                   ` Basil L. Contovounesios
  1 sibling, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2018-10-17 18:25 UTC (permalink / raw)
  To: contovob; +Cc: michael.albinus, 33018

> Date: Wed, 17 Oct 2018 21:20:22 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: michael.albinus@gmx.de, 33018@debbugs.gnu.org
> 
> > From: "Basil L. Contovounesios" <contovob@tcd.ie>
> > Cc: <michael.albinus@gmx.de>,  <33018@debbugs.gnu.org>
> > Date: Wed, 17 Oct 2018 19:05:59 +0100
> > 
> > > Looks like the buffer of the pipe through which Emacs reads the stuff
> > > is full, and wget waits for some space there?
> > 
> > Would that imply that different threads/processes are (re)using the same
> > buffer/pipe?
> 
> Could be, but it's more likely that Emacs simply doesn't read the
> output from wget.

I think the relevant code should be instrumented to show which thread
waits for what process(es).

Btw, are you sure this is not a bug in your program?  Michael caused
your program to work twice by simple changes, AFAIU, no?





^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output
  2018-10-17 18:20                 ` Eli Zaretskii
  2018-10-17 18:25                   ` Eli Zaretskii
@ 2018-10-17 20:02                   ` Basil L. Contovounesios
  1 sibling, 0 replies; 22+ messages in thread
From: Basil L. Contovounesios @ 2018-10-17 20:02 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: michael.albinus, 33018

Eli Zaretskii <eliz@gnu.org> writes:

>> From: "Basil L. Contovounesios" <contovob@tcd.ie>
>> Cc: <michael.albinus@gmx.de>,  <33018@debbugs.gnu.org>
>> Date: Wed, 17 Oct 2018 19:05:59 +0100
>> 
>> pselect6(14, [6 7], [], NULL, {tv_sec=99975, tv_nsec=320947003}, {NULL, 8}
>> 
>> and strace -p <pip of stuck wget> gives:
>> 
>> write(1, ">]</span></span></h2>\n<p>Emacs i"..., 4096
>> 
>> The former reminded me of bug#24201: https://debbugs.gnu.org/24201
>
> I could be wrong, but it doesn't look similar to me.  In that bug, the
> CPU was pegged, whereas you said that CPU is idle.

Right, I was only reminded of that bug because of the common
accept-process-output/pselect hang in the context of a network-related
process.  I'm not suggesting the underlying cause is the same.

-- 
Basil





^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output
  2018-10-17 18:25                   ` Eli Zaretskii
@ 2018-10-17 20:46                     ` Basil L. Contovounesios
  2018-10-20  8:34                       ` Eli Zaretskii
  0 siblings, 1 reply; 22+ messages in thread
From: Basil L. Contovounesios @ 2018-10-17 20:46 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: michael.albinus, 33018

Eli Zaretskii <eliz@gnu.org> writes:

>> Date: Wed, 17 Oct 2018 21:20:22 +0300
>> From: Eli Zaretskii <eliz@gnu.org>
>> Cc: michael.albinus@gmx.de, 33018@debbugs.gnu.org
>> 
>> > From: "Basil L. Contovounesios" <contovob@tcd.ie>
>> > Cc: <michael.albinus@gmx.de>,  <33018@debbugs.gnu.org>
>> > Date: Wed, 17 Oct 2018 19:05:59 +0100
>> > 
>> > > Looks like the buffer of the pipe through which Emacs reads the stuff
>> > > is full, and wget waits for some space there?
>> > 
>> > Would that imply that different threads/processes are (re)using the same
>> > buffer/pipe?
>> 
>> Could be, but it's more likely that Emacs simply doesn't read the
>> output from wget.
>
> I think the relevant code should be instrumented to show which thread
> waits for what process(es).

Each thread launches a single wget process which it then waits for
before dying, and current-thread is always eq to that thread around
calls to accept-process-output.  Or are you talking about some other
type of thread?

Either way, I'll report back when I've had a deeper look into what Emacs
is doing, unless someone beats me to it.

> Btw, are you sure this is not a bug in your program?

No, but the fact that it reliably works when Emacs is run with -batch,
and reliably hangs when run with -Q is at least somewhat intriguing.

> Michael caused your program to work twice by simple changes, AFAIU,
> no?

Michael avoided the hang by rewriting the create-join-...-create-join
sequence as create-create-...-join-join.  But if the latter is done
twice from within the same master thread, the hang still occurs.  As I
said:

> There's something about going through a complete create-join cycle more
> than once within a non-main-thread which is triggering this behaviour.

(Actually, I haven't checked whether the hang occurs when two
 create-join cycles are completed within main-thread; I was just
 specifically describing my sample program.)

In his second rewrite, Michael replaced wget with echo, which does not
suffer from any hangs.  As I said:

> Indeed, there are many subprocess-within-a-thread examples which don't
> suffer from a hang, e.g. by using a different URL.  I would like to get
> to the bottom of why network programs like wget/curl in particular
> eventually hang, though.

In other words, I don't (yet) see why my recipe shouldn't work, and I'm
curious to eventually get to the bottom of this.

I'm sorry I've been talking more than doing, but university and house
hunting will dominate my free time for the next few weeks.

Thanks,

-- 
Basil





^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#33018: 26.1.50; thread starvation with async processes and accept-process-output
  2018-10-17 20:46                     ` Basil L. Contovounesios
@ 2018-10-20  8:34                       ` Eli Zaretskii
  0 siblings, 0 replies; 22+ messages in thread
From: Eli Zaretskii @ 2018-10-20  8:34 UTC (permalink / raw)
  To: Basil L. Contovounesios; +Cc: michael.albinus, 33018

> From: "Basil L. Contovounesios" <contovob@tcd.ie>
> Cc: <michael.albinus@gmx.de>,  <33018@debbugs.gnu.org>
> Date: Wed, 17 Oct 2018 21:46:39 +0100
> 
> > I think the relevant code should be instrumented to show which thread
> > waits for what process(es).
> 
> Each thread launches a single wget process which it then waits for
> before dying, and current-thread is always eq to that thread around
> calls to accept-process-output.  Or are you talking about some other
> type of thread?

I was talking about low-level details: we set up the file-descriptor
mask passed to pselect, to tell it which descriptors to wait on.  The
call to accept-process-output is supposed to arrange for the
descriptor where the corresponding process will write to be one of
those on which the corresponding pselect will wait.  I was thinking
that perhaps we become confused and don't ask pselect called by a
thread to wait on the process which was launched by that thread.

> Either way, I'll report back when I've had a deeper look into what Emacs
> is doing, unless someone beats me to it.

Thanks.

> I'm sorry I've been talking more than doing, but university and house
> hunting will dominate my free time for the next few weeks.

No need to be sorry, we all have our lives.





^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2018-10-20  8:34 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-11 14:57 bug#33018: 26.1.50; thread starvation with async processes and accept-process-output Basil L. Contovounesios
2018-10-12  8:07 ` Eli Zaretskii
2018-10-14 15:00   ` Basil L. Contovounesios
2018-10-14 15:17     ` Eli Zaretskii
2018-10-14 15:36       ` Basil L. Contovounesios
2018-10-12 12:02 ` Michael Albinus
2018-10-12 12:43   ` Eli Zaretskii
2018-10-12 12:49     ` Michael Albinus
2018-10-14 15:17   ` Basil L. Contovounesios
2018-10-15  8:02     ` Michael Albinus
2018-10-16  1:15       ` Basil L. Contovounesios
2018-10-16 13:54         ` Michael Albinus
2018-10-16 14:55           ` Basil L. Contovounesios
2018-10-16 14:58         ` Eli Zaretskii
2018-10-17 17:37           ` Basil L. Contovounesios
2018-10-17 17:56             ` Eli Zaretskii
2018-10-17 18:05               ` Basil L. Contovounesios
2018-10-17 18:20                 ` Eli Zaretskii
2018-10-17 18:25                   ` Eli Zaretskii
2018-10-17 20:46                     ` Basil L. Contovounesios
2018-10-20  8:34                       ` Eli Zaretskii
2018-10-17 20:02                   ` Basil L. Contovounesios

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).