unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'.
       [not found] ` <20210116184949.3C17C211A5@vcs0.savannah.gnu.org>
@ 2021-01-19 15:10   ` Eli Zaretskii
  2021-01-19 18:21     ` Philipp Stephani
  0 siblings, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2021-01-19 15:10 UTC (permalink / raw)
  To: Philipp Stephani; +Cc: phst, emacs-devel

> Date: Sat, 16 Jan 2021 13:49:49 -0500 (EST)
> From: p.stephani2@gmail.com (Philipp Stephani)
> 
> branch: scratch/sigchld-fd
> commit 8f0ce42d3eb9b212424a4a25a376287ffc94a73e
> Author: Philipp Stephani <phst@google.com>
> Commit: Philipp Stephani <phst@google.com>
> 
>     Fix deadlock when receiving SIGCHLD during 'pselect'.
>     
>     If we receive and handle a SIGCHLD signal for a process while waiting
>     for that process, 'pselect' might never return.  Instead, we have to
>     explicitly 'pselect' that the process status has changed.  We do this
>     by writing to a pipe in the SIGCHLD handler and having
>     'wait_reading_process_output' select on it.
>     
>     * src/process.c (child_signal_init): New helper function to create a
>     pipe for SIGCHLD notifications.
>     (child_signal_read, child_signal_notify): New helper functions to
>     read from/write to the child signal pipe.
>     (create_process): Initialize the child signal pipe on first use.
>     (handle_child_signal): Notify waiters that a process status has
>     changed.
>     (wait_reading_process_output): Make sure that we also catch
>     SIGCHLD/process status changes.
>     
>     * test/src/process-tests.el
>     (process-tests/fd-setsize-no-crash/make-process): Remove workaround,
>     which is no longer needed.

Philipp, can you please elaborate about this changeset (which was in
the meantime merged with master)?  The comments you added mention some
kind of deadlock, but don't describe what kind and in which situations
it could happen, and without that it's hard to understand what
problem(s) it tries to solve.

In general, pselect is supposed to return with EINTR when SIGCHLD
happoens while we are inside the call to pselect, and EINTR seems to
be already handled by wait_reading_process_output.  So I wonder why we
need that additional "self-pipe" to be watched by pselect.

In addition, AFAIU this pipe should not be needed on MS-Windows, where
the pselect emulation waits on the sub-process handles together with
the other file descriptors, and so gets awakened when a process exits
or dies.  But again, without knowing the exact situations against
which this changeset tries to protect, it is hard to make a decision.

Thanks.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'.
  2021-01-19 15:10   ` scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect' Eli Zaretskii
@ 2021-01-19 18:21     ` Philipp Stephani
  2021-01-19 19:14       ` Eli Zaretskii
  0 siblings, 1 reply; 23+ messages in thread
From: Philipp Stephani @ 2021-01-19 18:21 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Philipp Stephani, Emacs developers

Am Di., 19. Jan. 2021 um 16:09 Uhr schrieb Eli Zaretskii <eliz@gnu.org>:
>
> > Date: Sat, 16 Jan 2021 13:49:49 -0500 (EST)
> > From: p.stephani2@gmail.com (Philipp Stephani)
> >
> > branch: scratch/sigchld-fd
> > commit 8f0ce42d3eb9b212424a4a25a376287ffc94a73e
> > Author: Philipp Stephani <phst@google.com>
> > Commit: Philipp Stephani <phst@google.com>
> >
> >     Fix deadlock when receiving SIGCHLD during 'pselect'.
> >
> >     If we receive and handle a SIGCHLD signal for a process while waiting
> >     for that process, 'pselect' might never return.  Instead, we have to
> >     explicitly 'pselect' that the process status has changed.  We do this
> >     by writing to a pipe in the SIGCHLD handler and having
> >     'wait_reading_process_output' select on it.
> >
> >     * src/process.c (child_signal_init): New helper function to create a
> >     pipe for SIGCHLD notifications.
> >     (child_signal_read, child_signal_notify): New helper functions to
> >     read from/write to the child signal pipe.
> >     (create_process): Initialize the child signal pipe on first use.
> >     (handle_child_signal): Notify waiters that a process status has
> >     changed.
> >     (wait_reading_process_output): Make sure that we also catch
> >     SIGCHLD/process status changes.
> >
> >     * test/src/process-tests.el
> >     (process-tests/fd-setsize-no-crash/make-process): Remove workaround,
> >     which is no longer needed.
>
> Philipp, can you please elaborate about this changeset (which was in
> the meantime merged with master)?  The comments you added mention some
> kind of deadlock, but don't describe what kind and in which situations
> it could happen, and without that it's hard to understand what
> problem(s) it tries to solve.

I had hoped that the fixed unit test and the commit description would
be clear enough. The issue appears to be that accept-process-output
frequently hangs (doesn't return) even though the process has already
finished. That's what process-tests/fd-setsize-no-crash/make-process
relies on. After some printf debugging, I concluded that
wait_reading_process_output wouldn't be notified about the process
status change in any way, and thus hang forever. A form like (sit-for
0.1) would trigger the process status change to be processed.

>
> In general, pselect is supposed to return with EINTR when SIGCHLD
> happoens while we are inside the call to pselect, and EINTR seems to
> be already handled by wait_reading_process_output.  So I wonder why we
> need that additional "self-pipe" to be watched by pselect.

Yes, I'm wondering about that as well, but it's definitely the
behavior I see. Before commiting to master, I ran the test
process-tests/fd-setsize-no-crash/make-process multiple times with and
without the commit, and the outcome was clear: without the commit
accept-process-output would frequently hang, with the commit it never
hangs.
This is pure speculation, but I could imagine multiple things going on:
- Maybe there's no guarantee that pselect actually returns EINTR on SIGCHLD.
- Maybe EINTR is returned too early, before the signal handler got the
chance to update the process status.

>
> In addition, AFAIU this pipe should not be needed on MS-Windows, where
> the pselect emulation waits on the sub-process handles together with
> the other file descriptors, and so gets awakened when a process exits
> or dies.  But again, without knowing the exact situations against
> which this changeset tries to protect, it is hard to make a decision.
>

It's definitely not needed on Windows, which has a superior mechanism
anyway (process handles are waitable objects in Windows). I opted to
create the additional pipe on Windows as well - the costs should be
small, and it keeps the code more consistent between the operating
systems.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'.
  2021-01-19 18:21     ` Philipp Stephani
@ 2021-01-19 19:14       ` Eli Zaretskii
  2021-01-19 20:22         ` Philipp Stephani
  2021-01-19 20:46         ` Andreas Schwab
  0 siblings, 2 replies; 23+ messages in thread
From: Eli Zaretskii @ 2021-01-19 19:14 UTC (permalink / raw)
  To: Philipp Stephani; +Cc: phst, emacs-devel

> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Tue, 19 Jan 2021 19:21:39 +0100
> Cc: Emacs developers <emacs-devel@gnu.org>, Philipp Stephani <phst@google.com>
> > In general, pselect is supposed to return with EINTR when SIGCHLD
> > happoens while we are inside the call to pselect, and EINTR seems to
> > be already handled by wait_reading_process_output.  So I wonder why we
> > need that additional "self-pipe" to be watched by pselect.
> 
> Yes, I'm wondering about that as well, but it's definitely the
> behavior I see. Before commiting to master, I ran the test
> process-tests/fd-setsize-no-crash/make-process multiple times with and
> without the commit, and the outcome was clear: without the commit
> accept-process-output would frequently hang, with the commit it never
> hangs.
> This is pure speculation, but I could imagine multiple things going on:
> - Maybe there's no guarantee that pselect actually returns EINTR on SIGCHLD.
> - Maybe EINTR is returned too early, before the signal handler got the
> chance to update the process status.

I'd be happier if we had some direct evidence to these effects.  I'd
also be surprised to hear that pselect doesn't return with EINTR when
SIGCHLD comes in.  It is more likely that SIGCHLD is delivered before
we call pselect, but if that is the case, we should be able to
reliably detect that, I think.

> > In addition, AFAIU this pipe should not be needed on MS-Windows, where
> > the pselect emulation waits on the sub-process handles together with
> > the other file descriptors, and so gets awakened when a process exits
> > or dies.  But again, without knowing the exact situations against
> > which this changeset tries to protect, it is hard to make a decision.
> 
> It's definitely not needed on Windows, which has a superior mechanism
> anyway (process handles are waitable objects in Windows). I opted to
> create the additional pipe on Windows as well - the costs should be
> small, and it keeps the code more consistent between the operating
> systems.

The thing is, on Windows we can only wait on up to 64 handles (unless
we complicate the code with multilevel wait, that is), so every
unnecessary descriptor we need to wait on means we can support fewer
simultaneous subprocesses.  We are already limited to just 32
subprocesses, which is quite low a number.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'.
  2021-01-19 19:14       ` Eli Zaretskii
@ 2021-01-19 20:22         ` Philipp Stephani
  2021-01-20 15:05           ` Eli Zaretskii
  2021-01-19 20:46         ` Andreas Schwab
  1 sibling, 1 reply; 23+ messages in thread
From: Philipp Stephani @ 2021-01-19 20:22 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Philipp Stephani, Emacs developers

Am Di., 19. Jan. 2021 um 20:14 Uhr schrieb Eli Zaretskii <eliz@gnu.org>:
>
> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Tue, 19 Jan 2021 19:21:39 +0100
> > Cc: Emacs developers <emacs-devel@gnu.org>, Philipp Stephani <phst@google.com>
> > > In general, pselect is supposed to return with EINTR when SIGCHLD
> > > happoens while we are inside the call to pselect, and EINTR seems to
> > > be already handled by wait_reading_process_output.  So I wonder why we
> > > need that additional "self-pipe" to be watched by pselect.
> >
> > Yes, I'm wondering about that as well, but it's definitely the
> > behavior I see. Before commiting to master, I ran the test
> > process-tests/fd-setsize-no-crash/make-process multiple times with and
> > without the commit, and the outcome was clear: without the commit
> > accept-process-output would frequently hang, with the commit it never
> > hangs.
> > This is pure speculation, but I could imagine multiple things going on:
> > - Maybe there's no guarantee that pselect actually returns EINTR on SIGCHLD.
> > - Maybe EINTR is returned too early, before the signal handler got the
> > chance to update the process status.
>
> I'd be happier if we had some direct evidence to these effects.  I'd
> also be surprised to hear that pselect doesn't return with EINTR when
> SIGCHLD comes in.  It is more likely that SIGCHLD is delivered before
> we call pselect, but if that is the case, we should be able to
> reliably detect that, I think.

So I've added a ton of logging to process.c, and the series of events
I observe (without the patch) is as follows (line numbers are
approximate due to the logging statements):

process.c:4729: Faccept_process_output: enter
process.c:5139: wait_reading_process_output: enter
process.c:5193: wait_reading_process_output: outer loop
process.c:5322: wait_reading_process_output: update_tick = 261,
process_tick = 261
process.c:5554: wait_reading_process_output: before pselect; max_desc = 1019
process.c:5601: wait_reading_process_output: after pselect: nfds = -1
process.c:5641: wait_reading_process_output: EINTR
process.c:5193: wait_reading_process_output: outer loop
process.c:5322: wait_reading_process_output: update_tick = 261,
process_tick = 261
process.c:7189: handle_child_signal: enter
process.c:7234: handle_child_signal: process test 5: change status to
0; new process_tick = 262
process.c:144: handle_child_signal: leave
process.c:5554: wait_reading_process_output: before pselect; max_desc = 1017

and then Emacs hangs.
So wait_reading_process_output indeed first receives EINTR, loops
around, and checks for the process_tick change before SIGCHLD is
handled. By the time it reruns pselect it's too late.

>
> > > In addition, AFAIU this pipe should not be needed on MS-Windows, where
> > > the pselect emulation waits on the sub-process handles together with
> > > the other file descriptors, and so gets awakened when a process exits
> > > or dies.  But again, without knowing the exact situations against
> > > which this changeset tries to protect, it is hard to make a decision.
> >
> > It's definitely not needed on Windows, which has a superior mechanism
> > anyway (process handles are waitable objects in Windows). I opted to
> > create the additional pipe on Windows as well - the costs should be
> > small, and it keeps the code more consistent between the operating
> > systems.
>
> The thing is, on Windows we can only wait on up to 64 handles (unless
> we complicate the code with multilevel wait, that is), so every
> unnecessary descriptor we need to wait on means we can support fewer
> simultaneous subprocesses.  We are already limited to just 32
> subprocesses, which is quite low a number.

OK, that's a good point, I didn't know about this limitation. I'll see
that I can remove the pipe on Windows.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'.
  2021-01-19 19:14       ` Eli Zaretskii
  2021-01-19 20:22         ` Philipp Stephani
@ 2021-01-19 20:46         ` Andreas Schwab
  2021-01-19 20:58           ` Philipp Stephani
  2021-01-20  3:37           ` Eli Zaretskii
  1 sibling, 2 replies; 23+ messages in thread
From: Andreas Schwab @ 2021-01-19 20:46 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: phst, Philipp Stephani, emacs-devel

On Jan 19 2021, Eli Zaretskii wrote:

> I'd be happier if we had some direct evidence to these effects.  I'd
> also be surprised to hear that pselect doesn't return with EINTR when
> SIGCHLD comes in.  It is more likely that SIGCHLD is delivered before
> we call pselect, but if that is the case, we should be able to
> reliably detect that, I think.

If you want reliable detection of SIGCHLD, you need to block the signal
around pselect and let pselect unblock it, atomically.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'.
  2021-01-19 20:46         ` Andreas Schwab
@ 2021-01-19 20:58           ` Philipp Stephani
  2021-01-20 18:07             ` Andreas Schwab
  2021-01-20  3:37           ` Eli Zaretskii
  1 sibling, 1 reply; 23+ messages in thread
From: Philipp Stephani @ 2021-01-19 20:58 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Eli Zaretskii, Philipp Stephani, Emacs developers

Am Di., 19. Jan. 2021 um 21:46 Uhr schrieb Andreas Schwab
<schwab@linux-m68k.org>:
>
> On Jan 19 2021, Eli Zaretskii wrote:
>
> > I'd be happier if we had some direct evidence to these effects.  I'd
> > also be surprised to hear that pselect doesn't return with EINTR when
> > SIGCHLD comes in.  It is more likely that SIGCHLD is delivered before
> > we call pselect, but if that is the case, we should be able to
> > reliably detect that, I think.
>
> If you want reliable detection of SIGCHLD, you need to block the signal
> around pselect and let pselect unblock it, atomically.

Right. It looks like we're not doing that because we want to support
systems that don't have pselect (and it's gnulib emulation can't be
race-free).



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'.
  2021-01-19 20:46         ` Andreas Schwab
  2021-01-19 20:58           ` Philipp Stephani
@ 2021-01-20  3:37           ` Eli Zaretskii
  2021-01-20  8:37             ` Andreas Schwab
  1 sibling, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2021-01-20  3:37 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: phst, p.stephani2, emacs-devel

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: Philipp Stephani <p.stephani2@gmail.com>,  phst@google.com,
>   emacs-devel@gnu.org
> Date: Tue, 19 Jan 2021 21:46:46 +0100
> 
> On Jan 19 2021, Eli Zaretskii wrote:
> 
> > I'd be happier if we had some direct evidence to these effects.  I'd
> > also be surprised to hear that pselect doesn't return with EINTR when
> > SIGCHLD comes in.  It is more likely that SIGCHLD is delivered before
> > we call pselect, but if that is the case, we should be able to
> > reliably detect that, I think.
> 
> If you want reliable detection of SIGCHLD, you need to block the signal
> around pselect and let pselect unblock it, atomically.

AFAIU, this is an issue for programs that install a SIGCHLD handler
immediately before calling pselect.  But Emacs has a SIGCHLD handler
installed at all times, so if the signal hits outside of the pselect
call, we should be able to detect that reliably.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'.
  2021-01-20  3:37           ` Eli Zaretskii
@ 2021-01-20  8:37             ` Andreas Schwab
  2021-01-20  8:52               ` Eli Zaretskii
  0 siblings, 1 reply; 23+ messages in thread
From: Andreas Schwab @ 2021-01-20  8:37 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: phst, p.stephani2, emacs-devel

On Jan 20 2021, Eli Zaretskii wrote:

> AFAIU, this is an issue for programs that install a SIGCHLD handler
> immediately before calling pselect.  But Emacs has a SIGCHLD handler
> installed at all times, so if the signal hits outside of the pselect
> call, we should be able to detect that reliably.

No, the race between checking for the signal and entering pselect still
exists.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'.
  2021-01-20  8:37             ` Andreas Schwab
@ 2021-01-20  8:52               ` Eli Zaretskii
  2021-01-20  9:18                 ` Andreas Schwab
  0 siblings, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2021-01-20  8:52 UTC (permalink / raw)
  To: emacs-devel, Andreas Schwab; +Cc: phst, p.stephani2

On January 20, 2021 10:37:31 AM GMT+02:00, Andreas Schwab <schwab@linux-m68k.org> wrote:
> On Jan 20 2021, Eli Zaretskii wrote:
> 
> > AFAIU, this is an issue for programs that install a SIGCHLD handler
> > immediately before calling pselect.  But Emacs has a SIGCHLD handler
> > installed at all times, so if the signal hits outside of the pselect
> > call, we should be able to detect that reliably.
> 
> No, the race between checking for the signal and entering pselect
> still
> exists.

Please tell more, especially what is meant by "checking for the signal".  We have a handler installed, so who is or should be checking for it?



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'.
  2021-01-20  8:52               ` Eli Zaretskii
@ 2021-01-20  9:18                 ` Andreas Schwab
  2021-01-20 10:14                   ` Eli Zaretskii
  0 siblings, 1 reply; 23+ messages in thread
From: Andreas Schwab @ 2021-01-20  9:18 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: phst, p.stephani2, emacs-devel

On Jan 20 2021, Eli Zaretskii wrote:

> Please tell more, especially what is meant by "checking for the signal".  We have a handler installed, so who is or should be checking for it?

The check that the signal occured and what to pass to pselect so it
doesn't block waiting for the condition that just occured.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'.
  2021-01-20  9:18                 ` Andreas Schwab
@ 2021-01-20 10:14                   ` Eli Zaretskii
  2021-01-20 10:17                     ` Andreas Schwab
  0 siblings, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2021-01-20 10:14 UTC (permalink / raw)
  To: emacs-devel, Andreas Schwab; +Cc: phst, p.stephani2

On January 20, 2021 11:18:19 AM GMT+02:00, Andreas Schwab <schwab@linux-m68k.org> wrote:
> On Jan 20 2021, Eli Zaretskii wrote:
> 
> > Please tell more, especially what is meant by "checking for the
> signal".  We have a handler installed, so who is or should be checking
> for it?
> 
> The check that the signal occured and what to pass to pselect so it
> doesn't block waiting for the condition that just occured.

If/when the signal arrives, the handler removes the file descriptor of the process from the set on which pselect will wait.  Isn't that sufficient to prevent us from waiting for a dead process?
If not, why not?



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'.
  2021-01-20 10:14                   ` Eli Zaretskii
@ 2021-01-20 10:17                     ` Andreas Schwab
  2021-01-20 15:30                       ` Eli Zaretskii
  0 siblings, 1 reply; 23+ messages in thread
From: Andreas Schwab @ 2021-01-20 10:17 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: phst, p.stephani2, emacs-devel

On Jan 20 2021, Eli Zaretskii wrote:

> If/when the signal arrives, the handler removes the file descriptor of the process from the set on which pselect will wait.  Isn't that sufficient to prevent us from waiting for a dead process?

Who removes it?

> If not, why not?

What happens when the signal occurs between setting up the set and
calling pselect?

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'.
  2021-01-19 20:22         ` Philipp Stephani
@ 2021-01-20 15:05           ` Eli Zaretskii
  2021-01-23 17:36             ` Philipp Stephani
  0 siblings, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2021-01-20 15:05 UTC (permalink / raw)
  To: Philipp Stephani; +Cc: phst, emacs-devel

> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Tue, 19 Jan 2021 21:22:04 +0100
> Cc: Emacs developers <emacs-devel@gnu.org>, Philipp Stephani <phst@google.com>
> process.c:4729: Faccept_process_output: enter
> process.c:5139: wait_reading_process_output: enter
> process.c:5193: wait_reading_process_output: outer loop
> process.c:5322: wait_reading_process_output: update_tick = 261,
> process_tick = 261
> process.c:5554: wait_reading_process_output: before pselect; max_desc = 1019
> process.c:5601: wait_reading_process_output: after pselect: nfds = -1
> process.c:5641: wait_reading_process_output: EINTR
> process.c:5193: wait_reading_process_output: outer loop
> process.c:5322: wait_reading_process_output: update_tick = 261,
> process_tick = 261
> process.c:7189: handle_child_signal: enter
> process.c:7234: handle_child_signal: process test 5: change status to
> 0; new process_tick = 262
> process.c:144: handle_child_signal: leave
> process.c:5554: wait_reading_process_output: before pselect; max_desc = 1017
> 
> and then Emacs hangs.
> So wait_reading_process_output indeed first receives EINTR, loops
> around, and checks for the process_tick change before SIGCHLD is
> handled. By the time it reruns pselect it's too late.

OK, thanks.  So the scenario is that the SIGCHLD handler is called
_after_ pselect returns with EINTR, and by that time we already call
pselect again, whereas the logic in wait_reading_process_output is
based on the assumption that the signal handler will be called
_before_ pselect returns?  (We should have this in the comments to the
child_signal_* functions, so that we remember why we introduced this
machinery.)

First, is this a valid behavior, or is this a bug in the OS?  I
couldn't find any place where it is documented what Posix says about
the relative timing of calling the handler vs syscall returning with
EINTR -- is the behavior you see allowed?  I do see many places that
explicitly or implicitly assume that a syscall will return with EINTR
_after_ the signal handler returns.

If this _is_ valid behavior, I wonder whether we could somehow amend
the loop and its logic to handle this case without introducing
additional file descriptors to wait.  For example, when we get EINTR,
we could add another pselect call with a short timeout, so that the
handler has time to run and update the process tick before we loop
around.  Or maybe, when we get EINTR, we should loop over the
processes and see which one(s) have exited.  Such changes would
basically leave the overall logic intact, which is IMO better for more
complex use cases that involve other Lisp threads.

Or maybe the problem here is that more than one signal is involved, so
that one of them interrupts pselect before the other (our SIGCHLD)
gets delivered and calls the SIGCHLD handler?

In any case, did you consider what happens with the self-pipe method
when more than one Lisp thread is waiting in pselect?

> > The thing is, on Windows we can only wait on up to 64 handles (unless
> > we complicate the code with multilevel wait, that is), so every
> > unnecessary descriptor we need to wait on means we can support fewer
> > simultaneous subprocesses.  We are already limited to just 32
> > subprocesses, which is quite low a number.
> 
> OK, that's a good point, I didn't know about this limitation. I'll see
> that I can remove the pipe on Windows.

Just making those child_signal_* function be no-ops on MS-Windows, and
#ifedf'ing away that call to FD_SET of child_fd, should be okay, I
think.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'.
  2021-01-20 10:17                     ` Andreas Schwab
@ 2021-01-20 15:30                       ` Eli Zaretskii
  2021-01-20 15:48                         ` Andreas Schwab
  0 siblings, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2021-01-20 15:30 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: phst, p.stephani2, emacs-devel

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: emacs-devel@gnu.org,  phst@google.com, p.stephani2@gmail.com
> Date: Wed, 20 Jan 2021 11:17:24 +0100
> 
> On Jan 20 2021, Eli Zaretskii wrote:
> 
> > If/when the signal arrives, the handler removes the file descriptor of the process from the set on which pselect will wait.  Isn't that sufficient to prevent us from waiting for a dead process?
> 
> Who removes it?

The SIGCHLD signal handler.  Or I thought it did, here:

	  /* If process has terminated, stop waiting for its output.  */
	  if (WIFSIGNALED (status) || WIFEXITED (status))
	    {
	      bool clear_desc_flag = 0;
	      p->alive = 0;
	      if (p->infd >= 0)
		clear_desc_flag = 1;

	      /* clear_desc_flag avoids a compiler bug in Microsoft C.  */
	      if (clear_desc_flag)
		delete_read_fd (p->infd);  <<<<<<<<<<<<<<<<<<<<<<<<<
	    }

But it looks like it only removes the process's file descriptor from
the set to be used for setting up arguments to pselect the _next_
iteration through the loop.

So maybe this is the problem we should solve: make the SIGCHLD handler
reset the bit of the process's output from the fd_set passed to
pselect.  Then, if SIGCHLD hits at any time before pselect is called,
it will not wait for output from that process.

> > If not, why not?
> 
> What happens when the signal occurs between setting up the set and
> calling pselect?

Yes, see above.

But it sounds like the situation shown in Philipp's log is different.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'.
  2021-01-20 15:30                       ` Eli Zaretskii
@ 2021-01-20 15:48                         ` Andreas Schwab
  2021-01-20 16:40                           ` Eli Zaretskii
  0 siblings, 1 reply; 23+ messages in thread
From: Andreas Schwab @ 2021-01-20 15:48 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: phst, p.stephani2, emacs-devel

On Jan 20 2021, Eli Zaretskii wrote:

> So maybe this is the problem we should solve: make the SIGCHLD handler
> reset the bit of the process's output from the fd_set passed to
> pselect.

You cannot do that.  That creates races all over the place.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'.
  2021-01-20 15:48                         ` Andreas Schwab
@ 2021-01-20 16:40                           ` Eli Zaretskii
  2021-01-20 16:44                             ` Andreas Schwab
  0 siblings, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2021-01-20 16:40 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: phst, p.stephani2, emacs-devel

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: emacs-devel@gnu.org,  phst@google.com,  p.stephani2@gmail.com
> Date: Wed, 20 Jan 2021 16:48:43 +0100
> 
> On Jan 20 2021, Eli Zaretskii wrote:
> 
> > So maybe this is the problem we should solve: make the SIGCHLD handler
> > reset the bit of the process's output from the fd_set passed to
> > pselect.
> 
> You cannot do that.  That creates races all over the place.

What kind of races, and where in our code? please elaborate.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'.
  2021-01-20 16:40                           ` Eli Zaretskii
@ 2021-01-20 16:44                             ` Andreas Schwab
  2021-01-20 17:03                               ` Eli Zaretskii
  0 siblings, 1 reply; 23+ messages in thread
From: Andreas Schwab @ 2021-01-20 16:44 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: phst, p.stephani2, emacs-devel

On Jan 20 2021, Eli Zaretskii wrote:

>> From: Andreas Schwab <schwab@linux-m68k.org>
>> Cc: emacs-devel@gnu.org,  phst@google.com,  p.stephani2@gmail.com
>> Date: Wed, 20 Jan 2021 16:48:43 +0100
>> 
>> On Jan 20 2021, Eli Zaretskii wrote:
>> 
>> > So maybe this is the problem we should solve: make the SIGCHLD handler
>> > reset the bit of the process's output from the fd_set passed to
>> > pselect.
>> 
>> You cannot do that.  That creates races all over the place.
>
> What kind of races, and where in our code? please elaborate.

The signal can occur anytime.  When it interrupts the setup of the
pselect call, chaos will ensure.  The only way to avoid that is to block
the signal around the call.

Doing anything non-trivial in a signal handler is bound to problems.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'.
  2021-01-20 16:44                             ` Andreas Schwab
@ 2021-01-20 17:03                               ` Eli Zaretskii
  2021-01-20 17:30                                 ` Andreas Schwab
  0 siblings, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2021-01-20 17:03 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: phst, p.stephani2, emacs-devel

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: emacs-devel@gnu.org,  phst@google.com,  p.stephani2@gmail.com
> Date: Wed, 20 Jan 2021 17:44:59 +0100
> 
> >> You cannot do that.  That creates races all over the place.
> >
> > What kind of races, and where in our code? please elaborate.
> 
> The signal can occur anytime.  When it interrupts the setup of the
> pselect call, chaos will ensure.  The only way to avoid that is to block
> the signal around the call.

What I described would be in the signal handler.

> Doing anything non-trivial in a signal handler is bound to problems.

Yes, I know.  I think resetting a bit in a variable is more trivial
than, say, writing to a pipe, something that we do now.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'.
  2021-01-20 17:03                               ` Eli Zaretskii
@ 2021-01-20 17:30                                 ` Andreas Schwab
  0 siblings, 0 replies; 23+ messages in thread
From: Andreas Schwab @ 2021-01-20 17:30 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: phst, p.stephani2, emacs-devel

On Jan 20 2021, Eli Zaretskii wrote:

> What I described would be in the signal handler.

That's exactly the problem.  Without synchronisation, it will fail
miserably.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'.
  2021-01-19 20:58           ` Philipp Stephani
@ 2021-01-20 18:07             ` Andreas Schwab
  0 siblings, 0 replies; 23+ messages in thread
From: Andreas Schwab @ 2021-01-20 18:07 UTC (permalink / raw)
  To: Philipp Stephani; +Cc: Eli Zaretskii, Philipp Stephani, Emacs developers

On Jan 19 2021, Philipp Stephani wrote:

> Right. It looks like we're not doing that because we want to support
> systems that don't have pselect (and it's gnulib emulation can't be
> race-free).

For those that don't provide a proper pselect, the situation wouldn't be
worse than today.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'.
  2021-01-20 15:05           ` Eli Zaretskii
@ 2021-01-23 17:36             ` Philipp Stephani
  2021-01-23 18:23               ` Eli Zaretskii
  0 siblings, 1 reply; 23+ messages in thread
From: Philipp Stephani @ 2021-01-23 17:36 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Philipp Stephani, Emacs developers

Am Mi., 20. Jan. 2021 um 16:05 Uhr schrieb Eli Zaretskii <eliz@gnu.org>:
>
> > > The thing is, on Windows we can only wait on up to 64 handles (unless
> > > we complicate the code with multilevel wait, that is), so every
> > > unnecessary descriptor we need to wait on means we can support fewer
> > > simultaneous subprocesses.  We are already limited to just 32
> > > subprocesses, which is quite low a number.
> >
> > OK, that's a good point, I didn't know about this limitation. I'll see
> > that I can remove the pipe on Windows.
>
> Just making those child_signal_* function be no-ops on MS-Windows, and
> #ifedf'ing away that call to FD_SET of child_fd, should be okay, I
> think.

I was about to take a stab at this, but looks like you beat me to it :-)



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'.
  2021-01-23 17:36             ` Philipp Stephani
@ 2021-01-23 18:23               ` Eli Zaretskii
  2021-01-23 18:30                 ` Philipp Stephani
  0 siblings, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2021-01-23 18:23 UTC (permalink / raw)
  To: Philipp Stephani; +Cc: phst, emacs-devel

> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Sat, 23 Jan 2021 18:36:48 +0100
> Cc: Emacs developers <emacs-devel@gnu.org>, Philipp Stephani <phst@google.com>
> > Just making those child_signal_* function be no-ops on MS-Windows, and
> > #ifedf'ing away that call to FD_SET of child_fd, should be okay, I
> > think.
> 
> I was about to take a stab at this, but looks like you beat me to it :-)

Yes, I had a few free minutes.

Thanks anyway.

P.S. What about child_signal_read, which doesn't seem to be used by
any configuration? should we delete it?



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'.
  2021-01-23 18:23               ` Eli Zaretskii
@ 2021-01-23 18:30                 ` Philipp Stephani
  0 siblings, 0 replies; 23+ messages in thread
From: Philipp Stephani @ 2021-01-23 18:30 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Philipp Stephani, Emacs developers

Am Sa., 23. Jan. 2021 um 19:23 Uhr schrieb Eli Zaretskii <eliz@gnu.org>:

>
> P.S. What about child_signal_read, which doesn't seem to be used by
> any configuration? should we delete it?

It's used in line 7205.



^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2021-01-23 18:30 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20210116184947.2105.45267@vcs0.savannah.gnu.org>
     [not found] ` <20210116184949.3C17C211A5@vcs0.savannah.gnu.org>
2021-01-19 15:10   ` scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect' Eli Zaretskii
2021-01-19 18:21     ` Philipp Stephani
2021-01-19 19:14       ` Eli Zaretskii
2021-01-19 20:22         ` Philipp Stephani
2021-01-20 15:05           ` Eli Zaretskii
2021-01-23 17:36             ` Philipp Stephani
2021-01-23 18:23               ` Eli Zaretskii
2021-01-23 18:30                 ` Philipp Stephani
2021-01-19 20:46         ` Andreas Schwab
2021-01-19 20:58           ` Philipp Stephani
2021-01-20 18:07             ` Andreas Schwab
2021-01-20  3:37           ` Eli Zaretskii
2021-01-20  8:37             ` Andreas Schwab
2021-01-20  8:52               ` Eli Zaretskii
2021-01-20  9:18                 ` Andreas Schwab
2021-01-20 10:14                   ` Eli Zaretskii
2021-01-20 10:17                     ` Andreas Schwab
2021-01-20 15:30                       ` Eli Zaretskii
2021-01-20 15:48                         ` Andreas Schwab
2021-01-20 16:40                           ` Eli Zaretskii
2021-01-20 16:44                             ` Andreas Schwab
2021-01-20 17:03                               ` Eli Zaretskii
2021-01-20 17:30                                 ` Andreas Schwab

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).