* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'. [not found] ` <20210116184949.3C17C211A5@vcs0.savannah.gnu.org> @ 2021-01-19 15:10 ` Eli Zaretskii 2021-01-19 18:21 ` Philipp Stephani 0 siblings, 1 reply; 23+ messages in thread From: Eli Zaretskii @ 2021-01-19 15:10 UTC (permalink / raw) To: Philipp Stephani; +Cc: phst, emacs-devel > Date: Sat, 16 Jan 2021 13:49:49 -0500 (EST) > From: p.stephani2@gmail.com (Philipp Stephani) > > branch: scratch/sigchld-fd > commit 8f0ce42d3eb9b212424a4a25a376287ffc94a73e > Author: Philipp Stephani <phst@google.com> > Commit: Philipp Stephani <phst@google.com> > > Fix deadlock when receiving SIGCHLD during 'pselect'. > > If we receive and handle a SIGCHLD signal for a process while waiting > for that process, 'pselect' might never return. Instead, we have to > explicitly 'pselect' that the process status has changed. We do this > by writing to a pipe in the SIGCHLD handler and having > 'wait_reading_process_output' select on it. > > * src/process.c (child_signal_init): New helper function to create a > pipe for SIGCHLD notifications. > (child_signal_read, child_signal_notify): New helper functions to > read from/write to the child signal pipe. > (create_process): Initialize the child signal pipe on first use. > (handle_child_signal): Notify waiters that a process status has > changed. > (wait_reading_process_output): Make sure that we also catch > SIGCHLD/process status changes. > > * test/src/process-tests.el > (process-tests/fd-setsize-no-crash/make-process): Remove workaround, > which is no longer needed. Philipp, can you please elaborate about this changeset (which was in the meantime merged with master)? The comments you added mention some kind of deadlock, but don't describe what kind and in which situations it could happen, and without that it's hard to understand what problem(s) it tries to solve. In general, pselect is supposed to return with EINTR when SIGCHLD happoens while we are inside the call to pselect, and EINTR seems to be already handled by wait_reading_process_output. So I wonder why we need that additional "self-pipe" to be watched by pselect. In addition, AFAIU this pipe should not be needed on MS-Windows, where the pselect emulation waits on the sub-process handles together with the other file descriptors, and so gets awakened when a process exits or dies. But again, without knowing the exact situations against which this changeset tries to protect, it is hard to make a decision. Thanks. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'. 2021-01-19 15:10 ` scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect' Eli Zaretskii @ 2021-01-19 18:21 ` Philipp Stephani 2021-01-19 19:14 ` Eli Zaretskii 0 siblings, 1 reply; 23+ messages in thread From: Philipp Stephani @ 2021-01-19 18:21 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Philipp Stephani, Emacs developers Am Di., 19. Jan. 2021 um 16:09 Uhr schrieb Eli Zaretskii <eliz@gnu.org>: > > > Date: Sat, 16 Jan 2021 13:49:49 -0500 (EST) > > From: p.stephani2@gmail.com (Philipp Stephani) > > > > branch: scratch/sigchld-fd > > commit 8f0ce42d3eb9b212424a4a25a376287ffc94a73e > > Author: Philipp Stephani <phst@google.com> > > Commit: Philipp Stephani <phst@google.com> > > > > Fix deadlock when receiving SIGCHLD during 'pselect'. > > > > If we receive and handle a SIGCHLD signal for a process while waiting > > for that process, 'pselect' might never return. Instead, we have to > > explicitly 'pselect' that the process status has changed. We do this > > by writing to a pipe in the SIGCHLD handler and having > > 'wait_reading_process_output' select on it. > > > > * src/process.c (child_signal_init): New helper function to create a > > pipe for SIGCHLD notifications. > > (child_signal_read, child_signal_notify): New helper functions to > > read from/write to the child signal pipe. > > (create_process): Initialize the child signal pipe on first use. > > (handle_child_signal): Notify waiters that a process status has > > changed. > > (wait_reading_process_output): Make sure that we also catch > > SIGCHLD/process status changes. > > > > * test/src/process-tests.el > > (process-tests/fd-setsize-no-crash/make-process): Remove workaround, > > which is no longer needed. > > Philipp, can you please elaborate about this changeset (which was in > the meantime merged with master)? The comments you added mention some > kind of deadlock, but don't describe what kind and in which situations > it could happen, and without that it's hard to understand what > problem(s) it tries to solve. I had hoped that the fixed unit test and the commit description would be clear enough. The issue appears to be that accept-process-output frequently hangs (doesn't return) even though the process has already finished. That's what process-tests/fd-setsize-no-crash/make-process relies on. After some printf debugging, I concluded that wait_reading_process_output wouldn't be notified about the process status change in any way, and thus hang forever. A form like (sit-for 0.1) would trigger the process status change to be processed. > > In general, pselect is supposed to return with EINTR when SIGCHLD > happoens while we are inside the call to pselect, and EINTR seems to > be already handled by wait_reading_process_output. So I wonder why we > need that additional "self-pipe" to be watched by pselect. Yes, I'm wondering about that as well, but it's definitely the behavior I see. Before commiting to master, I ran the test process-tests/fd-setsize-no-crash/make-process multiple times with and without the commit, and the outcome was clear: without the commit accept-process-output would frequently hang, with the commit it never hangs. This is pure speculation, but I could imagine multiple things going on: - Maybe there's no guarantee that pselect actually returns EINTR on SIGCHLD. - Maybe EINTR is returned too early, before the signal handler got the chance to update the process status. > > In addition, AFAIU this pipe should not be needed on MS-Windows, where > the pselect emulation waits on the sub-process handles together with > the other file descriptors, and so gets awakened when a process exits > or dies. But again, without knowing the exact situations against > which this changeset tries to protect, it is hard to make a decision. > It's definitely not needed on Windows, which has a superior mechanism anyway (process handles are waitable objects in Windows). I opted to create the additional pipe on Windows as well - the costs should be small, and it keeps the code more consistent between the operating systems. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'. 2021-01-19 18:21 ` Philipp Stephani @ 2021-01-19 19:14 ` Eli Zaretskii 2021-01-19 20:22 ` Philipp Stephani 2021-01-19 20:46 ` Andreas Schwab 0 siblings, 2 replies; 23+ messages in thread From: Eli Zaretskii @ 2021-01-19 19:14 UTC (permalink / raw) To: Philipp Stephani; +Cc: phst, emacs-devel > From: Philipp Stephani <p.stephani2@gmail.com> > Date: Tue, 19 Jan 2021 19:21:39 +0100 > Cc: Emacs developers <emacs-devel@gnu.org>, Philipp Stephani <phst@google.com> > > In general, pselect is supposed to return with EINTR when SIGCHLD > > happoens while we are inside the call to pselect, and EINTR seems to > > be already handled by wait_reading_process_output. So I wonder why we > > need that additional "self-pipe" to be watched by pselect. > > Yes, I'm wondering about that as well, but it's definitely the > behavior I see. Before commiting to master, I ran the test > process-tests/fd-setsize-no-crash/make-process multiple times with and > without the commit, and the outcome was clear: without the commit > accept-process-output would frequently hang, with the commit it never > hangs. > This is pure speculation, but I could imagine multiple things going on: > - Maybe there's no guarantee that pselect actually returns EINTR on SIGCHLD. > - Maybe EINTR is returned too early, before the signal handler got the > chance to update the process status. I'd be happier if we had some direct evidence to these effects. I'd also be surprised to hear that pselect doesn't return with EINTR when SIGCHLD comes in. It is more likely that SIGCHLD is delivered before we call pselect, but if that is the case, we should be able to reliably detect that, I think. > > In addition, AFAIU this pipe should not be needed on MS-Windows, where > > the pselect emulation waits on the sub-process handles together with > > the other file descriptors, and so gets awakened when a process exits > > or dies. But again, without knowing the exact situations against > > which this changeset tries to protect, it is hard to make a decision. > > It's definitely not needed on Windows, which has a superior mechanism > anyway (process handles are waitable objects in Windows). I opted to > create the additional pipe on Windows as well - the costs should be > small, and it keeps the code more consistent between the operating > systems. The thing is, on Windows we can only wait on up to 64 handles (unless we complicate the code with multilevel wait, that is), so every unnecessary descriptor we need to wait on means we can support fewer simultaneous subprocesses. We are already limited to just 32 subprocesses, which is quite low a number. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'. 2021-01-19 19:14 ` Eli Zaretskii @ 2021-01-19 20:22 ` Philipp Stephani 2021-01-20 15:05 ` Eli Zaretskii 2021-01-19 20:46 ` Andreas Schwab 1 sibling, 1 reply; 23+ messages in thread From: Philipp Stephani @ 2021-01-19 20:22 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Philipp Stephani, Emacs developers Am Di., 19. Jan. 2021 um 20:14 Uhr schrieb Eli Zaretskii <eliz@gnu.org>: > > > From: Philipp Stephani <p.stephani2@gmail.com> > > Date: Tue, 19 Jan 2021 19:21:39 +0100 > > Cc: Emacs developers <emacs-devel@gnu.org>, Philipp Stephani <phst@google.com> > > > In general, pselect is supposed to return with EINTR when SIGCHLD > > > happoens while we are inside the call to pselect, and EINTR seems to > > > be already handled by wait_reading_process_output. So I wonder why we > > > need that additional "self-pipe" to be watched by pselect. > > > > Yes, I'm wondering about that as well, but it's definitely the > > behavior I see. Before commiting to master, I ran the test > > process-tests/fd-setsize-no-crash/make-process multiple times with and > > without the commit, and the outcome was clear: without the commit > > accept-process-output would frequently hang, with the commit it never > > hangs. > > This is pure speculation, but I could imagine multiple things going on: > > - Maybe there's no guarantee that pselect actually returns EINTR on SIGCHLD. > > - Maybe EINTR is returned too early, before the signal handler got the > > chance to update the process status. > > I'd be happier if we had some direct evidence to these effects. I'd > also be surprised to hear that pselect doesn't return with EINTR when > SIGCHLD comes in. It is more likely that SIGCHLD is delivered before > we call pselect, but if that is the case, we should be able to > reliably detect that, I think. So I've added a ton of logging to process.c, and the series of events I observe (without the patch) is as follows (line numbers are approximate due to the logging statements): process.c:4729: Faccept_process_output: enter process.c:5139: wait_reading_process_output: enter process.c:5193: wait_reading_process_output: outer loop process.c:5322: wait_reading_process_output: update_tick = 261, process_tick = 261 process.c:5554: wait_reading_process_output: before pselect; max_desc = 1019 process.c:5601: wait_reading_process_output: after pselect: nfds = -1 process.c:5641: wait_reading_process_output: EINTR process.c:5193: wait_reading_process_output: outer loop process.c:5322: wait_reading_process_output: update_tick = 261, process_tick = 261 process.c:7189: handle_child_signal: enter process.c:7234: handle_child_signal: process test 5: change status to 0; new process_tick = 262 process.c:144: handle_child_signal: leave process.c:5554: wait_reading_process_output: before pselect; max_desc = 1017 and then Emacs hangs. So wait_reading_process_output indeed first receives EINTR, loops around, and checks for the process_tick change before SIGCHLD is handled. By the time it reruns pselect it's too late. > > > > In addition, AFAIU this pipe should not be needed on MS-Windows, where > > > the pselect emulation waits on the sub-process handles together with > > > the other file descriptors, and so gets awakened when a process exits > > > or dies. But again, without knowing the exact situations against > > > which this changeset tries to protect, it is hard to make a decision. > > > > It's definitely not needed on Windows, which has a superior mechanism > > anyway (process handles are waitable objects in Windows). I opted to > > create the additional pipe on Windows as well - the costs should be > > small, and it keeps the code more consistent between the operating > > systems. > > The thing is, on Windows we can only wait on up to 64 handles (unless > we complicate the code with multilevel wait, that is), so every > unnecessary descriptor we need to wait on means we can support fewer > simultaneous subprocesses. We are already limited to just 32 > subprocesses, which is quite low a number. OK, that's a good point, I didn't know about this limitation. I'll see that I can remove the pipe on Windows. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'. 2021-01-19 20:22 ` Philipp Stephani @ 2021-01-20 15:05 ` Eli Zaretskii 2021-01-23 17:36 ` Philipp Stephani 0 siblings, 1 reply; 23+ messages in thread From: Eli Zaretskii @ 2021-01-20 15:05 UTC (permalink / raw) To: Philipp Stephani; +Cc: phst, emacs-devel > From: Philipp Stephani <p.stephani2@gmail.com> > Date: Tue, 19 Jan 2021 21:22:04 +0100 > Cc: Emacs developers <emacs-devel@gnu.org>, Philipp Stephani <phst@google.com> > process.c:4729: Faccept_process_output: enter > process.c:5139: wait_reading_process_output: enter > process.c:5193: wait_reading_process_output: outer loop > process.c:5322: wait_reading_process_output: update_tick = 261, > process_tick = 261 > process.c:5554: wait_reading_process_output: before pselect; max_desc = 1019 > process.c:5601: wait_reading_process_output: after pselect: nfds = -1 > process.c:5641: wait_reading_process_output: EINTR > process.c:5193: wait_reading_process_output: outer loop > process.c:5322: wait_reading_process_output: update_tick = 261, > process_tick = 261 > process.c:7189: handle_child_signal: enter > process.c:7234: handle_child_signal: process test 5: change status to > 0; new process_tick = 262 > process.c:144: handle_child_signal: leave > process.c:5554: wait_reading_process_output: before pselect; max_desc = 1017 > > and then Emacs hangs. > So wait_reading_process_output indeed first receives EINTR, loops > around, and checks for the process_tick change before SIGCHLD is > handled. By the time it reruns pselect it's too late. OK, thanks. So the scenario is that the SIGCHLD handler is called _after_ pselect returns with EINTR, and by that time we already call pselect again, whereas the logic in wait_reading_process_output is based on the assumption that the signal handler will be called _before_ pselect returns? (We should have this in the comments to the child_signal_* functions, so that we remember why we introduced this machinery.) First, is this a valid behavior, or is this a bug in the OS? I couldn't find any place where it is documented what Posix says about the relative timing of calling the handler vs syscall returning with EINTR -- is the behavior you see allowed? I do see many places that explicitly or implicitly assume that a syscall will return with EINTR _after_ the signal handler returns. If this _is_ valid behavior, I wonder whether we could somehow amend the loop and its logic to handle this case without introducing additional file descriptors to wait. For example, when we get EINTR, we could add another pselect call with a short timeout, so that the handler has time to run and update the process tick before we loop around. Or maybe, when we get EINTR, we should loop over the processes and see which one(s) have exited. Such changes would basically leave the overall logic intact, which is IMO better for more complex use cases that involve other Lisp threads. Or maybe the problem here is that more than one signal is involved, so that one of them interrupts pselect before the other (our SIGCHLD) gets delivered and calls the SIGCHLD handler? In any case, did you consider what happens with the self-pipe method when more than one Lisp thread is waiting in pselect? > > The thing is, on Windows we can only wait on up to 64 handles (unless > > we complicate the code with multilevel wait, that is), so every > > unnecessary descriptor we need to wait on means we can support fewer > > simultaneous subprocesses. We are already limited to just 32 > > subprocesses, which is quite low a number. > > OK, that's a good point, I didn't know about this limitation. I'll see > that I can remove the pipe on Windows. Just making those child_signal_* function be no-ops on MS-Windows, and #ifedf'ing away that call to FD_SET of child_fd, should be okay, I think. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'. 2021-01-20 15:05 ` Eli Zaretskii @ 2021-01-23 17:36 ` Philipp Stephani 2021-01-23 18:23 ` Eli Zaretskii 0 siblings, 1 reply; 23+ messages in thread From: Philipp Stephani @ 2021-01-23 17:36 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Philipp Stephani, Emacs developers Am Mi., 20. Jan. 2021 um 16:05 Uhr schrieb Eli Zaretskii <eliz@gnu.org>: > > > > The thing is, on Windows we can only wait on up to 64 handles (unless > > > we complicate the code with multilevel wait, that is), so every > > > unnecessary descriptor we need to wait on means we can support fewer > > > simultaneous subprocesses. We are already limited to just 32 > > > subprocesses, which is quite low a number. > > > > OK, that's a good point, I didn't know about this limitation. I'll see > > that I can remove the pipe on Windows. > > Just making those child_signal_* function be no-ops on MS-Windows, and > #ifedf'ing away that call to FD_SET of child_fd, should be okay, I > think. I was about to take a stab at this, but looks like you beat me to it :-) ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'. 2021-01-23 17:36 ` Philipp Stephani @ 2021-01-23 18:23 ` Eli Zaretskii 2021-01-23 18:30 ` Philipp Stephani 0 siblings, 1 reply; 23+ messages in thread From: Eli Zaretskii @ 2021-01-23 18:23 UTC (permalink / raw) To: Philipp Stephani; +Cc: phst, emacs-devel > From: Philipp Stephani <p.stephani2@gmail.com> > Date: Sat, 23 Jan 2021 18:36:48 +0100 > Cc: Emacs developers <emacs-devel@gnu.org>, Philipp Stephani <phst@google.com> > > Just making those child_signal_* function be no-ops on MS-Windows, and > > #ifedf'ing away that call to FD_SET of child_fd, should be okay, I > > think. > > I was about to take a stab at this, but looks like you beat me to it :-) Yes, I had a few free minutes. Thanks anyway. P.S. What about child_signal_read, which doesn't seem to be used by any configuration? should we delete it? ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'. 2021-01-23 18:23 ` Eli Zaretskii @ 2021-01-23 18:30 ` Philipp Stephani 0 siblings, 0 replies; 23+ messages in thread From: Philipp Stephani @ 2021-01-23 18:30 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Philipp Stephani, Emacs developers Am Sa., 23. Jan. 2021 um 19:23 Uhr schrieb Eli Zaretskii <eliz@gnu.org>: > > P.S. What about child_signal_read, which doesn't seem to be used by > any configuration? should we delete it? It's used in line 7205. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'. 2021-01-19 19:14 ` Eli Zaretskii 2021-01-19 20:22 ` Philipp Stephani @ 2021-01-19 20:46 ` Andreas Schwab 2021-01-19 20:58 ` Philipp Stephani 2021-01-20 3:37 ` Eli Zaretskii 1 sibling, 2 replies; 23+ messages in thread From: Andreas Schwab @ 2021-01-19 20:46 UTC (permalink / raw) To: Eli Zaretskii; +Cc: phst, Philipp Stephani, emacs-devel On Jan 19 2021, Eli Zaretskii wrote: > I'd be happier if we had some direct evidence to these effects. I'd > also be surprised to hear that pselect doesn't return with EINTR when > SIGCHLD comes in. It is more likely that SIGCHLD is delivered before > we call pselect, but if that is the case, we should be able to > reliably detect that, I think. If you want reliable detection of SIGCHLD, you need to block the signal around pselect and let pselect unblock it, atomically. Andreas. -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different." ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'. 2021-01-19 20:46 ` Andreas Schwab @ 2021-01-19 20:58 ` Philipp Stephani 2021-01-20 18:07 ` Andreas Schwab 2021-01-20 3:37 ` Eli Zaretskii 1 sibling, 1 reply; 23+ messages in thread From: Philipp Stephani @ 2021-01-19 20:58 UTC (permalink / raw) To: Andreas Schwab; +Cc: Eli Zaretskii, Philipp Stephani, Emacs developers Am Di., 19. Jan. 2021 um 21:46 Uhr schrieb Andreas Schwab <schwab@linux-m68k.org>: > > On Jan 19 2021, Eli Zaretskii wrote: > > > I'd be happier if we had some direct evidence to these effects. I'd > > also be surprised to hear that pselect doesn't return with EINTR when > > SIGCHLD comes in. It is more likely that SIGCHLD is delivered before > > we call pselect, but if that is the case, we should be able to > > reliably detect that, I think. > > If you want reliable detection of SIGCHLD, you need to block the signal > around pselect and let pselect unblock it, atomically. Right. It looks like we're not doing that because we want to support systems that don't have pselect (and it's gnulib emulation can't be race-free). ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'. 2021-01-19 20:58 ` Philipp Stephani @ 2021-01-20 18:07 ` Andreas Schwab 0 siblings, 0 replies; 23+ messages in thread From: Andreas Schwab @ 2021-01-20 18:07 UTC (permalink / raw) To: Philipp Stephani; +Cc: Eli Zaretskii, Philipp Stephani, Emacs developers On Jan 19 2021, Philipp Stephani wrote: > Right. It looks like we're not doing that because we want to support > systems that don't have pselect (and it's gnulib emulation can't be > race-free). For those that don't provide a proper pselect, the situation wouldn't be worse than today. Andreas. -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different." ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'. 2021-01-19 20:46 ` Andreas Schwab 2021-01-19 20:58 ` Philipp Stephani @ 2021-01-20 3:37 ` Eli Zaretskii 2021-01-20 8:37 ` Andreas Schwab 1 sibling, 1 reply; 23+ messages in thread From: Eli Zaretskii @ 2021-01-20 3:37 UTC (permalink / raw) To: Andreas Schwab; +Cc: phst, p.stephani2, emacs-devel > From: Andreas Schwab <schwab@linux-m68k.org> > Cc: Philipp Stephani <p.stephani2@gmail.com>, phst@google.com, > emacs-devel@gnu.org > Date: Tue, 19 Jan 2021 21:46:46 +0100 > > On Jan 19 2021, Eli Zaretskii wrote: > > > I'd be happier if we had some direct evidence to these effects. I'd > > also be surprised to hear that pselect doesn't return with EINTR when > > SIGCHLD comes in. It is more likely that SIGCHLD is delivered before > > we call pselect, but if that is the case, we should be able to > > reliably detect that, I think. > > If you want reliable detection of SIGCHLD, you need to block the signal > around pselect and let pselect unblock it, atomically. AFAIU, this is an issue for programs that install a SIGCHLD handler immediately before calling pselect. But Emacs has a SIGCHLD handler installed at all times, so if the signal hits outside of the pselect call, we should be able to detect that reliably. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'. 2021-01-20 3:37 ` Eli Zaretskii @ 2021-01-20 8:37 ` Andreas Schwab 2021-01-20 8:52 ` Eli Zaretskii 0 siblings, 1 reply; 23+ messages in thread From: Andreas Schwab @ 2021-01-20 8:37 UTC (permalink / raw) To: Eli Zaretskii; +Cc: phst, p.stephani2, emacs-devel On Jan 20 2021, Eli Zaretskii wrote: > AFAIU, this is an issue for programs that install a SIGCHLD handler > immediately before calling pselect. But Emacs has a SIGCHLD handler > installed at all times, so if the signal hits outside of the pselect > call, we should be able to detect that reliably. No, the race between checking for the signal and entering pselect still exists. Andreas. -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different." ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'. 2021-01-20 8:37 ` Andreas Schwab @ 2021-01-20 8:52 ` Eli Zaretskii 2021-01-20 9:18 ` Andreas Schwab 0 siblings, 1 reply; 23+ messages in thread From: Eli Zaretskii @ 2021-01-20 8:52 UTC (permalink / raw) To: emacs-devel, Andreas Schwab; +Cc: phst, p.stephani2 On January 20, 2021 10:37:31 AM GMT+02:00, Andreas Schwab <schwab@linux-m68k.org> wrote: > On Jan 20 2021, Eli Zaretskii wrote: > > > AFAIU, this is an issue for programs that install a SIGCHLD handler > > immediately before calling pselect. But Emacs has a SIGCHLD handler > > installed at all times, so if the signal hits outside of the pselect > > call, we should be able to detect that reliably. > > No, the race between checking for the signal and entering pselect > still > exists. Please tell more, especially what is meant by "checking for the signal". We have a handler installed, so who is or should be checking for it? ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'. 2021-01-20 8:52 ` Eli Zaretskii @ 2021-01-20 9:18 ` Andreas Schwab 2021-01-20 10:14 ` Eli Zaretskii 0 siblings, 1 reply; 23+ messages in thread From: Andreas Schwab @ 2021-01-20 9:18 UTC (permalink / raw) To: Eli Zaretskii; +Cc: phst, p.stephani2, emacs-devel On Jan 20 2021, Eli Zaretskii wrote: > Please tell more, especially what is meant by "checking for the signal". We have a handler installed, so who is or should be checking for it? The check that the signal occured and what to pass to pselect so it doesn't block waiting for the condition that just occured. Andreas. -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different." ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'. 2021-01-20 9:18 ` Andreas Schwab @ 2021-01-20 10:14 ` Eli Zaretskii 2021-01-20 10:17 ` Andreas Schwab 0 siblings, 1 reply; 23+ messages in thread From: Eli Zaretskii @ 2021-01-20 10:14 UTC (permalink / raw) To: emacs-devel, Andreas Schwab; +Cc: phst, p.stephani2 On January 20, 2021 11:18:19 AM GMT+02:00, Andreas Schwab <schwab@linux-m68k.org> wrote: > On Jan 20 2021, Eli Zaretskii wrote: > > > Please tell more, especially what is meant by "checking for the > signal". We have a handler installed, so who is or should be checking > for it? > > The check that the signal occured and what to pass to pselect so it > doesn't block waiting for the condition that just occured. If/when the signal arrives, the handler removes the file descriptor of the process from the set on which pselect will wait. Isn't that sufficient to prevent us from waiting for a dead process? If not, why not? ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'. 2021-01-20 10:14 ` Eli Zaretskii @ 2021-01-20 10:17 ` Andreas Schwab 2021-01-20 15:30 ` Eli Zaretskii 0 siblings, 1 reply; 23+ messages in thread From: Andreas Schwab @ 2021-01-20 10:17 UTC (permalink / raw) To: Eli Zaretskii; +Cc: phst, p.stephani2, emacs-devel On Jan 20 2021, Eli Zaretskii wrote: > If/when the signal arrives, the handler removes the file descriptor of the process from the set on which pselect will wait. Isn't that sufficient to prevent us from waiting for a dead process? Who removes it? > If not, why not? What happens when the signal occurs between setting up the set and calling pselect? Andreas. -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different." ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'. 2021-01-20 10:17 ` Andreas Schwab @ 2021-01-20 15:30 ` Eli Zaretskii 2021-01-20 15:48 ` Andreas Schwab 0 siblings, 1 reply; 23+ messages in thread From: Eli Zaretskii @ 2021-01-20 15:30 UTC (permalink / raw) To: Andreas Schwab; +Cc: phst, p.stephani2, emacs-devel > From: Andreas Schwab <schwab@linux-m68k.org> > Cc: emacs-devel@gnu.org, phst@google.com, p.stephani2@gmail.com > Date: Wed, 20 Jan 2021 11:17:24 +0100 > > On Jan 20 2021, Eli Zaretskii wrote: > > > If/when the signal arrives, the handler removes the file descriptor of the process from the set on which pselect will wait. Isn't that sufficient to prevent us from waiting for a dead process? > > Who removes it? The SIGCHLD signal handler. Or I thought it did, here: /* If process has terminated, stop waiting for its output. */ if (WIFSIGNALED (status) || WIFEXITED (status)) { bool clear_desc_flag = 0; p->alive = 0; if (p->infd >= 0) clear_desc_flag = 1; /* clear_desc_flag avoids a compiler bug in Microsoft C. */ if (clear_desc_flag) delete_read_fd (p->infd); <<<<<<<<<<<<<<<<<<<<<<<<< } But it looks like it only removes the process's file descriptor from the set to be used for setting up arguments to pselect the _next_ iteration through the loop. So maybe this is the problem we should solve: make the SIGCHLD handler reset the bit of the process's output from the fd_set passed to pselect. Then, if SIGCHLD hits at any time before pselect is called, it will not wait for output from that process. > > If not, why not? > > What happens when the signal occurs between setting up the set and > calling pselect? Yes, see above. But it sounds like the situation shown in Philipp's log is different. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'. 2021-01-20 15:30 ` Eli Zaretskii @ 2021-01-20 15:48 ` Andreas Schwab 2021-01-20 16:40 ` Eli Zaretskii 0 siblings, 1 reply; 23+ messages in thread From: Andreas Schwab @ 2021-01-20 15:48 UTC (permalink / raw) To: Eli Zaretskii; +Cc: phst, p.stephani2, emacs-devel On Jan 20 2021, Eli Zaretskii wrote: > So maybe this is the problem we should solve: make the SIGCHLD handler > reset the bit of the process's output from the fd_set passed to > pselect. You cannot do that. That creates races all over the place. Andreas. -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different." ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'. 2021-01-20 15:48 ` Andreas Schwab @ 2021-01-20 16:40 ` Eli Zaretskii 2021-01-20 16:44 ` Andreas Schwab 0 siblings, 1 reply; 23+ messages in thread From: Eli Zaretskii @ 2021-01-20 16:40 UTC (permalink / raw) To: Andreas Schwab; +Cc: phst, p.stephani2, emacs-devel > From: Andreas Schwab <schwab@linux-m68k.org> > Cc: emacs-devel@gnu.org, phst@google.com, p.stephani2@gmail.com > Date: Wed, 20 Jan 2021 16:48:43 +0100 > > On Jan 20 2021, Eli Zaretskii wrote: > > > So maybe this is the problem we should solve: make the SIGCHLD handler > > reset the bit of the process's output from the fd_set passed to > > pselect. > > You cannot do that. That creates races all over the place. What kind of races, and where in our code? please elaborate. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'. 2021-01-20 16:40 ` Eli Zaretskii @ 2021-01-20 16:44 ` Andreas Schwab 2021-01-20 17:03 ` Eli Zaretskii 0 siblings, 1 reply; 23+ messages in thread From: Andreas Schwab @ 2021-01-20 16:44 UTC (permalink / raw) To: Eli Zaretskii; +Cc: phst, p.stephani2, emacs-devel On Jan 20 2021, Eli Zaretskii wrote: >> From: Andreas Schwab <schwab@linux-m68k.org> >> Cc: emacs-devel@gnu.org, phst@google.com, p.stephani2@gmail.com >> Date: Wed, 20 Jan 2021 16:48:43 +0100 >> >> On Jan 20 2021, Eli Zaretskii wrote: >> >> > So maybe this is the problem we should solve: make the SIGCHLD handler >> > reset the bit of the process's output from the fd_set passed to >> > pselect. >> >> You cannot do that. That creates races all over the place. > > What kind of races, and where in our code? please elaborate. The signal can occur anytime. When it interrupts the setup of the pselect call, chaos will ensure. The only way to avoid that is to block the signal around the call. Doing anything non-trivial in a signal handler is bound to problems. Andreas. -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different." ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'. 2021-01-20 16:44 ` Andreas Schwab @ 2021-01-20 17:03 ` Eli Zaretskii 2021-01-20 17:30 ` Andreas Schwab 0 siblings, 1 reply; 23+ messages in thread From: Eli Zaretskii @ 2021-01-20 17:03 UTC (permalink / raw) To: Andreas Schwab; +Cc: phst, p.stephani2, emacs-devel > From: Andreas Schwab <schwab@linux-m68k.org> > Cc: emacs-devel@gnu.org, phst@google.com, p.stephani2@gmail.com > Date: Wed, 20 Jan 2021 17:44:59 +0100 > > >> You cannot do that. That creates races all over the place. > > > > What kind of races, and where in our code? please elaborate. > > The signal can occur anytime. When it interrupts the setup of the > pselect call, chaos will ensure. The only way to avoid that is to block > the signal around the call. What I described would be in the signal handler. > Doing anything non-trivial in a signal handler is bound to problems. Yes, I know. I think resetting a bit in a variable is more trivial than, say, writing to a pipe, something that we do now. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'. 2021-01-20 17:03 ` Eli Zaretskii @ 2021-01-20 17:30 ` Andreas Schwab 0 siblings, 0 replies; 23+ messages in thread From: Andreas Schwab @ 2021-01-20 17:30 UTC (permalink / raw) To: Eli Zaretskii; +Cc: phst, p.stephani2, emacs-devel On Jan 20 2021, Eli Zaretskii wrote: > What I described would be in the signal handler. That's exactly the problem. Without synchronisation, it will fail miserably. Andreas. -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different." ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2021-01-23 18:30 UTC | newest] Thread overview: 23+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <20210116184947.2105.45267@vcs0.savannah.gnu.org> [not found] ` <20210116184949.3C17C211A5@vcs0.savannah.gnu.org> 2021-01-19 15:10 ` scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect' Eli Zaretskii 2021-01-19 18:21 ` Philipp Stephani 2021-01-19 19:14 ` Eli Zaretskii 2021-01-19 20:22 ` Philipp Stephani 2021-01-20 15:05 ` Eli Zaretskii 2021-01-23 17:36 ` Philipp Stephani 2021-01-23 18:23 ` Eli Zaretskii 2021-01-23 18:30 ` Philipp Stephani 2021-01-19 20:46 ` Andreas Schwab 2021-01-19 20:58 ` Philipp Stephani 2021-01-20 18:07 ` Andreas Schwab 2021-01-20 3:37 ` Eli Zaretskii 2021-01-20 8:37 ` Andreas Schwab 2021-01-20 8:52 ` Eli Zaretskii 2021-01-20 9:18 ` Andreas Schwab 2021-01-20 10:14 ` Eli Zaretskii 2021-01-20 10:17 ` Andreas Schwab 2021-01-20 15:30 ` Eli Zaretskii 2021-01-20 15:48 ` Andreas Schwab 2021-01-20 16:40 ` Eli Zaretskii 2021-01-20 16:44 ` Andreas Schwab 2021-01-20 17:03 ` Eli Zaretskii 2021-01-20 17:30 ` Andreas Schwab
Code repositories for project(s) associated with this external index https://git.savannah.gnu.org/cgit/emacs.git https://git.savannah.gnu.org/cgit/emacs/org-mode.git This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.