all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
* Backport fix for bug #8855?
@ 2012-11-21 22:33 Ken Brown
  2012-11-21 23:50 ` bug#8855: Fix backported to Emacs 24 Paul Eggert
  2012-11-21 23:54 ` Backport fix for bug #8855? Paul Eggert
  0 siblings, 2 replies; 11+ messages in thread
From: Ken Brown @ 2012-11-21 22:33 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Emacs

Is there any chance of backporting the fix for bug #8855 to the emacs-24 
branch?  The bug is pretty annoying, and there haven't been any problems 
reported with the fix, as far as I know.

Ken



^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#8855: Fix backported to Emacs 24
  2012-11-21 22:33 Backport fix for bug #8855? Ken Brown
@ 2012-11-21 23:50 ` Paul Eggert
  2012-11-22  3:49   ` Eli Zaretskii
  2012-11-22 15:13   ` Stefan Monnier
  2012-11-21 23:54 ` Backport fix for bug #8855? Paul Eggert
  1 sibling, 2 replies; 11+ messages in thread
From: Paul Eggert @ 2012-11-21 23:50 UTC (permalink / raw)
  To: 8855

[-- Attachment #1: Type: text/plain, Size: 91 bytes --]

Attached is the fixed backported to the emacs-24 branch
(bzr 110936) if there's interest.


[-- Attachment #2: g_spawn_sync.txt --]
[-- Type: text/plain, Size: 11638 bytes --]

=== modified file 'src/ChangeLog'
--- src/ChangeLog	2012-11-21 19:28:14 +0000
+++ src/ChangeLog	2012-11-21 23:43:47 +0000
@@ -1,3 +1,34 @@
+2012-11-21  Paul Eggert  <eggert@cs.ucla.edu>
+
+	Fix a race condition that causes Emacs to mess up glib (Bug#8855).
+	The symptom is a diagnostic "GLib-WARNING **: In call to
+	g_spawn_sync(), exit status of a child process was requested but
+	SIGCHLD action was set to SIG_IGN and ECHILD was received by
+	waitpid(), so exit status can't be returned."  The diagnostic
+	is partly wrong, as the SIGCHLD action is not set to SIG_IGN.
+	The real bug is a race condition between Emacs and glib: Emacs
+	does a waitpid (-1, ...) and reaps glib's subprocess by mistake,
+	so that glib can't find it.  Work around the bug by invoking
+	waitpid only on subprocesses that Emacs itself creates.
+	* process.c (create_process, record_child_status_change):
+	Don't use special value -1 in pid field, as the caller now must
+	know the pid rather than having the callee infer it.  The
+	inference was sometimes incorrect anyway, due to another race.
+	(create_process): Set new 'alive' member if child is created.
+	(process_status_retrieved): New function.
+	(record_child_status_change): Use it.
+	Accept negative 1st argument, which means to wait for the
+	processes that Emacs already knows about.  Move special-case code
+	for DOS_NT (which lacks WNOHANG) here, from caller.  Keep track of
+	processes that have already been waited for, by testing and
+	clearing new 'alive' member.
+	(CAN_HANDLE_MULTIPLE_CHILDREN): Remove, as record_child_status_change
+	now does this internally.
+	(handle_child_signal): Let record_child_status_change do all
+	the work, since we do not want to reap all exited child processes,
+	only the child processes that Emacs itself created.
+	* process.h (Lisp_Process): New boolean member 'alive'.
+
 2012-11-21  Eli Zaretskii  <eliz@gnu.org>
 
 	* indent.c (Fvertical_motion): If the starting position is covered

=== modified file 'src/process.c'
--- src/process.c	2012-10-31 17:27:29 +0000
+++ src/process.c	2012-11-21 23:43:47 +0000
@@ -130,6 +130,10 @@
 		       EMACS_TIME *, void *);
 #endif
 
+/* This is for DOS_NT ports.  FIXME: Remove this old portability cruft
+   by having DOS_NT ports implement waitpid instead of wait.  Nowadays
+   POSIXish hosts all define waitpid, WNOHANG, and WUNTRACED, as these
+   have been standard since POSIX.1-1988.  */
 #ifndef WNOHANG
 # undef waitpid
 # define waitpid(pid, status, options) wait (status)
@@ -795,9 +799,8 @@
 #ifdef SIGCHLD
 /* Fdelete_process promises to immediately forget about the process, but in
    reality, Emacs needs to remember those processes until they have been
-   treated by the SIGCHLD handler; otherwise this handler would consider the
-   process as being synchronous and say that the synchronous process is
-   dead.  */
+   treated by the SIGCHLD handler and waitpid has been invoked on them;
+   otherwise they might fill up the kernel's process table.  */
 static Lisp_Object deleted_pid_list;
 #endif
 
@@ -1704,16 +1707,7 @@
   if (inchannel > max_process_desc)
     max_process_desc = inchannel;
 
-  /* Until we store the proper pid, enable the SIGCHLD handler
-     to recognize an unknown pid as standing for this process.
-     It is very important not to let this `marker' value stay
-     in the table after this function has returned; if it does
-     it might cause call-process to hang and subsequent asynchronous
-     processes to get their return values scrambled.  */
-  XPROCESS (process)->pid = -1;
-
-  /* This must be called after the above line because it may signal an
-     error. */
+  /* This may signal an error. */
   setup_process_coding_systems (process);
 
   encoded_current_dir = ENCODE_FILE (current_dir);
@@ -1880,6 +1874,8 @@
 #endif
 
   XPROCESS (process)->pid = pid;
+  if (0 <= pid)
+    XPROCESS (process)->alive = 1;
 
   /* Stop blocking signals in the parent.  */
 #ifdef SIGCHLD
@@ -6273,9 +6269,35 @@
   return process;
 }
 \f
-/* On receipt of a signal that a child status has changed, loop asking
-   about children with changed statuses until the system says there
-   are no more.
+/* If the status of the process DESIRED has changed, return true and
+   set *STATUS to its exit status; otherwise, return false.
+   If HAVE is nonnegative, assume that HAVE = waitpid (HAVE, STATUS, ...)
+   has already been invoked, and do not invoke waitpid again.  */
+
+static bool
+process_status_retrieved (pid_t desired, pid_t have, int *status)
+{
+  if (have < 0)
+    {
+      /* Invoke waitpid only with a known process ID; do not invoke
+	 waitpid with a nonpositive argument.  Otherwise, Emacs might
+	 reap an unwanted process by mistake.  For example, invoking
+	 waitpid (-1, ...) can mess up glib by reaping glib's subprocesses,
+	 so that another thread running glib won't find them.  */
+      do
+	have = waitpid (desired, status, WNOHANG | WUNTRACED);
+      while (have < 0 && errno == EINTR);
+    }
+
+  return have == desired;
+}
+
+/* If PID is nonnegative, the child process PID with wait status W has
+   changed its status; record this and return true.
+
+   If PID is negative, ignore W, and look for known child processes
+   of Emacs whose status have changed.  For each one found, record its new
+   status.
 
    All we do is change the status; we do not run sentinels or print
    notifications.  That is saved for the next time keyboard input is
@@ -6298,13 +6320,23 @@
    ** Malloc WARNING: This should never call malloc either directly or
    indirectly; if it does, that is a bug  */
 
-/* Record the changed status of the child process PID with wait status W.  */
 void
 record_child_status_change (pid_t pid, int w)
 {
 #ifdef SIGCHLD
-  Lisp_Object proc;
-  struct Lisp_Process *p;
+
+# ifdef WNOHANG
+  /* On POSIXish hosts, record at most one child only if we already
+     know one child that has exited.  */
+  bool record_at_most_one_child = 0 <= pid;
+# else
+  /* On DOS_NT (the only porting target that lacks WNOHANG),
+     record the status of at most one child process, since the SIGCHLD
+     handler must return right away.  If any more processes want to
+     signal us, we will get another signal.  */
+  bool record_at_most_one_child = 1;
+# endif
+
   Lisp_Object tail;
 
   /* Find the process that signaled us, and record its status.  */
@@ -6312,68 +6344,69 @@
   /* The process can have been deleted by Fdelete_process.  */
   for (tail = deleted_pid_list; CONSP (tail); tail = XCDR (tail))
     {
+      bool all_pids_are_fixnums
+	= (MOST_NEGATIVE_FIXNUM <= TYPE_MINIMUM (pid_t)
+	   && TYPE_MAXIMUM (pid_t) <= MOST_POSITIVE_FIXNUM);
       Lisp_Object xpid = XCAR (tail);
-      if ((INTEGERP (xpid) && pid == XINT (xpid))
-	  || (FLOATP (xpid) && pid == XFLOAT_DATA (xpid)))
+      if (all_pids_are_fixnums ? INTEGERP (xpid) : NUMBERP (xpid))
 	{
-	  XSETCAR (tail, Qnil);
-	  return;
+	  pid_t deleted_pid;
+	  if (INTEGERP (xpid))
+	    deleted_pid = XINT (xpid);
+	  else
+	    deleted_pid = XFLOAT_DATA (xpid);
+	  if (process_status_retrieved (deleted_pid, pid, &w))
+	    {
+	      XSETCAR (tail, Qnil);
+	      if (record_at_most_one_child)
+		return;
+	    }
 	}
     }
 
   /* Otherwise, if it is asynchronous, it is in Vprocess_alist.  */
-  p = 0;
   for (tail = Vprocess_alist; CONSP (tail); tail = XCDR (tail))
     {
-      proc = XCDR (XCAR (tail));
-      p = XPROCESS (proc);
-      if (EQ (p->type, Qreal) && p->pid == pid)
-	break;
-      p = 0;
-    }
-
-  /* Look for an asynchronous process whose pid hasn't been filled
-     in yet.  */
-  if (! p)
-    for (tail = Vprocess_alist; CONSP (tail); tail = XCDR (tail))
-      {
-	proc = XCDR (XCAR (tail));
-	p = XPROCESS (proc);
-	if (p->pid == -1)
-	  break;
-	p = 0;
-      }
-
-  /* Change the status of the process that was found.  */
-  if (p)
-    {
-      int clear_desc_flag = 0;
-
-      p->tick = ++process_tick;
-      p->raw_status = w;
-      p->raw_status_new = 1;
-
-      /* If process has terminated, stop waiting for its output.  */
-      if ((WIFSIGNALED (w) || WIFEXITED (w))
-	  && p->infd >= 0)
-	clear_desc_flag = 1;
-
-      /* We use clear_desc_flag to avoid a compiler bug in Microsoft C.  */
-      if (clear_desc_flag)
+      Lisp_Object proc = XCDR (XCAR (tail));
+      struct Lisp_Process *p = XPROCESS (proc);
+      if (p->alive && process_status_retrieved (p->pid, pid, &w))
 	{
-	  FD_CLR (p->infd, &input_wait_mask);
-	  FD_CLR (p->infd, &non_keyboard_wait_mask);
+	  /* Change the status of the process that was found.  */
+	  p->tick = ++process_tick;
+	  p->raw_status = w;
+	  p->raw_status_new = 1;
+
+	  /* If process has terminated, stop waiting for its output.  */
+	  if (WIFSIGNALED (w) || WIFEXITED (w))
+	    {
+	      int clear_desc_flag = 0;
+	      p->alive = 0;
+	      if (p->infd >= 0)
+		clear_desc_flag = 1;
+
+	      /* clear_desc_flag avoids a compiler bug in Microsoft C.  */
+	      if (clear_desc_flag)
+		{
+		  FD_CLR (p->infd, &input_wait_mask);
+		  FD_CLR (p->infd, &non_keyboard_wait_mask);
+		}
+	    }
+
+	  /* Tell wait_reading_process_output that it needs to wake up and
+	     look around.  */
+	  if (input_available_clear_time)
+	    *input_available_clear_time = make_emacs_time (0, 0);
+
+	  if (record_at_most_one_child)
+	    return;
 	}
-
-      /* Tell wait_reading_process_output that it needs to wake up and
-	 look around.  */
-      if (input_available_clear_time)
-	*input_available_clear_time = make_emacs_time (0, 0);
     }
-  /* There was no asynchronous process found for that pid: we have
-     a synchronous process.  */
-  else
+
+  if (0 <= pid)
     {
+      /* The caller successfully waited for a pid but no asynchronous
+	 process was found for it, so this is a synchronous process.  */
+
       synch_process_alive = 0;
 
       /* Report the status of the synchronous process.  */
@@ -6392,38 +6425,10 @@
 
 #ifdef SIGCHLD
 
-/* On some systems, the SIGCHLD handler must return right away.  If
-   any more processes want to signal us, we will get another signal.
-   Otherwise, loop around to use up all the processes that have
-   something to tell us.  */
-#if (defined WINDOWSNT \
-     || (defined USG && !defined GNU_LINUX \
-	 && !(defined HPUX && defined WNOHANG)))
-enum { CAN_HANDLE_MULTIPLE_CHILDREN = 0 };
-#else
-enum { CAN_HANDLE_MULTIPLE_CHILDREN = 1 };
-#endif
-
 static void
 handle_child_signal (int sig)
 {
-  do
-    {
-      pid_t pid;
-      int status;
-
-      do
-	pid = waitpid (-1, &status, WNOHANG | WUNTRACED);
-      while (pid < 0 && errno == EINTR);
-
-      /* PID == 0 means no processes found, PID == -1 means a real failure.
-	 Either way, we have done all our job.  */
-      if (pid <= 0)
-	break;
-
-      record_child_status_change (pid, status);
-    }
-  while (CAN_HANDLE_MULTIPLE_CHILDREN);
+  record_child_status_change (-1, 0);
 }
 
 static void

=== modified file 'src/process.h'
--- src/process.h	2012-08-27 17:23:48 +0000
+++ src/process.h	2012-11-21 23:43:47 +0000
@@ -142,6 +142,9 @@
     /* Flag to set coding-system of the process buffer from the
        coding_system used to decode process output.  */
     unsigned int inherit_coding_system_flag : 1;
+    /* Whether the process is alive, i.e., can be waited for.  Running
+       processes can be waited for, but exited and fake processes cannot.  */
+    unsigned int alive : 1;
     /* Record the process status in the raw form in which it comes from `wait'.
        This is to avoid consing in a signal handler.  The `raw_status_new'
        flag indicates that `raw_status' contains a new status that still


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Backport fix for bug #8855?
  2012-11-21 22:33 Backport fix for bug #8855? Ken Brown
  2012-11-21 23:50 ` bug#8855: Fix backported to Emacs 24 Paul Eggert
@ 2012-11-21 23:54 ` Paul Eggert
  2012-11-22 14:40   ` Ken Brown
  1 sibling, 1 reply; 11+ messages in thread
From: Paul Eggert @ 2012-11-21 23:54 UTC (permalink / raw)
  To: Ken Brown; +Cc: Emacs Development

On 11/21/2012 02:33 PM, Ken Brown wrote:
> Is there any chance of backporting the fix for bug #8855 to the emacs-24 branch?

The backport is easy; I created a patch and you can see it in
<http://bugs.gnu.org/8855#53>.

This bug has been there for a while, so as I understand it the
fix normally would wait until the next version since we're in a
freeze.  But if the bug has become more annoying recently perhaps
there's enough justification for putting the fix into emacs-24.





^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#8855: Fix backported to Emacs 24
  2012-11-21 23:50 ` bug#8855: Fix backported to Emacs 24 Paul Eggert
@ 2012-11-22  3:49   ` Eli Zaretskii
  2012-11-22 15:13   ` Stefan Monnier
  1 sibling, 0 replies; 11+ messages in thread
From: Eli Zaretskii @ 2012-11-22  3:49 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 8855

> Date: Wed, 21 Nov 2012 15:50:59 -0800
> From: Paul Eggert <eggert@cs.ucla.edu>
> 
> Attached is the fixed backported to the emacs-24 branch
> (bzr 110936) if there's interest.

The related changes for MS-Windows (r110922) are needed as well.





^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Backport fix for bug #8855?
  2012-11-21 23:54 ` Backport fix for bug #8855? Paul Eggert
@ 2012-11-22 14:40   ` Ken Brown
  0 siblings, 0 replies; 11+ messages in thread
From: Ken Brown @ 2012-11-22 14:40 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Chong Yidong, Stefan Monnier, Emacs Development

On 11/21/2012 6:54 PM, Paul Eggert wrote:
> On 11/21/2012 02:33 PM, Ken Brown wrote:
>> Is there any chance of backporting the fix for bug #8855 to the emacs-24 branch?
>
> The backport is easy; I created a patch and you can see it in
> <http://bugs.gnu.org/8855#53>.
>
> This bug has been there for a while, so as I understand it the
> fix normally would wait until the next version since we're in a
> freeze.  But if the bug has become more annoying recently perhaps
> there's enough justification for putting the fix into emacs-24.

This is obviously up to Stefan and Chong.  I can't honestly say the bug 
has become more annoying recently.  It's been annoying for a long time, 
and I was very glad that you fixed it.

In my experience, the bug only shows up if I start emacs without a D-Bus 
daemon running.  I suspect most GNU/Linux users always have a daemon 
running when they start emacs under X, so they aren't aware of the bug. 
  The reason for my request that you backport the fix is that I've been 
thinking ahead to the release of emacs-24.3 for the Cygwin distribution. 
  Cygwin users are probably not accustomed to starting a D-Bus daemon, 
so I will need to do something.

If Stefan and Chong decide that the fix should not be backported, I'll 
probably just apply the fix locally to my Cygwin build.

Ken




^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#8855: Fix backported to Emacs 24
  2012-11-21 23:50 ` bug#8855: Fix backported to Emacs 24 Paul Eggert
  2012-11-22  3:49   ` Eli Zaretskii
@ 2012-11-22 15:13   ` Stefan Monnier
  2012-11-22 19:20     ` Paul Eggert
  1 sibling, 1 reply; 11+ messages in thread
From: Stefan Monnier @ 2012-11-22 15:13 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 8855

> Attached is the fixed backported to the emacs-24 branch
> (bzr 110936) if there's interest.

There's something I don't understand.
On Oct 29, Chong said:

> > simplifications, unfortunately.  The patch fixes a bug that's been
> > reported multiple times so I'm thinking it may be worthwhile to
> > install now, even though there's a feature freeze.
> I think the patch is OK to commit.  Put the extended explanation in a
> comment somewhere in the code, though---not in the ChangeLog entry.

So how come the patch is not in emacs-24 yet?


        Stefan





^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#8855: Fix backported to Emacs 24
  2012-11-22 15:13   ` Stefan Monnier
@ 2012-11-22 19:20     ` Paul Eggert
  2012-11-23  2:40       ` Chong Yidong
  0 siblings, 1 reply; 11+ messages in thread
From: Paul Eggert @ 2012-11-22 19:20 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 8855

On 11/22/2012 07:13 AM, Stefan Monnier wrote:
> So how come the patch is not in emacs-24 yet?

I understood Chong's comment to be about the trunk, not
about emacs-24.  The trunk was frozen at the time, if
I recall, so his permission would have been needed for
the trunk.





^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#8855: Fix backported to Emacs 24
  2012-11-22 19:20     ` Paul Eggert
@ 2012-11-23  2:40       ` Chong Yidong
  2012-11-23 21:46         ` Ken Brown
  0 siblings, 1 reply; 11+ messages in thread
From: Chong Yidong @ 2012-11-23  2:40 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 8855

Paul Eggert <eggert@cs.ucla.edu> writes:

> On 11/22/2012 07:13 AM, Stefan Monnier wrote:
>> So how come the patch is not in emacs-24 yet?
>
> I understood Chong's comment to be about the trunk, not about
> emacs-24.  The trunk was frozen at the time, if I recall, so his
> permission would have been needed for the trunk.

The emacs-24 branch hadn't even been created when I wrote that, so
"commits to the trunk" at the time meant "code going into 24.3".

Feel free to backport it now.





^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#8855: Fix backported to Emacs 24
  2012-11-23  2:40       ` Chong Yidong
@ 2012-11-23 21:46         ` Ken Brown
  2012-11-23 22:22           ` Paul Eggert
  0 siblings, 1 reply; 11+ messages in thread
From: Ken Brown @ 2012-11-23 21:46 UTC (permalink / raw)
  To: Chong Yidong; +Cc: Paul Eggert, 8855

On 11/22/2012 9:40 PM, Chong Yidong wrote:
> Paul Eggert <eggert@cs.ucla.edu> writes:
>
>> On 11/22/2012 07:13 AM, Stefan Monnier wrote:
>>> So how come the patch is not in emacs-24 yet?
>>
>> I understood Chong's comment to be about the trunk, not about
>> emacs-24.  The trunk was frozen at the time, if I recall, so his
>> permission would have been needed for the trunk.
>
> The emacs-24 branch hadn't even been created when I wrote that, so
> "commits to the trunk" at the time meant "code going into 24.3".
>
> Feel free to backport it now.

It would be good to get this done before tomorrow's release of 24.2.90. 
  Paul, is there still time to do it?

Ken





^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#8855: Fix backported to Emacs 24
  2012-11-23 21:46         ` Ken Brown
@ 2012-11-23 22:22           ` Paul Eggert
  0 siblings, 0 replies; 11+ messages in thread
From: Paul Eggert @ 2012-11-23 22:22 UTC (permalink / raw)
  To: Ken Brown; +Cc: Chong Yidong, 8855

On 11/23/2012 01:46 PM, Ken Brown wrote:
> Paul, is there still time to do it?

Yes, done with Eli's suggestions as emacs-24 bzr 110946.

On 11/22/2012 06:40 PM, Chong Yidong wrote:
>
> The emacs-24 branch hadn't even been created when I wrote that, so
> "commits to the trunk" at the time meant "code going into 24.3".

Sorry about the confusion -- my memory must have been faulty.





^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#8855: Fix backported to Emacs 24
  2011-06-13 17:49 bug#8855: dbus error at startup Dan Nicolaescu
@ 2012-11-27  2:32 ` Paul Eggert
  0 siblings, 0 replies; 11+ messages in thread
From: Paul Eggert @ 2012-11-27  2:32 UTC (permalink / raw)
  To: 8855

As discussed on emacs-devel, I had to undo that backport to
the emacs-24 branch.

The problem was that the fix for Bug#8855 introduces a new
bug, Bug#12980.  I have followed up at Bug#12980 but the patch
is nontrivial, so for now Bug#8855 remains unfixed on the
emacs-24 branch.





^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2012-11-27  2:32 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-11-21 22:33 Backport fix for bug #8855? Ken Brown
2012-11-21 23:50 ` bug#8855: Fix backported to Emacs 24 Paul Eggert
2012-11-22  3:49   ` Eli Zaretskii
2012-11-22 15:13   ` Stefan Monnier
2012-11-22 19:20     ` Paul Eggert
2012-11-23  2:40       ` Chong Yidong
2012-11-23 21:46         ` Ken Brown
2012-11-23 22:22           ` Paul Eggert
2012-11-21 23:54 ` Backport fix for bug #8855? Paul Eggert
2012-11-22 14:40   ` Ken Brown
  -- strict thread matches above, loose matches on Subject: below --
2011-06-13 17:49 bug#8855: dbus error at startup Dan Nicolaescu
2012-11-27  2:32 ` bug#8855: Fix backported to Emacs 24 Paul Eggert

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.