unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#14474: 24.3.50; Zombie subprocesses (again)
@ 2013-05-25 23:38 Michael Heerdegen
  2013-05-25 23:49 ` Michael Heerdegen
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Michael Heerdegen @ 2013-05-25 23:38 UTC (permalink / raw)
  To: 14474


Hello,

dunno if this is related to bug#12980.  Although I had used a fresh
build all the time, I saw the following problem yesterday for the first
time (note: was on a trip before, so the problem could have been
introduced one or two weeks before today).

I'm using emacs-snapshot on Debian, currently a five days old build:

"GNU Emacs 24.3.50.1 (x86_64-pc-linux-gnu, GTK+ Version 3.4.2)
 of 2013-05-21 on dex, modified by Debian"

I'm experiencing the following:

- I start Emacs in X as a different user (via gksu), or

- I start Emacs from an X session that was started with startx

In such an Emacs, any child process seems to become a zombie after being
finished.  E.g., after typing "exit" in a *terminal* running bash, there
is still a running buffer process.  As a symptom, CPU is used
continuously at 100% until I C-x C-c.

However, if I log in via display manager and don't switch to another
user via gksu, this doesn't happen.  And: it happens with the gtk
version as well as with the lucid version, but _not_ with emacs -nw in
an xterm.

Please ask me if you need more info.


Thanks,

Michael.




In GNU Emacs 24.3.50.1 (x86_64-pc-linux-gnu, GTK+ Version 3.4.2)
 of 2013-05-21 on dex, modified by Debian
 (emacs-snapshot package, version 2:20130520-1)
Windowing system distributor `The X.Org Foundation', version 11.0.11204000
System Description:	Debian GNU/Linux testing (jessie)

Configured using:
 `configure --build x86_64-linux-gnu --host x86_64-linux-gnu
 --prefix=/usr --sharedstatedir=/var/lib --libexecdir=/usr/lib
 --localstatedir=/var --infodir=/usr/share/info --mandir=/usr/share/man
 --with-pop=yes
 --enable-locallisppath=/etc/emacs-snapshot:/etc/emacs:/usr/local/share/emacs/24.3.50/site-lisp:/usr/local/share/emacs/site-lisp:/usr/share/emacs/24.3.50/site-lisp:/usr/share/emacs/site-lisp
 --without-compress-info --with-crt-dir=/usr/lib/x86_64-linux-gnu/
 --with-x=yes --with-x-toolkit=gtk3 --with-imagemagick=yes
 CFLAGS='-DDEBIAN -DSITELOAD_PURESIZE_EXTRA=5000 -g -O2'
 CPPFLAGS='-D_FORTIFY_SOURCE=2' LDFLAGS='-g -Wl,--as-needed
 -znocombreloc''







* bug#14474: 24.3.50; Zombie subprocesses (again)
  2013-05-25 23:38 bug#14474: 24.3.50; Zombie subprocesses (again) Michael Heerdegen
@ 2013-05-25 23:49 ` Michael Heerdegen
  2013-05-26  2:55 ` Eli Zaretskii
  2013-05-26 17:37 ` Paul Eggert
  2 siblings, 0 replies; 20+ messages in thread
From: Michael Heerdegen @ 2013-05-25 23:49 UTC (permalink / raw)
  To: 14474

Michael Heerdegen <michael_heerdegen@web.de> writes:

> I'm experiencing the following:
>
> - I start Emacs in X as a different user (via gksu), or
>
> - I start Emacs from an X session that was started with startx
>
> In such an Emacs, any child process seems to become a zombie after being
> finished.  E.g., after typing "exit" in a *terminal* running bash, there
> is still a running buffer process.  As a symptom, CPU is used
> continuously at 100% until I C-x C-c.

BTW, this is what Paul Eggert answered in emacs-dev:

> I can reproduce the problem on Ubuntu 13.04.  Apparently when you
> start up a GTK Emacs session that can't talk to dbus (because it's
> su'ed), the dbus library starts up its own service, using dbus-launch.
> This messes up Emacs somehow (I don't know why).


Thanks,

Michael.






* bug#14474: 24.3.50; Zombie subprocesses (again)
  2013-05-25 23:38 bug#14474: 24.3.50; Zombie subprocesses (again) Michael Heerdegen
  2013-05-25 23:49 ` Michael Heerdegen
@ 2013-05-26  2:55 ` Eli Zaretskii
  2013-05-26 17:37 ` Paul Eggert
  2 siblings, 0 replies; 20+ messages in thread
From: Eli Zaretskii @ 2013-05-26  2:55 UTC (permalink / raw)
  To: michael_heerdegen; +Cc: 14474

> From: Michael Heerdegen <michael_heerdegen@web.de>
> Date: Sun, 26 May 2013 01:38:56 +0200
> 
> In such an Emacs, any child process seems to become a zombie after being
> finished.  E.g., after typing "exit" in a *terminal* running bash, there
> is still a running buffer process.  As a symptom, CPU is used
> continuously at 100% until I C-x C-c.

Can you attach a debugger and see where Emacs is looping?  etc/DEBUG
tells how to do that.

Thanks.






* bug#14474: 24.3.50; Zombie subprocesses (again)
  2013-05-25 23:38 bug#14474: 24.3.50; Zombie subprocesses (again) Michael Heerdegen
  2013-05-25 23:49 ` Michael Heerdegen
  2013-05-26  2:55 ` Eli Zaretskii
@ 2013-05-26 17:37 ` Paul Eggert
  2013-05-26 18:33   ` Michael Heerdegen
  2 siblings, 1 reply; 20+ messages in thread
From: Paul Eggert @ 2013-05-26 17:37 UTC (permalink / raw)
  To: 14474

A workaround, for me at least, is to propagate the
DBUS_SESSION_BUS_ADDRESS environment variable into
the child process with a different userid.  

For example, here is a failing session, where I became the user 'exp'
and later observed the problem in a shell window:

$ env | grep DBUS
DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-bpx4rxPk7z,guid=6e491bf38a5b2b6fce17d0a251a221bf
$ sudo sh
# su exp
$ env | grep DBUS
$ emacs
** (emacs:15115): WARNING **: Couldn't connect to accessibility bus: Failed to connect to socket /tmp/dbus-x2KgryK9C8: Connection refused

And here is a session that worked.  The key difference is that
I used sudo's '-E' option:

$ env | grep DBUS
DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-bpx4rxPk7z,guid=6e491bf38a5b2b6fce17d0a251a221bf
$ sudo -E sh
# su exp
$ env | grep DBUS
DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-bpx4rxPk7z,guid=6e491bf38a5b2b6fce17d0a251a221bf
$ emacs
** (emacs:15441): WARNING **: Couldn't connect to accessibility bus: Failed to connect to socket /tmp/dbus-x2KgryK9C8: Connection refused

In both cases, the dbus library complains to stderr that it can't connect
to /tmp/dbus-x2KgryK9C8 (I don't know where it's getting that name from).
When DBUS_SESSION_BUS_ADDRESS is unset, the dbus library arranges to run
the shell script /usr/bin/dbus-launch, which seems to cause the problem.
But when DBUS_SESSION_BUS_ADDRESS is set, the dbus library falls back
on its contents and doesn't invoke dbus-launch.






* bug#14474: 24.3.50; Zombie subprocesses (again)
  2013-05-26 17:37 ` Paul Eggert
@ 2013-05-26 18:33   ` Michael Heerdegen
  2013-05-27  1:36     ` Paul Eggert
  0 siblings, 1 reply; 20+ messages in thread
From: Michael Heerdegen @ 2013-05-26 18:33 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 14474

Paul Eggert <eggert@cs.ucla.edu> writes:

> And here is a session that worked.  The key difference is that
> I used sudo's '-E' option:
>
> $ env | grep DBUS
> DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-bpx4rxPk7z,guid=6e491bf38a5b2b6fce17d0a251a221bf
> $ sudo -E sh
> # su exp
> $ env | grep DBUS
> DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-bpx4rxPk7z,guid=6e491bf38a5b2b6fce17d0a251a221bf
> $ emacs
> ** (emacs:15441): WARNING **: Couldn't connect to accessibility bus:
> Failed to connect to socket /tmp/dbus-x2KgryK9C8: Connection refused

I see something similar - using the -E flag for sudo works as a
workaround.  However, I don't get this "Failed to connect to socket..."
warning.  Instead, I get

** (emacs:6638): WARNING **: The connection is closed


Michael.






* bug#14474: 24.3.50; Zombie subprocesses (again)
  2013-05-26 18:33   ` Michael Heerdegen
@ 2013-05-27  1:36     ` Paul Eggert
  2013-05-27 12:46       ` Colin Walters
  0 siblings, 1 reply; 20+ messages in thread
From: Paul Eggert @ 2013-05-27  1:36 UTC (permalink / raw)
  To: Michael Heerdegen; +Cc: 14474, Colin Walters

[The bug is that a bleeding-edge GTK Emacs loses child processes
when it's run via sudo; see <http://bugs.gnu.org/14474>.]

I think I may have spotted the problem.
Glib 2.36.2's glib/gmain.c has a function
'ensure_unix_signal_handler_installed_unlocked'
that is run in the dconf worker thread.
This function calls sigaction to replace Emacs's SIGCHLD handler
with glib's own handler g_unix_signal_handler.
Signal handlers are process-wide, so this replacement affects
all threads, including the main (Emacs) thread.

After that happens, Emacs never sees when its children
exit, since g_unix_signal_handler discards Emacs's
child-exit notices, and the Emacs function
deliver_child_signal is never invoked.
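
A minimal sketch of the replacement effect described above, using only
POSIX calls.  The names app_handler and lib_handler are hypothetical
stand-ins for Emacs's handler and glib's handler; nothing here is taken
from either code base.  The second sigaction call silently displaces
the first handler for the whole process, so only the "library" handler
runs when SIGCHLD is raised:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t app_calls;  /* times the "application" handler ran */
static volatile sig_atomic_t lib_calls;  /* times the "library" handler ran */

static void app_handler(int sig) { (void) sig; app_calls++; }
static void lib_handler(int sig) { (void) sig; lib_calls++; }

/* Install HANDLER as the process-wide SIGCHLD disposition.  */
static void install_sigchld(void (*handler)(int))
{
  struct sigaction sa;
  memset(&sa, 0, sizeof sa);
  sa.sa_handler = handler;
  sigemptyset(&sa.sa_mask);
  int rc = sigaction(SIGCHLD, &sa, NULL);
  assert(rc == 0);
  (void) rc;
}

/* Return 1 if only the most recently installed handler saw the signal.  */
int handler_replacement_demo(void)
{
  app_calls = 0;
  lib_calls = 0;
  install_sigchld(app_handler);   /* the application installs its handler */
  install_sigchld(lib_handler);   /* a library later replaces it */
  raise(SIGCHLD);                 /* assumes SIGCHLD is not blocked */
  install_sigchld(SIG_DFL);       /* restore the default disposition */
  return app_calls == 0 && lib_calls == 1;
}
```

The sketch is single-threaded; in a threaded process the disposition is
still shared, which is why a call made from a worker thread affects the
main thread as well.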

The comment for g_child_watch_source_new
says that Emacs isn't supposed to invoke waitpid (-1, ...),
but that's already the case in the Emacs trunk.
Is there another limitation that we
didn't know about, a limitation that says Emacs can't
have signal handlers either?

I'll CC: this to Colin Walters since he seemed to have
a good handle on the situation from the glib point of view; see
<https://bugzilla.gnome.org/show_bug.cgi?id=676167>.

One possibility is to see if we can get Emacs to use
glib's child watcher.  But that's a bit of a delicate balance,
since Emacs must work even when gtk is absent, and it may need
to hand off from its own watcher to glib's watcher, and processes
shouldn't get lost during the handoff.  I don't offhand know how
to do all that.

A simpler but hacky workaround is to not use the graphical interface if
DBUS_SESSION_BUS_ADDRESS is unset.  Something like this:

--- src/xterm.c	2013-05-09 14:49:56 +0000
+++ src/xterm.c	2013-05-27 01:32:44 +0000
@@ -9819,6 +9819,14 @@ x_display_ok (const char *display)
     int dpy_ok = 1;
     Display *dpy;
 
+#ifdef USE_GTK
+    if (! egetenv ("DBUS_SESSION_BUS_ADDRESS"))
+      {
+	fprintf (stderr, "DBUS_SESSION_BUS_ADDRESS unset, so Gtk is unsafe\n");
+	return 0;
+      }
+#endif
+
     dpy = XOpenDisplay (display);
     if (dpy)
       XCloseDisplay (dpy);








* bug#14474: 24.3.50; Zombie subprocesses (again)
  2013-05-27  1:36     ` Paul Eggert
@ 2013-05-27 12:46       ` Colin Walters
  2013-05-27 17:36         ` Paul Eggert
  2013-06-01  1:03         ` Paul Eggert
  0 siblings, 2 replies; 20+ messages in thread
From: Colin Walters @ 2013-05-27 12:46 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Michael Heerdegen, 14474

On Sun, 2013-05-26 at 18:36 -0700, Paul Eggert wrote:

> but that's already the case in the Emacs trunk.
> Is there another limitation that we
> didn't know about, a limitation that says Emacs can't
> have signal handlers either?

Basically it's going to be very hard over time to avoid codepaths
in the GTK+ stack that call g_spawn_*() indirectly, thus
installing a SIGCHLD handler, particularly due to the pluggable nature of
Gio.

> I'll CC: this to Colin Walters since he seemed to have
> a good handle on the situation from the glib point of view; see
> <https://bugzilla.gnome.org/show_bug.cgi?id=676167>.

Yeah, I don't think much has changed since then.

> One possibility is to see if we can get Emacs to use
> glib's child watcher.

That'd be best obviously.

>   But that's a bit of a delicate balance,
> since Emacs must work even when gtk is absent,

Bear in mind that GLib is usable without gtk.  Even if you don't
have an X connection, if the GLib mainloop is linked into the process,
I don't see a reason not to use it.

>  and it may need
> to hand off from its own watcher to glib's watcher, and processes
> shouldn't get lost during the handoff. 

Would Emacs really be spawning processes before initializing
the frontend?

> A simpler but hacky workaround is to not use the graphical interface if
> DBUS_SESSION_BUS_ADDRESS is unset.

I don't see a real problem with that as a temporary thing.

Anyways, if there is something I can do GLib side, let me know.








* bug#14474: 24.3.50; Zombie subprocesses (again)
  2013-05-27 12:46       ` Colin Walters
@ 2013-05-27 17:36         ` Paul Eggert
  2013-05-28 16:56           ` Paul Eggert
  2013-05-28 17:04           ` Jan Djärv
  2013-06-01  1:03         ` Paul Eggert
  1 sibling, 2 replies; 20+ messages in thread
From: Paul Eggert @ 2013-05-27 17:36 UTC (permalink / raw)
  To: Colin Walters; +Cc: Michael Heerdegen, 14474

[The context is
http://bugs.gnu.org/14474
]

On 05/27/2013 05:46 AM, Colin Walters wrote:

> Basically it's going to be very hard over time to avoid codepaths
> in the GTK+ stack that call g_spawn_*() indirectly, thus
> installing a SIGCHLD handler

Thanks.  In that case, shouldn't the glib documentation be
changed to warn application developers not to install a SIGCHLD
handler as well?  Currently it warns them only to not call
waitpid(-1, ...).

Are application developers allowed to temporarily mask SIGCHLD?
Emacs does that a lot.

>> One possibility is to see if we can get Emacs to use
>> > glib's child watcher.
> That'd be best obviously.

I suspect so too, but it requires more expertise in
glib than I have (which is, basically, nothing).
If I understand things correctly, if Emacs is using
Gtk it should

 * never call sigaction (SIGCHLD, ...) or signal (SIGCHLD, ...)
   or waitpid (-1, ...).
   E.g., remove the current call to sigaction (SIGCHLD, ...),
   in src/process.c's init_process_emacs.
   
 * Whenever Emacs creates a child process, use the
   following pattern:

       block SIGCHLD;
       pid = vfork ();
       if (pid > 0)
         {
           record pid in Emacs's process table, as location 'loc';
           record in *loc that glib is watching this pid;
           g_child_watch_add (pid, watcher, loc);
         }
       unblock SIGCHLD;

  * never call waitpid (pid, ...) if PID is recorded
    in Emacs's process table as something that glib is
    watching.

  * Add a glue function ("watcher", above) that does
    something like this:

      void watcher (GPid pid, gint status, gpointer loc) {
	block SIGCHLD
        record that PID exited with status STATUS, by modifying *LOC,
	  sort of like what's currently done in handle_child_signal;
        if (input_available_clear_time)
	  *input_available_clear_time = make_emacs_time (0, 0);
        unblock SIGCHLD
     }

But this sounds incomplete.  No doubt there's something
about the main loop, or setting up the watchers, that I don't
know about.  E.g., how does one remove the watcher once it
has fired and told us that the process has exited?

I'll CC: this to Jan Djärv, who knows about gtk, to
see if he can help.
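
The "block SIGCHLD around the fork" step in the pattern above can be
sketched with POSIX calls alone.  This is not the glib-based design
under discussion: it uses fork instead of vfork, tracks a single child
in hypothetical watched_* variables rather than a process table, and
reaps with waitpid (pid, ...) on the known pid, never waitpid (-1, ...).
The point it illustrates is that the handler cannot run before the pid
has been recorded:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile pid_t watched_pid;          /* the one child being tracked */
static volatile int watched_status;         /* its wait status once reaped */
static volatile sig_atomic_t watched_done;  /* nonzero after it is reaped */

static void on_sigchld(int sig)
{
  (void) sig;
  int saved_errno = errno;
  int status;
  pid_t pid = watched_pid;
  /* Reap only the pid we know about; never waitpid (-1, ...).  */
  if (pid > 0 && !watched_done && waitpid(pid, &status, WNOHANG) == pid)
    {
      watched_status = status;
      watched_done = 1;
    }
  errno = saved_errno;
}

/* Fork a child that exits with EXIT_CODE; return the code the handler
   recorded, or -1 on failure.  */
int spawn_and_collect(int exit_code)
{
  struct sigaction sa, old_sa;
  sigset_t block, old_mask, wait_mask;

  memset(&sa, 0, sizeof sa);
  sa.sa_handler = on_sigchld;
  sigemptyset(&sa.sa_mask);
  int rc = sigaction(SIGCHLD, &sa, &old_sa);
  assert(rc == 0);
  (void) rc;

  sigemptyset(&block);
  sigaddset(&block, SIGCHLD);
  sigprocmask(SIG_BLOCK, &block, &old_mask);   /* block SIGCHLD */

  watched_done = 0;
  watched_pid = 0;
  pid_t pid = fork();
  if (pid == 0)
    _exit(exit_code);
  if (pid > 0)
    {
      watched_pid = pid;                       /* record before unblocking */
      wait_mask = old_mask;
      sigdelset(&wait_mask, SIGCHLD);
      while (!watched_done)
        sigsuspend(&wait_mask);                /* atomically unblock and wait */
    }

  sigprocmask(SIG_SETMASK, &old_mask, NULL);   /* unblock SIGCHLD */
  sigaction(SIGCHLD, &old_sa, NULL);
  if (pid < 0 || !WIFEXITED(watched_status))
    return -1;
  return WEXITSTATUS(watched_status);
}
```

Without the sigprocmask call, a child that exits quickly could deliver
SIGCHLD before watched_pid is set, and the handler would ignore it.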








* bug#14474: 24.3.50; Zombie subprocesses (again)
  2013-05-27 17:36         ` Paul Eggert
@ 2013-05-28 16:56           ` Paul Eggert
  2013-05-28 20:42             ` Michael Heerdegen
  2013-06-04 17:12             ` Michael Heerdegen
  2013-05-28 17:04           ` Jan Djärv
  1 sibling, 2 replies; 20+ messages in thread
From: Paul Eggert @ 2013-05-28 16:56 UTC (permalink / raw)
  To: Michael Heerdegen; +Cc: 14474

In <http://lists.gnu.org/archive/html/emacs-devel/2013-05/msg00628.html>
something like the following milder workaround was suggested instead.
Michael, does this patch work around the bug for your test case?

=== modified file 'src/xterm.c'
--- src/xterm.c	2013-05-09 14:49:56 +0000
+++ src/xterm.c	2013-05-28 16:34:44 +0000
@@ -9897,6 +9897,13 @@ x_term_init (Lisp_Object display_name, c
 
         XSetLocaleModifiers ("");
 
+	/* If D-Bus is not already configured, inhibit D-Bus autolaunch,
+	   as autolaunch can mess up Emacs's SIGCHLD handler.
+	   FIXME: Rewrite subprocess handlers to use glib's child watchers.
+	   See Bug#14474.  */
+	if (! egetenv ("DBUS_SESSION_BUS_ADDRESS"))
+	  xputenv ("DBUS_SESSION_BUS_ADDRESS=");
+
         /* Emacs can only handle core input events, so make sure
            Gtk doesn't use Xinput or Xinput2 extensions.  */
 	xputenv ("GDK_CORE_DEVICE_EVENTS=1");
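
Outside Emacs, the same check can be sketched with plain libc calls.
The function name inhibit_dbus_autolaunch is hypothetical, and getenv
and setenv stand in for Emacs's egetenv and xputenv.  Whether an empty
value really suppresses autolaunch is the patch's premise, not
something this sketch verifies:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* If DBUS_SESSION_BUS_ADDRESS is unset, set it to an empty placeholder.
   Return 1 if the placeholder was installed, 0 if an existing value was
   left alone, -1 if setenv failed.  */
int inhibit_dbus_autolaunch(void)
{
  if (getenv("DBUS_SESSION_BUS_ADDRESS") != NULL)
    return 0;   /* already configured; do not touch it */
  return setenv("DBUS_SESSION_BUS_ADDRESS", "", 1) == 0 ? 1 : -1;
}
```

The call has to happen before any library reads the variable, which is
why the patch places it early in x_term_init.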






* bug#14474: 24.3.50; Zombie subprocesses (again)
  2013-05-27 17:36         ` Paul Eggert
  2013-05-28 16:56           ` Paul Eggert
@ 2013-05-28 17:04           ` Jan Djärv
  1 sibling, 0 replies; 20+ messages in thread
From: Jan Djärv @ 2013-05-28 17:04 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Michael Heerdegen, Colin Walters, 14474

Hello.

27 maj 2013 kl. 19:36 skrev Paul Eggert <eggert@cs.ucla.edu>:

> [The context is
> http://bugs.gnu.org/14474
> ]
> 
> On 05/27/2013 05:46 AM, Colin Walters wrote:
> 
>> Basically it's going to be very hard over time to avoid codepaths
>> in the GTK+ stack that call g_spawn_*() indirectly, thus
>> installing a SIGCHLD handler
> 
> Thanks.  In that case, shouldn't the glib documentation be
> changed to warn application developers not to install a SIGCHLD
> handler as well?  Currently it warns them only to not call
> waitpid(-1, ...).
> 
> Are application developers allowed to temporarily mask SIGCHLD?
> Emacs does that a lot.
> 
>>> One possibility is to see if we can get Emacs to use
>>>> glib's child watcher.
>> That'd be best obviously.
> 

> I suspect so too, but it requires more expertise in
> glib than I have (which is, basically, nothing).
> If I understand things correctly, if Emacs is using
> Gtk it should
> 

Actually GLib is linked in whenever one of GSettings, GConf, Gtk or rsvg is used.
I see that the rsvg-only case is not handled in xgselect.c, an oversight.


> * never call sigaction (SIGCHLD, ...) or signal (SIGCHLD, ...)
>   or waitpid (-1, ...).
>   E.g., remove the current call to sigaction (SIGCHLD, ...),
>   in src/process.c's init_process_emacs.
> 
> * Whenever Emacs creates a child process, use the
>   following pattern:
> 
>       block SIGCHLD;
>       pid = vfork ();
>       if (pid > 0)
>         {
>           record pid in Emacs's process table, as location 'loc';
>           record in *loc that glib is watching this pid;
>           g_child_watch_add (pid, watcher, loc);
>         }
>       unblock SIGCHLD;
> 
>  * never call waitpid (pid, ...) if PID is recorded
>    in Emacs's process table as something that glib is
>    watching.
> 
>  * Add a glue function ("watcher", above) that does
>    something like this:
> 
>      void watcher (GPid pid, gint status, gpointer loc) {
> 	block SIGCHLD
>        record that PID exited with status STATUS, by modifying *LOC,
> 	  sort of like what's currently done in handle_child_signal;
>        if (input_available_clear_time)
> 	  *input_available_clear_time = make_emacs_time (0, 0);
>        unblock SIGCHLD
>     }
> 
> But this sounds incomplete.  No doubt there's something
> about the main loop, or setting up the watchers, that I don't
> know about.  E.g., how does one remove the watcher once it
> has fired and told us that the process has exited?
> 

Keep track of the return value from g_child_watch_add and pass it to g_source_remove.
I think g_source_remove can be called in the callback function.

We kind of use GLib's main loop in xgselect.c, so child watches should be called from there.
As GLib's main loop is an "all or nothing" approach, we could also move the file descriptor and timeout handling there.  Then xgselect.c could more or less go away.  But there is no real gain in doing that; xgselect works well enough.

	Jan D.







* bug#14474: 24.3.50; Zombie subprocesses (again)
  2013-05-28 16:56           ` Paul Eggert
@ 2013-05-28 20:42             ` Michael Heerdegen
  2013-06-04 17:12             ` Michael Heerdegen
  1 sibling, 0 replies; 20+ messages in thread
From: Michael Heerdegen @ 2013-05-28 20:42 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 14474

Paul Eggert <eggert@cs.ucla.edu> writes:

> In <http://lists.gnu.org/archive/html/emacs-devel/2013-05/msg00628.html>
> something like the following milder workaround was suggested instead.
> Michael, does this patch work around the bug for your test case?

Thanks for that, but I currently use a precompiled package for my OS
(emacs-snapshot), so I can neither debug C nor test patches.

It would be great if someone else that can reproduce this bug could try
that.  If not, I'll try to build Emacs myself in the next days, hoping
that the problem manifests there, too.


Regards,

Michael.






* bug#14474: 24.3.50; Zombie subprocesses (again)
  2013-05-27 12:46       ` Colin Walters
  2013-05-27 17:36         ` Paul Eggert
@ 2013-06-01  1:03         ` Paul Eggert
  2013-06-01  1:22           ` Colin Walters
  1 sibling, 1 reply; 20+ messages in thread
From: Paul Eggert @ 2013-06-01  1:03 UTC (permalink / raw)
  To: Colin Walters; +Cc: Michael Heerdegen, Michael Albinus, 14474

On 05/27/2013 05:46 AM, Colin Walters wrote:
>> One possibility is to see if we can get Emacs to use
>> > glib's child watcher.
> That'd be best obviously.

I looked into this a bit, and found a problem.
Emacs wants to be notified about child processes
that are stopped, so it invokes waitpid with the
WUNTRACED option, but glib never uses WUNTRACED
when invoking waitpid.  If Emacs used glib to watch for
child processes, Emacs would not be informed about
a child process changing state because it has
stopped.  (Similarly for WCONTINUED and processes
that have been continued.)

Perhaps glib needs a new function, which lets the
caller specify additional options to be given to
waitpid?  Something like this, say:

   g_child_watch_source_new_full (pid, WUNTRACED | WCONTINUED)

Then, g_child_watch_source_new (pid) would be equivalent to
g_child_watch_source_new_full (pid, 0).
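
For reference, here is a small POSIX-only sketch of what the WUNTRACED
and WCONTINUED flags make visible.  It does not use glib; the function
name observe_child_states and the SAW_* constants are hypothetical.
The child stops itself, the parent continues it and then kills it, and
each state change is picked up by a waitpid call carrying the matching
flag:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum { SAW_STOP = 1, SAW_CONTINUE = 2, SAW_TERMINATION = 4 };

/* Return a bitmask of the child state changes waitpid reported,
   or -1 if fork failed.  */
int observe_child_states(void)
{
  int seen = 0;
  int wstatus;

  signal(SIGCHLD, SIG_DFL);   /* make sure children are waitable */
  pid_t pid = fork();
  if (pid < 0)
    return -1;
  if (pid == 0)
    {
      raise(SIGSTOP);         /* stop ourselves */
      for (;;)
        pause();              /* after SIGCONT, idle until killed */
    }

  /* Without WUNTRACED this call would not report the stop.  */
  if (waitpid(pid, &wstatus, WUNTRACED) == pid && WIFSTOPPED(wstatus))
    seen |= SAW_STOP;

  kill(pid, SIGCONT);
  if (waitpid(pid, &wstatus, WCONTINUED) == pid && WIFCONTINUED(wstatus))
    seen |= SAW_CONTINUE;

  kill(pid, SIGKILL);
  if (waitpid(pid, &wstatus, 0) == pid && WIFSIGNALED(wstatus))
    seen |= SAW_TERMINATION;

  return seen;
}
```

A watcher that always passes 0 as the options argument would only ever
report the last of these three events.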






* bug#14474: 24.3.50; Zombie subprocesses (again)
  2013-06-01  1:03         ` Paul Eggert
@ 2013-06-01  1:22           ` Colin Walters
  2013-06-01  6:14             ` Paul Eggert
  0 siblings, 1 reply; 20+ messages in thread
From: Colin Walters @ 2013-06-01  1:22 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Michael Heerdegen, Michael Albinus, 14474

On Fri, 2013-05-31 at 18:03 -0700, Paul Eggert wrote:
> On 05/27/2013 05:46 AM, Colin Walters wrote:
> >> One possibility is to see if we can get Emacs to use
> >> > glib's child watcher.
> > That'd be best obviously.
> 
> I looked into this a bit, and found a problem.
> Emacs wants to be notified about child processes
> that are stopped, 

Why, out of curiosity?

>    g_child_watch_source_new_full (pid, WUNTRACED | WCONTINUED)

We could add that to glib-unix.h probably, yeah.








* bug#14474: 24.3.50; Zombie subprocesses (again)
  2013-06-01  1:22           ` Colin Walters
@ 2013-06-01  6:14             ` Paul Eggert
  2013-06-01 14:33               ` Stefan Monnier
  0 siblings, 1 reply; 20+ messages in thread
From: Paul Eggert @ 2013-06-01  6:14 UTC (permalink / raw)
  To: Colin Walters; +Cc: Michael Heerdegen, Michael Albinus, 14474

On 05/31/2013 06:22 PM, Colin Walters wrote:
> Why, out of curiosity?

Emacs has a function process-status that returns
a process's status.  Possible statuses include

run  -- for a process that is running.
stop -- for a process stopped but continuable.
exit -- for a process that has exited.
signal -- for a process that has got a fatal signal.

To implement this, Emacs keeps track, for each of its
child processes, of what that process's status is.
Emacs updates the information that it records about
a child process whenever it's notified about
a child process status change.
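
As a rough illustration, the four statuses listed above correspond to
the POSIX wait-status macros roughly as follows.  This is a sketch
under that assumption, not Emacs's actual implementation; the function
names status_name and name_of_child_fate are hypothetical:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map a status word from waitpid to one of the names listed above.  */
const char *status_name(int wstatus)
{
  if (WIFSTOPPED(wstatus))
    return "stop";
  if (WIFEXITED(wstatus))
    return "exit";
  if (WIFSIGNALED(wstatus))
    return "signal";
  return "run";   /* e.g. a WCONTINUED report */
}

/* Fork a child that exits normally when SIG is 0 and otherwise kills
   itself with SIG; return the name of the status the parent observes.  */
const char *name_of_child_fate(int sig)
{
  int wstatus;

  signal(SIGCHLD, SIG_DFL);   /* make sure children are waitable */
  pid_t pid = fork();
  if (pid < 0)
    return "error";
  if (pid == 0)
    {
      if (sig != 0)
        raise(sig);
      _exit(0);
    }
  if (waitpid(pid, &wstatus, 0) != pid)
    return "error";
  return status_name(wstatus);
}
```

The "stop" branch can only be reached if the status came from a
waitpid call that passed WUNTRACED.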






* bug#14474: 24.3.50; Zombie subprocesses (again)
  2013-06-01  6:14             ` Paul Eggert
@ 2013-06-01 14:33               ` Stefan Monnier
  2013-06-03 16:09                 ` Colin Walters
  0 siblings, 1 reply; 20+ messages in thread
From: Stefan Monnier @ 2013-06-01 14:33 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Michael Heerdegen, Colin Walters, Michael Albinus, 14474

> Emacs has a function process-status that returns
> a process's status.

Not only that, but the process-sentinel is called when the status
changes.  This said, I don't know if there are any process-sentinels out
there that need to be told when a process is stopped or "continued".


        Stefan






* bug#14474: 24.3.50; Zombie subprocesses (again)
  2013-06-01 14:33               ` Stefan Monnier
@ 2013-06-03 16:09                 ` Colin Walters
  2013-06-04  7:20                   ` Paul Eggert
  0 siblings, 1 reply; 20+ messages in thread
From: Colin Walters @ 2013-06-03 16:09 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Michael Heerdegen, Paul Eggert, Michael Albinus, 14474

On Sat, 2013-06-01 at 10:33 -0400, Stefan Monnier wrote:
> > Emacs has a function process-status that returns
> > a process's status.
> 
> Not only that, but the process-sentinel is called when the status
> changes.  This said, I don't know if there are any process-sentinels out
> there that need to be told when a process is stopped or "continued".

Right; I kind of doubt it.  Regardless though, I filed:

https://bugzilla.gnome.org/show_bug.cgi?id=701538

Are there any other blocking issues for Emacs using the GLib mainloop?
If that's the only one I can probably get around to doing a patch this
week.

I suspect though you could simply not report stopped status, and not
break any real world programs.  The only thing I can think of is a
multiprocess application which sends SIGSTOP to children (but why would
they do that?).








* bug#14474: 24.3.50; Zombie subprocesses (again)
  2013-06-03 16:09                 ` Colin Walters
@ 2013-06-04  7:20                   ` Paul Eggert
  2013-06-05 17:21                     ` Paul Eggert
  0 siblings, 1 reply; 20+ messages in thread
From: Paul Eggert @ 2013-06-04  7:20 UTC (permalink / raw)
  To: Colin Walters; +Cc: Michael Heerdegen, Michael Albinus, 14474

On 06/03/2013 09:09 AM, Colin Walters wrote:
> Are there any other blocking issues for Emacs using the GLib mainloop?
> If that's the only one I can probably get around to doing a patch this
> week.

Don't know of any.  But I haven't implemented it yet.

If it's the only problem, perhaps the Emacs code should be written
to run on older glibs, where it'll ignore child-process stops and continues.
If this turns out to be a real problem we can disable it (i.e., use the
current godawful workaround) on older glibs.  But anyway, the idea is
to prevent this from being a blocking issue for Emacs.






* bug#14474: 24.3.50; Zombie subprocesses (again)
  2013-05-28 16:56           ` Paul Eggert
  2013-05-28 20:42             ` Michael Heerdegen
@ 2013-06-04 17:12             ` Michael Heerdegen
  2020-09-09 13:52               ` Lars Ingebrigtsen
  1 sibling, 1 reply; 20+ messages in thread
From: Michael Heerdegen @ 2013-06-04 17:12 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 14474

Hi Paul,

> In <http://lists.gnu.org/archive/html/emacs-devel/2013-05/msg00628.html>
> something like the following milder workaround was suggested instead.
> Michael, does this patch work around the bug for your test case?

Have you already installed it to trunk?  The issue is fixed for me after
upgrading my emacs-snapshot to

(emacs-version) ==>

GNU Emacs 24.3.50.1 (x86_64-pc-linux-gnu, GTK+ Version 3.8.2)
of 2013-06-03 on dex, modified by Debian


Thanks,

Michael.

>
> === modified file 'src/xterm.c'
> --- src/xterm.c	2013-05-09 14:49:56 +0000
> +++ src/xterm.c	2013-05-28 16:34:44 +0000
> @@ -9897,6 +9897,13 @@ x_term_init (Lisp_Object display_name, c
>  
>          XSetLocaleModifiers ("");
>  
> +	/* If D-Bus is not already configured, inhibit D-Bus autolaunch,
> +	   as autolaunch can mess up Emacs's SIGCHLD handler.
> +	   FIXME: Rewrite subprocess handlers to use glib's child watchers.
> +	   See Bug#14474.  */
> +	if (! egetenv ("DBUS_SESSION_BUS_ADDRESS"))
> +	  xputenv ("DBUS_SESSION_BUS_ADDRESS=");
> +
>          /* Emacs can only handle core input events, so make sure
>             Gtk doesn't use Xinput or Xinput2 extensions.  */
>  	xputenv ("GDK_CORE_DEVICE_EVENTS=1");






* bug#14474: 24.3.50; Zombie subprocesses (again)
  2013-06-04  7:20                   ` Paul Eggert
@ 2013-06-05 17:21                     ` Paul Eggert
  0 siblings, 0 replies; 20+ messages in thread
From: Paul Eggert @ 2013-06-05 17:21 UTC (permalink / raw)
  To: Colin Walters; +Cc: Michael Heerdegen, Michael Albinus, 14474

I found another problem with trying to have Emacs use glib's child watcher.
glib's signal handling code uses SA_RESTART and SA_NOCLDSTOP.
Both flags are non-starters for Emacs.  SA_NOCLDSTOP, I suppose,
could be conditionalized based on the discussion in Gnome bug
reports 701538 and 562501.  But SA_RESTART is more of a worry.
An interactive Emacs doesn't want SA_RESTART, because Emacs wants
long-running syscalls to be interrupted after a signal, not
restarted.
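
The difference SA_RESTART makes can be seen in a small POSIX sketch.
The names read_across_signal and on_alarm are hypothetical, and SIGALRM
is used instead of SIGCHLD only to keep the example short.  A blocking
read on an empty pipe is interrupted by a timer signal whose handler
writes one byte to the pipe.  With sa_flags 0 the read is expected to
fail with EINTR; with SA_RESTART it is expected to be restarted and
return the byte:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static int wake_fd = -1;   /* write end of the pipe, used by the handler */

static void on_alarm(int sig)
{
  (void) sig;
  int saved_errno = errno;
  char byte = 'x';
  if (write(wake_fd, &byte, 1) < 0)
    {
      /* Nothing useful can be done from a signal handler.  */
    }
  errno = saved_errno;
}

/* Return 1 if the read was restarted and returned data, 0 if it failed
   with EINTR, -1 on any other outcome.  */
int read_across_signal(int sa_flags)
{
  int fds[2];
  struct sigaction sa, old_sa;
  struct itimerval timer;
  char buf;

  if (pipe(fds) != 0)
    return -1;
  wake_fd = fds[1];

  memset(&sa, 0, sizeof sa);
  sa.sa_handler = on_alarm;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = sa_flags;
  int rc = sigaction(SIGALRM, &sa, &old_sa);
  assert(rc == 0);
  (void) rc;

  /* One-shot timer; assumes read is already blocked 100 ms from now.  */
  memset(&timer, 0, sizeof timer);
  timer.it_value.tv_usec = 100000;
  setitimer(ITIMER_REAL, &timer, NULL);

  ssize_t n = read(fds[0], &buf, 1);
  int result = n == 1 ? 1 : (n < 0 && errno == EINTR ? 0 : -1);

  sigaction(SIGALRM, &old_sa, NULL);
  close(fds[0]);
  close(fds[1]);
  return result;
}
```

An interactive program that wants a signal to break out of a long
system call needs the EINTR behaviour, which is the concern raised
above.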

I thought of a way to work around this problem: have Emacs catch
SIGCHLD using its own flags, and call glib's SIGCHLD handler as part
of Emacs's SIGCHLD handler.  So I installed the patch quoted at the
end of this message into the Emacs trunk as bzr 112859.  If you've
had D-bus problems please try this new approach.
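
The chaining idea can be sketched in isolation as follows.  This is a
simplified POSIX-only illustration, not the installed patch:
lib_handler stands in for a handler that a library such as glib has
already installed, and all names are hypothetical.  The application
installs its own handler, keeps the previous one, and calls it at the
end of its own handling:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t lib_calls;
static volatile sig_atomic_t app_calls;
static void (*volatile saved_lib_handler)(int);   /* handler to chain to */

static void lib_handler(int sig) { (void) sig; lib_calls++; }

static void app_handler(int sig)
{
  app_calls++;                    /* the application's own bookkeeping */
  if (saved_lib_handler)
    saved_lib_handler(sig);       /* then let the library see the signal */
}

/* Return 1 if one SIGCHLD reached both handlers.  */
int chain_demo(void)
{
  struct sigaction lib, app, old, dfl;
  int rc;

  lib_calls = 0;
  app_calls = 0;
  saved_lib_handler = NULL;

  memset(&lib, 0, sizeof lib);
  lib.sa_handler = lib_handler;
  sigemptyset(&lib.sa_mask);
  rc = sigaction(SIGCHLD, &lib, NULL);    /* the library goes first */
  assert(rc == 0);

  memset(&app, 0, sizeof app);
  app.sa_handler = app_handler;
  sigemptyset(&app.sa_mask);
  rc = sigaction(SIGCHLD, &app, &old);    /* the application takes over */
  assert(rc == 0);
  (void) rc;

  /* Chain only to a plain (non-SA_SIGINFO) handler that is a real function.  */
  if (!(old.sa_flags & SA_SIGINFO)
      && old.sa_handler != SIG_DFL && old.sa_handler != SIG_IGN)
    saved_lib_handler = old.sa_handler;

  raise(SIGCHLD);                         /* assumes SIGCHLD is not blocked */

  memset(&dfl, 0, sizeof dfl);
  dfl.sa_handler = SIG_DFL;
  sigemptyset(&dfl.sa_mask);
  sigaction(SIGCHLD, &dfl, NULL);         /* restore the default */
  return app_calls == 1 && lib_calls == 1;
}
```

The order matters: chaining works only if the library's handler is
already installed when the application saves the old action.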

This raises three more questions for glib, though.  First,
why does glib use SA_RESTART?  If it's to avoid having application
syscalls fail with errno==EINTR, then we're OK.  But if it's to
avoid having glib's internal syscalls fail with errno==EINTR, then
we have a problem, as that can happen with the following patch
(and it can also happen with vanilla Emacs 24.3).

Second, should there be a more robust way for Emacs to invoke
glib's SIGCHLD handler?  The code below is a bit of a hack:
it uses g_source_unref (g_child_watch_source_new (0)) to
create and free a dummy SIGCHLD source, the only reason being
to trick glib into installing its SIGCHLD handler.  It also assumes
that glib does not use SA_SIGINFO.  This all seems fairly fragile.

Third, if a glib memory allocation fails, what does Emacs do?
Emacs tries hard not to exit when there's a memory allocation failure,
but I worry that glib will simply call 'exit' if malloc fails, which
is not good.

=== modified file 'src/ChangeLog'
--- src/ChangeLog	2013-06-05 12:17:02 +0000
+++ src/ChangeLog	2013-06-05 17:04:13 +0000
@@ -1,3 +1,17 @@
+2013-06-05  Paul Eggert  <eggert@cs.ucla.edu>
+
+	Chain glib's SIGCHLD handler from Emacs's (Bug#14474).
+	* process.c (dummy_handler): New function.
+	(lib_child_handler): New static var.
+	(handle_child_signal): Invoke it.
+	(catch_child_signal): If a library has set up a signal handler,
+	save it into lib_child_handler.
+	(init_process_emacs): If using glib and not on Windows, tickle glib's
+	child-handling code so that it initializes its private SIGCHLD handler.
+	* syssignal.h (SA_SIGINFO): Default to 0.
+	* xterm.c (x_term_init): Remove D-bus hack that I installed on May
+	31; it should no longer be needed now.
+
 2013-06-05  Michael Albinus  <michael.albinus@gmx.de>
 
 	* emacs.c (main) [HAVE_GFILENOTIFY]: Call globals_of_gfilenotify.

=== modified file 'src/process.c'
--- src/process.c	2013-06-03 18:47:35 +0000
+++ src/process.c	2013-06-05 17:04:13 +0000
@@ -6100,6 +6100,12 @@
    might inadvertently reap a GTK-created process that happened to
    have the same process ID.  */
 
+/* LIB_CHILD_HANDLER is a SIGCHLD handler that Emacs calls while doing
+   its own SIGCHLD handling.  On POSIXish systems, glib needs this to
+   keep track of its own children.  The default handler does nothing.  */
+static void dummy_handler (int sig) {}
+static signal_handler_t volatile lib_child_handler = dummy_handler;
+
 /* Handle a SIGCHLD signal by looking for known child processes of
    Emacs whose status have changed.  For each one found, record its
    new status.
@@ -6184,6 +6190,8 @@
 	    }
 	}
     }
+
+  lib_child_handler (sig);
 }
 
 static void
@@ -7035,9 +7043,13 @@
 void
 catch_child_signal (void)
 {
-  struct sigaction action;
+  struct sigaction action, old_action;
   emacs_sigaction_init (&action, deliver_child_signal);
-  sigaction (SIGCHLD, &action, 0);
+  sigaction (SIGCHLD, &action, &old_action);
+  eassert (! (old_action.sa_flags & SA_SIGINFO));
+  if (old_action.sa_handler != SIG_DFL && old_action.sa_handler != SIG_IGN
+      && old_action.sa_handler != deliver_child_signal)
+    lib_child_handler = old_action.sa_handler;
 }
 
 \f
@@ -7055,6 +7067,11 @@
   if (! noninteractive || initialized)
 #endif
     {
+#if defined HAVE_GLIB && !defined WINDOWSNT
+      /* Tickle glib's child-handling code so that it initializes its
+	 private SIGCHLD handler.  */
+      g_source_unref (g_child_watch_source_new (0));
+#endif
       catch_child_signal ();
     }
 

=== modified file 'src/syssignal.h'
--- src/syssignal.h	2013-01-02 16:13:04 +0000
+++ src/syssignal.h	2013-06-05 17:04:13 +0000
@@ -50,6 +50,10 @@
 # define NSIG NSIG_MINIMUM
 #endif
 
+#ifndef SA_SIGINFO
+# define SA_SIGINFO 0
+#endif
+
 #ifndef emacs_raise
 # define emacs_raise(sig) raise (sig)
 #endif

=== modified file 'src/xterm.c'
--- src/xterm.c	2013-05-31 01:41:52 +0000
+++ src/xterm.c	2013-06-05 17:04:13 +0000
@@ -9897,13 +9897,6 @@
 
         XSetLocaleModifiers ("");
 
-	/* If D-Bus is not already configured, inhibit D-Bus autolaunch,
-	   as autolaunch can mess up Emacs's SIGCHLD handler.
-	   FIXME: Rewrite subprocess handlers to use glib's child watchers.
-	   See Bug#14474.  */
-	if (! egetenv ("DBUS_SESSION_BUS_ADDRESS"))
-	  xputenv ("DBUS_SESSION_BUS_ADDRESS=unix:path=/dev/null");
-
         /* Emacs can only handle core input events, so make sure
            Gtk doesn't use Xinput or Xinput2 extensions.  */
 	xputenv ("GDK_CORE_DEVICE_EVENTS=1");
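
The chaining idea in the process.c hunks above can be tried in isolation.
What follows is a minimal standalone sketch, not the Emacs code: the names
library_handler, app_handler, chained_handler, install_library_handler and
install_chained_handler are hypothetical, and library_handler merely stands
in for whatever handler a library such as glib might have installed first.
Where the patch asserts that the old action does not use SA_SIGINFO, this
sketch instead restores the old action and reports failure; that is a
design choice of the sketch, not something the patch does.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Hypothetical counters standing in for real child-status bookkeeping.  */
static volatile sig_atomic_t lib_seen = 0;
static volatile sig_atomic_t app_seen = 0;

/* Stand-in for a SIGCHLD handler that a library installed earlier.  */
static void library_handler (int sig) { (void) sig; lib_seen++; }

/* Default chained handler: does nothing, so the call is always safe.  */
static void noop_handler (int sig) { (void) sig; }

/* The previously installed handler that app_handler chains to.  */
static void (*volatile chained_handler) (int) = noop_handler;

/* The application's handler: do our own work, then invoke the handler
   that was installed before ours.  */
static void app_handler (int sig)
{
  app_seen++;
  chained_handler (sig);
}

/* Pretend to be the library: install library_handler for SIGCHLD.  */
static int install_library_handler (void)
{
  struct sigaction sa;
  sigemptyset (&sa.sa_mask);
  sa.sa_flags = 0;
  sa.sa_handler = library_handler;
  return sigaction (SIGCHLD, &sa, NULL);
}

/* Install app_handler and remember any real handler it replaces.
   Return 0 on success, -1 if sigaction fails or if the old action
   uses SA_SIGINFO (a plain int-argument call could not chain to it).  */
static int install_chained_handler (void)
{
  struct sigaction sa, old;
  sigemptyset (&sa.sa_mask);
  sa.sa_flags = 0;
  sa.sa_handler = app_handler;
  if (sigaction (SIGCHLD, &sa, &old) != 0)
    return -1;
  if (old.sa_flags & SA_SIGINFO)
    {
      /* Put the old three-argument handler back and give up.  */
      sigaction (SIGCHLD, &old, NULL);
      return -1;
    }
  if (old.sa_handler != SIG_DFL && old.sa_handler != SIG_IGN
      && old.sa_handler != app_handler)
    chained_handler = old.sa_handler;
  return 0;
}
```

To exercise it, call install_library_handler, then install_chained_handler,
then raise (SIGCHLD); both counters should then have been incremented once,
which shows that the earlier handler still runs after being replaced.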








* bug#14474: 24.3.50; Zombie subprocesses (again)
  2013-06-04 17:12             ` Michael Heerdegen
@ 2020-09-09 13:52               ` Lars Ingebrigtsen
  0 siblings, 0 replies; 20+ messages in thread
From: Lars Ingebrigtsen @ 2020-09-09 13:52 UTC (permalink / raw)
  To: Michael Heerdegen; +Cc: Paul Eggert, 14474

Michael Heerdegen <michael_heerdegen@web.de> writes:

>> In <http://lists.gnu.org/archive/html/emacs-devel/2013-05/msg00628.html>
>> something like the following milder workaround was suggested instead.
>> Michael, does this patch work around the bug for your test case?
>
> Have you already installed it to trunk?  The issue is fixed for me after
> upgrading my emacs-snapshot to
>
> (emacs-version) ==>
>
> GNU Emacs 24.3.50.1 (x86_64-pc-linux-gnu, GTK+ Version 3.8.2)
> of 2013-06-03 on dex, modified by Debian

There was some followup talk here about other possible glib problems,
but it looks like Paul fixed those two?  (I just skimmed the patch,
which was applied at the time.)

So I'm closing this bug report; if there are any further issues here,
please respond to the debbugs address and we'll reopen.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






end of thread, other threads:[~2020-09-09 13:52 UTC | newest]

Thread overview: 20+ messages
2013-05-25 23:38 bug#14474: 24.3.50; Zombie subprocesses (again) Michael Heerdegen
2013-05-25 23:49 ` Michael Heerdegen
2013-05-26  2:55 ` Eli Zaretskii
2013-05-26 17:37 ` Paul Eggert
2013-05-26 18:33   ` Michael Heerdegen
2013-05-27  1:36     ` Paul Eggert
2013-05-27 12:46       ` Colin Walters
2013-05-27 17:36         ` Paul Eggert
2013-05-28 16:56           ` Paul Eggert
2013-05-28 20:42             ` Michael Heerdegen
2013-06-04 17:12             ` Michael Heerdegen
2020-09-09 13:52               ` Lars Ingebrigtsen
2013-05-28 17:04           ` Jan Djärv
2013-06-01  1:03         ` Paul Eggert
2013-06-01  1:22           ` Colin Walters
2013-06-01  6:14             ` Paul Eggert
2013-06-01 14:33               ` Stefan Monnier
2013-06-03 16:09                 ` Colin Walters
2013-06-04  7:20                   ` Paul Eggert
2013-06-05 17:21                     ` Paul Eggert
