From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: storm@cua.dk (Kim F. Storm)
Newsgroups: gmane.emacs.devel
Subject: Re: busyloop in sigchld_handler
Date: Wed, 14 Mar 2007 10:24:05 +0100
Message-ID:
References: <45F59395.4010708@gnu.org> <45F5A2B4.7090301@gnu.org>
	<85ejnumf1o.fsf@lola.goethe.zz> <868xe11tzu.fsf@lola.quinscape.zz>
	<85abyglrbs.fsf@lola.goethe.zz> <85tzwokb4x.fsf@lola.goethe.zz>
	<86tzwoxq1q.fsf@lola.quinscape.zz>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: sea.gmane.org 1173864215 4081 80.91.229.12 (14 Mar 2007 09:23:35 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Wed, 14 Mar 2007 09:23:35 +0000 (UTC)
Cc: Andreas Schwab, Sam Steingold, emacs-devel@gnu.org
To: David Kastrup
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Wed Mar 14 10:23:27 2007
Return-path:
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165])
	by lo.gmane.org with esmtp (Exim 4.50) id 1HRPhe-00042x-OB
	for ged-emacs-devel@m.gmane.org; Wed, 14 Mar 2007 10:23:27 +0100
Original-Received: from localhost ([127.0.0.1] helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1HRPiW-0000f4-AT
	for ged-emacs-devel@m.gmane.org; Wed, 14 Mar 2007 04:24:20 -0500
Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1HRPiT-0000eW-6j
	for emacs-devel@gnu.org; Wed, 14 Mar 2007 05:24:17 -0400
Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1HRPiS-0000d7-8u
	for emacs-devel@gnu.org; Wed, 14 Mar 2007 05:24:16 -0400
Original-Received: from [199.232.76.173] (helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1HRPiS-0000ct-5h
	for emacs-devel@gnu.org; Wed, 14 Mar 2007 04:24:16 -0500
Original-Received: from pfepb.post.tele.dk ([195.41.46.236])
	by monty-python.gnu.org with esmtp (Exim 4.60) (envelope-from )
	id 1HRPhQ-0002gy-Tr; Wed, 14 Mar 2007 05:23:13 -0400
Original-Received: from kfs-l.imdomain.dk.cua.dk (unknown [80.165.4.124])
	by pfepb.post.tele.dk (Postfix) with SMTP id 19457A5005B;
	Wed, 14 Mar 2007 10:23:05 +0100 (CET)
In-Reply-To: <86tzwoxq1q.fsf@lola.quinscape.zz> (David Kastrup's message of "Wed\, 14 Mar 2007 08\:06\:09 +0100")
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.0.95 (gnu/linux)
X-detected-kernel: Linux 2.6, seldom 2.4 (older, 4)
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:67905
Archived-At:

David Kastrup writes:

> Andreas Schwab writes:
>
>> David Kastrup writes:
>>
>>> The CPU is claimed by the process with the loop, so no other process
>>> may actually progress to a state which can be "wait"ed for.
>>
>> If there is no child to be waited for then there is no loop.
>
> In order to make sophistics solve the problem, you need to convince
> the kernel.

This happens in the sigchld handler - which is only invoked when there
is a dead child (zombie) to "wait3" for - so we should not have to wait
for the dead child to "really die".

In addition, we call wait3 with WNOHANG, so it is not supposed to block
if there are no dead children.

That is why Andreas and I can't really see where the busy loop can
happen.  But since the loop _is_ observed, it is important to understand
why it happens, rather than just install a "semi-random" patch which
fixes the problem without anybody being able to explain why.

Perhaps we need to ask a Linux kernel hacker?

Here's the code in condensed form:

    while (1)
      {
        while (1)
          {
            errno = 0;
            pid = wait3 (&w, WNOHANG | WUNTRACED, 0);
            if (! (pid < 0 && errno == EINTR))
              break;
            /* Avoid a busyloop: wait3 is a system call, so we do not want
               to prevent the kernel from actually sending SIGCHLD to emacs
               by asking for it all the time.  */
            sleep (1);
          }

        if (pid <= 0)
          return;

        /* handle death of child `pid' */
      }

So the problem is the interpretation of an EINTR error from
wait3 (..., WNOHANG, ...).

The Linux man page says:

    EINTR  if WNOHANG was not set and an unblocked signal or a SIGCHLD
           was caught.

So EINTR with WNOHANG set is not explained, but the usual meaning is
that the wait3 was interrupted by some other signal - and if there is a
loop, that signal is repeated somehow ...

However, with the test code I inserted into the sigchld handler, and
then executing M-x compile once after starting emacs -Q, it clearly
shows that:

a) the sigchld handler is entered exactly once,

b) the first wait3 returns immediately with the pid of the compile
   process,

c) the next wait3 returns immediately with 0, since there are no more
   processes to wait for.

So where's the busy loop?

The above code is the version for Linux - other variations of the code
are used for other platforms, but the OP said this was observed on a
GNU/Linux system.

Thinking more about it, I wonder why we use the WUNTRACED flag on wait3.
WUNTRACED means to also return for children which are stopped, and whose
status has not been reported.

Why do we care about stopped processes?

-- 
Kim F. Storm  http://www.cua.dk
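
For anyone who wants to repeat the observation outside Emacs, here is a
minimal standalone sketch of that kind of test.  It is not the
instrumentation that was put into Emacs: the counters and the names
count_sigchld and run_one_child are hypothetical, and it assumes a
process with no other children.  Note that in this setup the second
wait3 returns -1 with ECHILD rather than 0, because no children remain
at all; 0 is returned only when children exist but none has changed
state.

```c
#define _DEFAULT_SOURCE         /* makes glibc declare wait3 */
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical counters for this sketch; none of them exist in Emacs.  */
volatile sig_atomic_t handler_entries;  /* times the handler was entered */
volatile sig_atomic_t reaped;           /* children collected by wait3 */
volatile sig_atomic_t eintr_seen;       /* wait3 failures with EINTR */

/* SIGCHLD handler: same wait3 call as in the condensed loop, but it
   only counts what happens and never sleeps.  */
void
count_sigchld (int signo)
{
  int saved_errno = errno;

  (void) signo;
  handler_entries++;
  while (1)
    {
      int w;
      pid_t pid;

      errno = 0;
      pid = wait3 (&w, WNOHANG | WUNTRACED, NULL);
      if (pid < 0 && errno == EINTR)
        {
          /* The case under discussion; record it and stop retrying.  */
          eintr_seen++;
          break;
        }
      if (pid <= 0)
        break;                  /* 0, or -1 with ECHILD: nothing left */
      reaped++;
    }
  errno = saved_errno;
}

/* Fork one child that exits at once, wait until the handler has reaped
   it, and return how many times the handler was entered (-1 if fork
   failed).  */
int
run_one_child (void)
{
  struct sigaction sa;
  sigset_t block, old, wait_mask;
  pid_t child;
  int rc;

  handler_entries = reaped = eintr_seen = 0;

  memset (&sa, 0, sizeof sa);
  sa.sa_handler = count_sigchld;
  sigemptyset (&sa.sa_mask);
  sa.sa_flags = SA_RESTART;
  rc = sigaction (SIGCHLD, &sa, NULL);
  assert (rc == 0);
  (void) rc;

  /* Block SIGCHLD so it cannot be delivered before sigsuspend.  */
  sigemptyset (&block);
  sigaddset (&block, SIGCHLD);
  sigprocmask (SIG_BLOCK, &block, &old);
  wait_mask = old;
  sigdelset (&wait_mask, SIGCHLD);

  child = fork ();
  if (child < 0)
    {
      sigprocmask (SIG_SETMASK, &old, NULL);
      return -1;
    }
  if (child == 0)
    _exit (0);

  while (reaped == 0)
    sigsuspend (&wait_mask);    /* returns after a handler has run */

  sigprocmask (SIG_SETMASK, &old, NULL);
  return handler_entries;
}
```

Calling run_one_child () from a small main and printing the three
counters should show one handler entry, one reaped child and no EINTR
if the kernel behaves as expected; any other result would be worth
reporting together with the kernel version.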