From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Eli Zaretskii Newsgroups: gmane.emacs.bugs Subject: bug#12471: Avoid some signal-handling races, and simplify. Date: Thu, 20 Sep 2012 20:44:51 +0300 Message-ID: <837grozizw.fsf@gnu.org> References: <50590626.2070407@cs.ucla.edu> <83lig6yoim.fsf@gnu.org> <505AB299.9090605@cs.ucla.edu> Reply-To: Eli Zaretskii NNTP-Posting-Host: plane.gmane.org X-Trace: ger.gmane.org 1348163136 16238 80.91.229.3 (20 Sep 2012 17:45:36 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Thu, 20 Sep 2012 17:45:36 +0000 (UTC) Cc: 12471@debbugs.gnu.org, lekktu@gmail.com To: Paul Eggert Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Thu Sep 20 19:45:40 2012 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1TEkom-0001Ml-0l for geb-bug-gnu-emacs@m.gmane.org; Thu, 20 Sep 2012 19:45:40 +0200 Original-Received: from localhost ([::1]:56036 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1TEkoh-0007KY-L1 for geb-bug-gnu-emacs@m.gmane.org; Thu, 20 Sep 2012 13:45:35 -0400 Original-Received: from eggs.gnu.org ([208.118.235.92]:60759) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1TEkoe-0007KI-9w for bug-gnu-emacs@gnu.org; Thu, 20 Sep 2012 13:45:33 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1TEkoc-0001nU-LV for bug-gnu-emacs@gnu.org; Thu, 20 Sep 2012 13:45:32 -0400 Original-Received: from debbugs.gnu.org ([140.186.70.43]:35977) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1TEkoc-0001nP-Hf for bug-gnu-emacs@gnu.org; Thu, 20 Sep 2012 13:45:30 -0400 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.72) (envelope-from ) id 1TEkq6-00074J-3r for bug-gnu-emacs@gnu.org; Thu, 20 Sep 2012 13:47:02 -0400 X-Loop: help-debbugs@gnu.org Resent-From: Eli Zaretskii Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Thu, 20 Sep 2012 17:47:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 12471 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: patch Original-Received: via spool by 12471-submit@debbugs.gnu.org id=B12471.134816317527116 (code B ref 12471); Thu, 20 Sep 2012 17:47:01 +0000 Original-Received: (at 12471) by debbugs.gnu.org; 20 Sep 2012 17:46:15 +0000 Original-Received: from localhost ([127.0.0.1]:45523 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TEkpL-00073I-3e for submit@debbugs.gnu.org; Thu, 20 Sep 2012 13:46:15 -0400 Original-Received: from mtaout20.012.net.il ([80.179.55.166]:42745) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TEkpH-000738-AR for 12471@debbugs.gnu.org; Thu, 20 Sep 2012 13:46:13 -0400 Original-Received: from conversion-daemon.a-mtaout20.012.net.il by a-mtaout20.012.net.il (HyperSendmail v2007.08) id <0MAN00K00T9VY300@a-mtaout20.012.net.il> for 12471@debbugs.gnu.org; Thu, 20 Sep 2012 20:44:37 +0300 (IDT) Original-Received: from HOME-C4E4A596F7 ([87.69.4.28]) by a-mtaout20.012.net.il (HyperSendmail v2007.08) with ESMTPA id <0MAN00K1FTACWQ20@a-mtaout20.012.net.il>; Thu, 20 Sep 2012 20:44:37 +0300 (IDT) In-reply-to: <505AB299.9090605@cs.ucla.edu> X-012-Sender: halo1@inter.net.il X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 2) X-Received-From: 140.186.70.43 X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Xref: news.gmane.org gmane.emacs.bugs:64648 Archived-At: > Date: Wed, 19 Sep 2012 23:07:21 -0700 > From: Paul Eggert > CC: bug-gnu-emacs@gnu.org, lekktu@gmail.com > > On 09/19/2012 09:18 AM, Eli Zaretskii wrote: > > > Why is it a good idea to replace this macro with a function? > > The function makes it easier to control access to volatile variables, > that is, to access them in a minimal number of times to lessen the > number of race conditions. This requires a local variable. I don't understand. The only volatile variable that is involved is interrupt_input_blocked, which is a global variable. How does a function help in this regard? > > - kill (getpid (), fatal_error_code); > > + raise (fatal_error_code); > > > > If there are good reasons to prefer 'raise' to 'kill' in this context, > > On POSIX hosts, the change is helpful because it fixes some > race conditions. raise (SIGABRT) is equivalent to > pthread_kill (pthread_self (), SIGABRT), and we do want the > signal to be delivered to the main thread rather than to > some other thread (which 'kill' might do). In that case, can we use 'pthread_kill' as you show above, instead of 'raise'? I'd like to avoid redefining 'raise' on Windows; see below. > > can we please special-case SIGABRT here and call 'emacs_abort' > > directly, at least for hosts that cannot produce the backtrace? > > Otherwise, assertion violations will not end up in 'emacs_abort', > > AFAICS. > > Sorry, I'm not following. By "assertion violations" do you > mean eassert failure? If so, currently: > > eassert fails. > 'die' prints "Emacs fatal error" with some details. > fatal_error_backtrace resets SIGABRT to SIG_DFL > fatal_error_backtrace outputs a backtrace if available > fatal_error_backtrace unblocks SIGABRT > fatal_error_backtrace uses 'kill' with SIGABRT to terminate Emacs > > so emacs_abort is never called. Under the proposed patch, the > actions do basically the same thing: > > eassert fails > 'die' resets SIGABRT to SIG_DFL > 'die' prints "Emacs fatal error" with some details. > terminate_due_to_signal resets SIGABRT to SIG_DFL > terminate_due_to_signal outputs a backtrace if available > terminate_due_to_signal unblocks SIGABRT > terminate_due_to_signal uses 'raise' with SIGABRT to terminate Emacs > > emacs_abort is not called in either the old code or the new. It does, on Windows. (I thought other platforms shared that, but perhaps I was wrong.) On Windows, 'kill' is redefined to call 'emacs_abort', when its first arg is Emacs's own PID and the second arg is SIGABRT. (Originally, 'die' called 'abort', which on Windows was redirected to 'w32_abort'. Then calls to 'abort' were changed to call 'emacs_abort', and then 'w32_abort' was renamed to 'emacs_abort'. But then, when you made 'die' call fatal_error_backtrace, that made Emacs on Windows exit silently upon assertion violations. So I changed 'sys_kill' to call 'emacs_abort' on Windows.) If you leave 'raise' there, it will invoke 'abort' from the system library on Windows, which is not what we want. And I don't want to redefine yet another library function, if it can be avoided. Incidentally, I think this feature of producing a backtrace on a fatal error will be hated by both users and Emacs maintainers, but that's another subject for another thread. > > - handle_on_main_thread (sig, handle_interrupt_signal); > > + deliver_process_signal (sig, handle_interrupt_signal); > > } > > > > Do we really need this double indirection? > > Yes, because Gnome/gtk+ threads shouldn't execute handle_interrupt_signal > at the same time that the Emacs main thread is running. That signal > handler doesn't do an atomic action, and if its actions are done in > parallel with the Emacs main thread, bad things could happen. > There's commentary to this effect in deliver_process_signal. We are miscommunicating. I meant why we need a bunch of functions, called deliver_FOO_signal, whose body is a single line that calls handle_on_main_thread (now deliver_process_signal, which is even a more confusing name, IMO -- the previous one at least hinted at what it did). Then handle_on_main_thread does some thread-related magic and/or calls the _real_ handler for the signal. Thus "double indirection": instead of calling the handler, we call a function that calls another function, which calls the real handler. (And btw, some of these handlers are also one-liners: they call fatal_error_backtrace, now renamed to terminate_due_to_signal, yet another function. Can you say "triple indirection"?) This makes the code hard to understand, especially since these functions are spread all over in several different source files. Can we do better than that? Ideally, there should be a single function, the signal handler, which does everything. Then a given signal will have just two places to look up: the place where you call sigaction to set up the handler, and another one where the handler is implemented. As for the FORWARD_SIGNAL_TO_MAIN_THREAD stuff, you could make each signal handler invoke a function that does just this part. Or maybe it would work to set up things so that no non-main threads ever get any signals -- is there some pthread_* function that can setup a callback to run each time a thread is created? if so, perhaps we could use such a callback to block signals in each thread. If any of that can be done, I think the signal-related code will really become much easier to study, understand, and maintain. > > Why is IEEE_FLOATING_POINT, which detects the representation of FP > > numbers, related to SIGFPE? > > Most hosts nowadays use IEEE floating point, so they do not trap on > exceptions and don't need an exception handler. Well, SIGFPE could happen for reasons entirely unrelated to math calculations and functions. A hardware problem, for example. > The exception handler is problematic (it longjmps out of a signal > handler, which leads to some real problems) and should be omitted > unless it's really needed. Given that SIGFPE will not happen due to calculations, how about changing its handler to call fatal_error_backtrace, rather than signal a Lisp-level error?