From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#7728: 24.0.50; GDB backtrace from abort
Date: Thu, 13 Jan 2011 21:40:35 -0500
To: "Drew Adams"
Cc: 7728@debbugs.gnu.org

> From: "Drew Adams"
> Cc: , <7728@debbugs.gnu.org>
> Date: Thu, 13 Jan 2011 17:19:43 -0800
>
> > > In this case the `save-window-excursion' should amount to a
> > > no-op in the end.
> > > The source and target window and frame need
> > > not be the same in general, but they are the same in the
> > > crashes I reported.
> >
> > I don't believe this to be true, at least not from Emacs's internals
> > POV.  The code that crashes clearly executes the branch where the
> > frame recorded by save-window-excursion is NOT the selected frame by
> > the time the body of save-window-excursion is done being evaluated.
>
> As I said, I followed the _source_ code in the debugger.  And the source code
> does not cause a crash.  The source code lets us know what _should_ be happening
> here, not what is actually happening that provokes a crash.

Since you couldn't reproduce the crash under the Lisp debugger, the
evidence you collected during that debugging session is not really
admissible in the court of Emacs bugs ;-)

IOW, the backtrace you posted clearly shows that somehow,
save-window-excursion needed to switch frames, and its code that
restores the original window configuration therefore needed to select
a different frame.  That is a fact revealed by the C backtrace.

If we want to make sure that this frame switch is real, I would
suggest looking at the values of sf and w->frame in this fragment from
select-window:

  sf = SELECTED_FRAME ();
  if (XFRAME (WINDOW_FRAME (w)) != sf)
    {
      XFRAME (WINDOW_FRAME (w))->selected_window = window;
      /* Use this rather than Fhandle_switch_frame
         so that FRAME_FOCUS_FRAME is moved appropriately as we
         move around in the state where a minibuffer in a separate
         frame is active.  */
      Fselect_frame (WINDOW_FRAME (w), norecord);

For that, you need to reproduce the crash, then go to the call-stack
frame where select-window (Fselect_window) invokes select-frame
(Fselect_frame).
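To make the comparison in the select-window fragment concrete, here is a minimal, self-contained C sketch of just that branch.  Everything in it is hypothetical: `struct frame`, `struct window`, `selected_frame_sketch` and `select_window_sketch` are simplified stand-ins invented for illustration, not Emacs's real frame/window types or functions, and the call to Fselect_frame is modeled as a plain assignment.  It only shows that when the window's frame differs from the selected frame, the frame-switch branch (the one visible in the backtrace) is taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for Emacs's frame and
   window objects; the real structures are very different.  */
struct window;

struct frame
{
  const char *name;
  struct window *selected_window;  /* window last selected in this frame */
};

struct window
{
  struct frame *frame;  /* the frame this window belongs to */
};

/* Stand-in for the value that SELECTED_FRAME () returns.  */
static struct frame *selected_frame_sketch;

/* Mirrors only the shape of the quoted fragment: if the window lives
   on a frame other than the selected one, record it as that frame's
   selected window and switch frames.  Returns true when a frame
   switch happened, i.e. when the branch seen in the backtrace ran.  */
static bool
select_window_sketch (struct window *w)
{
  assert (w != NULL && w->frame != NULL);
  struct frame *sf = selected_frame_sketch;

  if (w->frame != sf)
    {
      w->frame->selected_window = w;
      /* Stand-in for Fselect_frame (WINDOW_FRAME (w), norecord).  */
      selected_frame_sketch = w->frame;
      return true;
    }
  return false;
}
```

For example, with frame A selected and a window that lives on frame B, the sketch takes the switch branch and leaves B selected; a second call with the same window does not switch.  In the real crash, `p sf->name` and `p w->frame` inspect the two sides of this comparison.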
In your original backtrace, this was frame #15:

  #12 0x01288ef3 in Fredirect_frame_focus (frame=93005829, focus_frame=93005829)
      at frame.c:2082
  #13 0x0127f4c8 in do_switch_frame (frame=93005829, track=1, for_deletion=0,
      norecord=49010714) at frame.c:847
  #14 0x01280733 in Fselect_frame (frame=93005829, norecord=49010714)
      at frame.c:899
  #15 0x01252702 in Fselect_window (window=93006853, norecord=49010714)
      at window.c:3581
  #16 0x0125e7c8 in Fset_window_configuration (configuration=99327941)
      at window.c:6148

So in that case, you would need to issue the following GDB commands:

  (gdb) frame 15
  (gdb) p sf->name
  (gdb) xstring
  (gdb) p w->frame
  (gdb) xframe

The 3rd and 5th commands will display the names of the two frames:
the one that's selected at this point, and the one to which the window
w (from the configuration being restored) belongs, respectively.  We
could then try to understand how come Emacs thinks it needs to switch
frames, while your analysis of the Lisp code suggests these two should
have specified the same frame.

(Note that frame #15 could have a different number in a different
crash, so look for the frame whose description is the same as what is
shown above, i.e. a call from Fselect_window to Fselect_frame, and use
the number of that frame.)

> > > * Let me repeat that the _source code works fine_ - no
> > > error, no crash, no bug.
> > >
> > > * Let me repeat too that the byte-compiled code (no matter
> > > which Emacs version it was compiled with) works fine in all
> > > Emacs versions except the current development code - no error,
> > > no crash, no bug.
> >
> > I don't think this to be relevant, sorry.
>
> Why?  The only thing new to the mix is the new Emacs dev version.  The source
> code and the byte-compiled code are the same as before.  The regression is not
> realized using the source code.  It happens only with the new dev version when
> it executes the byte code.  Why isn't that relevant?

Because we have the C backtrace (thanks to you).
And that backtrace speaks for itself.  There's nothing in it that
cannot be understood without invoking some non-trivial bug in the
compiled byte code.  So, while it's certainly possible that byte
compilation has some unwanted effect here, it sounds extremely
unlikely; it is certainly not the first explanation we should try.

> > I'm inclined to think that it's some weird side effect of
> > Edebug, or maybe something else.
>
> You think _what_ is a weird effect of edebug?

The fact that uncompiled code seems to avoid the crash.

> With the debugger there was no crash.  So it certainly cannot be
> some weird effect of the debugger that is causing the crash.

The way I see it, you had a crash with byte-compiled code without the
debugger, and you had no crash with uncompiled code under the
debugger.  Which of these two variables caused the difference in
behavior remains to be seen.

> > > This is a _regression_ due to some change in the development
> > > version that no longer plays well with the byte-compiled code.
> >
> > That's a possibility, but I think it's a remote one.
>
> Seems more like an inescapable conclusion, to me.  Substitute any other Emacs
> version and presto: no problem.  Substitute the source code for the byte code
> and presto: no problem.

To really convince me of this, you would have to run Emacs under GDB,
using the source Lisp code, step through all the functions involved in
the crash, and show that the crash is indeed avoided, and why.  If you
give me a reproducible recipe for the crash, I might try doing this
myself.

> > The offending code
>
> What offending code?

The code that sets to nil the internal variable which holds the
selected window.  That's what triggers the crash, because way down the
call-stack, Emacs tries to reference the mode-line face of the frame
held in that variable.

> What you see as offending code, if it was already in 21.1, did not present a
> problem - it wasn't offending anyone.
Ever heard of bugs that lurk and rear their ugly head years after they
were introduced?

> > has been in Emacs since v21.1, so the problem is not new in any way.
>
> Of course the problem is new.  It's a _regression_.

Only if the issue is looked at phenomenologically.  From my POV, this
bug was there for years.

> There is no such crash in any prior Emacs version.

But you have never before used any Emacs binary compiled with
ENABLE_CHECKING, have you?  Only such a version will crash, because it
does extra checking.

> You and I have different views of what "the problem"
> is, I guess.  For me, the problem is the crash.  That's new.

We can never fix the crash unless we understand what code causes it,
and why.  I explained here many messages ago why it crashes, and what
I found does not need to invoke any mysterious changes introduced by
the byte compiler to explain the crash.  It is crystal clear that,
under specific and well-defined circumstances, set-window-configuration
and any code that calls it, including save-window-excursion, can crash
in the same way if the window configuration being restored was
recorded in a different frame.  _That_ is the problem I'm trying to fix
in this bug.  While the crash in your specific use-case could indeed be
new (if it is explained by something other than the fact that you are
for the first time using a binary compiled with ENABLE_CHECKING), the
defect in the code that I found and described could cause crashes in
any number of other use-cases, which have nothing to do with
byte-compiling.  I'm trying to find a solution for all those
use-cases, not just for yours.

> > I think you interpret the latest messages incorrectly.  No one is
> > arguing that your code is the culprit.  The correct way to fix this
> > bug was pointed out by Stefan several messages ago, and I will do just
> > that when I have time.
>
> I did not understand that you have a solution.  I didn't get that impression
> from your asking me to check the selected window in the debugger etc.

I asked that to have more evidence to back up my analysis. It's never a bad idea to look for more evidence, because sometimes it can contradict the best hypothesis and change the whole picture.
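The ENABLE_CHECKING point above (only a checking build crashes, because it does extra checking) can be illustrated with a toy example.  The following C sketch is hypothetical: `CHECK_ASSERT`, `failed_checks`, `struct window` and `check_selected_window` are names invented here, not Emacs's actual checking macros or functions, and a failed check is counted and reported instead of aborting, so that the example runs to completion.  It shows only the general mechanism: the consistency check is present in the source, but it is compiled in only when ENABLE_CHECKING is defined, so an ordinary build never evaluates it.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical: model a build configured with checking enabled.
   Remove this line to model an ordinary build.  */
#define ENABLE_CHECKING 1

/* Number of consistency checks that have failed so far.  */
static int failed_checks;

#ifdef ENABLE_CHECKING
/* Checking build: evaluate the condition and report a failure.  A real
   checking build would abort at this point, producing a backtrace.  */
# define CHECK_ASSERT(cond)                                   \
  ((cond) ? (void) 0                                         \
          : (void) (failed_checks++,                         \
                    fprintf (stderr, "check failed: %s\n", #cond)))
#else
/* Ordinary build: the check compiles to nothing and is never run.  */
# define CHECK_ASSERT(cond) ((void) 0)
#endif

/* Hypothetical stand-in for a window object.  */
struct window
{
  int id;
};

/* Hypothetical stand-in for code that is about to use the selected
   window (for instance, to look up the mode-line face of its frame).
   Returns the number of failed checks so far.  */
static int
check_selected_window (const struct window *selected)
{
  CHECK_ASSERT (selected != NULL);
  return failed_checks;
}
```

With ENABLE_CHECKING defined, passing a valid window leaves the count at 0, while passing NULL (the "selected window set to nil" situation) trips the check; removing the `#define` makes the same call go unnoticed.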