From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Emacs crashes
Date: Wed, 15 Mar 2006 06:43:50 +0200
References: <17429.54459.803236.351040@kahikatea.snap.net.nz> <17431.11106.207260.301400@kahikatea.snap.net.nz>
Reply-To: Eli Zaretskii
Original-To: Nick Roberts
Cc: emacs-devel@gnu.org
In-reply-to: <17431.11106.207260.301400@kahikatea.snap.net.nz> (message from Nick Roberts on Wed, 15 Mar 2006 09:45:22 +1300)

> From: Nick Roberts
> Date: Wed, 15 Mar 2006 09:45:22 +1300
> Cc: emacs-devel@gnu.org
>
> > The fact that there are thousands of recursive calls to mark_object is
> > not in itself a sign of a problem. It is normal for the mark phase to
> > be deeply recursive.
>
> OK, I didn't know that. Perhaps I should look at the bottom of the backtrace
> (i.e low frame nos) instead of the top.

Actually, it's the other way around: you need to look at the frames
that call mark_object and its subroutines, and try to correlate those
frames with the contents of the last_marked[] array. From these two
pieces of evidence, you should be able to reconstruct the Lisp data
structure that is being marked (recursively) at the point of the crash.

Once the offending data structure is identified, i.e. you know the
name of the Lisp variable/function/whatever that was corrupted, the
next step is to try to figure out how it gets corrupted.

> (gdb) p last_marked_index
> $1 = 482
> (gdb) p last_marked[482]
> $2 = 173755437
> (gdb) xtype
> Lisp_Cons
> (gdb) xcons
> $3 = (struct Lisp_Cons *) 0xa5b4c28
> {
>   car = 0x83bc641,
>   u = {
>     cdr = 0x837b8c9,
>     chain = 0x837b8c9
>   }
> }
> (gdb) p last_marked[481]
> $4 = 167781611
> (gdb) xtype
> Lisp_String
> (gdb) xcons
> $5 = (struct Lisp_Cons *) 0xa0024e8
> {
>   car = 0x4,
>   u = {
>     cdr = 0xffffffff,
>     chain = 0xffffffff
>   }
> }
>
> These last addresses looks suspect

Yes.

> I don't know what to do next.

You need to go back in time ;-). Print the previous values in
last_marked[] and correlate them with the backtrace. In each frame of
the backtrace, you will see what kind of Lisp primitive data type is
being marked, but since some subroutines of mark_object have loops,
you won't see all the components being marked in the backtrace, so
last_marked[] will fill in the blanks.

For each Lisp object you find in last_marked[], try to establish its
type and name, and, if it's a string, its value. The name and the
string value are the most important parts, since you can then grep
the sources to find out what data structure it could belong to.
Continue doing this until you find a symbol that is a global or
buffer-local variable you can identify in the sources.

> Am I right to assume that 481 is the index of the very last marked
> object, 480 the one before etc. And that 482 is the index of the
> oldest marked object in the array held in a circular fashion?

Yes. You need to go from 481 backwards and examine the objects one by
one.

> Incidentally with gdb-ui, if you display a watch expression in the speedbar
> and press 'p' on a component (with a live process), Emacs will print the
> s-expression in the GUD buffer.

Beware: these features invoke code inside the crashed Emacs version.
Even if you have a live process, if it crashed, it is unsafe to invoke
`pr' and its ilk in that session, because it will most probably get a
SIGSEGV a second time.
You _must_ use only the simple commands xtype, xcons, xsymbol,
xstring, etc.

One other thing: since you are in the middle of the mark stage of GC,
some objects, notably the strings in the last_marked[] array, have
their mark bit set and are relocated. I think xstring doesn't know how
to cope with that, so you might need to look at lisp.h and reconstruct
the C pointers to the relevant C data structure manually, instead of
using xstring. (This particular piece of experience is from long ago,
so perhaps this problem is no longer with us with the current sources.
Just don't be intimidated if some xstring says it cannot show the
value, even though xtype said it's a string; try walking the C data
structures manually.)