From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Eli Zaretskii"
Newsgroups: gmane.emacs.devel
Subject: Re: Fix to long-standing crashes in GC
Date: Tue, 25 May 2004 09:07:56 +0200
Message-ID: <8011-Tue25May2004090755+0300-eliz@gnu.org>
References: <40A3BC23.8060000@math.ku.dk> <40AF976B.2090104@math.ku.dk> <9003-Sun23May2004183302+0300-eliz@gnu.org> <200405231632.i4NGWZo07382@raven.dms.auburn.edu> <200405250303.i4P33YF17293@raven.dms.auburn.edu>
Reply-To: Eli Zaretskii
Cc: larsh@math.ku.dk, emacs-devel@gnu.org
Original-To: Luc Teirlinck
X-Mailer: emacs 21.3.50 (via feedmail 8 I) and Blat ver 1.8.9
In-reply-to: <200405250303.i4P33YF17293@raven.dms.auburn.edu> (message from Luc Teirlinck on Mon, 24 May 2004 22:03:34 -0500 (CDT))
Xref: main.gmane.org gmane.emacs.devel:23913

> Date: Mon, 24 May 2004 22:03:34 -0500 (CDT)
> From: Luc Teirlinck
>
> Once you discover the corrupted Lisp object or data structure, it is
> useful to look at it in a fresh Emacs session and compare its contents
> with a session that you are debugging.
>
> Except that to notice that a Lisp object is corrupted you have to
> _already_ know how its contents look in a fresh Emacs session.

No, that's not what DEBUG means to say.  A corrupted object is
_always_ the one that caused the crash.  That's why we call `abort' at
those places: we've discovered something that cannot happen with valid
Lisp objects.

``Discovering the corrupted Lisp object or data structure'' in the
fragment above means that one needs to find the _enclosing_ data
structure of which the corrupted object is a part.  For example, if
the object that was the immediate cause of the crash is the cdr of
some cons cell, one needs to find out which cons cell that was; if
it's a member of a plist, one needs to find out whose property list it
was; etc.  That is when you make use of the last_marked[] array and
walk the marking code backwards, guided by its contents.

> Many Elisp programmers do not have a very good knowledge about the
> very low level C structure of various Lisp objects.

Well, that's something that comes with experience.  However, if you
(or someone else) can share pieces of that knowledge which, if added
to DEBUG, could make the learning curve shorter and/or less steep, we
could certainly use that.

> So I went through all of the last_marked array, without any
> idea of what to look for, that is: how do you recognize a "corrupted
> Lisp object or data structure"?

Does what I wrote above help in any way?  It cannot cover every
possible situation, and of course some knowledge about the object that
was the immediate cause of the call to `abort' _is_ needed, but I
don't see how this can be avoided.

> (gdb) p last_marked[17]
> $2 = 143587538
> (gdb) pr
> # immediately and please report this bug>

Actually, as DEBUG says, it is not recommended to use `pr' in a
crashed session, especially one that crashed during GC.  `pr' invokes
a function inside Emacs code that looks at Lisp data structures; when
those data structures are corrupted, `pr' could well cause another
segfault and ruin your entire debugging session.

> This is not easy since GC changes the tag bits and relocates strings
> which make it hard to look at Lisp objects with commands such as `pr'.
> It is sometimes necessary to convert Lisp_Object variables into
> pointers to C struct's manually.
>
> It says "It is sometimes necessary...". When?

When `pr' and the x* (xstring, xsymbol, etc.) commands fail to print
the Lisp object.

> When I see:
>
> pr
>
> that is, no output, I can guess it is necessary.
>
> What if I see:
>
> pr
> ""
>
> I know from experience that I still have to use xstring in that case,
> even though the empty string is a perfectly valid return value. But
> xstring often reveals a different real value anyway. Is this a bug in
> pr or is this normal?

Again, don't use `pr' in these cases.  Use xtype and the appropriate
x* command according to the type.
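
To make "converting a Lisp_Object into a pointer to a C struct manually" a bit more concrete, here is a minimal sketch of the general idea of a tagged word: a few bits say what type the object is, and the remaining bits give the address of the C structure.  Everything in it is an assumption made for illustration: the `demo_*' names are hypothetical, and the layout (a 3-bit tag in the low bits of an 8-byte-aligned address) is NOT claimed to be the layout of any particular Emacs version.  The real tag position and width are defined in the lisp.h of the Emacs you are debugging, and GC may additionally alter bits while marking.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL layout, for illustration only: the low 3 bits of a
   word hold a type tag, the remaining bits hold an 8-byte-aligned
   address.  Check lisp.h for the layout your Emacs really uses.  */
typedef uintptr_t demo_object;

enum demo_type { DEMO_INT = 0, DEMO_SYMBOL = 1, DEMO_STRING = 3, DEMO_CONS = 5 };

#define DEMO_TAG_BITS 3
#define DEMO_TAG_MASK (((uintptr_t) 1 << DEMO_TAG_BITS) - 1)

/* Stand-in for a real object structure; forced to 8-byte alignment
   so that the low 3 bits of its address are always zero.  */
struct demo_symbol { _Alignas(8) const char *name; };

/* Combine an aligned address and a type tag into one tagged word.  */
static demo_object demo_make(const void *ptr, enum demo_type type)
{
    uintptr_t addr = (uintptr_t) ptr;
    assert((addr & DEMO_TAG_MASK) == 0);  /* tag bits must be free */
    return addr | (uintptr_t) type;
}

/* Extract the type tag: the same question xtype answers in gdb.  */
static enum demo_type demo_xtype(demo_object obj)
{
    return (enum demo_type) (obj & DEMO_TAG_MASK);
}

/* Strip the tag to recover the raw address of the C structure.  */
static void *demo_xpntr(demo_object obj)
{
    return (void *) (obj & ~DEMO_TAG_MASK);
}
```

In this sketch, once demo_xtype reports DEMO_SYMBOL, the result of demo_xpntr can be cast to `struct demo_symbol *' and its fields read directly.  Doing the conversion by hand in gdb follows the same two steps: determine the type first, then cast the untagged address to the matching struct.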

When you use x*, a failure to examine an object generates partial
information and an error message, like this:

  (gdb) xsymbol
  $201 = (struct Lisp_Symbol *) 0xdeadbeef
  Argument to arithmetic operation not a number or boolean.

You then need to examine the Lisp_Symbol structure at the address
shown as a C object:

  (gdb) print *((struct Lisp_Symbol *) 0xdeadbeef)

> What if I see
>
> pr
> "dired-find-file"
>
> Can I trust _this_ or should I still use xstring, that is, should the
> above have said: "It is always necessary, to be safe,..."?

In a crashed session, I personally never trust `pr', and only use it
as a secondary means, to view very complex data structures.  The
xsymbol command and its ilk are your friends.

I'll try to add this info to DEBUG when I have time (unless someone
else beats me to that).