From: Luc Teirlinck
Newsgroups: gmane.emacs.devel
Subject: Re: Fix to long-standing crashes in GC
Date: Mon, 17 May 2004 19:13:39 -0500 (CDT)
Message-ID: <200405180013.i4I0Ddl15818@raven.dms.auburn.edu>
References: <40A3BC23.8060000@math.ku.dk>
Cc: larsh@math.ku.dk, emacs-devel@gnu.org
Original-To: storm@cua.dk

Kim Storm wrote:

   I just installed a change which I hope will fix these crashes.

   Please report if you still experience crashes in mark_object.

I still do, and pretty easily, in a fully customized Emacs. If I try to visit a remote file using Tramp-ssh, I get an immediate crash. The fact that the crash is now so immediate cannot possibly be due to your Lisp_Misc_Free change, but several other changes have occurred since I last updated following Andreas' fix early Saturday (US Central).

When I previously replied, I had not yet updated my CVS and tried out your fix. I just _assumed_ that it would "obviously" take care of all my crashes, because every single one of my many crashes was caused by a Lisp_Misc_Free type. The problem seems to be that after a while the data type itself of these objects gets corrupted. At that stage, your fix does not work any more, because now they have an _invalid_ data type, instead of being of type Lisp_Misc_Free. Then we get the abort anyway.

Apparently, your fix worked for Robert. With me, my updated Emacs crashes immediately when I try to visit a file using Tramp-ssh, _when using a fully customized Emacs_. Apparently, one needs enough "activity" to get these marker types corrupted quickly. In my fully customized Emacs, there is much more timer activity than auto-revert.
I have plenty of information and could try to produce a "cleaner" crash with emacs -q and only a few customizations, but at the moment I will just give the information that I personally believe is relevant. The object with the invalid data type _is_ that same marker, I recognize the -37.

One could completely fix the crashes by making gc refrain from aborting not only when it discovers still accessible Lisp objects whose memory has been freed (as your fix currently does), but also when it detects Lisp objects with an invalid data type. I do not know enough about very low level Emacs stuff like this, but would that not be dangerous? Should we not try to fix the problem completely differently and try to find out _why_ the memory for these markers was erroneously freed and fix _that_ problem? Then gc could continue to try to detect Lisp objects whose memory was erroneously freed, which, at first sight, would seem like a safer thing to do (but then again, I do not know enough about this to judge).

(gdb) bt
#0  abort () at emacs.c:434
#1  0x0812a589 in mark_object (arg=143587538) at alloc.c:5042
#2  0x0812a5ea in mark_object (arg=143787141) at alloc.c:5059
#3  0x0812a5ea in mark_object (arg=143785989) at alloc.c:5059
#4  0x0812948a in mark_memory (start=0xbfffb6a0, end=0xbffff5ac) at alloc.c:3781
#5  0x081294f5 in mark_stack () at alloc.c:4055
#6  0x08129aba in Fgarbage_collect () at alloc.c:4429
(gdb) p last_marked_index
$1 = 18
(gdb) p last_marked[17]
$2 = 143587538
(gdb) pr
#
(gdb) xtype
Lisp_Misc
2
(gdb) xmiscfree
$3 = (struct Lisp_Free *) 0x88ef8d0
(gdb) p *$3
$4 = {
  type = 2,
  gcmarkbit = 1,
  spacer = 0,
  chain = 0x2,
  padding = "\000\000\000\000<\332\216\b\002\000\000\000\002\000\000"
}
(gdb) p last_marked[16]
$5 = 143787141
(gdb) pr
(# . -37)
(gdb) p last_marked[15]
$6 = 143787133
(gdb) pr
((# . -37) (# . -37) (# . -37) (# . -37) (# . -37) (# . -37) (1 . 58) ("tramp_exit_status 0 " . 1) (# . -20))
(gdb) p last_marked[14]
$7 = -296
(gdb) pr
-37
(gdb) p last_marked[13]
$10 = 143806834
(gdb) pr
#
(gdb) p last_marked[12]
$11 = 143787125
(gdb) pr
(# . -37)
(gdb) p last_marked[11]
$12 = 143787117
(gdb) pr
((# . -37) (# . -37) (# . -37) (# . -37) (# . -37) (# . -37) (# . -37) (1 . 58) ("tramp_exit_status 0 " . 1) (# . -20))
(gdb) p last_marked[10]
$13 = -296
(gdb) pr
-37
(gdb) p last_marked[9]
$14 = 143806858
(gdb) pr
#
(gdb) p last_marked[8]
$15 = 143787109
(gdb) pr
(# . -37)
(gdb) p last_marked[7]
$18 = 143787101
(gdb) pr
((# . -37) (# . -37) (# . -37) (# . -37) (# . -37) (# . -37) (# . -37) (# . -37) (1 . 58) ("tramp_exit_status 0 " . 1) (# . -20))
(gdb) p last_marked[6]
$19 = -296
(gdb) pr
-37
(gdb) p last_marked[5]
$20 = 143807602
(gdb) pr
#
(gdb) p last_marked[4]
$21 = 143787093
(gdb) pr
(# . -37)
(gdb) p last_marked[3]
$22 = 143787085
(gdb) pr
((# . -37) (# . -37) (# . -37) (# . -37) (# . -37) (# . -37) (# . -37) (# . -37) (# . -37) (1 . 58) ("tramp_exit_status 0 " . 1) (# . -20))
(gdb) p last_marked[2]
$23 = -296
(gdb) pr
-37
(gdb) p last_marked[1]
$24 = 143807626
(gdb) pr
#
(gdb) p last_marked[0]
$25 = 143787077
(gdb) pr
(# . -37)