From mboxrd@z Thu Jan 1 00:00:00 1970
Path: main.gmane.org!not-for-mail
From: Mark McAuliffe
Newsgroups: gmane.emacs.bugs
Subject: Re: core dump triggered by garbage collection (?)
Date: Fri, 5 Sep 2003 00:32:48 -0700
Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org
Message-ID: <16216.15392.176843.420150@oscar.mv.timesten.com>
References: <200308281716.h7SHGK012200@mis-dns.mv.timesten.com>
NNTP-Posting-Host: deer.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Trace: sea.gmane.org 1062768834 1989 80.91.224.253 (5 Sep 2003 13:33:54 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Fri, 5 Sep 2003 13:33:54 +0000 (UTC)
Cc: Mark McAuliffe , bug-gnu-emacs@gnu.org
Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Fri Sep 05 15:33:49 2003
Return-path:
Original-Received: from monty-python.gnu.org ([199.232.76.173]) by deer.gmane.org with esmtp (Exim 3.35 #1 (Debian)) id 19vGii-0007xS-00 for ; Fri, 05 Sep 2003 15:33:48 +0200
Original-Received: from localhost ([127.0.0.1] helo=monty-python.gnu.org) by monty-python.gnu.org with esmtp (Exim 4.22) id 19vGiV-0006gq-IP for geb-bug-gnu-emacs@m.gmane.org; Fri, 05 Sep 2003 09:33:35 -0400
Original-Received: from list by monty-python.gnu.org with tmda-scanned (Exim 4.22) id 19vBFE-0007ss-Sx for bug-gnu-emacs@gnu.org; Fri, 05 Sep 2003 03:43:00 -0400
Original-Received: from mail by monty-python.gnu.org with spam-scanned (Exim 4.22) id 19vB9L-0005Px-3B for bug-gnu-emacs@gnu.org; Fri, 05 Sep 2003 03:36:56 -0400
Original-Received: from [63.75.22.209] (helo=mis-dns.mv.timesten.com) by monty-python.gnu.org with esmtp (Exim 4.22) id 19vB5R-00041i-7Z; Fri, 05 Sep 2003 03:32:53 -0400
Original-Received: from oscar.mv.timesten.com.timesten.com (oscar.mv.timesten.com [10.10.10.50]) by mis-dns.mv.timesten.com (8.11.0/8.11.0) with ESMTP id h857Wm009866; Fri, 5 Sep 2003 00:32:48 -0700
Original-To: rms@gnu.org
In-Reply-To:
X-Mailer: VM 7.01 under Emacs 21.3.1
X-BeenThere: bug-gnu-emacs@gnu.org
X-Mailman-Version: 2.1.2
Precedence: list
List-Id: Bug reports for GNU Emacs, the Swiss army knife of text editors
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org
Xref: main.gmane.org gmane.emacs.bugs:5737
X-Report-Spam: http://spam.gmane.org/gmane.emacs.bugs:5737

Richard Stallman writes:
> #19 0x0810f00e in lisp_free (block=0x8be41b0) at alloc.c:630
> #20 0x081130dc in gc_sweep () at alloc.c:5270
>
> To learn something this crash, it is necessary to analyze the data
> being operated on in those two frames, and try to figure out what was
> inconsistent in the data (and what the data were being used for).
> Knowing that, we might be able to figure out the code that created
> the invalid data.
>
> This is not easy, but I don't know of any substitute for it.

I've spent some time looking into this. I don't know that I have found
anything of value, but here is what I've got so far...

For starters, I have had 2 more crashes since I reported the bug
originally, so I now have 4 core files worth of info. There appear to
be 2 types of crash -- presumably the same underlying problem, but 2
different manifestations. In one type, it appears that corrupt data
are being found in compact_small_strings. 3 of the 4 core files are
like this. The other type finds the corrupt data in gc_sweep. This
latter type is the one you specifically asked about above, but it is
also the one I have had less luck analyzing (no luck at all, in fact).
I'm hoping that what I have been able to learn about the former type
will be helpful. If it's not, perhaps you could help steer me in the
right direction for the latter one.

For "type 1" core files, I wrote a gdb user-defined procedure that can
traverse the linked list in compact_small_strings (the inner one, that
starts with "for (from = &b->first_data; from < end; from = from_end)").
FWIW, it looks like this:

define a
  if ( $from < end )
    if ( $from->string == 0 )
      set $n = $from->u.nbytes
    else
      set $s = $from->string
      p $s
      p *$s
      p ((char*)($s->data)) - ((char*)&($from->u.data))
      if ( $s->size_byte < 0 )
        set $n = $s->size
      else
        set $n = $s->size_byte
      end
    end
    p $nb = ( $n + 8 ) & ~3
    p $from = (struct sdata *)((char*)$from + $nb)
    p *$from
  end
end

I initialized $from to be &b->first_data, as in the for-loop, and ran
procedure "a" to traverse the list of struct sdata's until it ran into
corruption. I did this for the 3 core files that have the problem in
compact_small_strings, and I found that the data that appeared right
before the corruption were similar. Below are the last few iterations
from each core file:

core.17451:

$104 = (struct Lisp_String *) 0x8b6d9c4
$105 = {size = 3, size_byte = -1, intervals = 0x0, data = 0x908bc80 "  5"}
$106 = 2875744
$107 = 8
$108 = (struct sdata *) 0x8dcdb24
$109 = {string = 0x8b6d994, u = {data = "6", nbytes = 1667432502}}
(gdb)
$110 = (struct Lisp_String *) 0x8b6d994
$111 = {size = 1, size_byte = -1, intervals = 0x0, data = 0x908bc88 "6"}
$112 = 2875744
$113 = 8
$114 = (struct sdata *) 0x8dcdb2c
$115 = {string = 0x8b6d934, u = {data = " ", nbytes = 3547168}}
(gdb)
$116 = (struct Lisp_String *) 0x8b6d934
$117 = {size = 3, size_byte = -1, intervals = 0x0, data = 0x908bc90 "  6"}
$118 = 2875744
$119 = 8
$120 = (struct sdata *) 0x8dcdb34
$121 = {string = 0x8b6d924, u = {data = "7", nbytes = 538968119}}
(gdb)
$122 = (struct Lisp_String *) 0x8b6d924
$123 = {size = 1, size_byte = -1, intervals = 0x0, data = 0x908bc98 "7"}
$124 = 2875744
$125 = 8
$126 = (struct sdata *) 0x8dcdb3c
$127 = {string = 0x8b6d914, u = {data = "", nbytes = 538976256}}
(gdb)
$128 = (struct Lisp_String *) 0x8b6d914
$129 = {size = 3, size_byte = -1, intervals = 0x0, data = 0x908bca0 ""}
$130 = 2875744
$131 = 8
$132 = (struct sdata *) 0x8dcdb44
$133 = {string = 0x20202020, u = {data = "m", nbytes = 1919115629}}
(gdb)
$134 = (struct Lisp_String *) 0x20202020
Cannot access memory at address 0x20202020
(gdb)

core.24594

$269 = (struct Lisp_String *) 0x9c1c50c
$270 = {size = 2, size_byte = -1, intervals = 0x0, data = 0x9c1ed40 "18"}
$271 = -4418724
$272 = 8
$273 = (struct sdata *) 0xa0559e8
$274 = {string = 0x9c1c4ec, u = {data = " ", nbytes = 3682592}}
(gdb)
$275 = (struct Lisp_String *) 0x9c1c4ec
$276 = {size = 3, size_byte = -1, intervals = 0x0, data = 0x9c1ed48 " 18"}
$277 = -4418724
$278 = 8
$279 = (struct sdata *) 0xa0559f0
$280 = {string = 0x9c1c4dc, u = {data = "1", nbytes = 14641}}
(gdb)
$281 = (struct Lisp_String *) 0x9c1c4dc
$282 = {size = 2, size_byte = -1, intervals = 0x0, data = 0x9c1ed50 "19"}
$283 = -4418724
$284 = 8
$285 = (struct sdata *) 0xa0559f8
$286 = {string = 0x9c1c4bc, u = {data = " ", nbytes = 3748128}}
(gdb)
$287 = (struct Lisp_String *) 0x9c1c4bc
$288 = {size = 3, size_byte = -1, intervals = 0x0, data = 0x9c1ed58 " 19"}
$289 = -4418724
$290 = 8
$291 = (struct sdata *) 0xa055a00
$292 = {string = 0x9c1a494, u = {data = "", nbytes = 0}}
(gdb)
$293 = (struct Lisp_String *) 0x9c1a494
$294 = {size = 2, size_byte = -1, intervals = 0x0, data = 0x9c1ed60 ""}
$295 = -4418724
$296 = 8
$297 = (struct sdata *) 0xa055a08
$298 = {string = 0x43c143, u = {data = "8", nbytes = 1240629304}}
(gdb)
$299 = (struct Lisp_String *) 0x43c143
Cannot access memory at address 0x43c143

core.25897

$1007 = (struct Lisp_String *) 0x9fb29c4
$1008 = {size = 1, size_byte = -1, intervals = 0x0, data = 0xa36cc7c "7"}
$1009 = 74632
$1010 = 8
$1011 = (struct sdata *) 0xa35a8f8
$1012 = {string = 0x9fb2964, u = {data = " ", nbytes = 3612704}}
(gdb)
$1013 = (struct Lisp_String *) 0x9fb2964
$1014 = {size = 3, size_byte = -1, intervals = 0x0, data = 0xa36cc84 "  7"}
$1015 = 74632
$1016 = 8
$1017 = (struct sdata *) 0xa35a900
$1018 = {string = 0x9fb2944, u = {data = "8", nbytes = 56}}
(gdb)
$1019 = (struct Lisp_String *) 0x9fb2944
$1020 = {size = 1, size_byte = -1, intervals = 0x0, data = 0xa36cc8c "8"}
$1021 = 74632
$1022 = 8
$1023 = (struct sdata *) 0xa35a908
$1024 = {string = 0x9fb2924, u = {data = " ", nbytes = 3678240}}
(gdb)
$1025 = (struct Lisp_String *) 0x9fb2924
$1026 = {size = 3, size_byte = -1, intervals = 0x0, data = 0xa36cc94 "  8"}
$1027 = 74632
$1028 = 8
$1029 = (struct sdata *) 0xa35a910
$1030 = {string = 0xa388b24, u = {data = "9", nbytes = 57}}
(gdb)
$1031 = (struct Lisp_String *) 0xa388b24
$1032 = {size = 1, size_byte = -1, intervals = 0x0, data = 0xa36cc9c "9"}
$1033 = 74632
$1034 = 8
$1035 = (struct sdata *) 0xa35a918
$1036 = {string = 0xa388ae4, u = {data = "", nbytes = 0}}
(gdb)
$1037 = (struct Lisp_String *) 0xa388ae4
$1038 = {size = 3, size_byte = -1, intervals = 0x0, data = 0xa36cca4 ""}
$1039 = 74632
$1040 = 8
$1041 = (struct sdata *) 0xa35a920
$1042 = {string = 0x24, u = {data = "$", nbytes = 36}}
(gdb)
$1043 = (struct Lisp_String *) 0x24
Cannot access memory at address 0x24
(gdb)

In all three cases, the strings that appear before the corruption are
numbers. Since the crash always seems to happen when I try to read
mail with VM, I assume those numbers are the message numbers in the VM
summary buffer. Significant? Helpful?? I dunno...

I also tried to figure out what the data was that overwrote the list
data for the 3 core files:

core.17451

The gdb snippet below picks up right after the above snippet for
core.17451. The overwriting data appears to be basically text (a
compiled lisp macro?):

(gdb) p $x = $126
$135 = (struct sdata *) 0x8dcdb3c
(gdb) p *$x
$136 = {string = 0x8b6d914, u = {data = "", nbytes = 538976256}}
(gdb) set print null-stop o
Display all 117 possibilities?
(y or n)
(gdb) set print null-stop off
(gdb) p $x->u.data
$137 = ""
(gdb) p $x->u.data[0]@20
$138 = "\0 macro %\b%_\b_"
(gdb) p $x->u.data[0]@100
$139 = "\0 macro %\b%_\b_r\bre\bep\bpa\bac\bck\bka\bag\bge\be_\b_n\bna\bam\bme\be_\b_f\bfm\bmt\bt and will be created in\n"
(gdb) p $x->u.data[0]@200
$140 = "\0 macro %\b%_\b_r\bre\bep\bpa\bac\bck\bka\bag\bge\be_\b_n\bna\bam\bme\be_\b_f\bfm\bmt\bt and will be created in\n", ' ' , "the directory named by the macro %\b%_\b_\0 fr\0\0\0\0\0\004\0\0\n\n -\b--\b-p\bpr\bre\bef\bfi\bix\b"
(gdb) p $x->u.data[0]@400
$141 = "\0 macro %\b%_\b_r\bre\bep\bpa\bac\bck\bka\bag\bge\be_\b_n\bna\bam\bme\be_\b_f\bfm\bmt\bt and will be created in\n", ' ' , "the directory named by the macro %\b%_\b_\0 fr\0\0\0\0\0\004\0\0\n\n -\b--\b-p\bpr\bre\bef\bfi\bix\b\0 _\b"...

(I hope that stuff survives being emailed...).

core.24594

This gdb snippet more-or-less picks up where the above 24594 snippet
left off, with some editing:

(gdb) p $x = $267
$306 = (struct sdata *) 0xa0559e0
(gdb) x/100 $x->u.data
0xa0559e4: 0x49003831 0x09c1c4ec 0x00383120 0x09c1c4dc
0xa0559f4: 0x00003931 0x09c1c4bc 0x00393120 0x09c1a494
0xa055a04: 0x00000000 0x0043c143 0x49f28038 0x00000006
0xa055a14: 0x40000000 0x00000032 0x0043c144 0x49f28038
0xa055a24: 0x00000006 0x40000000 0x00000032 0x0043c145
0xa055a34: 0x49f28038 0x00000006 0x40000000 0x0000002e
0xa055a44: 0x0043c146 0x49f28038 0x00000006 0x40000000
0xa055a54: 0x0000002e 0x00000000 0x00000000 0x00000006
0xa055a64: 0x40000000 0x00000020 0x00005480 0x489f3ce0
0xa055a74: 0x00000006 0x40000004 0x0000002f 0x00005481
0xa055a84: 0x489f3ce0 0x00000006 0x40000004 0x00000077
0xa055a94: 0x00005482 0x489f3ce0 0x00000006 0x40000004
0xa055aa4: 0x0000006f 0x00005483 0x489f3ce0 0x00000006
0xa055ab4: 0x40000004 0x00000072 0x00005484 0x489f3ce0
0xa055ac4: 0x00000006 0x40000004 0x00000000 0x09c1a494
0xa055ad4: 0x48003032 0x09c1a454 0x00303220 0x09c1a424
0xa055ae4: 0x00003132 0x09c1a414 0x00313220 0x09c1a404
0xa055af4: 0x00003232 0x09c1a3f4 0x00323220 0x09c1a3e4
0xa055b04: 0x40003332 0x09c1a3d4 0x00333220 0x09c1a3c4
0xa055b14: 0x00003432 0x09c1a3b4 0x00343220 0x09c1a3a4
0xa055b24: 0x48003532 0x09c1a394 0x00353220 0x09c1a384
0xa055b34: 0x00003632 0x09c1a374 0x00363220 0x09c1a354
0xa055b44: 0x00003732 0x09c1a344 0x00373220 0x09c1a334
0xa055b54: 0x40003832 0x09c1a324 0x00383220 0x09c1a314
0xa055b64: 0x00003932 0x09c1a304 0x00393220 0x09c1a2f4

The first two lines are the tail end of the good data. The third line
is where things get messed up. The corruption data seems to have some
pattern to it, but I have no idea what it might be.

core.25897

This gdb snippet picks up more or less where the above 25897 snippet
leaves off (with some editing). The corruption data for this core file
seems to have some regularity too:

(gdb) p $x = $1005
$1049 = (struct sdata *) 0xa35a8f0
(gdb) x/100 $x->u.data
0xa35a8f4: 0x00000037 0x09fb2964 0x00372020 0x09fb2944
0xa35a904: 0x00000038 0x09fb2924 0x00382020 0x0a388b24
0xa35a914: 0x00000039 0x0a388ae4 0x00000000 0x00000024
0xa35a924: 0x00000024 0x00000000 0x00000000 0x00000000
0xa35a934: 0x00000919 0x0a44ab38 0x4212e280 0x00000000
0xa35a944: 0x00000000 0x6877202c 0x20686369 0x73207369
0xa35a954: 0x20746e65 0x74206f74 0x73206568 0x00000000
0xa35a964: 0x00000000 0xffffffff 0x00000001 0x00000000
0xa35a974: 0x00000000 0x00000000 0x65736e6f 0x1826d17c
0xa35a984: 0x1826d17c 0x1826d17c 0x394b1aec 0x1826d17c
0xa35a994: 0x00000000 0x1826d17c 0x1826d17c 0x286e23dc
0xa35a9a4: 0x1826d17c 0x1826d26c 0x38273a14 0x582cd6ac
0xa35a9b4: 0x1826d17c 0x1826d17c 0x4828bf50 0x48277028
0xa35a9c4: 0x48277668 0x1826d1ac 0x00000008 0x00000046
0xa35a9d4: 0x00000000 0x1826d17c 0x1826d17c 0x48277e98
0xa35a9e4: 0x48365800 0x0a388ae4 0x00392020 0x0a388aa4
0xa35a9f4: 0x18003031 0x0a388a74 0x00303120 0x0a388a24
0xa35aa04: 0x18003131 0x0a388a04 0x00313120 0x0a3889f4
0xa35aa14: 0x18003231 0x0a3889c4 0x00323120 0x0a3889b4
0xa35aa24: 0x18003331 0x0a388994 0x00333120 0x0a388984
0xa35aa34: 0x18003431 0x0a388964 0x00343120 0x0a3888e4
0xa35aa44: 0x18003531 0x0a3888c4 0x00353120 0x0a3888b4
0xa35aa54: 0x00003631 0x0a388894 0x00363120 0x0a388834
0xa35aa64: 0x18003731 0x0a388824 0x00373120 0x0a3887f4
0xa35aa74: 0x18003831 0x0a3887d4 0x00383120 0x0a3887c4

also:

(gdb) p $x = $1035
$1050 = (struct sdata *) 0xa35a918
(gdb) x/100c $x->u.data
0xa35a91c: 0 '\0' 0 '\0' 0 '\0' 0 '\0' 36 '$' 0 '\0' 0 '\0' 0 '\0'
0xa35a924: 36 '$' 0 '\0' 0 '\0' 0 '\0' 0 '\0' 0 '\0' 0 '\0' 0 '\0'
0xa35a92c: 0 '\0' 0 '\0' 0 '\0' 0 '\0' 0 '\0' 0 '\0' 0 '\0' 0 '\0'
0xa35a934: 25 '\031' 9 '\t' 0 '\0' 0 '\0' 56 '8' -85 '\253' 68 'D' 10 '\n'
0xa35a93c: -128 '\200' -30 '\342' 18 '\022' 66 'B' 0 '\0' 0 '\0' 0 '\0' 0 '\0'
0xa35a944: 0 '\0' 0 '\0' 0 '\0' 0 '\0' 44 ',' 32 ' ' 119 'w' 104 'h'
0xa35a94c: 105 'i' 99 'c' 104 'h' 32 ' ' 105 'i' 115 's' 32 ' ' 115 's'
0xa35a954: 101 'e' 110 'n' 116 't' 32 ' ' 116 't' 111 'o' 32 ' ' 116 't'
0xa35a95c: 104 'h' 101 'e' 32 ' ' 115 's' 0 '\0' 0 '\0' 0 '\0' 0 '\0'
0xa35a964: 0 '\0' 0 '\0' 0 '\0' 0 '\0' -1 '\377' -1 '\377' -1 '\377' -1 '\377'
0xa35a96c: 1 '\001' 0 '\0' 0 '\0' 0 '\0' 0 '\0' 0 '\0' 0 '\0' 0 '\0'
0xa35a974: 0 '\0' 0 '\0' 0 '\0' 0 '\0' 0 '\0' 0 '\0' 0 '\0' 0 '\0'
0xa35a97c: 111 'o' 110 'n' 115 's' 101 'e'

In the middle of all this is the string "which is sent to the s",
which probably isn't helpful for debugging, but it does sound kind of
like an important clue from some bad mystery novel.

Anyway... a lot of data here. I don't know if any of it is at all
helpful. Please advise on where I might go from here.

One question: I see in alloc.c that there is code ifdefed with
GC_CHECK_STRING_BYTES. Presumably defining this symbol enables
additional checks during garbage collection (how *did* I figure that
out?? :-). Would it be helpful for me to compile a version with this
flag set, given that the crash does happen with some regularity? Is an
emacs compiled with this symbol defined practical to use?
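For reference, the "which is sent to the s" fragment can also be read
straight out of the x/100 word dump for core.25897: the x/100c listing
of the same memory shows that each 32-bit word stores its least
significant byte first, so every word carries four characters in
reverse order. Below is a minimal C sketch of that decoding. The names
words_to_text and sample_words are made up for this illustration and
nothing in it comes from alloc.c or gdb; the six words are copied from
addresses 0xa35a948 through 0xa35a95c of the dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of Emacs or gdb.  Unpack COUNT 32-bit
   words the way a little-endian machine lays them out in memory (least
   significant byte first).  Printable ASCII is copied as-is and every
   other byte becomes '.'.  OUT must have room for 4 * COUNT + 1 bytes. */
void
words_to_text (const uint32_t *words, size_t count, char *out)
{
  size_t i;
  unsigned shift;

  assert (words != NULL && out != NULL);
  for (i = 0; i < count; i++)
    for (shift = 0; shift < 32; shift += 8)
      {
        unsigned char c = (unsigned char) ((words[i] >> shift) & 0xff);
        *out++ = (c >= 0x20 && c < 0x7f) ? (char) c : '.';
      }
  *out = '\0';
}

/* The six words at 0xa35a948 .. 0xa35a95c in the core.25897 dump. */
const uint32_t sample_words[6] = {
  0x6877202c, 0x20686369, 0x73207369,
  0x20746e65, 0x74206f74, 0x73206568
};
```

Calling words_to_text (sample_words, 6, buf) with a 25-byte buf leaves
", which is sent to the s" in buf, the same characters that x/100c
prints one byte at a time.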
One last bit: I'm afraid that I don't have any netnews access at the
moment, so I cannot read the emacs bug newsgroup. Please respond by
email to mlm@timesten.com.

Thanks,
- Mark