From: Mark McAuliffe <mlm@timesten.com>
Cc: Mark McAuliffe <mlm@timesten.com>, bug-gnu-emacs@gnu.org
Subject: Re: core dump triggered by garbage collection (?)
Date: Fri, 5 Sep 2003 00:32:48 -0700
Message-ID: <16216.15392.176843.420150@oscar.mv.timesten.com>
In-Reply-To: <E19teL5-000161-Rr@fencepost.gnu.org>
Richard Stallman writes:
> #19 0x0810f00e in lisp_free (block=0x8be41b0) at alloc.c:630
> #20 0x081130dc in gc_sweep () at alloc.c:5270
>
> To learn something this crash, it is necessary to analyze the data
> being operated on in those two frames, and try to figure out what was
> inconsistent in the data (and what the data were being used for).
> Knowing that, we might be able to figure out the code that created
> the invalid data.
>
> This is not easy, but I don't know of any substitute for it.
I've spent some time looking into this. I don't know that I have found
anything of value, but here is what I've got so far...
For starters, I have had 2 more crashes since I reported the bug
originally, so I now have 4 core files worth of info. There appear to be 2
types of crash -- presumably the same underlying problem, but 2 different
manifestations. In one type, it appears that corrupt data are being found
in compact_small_strings. 3 of the 4 core files are like this. The other
type finds the corrupt data in gc_sweep. This latter type is the one you
specifically asked about above, but it is also the one I have had less luck
analyzing (no luck at all, in fact). I'm hoping that what I have been able
to learn about the former type will be helpful. If it's not, perhaps you
could help steer me in the right direction for the latter one.
For "type 1" core files, I wrote a gdb user-defined procedure that can
traverse the linked list in compact_small_strings (the inner one, which
starts with "for (from = &b->first_data; from < end; from = from_end)").
FWIW, it looks like this:
define a
  if ( $from < end )
    if ( $from->string == 0 )
      set $n = $from->u.nbytes
    else
      set $s = $from->string
      p $s
      p *$s
      p ((char*)($s->data)) - ((char*)&($from->u.data))
      if ( $s->size_byte < 0 )
        set $n = $s->size
      else
        set $n = $s->size_byte
      end
    end
    p $nb = ( $n + 8 ) & ~3
    p $from = (struct sdata *)((char*)$from + $nb)
    p *$from
  end
end
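The step size that the macro computes with "( $n + 8 ) & ~3" can be checked outside gdb. Here is a minimal sketch in Python, assuming a 32-bit build where the string back-pointer and the alignment unit are both 4 bytes; the name `sdata_step` is made up for this illustration and does not appear in alloc.c.

```python
def sdata_step(nbytes: int) -> int:
    """Bytes from one struct sdata to the next, as the gdb macro computes it.

    Assumption: 4-byte pointers and 4-byte alignment, i.e. 4 bytes of
    back-pointer + nbytes of data + 1 terminating NUL, rounded up to a
    multiple of 4. That is (nbytes + 5 + 3) & ~3 == (nbytes + 8) & ~3.
    """
    return (nbytes + 8) & ~3


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, sdata_step(n))
```

Under that assumption, strings of 1 to 3 bytes all advance by 8, which is the step the traces below print on every iteration.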
I initialized $from to be &b->first_data, as in the for-loop, and ran
procedure "a" to traverse the list of struct sdata's until it ran into
corruption. I did this for the 3 core files that have the problem in
compact_small_strings, and I found that the data that appeared right before
the corruption were similar. Below are the last few iterations from each
core file:
core.17451:
$104 = (struct Lisp_String *) 0x8b6d9c4
$105 = {size = 3, size_byte = -1, intervals = 0x0, data = 0x908bc80 " 5"}
$106 = 2875744
$107 = 8
$108 = (struct sdata *) 0x8dcdb24
$109 = {string = 0x8b6d994, u = {data = "6", nbytes = 1667432502}}
(gdb)
$110 = (struct Lisp_String *) 0x8b6d994
$111 = {size = 1, size_byte = -1, intervals = 0x0, data = 0x908bc88 "6"}
$112 = 2875744
$113 = 8
$114 = (struct sdata *) 0x8dcdb2c
$115 = {string = 0x8b6d934, u = {data = " ", nbytes = 3547168}}
(gdb)
$116 = (struct Lisp_String *) 0x8b6d934
$117 = {size = 3, size_byte = -1, intervals = 0x0, data = 0x908bc90 " 6"}
$118 = 2875744
$119 = 8
$120 = (struct sdata *) 0x8dcdb34
$121 = {string = 0x8b6d924, u = {data = "7", nbytes = 538968119}}
(gdb)
$122 = (struct Lisp_String *) 0x8b6d924
$123 = {size = 1, size_byte = -1, intervals = 0x0, data = 0x908bc98 "7"}
$124 = 2875744
$125 = 8
$126 = (struct sdata *) 0x8dcdb3c
$127 = {string = 0x8b6d914, u = {data = "", nbytes = 538976256}}
(gdb)
$128 = (struct Lisp_String *) 0x8b6d914
$129 = {size = 3, size_byte = -1, intervals = 0x0, data = 0x908bca0 ""}
$130 = 2875744
$131 = 8
$132 = (struct sdata *) 0x8dcdb44
$133 = {string = 0x20202020, u = {data = "m", nbytes = 1919115629}}
(gdb)
$134 = (struct Lisp_String *) 0x20202020
Cannot access memory at address 0x20202020
(gdb)
core.24594
$269 = (struct Lisp_String *) 0x9c1c50c
$270 = {size = 2, size_byte = -1, intervals = 0x0, data = 0x9c1ed40 "18"}
$271 = -4418724
$272 = 8
$273 = (struct sdata *) 0xa0559e8
$274 = {string = 0x9c1c4ec, u = {data = " ", nbytes = 3682592}}
(gdb)
$275 = (struct Lisp_String *) 0x9c1c4ec
$276 = {size = 3, size_byte = -1, intervals = 0x0, data = 0x9c1ed48 " 18"}
$277 = -4418724
$278 = 8
$279 = (struct sdata *) 0xa0559f0
$280 = {string = 0x9c1c4dc, u = {data = "1", nbytes = 14641}}
(gdb)
$281 = (struct Lisp_String *) 0x9c1c4dc
$282 = {size = 2, size_byte = -1, intervals = 0x0, data = 0x9c1ed50 "19"}
$283 = -4418724
$284 = 8
$285 = (struct sdata *) 0xa0559f8
$286 = {string = 0x9c1c4bc, u = {data = " ", nbytes = 3748128}}
(gdb)
$287 = (struct Lisp_String *) 0x9c1c4bc
$288 = {size = 3, size_byte = -1, intervals = 0x0, data = 0x9c1ed58 " 19"}
$289 = -4418724
$290 = 8
$291 = (struct sdata *) 0xa055a00
$292 = {string = 0x9c1a494, u = {data = "", nbytes = 0}}
(gdb)
$293 = (struct Lisp_String *) 0x9c1a494
$294 = {size = 2, size_byte = -1, intervals = 0x0, data = 0x9c1ed60 ""}
$295 = -4418724
$296 = 8
$297 = (struct sdata *) 0xa055a08
$298 = {string = 0x43c143, u = {data = "8", nbytes = 1240629304}}
(gdb)
$299 = (struct Lisp_String *) 0x43c143
Cannot access memory at address 0x43c143
core.25897
$1007 = (struct Lisp_String *) 0x9fb29c4
$1008 = {size = 1, size_byte = -1, intervals = 0x0, data = 0xa36cc7c "7"}
$1009 = 74632
$1010 = 8
$1011 = (struct sdata *) 0xa35a8f8
$1012 = {string = 0x9fb2964, u = {data = " ", nbytes = 3612704}}
(gdb)
$1013 = (struct Lisp_String *) 0x9fb2964
$1014 = {size = 3, size_byte = -1, intervals = 0x0, data = 0xa36cc84 " 7"}
$1015 = 74632
$1016 = 8
$1017 = (struct sdata *) 0xa35a900
$1018 = {string = 0x9fb2944, u = {data = "8", nbytes = 56}}
(gdb)
$1019 = (struct Lisp_String *) 0x9fb2944
$1020 = {size = 1, size_byte = -1, intervals = 0x0, data = 0xa36cc8c "8"}
$1021 = 74632
$1022 = 8
$1023 = (struct sdata *) 0xa35a908
$1024 = {string = 0x9fb2924, u = {data = " ", nbytes = 3678240}}
(gdb)
$1025 = (struct Lisp_String *) 0x9fb2924
$1026 = {size = 3, size_byte = -1, intervals = 0x0, data = 0xa36cc94 " 8"}
$1027 = 74632
$1028 = 8
$1029 = (struct sdata *) 0xa35a910
$1030 = {string = 0xa388b24, u = {data = "9", nbytes = 57}}
(gdb)
$1031 = (struct Lisp_String *) 0xa388b24
$1032 = {size = 1, size_byte = -1, intervals = 0x0, data = 0xa36cc9c "9"}
$1033 = 74632
$1034 = 8
$1035 = (struct sdata *) 0xa35a918
$1036 = {string = 0xa388ae4, u = {data = "", nbytes = 0}}
(gdb)
$1037 = (struct Lisp_String *) 0xa388ae4
$1038 = {size = 3, size_byte = -1, intervals = 0x0, data = 0xa36cca4 ""}
$1039 = 74632
$1040 = 8
$1041 = (struct sdata *) 0xa35a920
$1042 = {string = 0x24, u = {data = "$", nbytes = 36}}
(gdb)
$1043 = (struct Lisp_String *) 0x24
Cannot access memory at address 0x24
(gdb)
In all three cases, the strings that appear before the corruption are
numbers. Since the crash always seems to happen when I try to read mail
with VM, I assume those numbers are the message numbers in the VM summary
buffer. Significant? Helpful?? I dunno...
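Since `u` is printed as a union, the `nbytes` value gdb shows for a live string is just its first four data bytes read as an integer, and a bogus `string` pointer can be read back as bytes the same way. Here is a small Python sketch of that decoding, assuming a 32-bit little-endian machine; the helper name `as_bytes` is hypothetical.

```python
import struct


def as_bytes(value: int) -> bytes:
    """Return the 4 bytes a 32-bit little-endian word occupies in memory."""
    return struct.pack("<I", value & 0xFFFFFFFF)


if __name__ == "__main__":
    # Values copied from the core.17451 trace above.
    samples = [
        ("nbytes at 0x8dcdb2c", 3547168),
        ("bad string pointer at 0x8dcdb44", 0x20202020),
        ("nbytes at 0x8dcdb44", 1919115629),
    ]
    for label, value in samples:
        print(label, as_bytes(value))
```

On that assumption the invalid pointer 0x20202020 is four ASCII spaces and the word next to it spells `macr`, which fits text having been written over the list.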
I also tried to figure out what data overwrote the list data in each of
the 3 core files:
core.17451
The gdb snippet below picks up right after the above snippet for
core.17451. The overwriting data appears to be basically text (a compiled
lisp macro?):
(gdb) p $x = $126
$135 = (struct sdata *) 0x8dcdb3c
(gdb) p *$x
$136 = {string = 0x8b6d914, u = {data = "", nbytes = 538976256}}
(gdb) set print null-stop o
Display all 117 possibilities? (y or n)
(gdb) set print null-stop off
(gdb) p $x->u.data
$137 = ""
(gdb) p $x->u.data[0]@20
$138 = "\0 macro %\b%_\b_"
(gdb) p $x->u.data[0]@100
$139 = "\0 macro %\b%_\b_r\bre\bep\bpa\bac\bck\bka\bag\bge\be_\b_n\bna\bam\bme\be_\b_f\bfm\bmt\bt and will be created in\n"
(gdb) p $x->u.data[0]@200
$140 = "\0 macro %\b%_\b_r\bre\bep\bpa\bac\bck\bka\bag\bge\be_\b_n\bna\bam\bme\be_\b_f\bfm\bmt\bt and will be created in\n", ' ' <repeats 14 times>, "the directory named by the macro %\b%_\b_\0 fr\0\0\0\0\0\004\0\0\n\n -\b--\b-p\bpr\bre\bef\bfi\bix\b"
(gdb) p $x->u.data[0]@400
$141 = "\0 macro %\b%_\b_r\bre\bep\bpa\bac\bck\bka\bag\bge\be_\b_n\bna\bam\bme\be_\b_f\bfm\bmt\bt and will be created in\n", ' ' <repeats 14 times>, "the directory named by the macro %\b%_\b_\0 fr\0\0\0\0\0\004\0\0\n\n -\b--\b-p\bpr\bre\bef\bfi\bix\b\0 _\b"...
(I hope that stuff survives being emailed...).
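The "X\bX" pairs in that output look like typewriter-style overstriking, where a character, a backspace, and the same character again stand for bold (nroff emits this for terminal output). That reading is an assumption; the core file does not prove it. A minimal Python sketch that undoes the overstriking, with a made-up function name `strip_overstrike`:

```python
import re


def strip_overstrike(text: str) -> str:
    """Drop every 'character + backspace' pair, keeping what follows it."""
    # \x08 is the backspace character; a bare \b in a regex would mean
    # "word boundary" instead, so the hex escape is used here.
    return re.sub(r".\x08", "", text, flags=re.DOTALL)


if __name__ == "__main__":
    raw = ("%\b%_\b_r\bre\bep\bpa\bac\bck\bka\bag\bge\be"
           "_\b_n\bna\bam\bme\be_\b_f\bfm\bmt\bt")
    print(strip_overstrike(raw))
```

Applied to the fragment above this yields `%_repackage_name_fmt`, so the overwriting text may be formatted manual-page output rather than Lisp.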
core.24594
This gdb snippet more-or-less picks up where the above 24594 snippet left
off, with some editing:
(gdb) p $x = $267
$306 = (struct sdata *) 0xa0559e0
(gdb) x/100 $x->u.data
0xa0559e4: 0x49003831 0x09c1c4ec 0x00383120 0x09c1c4dc
0xa0559f4: 0x00003931 0x09c1c4bc 0x00393120 0x09c1a494
0xa055a04: 0x00000000 0x0043c143 0x49f28038 0x00000006
0xa055a14: 0x40000000 0x00000032 0x0043c144 0x49f28038
0xa055a24: 0x00000006 0x40000000 0x00000032 0x0043c145
0xa055a34: 0x49f28038 0x00000006 0x40000000 0x0000002e
0xa055a44: 0x0043c146 0x49f28038 0x00000006 0x40000000
0xa055a54: 0x0000002e 0x00000000 0x00000000 0x00000006
0xa055a64: 0x40000000 0x00000020 0x00005480 0x489f3ce0
0xa055a74: 0x00000006 0x40000004 0x0000002f 0x00005481
0xa055a84: 0x489f3ce0 0x00000006 0x40000004 0x00000077
0xa055a94: 0x00005482 0x489f3ce0 0x00000006 0x40000004
0xa055aa4: 0x0000006f 0x00005483 0x489f3ce0 0x00000006
0xa055ab4: 0x40000004 0x00000072 0x00005484 0x489f3ce0
0xa055ac4: 0x00000006 0x40000004 0x00000000 0x09c1a494
0xa055ad4: 0x48003032 0x09c1a454 0x00303220 0x09c1a424
0xa055ae4: 0x00003132 0x09c1a414 0x00313220 0x09c1a404
0xa055af4: 0x00003232 0x09c1a3f4 0x00323220 0x09c1a3e4
0xa055b04: 0x40003332 0x09c1a3d4 0x00333220 0x09c1a3c4
0xa055b14: 0x00003432 0x09c1a3b4 0x00343220 0x09c1a3a4
0xa055b24: 0x48003532 0x09c1a394 0x00353220 0x09c1a384
0xa055b34: 0x00003632 0x09c1a374 0x00363220 0x09c1a354
0xa055b44: 0x00003732 0x09c1a344 0x00373220 0x09c1a334
0xa055b54: 0x40003832 0x09c1a324 0x00383220 0x09c1a314
0xa055b64: 0x00003932 0x09c1a304 0x00393220 0x09c1a2f4
The first two lines are the tail end of the good data. The third line is
where things get messed up. The corruption data seems to have some pattern
to it, but I have no idea what it might be.
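One way to look for the pattern is to regroup the overwriting words into fixed-size records. The Python sketch below assumes 5-word (20-byte) records starting at 0xa055a08; that size is read off the dump by eye, not taken from any known Emacs structure. It prints each record with the low byte of its last word shown as a character.

```python
# Words copied from the x/100 dump above, from 0xa055a08 up to (but not
# including) 0xa055ad0, where string pointers reappear.
WORDS = [
    0x0043c143, 0x49f28038, 0x00000006, 0x40000000, 0x00000032,
    0x0043c144, 0x49f28038, 0x00000006, 0x40000000, 0x00000032,
    0x0043c145, 0x49f28038, 0x00000006, 0x40000000, 0x0000002e,
    0x0043c146, 0x49f28038, 0x00000006, 0x40000000, 0x0000002e,
    0x00000000, 0x00000000, 0x00000006, 0x40000000, 0x00000020,
    0x00005480, 0x489f3ce0, 0x00000006, 0x40000004, 0x0000002f,
    0x00005481, 0x489f3ce0, 0x00000006, 0x40000004, 0x00000077,
    0x00005482, 0x489f3ce0, 0x00000006, 0x40000004, 0x0000006f,
    0x00005483, 0x489f3ce0, 0x00000006, 0x40000004, 0x00000072,
    0x00005484, 0x489f3ce0, 0x00000006, 0x40000004, 0x00000000,
]

RECORD_WORDS = 5  # assumption: guessed from the repeating values


def records(words, size=RECORD_WORDS):
    """Split a flat list of words into consecutive fixed-size records."""
    return [words[i:i + size] for i in range(0, len(words), size)]


def last_chars(recs):
    """Read the low byte of each record's last word as a character."""
    return "".join(chr(rec[-1] & 0xFF) for rec in recs)


if __name__ == "__main__":
    for rec in records(WORDS):
        print(" ".join(f"{w:#010x}" for w in rec), repr(chr(rec[-1] & 0xFF)))
```

Under that grouping the first word counts up within each run (0x43c143 to 0x43c146, then 0x5480 to 0x5484), and the last words read "2", "2", ".", ".", space, "/", "w", "o", "r", NUL.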
core.25897
This gdb snippet picks up more or less where the above 25897 snippet leaves
off (with some editing). The corruption data for this core file seems to
have some regularity too:
(gdb) p $x = $1005
$1049 = (struct sdata *) 0xa35a8f0
(gdb) x/100 $x->u.data
0xa35a8f4: 0x00000037 0x09fb2964 0x00372020 0x09fb2944
0xa35a904: 0x00000038 0x09fb2924 0x00382020 0x0a388b24
0xa35a914: 0x00000039 0x0a388ae4 0x00000000 0x00000024
0xa35a924: 0x00000024 0x00000000 0x00000000 0x00000000
0xa35a934: 0x00000919 0x0a44ab38 0x4212e280 0x00000000
0xa35a944: 0x00000000 0x6877202c 0x20686369 0x73207369
0xa35a954: 0x20746e65 0x74206f74 0x73206568 0x00000000
0xa35a964: 0x00000000 0xffffffff 0x00000001 0x00000000
0xa35a974: 0x00000000 0x00000000 0x65736e6f 0x1826d17c
0xa35a984: 0x1826d17c 0x1826d17c 0x394b1aec 0x1826d17c
0xa35a994: 0x00000000 0x1826d17c 0x1826d17c 0x286e23dc
0xa35a9a4: 0x1826d17c 0x1826d26c 0x38273a14 0x582cd6ac
0xa35a9b4: 0x1826d17c 0x1826d17c 0x4828bf50 0x48277028
0xa35a9c4: 0x48277668 0x1826d1ac 0x00000008 0x00000046
0xa35a9d4: 0x00000000 0x1826d17c 0x1826d17c 0x48277e98
0xa35a9e4: 0x48365800 0x0a388ae4 0x00392020 0x0a388aa4
0xa35a9f4: 0x18003031 0x0a388a74 0x00303120 0x0a388a24
0xa35aa04: 0x18003131 0x0a388a04 0x00313120 0x0a3889f4
0xa35aa14: 0x18003231 0x0a3889c4 0x00323120 0x0a3889b4
0xa35aa24: 0x18003331 0x0a388994 0x00333120 0x0a388984
0xa35aa34: 0x18003431 0x0a388964 0x00343120 0x0a3888e4
0xa35aa44: 0x18003531 0x0a3888c4 0x00353120 0x0a3888b4
0xa35aa54: 0x00003631 0x0a388894 0x00363120 0x0a388834
0xa35aa64: 0x18003731 0x0a388824 0x00373120 0x0a3887f4
0xa35aa74: 0x18003831 0x0a3887d4 0x00383120 0x0a3887c4
also:
(gdb) p $x = $1035
$1050 = (struct sdata *) 0xa35a918
(gdb) x/100c $x->u.data
0xa35a91c: 0 '\0' 0 '\0' 0 '\0' 0 '\0' 36 '$' 0 '\0' 0 '\0' 0 '\0'
0xa35a924: 36 '$' 0 '\0' 0 '\0' 0 '\0' 0 '\0' 0 '\0' 0 '\0' 0 '\0'
0xa35a92c: 0 '\0' 0 '\0' 0 '\0' 0 '\0' 0 '\0' 0 '\0' 0 '\0' 0 '\0'
0xa35a934: 25 '\031' 9 '\t' 0 '\0' 0 '\0' 56 '8' -85 '\253' 68 'D' 10 '\n'
0xa35a93c: -128 '\200' -30 '\342' 18 '\022' 66 'B' 0 '\0' 0 '\0' 0 '\0' 0 '\0'
0xa35a944: 0 '\0' 0 '\0' 0 '\0' 0 '\0' 44 ',' 32 ' ' 119 'w' 104 'h'
0xa35a94c: 105 'i' 99 'c' 104 'h' 32 ' ' 105 'i' 115 's' 32 ' ' 115 's'
0xa35a954: 101 'e' 110 'n' 116 't' 32 ' ' 116 't' 111 'o' 32 ' ' 116 't'
0xa35a95c: 104 'h' 101 'e' 32 ' ' 115 's' 0 '\0' 0 '\0' 0 '\0' 0 '\0'
0xa35a964: 0 '\0' 0 '\0' 0 '\0' 0 '\0' -1 '\377' -1 '\377' -1 '\377' -1 '\377'
0xa35a96c: 1 '\001' 0 '\0' 0 '\0' 0 '\0' 0 '\0' 0 '\0' 0 '\0' 0 '\0'
0xa35a974: 0 '\0' 0 '\0' 0 '\0' 0 '\0' 0 '\0' 0 '\0' 0 '\0' 0 '\0'
0xa35a97c: 111 'o' 110 'n' 115 's' 101 'e'
In the middle of all this is the string "which is sent to the s", which
probably isn't helpful for debugging, but it does sound kind of like an
important clue from some bad mystery novel.
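The same text can be pulled out of the word dump directly, without the x/100c pass. Here is a Python sketch, assuming a little-endian machine and gdb's word-format output as shown above; the function names are made up for this illustration. It parses dump lines and lists the printable runs.

```python
import re
import struct


def dump_to_bytes(lines):
    """Concatenate the words of gdb 'x/' output lines as little-endian bytes."""
    out = b""
    for line in lines:
        # Everything after the first colon is a list of hex words.
        _, _, rest = line.partition(":")
        for word in rest.split():
            out += struct.pack("<I", int(word, 16))
    return out


def printable_runs(data, min_len=4):
    """Return runs of at least min_len printable ASCII characters."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]


if __name__ == "__main__":
    # Lines copied from the core.25897 dump; the last one is cut after
    # its third word.
    dump = [
        "0xa35a944: 0x00000000 0x6877202c 0x20686369 0x73207369",
        "0xa35a954: 0x20746e65 0x74206f74 0x73206568 0x00000000",
        "0xa35a964: 0x00000000 0xffffffff 0x00000001 0x00000000",
        "0xa35a974: 0x00000000 0x00000000 0x65736e6f",
    ]
    print(printable_runs(dump_to_bytes(dump)))
```

For these lines it reports ", which is sent to the s" and "onse", the same fragments visible in the character dump.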
Anyway... a lot of data here. I don't know if any of it is at all helpful.
Please advise on where I might go from here. One question: I see in
alloc.c that there is code ifdefed with GC_CHECK_STRING_BYTES. Presumably
defining this symbol enables additional checks during garbage collection
(how *did* I figure that out?? :-). Would it be helpful for me to compile
a version with this flag set, given that the crash does happen with some
regularity? Is an emacs compiled with this symbol defined practical to
use?
One last bit: I'm afraid that I don't have any netnews access at the moment,
so I cannot read the emacs bug newsgroup. Please respond by email to
mlm@timesten.com.
Thanks,
- Mark