unofficial mirror of bug-gnu-emacs@gnu.org 
* core dump triggered by garbage collection (?)
From: Mark McAuliffe @ 2003-08-28 17:16 UTC


This bug report will be sent to the Free Software Foundation,
not to your local site managers!
Please write in English, because the Emacs maintainers do not have
translators to read other languages for them.

Your bug report will be posted to the bug-gnu-emacs@gnu.org mailing list,
and to the gnu.emacs.bug news group.

In GNU Emacs 21.3.1 (i686-pc-linux-gnu, X toolkit, Xaw3d scroll bars)
 of 2003-07-11 on oscar
configured using `configure  --prefix=/usr/local.oscar/emacs-21.3'
Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: en_US.iso885915
  locale-coding-system: iso-latin-9
  default-enable-multibyte-characters: t

Please describe exactly what actions triggered the bug
and the precise symptoms of the bug:

I have encountered an emacs core dump multiple times, usually, but not
always, when reading mail with VM.  Beyond that, I don't have a lot of
information on what triggers the bug -- it seems completely unpredictable.
I have two core files from two different instances of the crash.  They have
similar, but not identical, stacks.  In both cases, though, it looks like
emacs was doing gc at the time of the crash.

The first stack is:

======== stack from core.23404 ========

#0  0x42029331 in kill () from /lib/i686/libc.so.6
#1  0x080ced5c in fatal_error_signal (sig=6) at emacs.c:354
#2  <signal handler called>
#3  0x42029331 in kill () from /lib/i686/libc.so.6
#4  0x080ced7a in abort () at emacs.c:387
#5  0x08121bb4 in Fsignal (error_symbol=405365612, data=1478979276)
    at eval.c:1387
#6  0x0808afa9 in Fcheck_coding_system (coding_system=406034828)
    at coding.c:6224
#7  0x0808bb16 in code_convert_string_norecord (string=954907492, 
    coding_system=406034828, encodep=1) at coding.c:6650
#8  0x080f7f3b in Fwrite_region (start=1, end=4310720, filename=954907492, 
    append=405197180, visit=405293428, lockname=954907492, mustbenew=405197180)
    at fileio.c:4787
#9  0x080f8ce3 in auto_save_1 () at fileio.c:5447
#10 0x0812197e in internal_condition_case (bfun=0x80f8c60 <auto_save_1>, 
    handlers=405197228, hfun=0x80f8bac <auto_save_error>) at eval.c:1267
#11 0x080f90fc in Fdo_auto_save (no_message=405197228, current_only=405197180)
    at fileio.c:5634
#12 0x080d030e in shut_down_emacs (sig=11, no_x=0, stuff=405197180)
    at emacs.c:1883
#13 0x080ced09 in fatal_error_signal (sig=11) at emacs.c:341
#14 <signal handler called>
#15 0x4207afcc in chunk_free () from /lib/i686/libc.so.6
#16 0x4207ad24 in free () from /lib/i686/libc.so.6
#17 0x0810f087 in emacs_blocked_free (ptr=0x8be41b0) at alloc.c:688
#18 0x4207ac9d in free () from /lib/i686/libc.so.6
#19 0x0810f00e in lisp_free (block=0x8be41b0) at alloc.c:630
#20 0x081130dc in gc_sweep () at alloc.c:5270
#21 0x08111da6 in Fgarbage_collect () at alloc.c:4194
#22 0x0812324a in Ffuncall (nargs=3, args=0xbfffec70) at eval.c:2599
#23 0x0814960c in Fbyte_code (bytestr=945031748, vector=1213476688, maxdepth=3)
    at bytecode.c:716
#24 0x081239d2 in funcall_lambda (fun=1213476840, nargs=0, 
    arg_vector=0xbfffed84) at eval.c:2851
#25 0x081234e8 in Ffuncall (nargs=1, args=0xbfffed80) at eval.c:2707
#26 0x0814960c in Fbyte_code (bytestr=945473388, vector=1213929776, maxdepth=6)
    at bytecode.c:716
#27 0x081239d2 in funcall_lambda (fun=1213808272, nargs=0, 
    arg_vector=0xbfffeea4) at eval.c:2851
#28 0x081234e8 in Ffuncall (nargs=1, args=0xbfffeea0) at eval.c:2707
#29 0x0814960c in Fbyte_code (bytestr=945473244, vector=1213929512, maxdepth=5)
    at bytecode.c:716
#30 0x081239d2 in funcall_lambda (fun=1213813456, nargs=0, 
    arg_vector=0xbfffefc4) at eval.c:2851
#31 0x081234e8 in Ffuncall (nargs=1, args=0xbfffefc0) at eval.c:2707
#32 0x0814960c in Fbyte_code (bytestr=945383564, vector=1213902360, maxdepth=6)
    at bytecode.c:716
#33 0x081239d2 in funcall_lambda (fun=1213839464, nargs=3, 
    arg_vector=0xbffff114) at eval.c:2851
#34 0x081234e8 in Ffuncall (nargs=4, args=0xbffff110) at eval.c:2707
#35 0x0812023f in Fcall_interactively (function=408098700, 
    record_flag=405197180, keys=1210562296) at callint.c:797
#36 0x080dae16 in Fcommand_execute (cmd=-1739384948, record_flag=405197180, 
    keys=405197180, special=405197180) at keyboard.c:9250
#37 0x080d1eb8 in command_loop_1 () at keyboard.c:1661
#38 0x0812197e in internal_condition_case (bfun=0x80d1340 <command_loop_1>, 
    handlers=405293524, hfun=0x80d0f74 <cmd_error>) at eval.c:1267
#39 0x080d1216 in command_loop_2 () at keyboard.c:1245
#40 0x0812153f in internal_catch (tag=405255108, 
    func=0x80d11f8 <command_loop_2>, arg=405197180) at eval.c:1030
#41 0x080d11a4 in command_loop () at keyboard.c:1224
#42 0x080d0d56 in recursive_edit_1 () at keyboard.c:950
#43 0x080d0e6f in Frecursive_edit () at keyboard.c:1006
#44 0x080cfe3f in main (argc=1, argv=0xbffff8a4, envp=0xbffff8ac)
    at emacs.c:1547
#45 0x42017589 in __libc_start_main () from /lib/i686/libc.so.6

======== end stack from core.23404 ========


The other stack is:

======== stack from core.24594 ========

#0  0x42029331 in kill () from /lib/i686/libc.so.6
#1  0x080ced5c in fatal_error_signal (sig=6) at emacs.c:354
#2  <signal handler called>
#3  0x42029331 in kill () from /lib/i686/libc.so.6
#4  0x080ced7a in abort () at emacs.c:387
#5  0x08121bb4 in Fsignal (error_symbol=405293572, data=1509656020)
    at eval.c:1387
#6  0x0811360a in wrong_type_argument (predicate=405293980, value=405365444)
    at data.c:119
#7  0x0812984a in Fplist_get (plist=-668454820, prop=405365444) at fns.c:1884
#8  0x0808ae4b in Fcoding_system_p (obj=406034828) at coding.c:6175
#9  0x0808af79 in Fcheck_coding_system (coding_system=406034828)
    at coding.c:6221
#10 0x0808bb16 in code_convert_string_norecord (string=965578404, 
    coding_system=406034828, encodep=1) at coding.c:6650
#11 0x080f7f3b in Fwrite_region (start=1, end=4587138, filename=965578404, 
    append=405197180, visit=405293428, lockname=965578404, mustbenew=405197180)
    at fileio.c:4787
#12 0x080f8ce3 in auto_save_1 () at fileio.c:5447
#13 0x0812197e in internal_condition_case (bfun=0x80f8c60 <auto_save_1>, 
    handlers=405197228, hfun=0x80f8bac <auto_save_error>) at eval.c:1267
#14 0x080f90fc in Fdo_auto_save (no_message=405197228, current_only=405197180)
    at fileio.c:5634
#15 0x080d030e in shut_down_emacs (sig=11, no_x=0, stuff=405197180)
    at emacs.c:1883
#16 0x080ced09 in fatal_error_signal (sig=11) at emacs.c:341
#17 <signal handler called>
#18 compact_small_strings () at alloc.c:1610
#19 0x08112b27 in gc_sweep () at alloc.c:4928
#20 0x08111da6 in Fgarbage_collect () at alloc.c:4194
#21 0x081493e5 in Fbyte_code (bytestr=944646156, vector=1213282216, maxdepth=9)
    at bytecode.c:542
#22 0x081239d2 in funcall_lambda (fun=1213086368, nargs=3, 
    arg_vector=0xbfffccc4) at eval.c:2851
#23 0x081234e8 in Ffuncall (nargs=4, args=0xbfffccc0) at eval.c:2707
#24 0x0814960c in Fbyte_code (bytestr=944655732, vector=1214625952, maxdepth=6)
    at bytecode.c:716
#25 0x081239d2 in funcall_lambda (fun=1213069168, nargs=3, 
    arg_vector=0xbfffcde4) at eval.c:2851
#26 0x081234e8 in Ffuncall (nargs=4, args=0xbfffcde0) at eval.c:2707
#27 0x0814960c in Fbyte_code (bytestr=944637572, vector=1214805168, maxdepth=4)
    at bytecode.c:716
#28 0x081239d2 in funcall_lambda (fun=1215482968, nargs=2, 
    arg_vector=0xbfffcfe8) at eval.c:2851
#29 0x081234e8 in Ffuncall (nargs=3, args=0xbfffcfe4) at eval.c:2707
#30 0x08122fb2 in run_hook_with_args (nargs=3, args=0xbfffcfe4, 
    cond=to_completion) at eval.c:2330
#31 0x08122e53 in Frun_hook_with_args (nargs=3, args=0xbfffcfe4) at eval.c:2223
#32 0x0812338a in Ffuncall (nargs=4, args=0xbfffcfe0) at eval.c:2640
#33 0x0814960c in Fbyte_code (bytestr=947271404, vector=1214773944, maxdepth=8)
    at bytecode.c:716
#34 0x081239d2 in funcall_lambda (fun=1214774128, nargs=2, 
    arg_vector=0xbfffd104) at eval.c:2851
#35 0x081234e8 in Ffuncall (nargs=3, args=0xbfffd100) at eval.c:2707
#36 0x0814960c in Fbyte_code (bytestr=947271484, vector=1215069808, maxdepth=7)
    at bytecode.c:716
#37 0x081239d2 in funcall_lambda (fun=1214645336, nargs=0, 
    arg_vector=0xbfffd2e8) at eval.c:2851
#38 0x081234e8 in Ffuncall (nargs=1, args=0xbfffd2e4) at eval.c:2707
#39 0x08122cb4 in Fapply (nargs=2, args=0xbfffd2e4) at eval.c:2114
#40 0x0812338a in Ffuncall (nargs=3, args=0xbfffd2e0) at eval.c:2640
#41 0x0814960c in Fbyte_code (bytestr=941671720, vector=1210107212, maxdepth=4)
    at bytecode.c:716
#42 0x08122a55 in Feval (form=1478542616) at eval.c:2019
#43 0x08121895 in Fcondition_case (args=1509879652) at eval.c:1211
#44 0x08149aca in Fbyte_code (bytestr=941671480, vector=1210107060, maxdepth=5)
    at bytecode.c:898
#45 0x081239d2 in funcall_lambda (fun=1210106900, nargs=1, 
    arg_vector=0xbfffd674) at eval.c:2851
#46 0x081234e8 in Ffuncall (nargs=2, args=0xbfffd670) at eval.c:2707
#47 0x0812316a in call1 (fn=-1742231988, arg1=-934492280) at eval.c:2456
#48 0x080d509a in timer_check (do_it_now=1) at keyboard.c:4103
#49 0x080d3fca in readable_events (do_timers_now=1) at keyboard.c:3181
#50 0x080d6c3f in get_input_pending (addr=0x8265624, do_timers_now=1)
    at keyboard.c:6052
#51 0x080db376 in detect_input_pending_run_timers (do_display=1)
    at keyboard.c:9473
#52 0x0814d64e in wait_reading_process_input (time_limit=30, microsecs=0, 
    read_kbd=268435455, do_display=1) at process.c:2687
#53 0x08056c3a in sit_for (sec=30, usec=0, reading=1, display=1, 
    initial_display=0) at dispnew.c:6240
#54 0x080d2fb8 in read_char (commandflag=1, nmaps=2, maps=0xbfffdb20, 
    prev_event=405197180, used_mouse_menu=0xbfffdb58) at keyboard.c:2518
#55 0x080d95c8 in read_key_sequence (keybuf=0xbfffdc80, bufsize=30, 
    prompt=405197180, dont_downcase_last=0, can_return_switch_frame=1, 
    fix_current_buffer=1) at keyboard.c:8209
#56 0x080d163f in command_loop_1 () at keyboard.c:1451
#57 0x0812197e in internal_condition_case (bfun=0x80d1340 <command_loop_1>, 
    handlers=405293524, hfun=0x80d0f74 <cmd_error>) at eval.c:1267
#58 0x080d1216 in command_loop_2 () at keyboard.c:1245
#59 0x0812153f in internal_catch (tag=405255108, 
    func=0x80d11f8 <command_loop_2>, arg=405197180) at eval.c:1030
#60 0x080d11a4 in command_loop () at keyboard.c:1224
#61 0x080d0d56 in recursive_edit_1 () at keyboard.c:950
#62 0x080d0e6f in Frecursive_edit () at keyboard.c:1006
#63 0x080cfe3f in main (argc=1, argv=0xbfffe244, envp=0xbfffe24c)
    at emacs.c:1547
#64 0x42017589 in __libc_start_main () from /lib/i686/libc.so.6

======== end stack from core.24594 ========


I will hold onto the core files in case anyone would like to see them.  I'm
sorry I cannot be more specific about what triggers the bug, but hopefully
the core files contain enough info to figure out what causes this.

- Mark


Recent input:
o g W r i t e <return> <S-right> <down> <down> <down> 
<down> <down> <down> <down> <down> <M-right> <right> 
<right> <right> M-. <return> <S-right> <S-right> <S-right> 
<down> <down> <down> <down> <down> <down> <down> <down> 
<down> <down> <down> <down> <down> <down> <down> <down> 
<down> <down> <down> <down> <down> <down> <down> <down> 
<down> <down> <down> <down> <down> <down> <down> <down> 
<down> <down> <right> <right> <right> <right> M-. <return> 
M-x e m a c s - v e r <tab> <return> M-x e m a c s 
- b <tab> <M-backspace> <M-backspace> r e p <tab> <tab> 
o <tab> r t <tab> <return>

Recent messages:
Loading cc-mode...done
Loading font-lock...
Loading regexp-opt...done
Loading font-lock...done
Mark set [4 times]
GNU Emacs 21.3.1 (i686-pc-linux-gnu, X toolkit, Xaw3d scroll bars) of 2003-07-11 on oscar
Making completion list...
Loading view...done
Making completion list...
Loading emacsbug...done


* Re: core dump triggered by garbage collection (?)
From: Richard Stallman @ 2003-09-01  2:22 UTC
  Cc: bug-gnu-emacs

    #19 0x0810f00e in lisp_free (block=0x8be41b0) at alloc.c:630
    #20 0x081130dc in gc_sweep () at alloc.c:5270

To learn something about this crash, it is necessary to analyze the data
being operated on in those two frames, and try to figure out what was
inconsistent in the data (and what the data were being used for).
Knowing that, we might be able to figure out the code that created
the invalid data.

This is not easy, but I don't know of any substitute for it.


* Re: core dump triggered by garbage collection (?)
From: Mark McAuliffe @ 2003-09-05  7:32 UTC
  Cc: Mark McAuliffe, bug-gnu-emacs

Richard Stallman writes:
>     #19 0x0810f00e in lisp_free (block=0x8be41b0) at alloc.c:630
>     #20 0x081130dc in gc_sweep () at alloc.c:5270
> 
> To learn something about this crash, it is necessary to analyze the data
> being operated on in those two frames, and try to figure out what was
> inconsistent in the data (and what the data were being used for).
> Knowing that, we might be able to figure out the code that created
> the invalid data.
> 
> This is not easy, but I don't know of any substitute for it.


I've spent some time looking into this.  I don't know that I have found
anything of value, but here is what I've got so far...

For starters, I have had 2 more crashes since I reported the bug
originally, so I now have 4 core files' worth of info.  There appear to be 2
types of crash -- presumably the same underlying problem, but 2 different
manifestations.  In one type, it appears that corrupt data are being found
in compact_small_strings.  3 of the 4 core files are like this.  The other
type finds the corrupt data in gc_sweep.  This latter type is the one you
specifically asked about above, but it is also the one I have had less luck
analyzing (no luck at all, in fact).  I'm hoping that what I have been able
to learn about the former type will be helpful.  If it's not, perhaps you
could help steer me in the right direction for the latter one.

For "type 1" core files, I wrote a gdb user-defined procedure that can
traverse the linked list in compact_small_strings (the inner one, that
starts with "for (from = &b->first_data; from < end; from = from_end)").
FWIW, it looks like this:

define a
  if ( $from < end )
    if ( $from->string == 0 )
      set $n = $from->u.nbytes
    else
      set $s = $from->string
      p $s
      p *$s
      p ((char*)($s->data)) - ((char*)&($from->u.data))
      if ( $s->size_byte < 0 )
        set $n = $s->size
      else
        set $n = $s->size_byte
      end
    end
    p $nb = ( $n + 8 ) & ~3
    p $from = (struct sdata *)((char*)$from + $nb)
    p *$from
  end
end
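
The stepping arithmetic in the procedure above can be checked outside gdb.
The following is a minimal Python sketch, not code from Emacs: the names
sdata_stride and walk are hypothetical, and the only thing taken from the
procedure is the expression ( $n + 8 ) & ~3.  The start address and string
sizes are copied from the core.17451 output quoted later in this message.

```python
def sdata_stride(nbytes: int) -> int:
    """Distance in bytes to the next sdata record, per the gdb procedure."""
    # Same expression as "p $nb = ( $n + 8 ) & ~3" in the gdb procedure.
    return (nbytes + 8) & ~3


def walk(start: int, sizes):
    """Yield the address of each following record, given string sizes in order."""
    addr = start
    for n in sizes:
        addr += sdata_stride(n)
        yield addr


# Start address and sizes copied from the core.17451 gdb output.
addresses = list(walk(0x8DCDB24, [1, 3, 1, 3]))
print([hex(a) for a in addresses])
```

With these inputs the walk lands on 0x8dcdb2c, 0x8dcdb34, 0x8dcdb3c and
0x8dcdb44, which are the sdata addresses gdb printed, so every string of 1
to 3 bytes advances the pointer by 8.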


I initialized $from to be &b->first_data, as in the for-loop, and ran
procedure "a" to traverse the list of struct sdata's until it ran into
corruption.  I did this for the 3 core files that have the problem in
compact_small_strings, and I found that the data that appeared right before
the corruption were similar.  Below are the last few iterations from each
core file:

core.17451:

$104 = (struct Lisp_String *) 0x8b6d9c4
$105 = {size = 3, size_byte = -1, intervals = 0x0, data = 0x908bc80 "  5"}
$106 = 2875744
$107 = 8
$108 = (struct sdata *) 0x8dcdb24
$109 = {string = 0x8b6d994, u = {data = "6", nbytes = 1667432502}}
(gdb) 
$110 = (struct Lisp_String *) 0x8b6d994
$111 = {size = 1, size_byte = -1, intervals = 0x0, data = 0x908bc88 "6"}
$112 = 2875744
$113 = 8
$114 = (struct sdata *) 0x8dcdb2c
$115 = {string = 0x8b6d934, u = {data = " ", nbytes = 3547168}}
(gdb) 
$116 = (struct Lisp_String *) 0x8b6d934
$117 = {size = 3, size_byte = -1, intervals = 0x0, data = 0x908bc90 "  6"}
$118 = 2875744
$119 = 8
$120 = (struct sdata *) 0x8dcdb34
$121 = {string = 0x8b6d924, u = {data = "7", nbytes = 538968119}}
(gdb) 
$122 = (struct Lisp_String *) 0x8b6d924
$123 = {size = 1, size_byte = -1, intervals = 0x0, data = 0x908bc98 "7"}
$124 = 2875744
$125 = 8
$126 = (struct sdata *) 0x8dcdb3c
$127 = {string = 0x8b6d914, u = {data = "", nbytes = 538976256}}
(gdb) 
$128 = (struct Lisp_String *) 0x8b6d914
$129 = {size = 3, size_byte = -1, intervals = 0x0, data = 0x908bca0 ""}
$130 = 2875744
$131 = 8
$132 = (struct sdata *) 0x8dcdb44
$133 = {string = 0x20202020, u = {data = "m", nbytes = 1919115629}}
(gdb) 
$134 = (struct Lisp_String *) 0x20202020
Cannot access memory at address 0x20202020
(gdb) 


core.24594

$269 = (struct Lisp_String *) 0x9c1c50c
$270 = {size = 2, size_byte = -1, intervals = 0x0, data = 0x9c1ed40 "18"}
$271 = -4418724
$272 = 8
$273 = (struct sdata *) 0xa0559e8
$274 = {string = 0x9c1c4ec, u = {data = " ", nbytes = 3682592}}
(gdb) 
$275 = (struct Lisp_String *) 0x9c1c4ec
$276 = {size = 3, size_byte = -1, intervals = 0x0, data = 0x9c1ed48 " 18"}
$277 = -4418724
$278 = 8
$279 = (struct sdata *) 0xa0559f0
$280 = {string = 0x9c1c4dc, u = {data = "1", nbytes = 14641}}
(gdb) 
$281 = (struct Lisp_String *) 0x9c1c4dc
$282 = {size = 2, size_byte = -1, intervals = 0x0, data = 0x9c1ed50 "19"}
$283 = -4418724
$284 = 8
$285 = (struct sdata *) 0xa0559f8
$286 = {string = 0x9c1c4bc, u = {data = " ", nbytes = 3748128}}
(gdb) 
$287 = (struct Lisp_String *) 0x9c1c4bc
$288 = {size = 3, size_byte = -1, intervals = 0x0, data = 0x9c1ed58 " 19"}
$289 = -4418724
$290 = 8
$291 = (struct sdata *) 0xa055a00
$292 = {string = 0x9c1a494, u = {data = "", nbytes = 0}}
(gdb) 
$293 = (struct Lisp_String *) 0x9c1a494
$294 = {size = 2, size_byte = -1, intervals = 0x0, data = 0x9c1ed60 ""}
$295 = -4418724
$296 = 8
$297 = (struct sdata *) 0xa055a08
$298 = {string = 0x43c143, u = {data = "8", nbytes = 1240629304}}
(gdb) 
$299 = (struct Lisp_String *) 0x43c143
Cannot access memory at address 0x43c143


core.25897

$1007 = (struct Lisp_String *) 0x9fb29c4
$1008 = {size = 1, size_byte = -1, intervals = 0x0, data = 0xa36cc7c "7"}
$1009 = 74632
$1010 = 8
$1011 = (struct sdata *) 0xa35a8f8
$1012 = {string = 0x9fb2964, u = {data = " ", nbytes = 3612704}}
(gdb) 
$1013 = (struct Lisp_String *) 0x9fb2964
$1014 = {size = 3, size_byte = -1, intervals = 0x0, data = 0xa36cc84 "  7"}
$1015 = 74632
$1016 = 8
$1017 = (struct sdata *) 0xa35a900
$1018 = {string = 0x9fb2944, u = {data = "8", nbytes = 56}}
(gdb) 
$1019 = (struct Lisp_String *) 0x9fb2944
$1020 = {size = 1, size_byte = -1, intervals = 0x0, data = 0xa36cc8c "8"}
$1021 = 74632
$1022 = 8
$1023 = (struct sdata *) 0xa35a908
$1024 = {string = 0x9fb2924, u = {data = " ", nbytes = 3678240}}
(gdb) 
$1025 = (struct Lisp_String *) 0x9fb2924
$1026 = {size = 3, size_byte = -1, intervals = 0x0, data = 0xa36cc94 "  8"}
$1027 = 74632
$1028 = 8
$1029 = (struct sdata *) 0xa35a910
$1030 = {string = 0xa388b24, u = {data = "9", nbytes = 57}}
(gdb) 
$1031 = (struct Lisp_String *) 0xa388b24
$1032 = {size = 1, size_byte = -1, intervals = 0x0, data = 0xa36cc9c "9"}
$1033 = 74632
$1034 = 8
$1035 = (struct sdata *) 0xa35a918
$1036 = {string = 0xa388ae4, u = {data = "", nbytes = 0}}
(gdb) 
$1037 = (struct Lisp_String *) 0xa388ae4
$1038 = {size = 3, size_byte = -1, intervals = 0x0, data = 0xa36cca4 ""}
$1039 = 74632
$1040 = 8
$1041 = (struct sdata *) 0xa35a920
$1042 = {string = 0x24, u = {data = "$", nbytes = 36}}
(gdb) 
$1043 = (struct Lisp_String *) 0x24
Cannot access memory at address 0x24
(gdb) 


In all three cases, the strings that appear before the corruption are
numbers.  Since the crash always seems to happen when I try to read mail
with VM, I assume those numbers are the message numbers in the VM summary
buffer.  Significant?  Helpful??  I dunno...
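
A side note on reading the dumps above: the large nbytes values printed for
live records (for example 3547168) are not themselves a sign of corruption.
The numbers are consistent with u being a union, so that nbytes is simply
the first four data bytes read as a little-endian integer on this i686
machine.  A small Python sketch of that reinterpretation follows; the
function name nbytes_as_bytes is hypothetical and the values are copied
from the core.17451 output.

```python
import struct


def nbytes_as_bytes(nbytes: int) -> bytes:
    """Return the four bytes that a 32-bit little-endian int occupies in memory."""
    return struct.pack("<i", nbytes)


# nbytes values copied from the core.17451 gdb output above.
decoded = {value: nbytes_as_bytes(value)
           for value in (1667432502, 3547168, 538968119)}
for value, raw in decoded.items():
    print(value, raw)
```

Here 3547168 comes out as two spaces, "6" and a NUL, which is the string
"  6" that gdb shows in the data field of the same record.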


I also tried to figure out what the data was that overwrote the list data
for the 3 core files:


core.17451

The gdb snippet below picks up right after the above snippet for
core.17451.  The overwriting data appears to be basically text (a compiled
lisp macro?):

(gdb) p $x = $126
$135 = (struct sdata *) 0x8dcdb3c
(gdb) p *$x
$136 = {string = 0x8b6d914, u = {data = "", nbytes = 538976256}}
(gdb) set print null-stop o
Display all 117 possibilities? (y or n)
(gdb) set print null-stop off
(gdb) p $x->u.data
$137 = ""
(gdb) p $x->u.data[0]@20
$138 = "\0       macro %\b%_\b_"
(gdb) p $x->u.data[0]@100
$139 = "\0       macro %\b%_\b_r\bre\bep\bpa\bac\bck\bka\bag\bge\be_\b_n\bna\bam\bme\be_\b_f\bfm\bmt\bt and will be  created  in\n"
(gdb) p $x->u.data[0]@200
$140 = "\0       macro %\b%_\b_r\bre\bep\bpa\bac\bck\bka\bag\bge\be_\b_n\bna\bam\bme\be_\b_f\bfm\bmt\bt and will be  created  in\n", ' ' <repeats 14 times>, "the  directory  named  by the macro %\b%_\b_\0 fr\0\0\0\0\0\004\0\0\n\n       -\b--\b-p\bpr\bre\bef\bfi\bix\b"
(gdb) p $x->u.data[0]@400
$141 = "\0       macro %\b%_\b_r\bre\bep\bpa\bac\bck\bka\bag\bge\be_\b_n\bna\bam\bme\be_\b_f\bfm\bmt\bt and will be  created  in\n", ' ' <repeats 14 times>, "the  directory  named  by the macro %\b%_\b_\0 fr\0\0\0\0\0\004\0\0\n\n       -\b--\b-p\bpr\bre\bef\bfi\bix\b\0 _\b"...

(I hope that stuff survives being emailed...).


core.24594

This gdb snippet more-or-less picks up where the above 24594 snippet left
off, with some editing:

(gdb) p $x = $267
$306 = (struct sdata *) 0xa0559e0
(gdb) x/100 $x->u.data
0xa0559e4:      0x49003831      0x09c1c4ec      0x00383120      0x09c1c4dc
0xa0559f4:      0x00003931      0x09c1c4bc      0x00393120      0x09c1a494
0xa055a04:      0x00000000      0x0043c143      0x49f28038      0x00000006
0xa055a14:      0x40000000      0x00000032      0x0043c144      0x49f28038
0xa055a24:      0x00000006      0x40000000      0x00000032      0x0043c145
0xa055a34:      0x49f28038      0x00000006      0x40000000      0x0000002e
0xa055a44:      0x0043c146      0x49f28038      0x00000006      0x40000000
0xa055a54:      0x0000002e      0x00000000      0x00000000      0x00000006
0xa055a64:      0x40000000      0x00000020      0x00005480      0x489f3ce0
0xa055a74:      0x00000006      0x40000004      0x0000002f      0x00005481
0xa055a84:      0x489f3ce0      0x00000006      0x40000004      0x00000077
0xa055a94:      0x00005482      0x489f3ce0      0x00000006      0x40000004
0xa055aa4:      0x0000006f      0x00005483      0x489f3ce0      0x00000006
0xa055ab4:      0x40000004      0x00000072      0x00005484      0x489f3ce0
0xa055ac4:      0x00000006      0x40000004      0x00000000      0x09c1a494
0xa055ad4:      0x48003032      0x09c1a454      0x00303220      0x09c1a424
0xa055ae4:      0x00003132      0x09c1a414      0x00313220      0x09c1a404
0xa055af4:      0x00003232      0x09c1a3f4      0x00323220      0x09c1a3e4
0xa055b04:      0x40003332      0x09c1a3d4      0x00333220      0x09c1a3c4
0xa055b14:      0x00003432      0x09c1a3b4      0x00343220      0x09c1a3a4
0xa055b24:      0x48003532      0x09c1a394      0x00353220      0x09c1a384
0xa055b34:      0x00003632      0x09c1a374      0x00363220      0x09c1a354
0xa055b44:      0x00003732      0x09c1a344      0x00373220      0x09c1a334
0xa055b54:      0x40003832      0x09c1a324      0x00383220      0x09c1a314
0xa055b64:      0x00003932      0x09c1a304      0x00393220      0x09c1a2f4

The first two lines are the tail end of the good data.  The third line is
where things get messed up.  The corruption data seems to have some pattern
to it, but I have no idea what it might be.
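
One possible way to look for the pattern is to regroup the words.  The
Python sketch below is only an illustration: the record length of 5 words
is an assumption made from eyeballing the dump, and the helper names are
hypothetical.  The words are copied from the dump above, starting at
0xa055a08 (the 0x0043c143 word) and stopping before the 0x09c1a494 word at
0xa055ad0.

```python
# Words copied from the x/100 output above, 0xa055a08 through 0xa055acc.
WORDS = [
    0x0043C143, 0x49F28038, 0x00000006, 0x40000000, 0x00000032,
    0x0043C144, 0x49F28038, 0x00000006, 0x40000000, 0x00000032,
    0x0043C145, 0x49F28038, 0x00000006, 0x40000000, 0x0000002E,
    0x0043C146, 0x49F28038, 0x00000006, 0x40000000, 0x0000002E,
    0x00000000, 0x00000000, 0x00000006, 0x40000000, 0x00000020,
    0x00005480, 0x489F3CE0, 0x00000006, 0x40000004, 0x0000002F,
    0x00005481, 0x489F3CE0, 0x00000006, 0x40000004, 0x00000077,
    0x00005482, 0x489F3CE0, 0x00000006, 0x40000004, 0x0000006F,
    0x00005483, 0x489F3CE0, 0x00000006, 0x40000004, 0x00000072,
    0x00005484, 0x489F3CE0, 0x00000006, 0x40000004, 0x00000000,
]

RECORD_WORDS = 5  # assumption: the data repeats every 5 words (20 bytes)


def group_records(words, size=RECORD_WORDS):
    """Split a flat list of words into fixed-size tuples."""
    return [tuple(words[i:i + size]) for i in range(0, len(words), size)]


def last_field_text(records):
    """Join the last word of each record as a character, skipping zero words."""
    return "".join(chr(r[-1]) for r in records if r[-1] != 0)


records = group_records(WORDS)
for rec in records:
    print(" ".join(f"{w:#010x}" for w in rec))
print(repr(last_field_text(records)))
```

Grouped this way, the first word of each record counts up by one within a
run, the third word is always 6, and the last word is a printable ASCII
code in nine of the ten records.  What structure, if any, these records
correspond to is not established anywhere in this thread.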


core.25897

This gdb snippet picks up more or less where the above 25897 snippet leaves
off (with some editing).  The corruption data for this core file seems to
have some regularity too:

(gdb) p $x = $1005
$1049 = (struct sdata *) 0xa35a8f0
(gdb) x/100 $x->u.data
0xa35a8f4:      0x00000037      0x09fb2964      0x00372020      0x09fb2944
0xa35a904:      0x00000038      0x09fb2924      0x00382020      0x0a388b24
0xa35a914:      0x00000039      0x0a388ae4      0x00000000      0x00000024
0xa35a924:      0x00000024      0x00000000      0x00000000      0x00000000
0xa35a934:      0x00000919      0x0a44ab38      0x4212e280      0x00000000
0xa35a944:      0x00000000      0x6877202c      0x20686369      0x73207369
0xa35a954:      0x20746e65      0x74206f74      0x73206568      0x00000000
0xa35a964:      0x00000000      0xffffffff      0x00000001      0x00000000
0xa35a974:      0x00000000      0x00000000      0x65736e6f      0x1826d17c
0xa35a984:      0x1826d17c      0x1826d17c      0x394b1aec      0x1826d17c
0xa35a994:      0x00000000      0x1826d17c      0x1826d17c      0x286e23dc
0xa35a9a4:      0x1826d17c      0x1826d26c      0x38273a14      0x582cd6ac
0xa35a9b4:      0x1826d17c      0x1826d17c      0x4828bf50      0x48277028
0xa35a9c4:      0x48277668      0x1826d1ac      0x00000008      0x00000046
0xa35a9d4:      0x00000000      0x1826d17c      0x1826d17c      0x48277e98
0xa35a9e4:      0x48365800      0x0a388ae4      0x00392020      0x0a388aa4
0xa35a9f4:      0x18003031      0x0a388a74      0x00303120      0x0a388a24
0xa35aa04:      0x18003131      0x0a388a04      0x00313120      0x0a3889f4
0xa35aa14:      0x18003231      0x0a3889c4      0x00323120      0x0a3889b4
0xa35aa24:      0x18003331      0x0a388994      0x00333120      0x0a388984
0xa35aa34:      0x18003431      0x0a388964      0x00343120      0x0a3888e4
0xa35aa44:      0x18003531      0x0a3888c4      0x00353120      0x0a3888b4
0xa35aa54:      0x00003631      0x0a388894      0x00363120      0x0a388834
0xa35aa64:      0x18003731      0x0a388824      0x00373120      0x0a3887f4
0xa35aa74:      0x18003831      0x0a3887d4      0x00383120      0x0a3887c4

also:

(gdb) p $x = $1035
$1050 = (struct sdata *) 0xa35a918
(gdb) x/100c $x->u.data
0xa35a91c:      0 '\0'  0 '\0'  0 '\0'  0 '\0'  36 '$'  0 '\0'  0 '\0'  0 '\0'
0xa35a924:      36 '$'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'
0xa35a92c:      0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'
0xa35a934:      25 '\031'       9 '\t'  0 '\0'  0 '\0'  56 '8'  -85 '\253'      68 'D'  10 '\n'
0xa35a93c:      -128 '\200'     -30 '\342'      18 '\022'       66 'B'  0 '\0'  0 '\0'  0 '\0'  0 '\0'
0xa35a944:      0 '\0'  0 '\0'  0 '\0'  0 '\0'  44 ','  32 ' '  119 'w' 104 'h'
0xa35a94c:      105 'i' 99 'c'  104 'h' 32 ' '  105 'i' 115 's' 32 ' '  115 's'
0xa35a954:      101 'e' 110 'n' 116 't' 32 ' '  116 't' 111 'o' 32 ' '  116 't'
0xa35a95c:      104 'h' 101 'e' 32 ' '  115 's' 0 '\0'  0 '\0'  0 '\0'  0 '\0'
0xa35a964:      0 '\0'  0 '\0'  0 '\0'  0 '\0'  -1 '\377'       -1 '\377'       -1 '\377'       -1 '\377'
0xa35a96c:      1 '\001'        0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'
0xa35a974:      0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'
0xa35a97c:      111 'o' 110 'n' 115 's' 101 'e'

In the middle of all this is the string "which is sent to the s", which
probably isn't helpful for debugging, but it does sound kind of like an
important clue from some bad mystery novel.
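
The same text can be recovered directly from the x/100 word dump without
the character view, because i686 stores each word least significant byte
first.  A short Python sketch, assuming nothing beyond that byte order; the
helper name words_to_bytes is hypothetical and the words are copied from
addresses 0xa35a948 through 0xa35a95c of the dump above.

```python
import struct


def words_to_bytes(words):
    """Lay out 32-bit words in memory order on a little-endian machine."""
    return b"".join(struct.pack("<I", w) for w in words)


# Words copied from the core.25897 x/100 output, 0xa35a948 .. 0xa35a95c.
fragment = words_to_bytes([
    0x6877202C, 0x20686369, 0x73207369,
    0x20746E65, 0x74206F74, 0x73206568,
]).decode("ascii")

# The lone word at 0xa35a97c from the same dump.
tail = words_to_bytes([0x65736E6F]).decode("ascii")

print(repr(fragment), repr(tail))
```

This yields ", which is sent to the s" for the run of six words and "onse"
for the word at 0xa35a97c, matching the x/100c output.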



Anyway... a lot of data here.  I don't know if any of it is at all helpful.
Please advise on where I might go from here.  One question: I see in
alloc.c that there is code ifdefed with GC_CHECK_STRING_BYTES.  Presumably
defining this symbol enables additional checks during garbage collection
(how *did* I figure that out?? :-).  Would it be helpful for me to compile
a version with this flag set, given that the crash does happen with some
regularity?  Is an emacs compiled with this symbol defined practical to
use?

One last bit: I'm afraid that I don't have any netnews access at the moment,
so I cannot read the emacs bug newsgroup.  Please respond by email to
mlm@timesten.com.

Thanks,
- Mark


* Re: core dump triggered by garbage collection (?)
From: Richard Stallman @ 2003-09-07 20:23 UTC
  Cc: mlm, bug-gnu-emacs

    In all three cases, the strings that appear before the corruption are
    numbers.  Since the crash always seems to happen when I try to read mail
    with VM, I assume those numbers are the message numbers in the VM summary
    buffer.  Significant?  Helpful??  I dunno...

It won't be easy to figure out the bug from this clue, but it is worth
a try.

Maybe the string that VM makes just after it makes the number
is getting clobbered somehow.  Can you take a look at a live process
running VM when it has not crashed, and see what's in the string
right after the message number?  Also take a look at the code
of VM to see what code makes that string, and what that string is
used for.

    0xa055a04:      0x00000000      0x0043c143      0x49f28038      0x00000006
    0xa055a14:      0x40000000      0x00000032      0x0043c144      0x49f28038
    0xa055a24:      0x00000006      0x40000000      0x00000032      0x0043c145
    0xa055a34:      0x49f28038      0x00000006      0x40000000      0x0000002e
    0xa055a44:      0x0043c146      0x49f28038      0x00000006      0x40000000
    0xa055a54:      0x0000002e      0x00000000      0x00000000      0x00000006
    0xa055a64:      0x40000000      0x00000020      0x00005480      0x489f3ce0

Is 0x43c143 the address of something?  If so, what?

    In the middle of all this is the string "which is sent to the s", which
    probably isn't helpful for debugging, but it does sound kind of like an
    important clue from some bad mystery novel.

If it is part of what was written erroneously into the block,
it may teach us something, especially if you can find the place
that it came from.

If it is data in a string block, then it could be just some string
text that was not clobbered.  In that case it may not be relevant.

    Anyway... a lot of data here.  I don't know if any of it is at all helpful.
    Please advise on where I might go from here.  One question: I see in
    alloc.c that there is code ifdefed with GC_CHECK_STRING_BYTES.  Presumably
    defining this symbol enables additional checks during garbage collection
    (how *did* I figure that out?? :-).  Would it be helpful for me to compile
    a version with this flag set, given that the crash does happen with some
    regularity?

I don't know, but I think it is worth a try.

