unofficial mirror of emacs-devel@gnu.org 
* Emacs crashes
@ 2006-03-13 20:23 Nick Roberts
  2006-03-13 20:47 ` Chong Yidong
                   ` (2 more replies)
  0 siblings, 3 replies; 43+ messages in thread
From: Nick Roberts @ 2006-03-13 20:23 UTC (permalink / raw)



I've had Emacs crash/hang in three different ways in recent days.  It would
appear to be less stable than it was two years ago:

1) It hangs with some kind of mutex lock which I don't understand with a brief
   backtrace of three functions in libc, I think.  The only thing I can do,
   after attaching with GDB, is kill it.

2) A garbage collection related crash where mark_object is called recursively 
   literally thousands of times,

3) A crash that is caused by recent changes to the tool bar (I think).  I
   attach the backtrace for this one below (xbacktrace produces no output).
   It appears to go wrong in produce_image_glyph where img=0x0 because
   it->f->output_data.x->display_info->used = 73, is less than
   it->image_id = 87.
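
To make the failure in (3) concrete: the reported state is an image id (87)
that lies beyond the used part of the image table (73), so the lookup yields a
null pointer that is later dereferenced.  Below is a minimal C sketch of a
bounds-checked lookup.  The struct and function names are hypothetical
stand-ins chosen for illustration, not the actual Emacs definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the image table;
   these are NOT the real Emacs types.  */
struct image { int id; };

struct image_cache
{
  struct image **images;   /* table of cached images */
  int used;                /* number of valid slots in `images' */
};

/* Return the image for ID, or NULL when ID lies outside the used
   part of the table (the situation reported above: id 87, used 73).  */
static struct image *
lookup_image (const struct image_cache *c, int id)
{
  assert (c != NULL);
  if (id < 0 || id >= c->used)
    return NULL;
  return c->images[id];
}

/* Return 1 if the image can be displayed, 0 if it must be skipped.  */
static int
image_displayable_p (const struct image_cache *c, int id)
{
  return lookup_image (c, id) != NULL;
}
```

A caller written along these lines would skip producing the glyph when the
lookup fails, rather than passing a null image on to the display code.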

-- 
Nick                                           http://www.inet.net.nz/~nickrob


#0  0x005657a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x005a5df6 in kill () from /lib/tls/libc.so.6
#2  0x0810f587 in fatal_error_signal (sig=11) at emacs.c:430
#3  <signal handler called>
#4  0x08106d26 in prepare_image_for_display (f=0xa1312e0, img=0x0) at image.c:1203
#5  0x0808cbad in produce_image_glyph (it=0xfee4d570) at xdisp.c:19505
#6  0x0809030d in x_produce_glyphs (it=0xfee4d570) at xdisp.c:20585
#7  0x080755c4 in display_tool_bar_line (it=0xfee4d570, height=38) at xdisp.c:9470
#8  0x08075cf6 in redisplay_tool_bar (f=0xa1312e0) at xdisp.c:9678
#9  0x0807d992 in redisplay_window (window=169022564, just_this_one_p=0) at xdisp.c:13160
#10 0x080792f0 in redisplay_window_0 (window=169022564) at xdisp.c:11523
#11 0x0818c065 in internal_condition_case_1 (bfun=0x80792c4 <redisplay_window_0>, arg=169022564, handlers=137856245, hfun=0x80792a3 <redisplay_window_error>) at eval.c:1521
#12 0x08079290 in redisplay_windows (window=169022564) at xdisp.c:11502
#13 0x08078731 in redisplay_internal (preserve_echo_area=0) at xdisp.c:11062
#14 0x08076d29 in redisplay () at xdisp.c:10292
#15 0x08115328 in read_char (commandflag=1, nmaps=3, maps=0xfee4e4f0, prev_event=137869513, used_mouse_menu=0xfee4e5ec) at keyboard.c:2549
#16 0x0811e7fc in read_key_sequence (keybuf=0xfee4e750, bufsize=30, prompt=137869513, dont_downcase_last=0, can_return_switch_frame=1, fix_current_buffer=1) at keyboard.c:8874
#17 0x08112d25 in command_loop_1 () at keyboard.c:1536
#18 0x0818bf37 in internal_condition_case (bfun=0x8112a27 <command_loop_1>, handlers=137914153, hfun=0x811256f <cmd_error>) at eval.c:1473
#19 0x081128a0 in command_loop_2 () at keyboard.c:1328
#20 0x0818b9b6 in internal_catch (tag=137910385, func=0x8112882 <command_loop_2>, arg=137869513) at eval.c:1211
#21 0x08112854 in command_loop () at keyboard.c:1307
#22 0x081122ee in recursive_edit_1 () at keyboard.c:1000
#23 0x0811242f in Frecursive_edit () at keyboard.c:1061
#24 0x08110d0c in main (argc=1, argv=0xfee4ed94) at emacs.c:1789


* Re: Emacs crashes
  2006-03-13 20:23 Emacs crashes Nick Roberts
@ 2006-03-13 20:47 ` Chong Yidong
  2006-03-13 22:06 ` Kim F. Storm
  2006-03-14  4:33 ` Eli Zaretskii
  2 siblings, 0 replies; 43+ messages in thread
From: Chong Yidong @ 2006-03-13 20:47 UTC (permalink / raw)
  Cc: emacs-devel

Nick Roberts <nickrob@snap.net.nz> writes:

> I've had Emacs crash/hang in three different ways in recent days.  It would
> appear to be less stable than it was two years ago:
>
> 1) It hangs with some kind of mutex lock which I don't understand with a brief
>    backtrace of three functions in libc, I think.  The only thing I can do,
>    after attaching with GDB, is kill it.
>
> 2) A garbage collection related crash where mark_object is called recursively 
>    literally thousands of times,

Kim Storm reported some similar crashes around the beginning of March.
Unfortunately, the only big change landed to the src/ directory around
that time was my x_catch_errors change to avoid using
record_unwind_protect.  I've gone over those changes several times,
but no luck: I don't see how they can possibly lead to garbage
collection bugs.

The only possibility I can think of is the change to struct
specbinding and specpdl_ptr to make them non-volatile, which is
supposedly OK since record_unwind_protect can no longer be called in a
signal handler.  Could that lead to problems elsewhere in Emacs?

(The only other big checkin during that period was Luc's
load-file-rep-suffixes change, but that's even less likely to be the
cause.)


* Re: Emacs crashes
  2006-03-13 20:23 Emacs crashes Nick Roberts
  2006-03-13 20:47 ` Chong Yidong
@ 2006-03-13 22:06 ` Kim F. Storm
  2006-03-14  0:39   ` Kenichi Handa
                     ` (4 more replies)
  2006-03-14  4:33 ` Eli Zaretskii
  2 siblings, 5 replies; 43+ messages in thread
From: Kim F. Storm @ 2006-03-13 22:06 UTC (permalink / raw)
  Cc: Nick Roberts

Nick Roberts <nickrob@snap.net.nz> writes:

> I've had Emacs crash/hang in three different ways in recent days.

I can second that.  I had another crash today, so it has crashed on me
four times in the last week.

I suspect the recent changes to the handling (unwind etc) of x errors,
but I have no proof, as there is no similarity to the crashes (except
that it has now crashed twice in malloc_consolidate (libc internal)
called from emacs_blocked_malloc called from XtVaGetValues in
x_set_toolkit_scroll_bar_thumb).

I didn't have time to dig further into the crash - and I have no way to determine
what kind of corruption was causing the crash in malloc_consolidate.

>  It would
> appear to be less stable than it was two years ago:

It *is* *much* less stable than it was a week ago!

>
> 1) It hangs with some kind of mutex lock which I don't understand with a brief
>    backtrace of three functions in libc, I think.  The only thing I can do,
>    after attaching with GDB, is kill it.
>
> 2) A garbage collection related crash where mark_object is called recursively 
>    literally thousands of times,
>
> 3) A crash that is caused by recent changes to the tool bar (I think).  I
>    attach the backtrace for this one below (xbacktrace produces no output).
>    It appears to go wrong in produce_image_glyph where img=0x0 because
>    it->f->output_data.x->display_info->used = 73, is less than
>    it->image_id = 87.

I haven't seen any of these -- so there are now 6 different crashes.

Looks like a completely random memory corruption.

-- 
Kim F. Storm <storm@cua.dk> http://www.cua.dk


* Re: Emacs crashes
  2006-03-13 22:06 ` Kim F. Storm
@ 2006-03-14  0:39   ` Kenichi Handa
  2006-03-14 16:09     ` Richard Stallman
  2006-03-14  1:02   ` Juanma Barranquero
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 43+ messages in thread
From: Kenichi Handa @ 2006-03-14  0:39 UTC (permalink / raw)
  Cc: nickrob, emacs-devel

In article <m3pskqrod2.fsf@kfs-l.imdomain.dk>, storm@cua.dk (Kim F. Storm) writes:

> I haven't seen any of these -- so there are now 6 different crashes.

> Looks like a completely random memory corruption.

Yesterday, Emacs crashed on me this way:

Program received signal SIGSEGV, Segmentation fault.
0x40359560 in mallopt () from /lib/libc.so.6
(gdb) bt 10
#0  0x40359560 in mallopt () from /lib/libc.so.6
#1  0x4035a339 in mallopt () from /lib/libc.so.6
#2  0x08138341 in emacs_blocked_malloc (size=1078031196, ptr=0x8137cad)
    at alloc.c:1217
#3  0x40357f55 in malloc () from /lib/libc.so.6
#4  0x08137cad in xmalloc (size=140991128) at alloc.c:740
#5  0x0809c8a8 in coding_allocate_composition_data (coding=0x89c41c8,
    char_offset=1078031196) at coding.c:1708
#6  0x080a5d17 in decode_coding_string (str=151149291, coding=0x10, nocopy=0)
    at coding.c:6294
#7  0x08182865 in read_process_output (proc=144453228, channel=46)
    at process.c:5040
(gdb) up 5
#5  0x0809c8a8 in coding_allocate_composition_data (coding=0x89c41c8,
    char_offset=1078031196) at coding.c:1708
1708        = (struct composition_data *) xmalloc (sizeof *cmp_data);

But, this Emacs was compiled before these changes:

2006-03-10  Kim F. Storm  <storm@cua.dk>

	* alloc.c (USE_POSIX_MEMALIGN): Fix last change.

2006-03-09  Stefan Monnier  <monnier@iro.umontreal.ca>

	* alloc.c (USE_POSIX_MEMALIGN): New macro.
	(ABLOCKS_BASE, lisp_align_malloc, lisp_align_free): Use it.

---
Kenichi Handa
handa@m17n.org


* Re: Emacs crashes
  2006-03-13 22:06 ` Kim F. Storm
  2006-03-14  0:39   ` Kenichi Handa
@ 2006-03-14  1:02   ` Juanma Barranquero
  2006-03-14  9:36     ` David Kastrup
  2006-03-14  1:37   ` Nick Roberts
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 43+ messages in thread
From: Juanma Barranquero @ 2006-03-14  1:02 UTC (permalink / raw)


On 3/13/06, Kim F. Storm <storm@cua.dk> wrote:
> Nick Roberts <nickrob@snap.net.nz> writes:

> > It would appear to be less stable than it was two years ago:

> It *is* *much* less stable than it was a week ago!

Impossible! Absurd! We're in a feature freeze, after all...

--
                    /L/e/k/t/u


* Re: Emacs crashes
  2006-03-13 22:06 ` Kim F. Storm
  2006-03-14  0:39   ` Kenichi Handa
  2006-03-14  1:02   ` Juanma Barranquero
@ 2006-03-14  1:37   ` Nick Roberts
  2006-03-14 16:07   ` Chong Yidong
  2006-03-14 16:09   ` Richard Stallman
  4 siblings, 0 replies; 43+ messages in thread
From: Nick Roberts @ 2006-03-14  1:37 UTC (permalink / raw)
  Cc: emacs-devel

 > > I've had Emacs crash/hang in three different ways in recent days.
 > 
 > I can second that.  I had another crash today, so it has crashed on me
 > four times in the last week.
 > 
 > I suspect the recent changes to the handling (unwind etc) of x errors,
 > but I have no proof, as there is no similarity to the crashes (except
 > that it has now crashed twice in malloc_consolidate (libc internal)
 > called from emacs_blocked_malloc called from XtVaGetValues in
 > x_set_toolkit_scroll_bar_thumb).

Well, mine are different from the ones I reported in February, which always
included:

  ...
  #4  0x080e89d1 in x_catch_errors_unwind (dummy=137858041) at xterm.c:7543
  #5  0x0818dc6e in unbind_to (count=44, value=137858041) at eval.c:3233

I attach the bottom part of the backtraces for the garbage collection related
crashes below.

-- 
Nick                                           http://www.inet.net.nz/~nickrob


...
#1283 0x0817417e in mark_object (arg=146162697) at alloc.c:5575
#1284 0x0817417e in mark_object (arg=146839721) at alloc.c:5575
#1285 0x08174136 in mark_object (arg=137870564) at alloc.c:5562
#1286 0x08173422 in Fgarbage_collect () at alloc.c:5022
#1287 0x0818de36 in Ffuncall (nargs=2, args=0xfefe2990) at eval.c:2839
#1288 0x0818dcc4 in call1 (fn=137906777, arg1=144985555) at eval.c:2690
#1289 0x08114e0e in show_help_echo (help=144985555, window=137869513, object=137869513, pos=-8, ok_to_overwrite_keystroke_echo=0) at keyboard.c:2309
#1290 0x08116524 in read_char (commandflag=1, nmaps=4, maps=0xfefe2c30, prev_event=137869513, used_mouse_menu=0xfefe2d2c) at keyboard.c:3155
#1291 0x0811e7fc in read_key_sequence (keybuf=0xfefe2e90, bufsize=30, prompt=137869513, dont_downcase_last=0, can_return_switch_frame=1, fix_current_buffer=1) at keyboard.c:8874
#1292 0x08112d25 in command_loop_1 () at keyboard.c:1536
#1293 0x0818bf37 in internal_condition_case (bfun=0x8112a27 <command_loop_1>, handlers=137914153, hfun=0x811256f <cmd_error>) at eval.c:1473
#1294 0x081128a0 in command_loop_2 () at keyboard.c:1328
#1295 0x0818b9b6 in internal_catch (tag=137910385, func=0x8112882 <command_loop_2>, arg=137869513) at eval.c:1211
#1296 0x08112854 in command_loop () at keyboard.c:1307
#1297 0x081122ee in recursive_edit_1 () at keyboard.c:1000
#1298 0x0811242f in Frecursive_edit () at keyboard.c:1061
#1299 0x08110d0c in main (argc=1, argv=0xfefe34d4) at emacs.c:1789

and

...
#19 0x0817417e in mark_object (arg=172548329) at alloc.c:5575
#20 0x08174136 in mark_object (arg=137870564) at alloc.c:5562
#21 0x08173422 in Fgarbage_collect () at alloc.c:5022
#22 0x0818cfc9 in Feval (form=177331709) at eval.c:2138
#23 0x0818c065 in internal_condition_case_1 (bfun=0x818ced9 <Feval>, arg=177331709, handlers=137914153, hfun=0x811bd6f <menu_item_eval_property_1>) at eval.c:1521
#24 0x0811bdf6 in menu_item_eval_property (sexpr=177331709) at keyboard.c:7198
#25 0x08125106 in get_keyelt (object=138087793, autoload=1) at keymap.c:821
#26 0x08124c2d in access_keymap (map=137857181, idx=137902585, t_ok=2, noinherit=0, autoload=1) at keymap.c:651
#27 0x0811cab6 in tool_bar_items (reuse=169488892, nitems=0xfef6fae4) at keyboard.c:7660
#28 0x0807501b in update_tool_bar (f=0x9125da8, save_match_data=0) at xdisp.c:9247
#29 0x08074a38 in prepare_menu_bars () at xdisp.c:8952
#30 0x08077bdc in redisplay_internal (preserve_echo_area=0) at xdisp.c:10703
#31 0x08076d29 in redisplay () at xdisp.c:10292
#32 0x08115328 in read_char (commandflag=1, nmaps=4, maps=0xfef70370, prev_event=137869513, used_mouse_menu=0xfef7046c) at keyboard.c:2549
#33 0x0811e7fc in read_key_sequence (keybuf=0xfef705d0, bufsize=30, prompt=137869513, dont_downcase_last=0, can_return_switch_frame=1, fix_current_buffer=1) at keyboard.c:8874
#34 0x08112d25 in command_loop_1 () at keyboard.c:1536
#35 0x0818bf37 in internal_condition_case (bfun=0x8112a27 <command_loop_1>, handlers=137914153, hfun=0x811256f <cmd_error>) at eval.c:1473
#36 0x081128a0 in command_loop_2 () at keyboard.c:1328
#37 0x0818b9b6 in internal_catch (tag=137910385, func=0x8112882 <command_loop_2>, arg=137869513) at eval.c:1211
#38 0x08112854 in command_loop () at keyboard.c:1307
#39 0x081122ee in recursive_edit_1 () at keyboard.c:1000
#40 0x0811242f in Frecursive_edit () at keyboard.c:1061
#41 0x08110d0c in main (argc=1, argv=0xfef70c14) at emacs.c:1789


* Re: Emacs crashes
  2006-03-13 20:23 Emacs crashes Nick Roberts
  2006-03-13 20:47 ` Chong Yidong
  2006-03-13 22:06 ` Kim F. Storm
@ 2006-03-14  4:33 ` Eli Zaretskii
  2006-03-14 20:45   ` Nick Roberts
  2 siblings, 1 reply; 43+ messages in thread
From: Eli Zaretskii @ 2006-03-14  4:33 UTC (permalink / raw)
  Cc: emacs-devel

> From: Nick Roberts <nickrob@snap.net.nz>
> Date: Tue, 14 Mar 2006 09:23:23 +1300
> 
> 1) It hangs with some kind of mutex lock which I don't understand with a brief
>    backtrace of three functions in libc, I think.  The only thing I can do,
>    after attaching with GDB, is kill it.

Please show at least that short backtrace.

> 2) A garbage collection related crash where mark_object is called recursively 
>    literally thousands of times,

The fact that there are thousands of recursive calls to mark_object is
not in itself a sign of a problem.  It is normal for the mark phase to
be deeply recursive.
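
As a rough illustration of why deep recursion in the mark phase is expected: a
marker that recurses on the car of each cons and iterates along the cdr
reaches a stack depth equal to the nesting depth of the data.  The sketch
below is a simplified model with hypothetical names (struct cell, mark_cell,
max_depth); it is not the code in alloc.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified cons cell with an explicit mark bit.  */
struct cell
{
  struct cell *car;
  struct cell *cdr;
  int marked;
};

/* Deepest recursion level seen so far, kept only for illustration.  */
static int max_depth;

/* Mark everything reachable from C.  The car is followed by a
   recursive call, the cdr by looping, so the C stack grows with the
   car-nesting depth of the structure, not with list length.  */
static void
mark_cell (struct cell *c, int depth)
{
  assert (depth >= 1);
  while (c != NULL && !c->marked)
    {
      c->marked = 1;
      if (depth > max_depth)
        max_depth = depth;
      mark_cell (c->car, depth + 1);   /* recurse into the car */
      c = c->cdr;                      /* iterate along the cdr */
    }
}
```

With a structure nested a thousand levels deep through the car, this marker
makes a thousand nested calls on perfectly healthy data, so the frame count
alone does not point to corruption.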

etc/DEBUG has some text on how to debug crashes during GC.  Could you
try to use those techniques and see what data structure is corrupted?


* Re: Emacs crashes
  2006-03-14  1:02   ` Juanma Barranquero
@ 2006-03-14  9:36     ` David Kastrup
  2006-03-14 11:59       ` Juanma Barranquero
  0 siblings, 1 reply; 43+ messages in thread
From: David Kastrup @ 2006-03-14  9:36 UTC (permalink / raw)
  Cc: emacs-devel

"Juanma Barranquero" <lekktu@gmail.com> writes:

> On 3/13/06, Kim F. Storm <storm@cua.dk> wrote:
>> Nick Roberts <nickrob@snap.net.nz> writes:
>
>> > It would appear to be less stable than it was two years ago:
>
>> It *is* *much* less stable than it was a week ago!
>
> Impossible! Absurd! We're in a feature freeze, after all...

New bugs can be introduced by fixing old ones.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum


* Re: Emacs crashes
  2006-03-14  9:36     ` David Kastrup
@ 2006-03-14 11:59       ` Juanma Barranquero
  2006-03-14 17:45         ` Richard Stallman
  0 siblings, 1 reply; 43+ messages in thread
From: Juanma Barranquero @ 2006-03-14 11:59 UTC (permalink / raw)
  Cc: emacs-devel

On 3/14/06, David Kastrup <dak@gnu.org> wrote:
> "Juanma Barranquero" <lekktu@gmail.com> writes:

> New bugs can be introduced by fixing old ones.

And that's probably the case right now. I don't doubt it.

Still, I find it difficult to take the freeze seriously. There must be
something wrong with me, or perhaps in the air.

--
                    /L/e/k/t/u


* Re: Emacs crashes
  2006-03-13 22:06 ` Kim F. Storm
                     ` (2 preceding siblings ...)
  2006-03-14  1:37   ` Nick Roberts
@ 2006-03-14 16:07   ` Chong Yidong
  2006-03-14 16:15     ` Kim F. Storm
  2006-03-14 16:09   ` Richard Stallman
  4 siblings, 1 reply; 43+ messages in thread
From: Chong Yidong @ 2006-03-14 16:07 UTC (permalink / raw)
  Cc: Nick Roberts, emacs-devel

storm@cua.dk (Kim F. Storm) writes:

> Nick Roberts <nickrob@snap.net.nz> writes:
>
>> I've had Emacs crash/hang in three different ways in recent days.
>
> I can second that.  I had another crash today, so it has crashed on me
> four times in the last week.
>
> I suspect the recent changes to the handling (unwind etc) of x errors,
> but I have no proof, as there is no similarity to the crashes (except
> that it has now crashed twice in malloc_consolidate (libc internal)
> called from emacs_blocked_malloc called from XtVaGetValues in
> x_set_toolkit_scroll_bar_thumb).

You could revert those changes (if you like, I can send you a single
patch reverting just my changes), run Emacs for a while, and see if
the crashes still happen.

If it is really the cause, I could change the x_catch_errors so that
it doesn't use malloc; maybe that will help.


* Re: Emacs crashes
  2006-03-13 22:06 ` Kim F. Storm
                     ` (3 preceding siblings ...)
  2006-03-14 16:07   ` Chong Yidong
@ 2006-03-14 16:09   ` Richard Stallman
  2006-03-14 20:47     ` Kim F. Storm
  2006-03-15 15:41     ` Kim F. Storm
  4 siblings, 2 replies; 43+ messages in thread
From: Richard Stallman @ 2006-03-14 16:09 UTC (permalink / raw)
  Cc: nickrob, emacs-devel

Below are all the C-level changes in the past 14 days that are not
specific to Windows or MacOS, and that are before the point at which
Handa reports his Emacs was compiled.  There are not very many of
them.

So that means people could try reverting one of these changes and see
if the crashes stop.  If we try each of them, and record which ones
have been tried, we should find which one it is.

If you try reverting one of these changes and still get a crash,
please put a note into src/ChangeLog saying "Checked DATE YOURNAME"
on a line just after the header line.

It would also be useful if people make a checkout of the March 1 sources
and edit with them for a while, to verify that they indeed do not crash.


2006-03-09  Kenichi Handa  <handa@m17n.org>

	* coding.c (DECODE_EMACS_MULE_COMPOSITION_CHAR): Fix decoding
	ASCII component of a composition.

2006-03-08  Luc Teirlinck  <teirllm@auburn.edu>

	* window.c: Declare preserve_y as a static global variable.
	(window_scroll_pixel_based): No longer declare preserve_y;
	it is global now.
	(syms_of_window): Set preserve_y to -1.

2006-03-06  Chong Yidong  <cyd@stupidchicken.com>

	* xdisp.c (handle_invisible_prop): Don't update it->position with
	a buffer position if we're in a display string.

2006-03-05  Andreas Schwab  <schwab@suse.de>

	* xselect.c (x_catch_errors_unwind): Fix missing return value.

2006-03-02  Kim F. Storm  <storm@cua.dk>

	* frame.h (struct frame): New member n_tool_bar_rows.

	* xdisp.c: Minimize the unpleasant visual impact of the requirement
	that non-toolkit tool-bars must occupy an integral number of screen
	lines, by distributing the rows evenly over the tool-bar screen area.
	(Vtool_bar_border): New variable.
	(syms_of_xdisp): DEFVAR_LISP it.
	(display_tool_bar_line): Add HEIGHT arg for desired row height.
	Make tool-bar row the desired height.  Use default face for border
	below tool-bar.
	(tool_bar_lines_needed): Add N_ROWS arg.  Use it to return number of
	actual tool-bar rows.
	(redisplay_tool_bar): Calculate f->n_tool_bar_rows initially.
	Adjust the height of the tool-bar rows to fill tool-bar screen area.
	(redisplay_tool_bar): Calculate f->n_tool_bar_rows when tool-bar area
	is resized.

2006-03-01  Luc Teirlinck  <teirllm@auburn.edu>

	* search.c (Fregexp_quote): Do not precede a literal `]' with two
	backslashes to try to make clear that it has a literal meaning; it
	does not do that.  (It could close a character alternative
	containing a backslash.)


* Re: Emacs crashes
  2006-03-14  0:39   ` Kenichi Handa
@ 2006-03-14 16:09     ` Richard Stallman
  2006-03-15  3:24       ` Giorgos Keramidas
  0 siblings, 1 reply; 43+ messages in thread
From: Richard Stallman @ 2006-03-14 16:09 UTC (permalink / raw)
  Cc: nickrob, emacs-devel, storm

    But, this Emacs was compiled before these changes:

    2006-03-10  Kim F. Storm  <storm@cua.dk>

	    * alloc.c (USE_POSIX_MEMALIGN): Fix last change.

    2006-03-09  Stefan Monnier  <monnier@iro.umontreal.ca>

	    * alloc.c (USE_POSIX_MEMALIGN): New macro.
	    (ABLOCKS_BASE, lisp_align_malloc, lisp_align_free): Use it.


Could you put a note in src/ChangeLog that you get crashes
with a version compiled before that point?


* Re: Emacs crashes
  2006-03-14 16:07   ` Chong Yidong
@ 2006-03-14 16:15     ` Kim F. Storm
  0 siblings, 0 replies; 43+ messages in thread
From: Kim F. Storm @ 2006-03-14 16:15 UTC (permalink / raw)
  Cc: Nick Roberts, emacs-devel

Chong Yidong <cyd@stupidchicken.com> writes:

> storm@cua.dk (Kim F. Storm) writes:
>
>> Nick Roberts <nickrob@snap.net.nz> writes:
>>
>>> I've had Emacs crash/hang in three different ways in recent days.
>>
>> I can second that.  I had another crash today, so it has crashed on me
>> four times in the last week.
>>
>> I suspect the recent changes to the handling (unwind etc) of x errors,
>> but I have no proof, as there is no similarity to the crashes (except
>> that it has now crashed twice in malloc_consolidate (libc internal)
>> called from emacs_blocked_malloc called from XtVaGetValues in
>> x_set_toolkit_scroll_bar_thumb).
>
> You could revert those changes (if you like, I can send you a single
> patch reverting just my changes), run Emacs for a while, and see if
> the crashes still happen.
>
> If it is really the cause, I could change the x_catch_errors so that
> it doesn't use malloc; maybe that will help.

Except for those changes (and there have been quite a lot on top of
each other!), I only see the following (related) change which may be
at play here:

2006-02-26  Stefan Monnier  <monnier@iro.umontreal.ca>

	* lisp.h (struct specbinding, specpdl_ptr): Remove the volatile
	qualifier which was trying to avoid the bug that was fixed by
	yesterday's changes to xterm.c.


-- 
Kim F. Storm <storm@cua.dk> http://www.cua.dk


* Re: Emacs crashes
  2006-03-14 11:59       ` Juanma Barranquero
@ 2006-03-14 17:45         ` Richard Stallman
  2006-03-15  8:58           ` Juanma Barranquero
  0 siblings, 1 reply; 43+ messages in thread
From: Richard Stallman @ 2006-03-14 17:45 UTC (permalink / raw)
  Cc: emacs-devel

    Still, I find it difficult to take the freeze seriously. There must be
    something wrong with me, or perhaps in the air.

If that is what you think, would you please not say it here?
My life is already very frustrating, and you are making it worse.

If you want to blow off steam, please do it privately, not on
this list.


* Re: Emacs crashes
  2006-03-14  4:33 ` Eli Zaretskii
@ 2006-03-14 20:45   ` Nick Roberts
  2006-03-15  4:43     ` Eli Zaretskii
  2006-03-15 20:21     ` Richard Stallman
  0 siblings, 2 replies; 43+ messages in thread
From: Nick Roberts @ 2006-03-14 20:45 UTC (permalink / raw)
  Cc: emacs-devel

 > > 1) It hangs with some kind of mutex lock which I don't understand with a
 > >    brief backtrace of three functions in libc, I think.  The only thing I
 > >    can do, after attaching with GDB, is kill it.
 > 
 > Please show at least that short backtrace.

It happened twice but I've lost the details, sorry.

 > > 2) A garbage collection related crash where mark_object is called
 > >    recursively literally thousands of times,
 > 
 > The fact that there are thousands of recursive calls to mark_object is
 > not in itself a sign of a problem.  It is normal for the mark phase to
 > be deeply recursive.

OK, I didn't know that.  Perhaps I should look at the bottom of the backtrace
(i.e. low frame numbers) instead of the top.

 > etc/DEBUG has some text on how to debug crashes during GC.  Could you
 > try to use those techniques and see what data structure is corrupted?

I don't have a live process to debug but I think I can get it to crash again.
Anyway this is what I have found using the corefile:

(gdb) p last_marked_index
$1 = 482
(gdb) p last_marked[482]
$2 = 173755437
(gdb) xtype
Lisp_Cons
(gdb) xcons
$3 = (struct Lisp_Cons *) 0xa5b4c28
{
  car = 0x83bc641, 
  u = {
    cdr = 0x837b8c9, 
    chain = 0x837b8c9
  }
}
(gdb) p last_marked[481]
$4 = 167781611
(gdb) xtype
Lisp_String
(gdb) xcons
$5 = (struct Lisp_Cons *) 0xa0024e8
{
  car = 0x4, 
  u = {
    cdr = 0xffffffff, 
    chain = 0xffffffff
  }
}

These last addresses look suspect, but I don't know what to do next.  Am I
right to assume that 481 is the index of the very last marked object, 480 the
one before, etc., and that 482 is the index of the oldest marked object in the
array, which is used in a circular fashion?
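
One way to reason about the indexing, assuming last_marked is filled as a
plain ring buffer in which last_marked_index names the next slot to be
overwritten (an assumption; alloc.c is the authority here): the slot just
before the index holds the most recent entry, and the slot at the index holds
the oldest.  A self-contained C sketch with hypothetical names and a small
assumed size:

```c
#include <assert.h>

#define LOG_SIZE 8   /* assumed small size, for illustration only */

static unsigned long log_ring[LOG_SIZE];
static int log_index;   /* next slot to be overwritten */

/* Record OBJ in the ring, wrapping around at the end.  */
static void
record_marked (unsigned long obj)
{
  log_ring[log_index++] = obj;
  if (log_index == LOG_SIZE)
    log_index = 0;
}

/* Return the entry recorded AGE steps ago: age 0 is the most recent,
   age LOG_SIZE - 1 is the oldest one still held in the ring.  */
static unsigned long
nth_most_recent (int age)
{
  int i;

  assert (age >= 0 && age < LOG_SIZE);
  i = (log_index - 1 - age) % LOG_SIZE;
  if (i < 0)
    i += LOG_SIZE;
  return log_ring[i];
}
```

Under that assumption, an index of 482 would make slot 481 the last object
marked before the crash and slot 482 the oldest entry in the array.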

Incidentally with gdb-ui, if you display a watch expression in the speedbar
and press 'p' on a component (with a live process), Emacs will print the
s-expression in the GUD buffer.  I've just extended it to work for arrays
so you can quickly look at the s-expression of any element of last_marked,
although I don't know if the others are of interest.


-- 
Nick                                           http://www.inet.net.nz/~nickrob


* Re: Emacs crashes
  2006-03-14 16:09   ` Richard Stallman
@ 2006-03-14 20:47     ` Kim F. Storm
  2006-03-14 21:35       ` Chong Yidong
                         ` (3 more replies)
  2006-03-15 15:41     ` Kim F. Storm
  1 sibling, 4 replies; 43+ messages in thread
From: Kim F. Storm @ 2006-03-14 20:47 UTC (permalink / raw)
  Cc: nickrob, emacs-devel

Richard Stallman <rms@gnu.org> writes:

> Below are all the C-level changes in the past 14 days that are not
> specific to Windows or MacOS, and that are before the point at which
> Handa reports his Emacs was compiled.  

I had a crash on Mar 6, two on Mar 8, and another on Mar 12.

I don't know if it is incidental, but just before the first crash, I had
put (server-start) into my .emacs and used it via emacsclient in
connection with some of those crashes.  

The other change I made is that before Mar 6, I hadn't used an
up-to-date CVS emacs very intensively for some weeks, but on that day,
and the days after I updated and used it quite intensively...

So I don't know exactly when the crashes started ...  but it may be
something done before Mar 1.

>                                        There are not very many of
> them.

True, but if you go back another week, there are some quite fundamental
changes in the way X errors are handled -- which I think is more likely
to be the cause of these problems...

>
> So that means people could try reverting one of these changes and see
> if the crashes stop.  If we try each of them, and record which ones
> have been tried, we should find which one it is.
>
> If you try reverting one of these changes and still get a crash,
> please put a note into src/ChangeLog saying "Checked DATE YOURNAME"
> on a line just after the header line.
>
> It would also be useful if people make a checkout of the March 1 sources
> and edit with them for a while, to verify that they indeed do not crash.
>

I will do that.

-- 
Kim F. Storm <storm@cua.dk> http://www.cua.dk


* Re: Emacs crashes
  2006-03-14 20:47     ` Kim F. Storm
@ 2006-03-14 21:35       ` Chong Yidong
  2006-03-15 20:21         ` Richard Stallman
  2006-03-14 22:38       ` Kim F. Storm
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 43+ messages in thread
From: Chong Yidong @ 2006-03-14 21:35 UTC (permalink / raw)
  Cc: nickrob, rms, emacs-devel

storm@cua.dk (Kim F. Storm) writes:

> True, but if you go back another week, there are some quite fundamental
> changes in the way X errors are handled -- which I think is more likely
> to be the cause of these problems...
>>
>> It would also be useful if people make a checkout of the March 1 sources
>> and edit with them for a while, to verify that they indeed do not crash.
>>
>
> I will do that.

If it seems likely that the X error handler changes are at fault, I
can revert them.  I have a pretty good idea how to fix the crashes
they were originally meant to address in a different, less intrusive
way (the idea is to make those functions that call x_catch_errors in a
signal handler instead call XSetErrorHandler to install a temporary
"ignore all errors" handler.)

Unfortunately, I have not been able to experience these crashes
myself.


* Re: Emacs crashes
  2006-03-14 20:47     ` Kim F. Storm
  2006-03-14 21:35       ` Chong Yidong
@ 2006-03-14 22:38       ` Kim F. Storm
  2006-03-15  9:22         ` Nick Roberts
  2006-03-15  3:21       ` Giorgos Keramidas
  2006-03-15 20:21       ` Richard Stallman
  3 siblings, 1 reply; 43+ messages in thread
From: Kim F. Storm @ 2006-03-14 22:38 UTC (permalink / raw)
  Cc: emacs-devel

storm@cua.dk (Kim F. Storm) writes:

>> It would also be useful if people make a checkout of the March 1 sources
>> and edit with them for a while, to verify that they indeed do not crash.
>>
>
> I will do that.

I have checked out versions from Mar 1, Mar 7 and today, and tried various
tasks with each of them for a while without any of them crashing.

Since the crashes only happen occasionally, it is a bit hard to say anything
based on this short period of time.  I will try to use the Mar 1 version for
a few days.

If it is a problem with the X error handler, can someone tell me what
I can possibly try to increase the chance of triggering an x error
that may trigger a crash?

-- 
Kim F. Storm <storm@cua.dk> http://www.cua.dk


* Re: Emacs crashes
  2006-03-14 20:47     ` Kim F. Storm
  2006-03-14 21:35       ` Chong Yidong
  2006-03-14 22:38       ` Kim F. Storm
@ 2006-03-15  3:21       ` Giorgos Keramidas
  2006-03-15 20:21       ` Richard Stallman
  3 siblings, 0 replies; 43+ messages in thread
From: Giorgos Keramidas @ 2006-03-15  3:21 UTC (permalink / raw)
  Cc: nickrob, rms, emacs-devel

On 2006-03-14 21:47, "Kim F. Storm" <storm@cua.dk> wrote:
>Richard Stallman <rms@gnu.org> writes:
>> Below are all the C-level changes in the past 14 days that are not
>> specific to Windows or MacOS, and that are before the point at which
>> Handa reports his Emacs was compiled.
>
> I had a crash on Mar 6, two on Mar 8, and another on Mar 12.
>
> I don't know if it is incidental, but just before the first crash, I had
> put (server-start) into my .emacs and used it via emacsclient in
> connection with some of those crashes.

FWIW,

I've never had a crash after the posix_memalign() fix committed by
Stefan, but I'm using only part of the full feature set of current Emacs
builds.

My current snapshot is compiled with:

   --prefix=/opt/emacs --without-x

and I haven't used (server-start) for a week or so.

Of course this doesn't prove anything.  Just that the crash hasn't
happened with this particular setup... yet.

- Giorgos

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Emacs crashes
  2006-03-14 16:09     ` Richard Stallman
@ 2006-03-15  3:24       ` Giorgos Keramidas
  2006-03-15 20:23         ` Richard Stallman
  0 siblings, 1 reply; 43+ messages in thread
From: Giorgos Keramidas @ 2006-03-15  3:24 UTC (permalink / raw)
  Cc: nickrob, emacs-devel, storm, Kenichi Handa

On 2006-03-14 11:09, Richard Stallman <rms@gnu.org> wrote:
>     But, this Emacs was compiled before these changes:
>
>     2006-03-10  Kim F. Storm  <storm@cua.dk>
>
> 	    * alloc.c (USE_POSIX_MEMALIGN): Fix last change.
>
>     2006-03-09  Stefan Monnier  <monnier@iro.umontreal.ca>
>
> 	    * alloc.c (USE_POSIX_MEMALIGN): New macro.
> 	    (ABLOCKS_BASE, lisp_align_malloc, lisp_align_free): Use it.
>
>
> Could you put a note in src/ChangeLog that you get crashes
> with a version compiled before that point?

I think these changes are very likely to be 100% correct, as not having
them fails to bootstrap too many times.  Their intent was to only use
posix_memalign() when the system's malloc() is used.  Otherwise we risk
having a memory area allocated by posix_memalign() and then freed with
Emacs' internal gmalloc or vice versa.

With these changes, on the other hand, failures during bootstrapping on
FreeBSD/amd64 have stopped immediately here :-)
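
The pairing rule described above can be sketched in C.  This is only a
minimal illustration, assuming a POSIX system; the aligned_block_*
helper names are hypothetical and this is not the code in alloc.c.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate NBYTES aligned to ALIGNMENT using the
   system allocator.  Returns NULL on failure.  */
static void *
aligned_block_alloc (size_t alignment, size_t nbytes)
{
  void *base = NULL;
  /* posix_memalign returns 0 on success; ALIGNMENT must be a power of
     two and a multiple of sizeof (void *).  */
  if (posix_memalign (&base, alignment, nbytes) != 0)
    return NULL;
  return base;
}

/* Hypothetical helper: release a block with the allocator that
   produced it.  Memory obtained from posix_memalign is released with
   the system free; handing it to a different malloc implementation's
   free is the mismatch described above.  */
static void
aligned_block_free (void *base)
{
  free (base);
}
```

Keeping both calls behind one pair of helpers means a build that
replaces malloc only has to switch the pair in one place.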

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Emacs crashes
  2006-03-14 20:45   ` Nick Roberts
@ 2006-03-15  4:43     ` Eli Zaretskii
  2006-03-15  7:49       ` Nick Roberts
  2006-03-15 20:21     ` Richard Stallman
  1 sibling, 1 reply; 43+ messages in thread
From: Eli Zaretskii @ 2006-03-15  4:43 UTC (permalink / raw)
  Cc: emacs-devel

> From: Nick Roberts <nickrob@snap.net.nz>
> Date: Wed, 15 Mar 2006 09:45:22 +1300
> Cc: emacs-devel@gnu.org
> 
>  > The fact that there are thousands of recursive calls to mark_object is
>  > not in itself a sign of a problem.  It is normal for the mark phase to
>  > be deeply recursive.
> 
> OK, I didn't know that.  Perhaps I should look at the bottom of the backtrace
> (i.e low frame nos) instead of the top.

Actually, it's the other way around: you need to look at the frames
that call mark_object and its subroutines, and try to correlate those
frames with the contents of last_marked[] array.  Through these two
pieces of evidence, you should reconstruct the Lisp data structure
that is being marked (recursively) at the point of crash.

Once the offending data structure is identified, i.e. you know the
name of the Lisp variable/function/whatever that was corrupted, the
next step is to try to figure out how it gets corrupted.

> (gdb) p last_marked_index
> $1 = 482
> (gdb) p last_marked[482]
> $2 = 173755437
> (gdb) xtype
> Lisp_Cons
> (gdb) xcons
> $3 = (struct Lisp_Cons *) 0xa5b4c28
> {
>   car = 0x83bc641, 
>   u = {
>     cdr = 0x837b8c9, 
>     chain = 0x837b8c9
>   }
> }
> (gdb) p last_marked[481]
> $4 = 167781611
> (gdb) xtype
> Lisp_String
> (gdb) xcons
> $5 = (struct Lisp_Cons *) 0xa0024e8
> {
>   car = 0x4, 
>   u = {
>     cdr = 0xffffffff, 
>     chain = 0xffffffff
>   }
> }
> 
> These last addresses looks suspect

Yes.

> I don't know what to do next.

You need to go back in time ;-).  Print previous values in
last_marked[] and correlate them with the backtrace.  In each frame of
the backtrace, you will see what kind of Lisp primitive data type is
being marked, but since some subroutines of mark_object have loops,
you won't see all the components being marked in the backtrace, so
last_marked[] will fill in the blanks.

For each Lisp type you find in last_marked[], try to establish its
type and name, and, if it's a string, the value.  The name and the
string value are the most important parts, since you can then grep the
sources to find out what data structure it could belong to.  Continue
doing this until you find a symbol that is a global or buffer-local
variable you can identify in the sources.

> Am I right to assume that 481 is the index of the very last marked
> object, 480 the one before etc.  And that 482 is the index of the
> oldest marked object in the array held in a circular fashion?

Yes.  You need to go from 481 backwards and examine the objects one by
one.
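
The circular layout confirmed above can be sketched in C as follows.
This is only an illustration: the history_* names and the capacity of
500 are assumptions standing in for last_marked[] and
last_marked_index, not a copy of the code in alloc.c.

```c
#include <stddef.h>

/* Assumed capacity for this sketch; the array in the session above is
   at least large enough to hold index 482.  */
#define HISTORY_SIZE 500

/* Hypothetical stand-ins for last_marked[] and last_marked_index.  */
static unsigned long history[HISTORY_SIZE];
static int history_index;

/* Record VALUE in the next slot and wrap around at the end, so that
   history_index always names the slot to be overwritten next.  Once
   the buffer has filled, that slot holds the oldest entry.  */
static void
history_record (unsigned long value)
{
  history[history_index++] = value;
  if (history_index == HISTORY_SIZE)
    history_index = 0;
}

/* Return the slot of the entry recorded AGE steps ago, for AGE in
   0 .. HISTORY_SIZE - 1.  AGE 0 is the newest entry, which lives at
   history_index - 1; AGE 1 is the one before it, and so on.  */
static int
history_slot (int age)
{
  int slot = (history_index - 1 - age) % HISTORY_SIZE;
  return slot < 0 ? slot + HISTORY_SIZE : slot;
}
```

With the index at 482, history_slot (0) is 481 and history_slot (1) is
480, which is the order in which to examine the entries.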

> Incidentally with gdb-ui, if you display a watch expression in the speedbar
> and press 'p' on a component (with a live process), Emacs will print the
> s-expression in the GUD buffer.

Beware: these features invoke code inside the crashed Emacs version.
Even if you have a live process, if it crashed, it is unsafe to invoke
`pr' and its ilk in that session, because it will most probably get a
SIGSEGV a second time.  You _must_ use only the simple commands xtype,
xcons, xsymbol, xstring, etc.

One other thing: since you are in the middle of the mark stage of GC,
some objects, notably the strings in last_marked[] array, have their
mark bit set and are relocated.  I think xstring doesn't know how to
cope with that, so you might need to look at lisp.h and reconstruct
the C pointers to the relevant C data structure manually, instead of
using xstring.  (This particular piece of experience is from long ago,
so perhaps this problem is no longer with us with the current
sources.  Just don't be intimidated if some xstring says it cannot
show the value, even though xtype said it's a string; try walking the
C data structures manually.)

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Emacs crashes
  2006-03-15  4:43     ` Eli Zaretskii
@ 2006-03-15  7:49       ` Nick Roberts
  2006-03-15 19:49         ` Eli Zaretskii
  0 siblings, 1 reply; 43+ messages in thread
From: Nick Roberts @ 2006-03-15  7:49 UTC (permalink / raw)
  Cc: emacs-devel

 > > (gdb) p last_marked[481]
 > > $4 = 167781611
 > > (gdb) xtype
 > > Lisp_String
 > > (gdb) xcons
 > > $5 = (struct Lisp_Cons *) 0xa0024e8
 > > {
 > >   car = 0x4, 
 > >   u = {
 > >     cdr = 0xffffffff, 
 > >     chain = 0xffffffff
 > >   }
 > > }
 > > 
 > > These last addresses looks suspect
 > 
 > Yes.

Sorry, that was a mistake, I should have typed xstring instead of xcons.

(gdb) p* (struct Lisp_String *) 0xa0024e8
$15 = {
  size = 4, 
  size_byte = -1, 
  intervals = 0x10, 
  data = 0xa66c79c "\301\b!\207"
}

which is what the variable ptr points to and it crashes out on the line:

MARK_INTERVAL_TREE (ptr->intervals);

 > > I don't know what to do next.
 > 
 > You need to go back in time ;-).  Print previous values in
 > last_marked[] and correlate them with the backtrace.  In each frame of
 > the backtrace, you will see what kind of Lisp primitive data type is
 > being marked, but since some subroutines of mark_object have loops,
 > you won't see all the components being marked in the backtrace, so
 > last_marked[] will fill in the blanks.
 > 
 > For each Lisp type you find in last_marked[], try to establish its
 > type and name, and, if it's a string, the value.  The name and the
 > string value are the most important parts, since you can then grep the
 > sources to find out what data structure it could belong to.  Continue
 > doing this until you find a symbol that is a global or buffer-local
 > variable you can identify in the sources.

Here are some values below but I can't see a connection between them.  I
guess I should try to work out what created (struct Lisp_String *) 0xa0024e8.

Nick


(gdb) p last_marked[482]
$24 = 173755437
(gdb) xtyp
Lisp_Cons
(gdb) p last_marked[481]
$1 = 167781611
(gdb) xtyp
Lisp_String
(gdb) xstring
$2 = (struct Lisp_String *) 0xa0024e8
"\301\b!\207"
(gdb) p last_marked[480]
$3 = 138964225
(gdb) xtyp
Lisp_Symbol
(gdb) xsym
$4 = (struct Lisp_Symbol *) 0x8486d00
"rev"
(gdb) p last_marked[479]
$5 = 174656941
(gdb) xtyp
Lisp_Cons
(gdb) xcons
$6 = (struct Lisp_Cons *) 0xa690da8
{
  car = 0x8486d01, 
  u = {
    cdr = 0x837b8c9, 
    chain = 0x837b8c9
  }
}
(gdb) p last_marked[478]
$11 = 140320329
(gdb) xtyp
Lisp_Symbol
(gdb) xsym
$12 = (struct Lisp_Symbol *) 0x85d1e48
"backend"
(gdb) p last_marked[477]
$13 = 174656909
(gdb) xtyp
Lisp_Cons
(gdb) xcons
$14 = (struct Lisp_Cons *) 0xa690d88
{
  car = 0x85d1e49, 
  u = {
    cdr = 0xa690dad, 
    chain = 0xa690dad
  }
}
(gdb) p last_marked[476]
$21 = 175717180
(gdb) xtyp
Lisp_Vectorlike
PVEC_COMPILED
(gdb) p last_marked[475]
$22 = 137869537
(gdb) xtyp
Lisp_Symbol
(gdb) xsym
$23 = (struct Lisp_Symbol *) 0x837b8e0
"unbound"
(gdb) p last_marked[482]
$24 = 173755437
(gdb) xtyp
Lisp_Cons
(gdb) p last_marked[474]
$25 = 172548329
(gdb) xtyp
Lisp_Symbol
(gdb) xsym
$26 = (struct Lisp_Symbol *) 0xa48e0e8
"vc-default-show-log-entry"
(gdb) p last_marked[473]
$1 = 160558849
(gdb) xtyp
Lisp_Symbol
(gdb) xsym
$2 = (struct Lisp_Symbol *) 0x991ef00
"ediff-skip-merge-regions-that-differ-from-default"
(gdb) p last_marked[472]
$3 = 137869513
(gdb) xtyp
Lisp_Symbol
(gdb) xsym
$4 = (struct Lisp_Symbol *) 0x837b8c8
"nil"
(gdb) p last_marked[471]
$5 = 137869513
(gdb) xtyp
Lisp_Symbol
(gdb) xsym
$6 = (struct Lisp_Symbol *) 0x837b8c8
"nil"
(gdb) p last_marked[470]
$7 = 137869513
(gdb) xtyp
Lisp_Symbol
(gdb) xsym
$8 = (struct Lisp_Symbol *) 0x837b8c8
"nil"
(gdb) p last_marked[469]
$9 = 376392
(gdb) xtyp
Lisp_Int
(gdb) xint
$10 = 47049
(gdb) p last_marked[468]
$13 = 148534611
(gdb) xtyp
Lisp_String
(gdb) xstring
$14 = (struct Lisp_String *) 0x8da7550
"/home/nickrob/emacs/lisp/mail/sendmail.elc"
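
The raw numbers printed by `p last_marked[N]' in the session above are
tagged words, and xtype, xcons, xsym and xint only split them into a
type tag and a pointer or integer.  A minimal sketch in C, assuming a
32-bit build with the tag in the low three bits; the sketch_* names
and the tag numbers are inferred from the values in this session and
are not taken from lisp.h.

```c
#include <stdint.h>

/* Tag numbering assumed for this sketch.  It matches the values shown
   in the session above but is not copied from lisp.h.  */
enum sketch_type
{
  SKETCH_INT = 0,
  SKETCH_SYMBOL = 1,
  SKETCH_STRING = 3,
  SKETCH_VECTORLIKE = 4,
  SKETCH_CONS = 5
};

#define SKETCH_TAG_BITS 3
#define SKETCH_TAG_MASK ((1u << SKETCH_TAG_BITS) - 1)

/* The low bits of the word give the type, as xtype reports it.  */
static unsigned
sketch_tag (uint32_t obj)
{
  return obj & SKETCH_TAG_MASK;
}

/* Clearing the tag bits leaves the C pointer that xcons, xsym and
   xstring print.  */
static uint32_t
sketch_pointer (uint32_t obj)
{
  return obj & ~(uint32_t) SKETCH_TAG_MASK;
}

/* For an integer the remaining bits are the value, as xint reports
   it.  Only non-negative values are exercised in this sketch.  */
static int32_t
sketch_int (uint32_t obj)
{
  return (int32_t) obj >> SKETCH_TAG_BITS;
}
```

Under these assumptions 167781611 decodes to a string at 0xa0024e8 and
376392 to the integer 47049, as in the session.  The same decoding
links the entries: the cons at index 479 has car 0x8486d01, the symbol
at 0x8486d00 that xsym printed as "rev", and cdr 0x837b8c9, the symbol
printed as "nil".  The cons at index 477 has car "backend" and a cdr
pointing at that cons, so the two would appear to form the list
(backend rev).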

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Emacs crashes
  2006-03-14 17:45         ` Richard Stallman
@ 2006-03-15  8:58           ` Juanma Barranquero
  2006-03-17 16:32             ` Richard Stallman
  0 siblings, 1 reply; 43+ messages in thread
From: Juanma Barranquero @ 2006-03-15  8:58 UTC (permalink / raw)


On 3/14/06, Richard Stallman <rms@gnu.org> wrote:

> If that is what you think, would you please not say it here?

No, that is not what I think really. I don't think there's anything
wrong with me regarding the freeze. I honestly think there's something
wrong with "the freeze".

> My life is already very frustrating, and you are making it worse.

I'm really sorry. That was not my intention.

> If you want to blow off steam, please do it privately, not on
> this list.

No, I don't have any steam to blow off.

--
                    /L/e/k/t/u

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Emacs crashes
  2006-03-14 22:38       ` Kim F. Storm
@ 2006-03-15  9:22         ` Nick Roberts
  2006-03-15  9:28           ` David Kastrup
  2006-03-15 11:35           ` Jan D.
  0 siblings, 2 replies; 43+ messages in thread
From: Nick Roberts @ 2006-03-15  9:22 UTC (permalink / raw)
  Cc: emacs-devel

 > If it is a problem with the X error handler, can someone tell me what
 > I can possibly try to increase the chance of triggering an x error
 > that may trigger a crash?

It's probably not X-error related but I can get Emacs (built Mar 13) to crash
every time by:

M-x tool-bar-mode
C-x d <RET>

I guess if others can't there might be something wrong with my build
(it doesn't crash without the tool bar).

-- 
Nick                                           http://www.inet.net.nz/~nickrob

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Emacs crashes
  2006-03-15  9:22         ` Nick Roberts
@ 2006-03-15  9:28           ` David Kastrup
  2006-03-15 11:35           ` Jan D.
  1 sibling, 0 replies; 43+ messages in thread
From: David Kastrup @ 2006-03-15  9:28 UTC (permalink / raw)
  Cc: emacs-devel, Kim F. Storm

Nick Roberts <nickrob@snap.net.nz> writes:

>  > If it is a problem with the X error handler, can someone tell me what
>  > I can possibly try to increase the chance of triggering an x error
>  > that may trigger a crash?
>
> It's probably not X-error related but I can get Emacs (built Mar 13) to crash
> every time by:
>
> M-x tool-bar-mode
> C-x d <RET>
>
> I guess if others can't there might be something wrong with my build
> (it doesn't crash without the tool bar).

Emacs starts with the tool bar on by default.  Do you indeed turn it
_off_ and then Emacs crashes?

Or have you omitted relevant details from your setup?

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Emacs crashes
  2006-03-15  9:22         ` Nick Roberts
  2006-03-15  9:28           ` David Kastrup
@ 2006-03-15 11:35           ` Jan D.
  1 sibling, 0 replies; 43+ messages in thread
From: Jan D. @ 2006-03-15 11:35 UTC (permalink / raw)
  Cc: emacs-devel, Kim F. Storm

>  > If it is a problem with the X error handler, can someone tell me what
>  > I can possibly try to increase the chance of triggering an x error
>  > that may trigger a crash?
> 
> It's probably not X-error related but I can get Emacs (built Mar 13) to crash
> every time by:
> 
> M-x tool-bar-mode
> C-x d <RET>
> 
> I guess if others can't there might be something wrong with my build
> (it doesn't crash without the tool bar).

I can't.  Tested with and without tool bar (not clear what you have) on
x86 and AMD-64.  Which toolkit did you build with?

	Jan D.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Emacs crashes
  2006-03-14 16:09   ` Richard Stallman
  2006-03-14 20:47     ` Kim F. Storm
@ 2006-03-15 15:41     ` Kim F. Storm
  2006-03-15 17:05       ` Luc Teirlinck
                         ` (2 more replies)
  1 sibling, 3 replies; 43+ messages in thread
From: Kim F. Storm @ 2006-03-15 15:41 UTC (permalink / raw)
  Cc: nickrob, emacs-devel

Richard Stallman <rms@gnu.org> writes:

> Below are all the C-level changes in the past 14 days that are not
> specific to Windows or MacOS, and that are before the point at which
> Handa reports his Emacs was compiled.  There are not very many of
> them.

Well, I think I have located a probable cause of the crashes
in function extend_face_to_end_of_line, and the way it is used
in redisplay_tool_bar.

My recent changes to the tool-bar display have revealed this error.
I don't know if it can explain other crashes we have seen.

I will install a fix later today.


My humble apologies to Chong Yidong for pointing at his X error patches!
They are probably working just fine.  Sorry!!!

-- 
Kim F. Storm <storm@cua.dk> http://www.cua.dk

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Emacs crashes
  2006-03-15 15:41     ` Kim F. Storm
@ 2006-03-15 17:05       ` Luc Teirlinck
  2006-03-15 17:21       ` Chong Yidong
  2006-03-15 19:03       ` Kim F. Storm
  2 siblings, 0 replies; 43+ messages in thread
From: Luc Teirlinck @ 2006-03-15 17:05 UTC (permalink / raw)
  Cc: nickrob, rms, emacs-devel

Kim Storm wrote:

   Well, I think I have located a probable cause of the crashes
   in function extend_face_to_end_of_line, and the way it is used
   in redisplay_tool_bar.

   My recent changes to the tool-bar display have revealed this error.
   I don't know if it can explain other crashes we have seen.

It might explain why I (and other people) never saw _any_ of these
crashes:  I do not use the toolbar.

Sincerely,

Luc.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Emacs crashes
  2006-03-15 15:41     ` Kim F. Storm
  2006-03-15 17:05       ` Luc Teirlinck
@ 2006-03-15 17:21       ` Chong Yidong
  2006-03-15 19:03       ` Kim F. Storm
  2 siblings, 0 replies; 43+ messages in thread
From: Chong Yidong @ 2006-03-15 17:21 UTC (permalink / raw)
  Cc: nickrob, rms, emacs-devel

storm@cua.dk (Kim F. Storm) writes:

> My recent changes to the tool-bar display have revealed this error.
> I don't know if it can explain other crashes we have seen.
>
> I will install a fix later today.
>
>
> My humble apologies to Chong Yidong for pointing at his X error patches!
> They are probably working just fine.  Sorry!!!

No need to --- according to the changelog, they were the only major
changes in the time period.  I'm guessing you already had your
tool-bar changes in your tree; that was why you started getting
crashes in the beginning of March, earlier than everyone else.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Emacs crashes
  2006-03-15 15:41     ` Kim F. Storm
  2006-03-15 17:05       ` Luc Teirlinck
  2006-03-15 17:21       ` Chong Yidong
@ 2006-03-15 19:03       ` Kim F. Storm
  2006-03-15 21:40         ` Nick Roberts
  2 siblings, 1 reply; 43+ messages in thread
From: Kim F. Storm @ 2006-03-15 19:03 UTC (permalink / raw)
  Cc: emacs-devel

storm@cua.dk (Kim F. Storm) writes:

> I will install a fix later today.

Done.


I also fixed an old bug where the tool-bar window was twice the
necessary height, but no icons were shown if the tool-bar row is
exactly the same width as the window.

-- 
Kim F. Storm <storm@cua.dk> http://www.cua.dk

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Emacs crashes
  2006-03-15  7:49       ` Nick Roberts
@ 2006-03-15 19:49         ` Eli Zaretskii
  2006-03-15 21:40           ` Nick Roberts
  0 siblings, 1 reply; 43+ messages in thread
From: Eli Zaretskii @ 2006-03-15 19:49 UTC (permalink / raw)
  Cc: emacs-devel

> From: Nick Roberts <nickrob@snap.net.nz>
> Date: Wed, 15 Mar 2006 20:49:55 +1300
> Cc: emacs-devel@gnu.org
> 
> (gdb) p* (struct Lisp_String *) 0xa0024e8
> $15 = {
>   size = 4, 
>   size_byte = -1, 
>   intervals = 0x10, 
>   data = 0xa66c79c "\301\b!\207"
> }
> 
> which is what the variable ptr points to and it crashes out on the line:
> 
> MARK_INTERVAL_TREE (ptr->intervals);

I think ptr->intervals is the reason for the crash, because
MARK_INTERVAL_TREE dereferences it, and 0x10 is too small to be a
valid address.

> Here are some values below but I can't see a connection between them.

A simple list of values recorded in last_marked[] won't do.  You need
to correlate it with the innermost frames you see in the backtrace,
and from that correlation figure out the name of the Lisp data
structure that is being marked.  The connection between the values
recorded in last_marked[] will be revealed if you look at the code,
because, e.g., when GC finds a cons, it recursively marks its car and
its cdr.  By looking at the code, you should be able to find this and
other similar connections between the values, like A being a property
of B etc.
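
As a rough illustration of why the backtrace and the recorded values
have to be read together, here is a much simplified mark routine in C.
It is only a sketch: the node type, mark_node and trace[] are
hypothetical stand-ins for Lisp objects, mark_object and
last_marked[].

```c
#include <stddef.h>

enum node_kind { NODE_ATOM, NODE_CONS };

/* Hypothetical object type: an atom, or a cons with car and cdr.  */
struct node
{
  enum node_kind kind;
  int marked;
  struct node *car;   /* used only when kind == NODE_CONS */
  struct node *cdr;   /* used only when kind == NODE_CONS */
};

/* Small circular record of the objects marked most recently.  */
#define TRACE_SIZE 16
static struct node *trace[TRACE_SIZE];
static int trace_index;

static void
mark_node (struct node *obj)
{
  while (obj != NULL && !obj->marked)
    {
      obj->marked = 1;
      trace[trace_index++] = obj;
      if (trace_index == TRACE_SIZE)
        trace_index = 0;

      if (obj->kind != NODE_CONS)
        return;

      /* The car is marked by recursion, so it appears as a frame in
         the backtrace.  */
      mark_node (obj->car);
      /* The cdr is followed by looping, so it appears only in the
         trace array.  */
      obj = obj->cdr;
    }
}
```

Marking a two-element list records the first cons, its car, the second
cons, and then its car, even though the second cons never gets a stack
frame of its own.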

> I guess I should try to work out what created (struct Lisp_String *)
> 0xa0024e8.

Not who created it, but what higher-level Lisp data is it part of.
Btw, looking at the value of that string, namely

> (gdb) p* (struct Lisp_String *) 0xa0024e8
> $15 = {
>   size = 4, 
>   size_byte = -1, 
>   intervals = 0x10, 
>   data = 0xa66c79c "\301\b!\207"
> }

It sounds like its data is some bytecode.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Emacs crashes
  2006-03-14 20:45   ` Nick Roberts
  2006-03-15  4:43     ` Eli Zaretskii
@ 2006-03-15 20:21     ` Richard Stallman
  2006-03-16 20:18       ` Richard Stallman
  1 sibling, 1 reply; 43+ messages in thread
From: Richard Stallman @ 2006-03-15 20:21 UTC (permalink / raw)
  Cc: eliz, emacs-devel

     > The fact that there are thousands of recursive calls to mark_object is
     > not in itself a sign of a problem.  It is normal for the mark phase to
     > be deeply recursive.

    OK, I didn't know that.  Perhaps I should look at the bottom of the backtrace
    (i.e low frame nos) instead of the top.

That would be useful if you want to see what Emacs was doing when it
garbage collected, so as to see what recent previous activity might
have been responsible for the clobberage.

However, for finding out what data was clobbered, you need to look at
the innermost frames.  Finding out what data was clobbered is often useful
because often the clobberage is not entirely random.  It may, for instance,
be an overrun problem affecting the data immediately before in memory.

    $5 = (struct Lisp_Cons *) 0xa0024e8
    {
      car = 0x4, 
      u = {
	cdr = 0xffffffff, 
	chain = 0xffffffff
      }
    }

    These last addresses looks suspect I don't know what to do next.

It seems definitely invalid.

So we know that the code that clobbers can store -1.  That may be useful.
Is it always -1?

However, it seems clear that all the other data near this one are cons
cells too.  And cons cell slots are only used as cons cells.  An
overrun on a nearby cons cell seems rather implausible as a source of
error.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Emacs crashes
  2006-03-14 20:47     ` Kim F. Storm
                         ` (2 preceding siblings ...)
  2006-03-15  3:21       ` Giorgos Keramidas
@ 2006-03-15 20:21       ` Richard Stallman
  3 siblings, 0 replies; 43+ messages in thread
From: Richard Stallman @ 2006-03-15 20:21 UTC (permalink / raw)
  Cc: nickrob, emacs-devel

    I don't know if it is incidental, but just before the first crash, I had
    put (server-start) into my .emacs and used it via emacsclient in
    connection with some of those crashes.  

Could you (or someone) try building an Emacs from mid-Feb and put in
(server-start) and see if crashes happen?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Emacs crashes
  2006-03-14 21:35       ` Chong Yidong
@ 2006-03-15 20:21         ` Richard Stallman
  0 siblings, 0 replies; 43+ messages in thread
From: Richard Stallman @ 2006-03-15 20:21 UTC (permalink / raw)
  Cc: nickrob, emacs-devel, storm

    If it seems likely that the X error handler changes are at fault, I
    can revert them.  I have a pretty good idea how to fix the crashes
    they were originally meant to address in a different, less intrusive
    way (the idea is to make those functions that call x_catch_errors in a
    signal handler instead call XSetErrorHandler to install a temporary
    "ignore all errors" handler.)

It is worth a try, to see if this solves the problem.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Emacs crashes
  2006-03-15  3:24       ` Giorgos Keramidas
@ 2006-03-15 20:23         ` Richard Stallman
  0 siblings, 0 replies; 43+ messages in thread
From: Richard Stallman @ 2006-03-15 20:23 UTC (permalink / raw)
  Cc: nickrob, storm, handa, emacs-devel

    I think these changes are very likely to be 100% correct, as not having
    them fails to bootstrap too many times.

That does not prove they are correct--it only proves they fixed a real
problem.  It remains possible that they also cause a bug in some
less-frequent case.

The real reason we know these changes are not responsible for these
crashes is that the crashes happened before these changes were made.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Emacs crashes
  2006-03-15 19:03       ` Kim F. Storm
@ 2006-03-15 21:40         ` Nick Roberts
  0 siblings, 0 replies; 43+ messages in thread
From: Nick Roberts @ 2006-03-15 21:40 UTC (permalink / raw)


 > > I will install a fix later today.
 > 
 > Done.
 > 
 > 
 > I also fixed an old bug where the tool-bar window was twice the
 > necessary height, but no icons were shown if the tool-bar row is
 > exactly the same width as the window.

[Kim, I've not mailed you directly because everything I send to you gets
rejected as spam]

I can't reproduce the crash now.  However, I couldn't reproduce it after an
intermediate build without your changes.  And the backtrace for one crash
was in PRODUCE_GLYPHS in display_tool_bar_line which is _before_
extend_face_to_end_of_line.

I also found out that the crash didn't occur with -Q (which is presumably why
no-one else saw it).  I had presumed it was a display problem but just
commenting out one line:

(setq gud-pdb-command-name "/usr/lib/python2.3/pdb.py")

or, even shortening the string length, stopped the crash, so I guess it
was memory related.

Nick

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Emacs crashes
  2006-03-15 19:49         ` Eli Zaretskii
@ 2006-03-15 21:40           ` Nick Roberts
  2006-03-16 20:18             ` Richard Stallman
  0 siblings, 1 reply; 43+ messages in thread
From: Nick Roberts @ 2006-03-15 21:40 UTC (permalink / raw)
  Cc: emacs-devel

 > > Here are some values below but I can't see a connection between them.
 > 
 > A simple list of values recorded in last_marked[] won't do.  You need
 > to correlate it with the innermost frames you see in the backtrace,
 > and from that correlation figure out the name of the Lisp data
 > structure that is being marked.  The connection between the values
 > recorded in last_marked[] will be revealed if you look at the code,
 > because, e.g., when GC finds a cons, it recursively marks its car and
 > its cdr.  By looking at the code, you should be able to find this and
 > other similar connections between the values, like A being a property
 > of B etc.

OK, this tells me more than I could find in DEBUG. 

 > > I guess I should try to work out what created (struct Lisp_String *)
 > > 0xa0024e8.
 > 
 > Not who created it, but what higher-level Lisp data is it part of.
 > Btw, looking at the value of that string, namely
 > 
 > > (gdb) p* (struct Lisp_String *) 0xa0024e8
 > > $15 = {
 > >   size = 4, 
 > >   size_byte = -1, 
 > >   intervals = 0x10, 
 > >   data = 0xa66c79c "\301\b!\207"
 > > }
 > 
 > It sounds like its data is some bytecode.

I've rebuilt Emacs now but this information will be useful for later.

-- 
Nick                                           http://www.inet.net.nz/~nickrob

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Emacs crashes
  2006-03-15 20:21     ` Richard Stallman
@ 2006-03-16 20:18       ` Richard Stallman
  0 siblings, 0 replies; 43+ messages in thread
From: Richard Stallman @ 2006-03-16 20:18 UTC (permalink / raw)


    So we know that the code that clobbers can store -1.  That may be useful.
    Is it always -1?

I see I was mistaken in reaching that conclusion, since it wasn't
a real cons cell.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Emacs crashes
  2006-03-15 21:40           ` Nick Roberts
@ 2006-03-16 20:18             ` Richard Stallman
  2006-03-16 21:25               ` Nick Roberts
  0 siblings, 1 reply; 43+ messages in thread
From: Richard Stallman @ 2006-03-16 20:18 UTC (permalink / raw)
  Cc: eliz, emacs-devel

     > A simple list of values recorded in last_marked[] won't do.  You need
     > to correlate it with the innermost frames you see in the backtrace,
     > and from that correlation figure out the name of the Lisp data
     > structure that is being marked.  The connection between the values
     > recorded in last_marked[] will be revealed if you look at the code,
     > because, e.g., when GC finds a cons, it recursively marks its car and
     > its cdr.  By looking at the code, you should be able to find this and
     > other similar connections between the values, like A being a property
     > of B etc.

    OK, this tells me more than I could find in DEBUG. 

Could you add some of that to DEBUG?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Emacs crashes
  2006-03-16 20:18             ` Richard Stallman
@ 2006-03-16 21:25               ` Nick Roberts
  2006-03-18 14:31                 ` Eli Zaretskii
  0 siblings, 1 reply; 43+ messages in thread
From: Nick Roberts @ 2006-03-16 21:25 UTC (permalink / raw)
  Cc: eliz, emacs-devel

 >      > A simple list of values recorded in last_marked[] won't do.  You need
 >      > to correlate it with the innermost frames you see in the backtrace,
 >      > and from that correlation figure out the name of the Lisp data
 >      > structure that is being marked.  The connection between the values
 >      > recorded in last_marked[] will be revealed if you look at the code,
 >      > because, e.g., when GC finds a cons, it recursively marks its car and
 >      > its cdr.  By looking at the code, you should be able to find this and
 >      > other similar connections between the values, like A being a property
 >      > of B etc.
 > 
 >     OK, this tells me more than I could find in DEBUG. 
 > 
 > Could you add some of that to DEBUG?

It would be better coming from Eli, but I will do this if he doesn't have the
time.

-- 
Nick                                           http://www.inet.net.nz/~nickrob

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Emacs crashes
  2006-03-15  8:58           ` Juanma Barranquero
@ 2006-03-17 16:32             ` Richard Stallman
  2006-03-17 16:41               ` Juanma Barranquero
  0 siblings, 1 reply; 43+ messages in thread
From: Richard Stallman @ 2006-03-17 16:32 UTC (permalink / raw)
  Cc: emacs-devel

    I honestly think there's something
    wrong with "the freeze".

You've stated your opinion.  Would you please not say that again here?
Saying it again just adds more stress and anger to my life.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Emacs crashes
  2006-03-17 16:32             ` Richard Stallman
@ 2006-03-17 16:41               ` Juanma Barranquero
  0 siblings, 0 replies; 43+ messages in thread
From: Juanma Barranquero @ 2006-03-17 16:41 UTC (permalink / raw)


On 3/17/06, Richard Stallman <rms@gnu.org> wrote:

> You've stated your opinion.  Would you please not say that again here?

I'll try to remember it.

--
                    /L/e/k/t/u

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Emacs crashes
  2006-03-16 21:25               ` Nick Roberts
@ 2006-03-18 14:31                 ` Eli Zaretskii
  0 siblings, 0 replies; 43+ messages in thread
From: Eli Zaretskii @ 2006-03-18 14:31 UTC (permalink / raw)
  Cc: emacs-devel

> From: Nick Roberts <nickrob@snap.net.nz>
> Date: Fri, 17 Mar 2006 10:25:44 +1300
> Cc: eliz@gnu.org, emacs-devel@gnu.org
> 
>  >      > A simple list of values recorded in last_marked[] won't do.  You need
>  >      > to correlate it with the innermost frames you see in the backtrace,
>  >      > and from that correlation figure out the name of the Lisp data
>  >      > structure that is being marked.  The connection between the values
>  >      > recorded in last_marked[] will be revealed if you look at the code,
>  >      > because, e.g., when GC finds a cons, it recursively marks its car and
>  >      > its cdr.  By looking at the code, you should be able to find this and
>  >      > other similar connections between the values, like A being a property
>  >      > of B etc.
>  > 
>  >     OK, this tells me more than I could find in DEBUG. 
>  > 
>  > Could you add some of that to DEBUG?
> 
> It would be better coming from Eli, but I will do this if he doesn't have the
> time.

Done.

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2006-03-18 14:31 UTC | newest]

Thread overview: 43+ messages
2006-03-13 20:23 Emacs crashes Nick Roberts
2006-03-13 20:47 ` Chong Yidong
2006-03-13 22:06 ` Kim F. Storm
2006-03-14  0:39   ` Kenichi Handa
2006-03-14 16:09     ` Richard Stallman
2006-03-15  3:24       ` Giorgos Keramidas
2006-03-15 20:23         ` Richard Stallman
2006-03-14  1:02   ` Juanma Barranquero
2006-03-14  9:36     ` David Kastrup
2006-03-14 11:59       ` Juanma Barranquero
2006-03-14 17:45         ` Richard Stallman
2006-03-15  8:58           ` Juanma Barranquero
2006-03-17 16:32             ` Richard Stallman
2006-03-17 16:41               ` Juanma Barranquero
2006-03-14  1:37   ` Nick Roberts
2006-03-14 16:07   ` Chong Yidong
2006-03-14 16:15     ` Kim F. Storm
2006-03-14 16:09   ` Richard Stallman
2006-03-14 20:47     ` Kim F. Storm
2006-03-14 21:35       ` Chong Yidong
2006-03-15 20:21         ` Richard Stallman
2006-03-14 22:38       ` Kim F. Storm
2006-03-15  9:22         ` Nick Roberts
2006-03-15  9:28           ` David Kastrup
2006-03-15 11:35           ` Jan D.
2006-03-15  3:21       ` Giorgos Keramidas
2006-03-15 20:21       ` Richard Stallman
2006-03-15 15:41     ` Kim F. Storm
2006-03-15 17:05       ` Luc Teirlinck
2006-03-15 17:21       ` Chong Yidong
2006-03-15 19:03       ` Kim F. Storm
2006-03-15 21:40         ` Nick Roberts
2006-03-14  4:33 ` Eli Zaretskii
2006-03-14 20:45   ` Nick Roberts
2006-03-15  4:43     ` Eli Zaretskii
2006-03-15  7:49       ` Nick Roberts
2006-03-15 19:49         ` Eli Zaretskii
2006-03-15 21:40           ` Nick Roberts
2006-03-16 20:18             ` Richard Stallman
2006-03-16 21:25               ` Nick Roberts
2006-03-18 14:31                 ` Eli Zaretskii
2006-03-15 20:21     ` Richard Stallman
2006-03-16 20:18       ` Richard Stallman
