* allocate_string_data memory corruption @ 2006-01-18 16:57 Chong Yidong 2006-01-18 20:48 ` Stefan Monnier ` (3 more replies) 0 siblings, 4 replies; 58+ messages in thread From: Chong Yidong @ 2006-01-18 16:57 UTC (permalink / raw) There's been some progress tracking down the hyperthreading / allocate_string related crash. We can now reproduce a crash reliably. However, the memory corruption seems to happen after a debugging call which ought to be a no-op. I'd appreciate if someone could look through this and give a suggestion. The memory corruption occurs in the `data' variable, an sdata struct in alloc.c:allocate_string_data(), shown below. (The following code is not the same as in CVS; we've changed some things for debugging): void allocate_string_data (s, nchars, nbytes) struct Lisp_String *s; int nchars, nbytes; { struct sdata *data; struct sblock *b; ... data = b->next_free; data->string = s; s->data = SDATA_DATA (data); data->nbytes = nbytes; s->size = nchars; s->size_byte = nbytes; s->data[nbytes] = '\0'; bcopy (string_overrun_cookie, (char *) data + needed, GC_STRING_OVERRUN_COOKIE_SIZE); /* no crash here */ if (data->string != s || data->nbytes != nbytes) abort (); check_sblock (current_sblock); /* crash occured here */ if (data->string != s || data->nbytes != nbytes) abort (); ... } In this function, data->string is set to s, and nbytes is set to nbytes. If check_sblock is a no-op, there should be no change. However, we get an abort on the second debugging check: #0 abort () at emacs.c:461 #1 0x0817499e in allocate_string_data (s=0x8d18778, nchars=8, nbytes=8) at alloc.c:2013 s == (struct Lisp_String *) 0x8d18778 data->string == (struct Lisp_String *) 0x8d18788 <-- off by 16 nbytes == 8 data->nbytes == 200 <-- off by 192 nchars == 8 needed == 20 #2 0x08175311 in make_uninit_multibyte_string (nchars=8, nbytes=8) at alloc.c:2472 The definition of check_sblock() we used is as follows: void check_sblock (b) struct sblock *b; { struct sdata *from, *end, *from_end; end = b->next_free; for (from = &b->first_data; from < end; from = from_end) { int nbytes = from->nbytes; if (from->string && (nbytes != string_bytes (from->string))) abort (); nbytes = SDATA_SIZE (nbytes); from_end = (struct sdata *) ((char *) from + nbytes + GC_STRING_EXTRA); } if (from != end) abort(); } int string_bytes (s) struct Lisp_String *s; { int nbytes = (s->size_byte < 0 ? s->size & ~ARRAY_MARK_FLAG : s->size_byte); if (!PURE_POINTER_P (s) && s->data && nbytes != SDATA_NBYTES (SDATA_OF_STRING (s))) abort (); return nbytes; } There is clearly no assignment to any of the sdata structures in these functions. Any ideas? ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-18 16:57 allocate_string_data memory corruption Chong Yidong @ 2006-01-18 20:48 ` Stefan Monnier 2006-01-20 0:45 ` Chong Yidong ` (3 more replies) 2006-01-18 21:35 ` Ken Raeburn ` (2 subsequent siblings) 3 siblings, 4 replies; 58+ messages in thread From: Stefan Monnier @ 2006-01-18 20:48 UTC (permalink / raw) Cc: emacs-devel > In this function, data->string is set to s, and nbytes is set to > nbytes. If check_sblock is a no-op, there should be no change. > However, we get an abort on the second debugging check: Most likely the thing that's happening is that check_sblock takes a "long" time during which there's a higher probability for a signal to arrive and the bug itself is that one of the signal handlers does some string allocation (or some other manipulation of those data structures). I'd try something like eassert (!in_allocate_string_data); in_allocate_string_data = 1; ... check_sblock(); ... in_allocate_string_data = 0; BTW, it's possible that -DSYNC_INPUT fixes the bug. Stefan ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-18 20:48 ` Stefan Monnier @ 2006-01-20 0:45 ` Chong Yidong 2006-01-20 1:14 ` Richard M. Stallman ` (2 subsequent siblings) 3 siblings, 0 replies; 58+ messages in thread From: Chong Yidong @ 2006-01-20 0:45 UTC (permalink / raw) Cc: emacs-devel Stefan Monnier <monnier@iro.umontreal.ca> writes: > Most likely the thing that's happening is that check_sblock takes a "long" > time during which there's a higher probability for a signal to arrive and > the bug itself is that one of the signal handlers does some string > allocation (or some other manipulation of those data structures). > > I'd try something like > > eassert (!in_allocate_string_data); > in_allocate_string_data = 1; > ... > check_sblock(); > ... > in_allocate_string_data = 0; OK, I'll follow this up. > BTW, it's possible that -DSYNC_INPUT fixes the bug. Curiously, if you turn on the GC debugging checks at the top of lisp.h, compiling with -DSYNC_INPUT fails: ./temacs --batch --load loadup bootstrap *** glibc detected *** double free or corruption (out): 0x0839a500 *** make[2]: *** [bootstrap-emacs] Aborted make[2]: Leaving directory `/home/cyd/tmp/emacs/src' make[1]: *** [bootstrap-build] Error 2 make[1]: Leaving directory `/home/cyd/tmp/emacs' make: *** [bootstrap] Error 2 (gdb) r -batch Starting program: /home/cyd/tmp/emacs/src/temacs -batch *** glibc detected *** double free or corruption (out): 0x0839a520 *** Program received signal SIGABRT, Aborted. 0xffffe410 in __kernel_vsyscall () (gdb) bt #0 0xffffe410 in __kernel_vsyscall () #1 0xb7bba9b1 in raise () from /lib/tls/i686/cmov/libc.so.6 #2 0xb7bbc2c9 in abort () from /lib/tls/i686/cmov/libc.so.6 #3 0xb7bee6ea in __fsetlocking () from /lib/tls/i686/cmov/libc.so.6 #4 0xb7bf4f54 in malloc_trim () from /lib/tls/i686/cmov/libc.so.6 #5 0xb7bf52ca in free () from /lib/tls/i686/cmov/libc.so.6 #6 0x0813df9e in init_buffer () at buffer.c:5174 #7 0x0811779e in main (argc=2, argv=0xbfc70214) at emacs.c:1526 ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-18 20:48 ` Stefan Monnier 2006-01-20 0:45 ` Chong Yidong @ 2006-01-20 1:14 ` Richard M. Stallman 2006-01-20 3:48 ` Stefan Monnier 2006-01-23 20:21 ` Stefan Monnier 2006-01-24 17:23 ` Chong Yidong 3 siblings, 1 reply; 58+ messages in thread From: Richard M. Stallman @ 2006-01-20 1:14 UTC (permalink / raw) Cc: cyd, emacs-devel BTW, it's possible that -DSYNC_INPUT fixes the bug. The reason why -DSYNC_INPUT isn't the default is that redraw events don't get handled until Emacs reads input. Can you figure out some other way to cause redraws to be handled reasonably soon? ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-20 1:14 ` Richard M. Stallman @ 2006-01-20 3:48 ` Stefan Monnier 0 siblings, 0 replies; 58+ messages in thread From: Stefan Monnier @ 2006-01-20 3:48 UTC (permalink / raw) Cc: cyd, emacs-devel > BTW, it's possible that -DSYNC_INPUT fixes the bug. > The reason why -DSYNC_INPUT isn't the default > is that redraw events don't get handled until Emacs reads input. Huh? Where did you get this idea? Events get processed as soon as we hit a QUIT or a UNBLOCK_INPUT, which is a lot more frequent than reading input. I've been using -DSYNC_INPUT without noticing any detrimental effect for more than year. Stefan ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-18 20:48 ` Stefan Monnier 2006-01-20 0:45 ` Chong Yidong 2006-01-20 1:14 ` Richard M. Stallman @ 2006-01-23 20:21 ` Stefan Monnier 2006-01-24 17:23 ` Chong Yidong 3 siblings, 0 replies; 58+ messages in thread From: Stefan Monnier @ 2006-01-23 20:21 UTC (permalink / raw) Cc: emacs-devel >> In this function, data->string is set to s, and nbytes is set to >> nbytes. If check_sblock is a no-op, there should be no change. >> However, we get an abort on the second debugging check: > Most likely the thing that's happening is that check_sblock takes a "long" > time during which there's a higher probability for a signal to arrive and > the bug itself is that one of the signal handlers does some string > allocation (or some other manipulation of those data structures). > I'd try something like > eassert (!in_allocate_string_data); > in_allocate_string_data = 1; > ... > check_sblock(); > ... > in_allocate_string_data = 0; > BTW, it's possible that -DSYNC_INPUT fixes the bug. My recent eassert (!handling_signal) patch uncovered another case where Lisp objects are allocated asynchronously (and this time, it's allocating a string rather than a vector, so it's more serious): A backtrace is attached. I don't know how to go about fixing this problem (other than use -DSYNC_INPUT). I seem to remember discussions in the past about moving the note_mouse_movement code to the normal main loop (i.e. generate a special Lisp event and bind it in special-event-map) which could solve this problem, but at the cost of delaying the processing of mouse_movement even more than -DSYNC_INPUT would. Stefan #0 abort () at emacs.c:464 #1 0x00000000006136c7 in die (msg=0x736410 "assertion failed: !handling_signal", file=0x736190 "alloc.c", line=1845) at alloc.c:6127 #2 0x0000000000609d50 in allocate_string () at alloc.c:1845 #3 0x000000000060b182 in make_uninit_multibyte_string (nchars=63, nbytes=63) at alloc.c:2454 #4 0x000000000060b117 in make_uninit_string (length=63) at alloc.c:2435 #5 0x000000000060acfd in make_unibyte_string (contents=0x10047b8 "-misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso8859-1", length=63) at alloc.c:2349 #6 0x000000000060acc2 in make_string (contents=0x10047b8 "-misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso8859-1", nbytes=63) at alloc.c:2334 #7 0x000000000060b0fa in build_string (str=0x10047b8 "-misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso8859-1") at alloc.c:2423 #8 0x000000000053227e in x_load_font (f=0x1b9bb40, fontname=0x10047b8 "-misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso8859-1", size=0) at xterm.c:9726 #9 0x000000000054f2fd in fs_load_font (f=0x1b9bb40, c=0, fontname=0x10047b8 "-misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso8859-1", id=-1, face=0x0) at fontset.c:700 #10 0x000000000054ef6e in fontset_font_pattern (f=0x1b9bb40, id=102, c=0) at fontset.c:633 #11 0x0000000000516fc3 in choose_face_font (f=0x1b9bb40, attrs=0x210da48, fontset=102, c=0, needs_overstrike=0x7fbfffc264) at xfaces.c:6888 #12 0x0000000000507d08 in load_face_font (f=0x1b9bb40, face=0x210d9d0, c=0) at xfaces.c:1257 #13 0x00000000005184d6 in realize_face (cache=0x174e8c0, attrs=0x7fbfffc380, c=0, base_face=0x0, former_face_id=-1) at xfaces.c:7153 #14 0x00000000005145c4 in lookup_face (f=0x1b9bb40, attr=0x7fbfffc380, c=0, base_face=0x0) at xfaces.c:5685 #15 0x0000000000519b2c in face_at_string_position (w=0x1f9de00, string={i = 6917529027653345456, s = {val = 12263600, type = Lisp_String}, u = {val = 12263600, type = Lisp_String}}, pos=1, bufpos=0, region_beg=0, region_end=0, endptr=0x7fbfffc504, base_face_id=14, mouse_p=1) at xfaces.c:7786 #16 0x000000000048fdec in note_mode_line_or_margin_highlight (window={i = -9223372036821623296, s = {val = 33152512, type = Lisp_Vectorlike}, u = {val = 33152512, type = Lisp_Vectorlike}}, x=20, y=53, area=ON_MODE_LINE) at xdisp.c:22128 #17 0x000000000049010a in note_mouse_highlight (f=0x1b9bb40, x=121, y=494) at xdisp.c:22219 #18 0x0000000000523b09 in note_mouse_movement (frame=0x1b9bb40, event=0x7fbfffcb30) at xterm.c:3614 #19 0x000000000052b3ff in handle_one_xevent (dpyinfo=0x10315b0, eventp=0x7fbfffcfd0, finish=0x7fbfffcf6c, hold_quit=0x7fbfffe120) at xterm.c:6573 #20 0x000000000052bfee in XTread_socket (sd=2, expected=1, hold_quit=0x7fbfffe120) at xterm.c:7021 #21 0x0000000000580165 in read_avail_input (expected=1) at keyboard.c:6710 #22 0x0000000000580356 in handle_async_input () at keyboard.c:6856 ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-18 20:48 ` Stefan Monnier ` (2 preceding siblings ...) 2006-01-23 20:21 ` Stefan Monnier @ 2006-01-24 17:23 ` Chong Yidong 3 siblings, 0 replies; 58+ messages in thread From: Chong Yidong @ 2006-01-24 17:23 UTC (permalink / raw) Cc: emacs-devel Stefan Monnier <monnier@iro.umontreal.ca> writes: >> In this function, data->string is set to s, and nbytes is set to >> nbytes. If check_sblock is a no-op, there should be no change. >> However, we get an abort on the second debugging check: > > BTW, it's possible that -DSYNC_INPUT fixes the bug. I just got a reply from Friedrich Friedrichs: CFLAGS="-O0 -DSYNC_INPUT -g" ./configure make clean make Then see if you can crash Emacs again. (The point here is to see if the -DSYNC_INPUT compilation flag helps remove the problem.) I can't crash it with -DSYNC_INPUT enabled. (At least it doesn't crash during the usual operations and I was able to re-publish my whole planner-muse-wiki without problems.) So it looks like signal handling during string/cons allocation is indeed the culprit. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-18 16:57 allocate_string_data memory corruption Chong Yidong 2006-01-18 20:48 ` Stefan Monnier @ 2006-01-18 21:35 ` Ken Raeburn 2006-01-18 23:56 ` Chong Yidong 2006-01-20 1:14 ` Richard M. Stallman 2006-01-18 22:06 ` Eli Zaretskii 2006-01-20 1:14 ` Richard M. Stallman 3 siblings, 2 replies; 58+ messages in thread From: Ken Raeburn @ 2006-01-18 21:35 UTC (permalink / raw) Cc: emacs-devel On Jan 18, 2006, at 11:57, Chong Yidong wrote: > There's been some progress tracking down the hyperthreading / > allocate_string related crash. We can now reproduce a crash reliably. That's good news... sort of... :-) > In this function, data->string is set to s, and nbytes is set to > nbytes. If check_sblock is a no-op, there should be no change. By "no-op", do you mean, for example, a macro or previously-defined empty function, such that the compiler will produce different code for allocate_string_data? I don't know if you're fluent in x86 assembly, but I'd check to see if the function's code differs between the two cases. If it doesn't, I think the next thing I'd try would be a watchpoint under gdb to see what happens during check_sblock. If you need to run some unpredictable large number of invocations of the function to trigger the problem, commands can be run at a breakpoint to enable the watchpoint right before the first check, and disable it after the second. You can use convenience variables to store copies of s, data, etc. If the assembly code does differ, I'd inspect the failing version more carefully. And maybe try to tweak the source or build options such that check_sblock doesn't influence how allocate_string_data is compiled. Is this consistent across OSes? E.g., Linux and *BSD or Solaris? How about compiler versions? Could be a subtle OS bug in task switching or something. Anything interesting going on with signal handlers at the time? > #1 0x0817499e in allocate_string_data (s=0x8d18778, nchars=8, > nbytes=8) at alloc.c:2013 > > s == (struct Lisp_String *) 0x8d18778 > data->string == (struct Lisp_String *) 0x8d18788 <-- off by 16 > > nbytes == 8 > data->nbytes == 200 <-- off by 192 > > nchars == 8 > needed == 20 And you've checked, for example, that data hasn't changed, that s and nchars still accurately reflect what the caller passed in, etc? Sometimes gdb can get confused if the compiler is too clever. My Red Hat system at work has hyperthreading in its cpu, perhaps I could help if you've got a portable test case setup? Ken ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-18 21:35 ` Ken Raeburn @ 2006-01-18 23:56 ` Chong Yidong 2006-01-19 8:53 ` Romain Francoise 2006-01-20 1:14 ` Richard M. Stallman 1 sibling, 1 reply; 58+ messages in thread From: Chong Yidong @ 2006-01-18 23:56 UTC (permalink / raw) Cc: emacs-devel Ken Raeburn <raeburn@raeburn.org> writes: > By "no-op", do you mean, for example, a macro or previously-defined > empty function, such that the compiler will produce different code > for allocate_string_data? Sorry, that was the wrong phrase to use. I meant that check_sblock only scans the values in current_sblock, and aborts if they are illegal -- it does not write into the heap at any point. > Is this consistent across OSes? E.g., Linux and *BSD or Solaris? > How about compiler versions? Could be a subtle OS bug in task > switching or something. Anything interesting going on with signal > handlers at the time? The user is running SuSE 9.1. He has a certain Emacs Lisp setup that he can use to crash Emacs reproducibly, when hyperthreading is turned on. Without hyperthreading, he has gotten one or two strange bus errors, but these seem to be difficult to reproduce, and we're not sure if they are real. His Emacs Lisp setup seems to be pretty idiosyncratic. He said before that he didn't know how to whittle it down into a small test case, but I can ask him again. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-18 23:56 ` Chong Yidong @ 2006-01-19 8:53 ` Romain Francoise 2006-01-19 20:57 ` Stefan Monnier 0 siblings, 1 reply; 58+ messages in thread From: Romain Francoise @ 2006-01-19 8:53 UTC (permalink / raw) Cc: Ken Raeburn, emacs-devel Chong Yidong <cyd@stupidchicken.com> writes: > He has a certain Emacs Lisp setup that he can use to crash Emacs > reproducibly, when hyperthreading is turned on. Without > hyperthreading, he has gotten one or two strange bus errors, but these > seem to be difficult to reproduce, and we're not sure if they are > real. Hmm... bus errors are quite uncommon on i386, are you 100% sure that we're looking at a software problem here? It's possible that this user's machine has faulty RAM chips that cause random memory corruption, and that HyperThreading worsens the problem because it increases memory pressure. Can the problem be reproduced on several machines? If not, could the user run memtest86+ on that one machine? If this is a problem specific to HT, it strikes me as suspicious that we got only *one* report given the number of people who use the CVS version of Emacs. (For the record, my main work machine has HyperThreading and Emacs hasn't crashed there for many months now.) -- Romain Francoise <romain@orebokech.com> | The sea! the sea! the open it's a miracle -- http://orebokech.com/ | sea! The blue, the fresh, the | ever free! --Bryan W. Procter ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-19 8:53 ` Romain Francoise @ 2006-01-19 20:57 ` Stefan Monnier 2006-01-19 22:48 ` Kim F. Storm 0 siblings, 1 reply; 58+ messages in thread From: Stefan Monnier @ 2006-01-19 20:57 UTC (permalink / raw) Cc: Chong Yidong, Ken Raeburn, emacs-devel >> He has a certain Emacs Lisp setup that he can use to crash Emacs >> reproducibly, when hyperthreading is turned on. Without >> hyperthreading, he has gotten one or two strange bus errors, but these >> seem to be difficult to reproduce, and we're not sure if they are >> real. > Hmm... bus errors are quite uncommon on i386, are you 100% sure that > we're looking at a software problem here? It's possible that this > user's machine has faulty RAM chips that cause random memory corruption, > and that HyperThreading worsens the problem because it increases memory > pressure. > Can the problem be reproduced on several machines? If not, could the > user run memtest86+ on that one machine? > If this is a problem specific to HT, it strikes me as suspicious that we > got only *one* report given the number of people who use the CVS version > of Emacs. > (For the record, my main work machine has HyperThreading and Emacs > hasn't crashed there for many months now.) I still believe in the race-condition-with-signal-handler, because I'm absolutely sure there are bugs left in that area (which is why I came up with the SYNC_INPUT patch in the first place). Stefan ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-19 20:57 ` Stefan Monnier @ 2006-01-19 22:48 ` Kim F. Storm 2006-01-20 3:46 ` Stefan Monnier 0 siblings, 1 reply; 58+ messages in thread From: Kim F. Storm @ 2006-01-19 22:48 UTC (permalink / raw) Cc: Chong Yidong, Romain Francoise, Ken Raeburn, emacs-devel Stefan Monnier <monnier@iro.umontreal.ca> writes: > I still believe in the race-condition-with-signal-handler, because I'm > absolutely sure there are bugs left in that area (which is why I came up > with the SYNC_INPUT patch in the first place). Please remind me why we don't use SYNC_INPUT by default (except on the mac)? Because it is *bug fun* to debug memory corruption? ;-/ -- Kim F. Storm <storm@cua.dk> http://www.cua.dk ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-19 22:48 ` Kim F. Storm @ 2006-01-20 3:46 ` Stefan Monnier 2006-01-20 22:58 ` Richard M. Stallman 2006-01-25 3:26 ` Chong Yidong 0 siblings, 2 replies; 58+ messages in thread From: Stefan Monnier @ 2006-01-20 3:46 UTC (permalink / raw) Cc: Chong Yidong, Romain Francoise, Ken Raeburn, emacs-devel >> I still believe in the race-condition-with-signal-handler, because I'm >> absolutely sure there are bugs left in that area (which is why I came up >> with the SYNC_INPUT patch in the first place). > Please remind me why we don't use SYNC_INPUT by default (except on the mac)? Probably first and foremost because I never bothered too hard to try and get the default to be changed. Second, because it supposedly may delay the processing of C-g since we then processs signals with a "polling" model (if we get stuck in a loop with no QUIT and no BLOCK_INPUT we won't ever process the C-g. It seems to be that even without SYNC_INPUT, under X11, a C-g is not process until we reach a QUIT, so the difference is not clear to me). > Because it is *bug fun* to debug memory corruption? ;-/ No: because it's even more fun to debug *asynchronous* memory corruption ;-) Stefan ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-20 3:46 ` Stefan Monnier @ 2006-01-20 22:58 ` Richard M. Stallman 2006-01-25 3:26 ` Chong Yidong 1 sibling, 0 replies; 58+ messages in thread From: Richard M. Stallman @ 2006-01-20 22:58 UTC (permalink / raw) Cc: cyd, romain, emacs-devel, raeburn, storm Huh? Where did you get this idea? Events get processed as soon as we hit a QUIT or a UNBLOCK_INPUT, which is a lot more frequent than reading input. I've been using -DSYNC_INPUT without noticing any detrimental effect for more than year. I apologize for my fallible memory--that is what I remembered. I guess the real issue is this one with C-g. processing of C-g since we then processs signals with a "polling" model (if we get stuck in a loop with no QUIT and no BLOCK_INPUT we won't ever process the C-g. It seems to be that even without SYNC_INPUT, under X11, a C-g is not process until we reach a QUIT, so the difference is not clear to me). Many loops in Emacs use immediate_quit rather than QUIT. If quitting out of them does not work, it is a serious problem. One possible fix would be to get rid of immediate_quit, and use QUIT in every loop. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-20 3:46 ` Stefan Monnier 2006-01-20 22:58 ` Richard M. Stallman @ 2006-01-25 3:26 ` Chong Yidong 2006-01-25 15:45 ` Richard M. Stallman 1 sibling, 1 reply; 58+ messages in thread From: Chong Yidong @ 2006-01-25 3:26 UTC (permalink / raw) Cc: Romain Francoise, emacs-devel, Ken Raeburn, Kim F. Storm >> Please remind me why we don't use SYNC_INPUT by default > > ... it supposedly may delay the processing of C-g since we then > processs signals with a "polling" model (if we get stuck in a loop > with no QUIT and no BLOCK_INPUT we won't ever process the C-g. It > seems to be that even without SYNC_INPUT, under X11, a C-g is not > process until we reach a QUIT, so the difference is not clear to > me). How about this idea... The problem is that the X signal handler (XTread_socket) does stuff that can clobber the heap. Richard's objection to SYNC_INPUT is that C-g may not be read in some loops. But Stefan pointed out that under X, C-g is not processed until we read a QUIT. My proposal is to make signal handling work like SYNC_INPUT when we are running in X (i.e., when read_socket_hook is NULL). The C-g behavior is no worse than before, and the terminal behavior remains the same as before. We lose nothing, while resolving the memory clobberage bugs. This change would look something like this: *** emacs/src/keyboard.c.~1.847.~ 2006-01-20 10:05:43.000000000 -0500 --- emacs/src/keyboard.c 2006-01-24 22:12:26.000000000 -0500 *************** *** 6897,6903 **** EMACS_SET_SECS_USECS (*input_available_clear_time, 0, 0); #ifndef SYNC_INPUT ! handle_async_input (); #endif #ifdef BSD4_1 --- 6897,6904 ---- EMACS_SET_SECS_USECS (*input_available_clear_time, 0, 0); #ifndef SYNC_INPUT ! if (!read_socket_hook) ! handle_async_input (); #endif #ifdef BSD4_1 *** emacs/src/lisp.h.~1.547.~ 2005-12-11 15:37:36.000000000 -0500 --- emacs/src/lisp.h 2006-01-24 22:13:41.000000000 -0500 *************** *** 1830,1835 **** --- 1830,1837 ---- Fthrow (Vthrow_on_input, Qt); \ Fsignal (Qquit, Qnil); \ } \ + else if (!read_socket_hook) \ + handle_async_input (); \ } while (0) #endif /* not SYNC_INPUT */ ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-25 3:26 ` Chong Yidong @ 2006-01-25 15:45 ` Richard M. Stallman 0 siblings, 0 replies; 58+ messages in thread From: Richard M. Stallman @ 2006-01-25 15:45 UTC (permalink / raw) Cc: romain, raeburn, monnier, storm, emacs-devel But Stefan pointed out that under X, C-g is not processed until we read a QUIT. That claim is surprising to me--what is the reason for that? ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-18 21:35 ` Ken Raeburn 2006-01-18 23:56 ` Chong Yidong @ 2006-01-20 1:14 ` Richard M. Stallman 2006-01-20 9:28 ` Ken Raeburn 1 sibling, 1 reply; 58+ messages in thread From: Richard M. Stallman @ 2006-01-20 1:14 UTC (permalink / raw) Cc: cyd, emacs-devel Is this consistent across OSes? E.g., Linux and *BSD or Solaris? Linux is a kernel--not comparable to Solaris and *BSD. If you mean the entire system that is basically GNU with Linux added, please call it GNU/Linux. To call the GNU system "Linux" is unfair to the GNU Project (which includes you, since you're helping develop a part of it). See http://www.gnu.org/gnu/gnu-linux-faq.html. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-20 1:14 ` Richard M. Stallman @ 2006-01-20 9:28 ` Ken Raeburn 2006-01-20 22:58 ` Richard M. Stallman 0 siblings, 1 reply; 58+ messages in thread From: Ken Raeburn @ 2006-01-20 9:28 UTC (permalink / raw) Cc: cyd, emacs-devel On Jan 19, 2006, at 20:14, Richard M. Stallman wrote: > Is this consistent across OSes? E.g., Linux and *BSD or Solaris? > > Linux is a kernel--not comparable to Solaris and *BSD. If you mean > the entire system that is basically GNU with Linux added, please call > it GNU/Linux. > > To call the GNU system "Linux" is unfair to the GNU Project (which > includes you, since you're helping develop a part of it). See > http://www.gnu.org/gnu/gnu-linux-faq.html. Since hyperthreading seems to be a factor, actually, I was thinking of the kernel. Perhaps I should've said "*BSD kernels" and "Solaris kernels"... (But async signals and hardware problems, as previously suggested, do sound like better candidates to investigate first, if the problem isn't easily reproduced by anyone else.) Ken ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-20 9:28 ` Ken Raeburn @ 2006-01-20 22:58 ` Richard M. Stallman 0 siblings, 0 replies; 58+ messages in thread From: Richard M. Stallman @ 2006-01-20 22:58 UTC (permalink / raw) Cc: cyd, emacs-devel Since hyperthreading seems to be a factor, actually, I was thinking of the kernel. Perhaps I should've said "*BSD kernels" and "Solaris kernels"... Yes. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-18 16:57 allocate_string_data memory corruption Chong Yidong 2006-01-18 20:48 ` Stefan Monnier 2006-01-18 21:35 ` Ken Raeburn @ 2006-01-18 22:06 ` Eli Zaretskii 2006-01-18 23:48 ` David Kastrup 2006-01-18 23:48 ` Chong Yidong 2006-01-20 1:14 ` Richard M. Stallman 3 siblings, 2 replies; 58+ messages in thread From: Eli Zaretskii @ 2006-01-18 22:06 UTC (permalink / raw) Cc: emacs-devel > From: Chong Yidong <cyd@stupidchicken.com> > Date: Wed, 18 Jan 2006 11:57:02 -0500 > > /* no crash here */ > if (data->string != s || data->nbytes != nbytes) abort (); > > check_sblock (current_sblock); > > /* crash occured here */ > if (data->string != s || data->nbytes != nbytes) abort (); > ... > } > > > In this function, data->string is set to s, and nbytes is set to > nbytes. If check_sblock is a no-op, there should be no change. > However, we get an abort on the second debugging check: > > #0 abort () at emacs.c:461 Are you in fact 110% sure the abort happens on the second debugging check? Can you tell how did you verify this? (The line number shown by GDB is not a reliable evidence.) Was this code compiled without optimizations? If not, recompile with "-O0" and see if you get abort or crash in another place. Btw, does the value of `data' itself change in any way? You didn't show its value at the offending line. > There is clearly no assignment to any of the sdata structures in these > functions. > > Any ideas? One idea is to put watchpoints on the relevant memory addresses and see who hits them. For maximum reliability, look at the machine instruction whose address GDB shows when the watchpoint is hit, and then disassemble around that address to see what source line(s) did that. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-18 22:06 ` Eli Zaretskii @ 2006-01-18 23:48 ` David Kastrup 2006-01-18 23:48 ` Chong Yidong 1 sibling, 0 replies; 58+ messages in thread From: David Kastrup @ 2006-01-18 23:48 UTC (permalink / raw) Cc: Chong Yidong, emacs-devel Eli Zaretskii <eliz@gnu.org> writes: >> From: Chong Yidong <cyd@stupidchicken.com> >> Date: Wed, 18 Jan 2006 11:57:02 -0500 >> >> /* no crash here */ >> if (data->string != s || data->nbytes != nbytes) abort (); >> >> check_sblock (current_sblock); >> >> /* crash occured here */ >> if (data->string != s || data->nbytes != nbytes) abort (); >> ... >> } >> >> >> In this function, data->string is set to s, and nbytes is set to >> nbytes. If check_sblock is a no-op, there should be no change. >> However, we get an abort on the second debugging check: >> >> #0 abort () at emacs.c:461 > > Are you in fact 110% sure the abort happens on the second debugging > check? Can you tell how did you verify this? (The line number shown > by GDB is not a reliable evidence.) > > Was this code compiled without optimizations? If not, recompile with > "-O0" and see if you get abort or crash in another place. In particular (from DEBUG): ** When you are trying to analyze failed assertions, it will be essential to compile Emacs either completely without optimizations or at least (when using GCC) with the -fno-crossjumping option. Failure to do so may make the compiler recycle the same abort call for all assertions in a given function, rendering the stack backtrace useless for identifying the specific failed assertion. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-18 22:06 ` Eli Zaretskii 2006-01-18 23:48 ` David Kastrup @ 2006-01-18 23:48 ` Chong Yidong 2006-01-19 1:15 ` Stefan Monnier 2006-01-19 4:36 ` Eli Zaretskii 1 sibling, 2 replies; 58+ messages in thread From: Chong Yidong @ 2006-01-18 23:48 UTC (permalink / raw) Cc: emacs-devel Eli Zaretskii <eliz@gnu.org> writes: > Are you in fact 110% sure the abort happens on the second debugging > check? Can you tell how did you verify this? (The line number shown > by GDB is not a reliable evidence.) > > Was this code compiled without optimizations? If not, recompile with > "-O0" and see if you get abort or crash in another place. I'll double-check with the user, but I believe it was compiled with -O0. We had to move the check around in that function while we were narrowing down where the memory corruption occurs, and the line number reported by GDB seemed to be correct. > Btw, does the value of `data' itself change in any way? You didn't > show its value at the offending line. `data' is just a variable, stored on the stack, that points to the next free region of memory in current_sblock (b->next_free). I haven't considered the possibility that `data' itself could have changed. I'll look into it. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-18 23:48 ` Chong Yidong @ 2006-01-19 1:15 ` Stefan Monnier 2006-01-19 3:21 ` Ken Raeburn 2006-01-19 4:36 ` Eli Zaretskii 1 sibling, 1 reply; 58+ messages in thread From: Stefan Monnier @ 2006-01-19 1:15 UTC (permalink / raw) Cc: Eli Zaretskii, emacs-devel >> Are you in fact 110% sure the abort happens on the second debugging >> check? Can you tell how did you verify this? (The line number shown >> by GDB is not a reliable evidence.) >> >> Was this code compiled without optimizations? If not, recompile with >> "-O0" and see if you get abort or crash in another place. > I'll double-check with the user, but I believe it was compiled with > -O0. We had to move the check around in that function while we were > narrowing down where the memory corruption occurs, and the line number > reported by GDB seemed to be correct. If you use eassert instead of "if (...) abort();" you won't have to worry about it because the line number is embedded in the error message, so you don't even need debug-symbols. OTOH you need to compile with -DENABLE_CHECKING. Stefan ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-19 1:15 ` Stefan Monnier @ 2006-01-19 3:21 ` Ken Raeburn 0 siblings, 0 replies; 58+ messages in thread From: Ken Raeburn @ 2006-01-19 3:21 UTC (permalink / raw) Cc: Chong Yidong, Eli Zaretskii, emacs-devel On Jan 18, 2006, at 20:15, Stefan Monnier wrote: > If you use eassert instead of "if (...) abort();" you won't have to > worry > about it because the line number is embedded in the error message, > so you > don't even need debug-symbols. > OTOH you need to compile with -DENABLE_CHECKING. BTW, a configure option to turn on ENABLE_CHECKING is pretty trivial, I'll check it in if people want it... I assumed it wouldn't be of general enough interest, but we have --enable-asserts for the XASSERTS code... maybe we should have one configure option turn on both? Ken ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-18 23:48 ` Chong Yidong 2006-01-19 1:15 ` Stefan Monnier @ 2006-01-19 4:36 ` Eli Zaretskii 1 sibling, 0 replies; 58+ messages in thread From: Eli Zaretskii @ 2006-01-19 4:36 UTC (permalink / raw) Cc: emacs-devel > Cc: emacs-devel@gnu.org > From: Chong Yidong <cyd@stupidchicken.com> > Date: Wed, 18 Jan 2006 18:48:58 -0500 > > > Btw, does the value of `data' itself change in any way? You didn't > > show its value at the offending line. > > `data' is just a variable, stored on the stack, that points to the > next free region of memory in current_sblock (b->next_free). I > haven't considered the possibility that `data' itself could have > changed. I'll look into it. If `data' is on the stack, perhaps it's changed by some ``no-op'' code that smashes the stack. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-18 16:57 allocate_string_data memory corruption Chong Yidong ` (2 preceding siblings ...) 2006-01-18 22:06 ` Eli Zaretskii @ 2006-01-20 1:14 ` Richard M. Stallman 2006-01-20 3:56 ` Stefan Monnier 3 siblings, 1 reply; 58+ messages in thread From: Richard M. Stallman @ 2006-01-20 1:14 UTC (permalink / raw) Cc: emacs-devel /* no crash here */ if (data->string != s || data->nbytes != nbytes) abort (); check_sblock (current_sblock); /* crash occured here */ if (data->string != s || data->nbytes != nbytes) abort (); ... My first question is, is the value of `data' itself the same at those two places? In other words, did the memory locations pointed to get clobbered, or did the variable `data' itself get clobbered? Another question: suppose you replace check_sblock with a delay loop. Does it still happen? Try various values of the delay. I just noticed that allocate_string does nothing to prevent signals from being handled. Neither does Fcons. The result is that if a signal comes inside these lines, /* Pop a Lisp_String off the free-list. */ s = string_free_list; string_free_list = NEXT_FREE_LISP_STRING (s); then the same string header object could be allocated both at main program level and in the signal handler. Or other things could go wrong, depending on precisely where the signal arrived. So it seems that these functions need BLOCK_INPUT. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-20 1:14 ` Richard M. Stallman @ 2006-01-20 3:56 ` Stefan Monnier 2006-01-20 14:49 ` Chong Yidong 2006-01-20 22:58 ` Richard M. Stallman 0 siblings, 2 replies; 58+ messages in thread From: Stefan Monnier @ 2006-01-20 3:56 UTC (permalink / raw) Cc: Chong Yidong, emacs-devel > I just noticed that allocate_string does nothing to prevent > signals from being handled. Neither does Fcons. The result Indeed, both those functions assume that signal handlers do not allocate cons cells or strings. In the Emacs-21-pretest days, Gerd tracked down a bug where this assumption was broken (the keybuf data filled by the signal handler was using a cons cell) and fixed it by changing the keybuf data so that the cons cell wasn't needed any more. > So it seems that these functions need BLOCK_INPUT. I think instead they should be disallowed in signal handlers. And AFAIK they are disallowed. But it's quite possible that some signal handler does it, even though it's disallowed. Maybe eassert(!handling_signal) should be added to allocate_string (and maybe it will catch the current bug). Stefan ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-20 3:56 ` Stefan Monnier @ 2006-01-20 14:49 ` Chong Yidong 2006-01-21 19:57 ` Richard M. Stallman 2006-01-20 22:58 ` Richard M. Stallman 1 sibling, 1 reply; 58+ messages in thread From: Chong Yidong @ 2006-01-20 14:49 UTC (permalink / raw) Cc: rms, emacs-devel Stefan Monnier <monnier@iro.umontreal.ca> writes: >> I just noticed that allocate_string does nothing to prevent >> signals from being handled. Neither does Fcons. The result >> >> So it seems that these functions need BLOCK_INPUT. > > I think instead they should be disallowed in signal handlers. > And AFAIK they are disallowed. But it's quite possible that some signal > handler does it, even though it's disallowed. Why, is there any disadvantage to using BLOCK_INPUT in allocate_string and Fcons? It seems like a more robust solution than checking individual signal handlers to make sure they behave properly. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-20 14:49 ` Chong Yidong @ 2006-01-21 19:57 ` Richard M. Stallman 2006-01-22 17:37 ` Stefan Monnier 0 siblings, 1 reply; 58+ messages in thread From: Richard M. Stallman @ 2006-01-21 19:57 UTC (permalink / raw) Cc: monnier, emacs-devel > I think instead they should be disallowed in signal handlers. > And AFAIK they are disallowed. But it's quite possible that some signal > handler does it, even though it's disallowed. Why, is there any disadvantage to using BLOCK_INPUT in allocate_string and Fcons? It seems like a more robust solution than checking individual signal handlers to make sure they behave properly. Either approach can work. If we've been using the solution of not allocating Lisp objects in the signal handlers, let's stick with that for the time being. In any case please do try the debugging statement that Stefan suggested, so we can see if this is indeed the cause of the problem. And it would be good to install that debugging statement permanently, for as long as we still use this design. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-21 19:57 ` Richard M. Stallman @ 2006-01-22 17:37 ` Stefan Monnier 0 siblings, 0 replies; 58+ messages in thread From: Stefan Monnier @ 2006-01-22 17:37 UTC (permalink / raw) Cc: Chong Yidong, emacs-devel >> I think instead they should be disallowed in signal handlers. And AFAIK >> they are disallowed. But it's quite possible that some signal handler >> does it, even though it's disallowed. > Why, is there any disadvantage to using BLOCK_INPUT in allocate_string > and Fcons? Performance, Stefan ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-20 3:56 ` Stefan Monnier 2006-01-20 14:49 ` Chong Yidong @ 2006-01-20 22:58 ` Richard M. Stallman 2006-01-21 4:48 ` Stefan Monnier 1 sibling, 1 reply; 58+ messages in thread From: Richard M. Stallman @ 2006-01-20 22:58 UTC (permalink / raw) Cc: cyd, emacs-devel Maybe eassert(!handling_signal) should be added to allocate_string (and maybe it will catch the current bug). It seems worth a try. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-20 22:58 ` Richard M. Stallman @ 2006-01-21 4:48 ` Stefan Monnier 2006-01-21 17:31 ` Chong Yidong ` (2 more replies) 0 siblings, 3 replies; 58+ messages in thread From: Stefan Monnier @ 2006-01-21 4:48 UTC (permalink / raw) Cc: cyd, emacs-devel > Maybe eassert(!handling_signal) should be added to allocate_string > (and maybe it will catch the current bug). > It seems worth a try. There's actually one candidate: #1 0x081dd84a in die (msg=0x8319288 "assertion failed: !handling_signal", file=0x8318980 "alloc.c", line=2744) at alloc.c:6210 #2 0x081e0f25 in Fcons (car=141994859, cdr=140190650) at alloc.c:2744 #3 0x08130686 in x_catch_errors (dpy=0x8808db8) at xterm.c:7462 #4 0x0813bb08 in x_real_positions (f=0x88c2518, xptr=0x47, yptr=0x47) at xfns.c:580 #5 0x08133d09 in handle_one_xevent (dpyinfo=0x8814cf0, eventp=0xbfffdbfc, finish=0xbfffdc88, hold_quit=0xbfffecbc) at xterm.c:5871 #6 0x081376bb in XTread_socket (sd=0, expected=1, hold_quit=0xbfffecbc) at xterm.c:6981 #7 0x08174b69 in read_avail_input (expected=1) at keyboard.c:6703 #8 0x08174d2a in handle_async_input () at keyboard.c:6855 if you look at x_catch_errors, you'll see that it allocates one lisp_cons cell, one lisp_string and one lisp_misc. Whether it's the cause of the bugs we see, I don't know, but since it's run from the signal handler, it can be executed at potentially any time. Stefan ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-21 4:48 ` Stefan Monnier @ 2006-01-21 17:31 ` Chong Yidong 2006-01-22 3:57 ` Richard M. Stallman 2006-01-22 16:45 ` Stefan Monnier 2006-01-23 0:55 ` Stefan Monnier 2 siblings, 1 reply; 58+ messages in thread From: Chong Yidong @ 2006-01-21 17:31 UTC (permalink / raw) Cc: rms, emacs-devel Stefan Monnier <monnier@iro.umontreal.ca> writes: > if you look at x_catch_errors, you'll see that it allocates one lisp_cons cell, > one lisp_string and one lisp_misc. Whether it's the cause of the bugs we > see, I don't know, but since it's run from the signal handler, it can be > executed at potentially any time. How do we get around this? Is it OK for x_catch_errors to return without doing anything, if it is called in a signal handler? And, as I've asked previously, do you see any disadvantage to simply using BLOCK_INPUT in the string/cons allocation functions? ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-21 17:31 ` Chong Yidong @ 2006-01-22 3:57 ` Richard M. Stallman 0 siblings, 0 replies; 58+ messages in thread From: Richard M. Stallman @ 2006-01-22 3:57 UTC (permalink / raw) Cc: monnier, emacs-devel How do we get around this? Is it OK for x_catch_errors to return without doing anything, if it is called in a signal handler? I don't think so. Catching errors is the only way to recover from certain errors that the X server reports. And, as I've asked previously, do you see any disadvantage to simply using BLOCK_INPUT in the string/cons allocation functions? The reason I'd rather not do it is simply that it is a big change in the design paradigm that has been followed thus far. It is safer to keep the paradigm unchanged, and fix whatever place fails to follow it. After the release, we could look at changing this paradigm, perhaps to use SYNC_INPUT. One way to fix it now is to make a different function, which uses preallocated data, for use in a signal handler. Does this patch fix the problem? *** xterm.c 19 Jan 2006 12:29:03 -0500 1.891 --- xterm.c 21 Jan 2006 16:30:44 -0500 *************** *** 336,341 **** --- 336,347 ---- void x_wm_set_window_state P_ ((struct frame *, int)); void x_wm_set_icon_pixmap P_ ((struct frame *, int)); void x_initialize P_ ((void)); + + static int x_signal_catch_errors P_ ((Display *)); + static void x_signal_uncatch_errors P_ ((int)); + static int x_signal_had_errors_p P_ (()); + static Lisp_Object x_signal_catch_errors_unwind P_ (()); + static void x_font_min_bounds P_ ((XFontStruct *, int *, int *)); static int x_compute_min_glyph_bounds P_ ((struct frame *)); static void x_update_end P_ ((struct frame *)); *************** *** 3725,3731 **** structure is changing at the same time this function is running. So at least we must not crash from them. */ ! count = x_catch_errors (FRAME_X_DISPLAY (*fp)); if (FRAME_X_DISPLAY_INFO (*fp)->grabbed && last_mouse_frame && FRAME_LIVE_P (last_mouse_frame)) --- 3731,3737 ---- structure is changing at the same time this function is running. So at least we must not crash from them. */ ! count = x_signal_catch_errors (FRAME_X_DISPLAY (*fp)); if (FRAME_X_DISPLAY_INFO (*fp)->grabbed && last_mouse_frame && FRAME_LIVE_P (last_mouse_frame)) *************** *** 3791,3800 **** #endif /* USE_X_TOOLKIT */ } ! if (x_had_errors_p (FRAME_X_DISPLAY (*fp))) f1 = 0; ! x_uncatch_errors (FRAME_X_DISPLAY (*fp), count); /* If not, is it one of our scroll bars? */ if (! f1) --- 3797,3806 ---- #endif /* USE_X_TOOLKIT */ } ! if (x_signal_had_errors_p ()) f1 = 0; ! x_signal_uncatch_errors (count); /* If not, is it one of our scroll bars? */ if (! f1) *************** *** 5711,5717 **** Display *d = event.xclient.display; /* Catch and ignore errors, in case window has been iconified by a window manager such as GWM. */ ! int count = x_catch_errors (d); XSetInputFocus (d, event.xclient.window, /* The ICCCM says this is the only valid choice. */ --- 5717,5723 ---- Display *d = event.xclient.display; /* Catch and ignore errors, in case window has been iconified by a window manager such as GWM. */ ! int count = x_signal_catch_errors (d); XSetInputFocus (d, event.xclient.window, /* The ICCCM says this is the only valid choice. */ *************** *** 5720,5726 **** /* This is needed to detect the error if there is an error. */ XSync (d, False); ! x_uncatch_errors (d, count); } /* Not certain about handling scroll bars here */ #endif /* 0 */ --- 5726,5732 ---- /* This is needed to detect the error if there is an error. */ XSync (d, False); ! x_signal_uncatch_errors (count); } /* Not certain about handling scroll bars here */ #endif /* 0 */ *************** *** 7591,7596 **** --- 7597,7667 ---- #endif /* ! 0 */ \f + Display *x_signal_catch_errors_dpy; + + Lisp_Object x_signal_error_message_string; + + static int + x_signal_catch_errors (dpy) + Display *dpy; + { + int count = SPECPDL_INDEX (); + + /* Make sure any errors from previous requests have been dealt with. */ + XSync (dpy, False); + + x_signal_catch_errors_dpy = dpy; + record_unwind_protect (x_signal_catch_errors_unwind, Qnil); + + SSET (x_signal_error_message_string, 0, 0); + + return count; + } + + /* Unbind the binding that we made to check for X errors. */ + + static Lisp_Object + x_signal_catch_errors_unwind (old_val) + Lisp_Object old_val; + { + /* The display may have been closed before this function is called. + Check if it is still open before calling XSync. */ + if (x_display_info_for_display (x_signal_catch_errors_dpy) != 0) + { + BLOCK_INPUT; + XSync (x_signal_catch_errors_dpy, False); + UNBLOCK_INPUT; + } + + return Qnil; + } + + /* Nonzero if we had any X protocol errors + since we did x_signal_catch_errors. */ + + static int + x_signal_had_errors_p () + { + Display *dpy = x_signal_catch_errors_dpy; + + /* Make sure to catch any errors incurred so far. */ + XSync (dpy, False); + + return SREF (x_signal_error_message_string, 0) != 0; + } + + /* Stop catching X protocol errors and let them make Emacs die. + DPY should be the display that was passed to x_catch_errors. + COUNT should be the value that was returned by + the corresponding call to x_catch_errors. */ + + static void + x_signal_uncatch_errors (count) + int count; + { + unbind_to (count, Qnil); + } + \f /* Handle SIGPIPE, which can happen when the connection to a server simply goes away. SIGPIPE is handled by x_connection_signal. Don't need to do anything, because the write which caused the *************** *** 10837,10842 **** --- 10908,10916 ---- staticpro (&last_mouse_press_frame); last_mouse_press_frame = Qnil; + + staticpro (&x_signal_error_message_string); + x_signal_error_message_string = make_uninit_string (X_ERROR_MESSAGE_SIZE); DEFVAR_BOOL ("x-use-underline-position-properties", &x_use_underline_position_properties, ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-21 4:48 ` Stefan Monnier 2006-01-21 17:31 ` Chong Yidong @ 2006-01-22 16:45 ` Stefan Monnier 2006-01-22 20:06 ` Andreas Schwab ` (2 more replies) 2006-01-23 0:55 ` Stefan Monnier 2 siblings, 3 replies; 58+ messages in thread From: Stefan Monnier @ 2006-01-22 16:45 UTC (permalink / raw) Cc: cyd, emacs-devel >> Maybe eassert(!handling_signal) should be added to allocate_string >> (and maybe it will catch the current bug). >> It seems worth a try. > There's actually one candidate: > #1 0x081dd84a in die (msg=0x8319288 "assertion failed: !handling_signal", > file=0x8318980 "alloc.c", line=2744) at alloc.c:6210 > #2 0x081e0f25 in Fcons (car=141994859, cdr=140190650) at alloc.c:2744 > #3 0x08130686 in x_catch_errors (dpy=0x8808db8) at xterm.c:7462 > #4 0x0813bb08 in x_real_positions (f=0x88c2518, xptr=0x47, yptr=0x47) > at xfns.c:580 > #5 0x08133d09 in handle_one_xevent (dpyinfo=0x8814cf0, eventp=0xbfffdbfc, > finish=0xbfffdc88, hold_quit=0xbfffecbc) at xterm.c:5871 > #6 0x081376bb in XTread_socket (sd=0, expected=1, hold_quit=0xbfffecbc) > at xterm.c:6981 > #7 0x08174b69 in read_avail_input (expected=1) at keyboard.c:6703 > #8 0x08174d2a in handle_async_input () at keyboard.c:6855 > if you look at x_catch_errors, you'll see that it allocates one lisp_cons > cell, one lisp_string and one lisp_misc. Whether it's the cause of the > bugs we see, I don't know, but since it's run from the signal handler, it > can be executed at potentially any time. The patch below should remove this particular problem. Stefan --- xterm.c 20 jan 2006 21:48:47 -0500 1.891 +++ xterm.c 22 jan 2006 11:36:08 -0500 @@ -1,6 +1,6 @@ /* X Communication module for terminals which understand the X protocol. Copyright (C) 1989, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, - 2002, 2003, 2004, 2005 Free Software Foundation, Inc. + 2002, 2003, 2004, 2005, 2006 Free Software Foundation, Inc. This file is part of GNU Emacs. @@ -7458,7 +7458,12 @@ /* If non-nil, this should be a string. It means catch X errors and store the error message in this string. */ -static Lisp_Object x_error_message_string; +struct x_error_message_stack { + char string[X_ERROR_MESSAGE_SIZE]; + Display *dpy; + struct x_error_message_stack *prev; +}; +static struct x_error_message_stack *x_error_message; /* An X error handler which stores the error message in x_error_message_string. This is called from x_error_handler if @@ -7470,7 +7475,7 @@ XErrorEvent *error; { XGetErrorText (display, error->error_code, - SDATA (x_error_message_string), + x_error_message->string, X_ERROR_MESSAGE_SIZE); } @@ -7495,16 +7500,23 @@ Display *dpy; { int count = SPECPDL_INDEX (); + struct x_error_message_stack *data = malloc (sizeof (*data)); + Lisp_Object dummy; +#ifdef ENABLE_CHECKING + dummy = make_number ((EMACS_INT)dpy + (EMACS_INT)x_error_message); +#else + dummy = Qnil +#endif /* Make sure any errors from previous requests have been dealt with. */ XSync (dpy, False); - record_unwind_protect (x_catch_errors_unwind, - Fcons (make_save_value (dpy, 0), - x_error_message_string)); + data->dpy = dpy; + data->string[0] = 0; + data->prev = x_error_message; + x_error_message = data; - x_error_message_string = make_uninit_string (X_ERROR_MESSAGE_SIZE); - SSET (x_error_message_string, 0, 0); + record_unwind_protect (x_catch_errors_unwind, dummy); return count; } @@ -7512,11 +7524,11 @@ /* Unbind the binding that we made to check for X errors. */ static Lisp_Object -x_catch_errors_unwind (old_val) - Lisp_Object old_val; +x_catch_errors_unwind (dummy) + Lisp_Object dummy; { - Lisp_Object first = XCAR (old_val); - Display *dpy = XSAVE_VALUE (first)->pointer; + Display *dpy = x_error_message->dpy; + struct x_error_message_stack *tmp; /* The display may have been closed before this function is called. Check if it is still open before calling XSync. */ @@ -7527,7 +7539,12 @@ UNBLOCK_INPUT; } - x_error_message_string = XCDR (old_val); + tmp = x_error_message; + x_error_message = x_error_message->prev; + free (tmp); + + eassert (dummy == make_number ((EMACS_INT)dpy + (EMACS_INT)x_error_message)); + return Qnil; } @@ -7543,8 +7560,8 @@ /* Make sure to catch any errors incurred so far. */ XSync (dpy, False); - if (SREF (x_error_message_string, 0)) - error (format, SDATA (x_error_message_string)); + if (x_error_message->string[0]) + error (format, x_error_message->string); } /* Nonzero if we had any X protocol errors @@ -7557,7 +7574,7 @@ /* Make sure to catch any errors incurred so far. */ XSync (dpy, False); - return SREF (x_error_message_string, 0) != 0; + return x_error_message->string[0] != 0; } /* Forget about any errors we have had, since we did x_catch_errors on DPY. */ @@ -7566,7 +7583,7 @@ x_clear_errors (dpy) Display *dpy; { - SSET (x_error_message_string, 0, 0); + x_error_message->string[0] = 0; } /* Stop catching X protocol errors and let them make Emacs die. @@ -7748,7 +7765,7 @@ Display *display; XErrorEvent *error; { - if (! NILP (x_error_message_string)) + if (x_error_message) x_error_catcher (display, error); else x_error_quitter (display, error); @@ -10818,8 +10835,7 @@ void syms_of_xterm () { - staticpro (&x_error_message_string); - x_error_message_string = Qnil; + x_error_message = NULL; staticpro (&x_display_name_list); x_display_name_list = Qnil; ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-22 16:45 ` Stefan Monnier @ 2006-01-22 20:06 ` Andreas Schwab 2006-01-23 0:10 ` Richard M. Stallman 2006-01-23 0:35 ` Ken Raeburn 2 siblings, 0 replies; 58+ messages in thread From: Andreas Schwab @ 2006-01-22 20:06 UTC (permalink / raw) Cc: cyd, rms, emacs-devel Stefan Monnier <monnier@iro.umontreal.ca> writes: > @@ -7495,16 +7500,23 @@ > Display *dpy; > { > int count = SPECPDL_INDEX (); > + struct x_error_message_stack *data = malloc (sizeof (*data)); > + Lisp_Object dummy; > +#ifdef ENABLE_CHECKING > + dummy = make_number ((EMACS_INT)dpy + (EMACS_INT)x_error_message); > +#else > + dummy = Qnil Missing semicolon. Andreas. -- Andreas Schwab, SuSE Labs, schwab@suse.de SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-22 16:45 ` Stefan Monnier 2006-01-22 20:06 ` Andreas Schwab @ 2006-01-23 0:10 ` Richard M. Stallman 2006-01-23 0:35 ` Ken Raeburn 2 siblings, 0 replies; 58+ messages in thread From: Richard M. Stallman @ 2006-01-23 0:10 UTC (permalink / raw) Cc: cyd, emacs-devel If your fix works, it's fine with me to use your code instead of the code I wrote yesterday. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-22 16:45 ` Stefan Monnier 2006-01-22 20:06 ` Andreas Schwab 2006-01-23 0:10 ` Richard M. Stallman @ 2006-01-23 0:35 ` Ken Raeburn 2006-01-23 1:58 ` Stefan Monnier 2006-01-24 16:46 ` Richard M. Stallman 2 siblings, 2 replies; 58+ messages in thread From: Ken Raeburn @ 2006-01-23 0:35 UTC (permalink / raw) Cc: emacs-devel On Jan 22, 2006, at 11:45, Stefan Monnier wrote: >> if you look at x_catch_errors, you'll see that it allocates one >> lisp_cons >> cell, one lisp_string and one lisp_misc. Whether it's the cause >> of the >> bugs we see, I don't know, but since it's run from the signal >> handler, it >> can be executed at potentially any time. > > The patch below should remove this particular problem. > @@ -7495,16 +7500,23 @@ > Display *dpy; > { > int count = SPECPDL_INDEX (); > + struct x_error_message_stack *data = malloc (sizeof (*data)); It may just push the memory corruption from the Lisp object pool to the malloc pool, since malloc isn't safe for signal handlers either. Is it possible to pre-allocate this storage on a per-display basis, or might we have to deal with multiple error messages from one server without a chance to do processing outside of the signal handler? Ken ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-23 0:35 ` Ken Raeburn @ 2006-01-23 1:58 ` Stefan Monnier 2006-01-23 2:06 ` Stefan Monnier 2006-01-24 16:46 ` Richard M. Stallman 1 sibling, 1 reply; 58+ messages in thread From: Stefan Monnier @ 2006-01-23 1:58 UTC (permalink / raw) Cc: emacs-devel > It may just push the memory corruption from the Lisp object pool to the > malloc pool, since malloc isn't safe for signal handlers either. Indeed: if you want to really be signal-safe, use -DSYNC_INPUT. With -DSYNC_INPUT, Emacs uses a whole lot of things in signal handlers, including malloc and Xlib calls. So my patch doesn't make it much worse. At least on some systems, Emacs uses malloc hooks to wrap malloc operations inside (UN)BLOCK_INPUT, so that this part "works" (i.e. there's no guarantee it's safe, but in practice on those systems it should be). > Is it possible to pre-allocate this storage on a per-display basis, or > might we have to deal with multiple error messages from one server > without a chance to do processing outside of the signal handler? If this were the only place where we do malloc in signal handlers, it might be worth the trouble. But given the current state of affairs, it'd be a waste of time, I think. Stefan ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-23 1:58 ` Stefan Monnier @ 2006-01-23 2:06 ` Stefan Monnier 0 siblings, 0 replies; 58+ messages in thread From: Stefan Monnier @ 2006-01-23 2:06 UTC (permalink / raw) Cc: emacs-devel >> It may just push the memory corruption from the Lisp object pool to the >> malloc pool, since malloc isn't safe for signal handlers either. > Indeed: if you want to really be signal-safe, use -DSYNC_INPUT. > With -DSYNC_INPUT, Emacs uses a whole lot of things in signal handlers, ^^^ out > including malloc and Xlib calls. So my patch doesn't make it much worse. > At least on some systems, Emacs uses malloc hooks to wrap malloc operations > inside (UN)BLOCK_INPUT, so that this part "works" (i.e. there's no > guarantee it's safe, but in practice on those systems it should be). >> Is it possible to pre-allocate this storage on a per-display basis, or >> might we have to deal with multiple error messages from one server >> without a chance to do processing outside of the signal handler? > If this were the only place where we do malloc in signal handlers, it might > be worth the trouble. But given the current state of affairs, it'd be > a waste of time, I think. Sorry, Stefan ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-23 0:35 ` Ken Raeburn 2006-01-23 1:58 ` Stefan Monnier @ 2006-01-24 16:46 ` Richard M. Stallman 1 sibling, 0 replies; 58+ messages in thread From: Richard M. Stallman @ 2006-01-24 16:46 UTC (permalink / raw) Cc: monnier, emacs-devel It may just push the memory corruption from the Lisp object pool to the malloc pool, since malloc isn't safe for signal handlers either. Yes it is; malloc uses BLOCK_INPUT. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-21 4:48 ` Stefan Monnier 2006-01-21 17:31 ` Chong Yidong 2006-01-22 16:45 ` Stefan Monnier @ 2006-01-23 0:55 ` Stefan Monnier 2006-01-24 16:46 ` Richard M. Stallman 2 siblings, 1 reply; 58+ messages in thread From: Stefan Monnier @ 2006-01-23 0:55 UTC (permalink / raw) Cc: cyd, emacs-devel >> Maybe eassert(!handling_signal) should be added to allocate_string >> (and maybe it will catch the current bug). >> It seems worth a try. Attached is another candidate for "random memory corruption" (tho, here it uses allocate_vectorlike, whose code is almost completely wrapped in BLOCK_INPUT except for: p->next = all_vectors; all_vectors = p; so it's probably not causing too much memory corrpution). Contrary to the previous one, I have no idea how to fix this one. Stefan (gdb) bt #0 abort () at emacs.c:464 #1 0x081946fa in die (msg=0x8265950 "assertion failed: !handling_signal", file=0x8265209 "alloc.c", line=2866) at alloc.c:6126 #2 0x08197013 in allocate_vectorlike (len=391, type=MEM_TYPE_VECTOR) at alloc.c:2866 #3 0x081971f4 in allocate_vector (nslots=391) at alloc.c:2894 #4 0x08197223 in Fmake_vector (length=3128, init=138309641) at alloc.c:2990 #5 0x08197562 in Fmake_char_table (purpose=138537273, init=138309641) at alloc.c:3015 #6 0x0811a984 in make_fontset (frame=142414076, name=138309641, base=142390252) at fontset.c:401 #7 0x0811b03d in make_fontset_for_ascii_face (f=0x87d10f8, base_fontset_id=3) at fontset.c:580 #8 0x080f49f9 in realize_face (cache=0x87d1780, attrs=0xbfffce90, c=0, base_face=0x0, former_face_id=-1) at xfaces.c:7224 #9 0x080f6236 in lookup_face (f=0x87d10f8, attr=0xbfffce90, c=0, base_face=0x0) at xfaces.c:5685 #10 0x080f66d3 in face_at_buffer_position (w=0x87d1258, pos=768, region_beg=0, region_end=0, endptr=0xbfffd0b0, limit=769, mouse=1) at xfaces.c:7690 #11 0x0809dee9 in note_mouse_highlight (f=0x87d10f8, x=88, y=285) at xdisp.c:22533 #12 0x080fd0ee in note_mouse_movement (frame=0x87d10f8, event=0xbfffd624) at xterm.c:3614 #13 0x08103405 in handle_one_xevent (dpyinfo=0x871fb28, eventp=0xbfffd6f0, finish=0xbfffd77c, hold_quit=0xbfffe7b0) at xterm.c:6573 #14 0x0810683b in XTread_socket (sd=0, expected=1, hold_quit=0xbfffe7b0) at xterm.c:7021 #15 0x0813b1b9 in read_avail_input (expected=<value optimized out>) at keyboard.c:6712 #16 0x0813b35a in handle_async_input () at keyboard.c:6858 #17 0x0813b389 in input_available_signal (signo=29) at keyboard.c:6900 #18 <signal handler called> ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-23 0:55 ` Stefan Monnier @ 2006-01-24 16:46 ` Richard M. Stallman 2006-01-24 17:57 ` Kim F. Storm 2006-01-26 1:41 ` Chong Yidong 0 siblings, 2 replies; 58+ messages in thread From: Richard M. Stallman @ 2006-01-24 16:46 UTC (permalink / raw) Cc: cyd, emacs-devel One possible solution would be to realize the mouse-highlighted version of each face when we realize the face itself. In effect, that means calling realize_face earlier, so that when lookup_face is called in the signal handler, it will find what it needs. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-24 16:46 ` Richard M. Stallman @ 2006-01-24 17:57 ` Kim F. Storm 2006-01-24 18:33 ` Chong Yidong 2006-01-25 15:45 ` Richard M. Stallman 2006-01-26 1:41 ` Chong Yidong 1 sibling, 2 replies; 58+ messages in thread From: Kim F. Storm @ 2006-01-24 17:57 UTC (permalink / raw) Cc: cyd, Stefan Monnier, emacs-devel "Richard M. Stallman" <rms@gnu.org> writes: > One possible solution would be to realize the mouse-highlighted version > of each face when we realize the face itself. In effect, that means > calling realize_face earlier, so that when lookup_face is called > in the signal handler, it will find what it needs. All of this seems much trickier than just defining SYNC_INPUT by default. -- Kim F. Storm <storm@cua.dk> http://www.cua.dk ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-24 17:57 ` Kim F. Storm @ 2006-01-24 18:33 ` Chong Yidong 2006-01-25 15:45 ` Richard M. Stallman 1 sibling, 0 replies; 58+ messages in thread From: Chong Yidong @ 2006-01-24 18:33 UTC (permalink / raw) Cc: emacs-devel, rms, Stefan Monnier > "Richard M. Stallman" <rms@gnu.org> writes: > >> One possible solution would be to realize the mouse-highlighted version >> of each face when we realize the face itself. In effect, that means >> calling realize_face earlier, so that when lookup_face is called >> in the signal handler, it will find what it needs. > > All of this seems much trickier than just defining SYNC_INPUT by default. I'm beginning to think so too. Just by eyeballing, I can find several other places in handle_one_xevent that can end up calling Fcons or Fmake_vector: 5884: x_handle_dnd_message -> Fmake_vector 6069: record_asynch_buffer_change -> kbd_buffer_store_event -> kbd_buffer_store_event_hold -> Fcons 6865: kbd_buffer_store_event_hold -> Fcons 6882, 6888: gen_help_event -> kbd_buffer_store_event -> kbd_buffer_store_event_hold -> Fcons ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-24 17:57 ` Kim F. Storm 2006-01-24 18:33 ` Chong Yidong @ 2006-01-25 15:45 ` Richard M. Stallman 1 sibling, 0 replies; 58+ messages in thread From: Richard M. Stallman @ 2006-01-25 15:45 UTC (permalink / raw) Cc: cyd, monnier, emacs-devel All of this seems much trickier than just defining SYNC_INPUT by default. We can do that if we solve its problems. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-24 16:46 ` Richard M. Stallman 2006-01-24 17:57 ` Kim F. Storm @ 2006-01-26 1:41 ` Chong Yidong 2006-01-26 17:46 ` Richard M. Stallman 1 sibling, 1 reply; 58+ messages in thread From: Chong Yidong @ 2006-01-26 1:41 UTC (permalink / raw) Cc: Stefan Monnier, emacs-devel "Richard M. Stallman" <rms@gnu.org> writes: > One possible solution would be to realize the mouse-highlighted version > of each face when we realize the face itself. In effect, that means > calling realize_face earlier, so that when lookup_face is called > in the signal handler, it will find what it needs. This is impossible with the current code. If you look at face_at_buffer_position, you see that it generates the mouse face attributes by merging the mouse faces specified by all the overlays present, plus the mouse-face text property. We obviously can't realize every possible combination of defined mouse faces that have been defined. For the sake of the release, how about just moving the threatened portion of allocate_vectorlike into the BLOCK_INPUT section that's already present in the function? This would also fix the other bug I pointed out in x_handle_dnd_event. After the release, we can come back and redesign the x signal handler to behave nicely, or fix the problems with DSYNC_INPUT and make that the default. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-26 1:41 ` Chong Yidong @ 2006-01-26 17:46 ` Richard M. Stallman 2006-01-26 18:40 ` Stefan Monnier 2006-01-26 19:10 ` Chong Yidong 0 siblings, 2 replies; 58+ messages in thread From: Richard M. Stallman @ 2006-01-26 17:46 UTC (permalink / raw) Cc: monnier, emacs-devel For the sake of the release, how about just moving the threatened portion of allocate_vectorlike into the BLOCK_INPUT section that's already present in the function? Ok. But can all the known problems be fixed this way? ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-26 17:46 ` Richard M. Stallman @ 2006-01-26 18:40 ` Stefan Monnier 2006-01-26 19:45 ` Chong Yidong 2006-01-27 22:32 ` Richard M. Stallman 2006-01-26 19:10 ` Chong Yidong 1 sibling, 2 replies; 58+ messages in thread From: Stefan Monnier @ 2006-01-26 18:40 UTC (permalink / raw) Cc: Chong Yidong, emacs-devel > For the sake of the release, how about just moving the threatened > portion of allocate_vectorlike into the BLOCK_INPUT section that's > already present in the function? > Ok. There's no BLOCK_INPUT used in allocate_vectorlike unless DOUG_LEA_MALLOC is defined. But admittedly, allocate_vectorlike is very slow anyway (at least on platforms where we use conservative stack scanning, i.e. in 99% of the cases) so adding a BLOCK_INPUT there isn't a problem. > But can all the known problems be fixed this way? At the cost of adding BLOCK_INPUT for each allocation function? Sure, but I'd rather not do that. Stefan ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-26 18:40 ` Stefan Monnier @ 2006-01-26 19:45 ` Chong Yidong 2006-01-27 22:32 ` Richard M. Stallman 2006-01-27 22:32 ` Richard M. Stallman 1 sibling, 1 reply; 58+ messages in thread From: Chong Yidong @ 2006-01-26 19:45 UTC (permalink / raw) Cc: rms, emacs-devel >> But can all the known problems be fixed this way? > > At the cost of adding BLOCK_INPUT for each allocation function? Sure, but > I'd rather not do that. I don't see any better solution. It is non-trivial to fix the behavior of handle_one_xevent with synched input. I, for one, don't see any clean way to fix note note_mouse_highlight and x_handle_dnd_message. And we can't be sure that we fixed all the bugs in the signal handler. Fixing DSYNC_INPUT so that RMS is happy with it would also delay the release considerably. Finally, putting the BLOCK_INPUT's does not seem to affect performance noticeably. rm -f *.elc; time emacs --batch -f batch-byte-compile-if-not-done *.el goes from real 0m53.052s real 0m53.303s user 0m50.528s to user 0m50.511s sys 0m0.537s sys 0m0.526s If there are no objections, I'll commit this, and we can consider this issue closed until *after* the release. *** emacs/src/alloc.c.~1.388.~ 2006-01-26 13:44:24.000000000 -0500 --- emacs/src/alloc.c 2006-01-26 14:03:45.000000000 -0500 *************** *** 1422,1428 **** { INTERVAL val; ! eassert (!handling_signal); if (interval_free_list) { --- 1422,1432 ---- { INTERVAL val; ! /* eassert (!handling_signal); */ ! ! #ifndef SYNC_INPUT ! BLOCK_INPUT; ! #endif if (interval_free_list) { *************** *** 1445,1450 **** --- 1449,1459 ---- } val = &interval_block->intervals[interval_block_index++]; } + + #ifndef SYNC_INPUT + UNBLOCK_INPUT; + #endif + consing_since_gc += sizeof (struct interval); intervals_consed++; RESET_INTERVAL (val); *************** *** 1842,1848 **** { struct Lisp_String *s; ! eassert (!handling_signal); /* If the free-list is empty, allocate a new string_block, and add all the Lisp_Strings in it to the free-list. */ --- 1851,1861 ---- { struct Lisp_String *s; ! /* eassert (!handling_signal); */ ! ! #ifndef SYNC_INPUT ! BLOCK_INPUT; ! #endif /* If the free-list is empty, allocate a new string_block, and add all the Lisp_Strings in it to the free-list. */ *************** *** 1873,1878 **** --- 1886,1895 ---- s = string_free_list; string_free_list = NEXT_FREE_LISP_STRING (s); + #ifndef SYNC_INPUT + UNBLOCK_INPUT; + #endif + /* Probably not strictly necessary, but play it safe. */ bzero (s, sizeof *s); *************** *** 1920,1925 **** --- 1937,1948 ---- /* Determine the number of bytes needed to store NBYTES bytes of string data. */ needed = SDATA_SIZE (nbytes); + old_data = s->data ? SDATA_OF_STRING (s) : NULL; + old_nbytes = GC_STRING_BYTES (s); + + #ifndef SYNC_INPUT + BLOCK_INPUT; + #endif if (nbytes > LARGE_STRING_BYTES) { *************** *** 1974,1985 **** else b = current_sblock; - old_data = s->data ? SDATA_OF_STRING (s) : NULL; - old_nbytes = GC_STRING_BYTES (s); - data = b->next_free; b->next_free = (struct sdata *) ((char *) data + needed + GC_STRING_EXTRA); data->string = s; s->data = SDATA_DATA (data); #ifdef GC_CHECK_STRING_BYTES --- 1997,2009 ---- else b = current_sblock; data = b->next_free; b->next_free = (struct sdata *) ((char *) data + needed + GC_STRING_EXTRA); + #ifndef SYNC_INPUT + UNBLOCK_INPUT; + #endif + data->string = s; s->data = SDATA_DATA (data); #ifdef GC_CHECK_STRING_BYTES *************** *** 2560,2566 **** { register Lisp_Object val; ! eassert (!handling_signal); if (float_free_list) { --- 2584,2594 ---- { register Lisp_Object val; ! /* eassert (!handling_signal); */ ! ! #ifndef SYNC_INPUT ! BLOCK_INPUT; ! #endif if (float_free_list) { *************** *** 2587,2592 **** --- 2615,2624 ---- float_block_index++; } + #ifndef SYNC_INPUT + UNBLOCK_INPUT; + #endif + XFLOAT_DATA (val) = float_value; eassert (!FLOAT_MARKED_P (XFLOAT (val))); consing_since_gc += sizeof (struct Lisp_Float); *************** *** 2681,2687 **** { register Lisp_Object val; ! eassert (!handling_signal); if (cons_free_list) { --- 2713,2723 ---- { register Lisp_Object val; ! /* eassert (!handling_signal); */ ! ! #ifndef SYNC_INPUT ! BLOCK_INPUT; ! #endif if (cons_free_list) { *************** *** 2707,2712 **** --- 2743,2752 ---- cons_block_index++; } + #ifndef SYNC_INPUT + UNBLOCK_INPUT; + #endif + XSETCAR (val, car); XSETCDR (val, cdr); eassert (!CONS_MARKED_P (XCONS (val))); *************** *** 2880,2887 **** --- 2920,2936 ---- consing_since_gc += nbytes; vector_cells_consed += len; + #ifndef SYNC_INPUT + BLOCK_INPUT; + #endif + p->next = all_vectors; all_vectors = p; + + #ifndef SYNC_INPUT + UNBLOCK_INPUT; + #endif + ++n_vectors; return p; } *************** *** 3162,3167 **** --- 3211,3220 ---- eassert (!handling_signal); + #ifndef SYNC_INPUT + BLOCK_INPUT; + #endif + if (symbol_free_list) { XSETSYMBOL (val, symbol_free_list); *************** *** 3183,3188 **** --- 3236,3245 ---- symbol_block_index++; } + #ifndef SYNC_INPUT + UNBLOCK_INPUT; + #endif + p = XSYMBOL (val); p->xname = name; p->plist = Qnil; *************** *** 3242,3248 **** { Lisp_Object val; ! eassert (!handling_signal); if (marker_free_list) { --- 3299,3309 ---- { Lisp_Object val; ! /* eassert (!handling_signal); */ ! ! #ifndef SYNC_INPUT ! BLOCK_INPUT; ! #endif if (marker_free_list) { *************** *** 3266,3271 **** --- 3327,3336 ---- marker_block_index++; } + #ifndef SYNC_INPUT + UNBLOCK_INPUT; + #endif + --total_free_markers; consing_since_gc += sizeof (union Lisp_Misc); misc_objects_consed++; ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-26 19:45 ` Chong Yidong @ 2006-01-27 22:32 ` Richard M. Stallman 2006-01-27 23:33 ` Stefan Monnier 2006-01-29 4:58 ` Chong Yidong 0 siblings, 2 replies; 58+ messages in thread From: Richard M. Stallman @ 2006-01-27 22:32 UTC (permalink / raw) Cc: monnier, emacs-devel I don't see any better solution. It is non-trivial to fix the behavior of handle_one_xevent with synched input. I, for one, don't see any clean way to fix note note_mouse_highlight and x_handle_dnd_message. Could you tell us about these problems? Finally, putting the BLOCK_INPUT's does not seem to affect performance noticeably. That doesn't surprise me--they only do a few instructions in the usual case. Would you please install your patch? ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-27 22:32 ` Richard M. Stallman @ 2006-01-27 23:33 ` Stefan Monnier 2006-01-29 14:53 ` Chong Yidong 2006-01-29 4:58 ` Chong Yidong 1 sibling, 1 reply; 58+ messages in thread From: Stefan Monnier @ 2006-01-27 23:33 UTC (permalink / raw) Cc: Chong Yidong, emacs-devel > Finally, putting the BLOCK_INPUT's does not seem to affect performance > noticeably. Could you wrap those BLOCK_INPUT inside #ifndef SYNC_INPUT ? That'd be great, thanks, Stefan ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-27 23:33 ` Stefan Monnier @ 2006-01-29 14:53 ` Chong Yidong 0 siblings, 0 replies; 58+ messages in thread From: Chong Yidong @ 2006-01-29 14:53 UTC (permalink / raw) Cc: rms, emacs-devel Stefan Monnier <monnier@iro.umontreal.ca> writes: >> Finally, putting the BLOCK_INPUT's does not seem to affect performance >> noticeably. > > Could you wrap those BLOCK_INPUT inside #ifndef SYNC_INPUT ? > That'd be great, thanks, They already are. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-27 22:32 ` Richard M. Stallman 2006-01-27 23:33 ` Stefan Monnier @ 2006-01-29 4:58 ` Chong Yidong 2006-01-30 0:57 ` Richard M. Stallman 1 sibling, 1 reply; 58+ messages in thread From: Chong Yidong @ 2006-01-29 4:58 UTC (permalink / raw) Cc: monnier, emacs-devel "Richard M. Stallman" <rms@gnu.org> writes: > I don't see any better solution. It is non-trivial to fix the > behavior of handle_one_xevent with synched input. I, for one, don't > see any clean way to fix note_mouse_highlight and > x_handle_dnd_message. > > Could you tell us about these problems? Sure. We already discussed the note_mouse_highlight problem somewhat. Basically, it can lead to calling realize_face for a mouse face, which can call allocate_vectorlike and make_uninit_multibyte_string. I don't know how to avoid this, since the mouse face is constructed on the fly by merging overlay and text property mouse faces. As for x_handle_dnd_message, this function has to allocate a vector, recording the filename "dragged and dropped" into Emacs, that is put into the arg member of a struct input_event for later processing. We can't change the arg member from a Lisp_Object to a pointer (and use a malloc'ed block instead of a vector) because the arg member is used by other event handlers to pass stuff (e.g., the toolbar handlers). One solution is to add an additional pointer member to struct input_event, or make a global variable to store drag and drop events. But I'd rather not put any more effort into tweaking handle_one_xevent to avoid allocating Lisp structures. After the release, we can fix DSYNC_INPUT and make it the default -- DSYNC_INPUT just seems like a better design. > Would you please install your patch? Done. Can we move forward with the release now? I believe this was the main bug blocking the process. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-29 4:58 ` Chong Yidong @ 2006-01-30 0:57 ` Richard M. Stallman 2006-01-30 1:06 ` Chong Yidong 0 siblings, 1 reply; 58+ messages in thread From: Richard M. Stallman @ 2006-01-30 0:57 UTC (permalink / raw) Cc: monnier, emacs-devel > I don't see any better solution. It is non-trivial to fix the > behavior of handle_one_xevent with synched input. I, for one, don't > see any clean way to fix note_mouse_highlight and > x_handle_dnd_message. Perhaps there is a misunderstanding. You said it was hard to fix these "with synched input". So I thought you were talking about problems in SYNC_INPUT. But your response now seems to refer to the problems that we have recently observed _without_ SYNC_INPUT. We already discussed the note_mouse_highlight problem somewhat. Basically, it can lead to calling realize_face for a mouse face, which can call allocate_vectorlike and make_uninit_multibyte_string. I don't know how to avoid this, since the mouse face is constructed on the fly by merging overlay and text property mouse faces. Using SYNC_INPUT would fix that, wouldn't it? And I think it would fix those other two problems, too. So, do you know of any problems in SYNC_INPUT aside from not handling immediate_quit? ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-30 0:57 ` Richard M. Stallman @ 2006-01-30 1:06 ` Chong Yidong 0 siblings, 0 replies; 58+ messages in thread From: Chong Yidong @ 2006-01-30 1:06 UTC (permalink / raw) Cc: monnier, emacs-devel > So, do you know of any problems in SYNC_INPUT > aside from not handling immediate_quit? No. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-26 18:40 ` Stefan Monnier 2006-01-26 19:45 ` Chong Yidong @ 2006-01-27 22:32 ` Richard M. Stallman 1 sibling, 0 replies; 58+ messages in thread From: Richard M. Stallman @ 2006-01-27 22:32 UTC (permalink / raw) Cc: cyd, emacs-devel > But can all the known problems be fixed this way? At the cost of adding BLOCK_INPUT for each allocation function? Sure, but I'd rather not do that. That is the only localized simple fix we know of, so we have to do that. We could switch to SYNC_INPUT at some point if its problems are solved. One is the immediate_quit problem; Yidong seems to say there are others. For now, I'd rather stay with these BLOCK_INPUT calls. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: allocate_string_data memory corruption 2006-01-26 17:46 ` Richard M. Stallman 2006-01-26 18:40 ` Stefan Monnier @ 2006-01-26 19:10 ` Chong Yidong 1 sibling, 0 replies; 58+ messages in thread From: Chong Yidong @ 2006-01-26 19:10 UTC (permalink / raw) Cc: emacs-devel "Richard M. Stallman" <rms@gnu.org> writes: > For the sake of the release, how about just moving the threatened > portion of allocate_vectorlike into the BLOCK_INPUT section that's > already present in the function? > > But can all the known problems be fixed this way? Yes. ^ permalink raw reply [flat|nested] 58+ messages in thread
end of thread, other threads:[~2006-01-30 1:06 UTC | newest] Thread overview: 58+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2006-01-18 16:57 allocate_string_data memory corruption Chong Yidong 2006-01-18 20:48 ` Stefan Monnier 2006-01-20 0:45 ` Chong Yidong 2006-01-20 1:14 ` Richard M. Stallman 2006-01-20 3:48 ` Stefan Monnier 2006-01-23 20:21 ` Stefan Monnier 2006-01-24 17:23 ` Chong Yidong 2006-01-18 21:35 ` Ken Raeburn 2006-01-18 23:56 ` Chong Yidong 2006-01-19 8:53 ` Romain Francoise 2006-01-19 20:57 ` Stefan Monnier 2006-01-19 22:48 ` Kim F. Storm 2006-01-20 3:46 ` Stefan Monnier 2006-01-20 22:58 ` Richard M. Stallman 2006-01-25 3:26 ` Chong Yidong 2006-01-25 15:45 ` Richard M. Stallman 2006-01-20 1:14 ` Richard M. Stallman 2006-01-20 9:28 ` Ken Raeburn 2006-01-20 22:58 ` Richard M. Stallman 2006-01-18 22:06 ` Eli Zaretskii 2006-01-18 23:48 ` David Kastrup 2006-01-18 23:48 ` Chong Yidong 2006-01-19 1:15 ` Stefan Monnier 2006-01-19 3:21 ` Ken Raeburn 2006-01-19 4:36 ` Eli Zaretskii 2006-01-20 1:14 ` Richard M. Stallman 2006-01-20 3:56 ` Stefan Monnier 2006-01-20 14:49 ` Chong Yidong 2006-01-21 19:57 ` Richard M. Stallman 2006-01-22 17:37 ` Stefan Monnier 2006-01-20 22:58 ` Richard M. Stallman 2006-01-21 4:48 ` Stefan Monnier 2006-01-21 17:31 ` Chong Yidong 2006-01-22 3:57 ` Richard M. Stallman 2006-01-22 16:45 ` Stefan Monnier 2006-01-22 20:06 ` Andreas Schwab 2006-01-23 0:10 ` Richard M. Stallman 2006-01-23 0:35 ` Ken Raeburn 2006-01-23 1:58 ` Stefan Monnier 2006-01-23 2:06 ` Stefan Monnier 2006-01-24 16:46 ` Richard M. Stallman 2006-01-23 0:55 ` Stefan Monnier 2006-01-24 16:46 ` Richard M. Stallman 2006-01-24 17:57 ` Kim F. Storm 2006-01-24 18:33 ` Chong Yidong 2006-01-25 15:45 ` Richard M. Stallman 2006-01-26 1:41 ` Chong Yidong 2006-01-26 17:46 ` Richard M. Stallman 2006-01-26 18:40 ` Stefan Monnier 2006-01-26 19:45 ` Chong Yidong 2006-01-27 22:32 ` Richard M. Stallman 2006-01-27 23:33 ` Stefan Monnier 2006-01-29 14:53 ` Chong Yidong 2006-01-29 4:58 ` Chong Yidong 2006-01-30 0:57 ` Richard M. Stallman 2006-01-30 1:06 ` Chong Yidong 2006-01-27 22:32 ` Richard M. Stallman 2006-01-26 19:10 ` Chong Yidong
Code repositories for project(s) associated with this public inbox https://git.savannah.gnu.org/cgit/emacs.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).