* mark_object crash in 22.1 and latest CVS (as of tonight)
From: Kalman Reti @ 2007-11-09 3:55 UTC
To: bug-gnu-emacs

[-- Attachment #1: Type: text/plain, Size: 128 bytes --]

See attached file for gdb session of garbage collector crash in
a linux emacs built from sources checked out tonight.

Kalman

[-- Attachment #2: emacscrash.text --]
[-- Type: text/plain, Size: 7611 bytes --]

This is a garbage-collect crash in a built-from-CVS emacs tree checked out
tonight (Nov 8, 2007). I had originally experienced this crash in 22.1, both
on Windows and Linux, but wanted to make sure the bug existed in the latest
version before reporting it.

I've written some functions which issue Shell Commands to interact with our
perforce server at work; these commands parse the *Shell Output Buffer* to
pick up bits of information. These have been working very well for me, but
today I got a reproducible case that crashes Emacs. Unfortunately, it is only
reproducible after issuing many commands against our perforce server.

So I built from sources, ran with gdb, and captured the following information.
The object it trips over is always a misc free cell and it always hits the
default leg of the case statement in mark_object.

Let me know if you need me to collect more information.

$ gdb ./emacs
gdb ./emacs
GNU gdb Red Hat Linux (5.3.90-0.20030710.41.2.1rh)
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...Using host libthread_db library "/lib/libthread_db.so.1".

DISPLAY = :1.0
TERM = dumb
Breakpoint 1 at 0x80e039a: file emacs.c, line 431.
Breakpoint 2 at 0x80f7145: file sysdep.c, line 1435.
(gdb) run
Starting program: /u/kreti/gnuemacs-linux11/emacs/src/emacs -geometry 80x40+0+0

Breakpoint 1, abort () at emacs.c:431
431 kill (getpid (), SIGABRT);
(gdb) where
#0 abort () at emacs.c:431
#1 0x0812b179 in mark_object (arg=147211050) at alloc.c:5734
#2 0x0812b1da in mark_object (arg=141537485) at alloc.c:5751
#3 0x0812b1da in mark_object (arg=141537437) at alloc.c:5751
#4 0x0812b2b2 in mark_buffer (buf=146936428) at alloc.c:5808
#5 0x0812ae48 in mark_object (arg=146936428) at alloc.c:5558
#6 0x0812b0ec in mark_object (arg=138283458) at alloc.c:5679
#7 0x0812b026 in mark_object (arg=137558905) at alloc.c:5639
#8 0x0812b1da in mark_object (arg=137859549) at alloc.c:5751
#9 0x0812b1da in mark_object (arg=137859861) at alloc.c:5751
#10 0x0812b0e3 in mark_object (arg=137640922) at alloc.c:5678
#11 0x0812b026 in mark_object (arg=137728969) at alloc.c:5639
#12 0x0812b1da in mark_object (arg=137854981) at alloc.c:5751
#13 0x0812b1da in mark_object (arg=137400253) at alloc.c:5751
#14 0x0812b038 in mark_object (arg=141380333) at alloc.c:5641
#15 0x0812aec3 in mark_object (arg=141391156) at alloc.c:5581
#16 0x0812b02f in mark_object (arg=137826961) at alloc.c:5640
#17 0x0812b1da in mark_object (arg=141380237) at alloc.c:5751
#18 0x0812b038 in mark_object (arg=137459345) at alloc.c:5641
#19 0x0812b1da in mark_object (arg=139297181) at alloc.c:5751
#20 0x0812b038 in mark_object (arg=139297133) at alloc.c:5641
#21 0x0812b1da in mark_object (arg=137860261) at alloc.c:5751
#22 0x0812b038 in mark_object (arg=137678425) at alloc.c:5641
#23 0x0812b1da in mark_object (arg=141473229) at alloc.c:5751
#24 0x0812aec3 in mark_object (arg=141688156) at alloc.c:5581
#25 0x0812b02f in mark_object (arg=144524753) at alloc.c:5640
#26 0x0812b1da in mark_object (arg=144499205) at alloc.c:5751
#27 0x0812b1da in mark_object (arg=144499437) at alloc.c:5751
#28 0x0812b02f in mark_object (arg=144524729) at alloc.c:5640
#29 0x0812ad6f in mark_vectorlike (ptr=0x830c968) at alloc.c:5456
#30 0x0812b004 in mark_object (arg=137415020) at alloc.c:5628
#31 0x0812a786 in Fgarbage_collect () at alloc.c:5141
#32 0x0813df5a in Ffuncall (nargs=1, args=0xbffec420) at eval.c:3021
#33 0x081619b4 in Fbyte_code (bytestr=144658787, vector=144663148, maxdepth=56) at bytecode.c:679
#34 0x0813e46a in funcall_lambda (fun=144663356, nargs=3, arg_vector=0xbffec4e0) at eval.c:3211
#35 0x0813e1b6 in apply_lambda (fun=144663356, args=146885917, eval_flag=1) at eval.c:3135
#36 0x0813d703 in Feval (form=146885909) at eval.c:2415
#37 0x0813b089 in Fsetq (args=146885901) at eval.c:552
#38 0x0813d43a in Feval (form=146885893) at eval.c:2302
#39 0x0813d50d in Feval (form=146885885) at eval.c:2340
#40 0x0813df6f in Ffuncall (nargs=2, args=0xbffec834) at eval.c:3024
#41 0x081619b4 in Fbyte_code (bytestr=136524459, vector=136524476, maxdepth=24) at bytecode.c:679
#42 0x0813e46a in funcall_lambda (fun=136524420, nargs=1, arg_vector=0xbffec944) at eval.c:3211
#43 0x0813e089 in Ffuncall (nargs=2, args=0xbffec940) at eval.c:3081
#44 0x081619b4 in Fbyte_code (bytestr=136524707, vector=136524724, maxdepth=24) at bytecode.c:679
#45 0x0813e46a in funcall_lambda (fun=136524668, nargs=1, arg_vector=0xbffeca54) at eval.c:3211
#46 0x0813e089 in Ffuncall (nargs=2, args=0xbffeca50) at eval.c:3081
#47 0x081619b4 in Fbyte_code (bytestr=136522907, vector=136522924, maxdepth=16) at bytecode.c:679
#48 0x0813e46a in funcall_lambda (fun=136522876, nargs=0, arg_vector=0xbffecb84) at eval.c:3211
#49 0x0813e089 in Ffuncall (nargs=1, args=0xbffecb80) at eval.c:3081
#50 0x0813dc34 in apply1 (fn=138307105, arg=137413969) at eval.c:2765
#51 0x081398fc in Fcall_interactively (function=138307105, record_flag=137413969, keys=137462244) at callint.c:385
#52 0x080edb15 in Fcommand_execute (cmd=138307105, record_flag=137413969, keys=137413969, special=137413969) at keyboard.c:10363
#53 0x080e3c65 in command_loop_1 () at keyboard.c:1939
#54 0x0813c422 in internal_condition_case (bfun=0x80e2f70 <command_loop_1>, handlers=137480609, hfun=0x80e2a3c <cmd_error>) at eval.c:1493
#55 0x080e2d0e in command_loop_2 () at keyboard.c:1396
#56 0x0813bf93 in internal_catch (tag=137462905, func=0x80e2cf0 <command_loop_2>, arg=137413969) at eval.c:1229
#57 0x080e2c9c in command_loop () at keyboard.c:1375
#58 0x080e26c0 in recursive_edit_1 () at keyboard.c:984
#59 0x080e2800 in Frecursive_edit () at keyboard.c:1046
#60 0x080e1695 in main (argc=3, argv=0xbffed334) at emacs.c:1777

Lisp Backtrace:
"garbage-collect" (0xbffec424)
"changesets-between" (0xbffec4e0)
"setq" (0xbffec668)
"length" (0xbffec728)
"eval" (0xbffec838)
"eval-last-sexp-1" (0xbffec944)
"eval-last-sexp" (0xbffeca54)
"eval-print-last-sexp" (0xbffecb84)
"call-interactively" (0xbffecd30)
(gdb) print 146936428
$1 = 146936428
(gdb) pr
#<buffer >
(gdb) print 141537437
$2 = 141537437
(gdb) pr
((1 . 73)
 ("//depot/release-13-30/src/Makefile#42 - edit change 227204 (text) " . 1)
 (#<misc free cell> . -58)
 (#<misc free cell> . -65)
 (#<marker in no buffer> . -58)
 (#<marker in no buffer> . -64)
 (1 . 73)
 ("//depot/V13-30-patch/src/Makefile ... #1 change 227756 branch on 2007/11/02 by majormajor@majormajor-p4branch-auto521 (text) 'Create' ... ... branch from //depot/release-13-30/src/Makefile#1,#42 " . 1)
 (#<marker in no buffer> . -143)
 (#<marker in no buffer> . -202)
 (#<misc free cell> . -164)
 (#<misc free cell> . -176)
 (#<marker in no buffer> . -200)
 (#<marker in no buffer> . -202)
 (1 . 204)
 ("//depot/V13-30-patch/src/Makefile#1 - branch change 227756 (text) " . 1)
 (#<marker in no buffer> . -58)
 (#<marker in no buffer> . -65)
 (#<marker in no buffer> . -58)
 (#<marker in no buffer> . -64)
 (1 . 73))
(gdb) print 141537485
$3 = 141537485
(gdb) pr
(#<misc free cell> . -58)
(gdb) print 147211050
$4 = 147211050
(gdb) pr
#<misc free cell>
(gdb) xmiscfree 147211050
$5 = (struct Lisp_Free *) 0x8c64328
(gdb) pr
18401381

* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
From: Kalman Reti @ 2007-11-09 11:32 UTC
To: bug-gnu-emacs; +Cc: kalman.reti

Adding a subcase of Lisp_Misc_Free inside the

  switch (XMISCTYPE (obj))

inside the

  case Lisp_Misc:

(in mark_object) which calls break (i.e. ignores it) causes my crash
to go away.

I don't understand how Lisp_Misc_Free objects are supposed to
be handled, so I'm not terribly confident of this fix.

On Nov 8, 2007 10:55 PM, Kalman Reti <kalman.reti@gmail.com> wrote:
> See attached file for gdb session of garbage collector crash in
> a linux emacs built from sources checked out tonight.
>
> Kalman

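The experimental change described in this message can be pictured with a small
standalone C model. This is only a sketch under stated assumptions: the toy_*
names are hypothetical and are not the definitions in alloc.c, and the boolean
return value merely stands in for "handled" versus "reached the default leg"
where the report says abort is called. As the message itself says, this makes
the crash go away without explaining why a free cell is reachable at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the misc subtypes named in the thread;
   these are not the real Emacs definitions.  */
enum toy_misc_type { TOY_MISC_FREE, TOY_MISC_MARKER, TOY_MISC_OVERLAY };

struct toy_misc
{
  enum toy_misc_type type;
  bool gcmarkbit;
};

/* Model of the inner switch.  Returns true when the object is handled
   and false where this model stands in for the default leg.
   IGNORE_FREE_CELLS selects the experimental subcase that simply
   skips free cells.  */
static bool
toy_mark_misc (struct toy_misc *obj, bool ignore_free_cells)
{
  switch (obj->type)
    {
    case TOY_MISC_MARKER:
    case TOY_MISC_OVERLAY:
      obj->gcmarkbit = true;
      return true;

    case TOY_MISC_FREE:
      if (ignore_free_cells)
        return true;            /* the added subcase: do nothing */
      return false;             /* unpatched: treated like default */

    default:
      return false;
    }
}
```

Calling toy_mark_misc on a free cell with ignore_free_cells set reports it as
handled but leaves its mark bit untouched, which is all the described subcase
does.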
* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
From: Kalman Reti @ 2007-11-10 10:19 UTC
To: bug-gnu-emacs; +Cc: kalman.reti

So, a little more research indicates that my fix is likely wrong, since
in 2004 an equivalent fix was made and then rescinded after the code
was added to remove markers that were in buffer undo lists at the end
of the GC. Perhaps there are other places where such markers could
exist, e.g. perhaps the place(s) storing what (match-data) returns.

Can anyone elucidate the theory of Lisp_Misc_Free objects? Is the fact
that any pointers to such objects exist after the GC the real bug or are
they allowed to survive a GC and are somehow supposed to be handled in
some other way elsewhere?

Since I have a reproducible test case I'd be happy to track down where
these are coming from, but I need some help (in the form of information)
to know what I'm chasing.

On Nov 9, 2007 6:32 AM, Kalman Reti <kalman.reti@gmail.com> wrote:
> Adding a subcase of Lisp_Misc_Free inside the
>
>   switch (XMISCTYPE (obj))
>
> inside the
>
>   case Lisp_Misc:
>
> (in mark_object) which calls break (i.e. ignores it) causes my crash
> to go away.
>
> I don't understand how Lisp_Misc_Free objects are supposed to
> be handled, so I'm not terribly confident of this fix.
>
> On Nov 8, 2007 10:55 PM, Kalman Reti <kalman.reti@gmail.com> wrote:
> > See attached file for gdb session of garbage collector crash in
> > a linux emacs built from sources checked out tonight.
> >
> > Kalman

* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
From: Kalman Reti @ 2007-11-12 11:40 UTC
To: rms, emacs-devel, bug-gnu-emacs; +Cc: kalman.reti

On Nov 11, 2007 12:22 AM, Richard Stallman <rms@gnu.org> wrote:
> The first questions are, what object contains the bad pointer?
> What data type is it? What data structure is it part of?

The gdb pr output near the end of the attachment in my first message
shows it is part of a list, which, in turn, is part of a buffer. I assumed
someone would recognize WHAT part of a buffer from the contents of the
list, a mixture of conses with marker-in-no-buffer in the car of some and
Lisp_Misc_Free in the car of others, the cdr's being negative numbers
of pretty small absolute magnitude. If it isn't recognizable from its
contents, I'll have to wait till I'm next at work to find out exactly which
slot in the buffer this list comes from using gdb.

The code I'm running is pretty simple: it executes a shell command (i.e. a
perforce command) and then uses search-forward-regexp to find relevant lines
in the output, capturing things like revision number or branch using
match-string after the regexp matches. The searching is done within a
save-excursion which switches to the *Shell Command Output* buffer.

I suspect one could reproduce the bug without issuing perforce commands;
I'll give that a stab tonight.

>
> Once you answer those, you can try to figure out how it happened
> that the data structure ended up with a bad pointer.
> Maybe GC failed to mark that pointer, so the misc object got freed
> even though it was still in use.

Are there any tools to help with this, e.g. an allocation trace or
GC trace? I'm afraid this is the first time I've looked at the Emacs
src code.
[rest of message elided]

* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
From: Stefan Monnier @ 2007-11-12 22:03 UTC
To: Kalman Reti; +Cc: bug-gnu-emacs, rms, emacs-devel

>> The first questions are, what object contains the bad pointer?
>> What data type is it? What data structure is it part of?

> The gdb pr output near the end of the attachment in my first message
> shows it is part of a list, which, in turn, is part of a buffer. I assumed
> someone would recognize WHAT part of a buffer from the contents of the,
> list, a mixture of conses with marker-in-no-buffer in the car of some and
> Lisp_Misc_Free in the car of others, the cdr's being negative numbers
> of pretty small absolute magnitude. If it isn't recognizable from its contents,
> I'll have to wait till I'm next at work to find out exactly which slot
> in the buffer
> this list comes from using gdb.

Sounds like the contents of the buffer-undo-list. Especially since this
variable is GC'd specially and getting it right is tricky.


        Stefan

* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
From: Kalman Reti @ 2007-11-13 0:30 UTC
To: Stefan Monnier; +Cc: bug-gnu-emacs, kalman.reti, rms, emacs-devel

I looked at the code, and there are comments saying both that the undo_list
should be before the name slot and that it should come after. In the CVS code,
it definitely comes after, which looks to me like it will get marked twice:
once in the normal loop which starts at name and marks all the following
objects, and then again at the special code for marking the undo list. This is
contrary to what the comments say should be happening, but I don't know which
of the comments or the code is right.

On Nov 12, 2007 5:03 PM, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
> >> The first questions are, what object contains the bad pointer?
> >> What data type is it? What data structure is it part of?
>
> > The gdb pr output near the end of the attachment in my first message
> > shows it is part of a list, which, in turn, is part of a buffer. I assumed
> > someone would recognize WHAT part of a buffer from the contents of the,
> > list, a mixture of conses with marker-in-no-buffer in the car of some and
> > Lisp_Misc_Free in the car of others, the cdr's being negative numbers
> > of pretty small absolute magnitude. If it isn't recognizable from its contents,
> > I'll have to wait till I'm next at work to find out exactly which slot
> > in the buffer
> > this list comes from using gdb.
>
> Sounds like the contents of the buffer-undo-list. Especially since this
> variable is GC'd specially and getting it right is tricky.
>
>
> Stefan
>

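The slot-ordering concern in this message can be made concrete with a toy
model. This is a rough sketch, assuming (as the message describes) a generic
loop that marks every slot from name to the end of the struct plus a separate
pass for the undo list. The struct layouts and toy_* names are hypothetical
and much smaller than the real buffer structure.

```c
#include <assert.h>
#include <stddef.h>

typedef long toy_obj;           /* hypothetical stand-in for Lisp_Object */

/* Two hypothetical miniature buffer layouts.  Only the position of
   undo_list relative to name differs.  */
struct toy_buffer_undo_first
{
  long size;                    /* non-Lisp field, never marked */
  toy_obj undo_list;
  toy_obj name;                 /* generic marking starts here */
  toy_obj filename;
};

struct toy_buffer_undo_last
{
  long size;
  toy_obj name;                 /* generic marking starts here */
  toy_obj filename;
  toy_obj undo_list;
};

/* Count how often the slot at UNDO_OFFSET gets marked: once by the
   dedicated undo-list pass, plus once more if the generic loop, which
   walks every slot from NAME_OFFSET to STRUCT_SIZE, also reaches it.  */
static int
toy_times_undo_marked (size_t name_offset, size_t struct_size,
                       size_t undo_offset)
{
  int marks = 1;                /* the dedicated undo-list pass */
  size_t off;

  for (off = name_offset; off < struct_size; off += sizeof (toy_obj))
    if (off == undo_offset)
      marks++;
  return marks;
}
```

With undo_list placed before name the model counts one marking pass over it;
with undo_list after name it counts two, which is the double marking the
message suspects.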
* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
From: Richard Stallman @ 2007-11-13 20:03 UTC
To: Stefan Monnier; +Cc: kalman.reti, bug-gnu-emacs, emacs-devel

    > I'll have to wait till I'm next at work to find out exactly which slot
    > in the buffer
    > this list comes from using gdb.

    Sounds like the contents of the buffer-undo-list. Especially since this
    variable is GC'd specially and getting it right is tricky.

It should be easy to verify that guess by examining the undo list slot
in the buffer object.

* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
From: Kalman Reti @ 2007-11-14 17:39 UTC
To: rms; +Cc: bug-gnu-emacs, kalman.reti, Stefan Monnier, emacs-devel

On Nov 13, 2007 3:03 PM, Richard Stallman <rms@gnu.org> wrote:
> > I'll have to wait till I'm next at work to find out exactly which slot
> > in the buffer
> > this list comes from using gdb.
>
> Sounds like the contents of the buffer-undo-list. Especially since this
> variable is GC'd specially and getting it right is tricky.
>
> It should be easy to verify that guess by examining the undo list slot
> in the buffer object.
>

By moving up the stack in gdb at the time of the abort, I was able to see
that the top-level mark_object call is from the undo list processing in
Fgarbage_collect. The undo list is for the *Shell Command Output* buffer,
and is very long since that buffer gets used over and over again for the
many shell commands the elisp code issues.

Looking harder at the code, I'm convinced that the undo_list should come
before the name entry in the buffer structure, so I moved it there.
However, I still get the crash.

My first experiment of putting a proceeding breakpoint in the undo_list
processing which printed out the list failed to result in an obvious
correlation between elements of the undo_list the last time it was processed
and the time which resulted in the abort. I suspect that the Lisp_Misc_Free
cells were markers which should have been removed but for some as yet unknown
reason, weren't. I'll have to craft a more thorough experiment next time.

Anyone know what the elements of the undo_list mean? Some are conses
with a marker in their CAR and a number in their CDR, some are just
conses of two numbers and some are conses of a string and a number.
* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
From: Stefan Monnier @ 2007-11-14 18:51 UTC
To: Kalman Reti; +Cc: bug-gnu-emacs, rms, emacs-devel

> Anyone know what the elements of the undo_list mean? Some are conses
> with a marker in their CAR and a number in their CDR, some are just
> conses of two numbers and some are conses of a string and a number.

It's documented in the docstring of buffer-undo-list.


        Stefan

* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
From: Kalman Reti @ 2007-11-15 1:00 UTC
To: Stefan Monnier; +Cc: bug-gnu-emacs, kalman.reti, rms, emacs-devel

On Nov 14, 2007 1:51 PM, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
> > Anyone know what the elements of the undo_list mean? Some are conses
> > with a marker in their CAR and a number in their CDR, some are just
> > conses of two numbers and some are conses of a string and a number.
>
> It's documented in the docstring of buffer-undo-list.

Thanks for the pointer.

I've done some more experiments; it occurred to me that if the marker in the
undo list was gc-marked already when we got to the special processing, then
it would be skipped. I verified this by splitting out the last of the
three-legged-and conditions into its own if.

Presumably this means that the marker is shared in some other structure which
got marked previously. Could the last match data and the undo list perhaps
share a marker? Where is the last match data kept? If it isn't there, any
suggestions on how to go about finding out where another pointer to this
marker is stored?

* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
From: Richard Stallman @ 2007-11-15 17:09 UTC
To: Kalman Reti; +Cc: bug-gnu-emacs, kalman.reti, monnier, emacs-devel

    I've done some more experiments; it occurred to me that if the marker in the
    undo list was gc-marked already when we got to the special processing, then
    it would be skipped.

I looked to see what you mean, and I see that some elements do get
removed from the undo list. I hadn't remembered that -- sorry.

Is this the special processing you mean?

	/* If a buffer's undo list is Qt, that means that undo is
	   turned off in that buffer. Calling truncate_undo_list on
	   Qt tends to return NULL, which effectively turns undo back on.
	   So don't call truncate_undo_list if undo_list is Qt. */
	if (! EQ (nextb->undo_list, Qt))
	  {
	    ...

If so, it is supposed to delete elements for markers
that weren't already marked by GC. And then it marks the undo
list in the normal way.

Does it look like that code failed to remove an element
which was supposed to update a marker?

Was the marker already corrupted (replaced with Lisp_Misc_Free)
before the start of the loop?

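The pruning step described in this message (drop undo entries whose marker
was not already marked by GC, keep everything else) can be sketched on a toy
linked list. This is an illustrative model only; the toy_* types are
hypothetical, and the real code operates on Lisp conses inside
Fgarbage_collect rather than on C structs like these.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_marker
{
  bool gcmarkbit;               /* set if GC already reached this marker */
};

/* One undo entry.  MARKER is non-NULL for (MARKER . ADJUSTMENT) entries
   and NULL for every other kind of entry.  */
struct toy_undo_entry
{
  struct toy_marker *marker;
  int adjustment;
  struct toy_undo_entry *next;
};

/* Unlink entries whose marker has not been marked; return the new head.  */
static struct toy_undo_entry *
toy_prune_undo_list (struct toy_undo_entry *head)
{
  struct toy_undo_entry **link = &head;

  while (*link != NULL)
    {
      struct toy_undo_entry *entry = *link;

      if (entry->marker != NULL && !entry->marker->gcmarkbit)
        *link = entry->next;    /* unlink: marker not marked elsewhere */
      else
        link = &entry->next;    /* keep: non-marker or already marked */
    }
  return head;
}

/* Number of entries remaining on the list.  */
static int
toy_undo_length (const struct toy_undo_entry *head)
{
  int n = 0;

  for (; head != NULL; head = head->next)
    n++;
  return n;
}
```

Note that an entry whose marker is already marked is kept. In this model that
is the case Kalman observed: a marker reachable from somewhere else survives
the pruning pass.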
* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
From: Kalman Reti @ 2007-11-16 12:05 UTC
To: rms; +Cc: bug-gnu-emacs, kalman.reti, monnier, emacs-devel

On Nov 15, 2007 12:09 PM, Richard Stallman <rms@gnu.org> wrote:
> I've done some more experiments; it occurred to me that if the marker in the
> undo list was gc-marked already when we got to the special processing, then
> it would be skipped.
>
> I looked to see what you mean, and I see that some elements do get
> removed from the undo list. I hadn't remembered that -- sorry.
>
> Is this the special processing you mean?
>
> /* If a buffer's undo list is Qt, that means that undo is
>    turned off in that buffer. Calling truncate_undo_list on
>    Qt tends to return NULL, which effectively turns undo back on.
>    So don't call truncate_undo_list if undo_list is Qt. */
> if (! EQ (nextb->undo_list, Qt))
>   {
>     ...
>

Yes.

> If so, it is supposed to delete elements for markers
> that weren't already marked by GC. And then it marks the undo
> list in the normal way.

I believe it works to do this if you move the undo_list before name.
Otherwise, everything on the list is already marked by the normal "start at
the name offset and mark until you've reached the buffer struct size"
mechanism.

>
> Does it look like that code failed to remove an element
> which was supposed to update a marker?

No, it looks like a marker in the list is already marked; this marker
gets turned into the Lisp_Misc_Free cell.

>
> Was the marker already corrupted (replaced with Lisp_Misc_Free)
> before the start of the loop?

I believe so. I think the culprit is the free_marker call in Fset_match_data.
I think this because I added a checking routine which, given a marker, looped
over all the cells in all the undo lists of all the buffers to see if that
marker was in the caar of one of them, calling a dummy routine (krabort, on
which I could set a breakpoint) if so. I added a call to this checking routine
in free_misc, fired up my test case and almost immediately got a hit.

(The backtrace below can't be the whole story, since this happens much earlier
than the crash. A gdb session which is automatically capturing a backtrace at
this point and continuing, so I can show you the latest stack trace before the
crash, has run overnight now without reaching the crash. Presumably there is
some mechanism which removes the Lisp_Misc_Free cell created here before the
GC trips over it and that something else [much] later on is causing that
mechanism to fail to work in the run-up to the crash.)

The early stack trace is at the end of this message.

One thing that isn't clear to me is exactly who is calling set-match-data with
the reseat argument set to evaporate inside of the shell-command function.
This is happening somewhere inside of the shell-command function which my
code calls.

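The checking routine described in this message (walk every undo list of every
buffer and report whether a given marker appears as the car of some entry)
might look roughly like the following. This is a guess at its shape, not the
code that was actually added to alloc.c: the message gives only a description,
and the toy_* types here are hypothetical simplifications of the Lisp data
structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_undo_entry
{
  const void *car;              /* a marker pointer, or anything else */
  struct toy_undo_entry *next;
};

struct toy_buffer
{
  struct toy_undo_entry *undo_list;
  struct toy_buffer *next;      /* chain of all buffers */
};

/* Return true if MARKER is the car of any entry on any buffer's undo
   list.  The routine in the message calls a dummy function at this
   point so that a breakpoint can be set on it.  */
static bool
toy_marker_on_any_undo_list (const struct toy_buffer *all_buffers,
                             const void *marker)
{
  const struct toy_buffer *buf;
  const struct toy_undo_entry *entry;

  for (buf = all_buffers; buf != NULL; buf = buf->next)
    for (entry = buf->undo_list; entry != NULL; entry = entry->next)
      if (entry->car == marker)
        return true;
  return false;
}
```

Calling such a predicate from the routine that frees a misc object, as the
message does, turns "a freed cell is still on an undo list" from a crash much
later in GC into an immediate, debuggable hit.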
(gdb) where
#0 krabort () at alloc.c:3364
#1 0x08129319 in check_for_problem (marker=147919074) at alloc.c:3380
#2 0x0812934c in free_misc (misc=147919074) at alloc.c:3394
#3 0x0811c354 in Fset_match_data (list=146951973, reseat=137508953) at search.c:3057
#4 0x0813e252 in Ffuncall (nargs=3, args=0xbffea3f0) at eval.c:3027
#5 0x08161c84 in Fbyte_code (bytestr=136239067, vector=136239092, maxdepth=24) at bytecode.c:679
#6 0x0813d87e in Feval (form=136239053) at eval.c:2361
#7 0x0813b22f in Fprogn (args=136239045) at eval.c:450
#8 0x0813eb33 in unbind_to (count=25, value=137414769) at eval.c:3378
#9 0x08162361 in Fbyte_code (bytestr=136238739, vector=136238756, maxdepth=64) at bytecode.c:890
#10 0x0813e73a in funcall_lambda (fun=136238676, nargs=1, arg_vector=0xbffea6e4) at eval.c:3211
#11 0x0813e359 in Ffuncall (nargs=2, args=0xbffea6e0) at eval.c:3081
#12 0x08161c84 in Fbyte_code (bytestr=144608715, vector=144609972, maxdepth=64) at bytecode.c:679
#13 0x0813e73a in funcall_lambda (fun=144610268, nargs=5, arg_vector=0xbffea804) at eval.c:3211
#14 0x0813e359 in Ffuncall (nargs=6, args=0xbffea800) at eval.c:3081
#15 0x08161c84 in Fbyte_code (bytestr=144597347, vector=144598732, maxdepth=48) at bytecode.c:679
#16 0x0813e73a in funcall_lambda (fun=144598884, nargs=3, arg_vector=0xbffea924) at eval.c:3211
#17 0x0813e359 in Ffuncall (nargs=4, args=0xbffea920) at eval.c:3081
#18 0x08161c84 in Fbyte_code (bytestr=144645315, vector=144646532, maxdepth=56) at bytecode.c:679
#19 0x0813e73a in funcall_lambda (fun=144646748, nargs=3, arg_vector=0xbffea9e0) at eval.c:3211
#20 0x0813e486 in apply_lambda (fun=144646748, args=146894853, eval_flag=1) at eval.c:3135
#21 0x0813d9d3 in Feval (form=146896869) at eval.c:2415
#22 0x0813b359 in Fsetq (args=146896861) at eval.c:552
#23 0x0813d70a in Feval (form=146896853) at eval.c:2302
#24 0x0813d7dd in Feval (form=146896845) at eval.c:2340
#25 0x0813e23f in Ffuncall (nargs=2, args=0xbffead34) at eval.c:3024
#26 0x08161c84 in Fbyte_code (bytestr=136525275, vector=136525292, maxdepth=24) at bytecode.c:679
#27 0x0813e73a in funcall_lambda (fun=136525236, nargs=1, arg_vector=0xbffeae44) at eval.c:3211
#28 0x0813e359 in Ffuncall (nargs=2, args=0xbffeae40) at eval.c:3081
#29 0x08161c84 in Fbyte_code (bytestr=136525523, vector=136525540, maxdepth=24) at bytecode.c:679
#30 0x0813e73a in funcall_lambda (fun=136525484, nargs=1, arg_vector=0xbffeaf54) at eval.c:3211
#31 0x0813e359 in Ffuncall (nargs=2, args=0xbffeaf50) at eval.c:3081
#32 0x08161c84 in Fbyte_code (bytestr=136523723, vector=136523740, maxdepth=16) at bytecode.c:679
#33 0x0813e73a in funcall_lambda (fun=136523692, nargs=0, arg_vector=0xbffeb084) at eval.c:3211
#34 0x0813e359 in Ffuncall (nargs=1, args=0xbffeb080) at eval.c:3081
#35 0x0813df04 in apply1 (fn=137580545, arg=137414769) at eval.c:2765
#36 0x08139bcc in Fcall_interactively (function=137580545, record_flag=137414769, keys=137463044) at callint.c:385
#37 0x080edd6d in Fcommand_execute (cmd=137580545, record_flag=137414769, keys=137414769, special=137414769) at keyboard.c:10435
#38 0x080e3e99 in command_loop_1 () at keyboard.c:1939
#39 0x0813c6f2 in internal_condition_case (bfun=0x80e31a4 <command_loop_1>, handlers=137472161, hfun=0x80e2c70 <cmd_error>) at eval.c:1493
#40 0x080e2f42 in command_loop_2 () at keyboard.c:1396
#41 0x0813c263 in internal_catch (tag=137463729, func=0x80e2f24 <command_loop_2>, arg=137414769) at eval.c:1229
#42 0x080e2ed0 in command_loop () at keyboard.c:1375
#43 0x080e28f4 in recursive_edit_1 () at keyboard.c:984
#44 0x080e2a34 in Frecursive_edit () at keyboard.c:1046
#45 0x080e18c9 in main (argc=3, argv=0xbffeb834) at emacs.c:1777

Lisp Backtrace:
"set-match-data" (0xbffea3f4)
"byte-code" (0xbffea480)
"shell-command" (0xbffea6e4)
"diffs-between-depot-and-client-different-branches" (0xbffea804)
"diffs-between" (0xbffea924)
"changesets-between" (0xbffea9e0)
"setq" (0xbffeab68)
"length" (0xbffeac28)
"eval" (0xbffead38)
"eval-last-sexp-1" (0xbffeae44)
"eval-last-sexp" (0xbffeaf54)
"eval-print-last-sexp" (0xbffeb084)
"call-interactively" (0xbffeb230)
(gdb)

* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
From: Kalman Reti @ 2007-11-16 14:07 UTC
To: rms; +Cc: bug-gnu-emacs, kalman.reti, monnier, emacs-devel

On Nov 16, 2007 7:05 AM, Kalman Reti <kalman.reti@gmail.com> wrote:
> One thing that isn't
> clear to me is exactly who is calling set-match-data with the reseat
> argument set to evaporate inside of the shell-command function. This is
> happening somewhere inside of the shell-command function which my
> code calls.
>

I just figured this part out. The save-match-data macro generates an
unwind-protect call to set-match-data with 'evaporate as a second
argument.

What I haven't figured out is why these are mostly OK. Perhaps it is
just a garbage collection being kicked off at an inconvenient time?

* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
From: Kalman Reti @ 2007-11-16 17:56 UTC
To: martin rudalics; +Cc: bug-gnu-emacs, kalman.reti, rms, emacs-devel

On Nov 16, 2007 12:28 PM, martin rudalics <rudalics@gmx.at> wrote:
> Do you mean in code wrapped in `save-match-data' you delete some region
> of text containing a marker of the saved match-data.

It isn't in my code, it is in the shell-command function in simple.el, but
essentially this is correct. Most of the guts of calling the subprocess to
generate the output is inside save-match-data; I don't know exactly what path
results in the markers' getting on the undo list, but if I create a new macro
save-match-data-noevaporate that is identical to the original minus the
'evaporate argument to set-match-data and use that inside of shell-command
instead of the original, my crash goes away.

> Thus
> record_marker_adjustment puts an entry on `buffer-undo-list' referencing
> that marker. The unwindforms of `save-match-data' call `set-match-data'
> with evaporate/reseat non-nil, which calls free_marker and subsequently
> free_misc. mark_object - operating from `buffer-undo-list' - detects
> that the object is already free and aborts.

There is something which causes this not to happen all the time which I have
not yet found. If you are lucky and this "something" happens before the next
GC, all is well. I'd been doing exactly the same sorts of shell operations in
elisp functions for years before encountering one big enough to have a 100%
chance of being unlucky. It does many hundreds of shell operations (perhaps
even thousands, I haven't counted them) taking many minutes.

>
> If I understand correctly, this means that either markers used for
> saving match-data should not go to `buffer-undo-list' or the "evaporate"
> option set by `save-match-data' is inherently broken.
>

My suspicion is that the save-match-data was intended to be wrapped around
very short local uses of markers, not the collection of arbitrary amounts of
shell stdout output...

* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
From: Richard Stallman @ 2007-11-17 4:54 UTC
To: Kalman Reti
Cc: bug-gnu-emacs, kalman.reti, emacs-devel

    My suspicion is that the save-match-data was intended to be wrapped around
    very short local uses of markers, not the collection of arbitrary amounts of
    shell stdout output...

That's true, but `save-match-data' should work correctly regardless
of what goes on in its body. This is a real bug.

Thanks for tracking it down.
* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
From: Kalman Reti @ 2007-11-17 5:43 UTC
To: rms
Cc: bug-gnu-emacs, kalman.reti, emacs-devel

On Nov 16, 2007 11:54 PM, Richard Stallman <rms@gnu.org> wrote:
> My suspicion is that the save-match-data was intended to be wrapped around
> very short local uses of markers, not the collection of arbitrary amounts of
> shell stdout output...
>
> That's true, but `save-match-data' should work correctly regardless
> of what goes on in its body. This is a real bug.
>
> Thanks for tracking it down.
>

You're quite welcome. BTW, I applied Stefan's search.c diff to a fresh copy
of the CVS sources and successfully ran my test case.
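The failure sequence diagnosed in this sub-thread can be modelled in a few lines of C. This is a toy sketch, not the Emacs source: every `toy_` name, the struct layouts, and the boolean return value (standing in for the abort() in mark_object) are assumptions made for illustration. It shows only the ordering problem: a marker cell is freed while an undo-list entry still points at it, and the next marking pass trips over the free cell.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the misc object types. */
enum toy_misc_type { TOY_MISC_FREE, TOY_MISC_MARKER };

struct toy_misc {
  enum toy_misc_type type;
  bool gcmarkbit;
};

/* One undo-list entry of the (MARKER . ADJUSTMENT) form. */
struct toy_undo_entry {
  struct toy_misc *marker;        /* NULL for non-marker entries */
  int adjustment;
  struct toy_undo_entry *next;
};

/* Models free_marker/free_misc as described in the thread: the cell is
   returned to the free list at once, with no check for other references. */
void toy_free_marker(struct toy_misc *m)
{
  assert(m->type == TOY_MISC_MARKER);
  m->type = TOY_MISC_FREE;
}

/* Models the marking pass over one undo list.  Returns false at the
   point where the real collector would abort on a free cell. */
bool toy_mark_undo_list(struct toy_undo_entry *list)
{
  for (struct toy_undo_entry *e = list; e != NULL; e = e->next) {
    if (e->marker == NULL)
      continue;
    if (e->marker->type == TOY_MISC_FREE)
      return false;               /* dangling reference to a freed cell */
    e->marker->gcmarkbit = true;
  }
  return true;
}
```

Marking a list whose marker is still live succeeds. Once toy_free_marker() has been called on that same cell, the marking pass reports the free cell, which corresponds to the case the original report hit in mark_object.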
* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
From: Richard Stallman @ 2007-11-15 3:08 UTC
To: Kalman Reti
Cc: bug-gnu-emacs, kalman.reti, monnier, emacs-devel

Nothing gets "removed" from the undo list in normal use. It gets
truncated, which drops off elements at the end, but other than that
all that normally happens is that editing operations add elements.

Markers in the list should not become free, because the undo list
itself should preserve them from GC.

If this is reproducible, can you put a breakpoint at Fgarbage_collect
and examine the data just before the GC which gets this crash?
Examine that list using the x... commands, and see if that marker
is already free.

    Looking harder at the code, I'm convinced that the undo_list should come before
    the name entry in the buffer structure,

Definitely not. It needs to be AFTER `name' so that it will be marked
by GC.

    Anyone know what the elements of the undo_list mean? Some are conses
    with a marker in their CAR and a number in their CDR, some are just
    conses of two numbers and some are conses of a string and a number.

The Lisp Manual documents these. Node `Undo'.
* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
From: Kalman Reti @ 2007-11-15 8:38 UTC
To: rms
Cc: bug-gnu-emacs, kalman.reti, monnier, emacs-devel

On Nov 14, 2007 10:08 PM, Richard Stallman <rms@gnu.org> wrote:
> Nothing gets "removed" from the undo list in normal use. It gets
> truncated, which drops off elements at the end, but other than that
> all that normally happens is that editing operations add elements.
>
> Markers in the list should not become free, because the undo list
> itself should preserve them from GC.
>
> If this is reproducible, can you put a breakpoint at Fgarbage_collect
> and examine the data just before the GC which gets this crash?
> Examine that list using the x... commands, and see if that marker
> is already free.
>
> Looking harder at the code, I'm convinced that the undo_list should come before
> the name entry in the buffer structure,
>
> Definitely not. It needs to be AFTER `name' so that it will be marked
> by GC.

There is special code at the end of Fgarbage_collect (just before the call
to gc_sweep) which seems like it would have no point if this were true. It
removes elements referring to unmarked markers and then explicitly marks the
undo_list slot afterwards. The comment there reads:

    /* Now that we have stripped the elements that need not be in the
       undo_list any more, we can finally mark the list. */
    mark_object (nextb->undo_list);

It seems to me that if the undo_list were after name, then all the markers
in the list would have already been marked and this code would be an
elaborate no-op, no?

>
> Anyone know what the elements of the undo_list mean? Some are conses
> with a marker
> in their CAR and a number in their CDR, some are just conses of two
> numbers and some
> are conses of a string and a number.
>
> The Lisp Manual documents these. Node `Undo'.

Thanks. Someone already pointed me at the documentation string for
buffer-undo-list.
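The ordering argument in this message can be checked with a small model. The sketch below is an assumption-laden toy, not code from alloc.c: the `toy_` names are invented, and the flag `undo_in_generic_range` merely stands in for "undo_list sits among the buffer slots that the ordinary marking loop walks" (the thread describes this as being placed after `name`). It illustrates why a strip-then-mark step can only remove anything if the list was not already marked earlier in the same collection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_marker { bool gcmarkbit; };

struct toy_undo {
  struct toy_marker *marker;      /* NULL for non-marker entries */
  struct toy_undo *next;
};

/* Mark every marker reachable from LIST. */
void toy_mark_list(struct toy_undo *list)
{
  for (; list != NULL; list = list->next)
    if (list->marker != NULL)
      list->marker->gcmarkbit = true;
}

/* Unlink entries whose marker is still unmarked; return how many went. */
size_t toy_strip_unmarked(struct toy_undo **list)
{
  size_t removed = 0;
  struct toy_undo **link = list;
  while (*link != NULL) {
    struct toy_undo *e = *link;
    if (e->marker != NULL && !e->marker->gcmarkbit) {
      *link = e->next;            /* drop the entry from the list */
      removed++;
    } else {
      link = &e->next;
    }
  }
  return removed;
}

/* One toy collection cycle.  Returns the number of entries stripped. */
size_t toy_collect(struct toy_undo **undo_list, bool undo_in_generic_range)
{
  assert(undo_list != NULL);
  if (undo_in_generic_range)
    toy_mark_list(*undo_list);    /* marked early, with the other slots */
  size_t removed = toy_strip_unmarked(undo_list);
  toy_mark_list(*undo_list);      /* "we can finally mark the list" */
  return removed;
}
```

With the flag false, an entry whose marker is referenced from nowhere else is stripped. With the flag true, the same entry has already been marked by the time the strip step runs, so the strip step removes nothing, which is the "elaborate no-op" the message describes.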
* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
From: Kalman Reti @ 2007-11-16 20:48 UTC
To: rms
Cc: bug-gnu-emacs, kalman.reti, monnier, emacs-devel

On Nov 15, 2007 3:38 AM, Kalman Reti <kalman.reti@gmail.com> wrote:
> On Nov 14, 2007 10:08 PM, Richard Stallman <rms@gnu.org> wrote:
> >
> > Definitely not. It needs to be AFTER `name' so that it will be marked
> > by GC.

I've performed the experiment of building code straight from CVS and
putting a breakpoint in the special code for handling un-gc-marked markers.
In my test case (long-running, and previously ending in a crash), this
breakpoint is NEVER reached.

When I move the undo_list to before name and redo the experiment, I hit
the breakpoint many many times.

So either the special undo_list handling code should be removed or the
undo_list moved before name in buffer.h.
* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
From: Stefan Monnier @ 2007-11-16 21:59 UTC
To: Kalman Reti
Cc: bug-gnu-emacs, rms, emacs-devel

> When I move the undo_list to before name and redo the experiment, I hit
> the breakpoint many many times.

> So either the special undo_list handling code should be removed or the
> undo_list moved before name in buffer.h.

Agreed. The field was moved by Richard on 14-Oct-2002 but the change
log doesn't say why this was done, so I just undid it.

        Stefan
* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
From: martin rudalics @ 2007-11-16 23:09 UTC
To: Stefan Monnier
Cc: Kalman Reti, emacs-devel, bug-gnu-emacs, rms

> Agreed. The field was moved by Richard on 14-Oct-2002 but the change
> log doesn't say why this was done, so I just undid it.

Does this mean those cells always survived the current cycle? Then we
now have a chance to test whether the "remove unmarked markers from the
undo list" stuff really works in one and the same collection cycle.
Interesting.

Stefan, unless you have already done so, could you please fix those
identical "If a buffer's undo list is Qt, ..." comments in alloc.c too?
The second mentions truncate_undo_list which hardly makes sense.
* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
From: Richard Stallman @ 2007-11-13 5:10 UTC
To: Kalman Reti
Cc: bug-gnu-emacs, kalman.reti, emacs-devel

    I assumed someone would recognize WHAT part of a buffer from the
    contents of the list, a mixture of conses with marker-in-no-buffer in
    the car of some and Lisp_Misc_Free in the car of others, the cdr's
    being negative numbers of pretty small absolute magnitude.

I didn't see that when I looked at the other message. Can anyone guess
what data this is?

    > Once you answer those, you can try to figure out how it happened
    > that the data structure ended up with a bad pointer.
    > Maybe GC failed to mark that pointer, so the misc object got freed
    > even though it was still in use.

    Are there any tools to help with this, e.g. an allocation trace or GC
    trace? I'm afraid this is the first time I've looked at the Emacs src
    code.

The x... GDB commands in .gdbinit are useful for examining data structures
during GC.

`last_marked' and `last_marked_index' keep track of the sequence of
data objects that were marked. You can use that to determine
precisely how the bad data was reached.
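A record of recently marked objects, as mentioned above, is commonly kept as a small ring buffer. The following is a minimal sketch of that idea under stated assumptions: the `toy_` names, the buffer size, and the lookup helper are all hypothetical and are not taken from alloc.c. It only illustrates how an array plus a wrapping index lets a debugger walk backwards from the object that triggered the crash.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical size, kept tiny so the wrap-around is easy to follow. */
#define TOY_LAST_MARKED_SIZE 4

static const void *toy_last_marked[TOY_LAST_MARKED_SIZE];
static size_t toy_last_marked_index;    /* next slot to overwrite */

/* Call once per object as the marking pass visits it. */
void toy_record_marked(const void *obj)
{
  toy_last_marked[toy_last_marked_index++] = obj;
  if (toy_last_marked_index == TOY_LAST_MARKED_SIZE)
    toy_last_marked_index = 0;          /* wrap, overwriting the oldest */
}

/* Return the object marked AGE steps ago (0 = most recent). */
const void *toy_marked_ago(size_t age)
{
  assert(age < TOY_LAST_MARKED_SIZE);
  size_t i = (toy_last_marked_index + TOY_LAST_MARKED_SIZE - 1 - age)
             % TOY_LAST_MARKED_SIZE;
  return toy_last_marked[i];
}
```

In a debugger session the same walk would be done by hand: read the index, then step backwards through the array (wrapping at zero) to reconstruct the chain of objects that led to the bad one.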