unofficial mirror of bug-gnu-emacs@gnu.org 
* mark_object crash in 22.1 and latest CVS (as of tonight)
@ 2007-11-09  3:55 Kalman Reti
  2007-11-09 11:32 ` Kalman Reti
  0 siblings, 1 reply; 22+ messages in thread
From: Kalman Reti @ 2007-11-09  3:55 UTC
  To: bug-gnu-emacs

[-- Attachment #1: Type: text/plain, Size: 128 bytes --]

See the attached file for a gdb session showing a garbage collector crash in
a Linux Emacs built from sources checked out tonight.

  Kalman

[-- Attachment #2: emacscrash.text --]
[-- Type: text/plain, Size: 7611 bytes --]

This is a garbage-collect crash in a built-from-CVS emacs tree
checked out tonight (Nov 8, 2007).

I had originally experienced this crash in 22.1, both on Windows
and Linux, but wanted to make sure the bug existed in the latest
version before reporting it.

I've written some functions which issue shell commands to interact
with our Perforce server at work; these commands parse the *Shell Command
Output* buffer to pick up bits of information.  These have been working very
well for me, but today I got a reproducible case that crashes Emacs.
Unfortunately, it is only reproducible after issuing many commands
against our Perforce server.

So I built from sources, ran with gdb, and captured the following information.
The object it trips over is always a misc free cell and it always
hits the default leg of the case statement in mark_object.

Let me know if you need me to collect more information.

$ gdb ./emacs
gdb ./emacs
GNU gdb Red Hat Linux (5.3.90-0.20030710.41.2.1rh)
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...Using host libthread_db library "/lib/libthread_db.so.1".

DISPLAY = :1.0
TERM = dumb
Breakpoint 1 at 0x80e039a: file emacs.c, line 431.
Breakpoint 2 at 0x80f7145: file sysdep.c, line 1435.
(gdb) run
Starting program: /u/kreti/gnuemacs-linux11/emacs/src/emacs -geometry 80x40+0+0

Breakpoint 1, abort () at emacs.c:431
431	  kill (getpid (), SIGABRT);
(gdb) where
#0  abort () at emacs.c:431
#1  0x0812b179 in mark_object (arg=147211050) at alloc.c:5734
#2  0x0812b1da in mark_object (arg=141537485) at alloc.c:5751
#3  0x0812b1da in mark_object (arg=141537437) at alloc.c:5751
#4  0x0812b2b2 in mark_buffer (buf=146936428) at alloc.c:5808
#5  0x0812ae48 in mark_object (arg=146936428) at alloc.c:5558
#6  0x0812b0ec in mark_object (arg=138283458) at alloc.c:5679
#7  0x0812b026 in mark_object (arg=137558905) at alloc.c:5639
#8  0x0812b1da in mark_object (arg=137859549) at alloc.c:5751
#9  0x0812b1da in mark_object (arg=137859861) at alloc.c:5751
#10 0x0812b0e3 in mark_object (arg=137640922) at alloc.c:5678
#11 0x0812b026 in mark_object (arg=137728969) at alloc.c:5639
#12 0x0812b1da in mark_object (arg=137854981) at alloc.c:5751
#13 0x0812b1da in mark_object (arg=137400253) at alloc.c:5751
#14 0x0812b038 in mark_object (arg=141380333) at alloc.c:5641
#15 0x0812aec3 in mark_object (arg=141391156) at alloc.c:5581
#16 0x0812b02f in mark_object (arg=137826961) at alloc.c:5640
#17 0x0812b1da in mark_object (arg=141380237) at alloc.c:5751
#18 0x0812b038 in mark_object (arg=137459345) at alloc.c:5641
#19 0x0812b1da in mark_object (arg=139297181) at alloc.c:5751
#20 0x0812b038 in mark_object (arg=139297133) at alloc.c:5641
#21 0x0812b1da in mark_object (arg=137860261) at alloc.c:5751
#22 0x0812b038 in mark_object (arg=137678425) at alloc.c:5641
#23 0x0812b1da in mark_object (arg=141473229) at alloc.c:5751
#24 0x0812aec3 in mark_object (arg=141688156) at alloc.c:5581
#25 0x0812b02f in mark_object (arg=144524753) at alloc.c:5640
#26 0x0812b1da in mark_object (arg=144499205) at alloc.c:5751
#27 0x0812b1da in mark_object (arg=144499437) at alloc.c:5751
#28 0x0812b02f in mark_object (arg=144524729) at alloc.c:5640
#29 0x0812ad6f in mark_vectorlike (ptr=0x830c968) at alloc.c:5456
#30 0x0812b004 in mark_object (arg=137415020) at alloc.c:5628
#31 0x0812a786 in Fgarbage_collect () at alloc.c:5141
#32 0x0813df5a in Ffuncall (nargs=1, args=0xbffec420) at eval.c:3021
#33 0x081619b4 in Fbyte_code (bytestr=144658787, vector=144663148, maxdepth=56)
    at bytecode.c:679
#34 0x0813e46a in funcall_lambda (fun=144663356, nargs=3, 
    arg_vector=0xbffec4e0) at eval.c:3211
#35 0x0813e1b6 in apply_lambda (fun=144663356, args=146885917, eval_flag=1)
    at eval.c:3135
#36 0x0813d703 in Feval (form=146885909) at eval.c:2415
#37 0x0813b089 in Fsetq (args=146885901) at eval.c:552
#38 0x0813d43a in Feval (form=146885893) at eval.c:2302
#39 0x0813d50d in Feval (form=146885885) at eval.c:2340
#40 0x0813df6f in Ffuncall (nargs=2, args=0xbffec834) at eval.c:3024
#41 0x081619b4 in Fbyte_code (bytestr=136524459, vector=136524476, maxdepth=24)
    at bytecode.c:679
#42 0x0813e46a in funcall_lambda (fun=136524420, nargs=1, 
    arg_vector=0xbffec944) at eval.c:3211
#43 0x0813e089 in Ffuncall (nargs=2, args=0xbffec940) at eval.c:3081
#44 0x081619b4 in Fbyte_code (bytestr=136524707, vector=136524724, maxdepth=24)
    at bytecode.c:679
#45 0x0813e46a in funcall_lambda (fun=136524668, nargs=1, 
    arg_vector=0xbffeca54) at eval.c:3211
#46 0x0813e089 in Ffuncall (nargs=2, args=0xbffeca50) at eval.c:3081
#47 0x081619b4 in Fbyte_code (bytestr=136522907, vector=136522924, maxdepth=16)
    at bytecode.c:679
#48 0x0813e46a in funcall_lambda (fun=136522876, nargs=0, 
    arg_vector=0xbffecb84) at eval.c:3211
#49 0x0813e089 in Ffuncall (nargs=1, args=0xbffecb80) at eval.c:3081
#50 0x0813dc34 in apply1 (fn=138307105, arg=137413969) at eval.c:2765
#51 0x081398fc in Fcall_interactively (function=138307105, 
    record_flag=137413969, keys=137462244) at callint.c:385
#52 0x080edb15 in Fcommand_execute (cmd=138307105, record_flag=137413969, 
    keys=137413969, special=137413969) at keyboard.c:10363
#53 0x080e3c65 in command_loop_1 () at keyboard.c:1939
#54 0x0813c422 in internal_condition_case (bfun=0x80e2f70 <command_loop_1>, 
    handlers=137480609, hfun=0x80e2a3c <cmd_error>) at eval.c:1493
#55 0x080e2d0e in command_loop_2 () at keyboard.c:1396
#56 0x0813bf93 in internal_catch (tag=137462905, 
    func=0x80e2cf0 <command_loop_2>, arg=137413969) at eval.c:1229
#57 0x080e2c9c in command_loop () at keyboard.c:1375
#58 0x080e26c0 in recursive_edit_1 () at keyboard.c:984
#59 0x080e2800 in Frecursive_edit () at keyboard.c:1046
#60 0x080e1695 in main (argc=3, argv=0xbffed334) at emacs.c:1777

Lisp Backtrace:
"garbage-collect" (0xbffec424)
"changesets-between" (0xbffec4e0)
"setq" (0xbffec668)
"length" (0xbffec728)
"eval" (0xbffec838)
"eval-last-sexp-1" (0xbffec944)
"eval-last-sexp" (0xbffeca54)
"eval-print-last-sexp" (0xbffecb84)
"call-interactively" (0xbffecd30)
(gdb) print 146936428
$1 = 146936428
(gdb) pr
#<buffer >
(gdb) print 141537437
$2 = 141537437
(gdb) pr
((1 . 73) ("//depot/release-13-30/src/Makefile#42 - edit change 227204 (text)
" . 1) (#<misc free cell> . -58) (#<misc free cell> . -65) (#<marker in no buffer> . -58) (#<marker in no buffer> . -64) (1 . 73) ("//depot/V13-30-patch/src/Makefile
... #1 change 227756 branch on 2007/11/02 by majormajor@majormajor-p4branch-auto521 (text) 'Create'
... ... branch from //depot/release-13-30/src/Makefile#1,#42
" . 1) (#<marker in no buffer> . -143) (#<marker in no buffer> . -202) (#<misc free cell> . -164) (#<misc free cell> . -176) (#<marker in no buffer> . -200) (#<marker in no buffer> . -202) (1 . 204) ("//depot/V13-30-patch/src/Makefile#1 - branch change 227756 (text)
" . 1) (#<marker in no buffer> . -58) (#<marker in no buffer> . -65) (#<marker in no buffer> . -58) (#<marker in no buffer> . -64) (1 . 73))
(gdb) print 141537485
$3 = 141537485
(gdb) pr
(#<misc free cell> . -58)
(gdb) print 147211050
$4 = 147211050
(gdb) pr
#<misc free cell>
(gdb) xmiscfree 147211050
$5 = (struct Lisp_Free *) 0x8c64328
(gdb) pr
18401381


* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
  2007-11-09  3:55 mark_object crash in 22.1 and latest CVS (as of tonight) Kalman Reti
@ 2007-11-09 11:32 ` Kalman Reti
  2007-11-10 10:19   ` Kalman Reti
       [not found]   ` <E1Ir5Gz-0002TS-8T@fencepost.gnu.org>
  0 siblings, 2 replies; 22+ messages in thread
From: Kalman Reti @ 2007-11-09 11:32 UTC
  To: bug-gnu-emacs; +Cc: kalman.reti

Adding a subcase of Lisp_Misc_Free inside the

     switch (XMISCTYPE (obj))

inside the

    case Lisp_Misc:

(in mark_object) that just does a break (i.e. ignores the object) makes my
crash go away.

I don't understand how Lisp_Misc_Free objects are supposed to
be handled, so I'm not terribly confident of this fix.
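
To make the workaround concrete, here is a minimal sketch of a switch on the
misc subtype.  It is a toy model, not the code in alloc.c: the enum, the
struct and the function mark_misc are hypothetical names, and the abort is
replaced by a return value so the behaviour can be checked.

```c
#include <stdbool.h>

/* Simplified stand-ins for the Lisp_Misc subtypes (hypothetical names;
   the real enum has more members).  */
enum misc_type { MISC_FREE, MISC_MARKER, MISC_OVERLAY };

struct misc { enum misc_type type; bool gcmarkbit; };

/* What happened when we tried to mark one misc object.  */
enum mark_result { MARKED, IGNORED, WOULD_ABORT };

/* Toy model of the Lisp_Misc branch of a marking routine.  With
   ignore_free false, a free cell reaches the default branch, which is
   where the crash in this report occurs.  With ignore_free true, the
   free cell is skipped, which is the workaround described above.  */
enum mark_result
mark_misc (struct misc *m, bool ignore_free)
{
  switch (m->type)
    {
    case MISC_MARKER:
    case MISC_OVERLAY:
      m->gcmarkbit = true;
      return MARKED;

    case MISC_FREE:
      if (ignore_free)
        return IGNORED;      /* hides the dangling reference */
      /* fall through */

    default:
      return WOULD_ABORT;    /* stands in for the abort () call */
    }
}
```

Note that skipping the free cell only silences the symptom: something still
holds a pointer to a cell that has already been freed.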



* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
  2007-11-09 11:32 ` Kalman Reti
@ 2007-11-10 10:19   ` Kalman Reti
       [not found]   ` <E1Ir5Gz-0002TS-8T@fencepost.gnu.org>
  1 sibling, 0 replies; 22+ messages in thread
From: Kalman Reti @ 2007-11-10 10:19 UTC
  To: bug-gnu-emacs; +Cc: kalman.reti

So, a little more research indicates that my fix is likely wrong: in
2004 an equivalent fix was made and then rescinded once code was added
that, at the end of the GC, removes markers from buffer undo lists.
Perhaps there are other places where such markers could exist, e.g. the
place(s) storing what (match-data) returns.

Can anyone elucidate the theory of Lisp_Misc_Free objects?  Is the
fact that any pointers to such objects exist after the GC the real
bug or are they allowed to survive a GC and are somehow supposed
to be handled in some other way elsewhere?

Since I have a reproducible test case I'd be happy to track down
where these are coming from, but I need some help (in the form
of information) to know what I'm chasing.


* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
       [not found]   ` <E1Ir5Gz-0002TS-8T@fencepost.gnu.org>
@ 2007-11-12 11:40     ` Kalman Reti
  2007-11-12 22:03       ` Stefan Monnier
  2007-11-13  5:10       ` Richard Stallman
  0 siblings, 2 replies; 22+ messages in thread
From: Kalman Reti @ 2007-11-12 11:40 UTC
  To: rms, emacs-devel, bug-gnu-emacs; +Cc: kalman.reti

On Nov 11, 2007 12:22 AM, Richard Stallman <rms@gnu.org> wrote:

> The first questions are, what object contains the bad pointer?
> What data type is it?  What data structure is it part of?

The gdb pr output near the end of the attachment in my first message
shows it is part of a list, which, in turn, is part of a buffer.  I assumed
someone would recognize WHAT part of a buffer from the contents of the
list: a mixture of conses with marker-in-no-buffer in the car of some and
Lisp_Misc_Free in the car of others, the cdrs being negative numbers
of pretty small absolute magnitude.  If it isn't recognizable from its
contents, I'll have to wait till I'm next at work to find out, using gdb,
exactly which slot in the buffer this list comes from.

The code I'm running is pretty simple: it executes a shell command (i.e.
a Perforce command) and then uses search-forward-regexp to find
relevant lines in the output, capturing things like revision number or
branch using match-string after the regexp matches.  The searching
is done within a save-excursion which switches to the *Shell Command
Output* buffer.  I suspect one could reproduce the bug without issuing
Perforce commands; I'll give that a stab tonight.

>
> Once you answer those, you can try to figure out how it happened
> that the data structure ended up with a bad pointer.
> Maybe GC failed to mark that pointer, so the misc object got freed
> even though it was still in use.

Are there any tools to help with this, e.g. an allocation trace or GC trace?
I'm afraid this is the first time I've looked at the Emacs source code.

[rest of message elided]





* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
  2007-11-12 11:40     ` Kalman Reti
@ 2007-11-12 22:03       ` Stefan Monnier
  2007-11-13  0:30         ` Kalman Reti
  2007-11-13 20:03         ` Richard Stallman
  2007-11-13  5:10       ` Richard Stallman
  1 sibling, 2 replies; 22+ messages in thread
From: Stefan Monnier @ 2007-11-12 22:03 UTC
  To: Kalman Reti; +Cc: bug-gnu-emacs, rms, emacs-devel

>> The first questions are, what object contains the bad pointer?
>> What data type is it?  What data structure is it part of?

> The gdb pr output near the end of the attachment in my first message
> shows it is part of a list, which, in turn, is part of a buffer.  I assumed
> someone would recognize WHAT part of a buffer from the contents of the
> list: a mixture of conses with marker-in-no-buffer in the car of some and
> Lisp_Misc_Free in the car of others, the cdrs being negative numbers
> of pretty small absolute magnitude.  If it isn't recognizable from its
> contents, I'll have to wait till I'm next at work to find out, using gdb,
> exactly which slot in the buffer this list comes from.

Sounds like the contents of the buffer-undo-list.  Especially since this
variable is GC'd specially and getting it right is tricky.


        Stefan


* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
  2007-11-12 22:03       ` Stefan Monnier
@ 2007-11-13  0:30         ` Kalman Reti
  2007-11-13 20:03         ` Richard Stallman
  1 sibling, 0 replies; 22+ messages in thread
From: Kalman Reti @ 2007-11-13  0:30 UTC
  To: Stefan Monnier; +Cc: bug-gnu-emacs, kalman.reti, rms, emacs-devel

I looked at the code, and there are comments saying both that
the undo_list should be before the name slot and that it should
come after.  In the CVS code, it definitely comes after, which looks
to me like it will get marked twice: once in the normal loop which
starts at name and marks all the following objects, and then again
in the special code for marking the undo list.  This is contrary to
what the comments say should be happening, but I don't know which
of the comments or the code is right.
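
The ordering question can be illustrated with a minimal sketch of a marking
loop that starts at the name slot.  This is a toy layout, not the real
struct buffer: the slot names, struct toy_buffer and mark_slots_from_name
are hypothetical, and the slots are kept in an array so the loop bounds are
explicit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy object carrying only a GC mark bit.  */
struct obj { bool marked; };

/* Hypothetical slot order.  The generic loop begins at SLOT_NAME, so a
   slot placed before it is never visited by that loop.  */
enum slot
{
  SLOT_BEFORE_NAME,   /* where undo_list would sit if placed before name */
  SLOT_NAME,          /* the generic loop starts here */
  SLOT_FILENAME,
  SLOT_AFTER_NAME,    /* where undo_list would sit if placed after name */
  NUM_SLOTS
};

struct toy_buffer { struct obj *slots[NUM_SLOTS]; };

/* Mark every object slot from the name slot to the end of the buffer.
   Returns the number of slots visited.  */
int
mark_slots_from_name (struct toy_buffer *b)
{
  int visited = 0;
  for (int i = SLOT_NAME; i < NUM_SLOTS; i++, visited++)
    if (b->slots[i] != NULL)
      b->slots[i]->marked = true;
  return visited;
}
```

In this model an object referenced only from SLOT_BEFORE_NAME stays unmarked
after the loop, while one in SLOT_AFTER_NAME is marked before any special
undo-list processing would get to look at it.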


* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
  2007-11-12 11:40     ` Kalman Reti
  2007-11-12 22:03       ` Stefan Monnier
@ 2007-11-13  5:10       ` Richard Stallman
  1 sibling, 0 replies; 22+ messages in thread
From: Richard Stallman @ 2007-11-13  5:10 UTC
  To: Kalman Reti; +Cc: bug-gnu-emacs, kalman.reti, emacs-devel

      I assumed
    someone would recognize WHAT part of a buffer from the contents of the
    list: a mixture of conses with marker-in-no-buffer in the car of some and
    Lisp_Misc_Free in the car of others, the cdrs being negative numbers
    of pretty small absolute magnitude.

I didn't see that when I looked at the other message.  Can anyone
guess what data this is?

    > Once you answer those, you can try to figure out how it happened
    > that the data structure ended up with a bad pointer.
    > Maybe GC failed to mark that pointer, so the misc object got freed
    > even though it was still in use.

    Are there any tools to help with this, e.g. an allocation trace or GC trace?
    I'm afraid this is the first time I've looked at the Emacs source code.

The x... GDB commands in .gdbinit are useful for examining data
structures during GC.

`last_marked' and `last_marked_index' keep track of the sequence of
data objects that were marked.  You can use that to determine precisely
how the bad data was reached.
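
As a rough illustration of how such a record of recently marked objects can
work, here is a minimal ring-buffer sketch.  It is not the Emacs code: the
names history, history_index, record_marked and marked_ago, and the tiny
buffer size, are assumptions chosen for the example.

```c
/* Deliberately small so wrap-around is easy to see.  */
#define HISTORY_SIZE 4

typedef long lisp_word;          /* stand-in for a Lisp object word */

static lisp_word history[HISTORY_SIZE];
static int history_index;        /* next slot to be written */

/* Record OBJ as the most recently marked object, overwriting the
   oldest entry once the buffer is full.  */
void
record_marked (lisp_word obj)
{
  history[history_index++] = obj;
  if (history_index == HISTORY_SIZE)
    history_index = 0;
}

/* Return the object recorded AGE steps ago; 0 is the most recent.
   The caller must keep AGE below HISTORY_SIZE and below the number of
   objects recorded so far.  */
lisp_word
marked_ago (int age)
{
  int i = (history_index - 1 - age) % HISTORY_SIZE;
  if (i < 0)
    i += HISTORY_SIZE;
  return history[i];
}
```

Walking backwards from the index at the time of the abort gives the chain of
objects that led to the bad one, newest first.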


* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
  2007-11-12 22:03       ` Stefan Monnier
  2007-11-13  0:30         ` Kalman Reti
@ 2007-11-13 20:03         ` Richard Stallman
  2007-11-14 17:39           ` Kalman Reti
  1 sibling, 1 reply; 22+ messages in thread
From: Richard Stallman @ 2007-11-13 20:03 UTC
  To: Stefan Monnier; +Cc: kalman.reti, bug-gnu-emacs, emacs-devel

    > I'll have to wait till I'm next at work to find out exactly which slot
    > in the buffer this list comes from using gdb.

    Sounds like the contents of the buffer-undo-list.  Especially since this
    variable is GC'd specially and getting it right is tricky.

It should be easy to verify that guess by examining the undo list slot
in the buffer object.


* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
  2007-11-13 20:03         ` Richard Stallman
@ 2007-11-14 17:39           ` Kalman Reti
  2007-11-14 18:51             ` Stefan Monnier
  2007-11-15  3:08             ` Richard Stallman
  0 siblings, 2 replies; 22+ messages in thread
From: Kalman Reti @ 2007-11-14 17:39 UTC
  To: rms; +Cc: bug-gnu-emacs, kalman.reti, Stefan Monnier, emacs-devel

On Nov 13, 2007 3:03 PM, Richard Stallman <rms@gnu.org> wrote:
>     > I'll have to wait till I'm next at work to find out exactly which slot
>     > in the buffer this list comes from using gdb.
>
>     Sounds like the contents of the buffer-undo-list.  Especially since this
>     variable is GC'd specially and getting it right is tricky.
>
> It should be easy to verify that guess by examining the undo list slot
> in the buffer object.
>

By moving up the stack in gdb at the time of the abort, I was able to see
that the top-level mark_object call is from the undo list processing
in Fgarbage_collect.

The undo list is for the *Shell Command Output* buffer, and is very long
since that buffer gets used over and over again for the many shell commands
the elisp code issues.

Looking harder at the code, I'm convinced that the undo_list should come
before the name entry in the buffer structure, so I moved it there.
However, I still get the crash.

My first experiment, a breakpoint in the undo_list processing that printed
the list and then continued, failed to show an obvious correlation between
the elements of the undo_list the last time it was processed and the time
which resulted in the abort.  I suspect that the Lisp_Misc_Free cells were
markers which should have been removed but, for some as yet unknown reason,
weren't.  I'll have to craft a more thorough experiment next time.

Anyone know what the elements of the undo_list mean?  Some are conses
with a marker in their CAR and a number in their CDR, some are just
conses of two numbers and some are conses of a string and a number.


* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
  2007-11-14 17:39           ` Kalman Reti
@ 2007-11-14 18:51             ` Stefan Monnier
  2007-11-15  1:00               ` Kalman Reti
  2007-11-15  3:08             ` Richard Stallman
  1 sibling, 1 reply; 22+ messages in thread
From: Stefan Monnier @ 2007-11-14 18:51 UTC
  To: Kalman Reti; +Cc: bug-gnu-emacs, rms, emacs-devel

> Anyone know what the elements of the undo_list mean?  Some are conses
> with a marker in their CAR and a number in their CDR, some are just
> conses of two numbers and some are conses of a string and a number.

It's documented in the docstring of buffer-undo-list.


        Stefan


* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
  2007-11-14 18:51             ` Stefan Monnier
@ 2007-11-15  1:00               ` Kalman Reti
  2007-11-15 17:09                 ` Richard Stallman
  0 siblings, 1 reply; 22+ messages in thread
From: Kalman Reti @ 2007-11-15  1:00 UTC
  To: Stefan Monnier; +Cc: bug-gnu-emacs, kalman.reti, rms, emacs-devel

On Nov 14, 2007 1:51 PM, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
> > Anyone know what the elements of the undo_list mean?  Some are conses
> > with a marker in their CAR and a number in their CDR, some are just
> > conses of two numbers and some are conses of a string and a number.
>
> It's documented in the docstring of buffer-undo-list.

Thanks for the pointer.

I've done some more experiments; it occurred to me that if the marker in the
undo list was gc-marked already when we got to the special processing, then
it would be skipped.  I verified this by splitting out the last of the
three and-ed conditions into its own if.  Presumably this means that the
marker is shared with some other structure which got marked previously.
Could the last match data and the undo list perhaps share a marker?  Where
is the last match data kept?  If it isn't there, any suggestions on how to
go about finding out where another pointer to this marker is stored?





* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
  2007-11-14 17:39           ` Kalman Reti
  2007-11-14 18:51             ` Stefan Monnier
@ 2007-11-15  3:08             ` Richard Stallman
  2007-11-15  8:38               ` Kalman Reti
  1 sibling, 1 reply; 22+ messages in thread
From: Richard Stallman @ 2007-11-15  3:08 UTC
  To: Kalman Reti; +Cc: bug-gnu-emacs, kalman.reti, monnier, emacs-devel

Nothing gets "removed" from the undo list in normal use.  It gets
truncated, which drops off elements at the end, but other than that
all that normally happens is that editing operations add elements.

Markers in the list should not become free, because the undo list
itself should preserve them from GC.

If this is reproducible, can you put a breakpoint at Fgarbage_collect
and examine the data just before the GC which gets this crash?
Examine that list using the x... commands, and see if that marker
is already free.

    Looking harder at the code, I'm convinced that the undo_list should come before
    the name entry in the buffer structure,

Definitely not.  It needs to be AFTER `name' so that it will be marked
by GC.

    Anyone know what the elements of the undo_list mean?  Some are conses
    with a marker in their CAR and a number in their CDR, some are just
    conses of two numbers and some are conses of a string and a number.

The Lisp Manual documents these.  Node `Undo'.






* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
  2007-11-15  3:08             ` Richard Stallman
@ 2007-11-15  8:38               ` Kalman Reti
  2007-11-16 20:48                 ` Kalman Reti
  0 siblings, 1 reply; 22+ messages in thread
From: Kalman Reti @ 2007-11-15  8:38 UTC
  To: rms; +Cc: bug-gnu-emacs, kalman.reti, monnier, emacs-devel

On Nov 14, 2007 10:08 PM, Richard Stallman <rms@gnu.org> wrote:
> Nothing gets "removed" from the undo list in normal use.  It gets
> truncated, which drops off elements at the end, but other than that
> all that normally happens is that editing operations add elements.
>
> Markers in the list should not become free, because the undo list
> itself should preserve them from GC.
>
> If this is reproducible, can you put a breakpoint at Fgarbage_collect
> and examine the data just before the GC which gets this crash?
> Examine that list using the x... commands, and see if that marker
> is already free.
>
>     Looking harder at the code, I'm convinced that the undo_list should come before
>     the name entry in the buffer structure,
>
> Definitely not.  It needs to be AFTER `name' so that it will be marked
> by GC.

There is special code at the end of Fgarbage_collect (just before the
call to gc_sweep) which seems like it would have no point if this were
true.  It removes elements referring to unmarked markers and then
explicitly marks the undo_list slot afterwards.  The comment there reads:

	/* Now that we have stripped the elements that need not be in the
	   undo_list any more, we can finally mark the list.  */
	mark_object (nextb->undo_list);

It seems to me that if the undo_list were after name, then all the markers
in the list would have already been marked and this code would be an
elaborate no-op, no?
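
The "strip, then mark" idea and the no-op concern can be sketched as follows.
This is a toy model with a plain linked list, not the code in
Fgarbage_collect; struct undo_entry, strip_then_mark and undo_length are
hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

struct marker { bool marked; };

/* One undo-list element.  marker == NULL stands for the kinds of
   entry that do not involve a marker.  */
struct undo_entry
{
  struct marker *marker;
  int adjustment;
  struct undo_entry *next;
};

/* Unlink every entry whose marker has not been marked by the rest of
   the GC, then mark the markers that remain.  Returns the new head.  */
struct undo_entry *
strip_then_mark (struct undo_entry *head)
{
  struct undo_entry **link = &head;
  while (*link != NULL)
    {
      struct undo_entry *e = *link;
      if (e->marker != NULL && !e->marker->marked)
        *link = e->next;      /* only the undo list referred to it */
      else
        link = &e->next;
    }
  for (struct undo_entry *e = head; e != NULL; e = e->next)
    if (e->marker != NULL)
      e->marker->marked = true;
  return head;
}

/* Number of entries in the list.  */
int
undo_length (const struct undo_entry *head)
{
  int n = 0;
  for (; head != NULL; head = head->next)
    n++;
  return n;
}
```

If every marker in the list has already been marked before this runs, the
stripping pass removes nothing, which is the no-op behaviour questioned above.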

>
>     Anyone know what the elements of the undo_list mean?  Some are conses
>     with a marker in their CAR and a number in their CDR, some are just
>     conses of two numbers and some are conses of a string and a number.
>
> The Lisp Manual documents these.  Node `Undo'.

Thanks.  Someone already pointed me at the documentation string for
buffer-undo-list.


* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
  2007-11-15  1:00               ` Kalman Reti
@ 2007-11-15 17:09                 ` Richard Stallman
  2007-11-16 12:05                   ` Kalman Reti
  0 siblings, 1 reply; 22+ messages in thread
From: Richard Stallman @ 2007-11-15 17:09 UTC
  To: Kalman Reti; +Cc: bug-gnu-emacs, kalman.reti, monnier, emacs-devel

    I've done some more experiments; it occurred to me that if the marker in the
    undo list was gc-marked already when we got to the special processing, then
    it would be skipped.

I looked to see what you mean, and I see that some elements do get
removed from the undo list.  I hadn't remembered that -- sorry.

Is this the special processing you mean?

	/* If a buffer's undo list is Qt, that means that undo is
	   turned off in that buffer.  Calling truncate_undo_list on
	   Qt tends to return NULL, which effectively turns undo back on.
	   So don't call truncate_undo_list if undo_list is Qt.  */
	if (! EQ (nextb->undo_list, Qt))
	  {
	  ...

If so, it is supposed to delete elements for markers
that weren't already marked by GC.  And then it marks the undo
list in the normal way.

Does it look like that code failed to remove an element
which was supposed to update a marker?

Was the marker already corrupted (replaced with Lisp_Misc_Free)
before the start of the loop?







* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
  2007-11-15 17:09                 ` Richard Stallman
@ 2007-11-16 12:05                   ` Kalman Reti
  2007-11-16 14:07                     ` Kalman Reti
  0 siblings, 1 reply; 22+ messages in thread
From: Kalman Reti @ 2007-11-16 12:05 UTC
  To: rms; +Cc: bug-gnu-emacs, kalman.reti, monnier, emacs-devel

On Nov 15, 2007 12:09 PM, Richard Stallman <rms@gnu.org> wrote:
>     I've done some more experiments; it occurred to me that if the marker in the
>     undo list was gc-marked already when we got to the special processing, then
>     it would be skipped.
>
> I looked to see what you mean, and I see that some elements do get
> removed from the undo list.  I hadn't remembered that -- sorry.
>
> Is this the special processing you mean?
>
>         /* If a buffer's undo list is Qt, that means that undo is
>            turned off in that buffer.  Calling truncate_undo_list on
>            Qt tends to return NULL, which effectively turns undo back on.
>            So don't call truncate_undo_list if undo_list is Qt.  */
>         if (! EQ (nextb->undo_list, Qt))
>           {
>           ...
>

Yes.

> If so, it is supposed to delete elements for markers
> that weren't already marked by GC.  And then it marks the undo
> list in the normal way.

I believe it works to do this if you move the undo_list before name.
Otherwise, everything on the list is already marked by the normal
"start at the name offset and mark until you've reached the buffer
struct size" mechanism.

>
> Does it look like that code failed to remove an element
> which was supposed to update a marker?

No, it looks like a marker in the list is already marked; this
marker gets turned into the Lisp_Misc_Free cell.

>
> Was the marker already corrupted (replaced with Lisp_Misc_Free)
> before the start of the loop?

I believe so.  I think the culprit is the free_marker call in Fset_match_data.
I think this because I added a checking routine which, given a marker, loops
over all the cells in all the undo lists of all the buffers to see if that
marker is in the caar of one of them, calling a dummy routine (krabort, on
which I could set a breakpoint) if so.  I added a call to this checking
routine in free_misc, fired up my test case and almost immediately got a hit.
(The backtrace below can't be the whole story, since this happens much earlier
than the crash.  A gdb session which automatically captures a backtrace at
this point and continues, so I can show you the latest stack trace before the
crash, has run overnight now without reaching the crash.  Presumably
there is some mechanism which removes the Lisp_Misc_Free cell created
here before the GC trips over it, and something else [much] later on
is causing that mechanism to fail in the run-up to the crash.)
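
The kind of check described above can be sketched like this.  It is a toy
model rather than the routine actually added to alloc.c: the structs and the
function marker_in_any_undo_list are hypothetical, and it returns a flag
instead of calling a breakpoint hook.

```c
#include <stdbool.h>
#include <stddef.h>

struct marker { int charpos; };

/* One undo-list element; marker is NULL for non-marker entries.  */
struct undo_entry
{
  struct marker *marker;
  int adjustment;
  struct undo_entry *next;
};

/* Toy buffer: just an undo list and a link to the next buffer.  */
struct toy_buffer
{
  struct undo_entry *undo_list;
  struct toy_buffer *next;
};

/* Return true if M appears as the marker of any entry in the undo
   list of any buffer in the chain starting at BUFFERS.  Called just
   before a marker is freed, a true result means the free would leave
   a dangling reference in an undo list.  */
bool
marker_in_any_undo_list (const struct toy_buffer *buffers,
                         const struct marker *m)
{
  for (const struct toy_buffer *b = buffers; b != NULL; b = b->next)
    for (const struct undo_entry *e = b->undo_list; e != NULL; e = e->next)
      if (e->marker == m)
        return true;
  return false;
}
```

The scan is linear in the total length of all undo lists, so it is only
suitable as a temporary debugging aid.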

The early stack trace is at the end of this message.  One thing that isn't
clear to me is exactly who is calling set-match-data with the reseat
argument set to evaporate; it happens somewhere inside the shell-command
function which my code calls.

(gdb) where
#0  krabort () at alloc.c:3364
#1  0x08129319 in check_for_problem (marker=147919074) at alloc.c:3380
#2  0x0812934c in free_misc (misc=147919074) at alloc.c:3394
#3  0x0811c354 in Fset_match_data (list=146951973, reseat=137508953)
    at search.c:3057
#4  0x0813e252 in Ffuncall (nargs=3, args=0xbffea3f0) at eval.c:3027
#5  0x08161c84 in Fbyte_code (bytestr=136239067, vector=136239092, maxdepth=24)
    at bytecode.c:679
#6  0x0813d87e in Feval (form=136239053) at eval.c:2361
#7  0x0813b22f in Fprogn (args=136239045) at eval.c:450
#8  0x0813eb33 in unbind_to (count=25, value=137414769) at eval.c:3378
#9  0x08162361 in Fbyte_code (bytestr=136238739, vector=136238756, maxdepth=64)
    at bytecode.c:890
#10 0x0813e73a in funcall_lambda (fun=136238676, nargs=1,
    arg_vector=0xbffea6e4) at eval.c:3211
#11 0x0813e359 in Ffuncall (nargs=2, args=0xbffea6e0) at eval.c:3081
#12 0x08161c84 in Fbyte_code (bytestr=144608715, vector=144609972, maxdepth=64)
    at bytecode.c:679
#13 0x0813e73a in funcall_lambda (fun=144610268, nargs=5,
    arg_vector=0xbffea804) at eval.c:3211
#14 0x0813e359 in Ffuncall (nargs=6, args=0xbffea800) at eval.c:3081
#15 0x08161c84 in Fbyte_code (bytestr=144597347, vector=144598732, maxdepth=48)
    at bytecode.c:679
#16 0x0813e73a in funcall_lambda (fun=144598884, nargs=3,
    arg_vector=0xbffea924) at eval.c:3211
#17 0x0813e359 in Ffuncall (nargs=4, args=0xbffea920) at eval.c:3081
#18 0x08161c84 in Fbyte_code (bytestr=144645315, vector=144646532, maxdepth=56)
    at bytecode.c:679
#19 0x0813e73a in funcall_lambda (fun=144646748, nargs=3,
    arg_vector=0xbffea9e0) at eval.c:3211
#20 0x0813e486 in apply_lambda (fun=144646748, args=146894853, eval_flag=1)
    at eval.c:3135
#21 0x0813d9d3 in Feval (form=146896869) at eval.c:2415
#22 0x0813b359 in Fsetq (args=146896861) at eval.c:552
#23 0x0813d70a in Feval (form=146896853) at eval.c:2302
#24 0x0813d7dd in Feval (form=146896845) at eval.c:2340
#25 0x0813e23f in Ffuncall (nargs=2, args=0xbffead34) at eval.c:3024
#26 0x08161c84 in Fbyte_code (bytestr=136525275, vector=136525292, maxdepth=24)
    at bytecode.c:679
#27 0x0813e73a in funcall_lambda (fun=136525236, nargs=1,
    arg_vector=0xbffeae44) at eval.c:3211
#28 0x0813e359 in Ffuncall (nargs=2, args=0xbffeae40) at eval.c:3081
#29 0x08161c84 in Fbyte_code (bytestr=136525523, vector=136525540, maxdepth=24)
    at bytecode.c:679
#30 0x0813e73a in funcall_lambda (fun=136525484, nargs=1,
    arg_vector=0xbffeaf54) at eval.c:3211
#31 0x0813e359 in Ffuncall (nargs=2, args=0xbffeaf50) at eval.c:3081
#32 0x08161c84 in Fbyte_code (bytestr=136523723, vector=136523740, maxdepth=16)
    at bytecode.c:679
#33 0x0813e73a in funcall_lambda (fun=136523692, nargs=0,
    arg_vector=0xbffeb084) at eval.c:3211
#34 0x0813e359 in Ffuncall (nargs=1, args=0xbffeb080) at eval.c:3081
#35 0x0813df04 in apply1 (fn=137580545, arg=137414769) at eval.c:2765
#36 0x08139bcc in Fcall_interactively (function=137580545,
    record_flag=137414769, keys=137463044) at callint.c:385
#37 0x080edd6d in Fcommand_execute (cmd=137580545, record_flag=137414769,
    keys=137414769, special=137414769) at keyboard.c:10435
#38 0x080e3e99 in command_loop_1 () at keyboard.c:1939
#39 0x0813c6f2 in internal_condition_case (bfun=0x80e31a4 <command_loop_1>,
    handlers=137472161, hfun=0x80e2c70 <cmd_error>) at eval.c:1493
#40 0x080e2f42 in command_loop_2 () at keyboard.c:1396
#41 0x0813c263 in internal_catch (tag=137463729,
    func=0x80e2f24 <command_loop_2>, arg=137414769) at eval.c:1229
#42 0x080e2ed0 in command_loop () at keyboard.c:1375
#43 0x080e28f4 in recursive_edit_1 () at keyboard.c:984
#44 0x080e2a34 in Frecursive_edit () at keyboard.c:1046
#45 0x080e18c9 in main (argc=3, argv=0xbffeb834) at emacs.c:1777

Lisp Backtrace:
"set-match-data" (0xbffea3f4)
"byte-code" (0xbffea480)
"shell-command" (0xbffea6e4)
"diffs-between-depot-and-client-different-branches" (0xbffea804)
"diffs-between" (0xbffea924)
"changesets-between" (0xbffea9e0)
"setq" (0xbffeab68)
"length" (0xbffeac28)
"eval" (0xbffead38)
"eval-last-sexp-1" (0xbffeae44)
"eval-last-sexp" (0xbffeaf54)
"eval-print-last-sexp" (0xbffeb084)
"call-interactively" (0xbffeb230)
(gdb)





* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
  2007-11-16 12:05                   ` Kalman Reti
@ 2007-11-16 14:07                     ` Kalman Reti
       [not found]                       ` <473DD32F.5070501@gmx.at>
  0 siblings, 1 reply; 22+ messages in thread
From: Kalman Reti @ 2007-11-16 14:07 UTC (permalink / raw)
  To: rms; +Cc: bug-gnu-emacs, kalman.reti, monnier, emacs-devel

On Nov 16, 2007 7:05 AM, Kalman Reti <kalman.reti@gmail.com> wrote:

> One thing that isn't
> clear to me is exactly who is calling set-match-data with the reseat
> argument set to evaporate inside of the shell-command function.  This is
> happening somewhere inside of the shell-command function which my
> code calls.
>

I just figured this part out.  The save-match-data macro generates an
unwind-protect call to set-match-data with 'evaporate as a second argument.

What I haven't figured out is why these are mostly OK.  Perhaps it is just
a garbage collection being kicked off at an inconvenient time?


* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
       [not found]                       ` <473DD32F.5070501@gmx.at>
@ 2007-11-16 17:56                         ` Kalman Reti
  2007-11-17  4:54                           ` Richard Stallman
  0 siblings, 1 reply; 22+ messages in thread
From: Kalman Reti @ 2007-11-16 17:56 UTC (permalink / raw)
  To: martin rudalics; +Cc: bug-gnu-emacs, kalman.reti, rms, emacs-devel

On Nov 16, 2007 12:28 PM, martin rudalics <rudalics@gmx.at> wrote:
> Do you mean in code wrapped in `save-match-data' you delete some region
> of text containing a marker of the saved match-data?

It isn't in my code, it is in the shell-command function in simple.el, but
essentially this is correct.

Most of the guts of calling the subprocess to generate the output is
inside save-match-data; I don't know exactly what path results in the
markers' getting on the undo list, but if I create a new macro
save-match-data-noevaporate that is identical to the original
minus the 'evaporate argument to set-match-data and use that
inside of shell-command instead of the original, my crash goes away.

> Thus
> record_marker_adjustment puts an entry on `buffer-undo-list' referencing
> that marker.  The unwindforms of `save-match-data' call `set-match-data'
> with evaporate/reseat non-nil, which calls free_marker and subsequently
> free_misc.  mark_object - operating from `buffer-undo-list' - detects
> that the object is already free and aborts.

There is something which causes this not to happen all the time which
I have not yet found.  If you are lucky and this "something" happens before the
next GC, all is well.  I'd been doing exactly the same sorts of shell operations
in elisp functions for years before encountering one big enough to have a 100%
chance of being unlucky.  It does many hundreds of shell operations (perhaps
even thousands, I haven't counted them) taking many minutes.
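
The failure sequence discussed above can be modelled in a few lines.  This is
only a toy sketch under stated assumptions: the types, the toy_ functions and
the demo driver are invented for illustration and are not the alloc.c code.
The drop_entry_first flag stands for the still unidentified "something" that
usually gets rid of the undo entry before the next GC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a misc cell: either a live marker or a free cell. */
enum misc_type { MISC_MARKER, MISC_FREE };

struct misc {
    enum misc_type type;
    bool gcmarked;
};

/* Toy undo entry that references a marker (a marker adjustment). */
struct undo_entry {
    struct misc *marker;
    struct undo_entry *next;
};

/* Stand-in for free_misc: the cell becomes a free cell, but any pointers
   to it elsewhere are left untouched. */
void toy_free_misc(struct misc *m)
{
    assert(m != NULL);
    m->type = MISC_FREE;
    m->gcmarked = false;
}

/* Stand-in for the misc branch of mark_object.  Returns false where the
   real collector would fall into the default leg and abort. */
bool toy_mark_misc(struct misc *m)
{
    switch (m->type) {
    case MISC_MARKER:
        m->gcmarked = true;
        return true;
    default:
        return false;
    }
}

/* Mark every marker reachable from an undo list; false means "would abort". */
bool toy_mark_undo_list(struct undo_entry *list)
{
    for (struct undo_entry *e = list; e != NULL; e = e->next)
        if (e->marker != NULL && !toy_mark_misc(e->marker))
            return false;
    return true;
}

/* Hypothetical scenario driver.  drop_entry_first simulates the undo entry
   disappearing, by whatever means, before the collection runs. */
bool demo_gc_survives(bool drop_entry_first)
{
    struct misc saved_match_marker = { MISC_MARKER, false };
    struct undo_entry adjustment = { &saved_match_marker, NULL };
    struct undo_entry *undo_list = &adjustment;  /* recorded adjustment */

    toy_free_misc(&saved_match_marker);          /* evaporate on unwind */
    if (drop_entry_first)
        undo_list = adjustment.next;             /* entry no longer listed */
    return toy_mark_undo_list(undo_list);
}
```

In this model demo_gc_survives(false) reports the abort case and
demo_gc_survives(true) the lucky case, which matches the observation that
the crash only shows up when a collection runs while the stale entry is
still on the undo list.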

>
> If I understand correctly, this means that either markers used for
> saving match-data should not go to `buffer-undo-list' or the "evaporate"
> option set by `save-match-data' is inherently broken.
>

My suspicion is that the save-match-data was intended to be wrapped around
very short local uses of markers, not the collection of arbitrary amounts of
shell stdout output...


* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
  2007-11-15  8:38               ` Kalman Reti
@ 2007-11-16 20:48                 ` Kalman Reti
  2007-11-16 21:59                   ` Stefan Monnier
  0 siblings, 1 reply; 22+ messages in thread
From: Kalman Reti @ 2007-11-16 20:48 UTC (permalink / raw)
  To: rms; +Cc: bug-gnu-emacs, kalman.reti, monnier, emacs-devel

On Nov 15, 2007 3:38 AM, Kalman Reti <kalman.reti@gmail.com> wrote:
> On Nov 14, 2007 10:08 PM, Richard Stallman <rms@gnu.org> wrote:
> >
> > Definitely not.  It needs to be AFTER `name' so that it will be marked
> > by GC.

I've performed the experiment of building code straight from CVS and
putting a breakpoint in the special code for handling un-gc-marked-markers.

In my long-running test case (which previously resulted in a crash),
this breakpoint is NEVER reached.

When I move the undo_list to before name and redo the experiment, I hit
the breakpoint many many times.

So either the special undo_list handling code should be removed or the
undo_list moved before name in buffer.h.
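
Why the position of undo_list relative to name matters can be illustrated
with a small layout check.  This is a sketch only, assuming (as the quoted
remark above says) that the collector marks the buffer fields from `name'
onward generically; the two toy structs and the helper names are invented
for illustration and do not reproduce buffer.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef long toy_lisp_object;   /* stand-in for Lisp_Object */

/* Layout as in the failing build: undo_list comes AFTER name. */
struct toy_buffer_after_name {
    long size;
    toy_lisp_object name;
    toy_lisp_object filename;
    toy_lisp_object undo_list;
};

/* Layout after the move: undo_list comes BEFORE name. */
struct toy_buffer_before_name {
    long size;
    toy_lisp_object undo_list;
    toy_lisp_object name;
    toy_lisp_object filename;
};

/* True if a field lies in the range [name, end of struct), i.e. the range
   that this model's generic marking loop walks. */
bool in_generic_range(size_t field_offset, size_t name_offset,
                      size_t struct_size)
{
    assert(name_offset < struct_size);
    return field_offset >= name_offset && field_offset < struct_size;
}

bool undo_list_generic_when_after_name(void)
{
    return in_generic_range(offsetof(struct toy_buffer_after_name, undo_list),
                            offsetof(struct toy_buffer_after_name, name),
                            sizeof(struct toy_buffer_after_name));
}

bool undo_list_generic_when_before_name(void)
{
    return in_generic_range(offsetof(struct toy_buffer_before_name, undo_list),
                            offsetof(struct toy_buffer_before_name, name),
                            sizeof(struct toy_buffer_before_name));
}
```

With undo_list after name, the generic pass reaches it and marks every
marker on the list, so a later pass looking for unmarked markers in undo
lists would find none.  With undo_list before name, the generic pass skips
it and the special handling has something to do, which is consistent with
the breakpoint results reported above.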


* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
  2007-11-16 20:48                 ` Kalman Reti
@ 2007-11-16 21:59                   ` Stefan Monnier
  2007-11-16 23:09                     ` martin rudalics
  0 siblings, 1 reply; 22+ messages in thread
From: Stefan Monnier @ 2007-11-16 21:59 UTC (permalink / raw)
  To: Kalman Reti; +Cc: bug-gnu-emacs, rms, emacs-devel

> When I move the undo_list to before name and redo the experiment, I hit
> the breakpoint many many times.

> So either the special undo_list handling code should be removed or the
> undo_list moved before name in buffer.h.

Agreed.  The field was moved by Richard on 14-Oct-2002 but the change
log doesn't say why this was done, so I just undid it.


        Stefan





* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
  2007-11-16 21:59                   ` Stefan Monnier
@ 2007-11-16 23:09                     ` martin rudalics
  0 siblings, 0 replies; 22+ messages in thread
From: martin rudalics @ 2007-11-16 23:09 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Kalman Reti, emacs-devel, bug-gnu-emacs, rms

> Agreed.  The field was moved by Richard on 14-Oct-2002 but the change
> log doesn't say why this was done, so I just undid it.

Does this mean those cells always survived the current cycle?
Then we now have a chance to test whether the "remove unmarked
markers from the undo list" stuff really works in one and the
same collection cycle.  Interesting.

Stefan, unless you have already done so, could you please fix
those identical

"If a buffer's undo list is Qt, ..."

comments in alloc.c too?  The second mentions truncate_undo_list
which hardly makes sense.


* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
  2007-11-16 17:56                         ` Kalman Reti
@ 2007-11-17  4:54                           ` Richard Stallman
  2007-11-17  5:43                             ` Kalman Reti
  0 siblings, 1 reply; 22+ messages in thread
From: Richard Stallman @ 2007-11-17  4:54 UTC (permalink / raw)
  To: Kalman Reti; +Cc: bug-gnu-emacs, kalman.reti, emacs-devel

    My suspicion is that the save-match-data was intended to be wrapped around
    very short local uses of markers, not the collection of arbitrary amounts of
    shell stdout output...

That's true, but `save-match-data' should work correctly regardless
of what goes on in its body.  This is a real bug.

Thanks for tracking it down.





* Re: mark_object crash in 22.1 and latest CVS (as of tonight)
  2007-11-17  4:54                           ` Richard Stallman
@ 2007-11-17  5:43                             ` Kalman Reti
  0 siblings, 0 replies; 22+ messages in thread
From: Kalman Reti @ 2007-11-17  5:43 UTC (permalink / raw)
  To: rms; +Cc: bug-gnu-emacs, kalman.reti, emacs-devel

On Nov 16, 2007 11:54 PM, Richard Stallman <rms@gnu.org> wrote:
>     My suspicion is that the save-match-data was intended to be wrapped around
>     very short local uses of markers, not the collection of arbitrary amounts of
>     shell stdout output...
>
> That's true, but `save-match-data' should work correctly regardless
> of what goes on in its body.  This is a real bug.
>
> Thanks for tracking it down.
>

You're quite welcome.  BTW, I applied Stefan's search.c diff to a fresh
copy of the CVS sources and successfully ran my test case.





end of thread, other threads:[~2007-11-17  5:43 UTC | newest]

Thread overview: 22+ messages
2007-11-09  3:55 mark_object crash in 22.1 and latest CVS (as of tonight) Kalman Reti
2007-11-09 11:32 ` Kalman Reti
2007-11-10 10:19   ` Kalman Reti
     [not found]   ` <E1Ir5Gz-0002TS-8T@fencepost.gnu.org>
2007-11-12 11:40     ` Kalman Reti
2007-11-12 22:03       ` Stefan Monnier
2007-11-13  0:30         ` Kalman Reti
2007-11-13 20:03         ` Richard Stallman
2007-11-14 17:39           ` Kalman Reti
2007-11-14 18:51             ` Stefan Monnier
2007-11-15  1:00               ` Kalman Reti
2007-11-15 17:09                 ` Richard Stallman
2007-11-16 12:05                   ` Kalman Reti
2007-11-16 14:07                     ` Kalman Reti
     [not found]                       ` <473DD32F.5070501@gmx.at>
2007-11-16 17:56                         ` Kalman Reti
2007-11-17  4:54                           ` Richard Stallman
2007-11-17  5:43                             ` Kalman Reti
2007-11-15  3:08             ` Richard Stallman
2007-11-15  8:38               ` Kalman Reti
2007-11-16 20:48                 ` Kalman Reti
2007-11-16 21:59                   ` Stefan Monnier
2007-11-16 23:09                     ` martin rudalics
2007-11-13  5:10       ` Richard Stallman
