* Help please! To track down GC trying to free an already freed object.
@ 2019-04-02 11:25 Alan Mackenzie
2019-04-02 15:04 ` Eli Zaretskii
2019-04-02 19:09 ` Daniel Colascione
0 siblings, 2 replies; 24+ messages in thread
From: Alan Mackenzie @ 2019-04-02 11:25 UTC (permalink / raw)
To: emacs-devel
Hello, Emacs.
I get this problem after a recent merge of master into
/scratch/accurate-warning-pos (my branch where I'm trying to implement
correct source positions in the byte compiler's warning messages). This
was a large merge, including bringing in the portable dumper.
Emacs aborts at mark_object L+179 (in alloc.c), because a pseudovector
being freed already has type PVEC_FREE, i.e. has been freed already.
This object is a "symbol with position", a type of pseudovector which
doesn't yet exist outside of this scratch branch.
At a guess, I'm setting some data structure in the C code to a Lisp
structure containing this object, but failing to apply static protection
to this C variable. Or something like that.
This failure occurs during the byte compilation of .../lisp/registry.el
in a make or make bootstrap. The failure only occurs when this byte
compilation is started as -batch from the command line. So my use of
GDB is from the command line, not within a running Emacs.
With GDB, I can break at the creation of this symbol-with-position
object and again at its (first) freeing with this breakpoint:
break setup_on_free_list if (v == 0x5555561d0450)
. However, this isn't helping me to track down the Lisp object which
still references this symbol-with-position. I've tried to find the
address of Emacs's data segment, so as to be able to search through it
for 0x5555561d0455 in GDB, but this doesn't feel like a very useful
thing to do.
Could somebody who has experience in this sort of thing please suggest
how I might proceed with the debugging, or possibly offer me some other
sort of help or hints.
Thanks in advance!
--
Alan Mackenzie (Nuremberg, Germany).
* Re: Help please! To track down GC trying to free an already freed object.
2019-04-02 11:25 Help please! To track down GC trying to free an already freed object Alan Mackenzie
@ 2019-04-02 15:04 ` Eli Zaretskii
2019-04-02 20:42 ` Alan Mackenzie
2019-04-02 19:09 ` Daniel Colascione
1 sibling, 1 reply; 24+ messages in thread
From: Eli Zaretskii @ 2019-04-02 15:04 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: emacs-devel
> Date: Tue, 2 Apr 2019 11:25:37 +0000
> From: Alan Mackenzie <acm@muc.de>
>
> With GDB, I can break at the creation of this symbol-with-position
> object and again at its (first) freeing with this breakpoint:
>
> break setup_on_free_list if (v == 0x5555561d0450)
>
> . However, this isn't helping me to track down the Lisp object which
> still references this symbol-with-position. I've tried to find the
> address of Emacs's data segment, so as to be able to search through it
> for 0x5555561d0455 in GDB, but this doesn't feel like a very useful
> thing to do.
>
> Could somebody who has experience in this sort of thing please suggest
> how I might proceed with the debugging, or possibly offer me some other
> sort of help or hints.
The usual method of debugging such problems is described in etc/DEBUG,
it basically uses the last_marked[] array. You start with the object
at last_marked[last_marked_index - 1], and go backwards (in circular
manner), comparing the objects you find in the array with those you
see in the call-stack frames that call mark_* functions. Just be very
careful when you print the objects; e.g., never use 'pp', because the
function it calls cannot handle marked objects.
If you already tried this, please ask more specific questions.
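As an illustration of the backwards, wrap-around walk described above,
here is a minimal sketch in C. It is a toy model, not the Emacs source:
ring_log, ring_index, RING_SIZE, record_marked and marked_before are
hypothetical stand-ins for last_marked[], last_marked_index and the code
that fills them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8                     /* toy size, chosen for the example */

static uintptr_t ring_log[RING_SIZE];   /* stand-in for last_marked[] */
static size_t ring_index;               /* next slot to write, like last_marked_index */

/* Record one "marked" object, overwriting the oldest entry once full. */
static void record_marked(uintptr_t obj)
{
    ring_log[ring_index] = obj;
    ring_index = (ring_index + 1) % RING_SIZE;
}

/* Return the entry written STEPS_BACK writes ago (0 = most recent),
   wrapping around the start of the array when necessary. */
static uintptr_t marked_before(size_t steps_back)
{
    assert(steps_back < RING_SIZE);
    size_t i = (ring_index + RING_SIZE - 1 - steps_back) % RING_SIZE;
    return ring_log[i];
}
```

In a debugger the same arithmetic applies: start at index - 1 and, on
reaching slot 0, continue from the last slot of the array.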
* Re: Help please! To track down GC trying to free an already freed object.
2019-04-02 11:25 Help please! To track down GC trying to free an already freed object Alan Mackenzie
2019-04-02 15:04 ` Eli Zaretskii
@ 2019-04-02 19:09 ` Daniel Colascione
2019-04-02 19:21 ` Eli Zaretskii
2019-04-02 20:24 ` Alan Mackenzie
1 sibling, 2 replies; 24+ messages in thread
From: Daniel Colascione @ 2019-04-02 19:09 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: emacs-devel
> Hello, Emacs.
>
> I get this problem after a recent merge of master into
> /scratch/accurate-warning-pos (my branch where I'm trying to implement
> correct source positions in the byte compiler's warning messages). This
> was a large merge, including bringing in the portable dumper.
>
> Emacs aborts at mark_object L+179 (in alloc.c), because a pseudovector
> being freed already has type PVEC_FREE, i.e. has been freed already.
> This object is a "symbol with position", a type of pseudovector which
> doesn't yet exist outside of this scratch branch.
Out of curiosity, why do we need a new C-level type here?
> At a guess, I'm setting some data structure in the C code to a Lisp
> structure containing this object, but failing to apply static protection
> to this C variable. Or something like that.
>
> This failure occurs during the byte compilation of .../lisp/registry.el
> in a make or make bootstrap. The failure only occurs when this byte
> compilation is started as -batch from the command line. So my use of
> GDB is from the command line, not within a running Emacs.
>
> With GDB, I can break at the creation of this symbol-with-position
> object and again at its (first) freeing with this breakpoint:
>
> break setup_on_free_list if (v == 0x5555561d0450)
>
> . However, this isn't helping me to track down the Lisp object which
> still references this symbol-with-position. I've tried to find the
> address of Emacs's data segment, so as to be able to search through it
> for 0x5555561d0455 in GDB, but this doesn't feel like a very useful
> thing to do.
>
> Could somebody who has experience in this sort of thing please suggest
> how I might proceed with the debugging, or possibly offer me some other
> sort of help or hints.
>
> Thanks in advance!
rr is incredibly helpful for debugging this sort of problem. See
https://rr-project.org/. You can record an rr session containing the
crash, replay it, get to the crash, and then reverse-next, reverse-finish,
and reverse-continue your way through the GC, running it in reverse until
you find whatever it is that made mark_object on the dead object happen.
Hardware watchpoints with rr are also very useful and work great in
reverse mode: just use watch -l myvar and reverse-continue to see who last
wrote a memory location, or use rwatch to see who last *read* a location.
(The -l is important since it enables the use of hardware watchpoints.)
* Re: Help please! To track down GC trying to free an already freed object.
2019-04-02 19:09 ` Daniel Colascione
@ 2019-04-02 19:21 ` Eli Zaretskii
2019-04-02 20:46 ` Alan Mackenzie
2019-04-02 20:24 ` Alan Mackenzie
1 sibling, 1 reply; 24+ messages in thread
From: Eli Zaretskii @ 2019-04-02 19:21 UTC (permalink / raw)
To: Daniel Colascione; +Cc: acm, emacs-devel
> Date: Tue, 2 Apr 2019 12:09:59 -0700
> From: "Daniel Colascione" <dancol@dancol.org>
> Cc: emacs-devel@gnu.org
>
> rr is incredibly helpful for debugging this sort of problem. See
> https://rr-project.org/. You can record an rr session containing the
> crash, replay it, get to the crash, and then reverse-next, reverse-finish,
> and reverse-continue your way through the GC, running it in reverse until
> you find whatever it is that made mark_object on the dead object happen.
GDB supports reverse execution as well, on some platforms.
* Re: Help please! To track down GC trying to free an already freed object.
2019-04-02 19:09 ` Daniel Colascione
2019-04-02 19:21 ` Eli Zaretskii
@ 2019-04-02 20:24 ` Alan Mackenzie
2019-04-02 20:33 ` Daniel Colascione
1 sibling, 1 reply; 24+ messages in thread
From: Alan Mackenzie @ 2019-04-02 20:24 UTC (permalink / raw)
To: Daniel Colascione; +Cc: emacs-devel
Hello, Daniel.
On Tue, Apr 02, 2019 at 12:09:59 -0700, Daniel Colascione wrote:
> > Hello, Emacs.
> > I get this problem after a recent merge of master into
> > /scratch/accurate-warning-pos (my branch where I'm trying to implement
> > correct source positions in the byte compiler's warning messages). This
> > was a large merge, including bringing in the portable dumper.
> > Emacs aborts at mark_object L+179 (in alloc.c), because a pseudovector
> > being freed already has type PVEC_FREE, i.e. has been freed already.
> > This object is a "symbol with position", a type of pseudovector which
> > doesn't yet exist outside of this scratch branch.
> Out of curiosity, why do we need a new C-level type here?
It's to help solve a bug in the byte compiler, which up until recently
was intractable. The byte compiler frequently (?usually) reports
incorrect line/column numbers in its warning messages. This is due to
the kludge it uses to keep track of them.
The only current candidate for a fix is for the reader, on a flag being
bound to non-nil, to return "symbols with position" rather than standard
symbols. The "position" associated with the symbol is its textual
offset from the beginning of the construct in the source file being read.
These symbols with position are implemented as pseudovectors with type
PVEC_SYMBOL_WITH_POS and behave as ordinary symbols for all purposes,
except for when a warning message is being output, when the position
supplies a correct file/line number for the message.
This works and works well. However it causes an unacceptable slowdown in
Emacs (around 8 - 15 per cent). I'm working on a fix for this, and have
made substantial progress.
The topic was discussed at length in emacs-devel starting November last
year in posts whose Subject: contained "scratch/accurate-warning-pos".
--
Alan Mackenzie (Nuremberg, Germany).
* Re: Help please! To track down GC trying to free an already freed object.
2019-04-02 20:24 ` Alan Mackenzie
@ 2019-04-02 20:33 ` Daniel Colascione
2019-04-02 21:00 ` Alan Mackenzie
0 siblings, 1 reply; 24+ messages in thread
From: Daniel Colascione @ 2019-04-02 20:33 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: Daniel Colascione, emacs-devel
> Hello, Daniel.
>
> On Tue, Apr 02, 2019 at 12:09:59 -0700, Daniel Colascione wrote:
>> > Hello, Emacs.
>
>> > I get this problem after a recent merge of master into
>> > /scratch/accurate-warning-pos (my branch where I'm trying to implement
>> > correct source positions in the byte compiler's warning messages). This
>> > was a large merge, including bringing in the portable dumper.
>
>> > Emacs aborts at mark_object L+179 (in alloc.c), because a pseudovector
>> > being freed already has type PVEC_FREE, i.e. has been freed already.
>> > This object is a "symbol with position", a type of pseudovector which
>> > doesn't yet exist outside of this scratch branch.
>
>> Out of curiosity, why do we need a new C-level type here?
>
> It's to help solve a bug in the byte compiler, which up until recently
> was intractable. The byte compiler frequently (?usually) reports
> incorrect line/column numbers in its warning messages. This is due to
> the kludge it uses to keep track of them.
>
> The only current candidate for a fix is for the reader, on a flag being
> bound to non-nil, to return "symbols with position" rather than standard
> symbols. The "position" associated with the symbol is its textual
> offset from the beginning of the construct in the source file being read.
So if I read symbol foo from file1.el and symbol foo from file2.el, I get
two different symbol-with-location instances, each tagged with a different
source location? Do these symbol objects compare eq to each other?
* Re: Help please! To track down GC trying to free an already freed object.
2019-04-02 15:04 ` Eli Zaretskii
@ 2019-04-02 20:42 ` Alan Mackenzie
2019-04-03 4:43 ` Eli Zaretskii
0 siblings, 1 reply; 24+ messages in thread
From: Alan Mackenzie @ 2019-04-02 20:42 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
Hello, Eli.
On Tue, Apr 02, 2019 at 18:04:22 +0300, Eli Zaretskii wrote:
> > Date: Tue, 2 Apr 2019 11:25:37 +0000
> > From: Alan Mackenzie <acm@muc.de>
> > With GDB, I can break at the creation of this symbol-with-position
> > object and again at its (first) freeing with this breakpoint:
> > break setup_on_free_list if (v == 0x5555561d0450)
> > . However, this isn't helping me to track down the Lisp object which
> > still references this symbol-with-position. I've tried to find the
> > address of Emacs's data segment, so as to be able to search through it
> > for 0x5555561d0455 in GDB, but this doesn't feel like a very useful
> > thing to do.
> > Could somebody who has experience in this sort of thing please suggest
> > how I might proceed with the debugging, or possibly offer me some other
> > sort of help or hints.
> The usual method of debugging such problems is described in etc/DEBUG,
Apologies, I didn't see this. I read quite a bit of etc/DEBUG, but for
some reason completely missed the bit about GC problems.
> it basically uses the last_marked[] array. You start with the object
> at last_marked[last_marked_index - 1], and go backwards (in circular
> manner), comparing the objects you find in the array with those you
> see in the call-stack frames that call mark_* functions. Just be very
> careful when you print the objects; e.g., never use 'pp', because the
> function it calls cannot handle marked objects.
I'm having some difficulty seeing the entire last_marked array with GDB.
I will try to find a solution in the GDB manual.
> If you already tried this, please ask more specific questions.
No, I hadn't. I didn't know about last_marked. I'll see if I can get
further with its help. Thanks!
--
Alan Mackenzie (Nuremberg, Germany).
* Re: Help please! To track down GC trying to free an already freed object.
2019-04-02 19:21 ` Eli Zaretskii
@ 2019-04-02 20:46 ` Alan Mackenzie
2019-04-02 21:03 ` Daniel Colascione
2019-04-03 4:39 ` Eli Zaretskii
0 siblings, 2 replies; 24+ messages in thread
From: Alan Mackenzie @ 2019-04-02 20:46 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Daniel Colascione, emacs-devel
Hello, Eli.
On Tue, Apr 02, 2019 at 22:21:26 +0300, Eli Zaretskii wrote:
> > Date: Tue, 2 Apr 2019 12:09:59 -0700
> > From: "Daniel Colascione" <dancol@dancol.org>
> > Cc: emacs-devel@gnu.org
> >
> > rr is incredibly helpful for debugging this sort of problem. See
> > https://rr-project.org/. You can record an rr session containing the
> > crash, replay it, get to the crash, and then reverse-next, reverse-finish,
> > and reverse-continue your way through the GC, running it in reverse until
> > you find whatever it is that made mark_object on the dead object happen.
> GDB supports reverse execution as well, on some platforms.
On my GNU/Linux system, I tried to run 'reverse-next', and got the error
message:
Target multi-thread does not support this command.
. :-( I suppose I could reconfigure without multi threading, but then
the bug (which is reproducible) probably wouldn't happen in the same
place.
--
Alan Mackenzie (Nuremberg, Germany).
* Re: Help please! To track down GC trying to free an already freed object.
2019-04-02 20:33 ` Daniel Colascione
@ 2019-04-02 21:00 ` Alan Mackenzie
2019-04-05 4:49 ` Alex
0 siblings, 1 reply; 24+ messages in thread
From: Alan Mackenzie @ 2019-04-02 21:00 UTC (permalink / raw)
To: Daniel Colascione; +Cc: emacs-devel
Hello again, Daniel.
On Tue, Apr 02, 2019 at 13:33:02 -0700, Daniel Colascione wrote:
> > Hello, Daniel.
> > On Tue, Apr 02, 2019 at 12:09:59 -0700, Daniel Colascione wrote:
> >> > Hello, Emacs.
> >> > I get this problem after a recent merge of master into
> >> > /scratch/accurate-warning-pos (my branch where I'm trying to implement
> >> > correct source positions in the byte compiler's warning messages). This
> >> > was a large merge, including bringing in the portable dumper.
> >> > Emacs aborts at mark_object L+179 (in alloc.c), because a pseudovector
> >> > being freed already has type PVEC_FREE, i.e. has been freed already.
> >> > This object is a "symbol with position", a type of pseudovector which
> >> > doesn't yet exist outside of this scratch branch.
> >> Out of curiosity, why do we need a new C-level type here?
> > It's to help solve a bug in the byte compiler, which up until recently
> > was intractable. The byte compiler frequently (?usually) reports
> > incorrect line/column numbers in its warning messages. This is due to
> > the kludge it uses to keep track of them.
> > The only current candidate for a fix is for the reader, on a flag being
> > bound to non-nil, to return "symbols with position" rather than standard
> > symbols. The "position" associated with the symbol is its textual
> > offset from the beginning of the construct in the source file being read.
> So if I read symbol foo from file1.el and symbol foo from file2.el, I get
> two different symbol-with-location instances, each tagged with a different
> source location? Do these symbol objects compare eq to each other?
They do, yes. Otherwise the byte compiler wouldn't work, as it
frequently compares a symbol-with-position with a constant ("ordinary")
symbol using eq.
However, it is envisaged the flag symbols-with-pos-enabled will be bound
to non-nil only by the byte compiler. The reader resets this position to
zero for each top-level form it reads.
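To make the eq behaviour described above concrete, here is a minimal
sketch in C. It is a toy model, not the branch's code: every toy_* name
is hypothetical, and it assumes that with the flag unset a wrapper and
its bare symbol are simply distinct objects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_symbol { const char *name; };

/* Hypothetical wrapper: a symbol plus its offset in the source text. */
struct toy_symbol_with_pos {
    struct toy_symbol *sym;
    long pos;
};

/* A toy value is either a bare symbol or a wrapper; exactly one
   of the two pointers is non-NULL. */
struct toy_value {
    struct toy_symbol *bare;
    struct toy_symbol_with_pos *wrapped;
};

static bool toy_pos_enabled;   /* stand-in for the enabling flag */

/* The object whose address decides identity.  With the flag set,
   a wrapper is looked through to the symbol it carries. */
static const void *toy_identity(struct toy_value v)
{
    assert((v.bare != NULL) != (v.wrapped != NULL));
    if (v.bare != NULL)
        return v.bare;
    return toy_pos_enabled ? (const void *) v.wrapped->sym
                           : (const void *) v.wrapped;
}

static bool toy_eq(struct toy_value a, struct toy_value b)
{
    return toy_identity(a) == toy_identity(b);
}
```

In this model two wrappers around the same symbol, read at different
positions, compare equal to each other and to the bare symbol only while
toy_pos_enabled is true.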
--
Alan Mackenzie (Nuremberg, Germany).
* Re: Help please! To track down GC trying to free an already freed object.
2019-04-02 20:46 ` Alan Mackenzie
@ 2019-04-02 21:03 ` Daniel Colascione
2019-04-03 4:39 ` Eli Zaretskii
1 sibling, 0 replies; 24+ messages in thread
From: Daniel Colascione @ 2019-04-02 21:03 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: Eli Zaretskii, Daniel Colascione, emacs-devel
> Hello, Eli.
>
> On Tue, Apr 02, 2019 at 22:21:26 +0300, Eli Zaretskii wrote:
>> > Date: Tue, 2 Apr 2019 12:09:59 -0700
>> > From: "Daniel Colascione" <dancol@dancol.org>
>> > Cc: emacs-devel@gnu.org
>> >
>> > rr is incredibly helpful for debugging this sort of problem. See
>> > https://rr-project.org/. You can record an rr session containing the
>> > crash, replay it, get to the crash, and then reverse-next, reverse-finish,
>> > and reverse-continue your way through the GC, running it in reverse until
>> > you find whatever it is that made mark_object on the dead object happen.
>
>> GDB supports reverse execution as well, on some platforms.
>
> On my GNU/Linux system, I tried to run 'reverse-next', and got the error
> message:
>
> Target multi-thread does not support this command.
>
> . :-( I suppose I could reconfigure without multi threading, but then
> the bug (which is reproducible) probably wouldn't happen in the same
> place.
I don't think I've ever gotten pure-GDB reverse execution to work
correctly. rr Just Works for me in every instance I've tried it.
* Re: Help please! To track down GC trying to free an already freed object.
2019-04-02 20:46 ` Alan Mackenzie
2019-04-02 21:03 ` Daniel Colascione
@ 2019-04-03 4:39 ` Eli Zaretskii
2019-04-03 10:01 ` Alan Mackenzie
1 sibling, 1 reply; 24+ messages in thread
From: Eli Zaretskii @ 2019-04-03 4:39 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: dancol, emacs-devel
> Date: Tue, 2 Apr 2019 20:46:53 +0000
> From: Alan Mackenzie <acm@muc.de>
> Cc: Daniel Colascione <dancol@dancol.org>, emacs-devel@gnu.org
>
> > GDB supports reverse execution as well, on some platforms.
>
> On my GNU/Linux system, I tried to run 'reverse-next', and got the error
> message:
>
> Target multi-thread does not support this command.
I think you are supposed to record the execution, and then say
(gdb) target record-core
or
(gdb) target record-btrace
before the reverse execution is available.
But I was always able to debug GC problems by using last_marked array.
* Re: Help please! To track down GC trying to free an already freed object.
2019-04-02 20:42 ` Alan Mackenzie
@ 2019-04-03 4:43 ` Eli Zaretskii
2019-04-04 18:57 ` Alan Mackenzie
0 siblings, 1 reply; 24+ messages in thread
From: Eli Zaretskii @ 2019-04-03 4:43 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: emacs-devel
> Date: Tue, 2 Apr 2019 20:42:37 +0000
> From: Alan Mackenzie <acm@muc.de>
> Cc: emacs-devel@gnu.org
>
> I'm having some difficulty seeing the entire last_marked array with GDB.
> I will try to find a solution in the GDB manual.
You want "set print elements unlimited", I think.
However, my recommendation is to examine the array one element at a
time, moving back to the previous one only when you understand what
the element you've looked at is and whether it is or isn't related to
the problem. Also, last_marked array is written cyclically, so you
may need to wrap around the index to see the objects in the right
order.
* Re: Help please! To track down GC trying to free an already freed object.
2019-04-03 4:39 ` Eli Zaretskii
@ 2019-04-03 10:01 ` Alan Mackenzie
2019-04-03 10:12 ` Eli Zaretskii
2019-04-03 15:23 ` Paul Eggert
0 siblings, 2 replies; 24+ messages in thread
From: Alan Mackenzie @ 2019-04-03 10:01 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: dancol, emacs-devel
Hello, Eli.
On Wed, Apr 03, 2019 at 07:39:35 +0300, Eli Zaretskii wrote:
> > Date: Tue, 2 Apr 2019 20:46:53 +0000
> > From: Alan Mackenzie <acm@muc.de>
> > Cc: Daniel Colascione <dancol@dancol.org>, emacs-devel@gnu.org
> > > GDB supports reverse execution as well, on some platforms.
> > On my GNU/Linux system, I tried to run 'reverse-next', and got the error
> > message:
> > Target multi-thread does not support this command.
> I think you are supposed to record the execution, and then say
> (gdb) target record-core
> or
> (gdb) target record-btrace
> before the reverse execution is available.
Yes. I thought there was something missing. ;-) There's no mention of
such recording in the GDB manual's "Reverse Execution" page, nor any
cross reference to "Process Record and Replay" there.
I'll try again and see if I can get it working.
> But I was always able to debug GC problems by using last_marked array.
The problem I think I'm up against is that the symbol-with-pos object is
not being marked at a particular garbage_collect_1, and thus gets freed
prematurely.
I intend to get the hex values of the Lisp_Objects which constitute the
list in which the symbol-with-pos is embedded and search for these in
last_marked. Putting a conditional breakpoint on Fcons slows down Emacs
somewhat. ;-)
--
Alan Mackenzie (Nuremberg, Germany).
* Re: Help please! To track down GC trying to free an already freed object.
2019-04-03 10:01 ` Alan Mackenzie
@ 2019-04-03 10:12 ` Eli Zaretskii
2019-04-03 15:23 ` Paul Eggert
1 sibling, 0 replies; 24+ messages in thread
From: Eli Zaretskii @ 2019-04-03 10:12 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: dancol, emacs-devel
> Date: Wed, 3 Apr 2019 10:01:13 +0000
> Cc: dancol@dancol.org, emacs-devel@gnu.org
> From: Alan Mackenzie <acm@muc.de>
>
> The problem I think I'm up against is that the symbol-with-pos object is
> not being marked at a particular garbage_collect_1, and thus gets freed
> prematurely.
>
> I intend to get the hex values of the Lisp_Objects which constitute the
> list in which the symbol-with-pos is embedded and search for these in
> last_marked. Putting a conditional breakpoint on Fcons slows down Emacs
> somewhat. ;-)
GDB has memory-search commands, see the node "Searching Memory" in the
GDB manual. Maybe this can help.
* Re: Help please! To track down GC trying to free an already freed object.
2019-04-03 10:01 ` Alan Mackenzie
2019-04-03 10:12 ` Eli Zaretskii
@ 2019-04-03 15:23 ` Paul Eggert
1 sibling, 0 replies; 24+ messages in thread
From: Paul Eggert @ 2019-04-03 15:23 UTC (permalink / raw)
To: Alan Mackenzie, Eli Zaretskii; +Cc: dancol, emacs-devel
Alan Mackenzie wrote:
> There's no mention of
> such recording in the GDB manual's "Reverse Execution" page, nor any
> cross reference to "Process Record and Replay" there.
I filed a bug report for that here:
https://sourceware.org/bugzilla/show_bug.cgi?id=24417
* Re: Help please! To track down GC trying to free an already freed object.
2019-04-03 4:43 ` Eli Zaretskii
@ 2019-04-04 18:57 ` Alan Mackenzie
0 siblings, 0 replies; 24+ messages in thread
From: Alan Mackenzie @ 2019-04-04 18:57 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
Hello, Eli.
On Wed, Apr 03, 2019 at 07:43:22 +0300, Eli Zaretskii wrote:
> > Date: Tue, 2 Apr 2019 20:42:37 +0000
> > From: Alan Mackenzie <acm@muc.de>
> > Cc: emacs-devel@gnu.org
> > I'm having some difficulty seeing the entire last_marked array with GDB.
> > I will try to find a solution in the GDB manual.
> You want "set print elements unlimited", I think.
> However, my recommendation is to examine the array one element at a
> time, moving back to the previous one only when you understand what
> the element you've looked at is and whether it is or isn't related to
> the problem. Also, last_marked array is written cyclically, so you
> may need to wrap around the index to see the objects in the right
> order.
I've found the bug.
In the garbage collection, it's necessary for Qsymbols_with_pos_enabled
to be bound to nil. (That's the variable which enables symbols with
position).
I had bound that variable to nil in Fgarbage_collect, not noticing that
there are calls to the C function garbage_collect which bypass the
primitive. This was the bug.
As a result, the pseudovector (Symbol "nil" at position 339) was caught
by a NILP, causing it not to get marked. So it got swept away, even
though it was still live.
So I've spent several days on this, but as a consolation I now know GDB
much better than I did before. ;-). My branch now builds successfully.
Thanks for all the help!
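The failure mode described in this message can be reproduced in
miniature. The following is a minimal sketch in C under stated
assumptions: all toy_* names are hypothetical, and the real marking code
in alloc.c is far more involved. It only shows how a nil test that looks
through position wrappers can make a marker skip a live object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy object model; none of these are the real Emacs definitions. */
enum toy_type { TOY_SYMBOL, TOY_SYMBOL_WITH_POS };

struct toy_object {
    enum toy_type type;
    struct toy_object *bare;   /* for a wrapper: the symbol it carries */
    bool marked;
};

static struct toy_object toy_nil = { TOY_SYMBOL, NULL, false };
static bool toy_symbols_with_pos_enabled;

/* A nil test that looks through position wrappers while the flag is set. */
static bool toy_nilp(const struct toy_object *o)
{
    if (o == &toy_nil)
        return true;
    return toy_symbols_with_pos_enabled
           && o->type == TOY_SYMBOL_WITH_POS
           && o->bare == &toy_nil;
}

/* A marker that skips anything testing as nil.  With the flag set,
   a live "nil at position N" wrapper is skipped too, so a later
   sweep would free it although it is still referenced. */
static void toy_mark(struct toy_object *o)
{
    assert(o != NULL);
    if (toy_nilp(o))
        return;
    o->marked = true;
}
```

Clearing the flag for the duration of every collection, whichever entry
point starts it, makes toy_nilp see the wrapper as an ordinary object
again, and it gets marked.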
--
Alan Mackenzie (Nuremberg, Germany).
* Re: Help please! To track down GC trying to free an already freed object.
2019-04-02 21:00 ` Alan Mackenzie
@ 2019-04-05 4:49 ` Alex
2019-04-05 8:26 ` Alan Mackenzie
0 siblings, 1 reply; 24+ messages in thread
From: Alex @ 2019-04-05 4:49 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: Daniel Colascione, emacs-devel
Alan Mackenzie <acm@muc.de> writes:
> Hello again, Daniel.
>
> On Tue, Apr 02, 2019 at 13:33:02 -0700, Daniel Colascione wrote:
>
>> So if I read symbol foo from file1.el and symbol foo from file2.el, I get
>> two different symbol-with-location instances, each tagged with a different
>> source location? Do these symbol objects compare eq to each other?
>
> They do, yes. Otherwise the byte compiler wouldn't work, as it
> frequently compares a symbol-with-position with a constant ("ordinary")
> symbol using eq.
>
> However, it is envisaged the flag symbols-with-pos-enabled will be bound
> to non-nil only by the byte compiler. The reader resets this position to
> zero for each top-level form it reads.
I apologize if this topic already reached its conclusion, but IMO
having eq return true for two different object types is quite
surprising behaviour. Is it out of the question to leave eq alone and
introduce, e.g., eq-excluding-position that strips possible positions
before comparison?
* Re: Help please! To track down GC trying to free an already freed object.
2019-04-05 4:49 ` Alex
@ 2019-04-05 8:26 ` Alan Mackenzie
2019-04-05 17:05 ` Comparing symbol-with-position using eq (was: Help please! To track down GC trying to free an already freed object.) Alex
0 siblings, 1 reply; 24+ messages in thread
From: Alan Mackenzie @ 2019-04-05 8:26 UTC (permalink / raw)
To: Alex; +Cc: Daniel Colascione, emacs-devel
Hello, Alex.
On Thu, Apr 04, 2019 at 22:49:22 -0600, Alex wrote:
> Alan Mackenzie <acm@muc.de> writes:
> > On Tue, Apr 02, 2019 at 13:33:02 -0700, Daniel Colascione wrote:
> >> So if I read symbol foo from file1.el and symbol foo from file2.el,
> >> I get two different symbol-with-location instances, each tagged with
> >> a different source location? Do these symbol objects compare eq to
> >> each other?
> > They do, yes. Otherwise the byte compiler wouldn't work, as it
> > frequently compares a symbol-with-position with a constant
> > ("ordinary") symbol using eq.
> > However, it is envisaged the flag symbols-with-pos-enabled will be bound
> > to non-nil only by the byte compiler. The reader resets this position to
> > zero for each top-level form it reads.
> I apologize if this topic already reached its conclusion, but IMO
> having eq return true for two different object types is quite
> surprising behaviour.
We are comparing two symbols, both of which are 'foo, but one of which is
annotated with its position in a source file. The two symbols are the
same symbol.
I understand the reaction to the idea, though. Even though the
representation of these two objects is different, conceptually they are
the same object.
But consider: on a make bootstrap I did last night, there were 332
warning messages from the byte compiler. Of these, only 80 gave the
correct line/column position, the other 252 being wrong. There have been
several bug reports from users complaining about such false positions.
This is what I'm trying to fix.
> Is it out of the question to leave eq alone and introduce, e.g.,
> eq-excluding-position that strips possible positions before comparison?
It is, rather. To implement this would involve rewriting everything
which calls eq and is used by the byte compiler, to call
eq-excluding-position instead. These functions would need to exist in
two versions. There are rather a lot of functions which use eq. ;-)
My actual strategy is to have two versions of each C primitive used by
the byte compiler, and to switch over to the "symbol-with-position"
version at the start of the byte compiler.
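One way to picture "two versions of each primitive, switched at the
start of the byte compiler" is dispatch through a function pointer. The
sketch below is only an illustration of that general idea in C; the
thread does not say how the switch is implemented, and every name here
(toy_obj, eq_plain, eq_with_pos, toy_enter_byte_compiler and so on) is
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy object: a symbol identity plus a flag saying whether this
   particular object is a position-carrying wrapper. */
struct toy_obj {
    int symbol_id;
    bool has_pos;
};

typedef bool (*toy_eq_fn)(const struct toy_obj *, const struct toy_obj *);

/* Fast version: plain identity, no extra checks. */
static bool eq_plain(const struct toy_obj *a, const struct toy_obj *b)
{
    assert(a != NULL && b != NULL);
    return a == b;
}

/* Slower version: also treats a wrapper as equal to its symbol. */
static bool eq_with_pos(const struct toy_obj *a, const struct toy_obj *b)
{
    assert(a != NULL && b != NULL);
    if (a == b)
        return true;
    return (a->has_pos || b->has_pos) && a->symbol_id == b->symbol_id;
}

/* The version currently in use; the fast one by default. */
static toy_eq_fn current_eq = eq_plain;

static void toy_enter_byte_compiler(void) { current_eq = eq_with_pos; }
static void toy_leave_byte_compiler(void) { current_eq = eq_plain; }
```

The point of such a design is that code outside the byte compiler keeps
calling the fast version and pays nothing for the extra comparison.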
--
Alan Mackenzie (Nuremberg, Germany).
* Comparing symbol-with-position using eq (was: Help please! To track down GC trying to free an already freed object.)
2019-04-05 8:26 ` Alan Mackenzie
@ 2019-04-05 17:05 ` Alex
2019-04-05 18:21 ` Comparing symbol-with-position using eq Alan Mackenzie
0 siblings, 1 reply; 24+ messages in thread
From: Alex @ 2019-04-05 17:05 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: Daniel Colascione, emacs-devel
Hello, Alan.
Alan Mackenzie <acm@muc.de> writes:
> On Thu, Apr 04, 2019 at 22:49:22 -0600, Alex wrote:
>
>> I apologize if this topic already reached its conclusion, but IMO
>> having eq return true for two different object types is quite
>> surprising behaviour.
>
> We are comparing two symbols, both of which are 'foo, but one of which is
> annotated with its position in a source file. The two symbols are the
> same symbol.
Is it not comparing a symbol with a pseudovector containing that symbol
and a position?
> I understand the reaction to the idea, though. Even though the
> representation of these two objects is different, conceptually they are
> the same object.
Similar objects, but I don't believe that's enough for eq. Consider that
it's regarded non-portable in Lisp to compare integers with eq since the
same number may be represented by different objects, or (eq 3 3.0), or
(eq (list 1 2) (list 1 2)).
> But consider: on a make bootstrap I did last night, there were 332
> warning messages from the byte compiler. Of these, only 80 gave the
> correct line/column position, the other 252 being wrong. There have been
> several bug reports from users complaining about such false positions.
> This is what I'm trying to fix.
I agree that it's a problem very much worth fixing; thank you for
working on it.
>> Is it out of the question to leave eq alone and introduce, e.g.,
>> eq-excluding-position that strips possible positions before comparison?
>
> It is, rather. To implement this would involve rewriting everything
> which calls eq and is used by the byte compiler, to call
> eq-excluding-position instead. These functions would need to exist in
> two versions. There are rather a lot of functions which use eq. ;-)
Why would you need to rewrite the helper procedures that the byte
compiler uses? What about stripping the position at each relevant call
site?
* Re: Comparing symbol-with-position using eq
2019-04-05 17:05 ` Comparing symbol-with-position using eq (was: Help please! To track down GC trying to free an already freed object.) Alex
@ 2019-04-05 18:21 ` Alan Mackenzie
2019-04-05 20:18 ` Daniel Colascione
0 siblings, 1 reply; 24+ messages in thread
From: Alan Mackenzie @ 2019-04-05 18:21 UTC (permalink / raw)
To: Alex; +Cc: Daniel Colascione, emacs-devel
Hello, Alex.
On Fri, Apr 05, 2019 at 11:05:59 -0600, Alex wrote:
> Hello, Alan.
> Alan Mackenzie <acm@muc.de> writes:
> > On Thu, Apr 04, 2019 at 22:49:22 -0600, Alex wrote:
> >> I apologize if this topic already reached its conclusion, but IMO
> >> having eq return true for two different object types is quite
> >> surprising behaviour.
> > We are comparing two symbols, both of which are 'foo, but one of which is
> > annotated with its position in a source file. The two symbols are the
> > same symbol.
> Is it not comparing a symbol with a pseudovector containing that symbol
> and a position?
At the machine code level, that is what it's doing, yes.
> > I understand the reaction to the idea, though. Even though the
> > representation of these two objects is different, conceptually they are
> > the same object.
> Similar objects, but I don't believe that's enough for eq. Consider that
> it's regarded non-portable in Lisp to compare integers with eq since the
> same number may be represented by different objects, or (eq 3 3.0), or
> (eq (list 1 2) (list 1 2)).
The point is that comparing 'foo with (Symbol "foo" at 339) with `eq',
and returning t doesn't do any harm. On the contrary, it enables correct
source positions to be output in byte compiler warning messages. That it
does no harm is verified by the fact that a make bootstrap with such
annotated symbols works.
However, there is a slight slowdown in this Emacs, compared with the
master branch. The powers that be have intimated that this slowdown is
unacceptable, so I'm having to make more far reaching changes in the C
code to confine this slowdown to byte compilation.
> > But consider: on a make bootstrap I did last night, there were 332
> > warning messages from the byte compiler. Of these, only 80 gave the
> > correct line/column position, the other 252 being wrong. There have been
> > several bug reports from users complaining about such false positions.
> > This is what I'm trying to fix.
> I agree that it's a problem very much worth fixing; thank you for
> working on it.
It's a difficult problem. The idea of annotating symbols with a source
position (this was Stefan M.'s idea) is the only idea which has even come
close to solving this problem. I was struggling with another approach
back in 2016 which involved keeping the source location in a hash table
indexed by the corresponding cons cell. This effort collapsed from the
sheer tedium of the changes needed, coupled with the unlikelihood of
getting the changes working, to say nothing of the fact it would have
rendered the byte compiler unreadable.
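
A toy sketch of that hash-table approach, in Python rather than Emacs Lisp, may make the difficulty concrete. Nothing here is taken from the 2016 code: tuples stand in for cons cells, and `positions`, `record`, `lookup` and `rewrite_when` are made-up names. The assumption being illustrated is that a table keyed by object identity only knows about the exact objects the reader produced, so a pass that builds a new form would have to copy the entry across by hand.

```python
# Toy model only: Python tuples stand in for cons cells / Lisp forms.
# All names below are hypothetical and exist purely for illustration.

positions = {}  # id(form) -> source position, i.e. keyed by object identity


def record(form, pos):
    """Remember where FORM was read from, keyed by identity (like an eq table)."""
    positions[id(form)] = pos
    return form


def lookup(form):
    """Return the recorded source position of FORM, or None if unknown."""
    return positions.get(id(form))


def rewrite_when(form):
    """Stand-in compiler pass: rewrite (when COND BODY) into (if COND BODY)."""
    _, cond, body = form
    return ("if", cond, body)  # a brand-new object, unknown to `positions`


original = record(("when", "flag", "(do-it)"), 339)
rewritten = rewrite_when(original)

print(lookup(original))   # 339
print(lookup(rewritten))  # None: the position did not follow the rewrite
```

In this model every transforming function would need an extra line to propagate the table entry, which is one plausible reading of the tedium described above.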
> >> Is it out of the question to leave eq alone and introduce, e.g.,
> >> eq-excluding-position that strips possible positions before comparison?
> > It is, rather. To implement this would involve rewriting everything
> > which calls eq and is used by the byte compiler, to call
> > eq-excluding-position instead. These functions would need to exist in
> > two versions. There are rather a lot of functions which use eq. ;-)
> Why would you need to rewrite the helper procedures that the byte
> compiler uses? What about stripping the position at each relevant call
> site?
I'm not sure what you mean here. If by "relevant call site" you mean
"places where `eq' is used", there are just too many of them. They're in
the C code as well as the Lisp. If you mean "places where the helper
procedures are called", then stripping the positions there would negate
the whole point of the symbols with positions, since it is these helper
procedures which output warning messages.
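
The second reading can be sketched as a toy model in Python (not Emacs code). The class `SymbolWithPos`, the function `strip_positions` and the helper `warn_unused` are all hypothetical; bare symbols are modelled as plain strings. The sketch only shows the argument made above: if the position is stripped before the helper is called, the helper has nothing left to report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SymbolWithPos:
    """Hypothetical stand-in for a symbol annotated with a source position."""
    name: str
    pos: int


def strip_positions(form):
    """Recursively replace annotated symbols with their bare names."""
    if isinstance(form, SymbolWithPos):
        return form.name
    if isinstance(form, list):
        return [strip_positions(x) for x in form]
    return form


def warn_unused(sym):
    """Hypothetical warning helper: it can only report a position it is given."""
    if isinstance(sym, SymbolWithPos):
        return f"{sym.pos}: unused variable `{sym.name}'"
    return f"?: unused variable `{sym}'"


foo = SymbolWithPos("foo", 339)

print(warn_unused(foo))                   # 339: unused variable `foo'
print(warn_unused(strip_positions(foo)))  # ?: unused variable `foo'
```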
Or did you mean something else?
--
Alan Mackenzie (Nuremberg, Germany).
* Re: Comparing symbol-with-position using eq
2019-04-05 18:21 ` Comparing symbol-with-position using eq Alan Mackenzie
@ 2019-04-05 20:18 ` Daniel Colascione
2019-04-05 21:54 ` Alan Mackenzie
0 siblings, 1 reply; 24+ messages in thread
From: Daniel Colascione @ 2019-04-05 20:18 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: Daniel Colascione, Alex, emacs-devel
> Hello, Alex.
>
> On Fri, Apr 05, 2019 at 11:05:59 -0600, Alex wrote:
>> Hello, Alan.
>
>> Alan Mackenzie <acm@muc.de> writes:
>
>> > On Thu, Apr 04, 2019 at 22:49:22 -0600, Alex wrote:
>
>> >> I apologize if this topic already reached its conclusion, but IMO
>> >> having eq return true for two different object types is quite
>> >> surprising behaviour.
>
>> > We are comparing two symbols, both of which are 'foo, but one of
>> > which is annotated with its position in a source file. The two
>> > symbols are the same symbol.
>
>> Is it not comparing a symbol with a pseudovector containing that symbol
>> and a position?
>
> At the machine code level, that is what it's doing, yes.
>
>> > I understand the reaction to the idea, though. Even though the
>> > representation of these two objects is different, conceptually they
>> > are the same object.
>
>> Similar objects, but I don't believe that's enough for eq. Consider that
>> it's regarded non-portable in Lisp to compare integers with eq since the
>> same number may be represented by different objects, or (eq 3 3.0), or
>> (eq (list 1 2) (list 1 2)).
>
> The point is that comparing 'foo with (Symbol "foo" at 339) with `eq',
> and returning t doesn't do any harm. On the contrary, it enables correct
> source positions to be output in byte compiler warning messages. That it
> does no harm is verified by the fact that a make bootstrap with such
> annotated symbols works.
>
> However, there is a slight slowdown in this Emacs, compared with the
> master branch. The powers that be have intimated that this slowdown is
> unacceptable, so I'm having to make more far reaching changes in the C
> code to confine this slowdown to byte compilation.
I'm also concerned that by overloading eq this way we'll make it easy to
"lose" information about positions. In general, when (eq a b), we can
substitute a for b and vice versa. The objects are equivalent in the
strongest sense. Now, they're not equivalent, and choosing a instead of b
can lead to subtle bugs, especially since we're talking about error-path
and warning-path code that might not be frequently exercised.
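
As a rough illustration of that substitution hazard, here is a toy model in Python (not Emacs code). `Symbol`, `SymbolWithPos`, `loose_eq` and `canonicalize` are hypothetical names; `loose_eq` stands in for an eq that ignores position annotations, and `canonicalize` for any code that treats eq objects as interchangeable.

```python
class Symbol:
    """Hypothetical bare symbol; object identity plays the role of eq."""
    def __init__(self, name):
        self.name = name


class SymbolWithPos:
    """Hypothetical wrapper: a bare symbol plus a source position."""
    def __init__(self, sym, pos):
        self.sym = sym
        self.pos = pos


def bare(obj):
    """Return the underlying bare symbol of OBJ."""
    return obj.sym if isinstance(obj, SymbolWithPos) else obj


def loose_eq(a, b):
    """Model of an eq that looks through position annotations."""
    return bare(a) is bare(b)


def canonicalize(obj, seen):
    """Return the first previously seen object that is loose_eq to OBJ."""
    for s in seen:
        if loose_eq(s, obj):
            return s
    seen.append(obj)
    return obj


foo = Symbol("foo")
foo_at_339 = SymbolWithPos(foo, 339)

seen = []
canonicalize(foo, seen)                  # the bare symbol happens to come first
result = canonicalize(foo_at_339, seen)  # "equal", so the bare one is returned

print(isinstance(result, SymbolWithPos))  # False: position 339 has been lost
```

The code is correct by the usual rules for eq, yet which of the two objects survives depends on the order in which they were seen.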
You mention that we'd need to change the use of EQ throughout the byte
compiler in order to work with positional symbols properly. Can we just do
that, in one big renaming patch? In cases where we don't want positions,
we can just define a macro making the new eq-for-position function
equivalent to eq.
But yes, it's kind of unfortunate that we haven't been using an explicit
AST representation.
* Re: Comparing symbol-with-position using eq
2019-04-05 20:18 ` Daniel Colascione
@ 2019-04-05 21:54 ` Alan Mackenzie
2019-04-05 22:50 ` Paul Eggert
2019-04-06 12:23 ` Clément Pit-Claudel
0 siblings, 2 replies; 24+ messages in thread
From: Alan Mackenzie @ 2019-04-05 21:54 UTC (permalink / raw)
To: Daniel Colascione; +Cc: Alex, emacs-devel
Hello, Daniel.
On Fri, Apr 05, 2019 at 13:18:55 -0700, Daniel Colascione wrote:
> > Hello, Alex.
> > On Fri, Apr 05, 2019 at 11:05:59 -0600, Alex wrote:
> >> Hello, Alan.
> >> Alan Mackenzie <acm@muc.de> writes:
> >> > On Thu, Apr 04, 2019 at 22:49:22 -0600, Alex wrote:
> >> >> I apologize if this topic already reached its conclusion, but IMO
> >> >> having eq return true for two different object types is quite
> >> >> surprising behaviour.
> >> > We are comparing two symbols, both of which are 'foo, but one of
> >> > which is annotated with its position in a source file. The two
> >> > symbols are the same symbol.
> >> Is it not comparing a symbol with a pseudovector containing that symbol
> >> and a position?
> > At the machine code level, that is what it's doing, yes.
> >> > I understand the reaction to the idea, though. Even though the
> >> > representation of these two objects is different, conceptually they
> >> > are the same object.
> >> Similar objects, but I don't believe that's enough for eq. Consider that
> >> it's regarded non-portable in Lisp to compare integers with eq since the
> >> same number may be represented by different objects, or (eq 3 3.0), or
> >> (eq (list 1 2) (list 1 2)).
> > The point is that comparing 'foo with (Symbol "foo" at 339) with `eq',
> > and returning t doesn't do any harm. On the contrary, it enables correct
> > source positions to be output in byte compiler warning messages. That it
> > does no harm is verified by the fact that a make bootstrap with such
> > annotated symbols works.
> > However, there is a slight slowdown in this Emacs, compared with the
> > master branch. The powers that be have intimated that this slowdown is
> > unacceptable, so I'm having to make more far reaching changes in the C
> > code to confine this slowdown to byte compilation.
> I'm also concerned that by overloading eq this way we'll make it easy to
> "lose" information about positions. In general, when (eq a b), we can
> substitute a for b and vice versa.
You could still do that (not that you'd want to), and your code would
still work up to the point where the byte compiler warning output
wouldn't have a position to output, and would degrade to a less accurate
position, in the limit not outputting a position at all. But this isn't
going to happen in practice.
A symbol with position is merely an annotated version of an ordinary
symbol. It behaves identically to that ordinary symbol, provided only
that the enabling flag, symbols-with-pos-enabled, is bound to non-nil.
The normal way these annotated symbols come into existence is via the
reader when a form is read with read-positioning-symbols (as contrasted
with the standard read).
All the details are in the code in branch scratch/accurate-warning-pos.
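
For readers without the branch to hand, the flag-gated behaviour described above can be modelled roughly as follows. This is a toy sketch in Python, not the C implementation: `Symbol`, `SymbolWithPos` and this `eq` are hypothetical, and only the flag name `symbols_with_pos_enabled` mirrors the variable mentioned above.

```python
class Symbol:
    """Hypothetical bare symbol; object identity plays the role of eq."""
    def __init__(self, name):
        self.name = name


class SymbolWithPos:
    """Hypothetical wrapper: a bare symbol plus a source position."""
    def __init__(self, sym, pos):
        self.sym = sym
        self.pos = pos


# Models the enabling flag; assumed to be off outside byte compilation.
symbols_with_pos_enabled = False


def eq(a, b):
    """Identity comparison that looks through annotations only when enabled."""
    if symbols_with_pos_enabled:
        if isinstance(a, SymbolWithPos):
            a = a.sym
        if isinstance(b, SymbolWithPos):
            b = b.sym
    return a is b


foo = Symbol("foo")
foo_at_339 = SymbolWithPos(foo, 339)

when_disabled = eq(foo, foo_at_339)  # plain identity: two different objects

symbols_with_pos_enabled = True
when_enabled = eq(foo, foo_at_339)   # annotation ignored: same symbol

print(when_disabled, when_enabled)   # False True
```

In this model the extra type checks are only paid for while the flag is on, which is the kind of confinement of the slowdown to byte compilation discussed earlier in the thread.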
> The objects are equivalent in the strongest sense. Now, they're not
> equivalent, and choosing a instead of b can lead to subtle bugs,
> especially since we're talking about error-path and warning-path code
> that might not be frequently exercised.
In the byte compiler, the warning path code is all too frequently
exercised. ;-( But I've just found (and fixed) a subtle bug, which was
what this thread was about. The fact that make bootstrap works with
these annotated symbols is a very strong test.
> You mention that we'd need to change the use of EQ throughout the byte
> compiler in order to work with positional symbols properly. Can we just do
> that, in one big renaming patch?
I'm not quite sure what you mean here, but I think the answer's no. The
byte compiler calls C primitives which use EQ.
> In cases where we don't want positions, we can just define a macro
> making the new eq-for-position function equivalent to eq.
> But yes, it's kind of unfortunate that we haven't been using an explicit
> AST representation.
I've been thinking that for the time (nearly 3 years) that I've been
trying to fix this bug. Is this how compilers for other Lisp systems are
written? It seems horribly easy to compile as Emacs does, by taking the
(read) starting form and gradually transforming it as a Lisp form. It is
difficult to keep track of (text) source positions when one does this.
--
Alan Mackenzie (Nuremberg, Germany).
* Re: Comparing symbol-with-position using eq
2019-04-05 21:54 ` Alan Mackenzie
@ 2019-04-05 22:50 ` Paul Eggert
2019-04-06 12:23 ` Clément Pit-Claudel
1 sibling, 0 replies; 24+ messages in thread
From: Paul Eggert @ 2019-04-05 22:50 UTC (permalink / raw)
To: Alan Mackenzie, Daniel Colascione; +Cc: Alex, emacs-devel
On 4/5/19 4:54 PM, Alan Mackenzie wrote:
> I've been thinking that for the time (nearly 3 years) that I've been
> trying to fix this bug. Is this how compilers for other Lisp systems are
> written? It seems horribly easy to compile as Emacs does, by taking the
> (read) starting form and gradually transforming it as a Lisp form.
Sure, it's standard for Lisp compilers to use a representation that is
somewhat more complicated than the original. This kind of practice goes
back a long way. For example, the Multics MACLISP compiler, although it
didn't do a full AST, systematically used a different representation
(i.e., not simple symbols) for variables, a representation that let the
compiler issue more-precise diagnostics. See Bernard Greenberg's
tutorial <https://multicians.org/lcp.html>.
Although this sort of thing does complicate the compiler, that's
typically better than complicating 'eq'. 'eq' is supposed to be verrry
simple and straightforward.
* Re: Comparing symbol-with-position using eq
2019-04-05 21:54 ` Alan Mackenzie
2019-04-05 22:50 ` Paul Eggert
@ 2019-04-06 12:23 ` Clément Pit-Claudel
1 sibling, 0 replies; 24+ messages in thread
From: Clément Pit-Claudel @ 2019-04-06 12:23 UTC (permalink / raw)
To: emacs-devel
On 2019-04-05 17:54, Alan Mackenzie wrote:
> I've been thinking that for the time (nearly 3 years) that I've been
> trying to fix this bug. Is this how compilers for other Lisp systems are
> written? It seems horribly easy to compile as Emacs does, by taking the
> (read) starting form and gradually transforming it as a Lisp form. It is
> difficult to keep track of (text) source positions when one does this.
The following page, about how Racket does this sort of thing, might be of interest: https://docs.racket-lang.org/reference/Syntax_Quoting__quote-syntax.html
The idea is that macros (not just the byte-compiler) may want to access position information, to issue better diagnostics. This is useful for small languages implemented using macros, like cl-loop.
Clément.
end of thread, other threads:[~2019-04-06 12:23 UTC | newest]
Thread overview: 24+ messages
2019-04-02 11:25 Help please! To track down GC trying to free an already freed object Alan Mackenzie
2019-04-02 15:04 ` Eli Zaretskii
2019-04-02 20:42 ` Alan Mackenzie
2019-04-03 4:43 ` Eli Zaretskii
2019-04-04 18:57 ` Alan Mackenzie
2019-04-02 19:09 ` Daniel Colascione
2019-04-02 19:21 ` Eli Zaretskii
2019-04-02 20:46 ` Alan Mackenzie
2019-04-02 21:03 ` Daniel Colascione
2019-04-03 4:39 ` Eli Zaretskii
2019-04-03 10:01 ` Alan Mackenzie
2019-04-03 10:12 ` Eli Zaretskii
2019-04-03 15:23 ` Paul Eggert
2019-04-02 20:24 ` Alan Mackenzie
2019-04-02 20:33 ` Daniel Colascione
2019-04-02 21:00 ` Alan Mackenzie
2019-04-05 4:49 ` Alex
2019-04-05 8:26 ` Alan Mackenzie
2019-04-05 17:05 ` Comparing symbol-with-position using eq (was: Help please! To track down GC trying to free an already freed object.) Alex
2019-04-05 18:21 ` Comparing symbol-with-position using eq Alan Mackenzie
2019-04-05 20:18 ` Daniel Colascione
2019-04-05 21:54 ` Alan Mackenzie
2019-04-05 22:50 ` Paul Eggert
2019-04-06 12:23 ` Clément Pit-Claudel