unofficial mirror of emacs-devel@gnu.org 
* GC and stack marking
@ 2014-05-19 16:31 Eli Zaretskii
  2014-05-19 18:47 ` Paul Eggert
  2014-05-20 13:44 ` Stefan Monnier
  0 siblings, 2 replies; 40+ messages in thread
From: Eli Zaretskii @ 2014-05-19 16:31 UTC (permalink / raw)
  To: emacs-devel; +Cc: Fabrice Popineau

I have a question regarding GC and stack marking.

This issue popped up during testing of the new code written by Fabrice
for managing Emacs memory on MS-Windows.  I don't think this issue is
Windows specific, and I don't think the details of the new
implementation matter for what I'm about to ask (but if someone wants
the gory details, please holler).

The short version of the question is: is it possible that a Lisp
object which is no longer referenced by anything won't be GC'ed
because it is marked by mark_stack due to some kind of coincidence?

The specific situation where I think I see something like this is
during dumping.  When temacs loads and runs loadup.el, it does this
near the beginning:

  (if (eq t purify-flag)
      (setq purify-flag (make-hash-table :test 'equal :size 70000)))

This creates a large hash-table and stores its reference in
purify-flag.  Then, after loading all the preloaded packages, temacs
does this:

  ;; Avoid error if user loads some more libraries now and make sure the
  ;; hash-consing hash table is GC'd.
  (setq purify-flag nil)

  (if (null (garbage-collect))
      (setq pure-space-overflow t))

Note the comment: "...and make sure the hash-consing hash table is
GC'd.".  Well, on one machine to which I have access, it isn't GC'd.
Why?  Because mark_stack happens to find its address somewhere on the
stack.  (I have a backtrace to prove it.)  So the huge hash-table gets
dumped into the emacs executable, and causes all kinds of trouble in
the dumped Emacs.

On another machine (with a different version of the OS and of GCC),
the problem doesn't happen, and the table is indeed GC'd.

My question is: is this a legitimate situation?  Since all mark_stack
does is look for values recorded in the red-black tree, it might find
such a value by sheer luck (or lack thereof).  Right?  Or is this a
bug that needs to be researched further?
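
To make the coincidence concrete, here is a minimal sketch of
conservative scanning in C.  It is not the Emacs code: `struct block`,
`find_block` and `scan_words` are hypothetical names, and a sorted
array with binary search stands in for the red-black tree that
mark_stack consults.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One live allocation known to the collector (hypothetical type).  */
struct block
{
  uintptr_t start;   /* address of the first byte of the object */
  size_t size;       /* object size in bytes */
  int marked;        /* set when a scan "finds" the object */
};

/* Return the block containing ADDR, or NULL.  BLOCKS must be sorted by
   start address and must not overlap.  */
static struct block *
find_block (struct block *blocks, size_t n, uintptr_t addr)
{
  size_t lo = 0, hi = n;
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (addr < blocks[mid].start)
        hi = mid;
      else if (addr - blocks[mid].start >= blocks[mid].size)
        lo = mid + 1;
      else
        return &blocks[mid];
    }
  return NULL;
}

/* Treat every word in WORDS[0..NWORDS) as a potential pointer and mark
   whatever it happens to hit.  Return the number of newly marked
   blocks.  */
static size_t
scan_words (const uintptr_t *words, size_t nwords,
            struct block *blocks, size_t nblocks)
{
  size_t newly_marked = 0;
  assert (words != NULL || nwords == 0);
  for (size_t i = 0; i < nwords; i++)
    {
      struct block *b = find_block (blocks, nblocks, words[i]);
      if (b && !b->marked)
        {
          b->marked = 1;
          newly_marked++;
        }
    }
  return newly_marked;
}
```

The scan cannot tell a stale integer from a real reference: any word
whose value falls inside a block keeps that block alive.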

If this can legitimately happen, then how can we make sure this
hash-table indeed gets GC'd before we dump Emacs?

TIA




* Re: GC and stack marking
  2014-05-19 16:31 GC and stack marking Eli Zaretskii
@ 2014-05-19 18:47 ` Paul Eggert
  2014-05-19 19:14   ` Eli Zaretskii
  2014-05-20 13:44 ` Stefan Monnier
  1 sibling, 1 reply; 40+ messages in thread
From: Paul Eggert @ 2014-05-19 18:47 UTC (permalink / raw)
  To: Eli Zaretskii, emacs-devel; +Cc: Fabrice Popineau

On 05/19/2014 09:31 AM, Eli Zaretskii wrote:
> is it possible that a Lisp object which is no longer referenced by anything won't be GC'ed because it is marked by mark_stack due to some kind of coincidence?

Yes.  Normally Emacs uses a conservative approach, which means it 
occasionally does not collect something that is in fact garbage.  See, 
for example, 
<https://www.gnu.org/software/guile/manual/html_node/Conservative-GC.html>.

> how can we make sure this hash-table indeed gets GC'd before we dump Emacs?
>

We could have the garbage collector treat purify-flag specially, I suppose.




* Re: GC and stack marking
  2014-05-19 18:47 ` Paul Eggert
@ 2014-05-19 19:14   ` Eli Zaretskii
  2014-05-19 19:58     ` Paul Eggert
  0 siblings, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2014-05-19 19:14 UTC (permalink / raw)
  To: Paul Eggert; +Cc: fabrice.popineau, emacs-devel

> Date: Mon, 19 May 2014 11:47:28 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: Fabrice Popineau <fabrice.popineau@gmail.com>
> 
> On 05/19/2014 09:31 AM, Eli Zaretskii wrote:
> > is it possible that a Lisp object which is no longer referenced by anything won't be GC'ed because it is marked by mark_stack due to some kind of coincidence?
> 
> Yes.  Normally Emacs uses a conservative approach, which means it 
> occasionally does not collect something that is in fact garbage.  See, 
> for example, 
> <https://www.gnu.org/software/guile/manual/html_node/Conservative-GC.html>.

Thanks for confirming.  I couldn't explain what I saw in the debugger
except as such a coincidence.

> > how can we make sure this hash-table indeed gets GC'd before we dump Emacs?
> >
> 
> We could have the garbage collector treat purify-flag specially, I suppose.

I'm not sure I understand the suggestion.  Can you elaborate?




* Re: GC and stack marking
  2014-05-19 19:14   ` Eli Zaretskii
@ 2014-05-19 19:58     ` Paul Eggert
  2014-05-19 20:03       ` Eli Zaretskii
  0 siblings, 1 reply; 40+ messages in thread
From: Paul Eggert @ 2014-05-19 19:58 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: fabrice.popineau, emacs-devel

On 05/19/2014 12:14 PM, Eli Zaretskii wrote:
>> >
>> >We could have the garbage collector treat purify-flag specially, I suppose.
> I'm not sure I understand the suggestion.  Can you elaborate?

I was thinking of a horrible hack where the GC knows about purify-flag, 
so that when you set purify-flag to nil the GC immediately frees the 
object that purify-flag used to contain, and that we make purify-flag 
special in this way.

I hope there's a better idea out there somewhere.  Maybe we should get 
rid of the hash-table purify-flag hack, for example.  I'm just thinking 
out loud here.




* Re: GC and stack marking
  2014-05-19 19:58     ` Paul Eggert
@ 2014-05-19 20:03       ` Eli Zaretskii
  2014-05-19 20:17         ` Paul Eggert
  0 siblings, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2014-05-19 20:03 UTC (permalink / raw)
  To: Paul Eggert; +Cc: fabrice.popineau, emacs-devel

> Date: Mon, 19 May 2014 12:58:34 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: emacs-devel@gnu.org, fabrice.popineau@gmail.com
> 
> On 05/19/2014 12:14 PM, Eli Zaretskii wrote:
> >> >
> >> >We could have the garbage collector treat purify-flag specially, I suppose.
> > I'm not sure I understand the suggestion.  Can you elaborate?
> 
> I was thinking of a horrible hack where the GC knows about purify-flag, 
> so that when you set purify-flag to nil the GC immediately frees the 
> object that purify-flag used to contain, and that we make purify-flag 
> special in this way.

Right, but that would only work for that single object.  The problem,
by contrast, sounds more general than that.

> Maybe we should get rid of the hash-table purify-flag hack, for
> example.

Maybe, I'm not sure I fully understand its purpose to begin with
(speed of finding objects?).  There's a comment by Stefan there saying
something about some savings.




* Re: GC and stack marking
  2014-05-19 20:03       ` Eli Zaretskii
@ 2014-05-19 20:17         ` Paul Eggert
  2014-05-20 16:37           ` Eli Zaretskii
  0 siblings, 1 reply; 40+ messages in thread
From: Paul Eggert @ 2014-05-19 20:17 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: fabrice.popineau, emacs-devel

On 05/19/2014 01:03 PM, Eli Zaretskii wrote:
>   The problem, by contrast, sounds more general than that.

Yes, it's a general problem with conservative garbage collection; it's 
why such garbage collection is called "conservative" rather than "accurate".

If it's essential that GC be accurate, then Emacs shouldn't be using 
conservative GC.  My impression, though, is that the goal is to arrange 
Emacs's internals so that accurate GC isn't essential.  If purify-flag 
is a counterexample, it's almost surely simpler to change how purify-flag 
works than to insist on accurate GC.

What happens if you change this:

(setq purify-flag nil)

to something like this?

(clrhash purify-flag)
(setq purify-flag nil)





* Re: GC and stack marking
  2014-05-19 16:31 GC and stack marking Eli Zaretskii
  2014-05-19 18:47 ` Paul Eggert
@ 2014-05-20 13:44 ` Stefan Monnier
  2014-05-20 16:57   ` Eli Zaretskii
  2014-05-31  6:31   ` Florian Weimer
  1 sibling, 2 replies; 40+ messages in thread
From: Stefan Monnier @ 2014-05-20 13:44 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Fabrice Popineau, emacs-devel

> The short version of the question is: is it possible that a Lisp
> object which is no longer referenced by anything won't be GC'ed
> because it is marked by mark_stack due to some kind of coincidence?

Yes, of course, it's what makes a conservative marking conservative.

> So the huge hash-table gets dumped into the emacs executable, and

That's bad luck, indeed.

> causes all kinds of trouble in the dumped Emacs.

But it shouldn't cause any trouble (other than extra memory use).

> If this can legitimately happen, then how can we make sure this
> hash-table indeed gets GC'd before we dump Emacs?

First we should make sure that even if this table is not GC'd, Emacs
behaves correctly.  Otherwise, we probably have a bug that can appear in
other situations.

As for ensuring that the table gets GC'd, there are 2 approaches:
- provide a low-level "free-this-table" function which loadup.el could
  use.  This is dangerous, since it basically says "trust me there are
  no other references to this object".  Even implementing this function
  can be tricky; it would probably be easier to provide it as
  a C function only.
- find where the spurious "reference" is coming from and add code to set
  this reference to some other value (e.g. it might be some variable
  left uninitialized, or a dead variable which we could explicitly set
  back to NULL or something), or to mark this memory location as "not
  a pointer" (like GCPRO but reversed: we'd do a NEGATIVE_GCPRO on the
  var (presumably of a type like int or float)).

Boehm's GC has developed ways to do this second option
automatically: if during a GC, a memory cell is found to "point to"
unallocated memory, then it is assumed to be of non-pointer type and
this fact is recorded somewhere so that if in subsequent GC's this cell
ends up "pointing" to allocated memory that won't be considered as an
actual pointer.  This can be very important when you get close to using
the whole address space, in which case most addresses are allocated, so
that many/most ints and floats end up spuriously pointing "somewhere".

This doesn't work for us, tho, because we don't know when a stack
location is reused for some other purpose (i.e. when it changes type),
and more importantly because we have the Lisp_Object type which is
a memory cell which can sometimes contain integers and
sometimes pointers.  OTOH, we are only conservative w.r.t stack
scanning, so we're only subject to spurious pointers coming from the
stack, not from the rest of the heap.  And furthermore we have the great
advantage that, as an interactive application, our stack regularly comes
back to "almost empty" (and since we do "opportunistic GC" during idle
time, we often GC at the very moment the stack is almost empty).
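
A minimal sketch of the "remember which cells are not pointers" scheme
as it is described above.  It follows the description in this message
only and makes no claim about Boehm GC's real data structures;
`struct region`, `struct heap` and `scan_with_blacklist` are
hypothetical names, and the heap is simplified to one contiguous range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NCELLS 4

/* A scanned region plus one "known not to be a pointer" flag per
   cell.  */
struct region
{
  uintptr_t cell[NCELLS];
  bool not_pointer[NCELLS];
};

/* Hypothetical heap: a single allocated range [start, end).  */
struct heap
{
  uintptr_t start, end;
};

static bool
points_into (const struct heap *h, uintptr_t v)
{
  return h->start <= v && v < h->end;
}

/* Scan R against H and return how many cells are treated as real
   references.  A cell seen "pointing" at unallocated memory is
   recorded as a non-pointer and ignored in every later scan.  */
static size_t
scan_with_blacklist (struct region *r, const struct heap *h)
{
  size_t refs = 0;
  assert (r != NULL && h != NULL);
  for (size_t i = 0; i < NCELLS; i++)
    {
      if (r->not_pointer[i])
        continue;
      if (points_into (h, r->cell[i]))
        refs++;
      else
        r->not_pointer[i] = true;
    }
  return refs;
}
```

Once the heap later grows to cover a blacklisted cell's value, that
cell still does not count as a reference.  The sketch also shows the
limitation raised above: the flag is attached to the cell, so it goes
stale as soon as the cell is reused for a value of another type.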


        Stefan




* Re: GC and stack marking
  2014-05-19 20:17         ` Paul Eggert
@ 2014-05-20 16:37           ` Eli Zaretskii
  0 siblings, 0 replies; 40+ messages in thread
From: Eli Zaretskii @ 2014-05-20 16:37 UTC (permalink / raw)
  To: Paul Eggert; +Cc: fabrice.popineau, emacs-devel

> Date: Mon, 19 May 2014 13:17:40 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: emacs-devel@gnu.org, fabrice.popineau@gmail.com
> 
> What happens if you change this:
> 
> (setq purify-flag nil)
> 
> to something like this?
> 
> (clrhash purify-flag)
> (setq purify-flag nil)

Thanks, but it didn't help.




* Re: GC and stack marking
  2014-05-20 13:44 ` Stefan Monnier
@ 2014-05-20 16:57   ` Eli Zaretskii
  2014-05-20 17:54     ` Stefan Monnier
  2014-05-20 19:12     ` Daniel Colascione
  2014-05-31  6:31   ` Florian Weimer
  1 sibling, 2 replies; 40+ messages in thread
From: Eli Zaretskii @ 2014-05-20 16:57 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: fabrice.popineau, emacs-devel

> From: Stefan Monnier <monnier@IRO.UMontreal.CA>
> Cc: emacs-devel@gnu.org, Fabrice Popineau <fabrice.popineau@gmail.com>
> Date: Tue, 20 May 2014 09:44:05 -0400
> 
> > The short version of the question is: is it possible that a Lisp
> > object which is no longer referenced by anything won't be GC'ed
> > because it is marked by mark_stack due to some kind of coincidence?
> 
> Yes, of course, it's what makes a conservative marking conservative.

I have nothing against conservative, but this failure to GC is too
spectacular to ignore.

> > So the huge hash-table gets dumped into the emacs executable, and
> 
> That's bad luck, indeed.
> 
> > causes all kinds of trouble in the dumped Emacs.
> 
> But it shouldn't cause any trouble (other than extra memory use).

It does, due to all kinds of subtleties.  The result is that the
large_vectors linked list gets dumped with a pointer to a non-existent
memory, and the dumped Emacs then crashes on the first GC when it
tries to traverse that linked list.

> > If this can legitimately happen, then how can we make sure this
> > hash-table indeed gets GC'd before we dump Emacs?
> 
> First we should make sure that even if this table is not GC'd, Emacs
> behaves correctly.

Fabrice might have found a work-around, so there is hope.  I found a
way to kludge around it, but my solution is more fragile.

> Otherwise, we probably have a bug that can appear in
> other situations.
> 
> - find where the spurious "reference" is coming from and add code to set
>   this reference to some other value

I think this is hopeless: I see this problem on a single system; two
others don't have it.  It's just some semi-random garbage somewhere on
the stack.




* Re: GC and stack marking
  2014-05-20 16:57   ` Eli Zaretskii
@ 2014-05-20 17:54     ` Stefan Monnier
  2014-05-20 19:28       ` Eli Zaretskii
  2014-05-20 19:12     ` Daniel Colascione
  1 sibling, 1 reply; 40+ messages in thread
From: Stefan Monnier @ 2014-05-20 17:54 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: fabrice.popineau, emacs-devel

>> But it shouldn't cause any trouble (other than extra memory use).
> It does, due to all kinds of subtleties.  The result is that the
> large_vectors linked list gets dumped with a pointer to a non-existent
> memory, and the dumped Emacs then crashes on the first GC when it
> tries to traverse that linked list.

We should fix that.

> I think this is hopeless: I see this problem on a single system; two
> others don't have it.  It's just some semi-random garbage somewhere on
> the stack.

Of course, but if you can find where it comes from, we can fix that
one case.  After all, we don't know of any other anyway.


        Stefan




* Re: GC and stack marking
  2014-05-20 16:57   ` Eli Zaretskii
  2014-05-20 17:54     ` Stefan Monnier
@ 2014-05-20 19:12     ` Daniel Colascione
  2014-05-20 19:43       ` Eli Zaretskii
  1 sibling, 1 reply; 40+ messages in thread
From: Daniel Colascione @ 2014-05-20 19:12 UTC (permalink / raw)
  To: Eli Zaretskii, Stefan Monnier; +Cc: fabrice.popineau, emacs-devel


On 05/20/2014 09:57 AM, Eli Zaretskii wrote:
>> From: Stefan Monnier <monnier@IRO.UMontreal.CA>
>> Cc: emacs-devel@gnu.org, Fabrice Popineau <fabrice.popineau@gmail.com>
>> Date: Tue, 20 May 2014 09:44:05 -0400
>>
>>> The short version of the question is: is it possible that a Lisp
>>> object which is no longer referenced by anything won't be GC'ed
>>> because it is marked by mark_stack due to some kind of coincidence?
>>
>> Yes, of course, it's what makes a conservative marking conservative.
> 
> I have nothing against conservative, but this failure to GC is too
> spectacular to ignore.
> 
>>> So the huge hash-table gets dumped into the emacs executable, and
>>
>> That's bad luck, indeed.
>>
>>> causes all kinds of trouble in the dumped Emacs.
>>
>> But it shouldn't cause any trouble (other than extra memory use).
> 
> It does, due to all kinds of subtleties.  The result is that the
> large_vectors linked list gets dumped with a pointer to a non-existent
> memory, and the dumped Emacs then crashes on the first GC when it
> tries to traverse that linked list.

Can you elaborate on how that happens? This behavior sounds like a plain
GC bug.



* Re: GC and stack marking
  2014-05-20 17:54     ` Stefan Monnier
@ 2014-05-20 19:28       ` Eli Zaretskii
  2014-05-20 22:01         ` Stefan Monnier
  0 siblings, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2014-05-20 19:28 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: fabrice.popineau, emacs-devel

> From: Stefan Monnier <monnier@IRO.UMontreal.CA>
> Cc: emacs-devel@gnu.org, fabrice.popineau@gmail.com
> Date: Tue, 20 May 2014 13:54:16 -0400
> 
> >> But it shouldn't cause any trouble (other than extra memory use).
> > It does, due to all kinds of subtleties.  The result is that the
> > large_vectors linked list gets dumped with a pointer to a non-existent
> > memory, and the dumped Emacs then crashes on the first GC when it
> > tries to traverse that linked list.
> 
> We should fix that.

No argument here.  Otherwise the dumped Emacs crashes.

> > I think this is hopeless: I see this problem on a single system; two
> > others don't have it.  It's just some semi-random garbage somewhere on
> > the stack.
> 
> Of course, but if you can find where it comes from, we can fix that
> one case.

I tried, but couldn't.  Suggestions for how to set up a GDB session
for that are welcome.




* Re: GC and stack marking
  2014-05-20 19:12     ` Daniel Colascione
@ 2014-05-20 19:43       ` Eli Zaretskii
  2014-05-20 22:03         ` Stefan Monnier
  0 siblings, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2014-05-20 19:43 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: fabrice.popineau, monnier, emacs-devel

> Date: Tue, 20 May 2014 12:12:45 -0700
> From: Daniel Colascione <dancol@dancol.org>
> CC: fabrice.popineau@gmail.com, emacs-devel@gnu.org
> 
> >> But it shouldn't cause any trouble (other than extra memory use).
> > 
> > It does, due to all kinds of subtleties.  The result is that the
> > large_vectors linked list gets dumped with a pointer to a non-existent
> > memory, and the dumped Emacs then crashes on the first GC when it
> > tries to traverse that linked list.
> 
> Can you elaborate on how that happens? This behavior sounds like a plain
> GC bug.

It's not a bug in GC.  The memory management scheme that Fabrice wrote
does not dump the heap (because doing that is problematic on Windows,
and requires addition of a separate section to the executable, which
then precludes its stripping, and has also other complexities).
Instead, temacs uses a private fixed-address heap that is located in a
static array, and whose memory is allocated by a replacement malloc
function.  So any address that points to memory allocated not in that
array, but in the real heap provided by malloc from libc, cannot be
safely dumped, because in the dumped Emacs it will point to some
random location.

Now, the large_vectors list is a linked list chained via the next
pointer.  If one of these next pointers points to a memory on the
heap, following it in the dumped Emacs will surely crash.  There's no
way GC can work around that, when it traverses that linked list.
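
A minimal sketch of that constraint, under stated assumptions:
`dumped_heap`, `struct large_block` and the helper functions are
hypothetical stand-ins, not the names used in Fabrice's code or in
alloc.c.  The idea is only that every node reachable through the next
pointers must live inside the static array for the chain to be valid
in the dumped executable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the private heap that lives in a static array and is
   therefore written out with the executable.  */
#define DUMPED_HEAP_SIZE ((size_t) 1 << 16)
static _Alignas (max_align_t) unsigned char dumped_heap[DUMPED_HEAP_SIZE];

/* True if P points into the static array, i.e. stays valid after
   dumping.  Addresses are compared as integers so that pointers into
   unrelated objects can be tested.  */
static bool
in_dumped_heap (const void *p)
{
  uintptr_t a = (uintptr_t) p;
  uintptr_t base = (uintptr_t) dumped_heap;
  return a >= base && a - base < DUMPED_HEAP_SIZE;
}

/* A list chained through `next`, like the one described above.  */
struct large_block
{
  struct large_block *next;
};

/* Walk the chain from HEAD and return the first node that would
   dangle after dumping, or NULL if every node is in the array.  */
static struct large_block *
first_undumpable (struct large_block *head)
{
  for (struct large_block *b = head; b; b = b->next)
    if (!in_dumped_heap (b))
      return b;
  return NULL;
}

/* Intended to run just before dumping: abort if any node would
   dangle.  */
static void
assert_dumpable (struct large_block *head)
{
  assert (first_undumpable (head) == NULL);
}
```

A check of this kind only detects the problem at dump time; it does
not explain why the offending object was still on the list.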




* Re: GC and stack marking
  2014-05-20 19:28       ` Eli Zaretskii
@ 2014-05-20 22:01         ` Stefan Monnier
  2014-05-21  2:48           ` Eli Zaretskii
  0 siblings, 1 reply; 40+ messages in thread
From: Stefan Monnier @ 2014-05-20 22:01 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: fabrice.popineau, emacs-devel

> I tried, but couldn't.  Suggestions for how to set up a GDB session
> for that are welcome.

I guess you could try the following:
- interrupt the dump just before setting purify-flag to nil.
- get the value of purify-flag.
- set a conditional break point in mark_object that catches the case
  where the argument is equal to the value you just got.
- "c"


        Stefan




* Re: GC and stack marking
  2014-05-20 19:43       ` Eli Zaretskii
@ 2014-05-20 22:03         ` Stefan Monnier
  2014-05-21  2:51           ` Eli Zaretskii
  0 siblings, 1 reply; 40+ messages in thread
From: Stefan Monnier @ 2014-05-20 22:03 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Daniel Colascione, fabrice.popineau, emacs-devel

> It's not a bug in GC.  The memory management scheme that Fabrice wrote
> does not dump the heap (because doing that is problematic on Windows,
> and requires addition of a separate section to the executable, which
> then precludes its stripping, and has also other complexities).
> Instead, temacs uses a private fixed-address heap that is located in a
> static array, and whose memory is allocated by a replacement malloc
> function.  So any address that points to memory allocated not in that
> array, but in the real heap provided by malloc from libc, cannot be
> safely dumped, because in the dumped Emacs it will point to some
> random location.

OK, so why is the hash table allocated elsewhere than the other objects
(I understand why one might want to do that, but the question is about
what is different in the code in the case of this purify-flag hash-table
compared to other vectors/hashtables allocated during the dump).

Is it just based on size?  I.e. would the same problem show up if some
large vector were to be allocated (and not freed) before dumping?


        Stefan




* Re: GC and stack marking
  2014-05-20 22:01         ` Stefan Monnier
@ 2014-05-21  2:48           ` Eli Zaretskii
  2014-05-21  3:01             ` Stefan Monnier
  0 siblings, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2014-05-21  2:48 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: fabrice.popineau, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: emacs-devel@gnu.org,  fabrice.popineau@gmail.com
> Date: Tue, 20 May 2014 18:01:05 -0400
> 
> > I tried, but couldn't.  Suggestions for how to set up a GDB session
> > for that are welcome.
> 
> I guess you could try the following:
> - interrupt the dump just before setting purify-flag to nil.
> - get the value of purify-flag.
> - set a conditional break point in mark_object that catches the case
>   where the argument is equal to the value you just got.
> - "c"

That's how I found out that it was being marked by mark_stack.  But
that doesn't tell you how that value _got_ on the stack, does it?




* Re: GC and stack marking
  2014-05-20 22:03         ` Stefan Monnier
@ 2014-05-21  2:51           ` Eli Zaretskii
  0 siblings, 0 replies; 40+ messages in thread
From: Eli Zaretskii @ 2014-05-21  2:51 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: dancol, fabrice.popineau, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Daniel Colascione <dancol@dancol.org>,  fabrice.popineau@gmail.com,  emacs-devel@gnu.org
> Date: Tue, 20 May 2014 18:03:51 -0400
> 
> > It's not a bug in GC.  The memory management scheme that Fabrice wrote
> > does not dump the heap (because doing that is problematic on Windows,
> > and requires addition of a separate section to the executable, which
> > then precludes its stripping, and has also other complexities).
> > Instead, temacs uses a private fixed-address heap that is located in a
> > static array, and whose memory is allocated by a replacement malloc
> > function.  So any address that points to memory allocated not in that
> > array, but in the real heap provided by malloc from libc, cannot be
> > safely dumped, because in the dumped Emacs it will point to some
> > random location.
> 
> OK, so why is the hash table allocated elsewhere than the other objects
> (I understand why one might want to do that, but the question is about
> what is different in the code in the case of this purify-flag hash-table
> compared to other vectors/hashtables allocated during the dump).

Because fixed-address heaps on Windows are limited to allocations
whose size is at most 0x7f000, and one of the vectors allocated for a
70K hash-table is larger than that.
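
As a rough illustration of the arithmetic, here is a sketch in C.  The
0x7f000 limit is the one quoted above; everything else is an
assumption made for the example: the names `arena_for` and
`vector_bytes`, the fall-through-to-system-heap policy, and the
supposition that the table keeps keys and values together in one
vector of 2 x 70000 four-byte words (headers ignored).

```c
#include <assert.h>
#include <stddef.h>

/* Largest single allocation the fixed-address heap accepts, per the
   message above.  */
#define FIXED_HEAP_MAX_ALLOC ((size_t) 0x7f000)

enum arena { ARENA_FIXED, ARENA_SYSTEM };

/* Decide where a request of NBYTES would land.  Hypothetical policy:
   anything over the limit falls through to the system heap.  */
static enum arena
arena_for (size_t nbytes)
{
  assert (nbytes > 0);
  return nbytes <= FIXED_HEAP_MAX_ALLOC ? ARENA_FIXED : ARENA_SYSTEM;
}

/* Bytes needed for a vector of NSLOTS words of WORD_SIZE bytes each,
   ignoring any header.  */
static size_t
vector_bytes (size_t nslots, size_t word_size)
{
  return nslots * word_size;
}
```

Under those assumptions the combined vector needs 560000 bytes, which
is more than 0x7f000 (520192), so `arena_for` sends it to the system
heap, while a 70000-word vector (280000 bytes) would still fit.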

> Is it just based on size?  I.e. would the same problem show up if some
> large vector were to be allocated (and not freed) before dumping?

Yes.  And not just large vectors, any large object (e.g., string).
And that's what scared me, because I can always find a solution for
the case I know of, but how to make this reliable in the face of
future changes in Emacs?

Anyway, it looks like Fabrice found a way to work around the above
limitation, so I guess this issue is no longer such a big problem.




* Re: GC and stack marking
  2014-05-21  2:48           ` Eli Zaretskii
@ 2014-05-21  3:01             ` Stefan Monnier
  2014-05-21 15:39               ` Eli Zaretskii
  0 siblings, 1 reply; 40+ messages in thread
From: Stefan Monnier @ 2014-05-21  3:01 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: fabrice.popineau, emacs-devel

> That's how I found out that it was being marked by mark_stack.  But
> that doesn't tell you how that value _got_ on the stack, does it?

No, but it does tell you its address in the stack, so you can then walk
up the backtrace and look at the address of local variables until you
(hopefully) find the one that matters.


        Stefan




* Re: GC and stack marking
  2014-05-21  3:01             ` Stefan Monnier
@ 2014-05-21 15:39               ` Eli Zaretskii
  2014-05-21 15:57                 ` Dmitry Antipov
  2014-05-21 17:40                 ` Stefan Monnier
  0 siblings, 2 replies; 40+ messages in thread
From: Eli Zaretskii @ 2014-05-21 15:39 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: fabrice.popineau, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: emacs-devel@gnu.org,  fabrice.popineau@gmail.com
> Date: Tue, 20 May 2014 23:01:24 -0400
> 
> > That's how I found out that it was being marked by mark_stack.  But
> > that doesn't tell you how that value _got_ on the stack, does it?
> 
> No, but it does tell you its address in the stack, so you can then walk
> up the backtrace and look at the address of local variables until you
> (hopefully) find the one that matters.

I already tried that before, and came up empty-handed.  I tried again
now; the address of that value on the stack does not correspond to any
local variable in the corresponding stack frame, and I also cannot
find that address in the disassembly of the function whose stack frame
includes the value.

I might try setting a watchpoint at that address, but that might be
impractical; we shall see.

Now, I have a question: mark_stack stops examining the stack when it
gets to its own stack frame.  That is certainly safe, but it sounds
too conservative: it should stop at the stack frame of
Fgarbage_collect, I think, because no live Lisp object can appear
while Fgarbage_collect runs, right?




* Re: GC and stack marking
  2014-05-21 15:39               ` Eli Zaretskii
@ 2014-05-21 15:57                 ` Dmitry Antipov
  2014-05-21 16:06                   ` Dmitry Antipov
  2014-05-21 16:53                   ` Eli Zaretskii
  2014-05-21 17:40                 ` Stefan Monnier
  1 sibling, 2 replies; 40+ messages in thread
From: Dmitry Antipov @ 2014-05-21 15:57 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: fabrice.popineau, Stefan Monnier, emacs-devel

On 05/21/2014 07:39 PM, Eli Zaretskii wrote:

> Now, I have a question: mark_stack stops examining the stack when it
> gets to its own stack frame.  That is certainly safe, but it sounds
> too conservative: it should stop at the stack frame of
> Fgarbage_collect, I think, because no live Lisp object can appear
> while Fgarbage_collect runs, right?

1) Yes, but you need ABI- and machine-specific tricks to find the stack frame boundaries. I.e.
while in mark_stack, there is no easy way to find start and end of Fgarbage_collect's stack frame.

2) But see GCC's __builtin_frame_address, https://gcc.gnu.org/onlinedocs/gcc/Return-Address.html.

3) But even if 2) works on all platforms we have to support, I don't see a reason to complicate
    GC just to avoid scanning a few tens of bytes of an extra stack frame.

Dmitry





* Re: GC and stack marking
  2014-05-21 15:57                 ` Dmitry Antipov
@ 2014-05-21 16:06                   ` Dmitry Antipov
  2014-05-21 16:55                     ` Eli Zaretskii
  2014-05-21 16:53                   ` Eli Zaretskii
  1 sibling, 1 reply; 40+ messages in thread
From: Dmitry Antipov @ 2014-05-21 16:06 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: fabrice.popineau, Stefan Monnier, emacs-devel

On 05/21/2014 07:57 PM, Dmitry Antipov wrote:

> 1) Yes, but you need ABI- and machine-specific tricks to find the stack frame boundaries. I.e.
> while in mark_stack, there is no easy way to find start and end of Fgarbage_collect's stack frame.
>
> 2) But see GCC's __builtin_frame_address, https://gcc.gnu.org/onlinedocs/gcc/Return-Address.html.
>
> 3) But even if 2) works on all platforms we have to support, I don't see a reason to complicate
>     GC just to avoid scanning a few tens of bytes of an extra stack frame.

4) mark_stack calls __builtin_unwind_init to save registers onto the stack. So if you stop
    at Fgarbage_collect's stack frame, you don't scan the frame of mark_stack too and may
    lose Lisp_Objects accessible from registers.

Dmitry





* Re: GC and stack marking
  2014-05-21 15:57                 ` Dmitry Antipov
  2014-05-21 16:06                   ` Dmitry Antipov
@ 2014-05-21 16:53                   ` Eli Zaretskii
  1 sibling, 0 replies; 40+ messages in thread
From: Eli Zaretskii @ 2014-05-21 16:53 UTC (permalink / raw)
  To: Dmitry Antipov; +Cc: fabrice.popineau, monnier, emacs-devel

> Date: Wed, 21 May 2014 19:57:42 +0400
> From: Dmitry Antipov <dmantipov@yandex.ru>
> CC: Stefan Monnier <monnier@iro.umontreal.ca>, 
>  fabrice.popineau@gmail.com, emacs-devel@gnu.org
> 
> On 05/21/2014 07:39 PM, Eli Zaretskii wrote:
> 
> > Now, I have a question: mark_stack stops examining the stack when it
> > gets to its own stack frame.  That is certainly safe, but it sounds
> > too conservative: it should stop at the stack frame of
> > Fgarbage_collect, I think, because no live Lisp object can appear
> > while Fgarbage_collect runs, right?
> 
> 1) Yes, but you need ABI- and machine-specific tricks to find the stack frame boundaries. I.e.
> while in mark_stack, there is no easy way to find start and end of Fgarbage_collect's stack frame.

I thought of passing that to mark_stack as argument when
Fgarbage_collect calls it.  That should work as well as what we do in
mark_stack to find its own stack frame, no?
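
A minimal sketch of passing the caller's frame down, assuming GCC or
Clang (for `__builtin_frame_address` and the `noinline` attribute).
The names `collect`, `scan_up_to` and `struct scan_range` are
hypothetical; this is not how Fgarbage_collect and mark_stack are
actually written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Range of stack memory a scan would cover (hypothetical type).  */
struct scan_range
{
  uintptr_t lo, hi;   /* lo <= hi whichever way the stack grows */
};

/* Build the range between the stack BOTTOM recorded at startup and
   the END handed in by the caller.  */
static struct scan_range
make_scan_range (const void *bottom, const void *end)
{
  uintptr_t a = (uintptr_t) bottom, b = (uintptr_t) end;
  struct scan_range r = { a < b ? a : b, a < b ? b : a };
  return r;
}

/* Callee: covers the stack only up to the frame address its caller
   passed in, so its own frame and locals are left out.  */
__attribute__ ((noinline)) static struct scan_range
scan_up_to (const void *bottom, const void *caller_frame)
{
  assert (bottom != NULL && caller_frame != NULL);
  return make_scan_range (bottom, caller_frame);
}

/* Caller: plays the role of the collector's entry point and hands
   its own frame address to the scanner.  */
__attribute__ ((noinline)) static struct scan_range
collect (const void *bottom)
{
  return scan_up_to (bottom, __builtin_frame_address (0));
}
```

The sketch only computes the range.  A real scanner that skips its own
frame must also make sure register contents are saved somewhere inside
the range it does scan.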

> 3) But even if 2) works on all platforms we have to support, I don't see a reason to complicate
>     GC just to avoid scanning a few tens of bytes of an extra stack frame.

The issue discussed in this thread _is_ that reason: we are dumping
Emacs with a dead object, for no good reason, and that object is quite
large (around 1MB).




* Re: GC and stack marking
  2014-05-21 16:06                   ` Dmitry Antipov
@ 2014-05-21 16:55                     ` Eli Zaretskii
  0 siblings, 0 replies; 40+ messages in thread
From: Eli Zaretskii @ 2014-05-21 16:55 UTC (permalink / raw)
  To: Dmitry Antipov; +Cc: fabrice.popineau, monnier, emacs-devel

> Date: Wed, 21 May 2014 20:06:55 +0400
> From: Dmitry Antipov <dmantipov@yandex.ru>
> CC: fabrice.popineau@gmail.com, Stefan Monnier <monnier@iro.umontreal.ca>, 
>  emacs-devel@gnu.org
> 
> 4) mark_stack calls __builtin_unwind_init to save registers onto the stack. So if you stop
>     at Fgarbage_collect's stack frame, you don't scan the frame of mark_stack too and may
>     lose Lisp_Objects accessible from registers.

We could call __builtin_unwind_init in Fgarbage_collect, no?
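
To make the idea concrete, here is a minimal sketch (not the code in
alloc.c) of an outer function that spills registers, records the scan
limit, and hands it to a worker.  All toy_* names are hypothetical;
__builtin_unwind_init is the GCC/Clang builtin mentioned above, and
the noinline attribute is assumed so that the worker really gets its
own stack frame.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the recorded start of the stack; the
   caller must set it from a frame that outlives the collection.  */
static uintptr_t toy_stack_base;

/* True if P lies between the recorded base and END, whichever way
   the stack grows.  */
static bool
toy_in_scan_range (uintptr_t p, uintptr_t end)
{
  assert (toy_stack_base != 0);
  uintptr_t lo = toy_stack_base < end ? toy_stack_base : end;
  uintptr_t hi = toy_stack_base < end ? end : toy_stack_base;
  return lo <= p && p <= hi;
}

/* Worker: does the "real" work.  Here it only reports whether one of
   its own locals would fall inside the range a scan would cover.  */
__attribute__ ((noinline)) static bool
toy_collect_1 (void *end)
{
  volatile char inner_local = 0;
  return toy_in_scan_range ((uintptr_t) &inner_local, (uintptr_t) end);
}

/* Entry point: force callee-saved registers onto the stack, take the
   address of a local as the scan limit, then call the worker.  */
__attribute__ ((noinline)) static bool
toy_collect (void)
{
  void *end;
  __builtin_unwind_init ();
  end = &end;
  return toy_collect_1 (end);
}
```

With the limit taken in the entry point, the worker's frame lies
beyond it, so toy_collect is expected to return false once
toy_stack_base has been set from a caller's frame.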




* Re: GC and stack marking
  2014-05-21 15:39               ` Eli Zaretskii
  2014-05-21 15:57                 ` Dmitry Antipov
@ 2014-05-21 17:40                 ` Stefan Monnier
  2014-05-21 17:58                   ` Eli Zaretskii
  1 sibling, 1 reply; 40+ messages in thread
From: Stefan Monnier @ 2014-05-21 17:40 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: fabrice.popineau, emacs-devel

> I already tried that before, and came up empty-handed.  I tried again
> now; the address of that value on the stack does not correspond to any
> local variable in the corresponding stack frame, and I also cannot
> find that address in the disassembly of the function whose stack frame
> includes the value.

It might simply be a slot that's unused by the current stack frame,
whose value comes from some stack frame that existed some time in
the past.

Which stack frame is that?  Is it high up or very deep (both of which
we could hope to solve by using tighter bounds on the start and end
addresses of the stack scan), or neither?

> Now, I have a question: mark_stack stops examining the stack when it
> gets to its own stack frame.  That is certainly safe, but it sounds
> too conservative: it should stop at the stack frame of
> Fgarbage_collect, I think, because no live Lisp object can appear
> while Fgarbage_collect runs, right?

Sounds right, yes.


        Stefan




* Re: GC and stack marking
  2014-05-21 17:40                 ` Stefan Monnier
@ 2014-05-21 17:58                   ` Eli Zaretskii
  2014-05-22 15:20                     ` Eli Zaretskii
  0 siblings, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2014-05-21 17:58 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: fabrice.popineau, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: emacs-devel@gnu.org,  fabrice.popineau@gmail.com
> Date: Wed, 21 May 2014 13:40:21 -0400
> 
> > I already tried that before, and came up empty-handed.  I tried again
> > now; the address of that value on the stack does not correspond to any
> > local variable in the corresponding stack frame, and I also cannot
> > find that address in the disassembly of the function whose stack frame
> > includes the value.
> 
> It might simply be a slot that's unused by the current stack frame,
> whose value comes from some stack frame that existed some time in
> the past.

That's probably what it is, yes.

> Which stack frame is that?

The one of Fgarbage_collect.  That's why I asked about mark_stack
looking for objects too high on the stack.

> > Now, I have a question: mark_stack stops examining the stack when it
> > gets to its own stack frame.  That is certainly safe, but it sounds
> > too conservative: it should stop at the stack frame of
> > Fgarbage_collect, I think, because no live Lisp object can appear
> > while Fgarbage_collect runs, right?
> 
> Sounds right, yes.

I will try that and see if that helps.  Of course, if my reading of
GDB data is correct, and the value was indeed in the
Fgarbage_collect's stack frame, it must help.




* Re: GC and stack marking
@ 2014-05-21 19:31 Barry OReilly
  2014-05-21 20:13 ` Eli Zaretskii
  0 siblings, 1 reply; 40+ messages in thread
From: Barry OReilly @ 2014-05-21 19:31 UTC (permalink / raw)
  To: emacs-devel

> It might simply be a slot that's unused by the current stack frame,
> whose value comes from some stack frame that existed some time in
> the past.

So should the relevant C code try to initialize variables with
non-garbage values? I took a look at Fgarbage_collect and found that
the stack_top_variable variable, for example, holds a garbage value.


* Re: GC and stack marking
  2014-05-21 19:31 Barry OReilly
@ 2014-05-21 20:13 ` Eli Zaretskii
  2014-05-21 20:49   ` Barry OReilly
  0 siblings, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2014-05-21 20:13 UTC (permalink / raw)
  To: Barry OReilly; +Cc: emacs-devel

> Date: Wed, 21 May 2014 15:31:49 -0400
> From: Barry OReilly <gundaetiapo@gmail.com>
> 
> So should the relevant C code try to initialize variables with
> non-garbage values?

No.  It's prohibitively expensive.




* Re: GC and stack marking
  2014-05-21 20:13 ` Eli Zaretskii
@ 2014-05-21 20:49   ` Barry OReilly
  2014-05-22  2:43     ` Eli Zaretskii
  0 siblings, 1 reply; 40+ messages in thread
From: Barry OReilly @ 2014-05-21 20:49 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Even if we're only talking about the stack variables in the frames that are
active during your particular problematic case (and perhaps in the idle
Emacs GC case)?

Have you already ruled out whether stack_top_variable contributes one of
the bytes in your false positive lookup in the mem_node tree?


* Re: GC and stack marking
  2014-05-21 20:49   ` Barry OReilly
@ 2014-05-22  2:43     ` Eli Zaretskii
  2014-05-22  3:12       ` Daniel Colascione
  2014-05-22 14:59       ` Barry OReilly
  0 siblings, 2 replies; 40+ messages in thread
From: Eli Zaretskii @ 2014-05-22  2:43 UTC (permalink / raw)
  To: Barry OReilly; +Cc: emacs-devel

> Date: Wed, 21 May 2014 16:49:22 -0400
> From: Barry OReilly <gundaetiapo@gmail.com>
> Cc: emacs-devel@gnu.org
> 
> Even if we're only talking about the stack variables in the frames that are
> active during your particular problematic case (and perhaps in the idle
> Emacs GC case)?

I thought you were asking about having the compiler generate the code
to do that, which would then happen everywhere.

If you propose doing that selectively, I don't know how this would be
possible, since on the C level you don't have a way of telling how
much stack is allocated in a given function.

> Have you already ruled out whether stack_top_variable contributes one of
> the bytes in your false positive lookup in the mem_node tree?

Yes.  I looked at all the local variables in that stack frame, and
their addresses on the stack are different from the one that triggers
the problem.




* Re: GC and stack marking
  2014-05-22  2:43     ` Eli Zaretskii
@ 2014-05-22  3:12       ` Daniel Colascione
  2014-05-22  5:37         ` David Kastrup
  2014-05-22 15:49         ` Eli Zaretskii
  2014-05-22 14:59       ` Barry OReilly
  1 sibling, 2 replies; 40+ messages in thread
From: Daniel Colascione @ 2014-05-22  3:12 UTC (permalink / raw)
  To: Eli Zaretskii, Barry OReilly; +Cc: emacs-devel

On 05/21/2014 07:43 PM, Eli Zaretskii wrote:
>> Date: Wed, 21 May 2014 16:49:22 -0400
>> From: Barry OReilly <gundaetiapo@gmail.com>
>> Cc: emacs-devel@gnu.org
>>
>> Even if we're only talking about the stack variables in the frames that are
>> active during your particular problematic case (and perhaps in the idle
>> Emacs GC case)?
> 
> I thought you were asking about having the compiler generate the code
> to do that, which would then happen everywhere.
> 
> If you propose doing that selectively, I don't know how this would be
> possible, since on the C level you don't have a way of telling how
> much stack is allocated in a given function.
> 
>> Have you already ruled out whether stack_top_variable contributes one of
>> the bytes in your false positive lookup in the mem_node tree?
> 
> Yes.  I looked at all the local variables in that stack frame, and
> their addresses on the stack are different from the one that triggers
> the problem.

What about cleaning the stack (memset from the top to the high water
mark) every once in a while?
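
As a rough model of that suggestion (a toy simulation on an array,
not something that touches the real C stack), the dead region between
the current top and the high-water mark can be zeroed like this.  The
struct and the toy_* functions are hypothetical, and the simulated
stack is assumed to grow toward index 0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { TOY_STACK_WORDS = 16 };

/* Simulated stack growing toward index 0.  Live slots are
   [top, TOY_STACK_WORDS); slots in [high_water, top) are dead but may
   still hold old values.  */
struct toy_stack
{
  unsigned long words[TOY_STACK_WORDS];
  size_t top;
  size_t high_water;
};

static void
toy_push (struct toy_stack *s, unsigned long v)
{
  assert (s->top > 0);
  s->words[--s->top] = v;
  if (s->top < s->high_water)
    s->high_water = s->top;
}

/* Popping only moves the top; the old value stays in memory.  */
static void
toy_pop (struct toy_stack *s)
{
  assert (s->top < TOY_STACK_WORDS);
  s->top++;
}

/* Zero every dead slot and reset the high-water mark.  Returns the
   number of slots cleared.  */
static size_t
toy_clear_dead (struct toy_stack *s)
{
  size_t n = s->top - s->high_water;
  memset (&s->words[s->high_water], 0, n * sizeof s->words[0]);
  s->high_water = s->top;
  return n;
}
```

On a real stack the equivalent of top and high_water would have to be
obtained by platform-specific means, which this model deliberately
leaves out.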



* Re: GC and stack marking
  2014-05-22  3:12       ` Daniel Colascione
@ 2014-05-22  5:37         ` David Kastrup
  2014-05-22 13:57           ` Stefan Monnier
  2014-05-22 15:49         ` Eli Zaretskii
  1 sibling, 1 reply; 40+ messages in thread
From: David Kastrup @ 2014-05-22  5:37 UTC (permalink / raw)
  To: emacs-devel

Daniel Colascione <dancol@dancol.org> writes:

> On 05/21/2014 07:43 PM, Eli Zaretskii wrote:
>>> Date: Wed, 21 May 2014 16:49:22 -0400
>>> From: Barry OReilly <gundaetiapo@gmail.com>
>>> Cc: emacs-devel@gnu.org
>>>
>>> Even if we're only talking about the stack variables in the frames that are
>>> active during your particular problematic case (and perhaps in the idle
>>> Emacs GC case)?
>> 
>> I thought you were asking about having the compiler generate the code
>> to do that, which would then happen everywhere.
>> 
>> If you propose doing that selectively, I don't know how this would be
>> possible, since on the C level you don't have a way of telling how
>> much stack is allocated in a given function.
>> 
>>> Have you already ruled out whether stack_top_variable contributes one of
>>> the bytes in your false positive lookup in the mem_node tree?
>> 
>> Yes.  I looked at all the local variables in that stack frame, and
>> their addresses on the stack are different from the one that triggers
>> the problem.
>
> What about cleaning the stack (memset from the top to the high water
> mark) every once in a while?

How about explicitly triggering garbage collection at a point of time
where the water mark is really low?  For the few remaining variables,
initializing them explicitly would then not be a high cost.

-- 
David Kastrup





* Re: GC and stack marking
  2014-05-22  5:37         ` David Kastrup
@ 2014-05-22 13:57           ` Stefan Monnier
  0 siblings, 0 replies; 40+ messages in thread
From: Stefan Monnier @ 2014-05-22 13:57 UTC (permalink / raw)
  To: David Kastrup; +Cc: emacs-devel

> How about explicitly triggering garbage collection at a point of time
> where the water mark is really low?

We already do that.


        Stefan




* Re: GC and stack marking
  2014-05-22  2:43     ` Eli Zaretskii
  2014-05-22  3:12       ` Daniel Colascione
@ 2014-05-22 14:59       ` Barry OReilly
  2014-05-22 17:03         ` Eli Zaretskii
  1 sibling, 1 reply; 40+ messages in thread
From: Barry OReilly @ 2014-05-22 14:59 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

> Yes. I looked at all the local variables in that stack frame, and
> their addresses on the stack are different from the one that
> triggers the problem.

[I assume you mean "void* values on the stack" rather than "addresses
on the stack".]

So when you printed the value of a one byte variable like
stack_top_variable, you printed it with any alignment padding there
might be?

Or in case of GC_POINTER_ALIGNMENT < sizeof(void*), you accounted for
mark_stack's candidate void* coming partially from different stack
variables?

And you accounted for the compiler reordering stack variables, e.g. to
align data more optimally? I confirmed for example that
stack_top_variable and message_p are allocated next to each other on
the stack in my build, with the i variable not between them in memory.


* Re: GC and stack marking
  2014-05-21 17:58                   ` Eli Zaretskii
@ 2014-05-22 15:20                     ` Eli Zaretskii
  2014-05-22 16:14                       ` Stefan Monnier
  0 siblings, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2014-05-22 15:20 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: fabrice.popineau, monnier, emacs-devel

> Date: Wed, 21 May 2014 20:58:19 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: fabrice.popineau@gmail.com, emacs-devel@gnu.org
> 
> > From: Stefan Monnier <monnier@iro.umontreal.ca>
> > Cc: emacs-devel@gnu.org,  fabrice.popineau@gmail.com
> > Date: Wed, 21 May 2014 13:40:21 -0400
> > 
> > > I already tried that before, and came up empty-handed.  I tried again
> > > now; the address of that value on the stack does not correspond to any
> > > local variable in the corresponding stack frame, and I also cannot
> > > find that address in the disassembly of the function whose stack frame
> > > includes the value.
> > 
> > It might simply be a slot that's unused by the current stack frame,
> > whose value comes from some stack frame that existed some time in
> > the past.
> 
> That's probably what it is, yes.

That's definitely what it is.  The value gets onto the stack when
loadup.el does this:

  (when (hash-table-p purify-flag)
    (let ((strings 0)
	  (vectors 0)
	  (bytecodes 0)
	  (conses 0)
	  (others 0))
      (maphash (lambda (k v)
		 (cond
		  ((stringp k) (setq strings (1+ strings)))
		  ((vectorp k) (setq vectors (1+ vectors)))
		  ((consp k)   (setq conses  (1+ conses)))
		  ((byte-code-function-p v) (setq bytecodes (1+ bytecodes)))
		  (t           (setq others  (1+ others)))))
	       purify-flag)
      (message "Pure-hashed: %d strings, %d vectors, %d conses, %d bytecodes, %d others"
	       strings vectors conses bytecodes others)))

The call to hash-table-p pushes the table address on the stack before
calling Fhash_table_p, and it remains there until the call to
mark_stack.
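
The effect can be modelled with a toy simulation (an assumption-laden
sketch, not Emacs code): a conservative scan treats any word that
falls inside an object's address range as a reference, so a slot
written for an earlier call and never overwritten is enough to keep
the object marked.  The struct, the toy_* names, and the addresses
are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy heap object: an address range plus a mark bit.  */
struct toy_object
{
  uintptr_t start;
  size_t size;
  bool marked;
};

/* Conservatively scan WORDS[0..N-1]: any word that falls inside
   OBJ's range is treated as a pointer to it.  Returns the number of
   such words.  */
static size_t
toy_mark_words (const uintptr_t *words, size_t n, struct toy_object *obj)
{
  assert (obj != NULL);
  size_t hits = 0;
  for (size_t i = 0; i < n; i++)
    if (words[i] >= obj->start && words[i] - obj->start < obj->size)
      {
        obj->marked = true;
        hits++;
      }
  return hits;
}

/* Slot 5 stands for an argument pushed for an earlier call.  The call
   has returned, but nothing cleared the slot, so the scan still sees
   the address.  */
static bool
toy_stale_slot_keeps_object_alive (void)
{
  struct toy_object table = { 0x10000, 0x1000, false };
  uintptr_t stack[8] = { 0 };
  stack[5] = table.start;
  toy_mark_words (stack, 8, &table);
  return table.marked;
}
```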

> > > Now, I have a question: mark_stack stops examining the stack when it
> > > gets to its own stack frame.  That is certainly safe, but it sounds
> > > too conservative: it should stop at the stack frame of
> > > Fgarbage_collect, I think, because no live Lisp object can appear
> > > while Fgarbage_collect runs, right?
> > 
> > Sounds right, yes.
> 
> I will try that and see if that helps.  Of course, if my reading of
> GDB data is correct, and the value was indeed in the
> Fgarbage_collect's stack frame, it must help.

It did help, at least in an unoptimized build.  The suggested patch is
below.  It just reshuffles the existing code: we now determine the
limit for searching the stack in Fgarbage_collect, and then call a
subroutine that does what Fgarbage_collect should actually do.  This
way, none of the variables local to Fgarbage_collect or its stack will
be searched by mark_stack.

Is the patch below OK for the trunk?  Does anyone see anything
problematic with it?

--- src/alloc.c~	2014-05-21 18:04:29 +0300
+++ src/alloc.c	2014-05-22 18:18:32 +0300
@@ -4880,61 +4880,8 @@ dump_zombies (void)
    from the stack start.  */
 
 static void
-mark_stack (void)
+mark_stack (void *end)
 {
-  void *end;
-
-#ifdef HAVE___BUILTIN_UNWIND_INIT
-  /* Force callee-saved registers and register windows onto the stack.
-     This is the preferred method if available, obviating the need for
-     machine dependent methods.  */
-  __builtin_unwind_init ();
-  end = &end;
-#else /* not HAVE___BUILTIN_UNWIND_INIT */
-#ifndef GC_SAVE_REGISTERS_ON_STACK
-  /* jmp_buf may not be aligned enough on darwin-ppc64 */
-  union aligned_jmpbuf {
-    Lisp_Object o;
-    sys_jmp_buf j;
-  } j;
-  volatile bool stack_grows_down_p = (char *) &j > (char *) stack_base;
-#endif
-  /* This trick flushes the register windows so that all the state of
-     the process is contained in the stack.  */
-  /* Fixme: Code in the Boehm GC suggests flushing (with `flushrs') is
-     needed on ia64 too.  See mach_dep.c, where it also says inline
-     assembler doesn't work with relevant proprietary compilers.  */
-#ifdef __sparc__
-#if defined (__sparc64__) && defined (__FreeBSD__)
-  /* FreeBSD does not have a ta 3 handler.  */
-  asm ("flushw");
-#else
-  asm ("ta 3");
-#endif
-#endif
-
-  /* Save registers that we need to see on the stack.  We need to see
-     registers used to hold register variables and registers used to
-     pass parameters.  */
-#ifdef GC_SAVE_REGISTERS_ON_STACK
-  GC_SAVE_REGISTERS_ON_STACK (end);
-#else /* not GC_SAVE_REGISTERS_ON_STACK */
-
-#ifndef GC_SETJMP_WORKS  /* If it hasn't been checked yet that
-			    setjmp will definitely work, test it
-			    and print a message with the result
-			    of the test.  */
-  if (!setjmp_tested_p)
-    {
-      setjmp_tested_p = 1;
-      test_setjmp ();
-    }
-#endif /* GC_SETJMP_WORKS */
-
-  sys_setjmp (j.j);
-  end = stack_grows_down_p ? (char *) &j + sizeof j : (char *) &j;
-#endif /* not GC_SAVE_REGISTERS_ON_STACK */
-#endif /* not HAVE___BUILTIN_UNWIND_INIT */
 
   /* This assumes that the stack is a contiguous region in memory.  If
      that's not the case, something has to be done here to iterate
@@ -5542,22 +5489,14 @@ mark_pinned_symbols (void)
     }
 }
 
-DEFUN ("garbage-collect", Fgarbage_collect, Sgarbage_collect, 0, 0, "",
-       doc: /* Reclaim storage for Lisp objects no longer needed.
-Garbage collection happens automatically if you cons more than
-`gc-cons-threshold' bytes of Lisp data since previous garbage collection.
-`garbage-collect' normally returns a list with info on amount of space in use,
-where each entry has the form (NAME SIZE USED FREE), where:
-- NAME is a symbol describing the kind of objects this entry represents,
-- SIZE is the number of bytes used by each one,
-- USED is the number of those objects that were found live in the heap,
-- FREE is the number of those objects that are not live but that Emacs
-  keeps around for future allocations (maybe because it does not know how
-  to return them to the OS).
-However, if there was overflow in pure space, `garbage-collect'
-returns nil, because real GC can't be done.
-See Info node `(elisp)Garbage Collection'.  */)
-  (void)
+/* Subroutine of Fgarbage_collect that does most of the work.  It is a
+   separate function so that we could limit mark_stack in searching
+   the stack frames below this function, thus avoiding the rare cases
+   where mark_stack finds values that look like live Lisp objects on
+   portions of stack that couldn't possibly contain such live
+   objects.  */
+static Lisp_Object
+garbage_collect_1 (void *end)
 {
   struct buffer *nextb;
   char stack_top_variable;
@@ -5655,7 +5594,7 @@ See Info node `(elisp)Garbage Collection
 
 #if (GC_MARK_STACK == GC_MAKE_GCPROS_NOOPS \
      || GC_MARK_STACK == GC_MARK_STACK_CHECK_GCPROS)
-  mark_stack ();
+  mark_stack (end);
 #else
   {
     register struct gcpro *tail;
@@ -5678,7 +5617,7 @@ See Info node `(elisp)Garbage Collection
 #endif
 
 #if GC_MARK_STACK == GC_USE_GCPROS_CHECK_ZOMBIES
-  mark_stack ();
+  mark_stack (end);
 #endif
 
   /* Everything is now marked, except for the data in font caches
@@ -5838,6 +5777,82 @@ See Info node `(elisp)Garbage Collection
   return retval;
 }
 
+DEFUN ("garbage-collect", Fgarbage_collect, Sgarbage_collect, 0, 0, "",
+       doc: /* Reclaim storage for Lisp objects no longer needed.
+Garbage collection happens automatically if you cons more than
+`gc-cons-threshold' bytes of Lisp data since previous garbage collection.
+`garbage-collect' normally returns a list with info on amount of space in use,
+where each entry has the form (NAME SIZE USED FREE), where:
+- NAME is a symbol describing the kind of objects this entry represents,
+- SIZE is the number of bytes used by each one,
+- USED is the number of those objects that were found live in the heap,
+- FREE is the number of those objects that are not live but that Emacs
+  keeps around for future allocations (maybe because it does not know how
+  to return them to the OS).
+However, if there was overflow in pure space, `garbage-collect'
+returns nil, because real GC can't be done.
+See Info node `(elisp)Garbage Collection'.  */)
+  (void)
+{
+#if (GC_MARK_STACK == GC_MAKE_GCPROS_NOOPS		\
+     || GC_MARK_STACK == GC_MARK_STACK_CHECK_GCPROS	\
+     || GC_MARK_STACK == GC_USE_GCPROS_CHECK_ZOMBIES)
+  void *end;
+
+#ifdef HAVE___BUILTIN_UNWIND_INIT
+  /* Force callee-saved registers and register windows onto the stack.
+     This is the preferred method if available, obviating the need for
+     machine dependent methods.  */
+  __builtin_unwind_init ();
+  end = &end;
+#else /* not HAVE___BUILTIN_UNWIND_INIT */
+#ifndef GC_SAVE_REGISTERS_ON_STACK
+  /* jmp_buf may not be aligned enough on darwin-ppc64 */
+  union aligned_jmpbuf {
+    Lisp_Object o;
+    sys_jmp_buf j;
+  } j;
+  volatile bool stack_grows_down_p = (char *) &j > (char *) stack_base;
+#endif
+  /* This trick flushes the register windows so that all the state of
+     the process is contained in the stack.  */
+  /* Fixme: Code in the Boehm GC suggests flushing (with `flushrs') is
+     needed on ia64 too.  See mach_dep.c, where it also says inline
+     assembler doesn't work with relevant proprietary compilers.  */
+#ifdef __sparc__
+#if defined (__sparc64__) && defined (__FreeBSD__)
+  /* FreeBSD does not have a ta 3 handler.  */
+  asm ("flushw");
+#else
+  asm ("ta 3");
+#endif
+#endif
+
+  /* Save registers that we need to see on the stack.  We need to see
+     registers used to hold register variables and registers used to
+     pass parameters.  */
+#ifdef GC_SAVE_REGISTERS_ON_STACK
+  GC_SAVE_REGISTERS_ON_STACK (end);
+#else /* not GC_SAVE_REGISTERS_ON_STACK */
+
+#ifndef GC_SETJMP_WORKS  /* If it hasn't been checked yet that
+			    setjmp will definitely work, test it
+			    and print a message with the result
+			    of the test.  */
+  if (!setjmp_tested_p)
+    {
+      setjmp_tested_p = 1;
+      test_setjmp ();
+    }
+#endif /* GC_SETJMP_WORKS */
+
+  sys_setjmp (j.j);
+  end = stack_grows_down_p ? (char *) &j + sizeof j : (char *) &j;
+#endif /* not GC_SAVE_REGISTERS_ON_STACK */
+#endif /* not HAVE___BUILTIN_UNWIND_INIT */
+#endif /* GC_MARK_STACK */
+  return garbage_collect_1 (end);
+}
 
 /* Mark Lisp objects in glyph matrix MATRIX.  Currently the
    only interesting objects referenced from glyphs are strings.  */




* Re: GC and stack marking
  2014-05-22  3:12       ` Daniel Colascione
  2014-05-22  5:37         ` David Kastrup
@ 2014-05-22 15:49         ` Eli Zaretskii
  1 sibling, 0 replies; 40+ messages in thread
From: Eli Zaretskii @ 2014-05-22 15:49 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: gundaetiapo, emacs-devel

> Date: Wed, 21 May 2014 20:12:45 -0700
> From: Daniel Colascione <dancol@dancol.org>
> CC: emacs-devel@gnu.org
> 
> What about cleaning the stack (memset from the top to the high water
> mark) every once in a while?

I believe this would be as tedious and expensive as clearing the stack
on entry to a function.  It also requires ugly OS-dependent
code/assembly.  Also, when exactly would you do that, except where we
call GC?

I think what I suggested a few minutes ago is better, and seems to
solve the problem at hand.




* Re: GC and stack marking
  2014-05-22 15:20                     ` Eli Zaretskii
@ 2014-05-22 16:14                       ` Stefan Monnier
  2014-05-24 12:03                         ` Eli Zaretskii
  0 siblings, 1 reply; 40+ messages in thread
From: Stefan Monnier @ 2014-05-22 16:14 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: fabrice.popineau, emacs-devel

> Is the patch below OK for the trunk?

Looks good to me.


        Stefan




* Re: GC and stack marking
  2014-05-22 14:59       ` Barry OReilly
@ 2014-05-22 17:03         ` Eli Zaretskii
  0 siblings, 0 replies; 40+ messages in thread
From: Eli Zaretskii @ 2014-05-22 17:03 UTC (permalink / raw)
  To: Barry OReilly; +Cc: emacs-devel

> Date: Thu, 22 May 2014 10:59:00 -0400
> From: Barry OReilly <gundaetiapo@gmail.com>
> Cc: emacs-devel@gnu.org
> 
> > Yes. I looked at all the local variables in that stack frame, and
> > their addresses on the stack are different from the one that
> > triggers the problem.
> 
> [I assume you mean "void* values on the stack" rather than "addresses
> on the stack".]

No, I meant addresses on the stack.  Like this:

 (gdb) info locals
 foo = 0xbaadf00d
 bar = 191919191
 baz = 0 '\000'
 (gdb) p/x &foo
 $1 = 0x12345678
 (gdb) p/x &bar
 $2 = 0x23456789
 (gdb) p/x &baz
 $3 = 0x87654321

I compared these addresses with the value the 'pp' variable had in
mark_memory, here:

  for (pp = start; (void *) pp < end; pp++)
    for (i = 0; i < sizeof *pp; i += GC_POINTER_ALIGNMENT)
      {
	void *p = *(void **) ((char *) pp + i);
	mark_maybe_pointer (p);
	if (POINTERS_MIGHT_HIDE_IN_OBJECTS)
	  mark_maybe_object (XIL ((intptr_t) p));
      }

when the value of 'p' was the address of the hash-table struct that
was passed to mark_maybe_pointer.
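
For readers following the GC_POINTER_ALIGNMENT question, here is a
simplified, self-contained model of such a scan (a sketch, not
mark_memory itself).  It reads a candidate word at every ALIGN-byte
offset of a byte buffer; when ALIGN is smaller than the word size,
some candidates are assembled from bytes of two adjacent slots.  The
function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scan BUF[0..LEN) at a stride of ALIGN bytes and count how many
   candidate words equal TARGET.  memcpy is used so that reads at
   unaligned offsets are well defined.  */
static size_t
toy_count_candidates (const unsigned char *buf, size_t len,
                      size_t align, uintptr_t target)
{
  assert (align > 0);
  size_t hits = 0;
  for (size_t off = 0; off + sizeof (uintptr_t) <= len; off += align)
    {
      uintptr_t candidate;
      memcpy (&candidate, buf + off, sizeof candidate);
      if (candidate == target)
        hits++;
    }
  return hits;
}
```

Scanning two adjacent words with ALIGN equal to sizeof (uintptr_t)
yields two candidates; halving ALIGN adds a third that straddles
both, which is the case the question about alignment was probing.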

> So when you printed the value of a one byte variable like
> stack_top_variable, you printed it with any alignment padding there
> might be?

I didn't print any values, just the addresses, see above.  That's
because I already knew the address of the stack slot where the
offending value was stored, so I didn't need to look for it.  That
address was the value of 'pp' above.

> And you accounted for the compiler reordering stack variables, eg to
> more optimally align data?

Yes, in a way: I looked at the disassembly of the offending function,
and reviewed every reference to a stack slot via $ebp and $esp.  Since
I knew the values of $ebp and $esp of that function when mark_stack
was called, and I also knew the address of the stack slot where the
offending value was stored, it was simple to calculate the offsets
from $ebp and $esp corresponding to that stack slot.  I looked for
those offsets in the disassembly, but they weren't there.

> I confirmed for example that stack_top_variable and message_p are
> allocated next to each other on the stack in my build, with the i
> variable not between them in memory.

Again, I checked all the locals in that function, and I also checked
all the references to the stack in the disassembly, thus accounting
for temporary values that have no C variables in the source.  I think
this covers all the possibilities, and isn't affected by how the
compiler allocates the variables on the stack.




* Re: GC and stack marking
  2014-05-22 16:14                       ` Stefan Monnier
@ 2014-05-24 12:03                         ` Eli Zaretskii
  0 siblings, 0 replies; 40+ messages in thread
From: Eli Zaretskii @ 2014-05-24 12:03 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: fabrice.popineau, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: fabrice.popineau@gmail.com,  emacs-devel@gnu.org
> Date: Thu, 22 May 2014 12:14:36 -0400
> 
> > Is the patch below OK for the trunk?
> 
> Looks good to me.

Thanks, committed on the trunk.




* Re: GC and stack marking
  2014-05-20 13:44 ` Stefan Monnier
  2014-05-20 16:57   ` Eli Zaretskii
@ 2014-05-31  6:31   ` Florian Weimer
  2014-05-31 14:24     ` Stefan Monnier
  1 sibling, 1 reply; 40+ messages in thread
From: Florian Weimer @ 2014-05-31  6:31 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, Fabrice Popineau, emacs-devel

* Stefan Monnier:

> Boehm's GC has developed ways to do this second option
> automatically: if during a GC, a memory cell is found to "point to"
> unallocated memory, then it is assumed to be of non-pointer type and
> this fact is recorded somewhere so that if in subsequent GCs this cell
> ends up "pointing" to allocated memory, that won't be considered as an
> actual pointer.

I believe this is not a correct description of the mechanism.  What
happens is that the pointer *target* is blacklisted and not used for
allocation.  What you propose instead is not safe because it will
result in dangling pointers in some cases.
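
A heavily simplified sketch of target blacklisting as described here
(not Boehm GC's actual code or API; the toy_heap structure, the
toy_* functions, and the page constants are all hypothetical): a
word seen during marking that points into a free page causes that
page to be recorded, and the allocator then avoids handing it out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { TOY_PAGES = 8, TOY_PAGE_SIZE = 4096 };
#define TOY_HEAP_START ((uintptr_t) 0x100000)

struct toy_heap
{
  bool allocated[TOY_PAGES];
  bool blacklisted[TOY_PAGES];
};

/* Called for each word seen while marking.  If the word points into
   a page that is currently free, blacklist that page (the target),
   not the cell that held the word.  */
static void
toy_note_word (struct toy_heap *h, uintptr_t word)
{
  assert (h != NULL);
  if (word < TOY_HEAP_START)
    return;
  size_t page = (word - TOY_HEAP_START) / TOY_PAGE_SIZE;
  if (page < (size_t) TOY_PAGES && !h->allocated[page])
    h->blacklisted[page] = true;
}

/* Allocate the first free page that is not blacklisted.  Returns the
   page index, or -1 if none is available.  */
static int
toy_alloc_page (struct toy_heap *h)
{
  for (int i = 0; i < TOY_PAGES; i++)
    if (!h->allocated[i] && !h->blacklisted[i])
      {
        h->allocated[i] = true;
        return i;
      }
  return -1;
}
```

Because the cell itself is still scanned as usual, a genuine pointer
stored there later is never ignored; the stale value simply can never
coincide with a newly allocated object.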




* Re: GC and stack marking
  2014-05-31  6:31   ` Florian Weimer
@ 2014-05-31 14:24     ` Stefan Monnier
  0 siblings, 0 replies; 40+ messages in thread
From: Stefan Monnier @ 2014-05-31 14:24 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Eli Zaretskii, Fabrice Popineau, emacs-devel

> I believe this is not a correct description of the mechanism.  What
> happens is that the pointer *target* is blacklisted and not used for
> allocation.

Oh right, sorry, and thanks for the correction,


        Stefan




end of thread, other threads:[~2014-05-31 14:24 UTC | newest]

Thread overview: 40+ messages
2014-05-19 16:31 GC and stack marking Eli Zaretskii
2014-05-19 18:47 ` Paul Eggert
2014-05-19 19:14   ` Eli Zaretskii
2014-05-19 19:58     ` Paul Eggert
2014-05-19 20:03       ` Eli Zaretskii
2014-05-19 20:17         ` Paul Eggert
2014-05-20 16:37           ` Eli Zaretskii
2014-05-20 13:44 ` Stefan Monnier
2014-05-20 16:57   ` Eli Zaretskii
2014-05-20 17:54     ` Stefan Monnier
2014-05-20 19:28       ` Eli Zaretskii
2014-05-20 22:01         ` Stefan Monnier
2014-05-21  2:48           ` Eli Zaretskii
2014-05-21  3:01             ` Stefan Monnier
2014-05-21 15:39               ` Eli Zaretskii
2014-05-21 15:57                 ` Dmitry Antipov
2014-05-21 16:06                   ` Dmitry Antipov
2014-05-21 16:55                     ` Eli Zaretskii
2014-05-21 16:53                   ` Eli Zaretskii
2014-05-21 17:40                 ` Stefan Monnier
2014-05-21 17:58                   ` Eli Zaretskii
2014-05-22 15:20                     ` Eli Zaretskii
2014-05-22 16:14                       ` Stefan Monnier
2014-05-24 12:03                         ` Eli Zaretskii
2014-05-20 19:12     ` Daniel Colascione
2014-05-20 19:43       ` Eli Zaretskii
2014-05-20 22:03         ` Stefan Monnier
2014-05-21  2:51           ` Eli Zaretskii
2014-05-31  6:31   ` Florian Weimer
2014-05-31 14:24     ` Stefan Monnier
  -- strict thread matches above, loose matches on Subject: below --
2014-05-21 19:31 Barry OReilly
2014-05-21 20:13 ` Eli Zaretskii
2014-05-21 20:49   ` Barry OReilly
2014-05-22  2:43     ` Eli Zaretskii
2014-05-22  3:12       ` Daniel Colascione
2014-05-22  5:37         ` David Kastrup
2014-05-22 13:57           ` Stefan Monnier
2014-05-22 15:49         ` Eli Zaretskii
2014-05-22 14:59       ` Barry OReilly
2014-05-22 17:03         ` Eli Zaretskii
