unofficial mirror of emacs-devel@gnu.org 
* pdumper's performance
@ 2018-08-29 20:29 Stefan Monnier
  2018-08-29 22:10 ` Daniel Colascione
  0 siblings, 1 reply; 12+ messages in thread
From: Stefan Monnier @ 2018-08-29 20:29 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: emacs-devel

I was looking at the pdumper and one thing I was wondering is how the use
of all the <foo>_marked_p and set_<foo>_marked functions impacts GC (and
hence runtime) performance.

I mean the fact that they're functions is perfectly fine, but the fact
that they need to test pdumper_object_p might have a measurable impact,
since I believe these operations are performed very many times per GC.

Also I don't quite understand why this is needed: IIUC the markbits of
pdump'd objects are stored elsewhere, but I don't understand why that
needs to be the case.


        Stefan




* Re: pdumper's performance
  2018-08-29 20:29 pdumper's performance Stefan Monnier
@ 2018-08-29 22:10 ` Daniel Colascione
  2018-08-30  2:14   ` Stefan Monnier
  0 siblings, 1 reply; 12+ messages in thread
From: Daniel Colascione @ 2018-08-29 22:10 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Daniel Colascione, emacs-devel

> I was looking at the pdumper and one thing I was wondering is how the use
> of all the <foo>_marked_p and set_<foo>_marked functions impacts GC (and
> hence runtime) performance.
>
> I mean the fact that they're functions is perfectly fine, but the fact
> that they need to test pdumper_object_p might have a measurable impact,
> since I believe these operations are performed very many times per GC.

The cache line that holds the dump region bounds ends up being very
hot, so testing it isn't really that expensive.
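
To make the cost being discussed concrete, here is a minimal sketch of a
mark predicate that first tests whether the object lies in the dump region.
Every name in it (in_dump_p, toy_object, dump_marks, and so on) is
hypothetical and chosen for illustration; it is not the pdumper code, and it
uses one mark byte per dumped object where a real implementation would
presumably pack bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bounds of the loaded dump.  Both words sit next to each
   other, so a single cache line serves every range test.  */
struct dump_bounds { uintptr_t start, end; };
static struct dump_bounds dump_region;

/* Toy object whose mark flag lives inside the object (heap case).  */
struct toy_object { bool marked; int payload; };

/* Toy "dump image" plus its external marks, one byte per object.  */
enum { DUMP_OBJECTS = 4 };
static struct toy_object dump_image[DUMP_OBJECTS];
static unsigned char dump_marks[DUMP_OBJECTS];

/* Pretend the dump was loaded: record the address range it occupies.  */
static void toy_load_dump (void)
{
  dump_region.start = (uintptr_t) dump_image;
  dump_region.end = (uintptr_t) (dump_image + DUMP_OBJECTS);
}

/* True if P points into the dump region: two compares on hot data.  */
static bool in_dump_p (const void *p)
{
  uintptr_t a = (uintptr_t) p;
  return dump_region.start <= a && a < dump_region.end;
}

/* Dumped objects consult the external marks, others the in-object flag.  */
static bool toy_marked_p (const struct toy_object *o)
{
  if (in_dump_p (o))
    return dump_marks[o - dump_image] != 0;
  return o->marked;
}

static void set_toy_marked (struct toy_object *o)
{
  if (in_dump_p (o))
    dump_marks[o - dump_image] = 1;   /* the dumped object is not written */
  else
    o->marked = true;
}
```

The extra work per call is the range test in in_dump_p; whether that is
measurable in a real GC is exactly what the benchmark suggested below would
show.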

You can see for yourself whether there's an impact. Compile an Emacs with
support for both pdumper and unexec, dump it with unexec, and compare its
GC performance to Emacs built without support for pdumper and also dumped
with unexec.

As I recall, the difference is minimal.

Besides, you get performance back at the end: the dump region's mark bits
are stored contiguously and we can clear them very quickly at the end of
GC.
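
As a rough illustration of why contiguous mark bits are cheap to reset,
the sketch below keeps all marks for a fixed number of dumped objects in one
array and clears them with a single memset. The names and the object count
are assumptions made for the example, not taken from the pdumper sources.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical number of objects in the dump.  */
enum { DUMP_OBJECT_COUNT = 1000 };
enum { BITS_PER_WORD = sizeof (unsigned long) * CHAR_BIT };
enum { MARK_WORDS = (DUMP_OBJECT_COUNT + BITS_PER_WORD - 1) / BITS_PER_WORD };

/* One bit per dumped object, stored contiguously.  */
static unsigned long mark_bitmap[MARK_WORDS];

/* Set the mark bit of dumped object number I.  */
static void bitmap_mark (size_t i)
{
  assert (i < DUMP_OBJECT_COUNT);
  mark_bitmap[i / BITS_PER_WORD] |= 1UL << (i % BITS_PER_WORD);
}

/* Return true if dumped object number I is marked.  */
static bool bitmap_marked_p (size_t i)
{
  assert (i < DUMP_OBJECT_COUNT);
  return (mark_bitmap[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1UL;
}

/* End of GC: clear every mark at once, without visiting any object.  */
static void bitmap_clear_all (void)
{
  memset (mark_bitmap, 0, sizeof mark_bitmap);
}
```

With in-object mark bits the collector would instead have to walk the
objects and clear each flag individually.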

> Also I don't quite understand why this is needed: IIUC the markbits of
> pdump'd objects are stored elsewhere, but I don't understand why that
> needs to be the case.

Because we don't store dumped objects in blocks and so the calculations of
the normal locations of their mark bits would be wrong.
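
A minimal sketch of the kind of calculation meant here, assuming a
block allocator in which each block is aligned to its own size and carries
the marks for the conses it contains. The structure, the sizes, and the
one-byte-per-cons marks are simplifications invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { BLOCK_ALIGN = 1024, CONSES_PER_BLOCK = 56 };

struct toy_cons { void *car, *cdr; };

/* A block holds its conses and, alongside them, their marks.  */
struct toy_block
{
  struct toy_cons conses[CONSES_PER_BLOCK];
  unsigned char marks[CONSES_PER_BLOCK];
};
_Static_assert (sizeof (struct toy_block) <= BLOCK_ALIGN,
                "a block must fit in one aligned chunk");

/* Allocate a zeroed block whose address is a multiple of BLOCK_ALIGN.  */
static struct toy_block *alloc_block (void)
{
  struct toy_block *b = aligned_alloc (BLOCK_ALIGN, BLOCK_ALIGN);
  if (b)
    memset (b, 0, sizeof *b);
  return b;
}

/* Find the enclosing block by masking off the low address bits.  This is
   only meaningful if C really lives inside an aligned block.  */
static struct toy_block *block_of (const struct toy_cons *c)
{
  return (struct toy_block *) ((uintptr_t) c & ~(uintptr_t) (BLOCK_ALIGN - 1));
}

static size_t index_in_block (const struct toy_cons *c)
{
  return (size_t) (c - block_of (c)->conses);
}

static void block_mark (struct toy_cons *c)
{
  block_of (c)->marks[index_in_block (c)] = 1;
}

static bool block_marked_p (const struct toy_cons *c)
{
  return block_of (c)->marks[index_in_block (c)] != 0;
}
```

For a cons that was written out individually into a dump image, masking
its address does not yield the start of any block, so the computed mark
location would point at unrelated memory.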





* Re: pdumper's performance
  2018-08-29 22:10 ` Daniel Colascione
@ 2018-08-30  2:14   ` Stefan Monnier
  2018-08-30  5:19     ` Daniel Colascione
  0 siblings, 1 reply; 12+ messages in thread
From: Stefan Monnier @ 2018-08-30  2:14 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: emacs-devel

Thanks Daniel for your prompt response.  I have some further questions, tho.

> You can see for yourself whether there's an impact. Compile an Emacs with
> support for both pdumper and unexec, dump it with unexec, and compare its
> GC performance to Emacs built without support for pdumper and also dumped
> with unexec.

I was hoping to save myself the time ;-)
[ BTW, part of the reason for those questions is that I'm writing an
  article about the history of Elisp, and I'd like to understand how
  your code works so I can say something intelligent about it.
  Oh and there's not much time left before the deadline.
  Another part of course, is that I'd like to see this feature land
  on master.  ]

> As I recall, the difference is minimal.

Do you recall the tests you used and the ballpark of the difference?

>> Also I don't quite understand why this is needed: IIUC the markbits of
>> pdump'd objects are stored elsewhere, but I don't understand why that
>> needs to be the case.
> Because we don't store dumped objects in blocks and so the calculations of
> the normal locations of their mark bits would be wrong.

Hmm... OK that could explain it for conses and floats where we keep the
markbits separately from the objects in bitmaps alongside those blocks,
but you also have those <foo>_marked_p and set_<foo>_marked functions for
all other types of objects where the markbit is normally stored within
the object itself (i.e. it doesn't matter whether they're in blocks or
not).
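
For contrast with the block bitmaps, here is a small sketch of a mark bit
kept inside the object, in the top bit of a size field. The type and the
flag name are hypothetical; the point is only that nothing in it depends on
where the object is allocated.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag: the highest bit of the size word doubles as the mark.  */
#define TOY_MARK_FLAG ((size_t) 1 << (sizeof (size_t) * CHAR_BIT - 1))

struct toy_vector
{
  size_t size;        /* element count, plus TOY_MARK_FLAG while marked */
  int contents[4];
};

static bool toy_vector_marked_p (const struct toy_vector *v)
{
  return (v->size & TOY_MARK_FLAG) != 0;
}

static void set_toy_vector_marked (struct toy_vector *v)
{
  v->size |= TOY_MARK_FLAG;     /* writes into the object itself */
}

static void unmark_toy_vector (struct toy_vector *v)
{
  v->size &= ~TOY_MARK_FLAG;
}

/* The real length must always be read with the flag masked off.  */
static size_t toy_vector_length (const struct toy_vector *v)
{
  return v->size & ~TOY_MARK_FLAG;
}
```

Marking here always stores into the object, which works the same way for
a heap object and for one loaded from a dump.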

Why did you choose to use a completely different layout for the objects
loaded from the dump?  I naively thought your code would take
cons_blocks, symbol_blocks, ... and write those blocks as-is so objects
keep the same layout, and things like mark_maybe_object don't need to be
changed at all.  I understand this would end up writing larger dumps
(since they would include some free objects), but I'd have expected it
would lead to simpler code and a smaller patch.

What am I missing?


        Stefan




* Re: pdumper's performance
  2018-08-30  2:14   ` Stefan Monnier
@ 2018-08-30  5:19     ` Daniel Colascione
  2018-09-04 16:26       ` Stefan Monnier
  0 siblings, 1 reply; 12+ messages in thread
From: Daniel Colascione @ 2018-08-30  5:19 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Daniel Colascione, emacs-devel

> Thanks Daniel for your prompt response.  I have some further questions,
> tho.
>
>> You can see for yourself whether there's an impact. Compile an Emacs with
>> support for both pdumper and unexec, dump it with unexec, and compare its
>> GC performance to Emacs built without support for pdumper and also dumped
>> with unexec.
>
> I was hoping to save myself the time ;-)
> [ BTW, part of the reason for those questions is that I'm writing an
>   article about the history of Elisp, and I'd like to understand how
>   your code works so I can say something intelligent about it.
>   Oh and there's not much time left before the deadline.

Cool.

>   Another part of course, is that I'd like to see this feature land
>   on master.  ]

Me too. ;-)

>
>> As I recall, the difference is minimal.
>
> Do you recall the tests you used and the ballpark of the difference?

Exactly the above. IIRC, the difference amounted to a millisecond or two
on an emacs -Q startup plus an immediate (garbage-collect) --- but that's 
without the no-relocation optimization below.

>>> Also I don't quite understand why this is needed: IIUC the markbits of
>>> pdump'd objects are stored elsewhere, but I don't understand why that
>>> needs to be the case.
>> Because we don't store dumped objects in blocks and so the calculations of
>> the normal locations of their mark bits would be wrong.
>
> Hmm... OK that could explain it for conses and floats where we keep the
> markbits separately from the objects in bitmaps alongside those blocks,
> but you also have those <foo>_marked_p and set_<foo>_marked functions for
> all other types of objects where the markbit is normally stored within
> the object itself (i.e. it doesn't matter whether they're in blocks or
> not).
>
> Why did you choose to use a completely different layout for the objects
> loaded from the dump?

The objects themselves have the same layout that they do in the normal
heap. (The layout of a cons cell is unchanged, for example.) Dumping
objects individually instead of in blocks both simplifies the
implementation and allows for a more compact dump, as you point out below.

> I naively thought your code would take
> cons_blocks, symbol_blocks, ... and write those blocks as-is so objects
> keep the same layout, and things like mark_maybe_object don't need to be
> changed at all.  I understand this would end up writing larger dumps
> (since they would include some free objects), but I'd have expected it
> would lead to simpler code and a smaller patch.

If we keep the mark bits out of the objects, we can avoid modifying the
object pages just for GC. In the non-PIC case, in which in principle we
don't have to relocate the dump, that means that the pages in the dump
stay clean and file-backed, not dirty, COWed, and pagefile-backed as they
would if we banged on them just for the GC. That's an efficiency win.
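
The effect can be sketched on a POSIX system as follows. A toy "dump" is
written to a temporary file and mapped with MAP_PRIVATE; marking then
touches only a separate array, so the mapped pages are never stored to. The
file path, object type, and function names are all invented for the example,
and mapping with PROT_READ is a deliberate exaggeration: it makes any
accidental store fault, whereas a real dump has to stay writable for the
mutator.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

enum { N_OBJECTS = 8 };
struct toy_object { int payload; };

/* Marks live outside the mapping, in ordinary anonymous memory.  */
static unsigned char external_marks[N_OBJECTS];

/* Write a toy dump to a temporary file and map it privately, read-only.
   Returns NULL on failure.  */
static const struct toy_object *map_toy_dump (void)
{
  char path[] = "/tmp/toy-dump-XXXXXX";
  int fd = mkstemp (path);
  if (fd < 0)
    return NULL;
  unlink (path);                      /* the file lives until it is unmapped */

  struct toy_object image[N_OBJECTS];
  for (int i = 0; i < N_OBJECTS; i++)
    image[i].payload = i * 10;
  if (write (fd, image, sizeof image) != (ssize_t) sizeof image)
    {
      close (fd);
      return NULL;
    }

  void *p = mmap (NULL, sizeof image, PROT_READ, MAP_PRIVATE, fd, 0);
  close (fd);
  return p == MAP_FAILED ? NULL : p;
}

/* Mark object O of the dump starting at BASE without writing to O.  */
static void mark_dumped (const struct toy_object *base,
                         const struct toy_object *o)
{
  external_marks[o - base] = 1;
}
```

Had the mark been a field of toy_object, the first marking pass would
have forced a private copy of every page that contains a live object.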

For a future more-efficient GC, contiguous object storage with external
mark bits is probably the way to go for the entire heap.





* Re: pdumper's performance
  2018-08-30  5:19     ` Daniel Colascione
@ 2018-09-04 16:26       ` Stefan Monnier
  2018-09-04 16:42         ` Daniel Colascione
  0 siblings, 1 reply; 12+ messages in thread
From: Stefan Monnier @ 2018-09-04 16:26 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: emacs-devel

> Dumping objects individually instead of in blocks both simplifies the
> implementation and allows for a more compact dump, as you point
> out below.

I understand the compactness argument, but I'm surprised it makes the
implementation simpler.  I'll trust you on that, tho.

>> I naively thought your code would take
>> cons_blocks, symbol_blocks, ... and write those blocks as-is so objects
>> keep the same layout, and things like mark_maybe_object don't need to be
>> changed at all.  I understand this would end up writing larger dumps
>> (since they would include some free objects), but I'd have expected it
>> would lead to simpler code and a smaller patch.
>
> If we keep the mark bits out of the objects, we can avoid modifying the
> object pages just for GC. In the non-PIC case, in which in principle we
> don't have to relocate the dump, that means that the pages in the dump
> stay clean and file-backed, not dirty, COWed, and pagefile-backed as they
> would if we banged on them just for the GC. That's an efficiency win.

I kind of see the benefit, but:
- unlike the purespace, the dumped&restored heap is not read-only,
  so even with the markbits elsewhere the pages will generally not all
  stay clean.
- the benefit you mention only applies to the case where
  no relocation was needed.  And I suspect that in practice keeping those
  pages "clean and file-backed" will very rarely make a noticeable
  difference anyway.
- I like the idea of keeping markbits separately (XEmacs does the same,
  IIUC, for the needs of their incremental GC where they want to mark
  pages as read-only to catch writes from the mutator, while still
  letting the collector twiddle the mark bits).  But I think this should
  be a separate patch and should apply to all objects rather than only
  the pdump'd ones.

> For a future more-efficient GC, contiguous object storage with external
> mark bits is probably the way to go for the entire heap.

Agreed, obviously [ not only because of the text above, but also because
I implemented the separate markbits for floats and cons cells ;-)  ]


        Stefan




* Re: pdumper's performance
  2018-09-04 16:26       ` Stefan Monnier
@ 2018-09-04 16:42         ` Daniel Colascione
  2018-09-04 19:30           ` Stefan Monnier
  0 siblings, 1 reply; 12+ messages in thread
From: Daniel Colascione @ 2018-09-04 16:42 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Daniel Colascione, emacs-devel

>   letting the collector twiddle the mark bits).  But I think this should
>   be a separate patch and should apply to all objects rather than only
>   the pdump'd ones.

I'm not interested in replicating the cons block structure in the dump
image, especially when separating markbits is the right thing to do
anyway.





* Re: pdumper's performance
  2018-09-04 16:42         ` Daniel Colascione
@ 2018-09-04 19:30           ` Stefan Monnier
  2018-09-04 19:35             ` Daniel Colascione
  0 siblings, 1 reply; 12+ messages in thread
From: Stefan Monnier @ 2018-09-04 19:30 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: emacs-devel

>>   letting the collector twiddle the mark bits).  But I think this should
>>   be a separate patch and should apply to all objects rather than only
>>   the pdump'd ones.
> I'm not interested in replicating the cons block structure in the dump
> image, especially when separating markbits is the right thing to do
> anyway.

I'm more concerned about the non-cons non-float objects.


        Stefan




* Re: pdumper's performance
  2018-09-04 19:30           ` Stefan Monnier
@ 2018-09-04 19:35             ` Daniel Colascione
  2018-09-04 20:58               ` Stefan Monnier
  0 siblings, 1 reply; 12+ messages in thread
From: Daniel Colascione @ 2018-09-04 19:35 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Daniel Colascione, emacs-devel

>>>   letting the collector twiddle the mark bits).  But I think this should
>>>   be a separate patch and should apply to all objects rather than only
>>>   the pdump'd ones.
>> I'm not interested in replicating the cons block structure in the dump
>> image, especially when separating markbits is the right thing to do
>> anyway.
>
> I'm more concerned about the non-cons non-float objects.

Try it and see before deciding that there's a performance problem.





* Re: pdumper's performance
  2018-09-04 19:35             ` Daniel Colascione
@ 2018-09-04 20:58               ` Stefan Monnier
  2018-09-04 21:20                 ` Daniel Colascione
  2018-09-04 21:33                 ` Stefan Monnier
  0 siblings, 2 replies; 12+ messages in thread
From: Stefan Monnier @ 2018-09-04 20:58 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: emacs-devel

> Try it and see before deciding that there's a performance problem.

Hm... there's a misunderstanding: the issue is not a performance
problem, but one of separation of concerns.  Moving the markbits elsewhere
should be done in a separate patch (before or after the pdumper patch).


        Stefan




* Re: pdumper's performance
  2018-09-04 20:58               ` Stefan Monnier
@ 2018-09-04 21:20                 ` Daniel Colascione
  2018-09-04 21:49                   ` Stefan Monnier
  2018-09-04 21:33                 ` Stefan Monnier
  1 sibling, 1 reply; 12+ messages in thread
From: Daniel Colascione @ 2018-09-04 21:20 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Daniel Colascione, emacs-devel

>> Try it and see before deciding that there's a performance problem.
>
> Hm... there's a misunderstanding: the issue is not a performance
> problem, but one of separation of concerns.  Moving the markbits elsewhere
> should be done in a separate patch (before or after the pdumper patch).

There is no separation-of-concerns problem. The mark bit thing isn't some
unrelated bonus feature of pdumper. It's how the dump image *works*. To
make pdumper work differently, and worse, merely to create some kind of
fictitious separation between its component parts strikes me as a bad idea
and unnecessary work.





* Re: pdumper's performance
  2018-09-04 20:58               ` Stefan Monnier
  2018-09-04 21:20                 ` Daniel Colascione
@ 2018-09-04 21:33                 ` Stefan Monnier
  1 sibling, 0 replies; 12+ messages in thread
From: Stefan Monnier @ 2018-09-04 21:33 UTC (permalink / raw)
  To: emacs-devel

>> Try it and see before deciding that there's a performance problem.
> Hm... there's a misunderstanding: the issue is not a performance
> problem, but one of separation of concerns.  Moving the markbits elsewhere
> should be done in a separate patch (before or after the pdumper patch).

I now realize that indeed, the subject says "performance" (and that I'm
the one who put this word there), but I actually haven't looked at the
resulting performance and since you said there was no perceptible
difference, I assume it's a non-issue.


        Stefan





* Re: pdumper's performance
  2018-09-04 21:20                 ` Daniel Colascione
@ 2018-09-04 21:49                   ` Stefan Monnier
  0 siblings, 0 replies; 12+ messages in thread
From: Stefan Monnier @ 2018-09-04 21:49 UTC (permalink / raw)
  To: emacs-devel

>> Hm... there's a misunderstanding: the issue is not a performance
>> problem, but one of separation of concerns.  Moving the markbits elsewhere
>> should be done in a separate patch (before or after the pdumper patch).
> There is no separation-of-concerns problem. The mark bit thing isn't some
> unrelated bonus feature of pdumper.

Then I don't understand yet: could you explain why pdumper needs to put
the markbits elsewhere for pdump'd objects than for non-pdump'd objects?

Since the layout of the markbits was designed without thinking of the
pdumper case, I can imagine that maybe the current placement of markbits is
inconvenient for the pdumper case (tho I don't actually know which part is
inconvenient).

But surely there should be a way to lay out the markbits such that it's
convenient both for pdump'd and the non-pdump'd objects.


        Stefan





