unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
* "Significant Garbage Collection Improvement For Emacs" - sweep_conses performance improved by 50%?
@ 2022-10-29  5:07 Stefan Kangas
  2022-10-29  5:19 ` Christopher Dimech
  2022-10-29  5:41 ` Po Lu
  0 siblings, 2 replies; 9+ messages in thread
From: Stefan Kangas @ 2022-10-29  5:07 UTC (permalink / raw)
  To: Emacs developers; +Cc: tyler

In this blog post

https://tdodge.consulting/blog/living-the-emacs-garbage-collection-dream

the author asserts that a one-line patch "reduces the total wall clock
duration for sweep conses execution by approximately 50%", at least in
one benchmark.  There are some caveats; read the blog post for the
full story.

The patch can be found here:

https://github.com/tyler-dodge/emacs/commit/36d2a8d5a4f741ae99540e139fff2621bbacfbaa

Is this a change we could/should realistically make in Emacs?



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: "Significant Garbage Collection Improvement For Emacs" - sweep_conses performance improved by 50%?
  2022-10-29  5:07 "Significant Garbage Collection Improvement For Emacs" - sweep_conses performance improved by 50%? Stefan Kangas
@ 2022-10-29  5:19 ` Christopher Dimech
  2022-10-29  5:41 ` Po Lu
  1 sibling, 0 replies; 9+ messages in thread
From: Christopher Dimech @ 2022-10-29  5:19 UTC (permalink / raw)
  To: Stefan Kangas; +Cc: Emacs developers, tyler


> Sent: Saturday, October 29, 2022 at 5:07 PM
> From: "Stefan Kangas" <stefankangas@gmail.com>
> To: "Emacs developers" <emacs-devel@gnu.org>
> Cc: tyler@tdodge.consulting
> Subject: "Significant Garbage Collection Improvement For Emacs" - sweep_conses performance improved by 50%?
>
> In this blog post
> 
> https://tdodge.consulting/blog/living-the-emacs-garbage-collection-dream
> 
> the author asserts that a one-line patch "reduces the total wall clock
> duration for sweep conses execution by approximately 50%", at least in
> one benchmark.  There are some caveats; read the blog post for the
> full story.
> 
> The patch can be found here:
> 
> https://github.com/tyler-dodge/emacs/commit/36d2a8d5a4f741ae99540e139fff2621bbacfbaa
> 
> Is this a change we could/should realistically make in Emacs?
 
I give my approval for it. 



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: "Significant Garbage Collection Improvement For Emacs" - sweep_conses performance improved by 50%?
  2022-10-29  5:07 "Significant Garbage Collection Improvement For Emacs" - sweep_conses performance improved by 50%? Stefan Kangas
  2022-10-29  5:19 ` Christopher Dimech
@ 2022-10-29  5:41 ` Po Lu
  2022-10-29  6:07   ` Tyler Dodge
  1 sibling, 1 reply; 9+ messages in thread
From: Po Lu @ 2022-10-29  5:41 UTC (permalink / raw)
  To: Stefan Kangas; +Cc: Emacs developers, tyler

Stefan Kangas <stefankangas@gmail.com> writes:

> In this blog post
>
> https://tdodge.consulting/blog/living-the-emacs-garbage-collection-dream
>
> the author asserts that a one-line patch "reduces the total wall clock
> duration for sweep conses execution by approximately 50%", at least in
> one benchmark.  There are some caveats; read the blog post for the
> full story.

My guess is that the blog post overestimates the performance cost of
branch predictor misses, and underestimates the real effect of the
change, which is making sweep_conses walk an array more and a linked
list less.  Which is also more cache friendly, but sweeping any kind of
array is intrinsically faster than doing the same to a linked list for
any number of other reasons.

I don't know what the memory consumption impact of such a change would
be since I haven't tried it myself.



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: "Significant Garbage Collection Improvement For Emacs" - sweep_conses performance improved by 50%?
  2022-10-29  5:41 ` Po Lu
@ 2022-10-29  6:07   ` Tyler Dodge
  2022-10-29 14:32     ` Stefan Monnier
  2023-02-11 20:10     ` Ihor Radchenko
  0 siblings, 2 replies; 9+ messages in thread
From: Tyler Dodge @ 2022-10-29  6:07 UTC (permalink / raw)
  To: Po Lu, Stefan Kangas; +Cc: Emacs developers

[-- Attachment #1: Type: text/plain, Size: 1836 bytes --]

Hi,

I would not be surprised if you are correct in that cache locality has a greater
impact than the branch mispredictions. I'm also not certain that this 
would have any effect on other builds than the Mac OS version, so I 
would be curious to hear if it does have the same benefit. In my personal setup
with the change, the memory usage has not caused any issues, but I have not 
measured it that closely. I think this change would make sense as a configure
flag.

Since writing that blogpost, I did attempt a few variations of adding prefetch 
instructions in sweep_conses, but all of the variants I tried ended up having 
significantly worse performance characteristics than omitting them. That makes
it a bit harder for me to believe that it's attributable to cache locality, but like
you said, there are a number of other reasons that could be the actual cause.

Tyler Dodge

On Fri, Oct 28, 2022, at 10:41 PM, Po Lu wrote:
> Stefan Kangas <stefankangas@gmail.com> writes:
> 
> > In this blog post
> >
> > https://tdodge.consulting/blog/living-the-emacs-garbage-collection-dream
> >
> > the author asserts that a one-line patch "reduces the total wall clock
> > duration for sweep conses execution by approximately 50%", at least in
> > one benchmark.  There are some caveats; read the blog post for the
> > full story.
> 
> My guess is that the blog post overestimates the performance cost of
> branch predictor misses, and underestimates the real effect of the
> change, which is making sweep_conses walk an array more and a linked
> list less.  Which is also more cache friendly, but sweeping any kind of
> array is intrinsically faster than doing the same to a linked list for
> any number of other reasons.
> 
> I don't know what the memory consumption impact of such a change would
> be since I haven't tried it myself.
> 

[-- Attachment #2: Type: text/html, Size: 2723 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: "Significant Garbage Collection Improvement For Emacs" - sweep_conses performance improved by 50%?
  2022-10-29  6:07   ` Tyler Dodge
@ 2022-10-29 14:32     ` Stefan Monnier
  2022-10-30  0:53       ` Po Lu
  2023-02-11 20:10     ` Ihor Radchenko
  1 sibling, 1 reply; 9+ messages in thread
From: Stefan Monnier @ 2022-10-29 14:32 UTC (permalink / raw)
  To: Tyler Dodge; +Cc: Po Lu, Stefan Kangas, Emacs developers

Hi Tyler,

> I would not be surprised if you are correct in that cache locality has a greater
> impact than the branch mispredictions. I'm also not certain that this 
> would have any effect on other builds than the Mac OS version, so I 
> would be curious to hear if it does have the same benefit.

If the issue is linked to locality or to branch mispredictions, than it
should depend on the processor more than the OS.

> In my personal setup with the change, the memory usage has not caused
> any issues, but I have not  measured it that closely. I think this
> change would make sense as a configure flag.

Basically you're replacing 1kB blocks with 32kB blocks, which indeed
seems very reasonable.  Back when I introduced those blocks (and their
associated bitmaps) I chose 1kB because I wanted something small enough
that it couldn't be worse than what we had before, not because it was
the best choice.

Actually 512B was the smallest meaningful choice, FWIW, because the 8B
alignment constraints means that the bitmap eats up at least 8B and
an 8B bitmap can accommodate 64 objects, multiplied by the 8B
alignment gives 512B.

Based on your graph, there doesn't seem to be much need to increase
BLOCK_ALIGN past 1<<15, but yes, it would be interesting to find out
what's causing the crash when going over that limit.


        Stefan


PS: I liked your hack using a background process to increase the PTY
    buffer size.




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: "Significant Garbage Collection Improvement For Emacs" - sweep_conses performance improved by 50%?
  2022-10-29 14:32     ` Stefan Monnier
@ 2022-10-30  0:53       ` Po Lu
  0 siblings, 0 replies; 9+ messages in thread
From: Po Lu @ 2022-10-30  0:53 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Tyler Dodge, Stefan Kangas, Emacs developers

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> Basically you're replacing 1kB blocks with 32kB blocks, which indeed
> seems very reasonable.  Back when I introduced those blocks (and their
> associated bitmaps) I chose 1kB because I wanted something small enough
> that it couldn't be worse than what we had before, not because it was
> the best choice.

It may be worth nothing that I chose a cons and float block size of 4
times the page size while working with the incremental GC, which ends up
16k on most systems.  However, there are also two pairs of markbits in
each block.

I haven't seen anything bad happen from that choice yet.  But there,
BLOCK_ALIGN is gone in the first place, and it relies on valloc
returning suitably aligned memory.



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: "Significant Garbage Collection Improvement For Emacs" - sweep_conses performance improved by 50%?
  2022-10-29  6:07   ` Tyler Dodge
  2022-10-29 14:32     ` Stefan Monnier
@ 2023-02-11 20:10     ` Ihor Radchenko
  2023-02-12 21:58       ` Konstantin Kharlamov
  1 sibling, 1 reply; 9+ messages in thread
From: Ihor Radchenko @ 2023-02-11 20:10 UTC (permalink / raw)
  To: Tyler Dodge; +Cc: Po Lu, Stefan Kangas, Emacs developers

"Tyler Dodge" <tyler@tdodge.consulting> writes:

> Since writing that blogpost, I did attempt a few variations of adding prefetch 
> instructions in sweep_conses, but all of the variants I tried ended up having 
> significantly worse performance characteristics than omitting them. That makes
> it a bit harder for me to believe that it's attributable to cache locality, but like
> you said, there are a number of other reasons that could be the actual cause.

May I know if there have been any progress on this?
Is there any help/testing needed?

GC is really making a big impact on Emacs performance. I believe that
many people would appreciate even few percent overall speed-ups if they
can be achieved as easily as described in the post. Even if only on
certain builds.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: "Significant Garbage Collection Improvement For Emacs" - sweep_conses performance improved by 50%?
  2023-02-11 20:10     ` Ihor Radchenko
@ 2023-02-12 21:58       ` Konstantin Kharlamov
  2023-02-13 20:07         ` Konstantin Kharlamov
  0 siblings, 1 reply; 9+ messages in thread
From: Konstantin Kharlamov @ 2023-02-12 21:58 UTC (permalink / raw)
  To: Ihor Radchenko, Tyler Dodge; +Cc: Po Lu, Stefan Kangas, Emacs developers

On Sat, 2023-02-11 at 20:10 +0000, Ihor Radchenko wrote:
> "Tyler Dodge" <tyler@tdodge.consulting> writes:
>
> > Since writing that blogpost, I did attempt a few variations of adding
> > prefetch
> > instructions in sweep_conses, but all of the variants I tried ended up
> > having
> > significantly worse performance characteristics than omitting them. That
> > makes
> > it a bit harder for me to believe that it's attributable to cache locality,
> > but like
> > you said, there are a number of other reasons that could be the actual
> > cause.
>
> May I know if there have been any progress on this?
> Is there any help/testing needed?
>
> GC is really making a big impact on Emacs performance. I believe that
> many people would appreciate even few percent overall speed-ups if they
> can be achieved as easily as described in the post. Even if only on
> certain builds.

I did some testing of the change on Archlinux, on Intel i5-7200U.

First of, I run the (test-garbage-collect-large-cons-allocation) from the post,
but I made a small modification to it. As it stands the function takes too much
time and memory for me to wait it out, so I reduced the value `105910837` to
`10591083`.

Running the benchmark three times before and after the change I see reduction of
`Garbage Collection` time by ≈25%.

Then the question that came up in the thread: by how much memory consumption
would increase? I tested `emacs -Q` similarly 3 times before/after, and each
time I measured Emacs PSS. I saw no statistically significant difference.

Interestingly, while repeating previous test with my packages and config loaded
(which includes running malloc_trim()), I see inconsistently lower amount of PSS
with BLOCK_ALIGN=15. At first it was significant, but upon further tests I see
it varies. Either way, it does seem to be a bit lower that with the default
BLOCK_ALIGN=10. So in memory consumption there is either no difference or even a
slight win.

So, yeah, nice change. No idea why it's been 3 months and nobody picked it up
🤷‍♂️ Tho now that I looked at it, I may as well create a patch from it and send
for review. Unless there's opposition, I'm planning on doing so tomorrow.

Thank you Ihor for pinging (I didn't see this thread) and Stefan for the initial
posting of the article here!




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: "Significant Garbage Collection Improvement For Emacs" - sweep_conses performance improved by 50%?
  2023-02-12 21:58       ` Konstantin Kharlamov
@ 2023-02-13 20:07         ` Konstantin Kharlamov
  0 siblings, 0 replies; 9+ messages in thread
From: Konstantin Kharlamov @ 2023-02-13 20:07 UTC (permalink / raw)
  To: Ihor Radchenko, Tyler Dodge; +Cc: Po Lu, Stefan Kangas, Emacs developers

On Mon, 2023-02-13 at 00:58 +0300, Konstantin Kharlamov wrote:
> On Sat, 2023-02-11 at 20:10 +0000, Ihor Radchenko wrote:
> So, yeah, nice change. No idea why it's been 3 months and nobody picked it up
> 🤷‍♂️ Tho now that I looked at it, I may as well create a patch from it and
> send
> for review. Unless there's opposition, I'm planning on doing so tomorrow.
> 
> Thank you Ihor for pinging (I didn't see this thread) and Stefan for the
> initial
> posting of the article here!

So, a day later, as promised, patch sent! Yay^^

https://debbugs.gnu.org/cgi/bugreport.cgi?bug=61490



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-02-13 20:07 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-29  5:07 "Significant Garbage Collection Improvement For Emacs" - sweep_conses performance improved by 50%? Stefan Kangas
2022-10-29  5:19 ` Christopher Dimech
2022-10-29  5:41 ` Po Lu
2022-10-29  6:07   ` Tyler Dodge
2022-10-29 14:32     ` Stefan Monnier
2022-10-30  0:53       ` Po Lu
2023-02-11 20:10     ` Ihor Radchenko
2023-02-12 21:58       ` Konstantin Kharlamov
2023-02-13 20:07         ` Konstantin Kharlamov

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).