* "Significant Garbage Collection Improvement For Emacs" - sweep_conses performance improved by 50%? @ 2022-10-29 5:07 Stefan Kangas 2022-10-29 5:19 ` Christopher Dimech 2022-10-29 5:41 ` Po Lu 0 siblings, 2 replies; 9+ messages in thread From: Stefan Kangas @ 2022-10-29 5:07 UTC (permalink / raw) To: Emacs developers; +Cc: tyler In this blog post https://tdodge.consulting/blog/living-the-emacs-garbage-collection-dream the author asserts that a one-line patch "reduces the total wall clock duration for sweep conses execution by approximately 50%", at least in one benchmark. There are some caveats; read the blog post for the full story. The patch can be found here: https://github.com/tyler-dodge/emacs/commit/36d2a8d5a4f741ae99540e139fff2621bbacfbaa Is this a change we could/should realistically make in Emacs? ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: "Significant Garbage Collection Improvement For Emacs" - sweep_conses performance improved by 50%? 2022-10-29 5:07 "Significant Garbage Collection Improvement For Emacs" - sweep_conses performance improved by 50%? Stefan Kangas @ 2022-10-29 5:19 ` Christopher Dimech 2022-10-29 5:41 ` Po Lu 1 sibling, 0 replies; 9+ messages in thread From: Christopher Dimech @ 2022-10-29 5:19 UTC (permalink / raw) To: Stefan Kangas; +Cc: Emacs developers, tyler > Sent: Saturday, October 29, 2022 at 5:07 PM > From: "Stefan Kangas" <stefankangas@gmail.com> > To: "Emacs developers" <emacs-devel@gnu.org> > Cc: tyler@tdodge.consulting > Subject: "Significant Garbage Collection Improvement For Emacs" - sweep_conses performance improved by 50%? > > In this blog post > > https://tdodge.consulting/blog/living-the-emacs-garbage-collection-dream > > the author asserts that a one-line patch "reduces the total wall clock > duration for sweep conses execution by approximately 50%", at least in > one benchmark. There are some caveats; read the blog post for the > full story. > > The patch can be found here: > > https://github.com/tyler-dodge/emacs/commit/36d2a8d5a4f741ae99540e139fff2621bbacfbaa > > Is this a change we could/should realistically make in Emacs? I give my approval for it. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: "Significant Garbage Collection Improvement For Emacs" - sweep_conses performance improved by 50%? 2022-10-29 5:07 "Significant Garbage Collection Improvement For Emacs" - sweep_conses performance improved by 50%? Stefan Kangas 2022-10-29 5:19 ` Christopher Dimech @ 2022-10-29 5:41 ` Po Lu 2022-10-29 6:07 ` Tyler Dodge 1 sibling, 1 reply; 9+ messages in thread From: Po Lu @ 2022-10-29 5:41 UTC (permalink / raw) To: Stefan Kangas; +Cc: Emacs developers, tyler Stefan Kangas <stefankangas@gmail.com> writes: > In this blog post > > https://tdodge.consulting/blog/living-the-emacs-garbage-collection-dream > > the author asserts that a one-line patch "reduces the total wall clock > duration for sweep conses execution by approximately 50%", at least in > one benchmark. There are some caveats; read the blog post for the > full story. My guess is that the blog post overestimates the performance cost of branch predictor misses, and underestimates the real effect of the change, which is making sweep_conses walk an array more and a linked list less. Which is also more cache friendly, but sweeping any kind of array is intrinsically faster than doing the same to a linked list for any number of other reasons. I don't know what the memory consumption impact of such a change would be since I haven't tried it myself. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: "Significant Garbage Collection Improvement For Emacs" - sweep_conses performance improved by 50%? 2022-10-29 5:41 ` Po Lu @ 2022-10-29 6:07 ` Tyler Dodge 2022-10-29 14:32 ` Stefan Monnier 2023-02-11 20:10 ` Ihor Radchenko 0 siblings, 2 replies; 9+ messages in thread From: Tyler Dodge @ 2022-10-29 6:07 UTC (permalink / raw) To: Po Lu, Stefan Kangas; +Cc: Emacs developers [-- Attachment #1: Type: text/plain, Size: 1836 bytes --] Hi, I would not be surprised if you are correct in that cache locality has a greater impact than the branch mispredictions. I'm also not certain that this would have any effect on other builds than the Mac OS version, so I would be curious to hear if it does have the same benefit. In my personal setup with the change, the memory usage has not caused any issues, but I have not measured it that closely. I think this change would make sense as a configure flag. Since writing that blogpost, I did attempt a few variations of adding prefetch instructions in sweep_conses, but all of the variants I tried ended up having significantly worse performance characteristics than omitting them. That makes it a bit harder for me to believe that it's attributable to cache locality, but like you said, there are a number of other reasons that could be the actual cause. Tyler Dodge On Fri, Oct 28, 2022, at 10:41 PM, Po Lu wrote: > Stefan Kangas <stefankangas@gmail.com> writes: > > > In this blog post > > > > https://tdodge.consulting/blog/living-the-emacs-garbage-collection-dream > > > > the author asserts that a one-line patch "reduces the total wall clock > > duration for sweep conses execution by approximately 50%", at least in > > one benchmark. There are some caveats; read the blog post for the > > full story. > > My guess is that the blog post overestimates the performance cost of > branch predictor misses, and underestimates the real effect of the > change, which is making sweep_conses walk an array more and a linked > list less. Which is also more cache friendly, but sweeping any kind of > array is intrinsically faster than doing the same to a linked list for > any number of other reasons. > > I don't know what the memory consumption impact of such a change would > be since I haven't tried it myself. > [-- Attachment #2: Type: text/html, Size: 2723 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: "Significant Garbage Collection Improvement For Emacs" - sweep_conses performance improved by 50%? 2022-10-29 6:07 ` Tyler Dodge @ 2022-10-29 14:32 ` Stefan Monnier 2022-10-30 0:53 ` Po Lu 2023-02-11 20:10 ` Ihor Radchenko 1 sibling, 1 reply; 9+ messages in thread From: Stefan Monnier @ 2022-10-29 14:32 UTC (permalink / raw) To: Tyler Dodge; +Cc: Po Lu, Stefan Kangas, Emacs developers Hi Tyler, > I would not be surprised if you are correct in that cache locality has a greater > impact than the branch mispredictions. I'm also not certain that this > would have any effect on other builds than the Mac OS version, so I > would be curious to hear if it does have the same benefit. If the issue is linked to locality or to branch mispredictions, than it should depend on the processor more than the OS. > In my personal setup with the change, the memory usage has not caused > any issues, but I have not measured it that closely. I think this > change would make sense as a configure flag. Basically you're replacing 1kB blocks with 32kB blocks, which indeed seems very reasonable. Back when I introduced those blocks (and their associated bitmaps) I chose 1kB because I wanted something small enough that it couldn't be worse than what we had before, not because it was the best choice. Actually 512B was the smallest meaningful choice, FWIW, because the 8B alignment constraints means that the bitmap eats up at least 8B and an 8B bitmap can accommodate 64 objects, multiplied by the 8B alignment gives 512B. Based on your graph, there doesn't seem to be much need to increase BLOCK_ALIGN past 1<<15, but yes, it would be interesting to find out what's causing the crash when going over that limit. Stefan PS: I liked your hack using a background process to increase the PTY buffer size. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: "Significant Garbage Collection Improvement For Emacs" - sweep_conses performance improved by 50%? 2022-10-29 14:32 ` Stefan Monnier @ 2022-10-30 0:53 ` Po Lu 0 siblings, 0 replies; 9+ messages in thread From: Po Lu @ 2022-10-30 0:53 UTC (permalink / raw) To: Stefan Monnier; +Cc: Tyler Dodge, Stefan Kangas, Emacs developers Stefan Monnier <monnier@iro.umontreal.ca> writes: > Basically you're replacing 1kB blocks with 32kB blocks, which indeed > seems very reasonable. Back when I introduced those blocks (and their > associated bitmaps) I chose 1kB because I wanted something small enough > that it couldn't be worse than what we had before, not because it was > the best choice. It may be worth nothing that I chose a cons and float block size of 4 times the page size while working with the incremental GC, which ends up 16k on most systems. However, there are also two pairs of markbits in each block. I haven't seen anything bad happen from that choice yet. But there, BLOCK_ALIGN is gone in the first place, and it relies on valloc returning suitably aligned memory. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: "Significant Garbage Collection Improvement For Emacs" - sweep_conses performance improved by 50%? 2022-10-29 6:07 ` Tyler Dodge 2022-10-29 14:32 ` Stefan Monnier @ 2023-02-11 20:10 ` Ihor Radchenko 2023-02-12 21:58 ` Konstantin Kharlamov 1 sibling, 1 reply; 9+ messages in thread From: Ihor Radchenko @ 2023-02-11 20:10 UTC (permalink / raw) To: Tyler Dodge; +Cc: Po Lu, Stefan Kangas, Emacs developers "Tyler Dodge" <tyler@tdodge.consulting> writes: > Since writing that blogpost, I did attempt a few variations of adding prefetch > instructions in sweep_conses, but all of the variants I tried ended up having > significantly worse performance characteristics than omitting them. That makes > it a bit harder for me to believe that it's attributable to cache locality, but like > you said, there are a number of other reasons that could be the actual cause. May I know if there have been any progress on this? Is there any help/testing needed? GC is really making a big impact on Emacs performance. I believe that many people would appreciate even few percent overall speed-ups if they can be achieved as easily as described in the post. Even if only on certain builds. -- Ihor Radchenko // yantar92, Org mode contributor, Learn more about Org mode at <https://orgmode.org/>. Support Org development at <https://liberapay.com/org-mode>, or support my work at <https://liberapay.com/yantar92> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: "Significant Garbage Collection Improvement For Emacs" - sweep_conses performance improved by 50%? 2023-02-11 20:10 ` Ihor Radchenko @ 2023-02-12 21:58 ` Konstantin Kharlamov 2023-02-13 20:07 ` Konstantin Kharlamov 0 siblings, 1 reply; 9+ messages in thread From: Konstantin Kharlamov @ 2023-02-12 21:58 UTC (permalink / raw) To: Ihor Radchenko, Tyler Dodge; +Cc: Po Lu, Stefan Kangas, Emacs developers On Sat, 2023-02-11 at 20:10 +0000, Ihor Radchenko wrote: > "Tyler Dodge" <tyler@tdodge.consulting> writes: > > > Since writing that blogpost, I did attempt a few variations of adding > > prefetch > > instructions in sweep_conses, but all of the variants I tried ended up > > having > > significantly worse performance characteristics than omitting them. That > > makes > > it a bit harder for me to believe that it's attributable to cache locality, > > but like > > you said, there are a number of other reasons that could be the actual > > cause. > > May I know if there have been any progress on this? > Is there any help/testing needed? > > GC is really making a big impact on Emacs performance. I believe that > many people would appreciate even few percent overall speed-ups if they > can be achieved as easily as described in the post. Even if only on > certain builds. I did some testing of the change on Archlinux, on Intel i5-7200U. First of, I run the (test-garbage-collect-large-cons-allocation) from the post, but I made a small modification to it. As it stands the function takes too much time and memory for me to wait it out, so I reduced the value `105910837` to `10591083`. Running the benchmark three times before and after the change I see reduction of `Garbage Collection` time by ≈25%. Then the question that came up in the thread: by how much memory consumption would increase? I tested `emacs -Q` similarly 3 times before/after, and each time I measured Emacs PSS. I saw no statistically significant difference. Interestingly, while repeating previous test with my packages and config loaded (which includes running malloc_trim()), I see inconsistently lower amount of PSS with BLOCK_ALIGN=15. At first it was significant, but upon further tests I see it varies. Either way, it does seem to be a bit lower that with the default BLOCK_ALIGN=10. So in memory consumption there is either no difference or even a slight win. So, yeah, nice change. No idea why it's been 3 months and nobody picked it up 🤷♂️ Tho now that I looked at it, I may as well create a patch from it and send for review. Unless there's opposition, I'm planning on doing so tomorrow. Thank you Ihor for pinging (I didn't see this thread) and Stefan for the initial posting of the article here! ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: "Significant Garbage Collection Improvement For Emacs" - sweep_conses performance improved by 50%? 2023-02-12 21:58 ` Konstantin Kharlamov @ 2023-02-13 20:07 ` Konstantin Kharlamov 0 siblings, 0 replies; 9+ messages in thread From: Konstantin Kharlamov @ 2023-02-13 20:07 UTC (permalink / raw) To: Ihor Radchenko, Tyler Dodge; +Cc: Po Lu, Stefan Kangas, Emacs developers On Mon, 2023-02-13 at 00:58 +0300, Konstantin Kharlamov wrote: > On Sat, 2023-02-11 at 20:10 +0000, Ihor Radchenko wrote: > So, yeah, nice change. No idea why it's been 3 months and nobody picked it up > 🤷♂️ Tho now that I looked at it, I may as well create a patch from it and > send > for review. Unless there's opposition, I'm planning on doing so tomorrow. > > Thank you Ihor for pinging (I didn't see this thread) and Stefan for the > initial > posting of the article here! So, a day later, as promised, patch sent! Yay^^ https://debbugs.gnu.org/cgi/bugreport.cgi?bug=61490 ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2023-02-13 20:07 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-10-29 5:07 "Significant Garbage Collection Improvement For Emacs" - sweep_conses performance improved by 50%? Stefan Kangas 2022-10-29 5:19 ` Christopher Dimech 2022-10-29 5:41 ` Po Lu 2022-10-29 6:07 ` Tyler Dodge 2022-10-29 14:32 ` Stefan Monnier 2022-10-30 0:53 ` Po Lu 2023-02-11 20:10 ` Ihor Radchenko 2023-02-12 21:58 ` Konstantin Kharlamov 2023-02-13 20:07 ` Konstantin Kharlamov
Code repositories for project(s) associated with this public inbox https://git.savannah.gnu.org/cgit/emacs.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).