Hi,

I would not be surprised if you are correct that cache locality has a greater
impact than branch mispredictions. I'm also not certain that this change
would have any effect on builds other than the macOS one, so I would be
curious to hear whether it shows the same benefit elsewhere. In my personal
setup with the change, memory usage has not caused any issues, but I have not
measured it closely. I think this change would make sense as a configure
flag.

Since writing that blog post, I did try a few variations of adding prefetch
instructions in sweep_conses, but all of the variants I tried performed
significantly worse than omitting them. That makes it a bit harder for me to
believe that the speedup is attributable to cache locality, but like you said,
there are a number of other reasons that could be the actual cause.
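
For anyone who wants to experiment along the same lines, the general shape of
a prefetch variation might look roughly like the sketch below. It is only an
illustration, not the code in alloc.c and not any of the variants mentioned
above: struct cell, struct block, CELLS_PER_BLOCK, link_blocks and
sweep_blocks are hypothetical stand-ins, and the only real API used is the
GCC/Clang builtin __builtin_prefetch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; these are NOT the
   definitions used by Emacs in alloc.c.  */
#define CELLS_PER_BLOCK 4

struct cell
{
  int marked;
  int payload;
};

struct block
{
  struct cell cells[CELLS_PER_BLOCK];  /* swept as a plain array */
  struct block *next;                  /* blocks form a linked list */
};

/* Chain N blocks stored contiguously in BLOCKS into a linked list.  */
static void
link_blocks (struct block *blocks, size_t n)
{
  assert (n > 0);
  for (size_t i = 0; i + 1 < n; i++)
    blocks[i].next = &blocks[i + 1];
  blocks[n - 1].next = NULL;
}

/* Walk every block starting at HEAD, clear the mark on each marked
   cell, and return the number of unmarked (dead) cells.  If
   USE_PREFETCH is nonzero, hint the next block into cache before
   scanning the current one.  */
static size_t
sweep_blocks (struct block *head, int use_prefetch)
{
  size_t dead = 0;
  for (struct block *b = head; b != NULL; b = b->next)
    {
#if defined __GNUC__ || defined __clang__
      /* Arguments: address, 1 = prefetch for write, 1 = low temporal
         locality.  This is only a hint and never changes the result.  */
      if (use_prefetch && b->next != NULL)
        __builtin_prefetch (b->next, 1, 1);
#else
      (void) use_prefetch;
#endif
      for (size_t i = 0; i < CELLS_PER_BLOCK; i++)
        {
          if (b->cells[i].marked)
            b->cells[i].marked = 0;
          else
            dead++;
        }
    }
  return dead;
}
```

Since the prefetch is only a hint, both settings of use_prefetch return the
same count, and any difference would show up only in timing. Whether it helps
or hurts depends on the hardware and the access pattern, so it has to be
measured.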

Tyler Dodge

On Fri, Oct 28, 2022, at 10:41 PM, Po Lu wrote:
Stefan Kangas <stefankangas@gmail.com> writes:

> In this blog post
>
> https://tdodge.consulting/blog/living-the-emacs-garbage-collection-dream
>
> the author asserts that a one-line patch "reduces the total wall clock
> duration for sweep conses execution by approximately 50%", at least in
> one benchmark.  There are some caveats; read the blog post for the
> full story.

My guess is that the blog post overestimates the performance cost of
branch predictor misses, and underestimates the real effect of the
change, which is making sweep_conses walk an array more and a linked
list less.  Which is also more cache friendly, but sweeping any kind of
array is intrinsically faster than doing the same to a linked list for
any number of other reasons.

I don't know what the memory consumption impact of such a change would
be since I haven't tried it myself.