Hi,

I would not be surprised if you are correct that cache locality has a greater
impact than branch mispredictions. I'm also not certain that this change would
have any effect on builds other than the Mac OS one, so I would be curious to
hear whether it brings the same benefit elsewhere. In my personal setup with
the change applied, memory usage has not caused any issues, but I have not
measured it closely. I think this change would make sense as a configure
flag.

Since writing that blog post, I have tried a few variants that add prefetch
instructions to sweep_conses, but every variant performed significantly worse
than the version without them. That makes it a bit harder for me to believe
the speedup is attributable to cache locality, but as you said, a number of
other factors could be the actual cause.

Tyler Dodge

On Fri, Oct 28, 2022, at 10:41 PM, Po Lu wrote:
> Stefan Kangas <stefankangas@gmail.com> writes:
> 
> > In this blog post
> >
> > https://tdodge.consulting/blog/living-the-emacs-garbage-collection-dream
> >
> > the author asserts that a one-line patch "reduces the total wall clock
> > duration for sweep conses execution by approximately 50%", at least in
> > one benchmark.  There are some caveats; read the blog post for the
> > full story.
> 
> My guess is that the blog post overestimates the performance cost of
> branch predictor misses, and underestimates the real effect of the
> change, which is making sweep_conses walk an array more and a linked
> list less.  Which is also more cache friendly, but sweeping any kind of
> array is intrinsically faster than doing the same to a linked list for
> any number of other reasons.
> 
> I don't know what the memory consumption impact of such a change would
> be since I haven't tried it myself.
>