Hi,

I would not be surprised if you are correct that cache locality has a greater impact than the branch mispredictions. I'm also not certain that this would have any effect on builds other than the macOS version, so I would be curious to hear whether it has the same benefit there. In my personal setup with the change, memory usage has not caused any issues, but I have not measured it closely. I think this change would make sense as a configure flag.

Since writing that blog post, I tried a few variations of adding prefetch instructions in sweep_conses, but every variant I tried performed significantly worse than omitting them. That makes it a bit harder for me to believe that the speedup is attributable to cache locality, but like you said, there are a number of other reasons that could be the actual cause.

Tyler Dodge

On Fri, Oct 28, 2022, at 10:41 PM, Po Lu wrote:
> Stefan Kangas writes:
>
> > In this blog post
> >
> > https://tdodge.consulting/blog/living-the-emacs-garbage-collection-dream
> >
> > the author asserts that a one-line patch "reduces the total wall clock
> > duration for sweep conses execution by approximately 50%", at least in
> > one benchmark. There are some caveats; read the blog post for the
> > full story.
>
> My guess is that the blog post overestimates the performance cost of
> branch predictor misses, and underestimates the real effect of the
> change, which is making sweep_conses walk an array more and a linked
> list less. Which is also more cache friendly, but sweeping any kind of
> array is intrinsically faster than doing the same to a linked list for
> any number of other reasons.
>
> I don't know what the memory consumption impact of such a change would
> be since I haven't tried it myself.