From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail
From: "Tyler Dodge"
Newsgroups: gmane.emacs.devel
Subject: Re: "Significant Garbage Collection Improvement For Emacs" - sweep_conses performance improved by 50%?
Date: Fri, 28 Oct 2022 23:07:56 -0700
Message-ID: <42ba5e8a-8a26-4afd-ab59-efbb967e8a24@app.fastmail.com>
References: <871qqr425n.fsf@yahoo.com>
In-Reply-To: <871qqr425n.fsf@yahoo.com>
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary=70dd0aa9b7a1421c8c67dfd26b523c07
Cc: "Emacs developers"
To: "Po Lu", "Stefan Kangas"
List-Id: "Emacs development discussions."
Xref: news.gmane.io gmane.emacs.devel:298728

--70dd0aa9b7a1421c8c67dfd26b523c07
Content-Type: text/plain

Hi,

I would not be surprised if you are correct in that cache locality has a greater
impact than the branch mispredictions. I'm also not certain that this
would have any effect on other builds than the Mac OS version, so I
would be curious to hear if it does have the same benefit. In my personal setup
with the change, the memory usage has not caused any issues, but I have not
measured it that closely. I think this change would make sense as a configure
flag.

Since writing that blogpost, I did attempt a few variations of adding prefetch
instructions in sweep_conses, but all of the variants I tried ended up having
significantly worse performance characteristics than omitting them. That makes
it a bit harder for me to believe that it's attributable to cache locality, but like
you said, there are a number of other reasons that could be the actual cause.

Tyler Dodge

On Fri, Oct 28, 2022, at 10:41 PM, Po Lu wrote:
> Stefan Kangas writes:
>
> > In this blog post
> >
> > https://tdodge.consulting/blog/living-the-emacs-garbage-collection-dream
> >
> > the author asserts that a one-line patch "reduces the total wall clock
> > duration for sweep conses execution by approximately 50%", at least in
> > one benchmark. There are some caveats; read the blog post for the
> > full story.
>
> My guess is that the blog post overestimates the performance cost of
> branch predictor misses, and underestimates the real effect of the
> change, which is making sweep_conses walk an array more and a linked
> list less. Which is also more cache friendly, but sweeping any kind of
> array is intrinsically faster than doing the same to a linked list for
> any number of other reasons.
>
> I don't know what the memory consumption impact of such a change would
> be since I haven't tried it myself.
>

--70dd0aa9b7a1421c8c67dfd26b523c07
Content-Type: text/html
Content-Transfer-Encoding: quoted-printable

Hi,

I would not be surprised if you are correct in that cache locality has a greater
impact than the branch mispredictions. I'm also not certain that this
would have any effect on other builds than the Mac OS version, so I
would be curious to hear if it does have the same benefit. In my personal setup
with the change, the memory usage has not caused any issues, but I have not
measured it that closely. I think this change would make sense as a configure
flag.

Since writing that blogpost, I did attempt a few variations of adding prefetch
instructions in sweep_conses, but all of the variants I tried ended up having
significantly worse performance characteristics than omitting them. That makes
it a bit harder for me to believe that it's attributable to cache locality, but like
you said, there are a number of other reasons that could be the actual cause.
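
To make the array-versus-linked-list contrast concrete, here is a minimal C sketch. It is not the code in Emacs's alloc.c and not the patch from the blog post: `struct cell`, `sweep_array`, and `sweep_list` are hypothetical names chosen for illustration, and the `__builtin_prefetch` call (a GCC/Clang builtin) only marks where a prefetch hint could be placed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cell type for illustration; NOT Emacs's struct Lisp_Cons. */
struct cell {
    struct cell *next;  /* link followed only by the list-style sweep */
    bool marked;        /* assumed to be set by a mark phase (not shown) */
};

/* Array-style sweep: visit n cells of one contiguous block in address
   order.  Clears the mark on live cells and returns the number of
   unmarked (dead) cells. */
size_t sweep_array(struct cell *block, size_t n)
{
    assert(block != NULL || n == 0);
    size_t dead = 0;
    for (size_t i = 0; i < n; i++) {
        if (block[i].marked)
            block[i].marked = false;
        else
            dead++;
    }
    return dead;
}

/* List-style sweep: same bookkeeping, but the address of the next cell
   is only known after the current cell has been loaded. */
size_t sweep_list(struct cell *head)
{
    size_t dead = 0;
    for (struct cell *c = head; c != NULL; c = c->next) {
#ifdef __GNUC__
        /* Optional hint; whether it helps depends on the workload. */
        if (c->next != NULL)
            __builtin_prefetch(c->next);
#endif
        if (c->marked)
            c->marked = false;
        else
            dead++;
    }
    return dead;
}
```

Both functions do the same work per cell. The difference is that the array loop knows every address up front, while the list loop cannot start on the next cell until `c->next` has been loaded. A prefetch hint is not guaranteed to help; the variants tried in sweep_conses, as described above, were slower than having none.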

Tyler Dodge

On Fri, Oct 28, 2022, at 10:41 PM, Po Lu wrote:
> Stefan Kangas <stefankangas@gmail.com> writes:
>
> > In this blog post
> >
> > https://tdodge.consulting/blog/living-the-emacs-garbage-collection-dream
> >
> > the author asserts that a one-line patch "reduces the total wall clock
> > duration for sweep conses execution by approximately 50%", at least in
> > one benchmark. There are some caveats; read the blog post for the
> > full story.
>
> My guess is that the blog post overestimates the performance cost of
> branch predictor misses, and underestimates the real effect of the
> change, which is making sweep_conses walk an array more and a linked
> list less. Which is also more cache friendly, but sweeping any kind of
> array is intrinsically faster than doing the same to a linked list for
> any number of other reasons.
>
> I don't know what the memory consumption impact of such a change would
> be since I haven't tried it myself.


--70dd0aa9b7a1421c8c67dfd26b523c07--