From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail
From: Pip Cet via "Emacs development discussions." <emacs-devel@gnu.org>
Newsgroups: gmane.emacs.devel
Subject: Re: Merging scratch/no-purespace to remove unexec and purespace
Date: Sun, 22 Dec 2024 11:12:30 +0000
Message-ID: <87jzbsgefi.fsf@protonmail.com>
References: <CADwFkmmX7pxzLCRU3aU=oZS1NRdVuxxU-JCHpO2=e3J-mRS55Q@mail.gmail.com>
 <87zfku6ra9.fsf@gmail.com> <jwvfrmmujgz.fsf-monnier+emacs@gnu.org>
 <87seql7a3o.fsf@gmail.com> <87o71553yf.fsf@gmail.com>
 <CADwFkm=9=Zr-KzC8mPFDkyJ8RpYEhNUFKK7TKvQHsGj+7FsiNA@mail.gmail.com>
Reply-To: Pip Cet <pipcet@protonmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214";
	logging-data="4063"; mail-complaints-to="usenet@ciao.gmane.io"
Cc: Helmut Eller <eller.helmut@gmail.com>,
 Stefan Monnier <monnier@iro.umontreal.ca>, emacs-devel@gnu.org
To: Stefan Kangas <stefankangas@gmail.com>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Sun Dec 22 13:04:48 2024
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane-mx.org
Original-Received: from lists.gnu.org ([209.51.188.17])
	by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
	(Exim 4.92)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>)
	id 1tPKhS-0000s7-Jz
	for ged-emacs-devel@m.gmane-mx.org; Sun, 22 Dec 2024 13:04:46 +0100
Original-Received: from localhost ([::1] helo=lists1p.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.90_1)
	(envelope-from <emacs-devel-bounces@gnu.org>)
	id 1tPKgw-0003pR-EN; Sun, 22 Dec 2024 07:04:14 -0500
Original-Received: from eggs.gnu.org ([2001:470:142:3::10])
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <pipcet@protonmail.com>)
 id 1tPJt4-0002cH-Ez
 for emacs-devel@gnu.org; Sun, 22 Dec 2024 06:12:42 -0500
Original-Received: from mail-4316.protonmail.ch ([185.70.43.16])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <pipcet@protonmail.com>)
 id 1tPJt1-0004Mw-KQ
 for emacs-devel@gnu.org; Sun, 22 Dec 2024 06:12:42 -0500
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=protonmail.com;
 s=protonmail3; t=1734865955; x=1735125155;
 bh=7rRBctKlfhvepJ8lSA3na++bCu+sGo/WT3Icr+7dRYo=;
 h=Date:To:From:Cc:Subject:Message-ID:In-Reply-To:References:
 Feedback-ID:From:To:Cc:Date:Subject:Reply-To:Feedback-ID:
 Message-ID:BIMI-Selector:List-Unsubscribe:List-Unsubscribe-Post;
 b=g92dhXqk21SRnq0pH/ebwSyJkqne7vDb5UCXppMRBRr4NX34326xpeSXZFuo/myN8
 A26OiK5mD9RNYLsZVYuWInllugaAcVlVTW4+vSnqw7nyYoE9DeQSL2lzKcBxYNz6u2
 EhlZ/qFIt/2PIDvTK3t+243QcLl/9RDZD2bov8tYCh8wpAD6b46y852gvCt28lE/2W
 coMq4KtTUIEpLPL0hyjqKltFHMyN0rLMyo8dL9r7kqdWO8bR0PoQwY8N2W1V7GO38W
 y4gocF57zCWZswwqPTUkIRD9rbvQxYBBG9Iyf16v+dVSGStjQVkA7hGTnOEFc5oCAj
 PFdZw0gAbLS5w==
In-Reply-To: <CADwFkm=9=Zr-KzC8mPFDkyJ8RpYEhNUFKK7TKvQHsGj+7FsiNA@mail.gmail.com>
Feedback-ID: 112775352:user:proton
X-Pm-Message-ID: 2ee41b8343e272a1ab34a0bd484bc199b8ea63bd
Received-SPF: pass client-ip=185.70.43.16; envelope-from=pipcet@protonmail.com;
 helo=mail-4316.protonmail.ch
X-Spam_score_int: -20
X-Spam_score: -2.1
X-Spam_bar: --
X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,
 DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001,
 RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001,
 RCVD_IN_VALIDITY_CERTIFIED_BLOCKED=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001,
 SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-Mailman-Approved-At: Sun, 22 Dec 2024 07:03:43 -0500
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
 <mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <https://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
 <mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org
Xref: news.gmane.io gmane.emacs.devel:326848
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/326848>

"Stefan Kangas" <stefankangas@gmail.com> writes:

> Helmut Eller <eller.helmut@gmail.com> writes:
>
>> On Wed, Dec 18 2024, Helmut Eller wrote:
>>
>> [...]
>>> So the pdumper copies objects from purespace to the dump like normal
>>> objects; when loading the dump, purespace stays empty.
>>>
>>> I had (wrongly) assumed that the pdumper creates a separate section for
>>> pure objects.  Creating such a section sounds easy enough (hmm, maybe
>>> not so easy because of hashtables).  Still not sure if it would be wort=
h
>>> the effort.
>>
>> Out of curiosity, I implemented such a section with the attached patch.
>> It seems that it would save ~2ms per collection cycle.
>>
>> To measure this, I compared the output of
>>
>>  ./src/emacs -Q --batch --eval \
>>  '(let* ((stats (benchmark-run 1000 (garbage-collect))))
>>      (message "%s" (/ (elt stats 2) (elt stats 1))))'
>>
>> between the versions with and without the patch.  The results was:
>>
>>    without-pure-section: 0.006251480181
>>       with-pure-section: 0.003986384857
>
> This is interesting, thanks.  Would this experiment easily transfer to
> and be relevant to the MPS branch?

TL;DR: let's add check_writable to replace CHECK_IMPURE rather than
dropping the calls entirely.  It'd be a nop in the initial merge, but it
would facilitate experiments for regaining the advantages of purespace
without the ugliness.

If we want to perform an optimization such as this one, we should think
of ways to do so without pure space and all its restrictions.  I think I
have one.

My experiments suggest that the main performance impact is that when we
reach a pure object, we stop scanning, in effect treating it as
though it were already marked.  Checking and setting the mark bit of the
object doesn't affect the performance much, in my experiments.

My idea is this: we add an extra mark bit area to the pdumper file for
objects which we know to be "tenured": i.e. objects that we'll treat as
immortal, but for which we also know that all referenced objects will
also be "tenured", or static.

Such "tenured" objects aren't scanned during GC: they're already marked
when we start the GC cycle.

If we write to such an object, we clear the bit, and put it on a special
set to maintain its tenure (it'd be nicer to simply set another bit, but
non-MPS pdumper cannot do so).  This should happen rarely, but it's
better than the current CHECK_IMPURE thing.

Some housekeeping would be required to adjust the "tenured" set, but the
important point is that "tenured" is no longer a user-visible property:
correctness would be maintained even for dumb choices of the "tenured"
set.

The problems are:

1. choose a "tenured" set. I would hazard a guess that most objects in
the pdumper file are usually not actually written to, ever, so maybe we
could even get away with starting with a universally-filled "tenured"
set.  We could use, in essence, PGO to do so, by identifying call stacks
for objects that are modified and excluding them from the "tenured" set
based on the call stacks when the real Emacs is built.

2. keep the immortal set small.  In particular, even if an object was
written to, it's quite possible that it can be removed from the set
again, because the new references are also "tenured".  In extreme cases,
we could get away with clearing the entire "pure" set and continuing
without this optimization.  However, if we find an object cannot have
its tenured bit set anymore, we can recursively scan the remaining
"tenured" objects which reference this object and revoke their tenure,
and remove the problematic object from the immortal set.

In essence, all this would turn the user-visible "pure" status,
available only to explicitly-specified objects, into a user-invisible
"tenured" status, awarded automatically to some objects without
programmer interaction.  The benefits may well be greater because the
automatic detection of "tenured" objects might very easily be superior
to the current "pure" selection.  "Tenured" status would be available to
all pdumped objects, but for MPS builds it would be possible to expand
this so even objects created after loading the dump can become
"tenured".

(I realize "tenure" isn't a great term because a sensible implementation
would probably revoke tenure in some cases.  "GC-skippable" or
"skippable" might be better.)

The main implementation problem I see right now is that to correct our
tenured set, and for the immortal set, we'd need an efficient "set"
structure.  In particular, while ordinary GC would only need to use the
"immortal" set, analysis and fixing the "tenured" set would require
knowing about all "tenured" objects.  For the MPS build, this is
comparatively easy as we can simply set a bit and linear-scan the
pdumper area (all pdumped objects have igc headers).  For non-MPS, a
full hash table would probably be required, and it would have to be
dumped.

My important points are:

1. It's probably possible to do better performance-wise than Helmut's
experiment.
2. This potential approach would not require the reintroduction of a
user-visible "pure" status.
3. In particular, there's (almost) nothing to stop us from delaying
this experiment until after pure space is removed.
4. The exception to the "almost" is: We should probably introduce dummy
check_writable (Lisp_Object) calls rather than dropping CHECK_IMPURE ()
entirely.  We'll need them for experiments, even if the standard
implementation would be a nop.
5. While the details of the implementation would differ, this
optimization could apply to the MPS build, too, and even be generalized
(on no-purespace, we can't move an object into the pdumper are, but on
scratch/igc, we may be able to hack MPS to move an object into the
immovable "tenured" pool).

However, I realize that (1) is currently a sheer guess. I haven't
decided whether it's worth it to get an upper bound on the saved GC time
by implementing a universal "tenured" set and performing a GC right
after loading (which should be very fast, not marking any pdumped
objects).  The problem is that it's an upper bound: the effect will
become less significant as the immortal set would grow over time; at
some point, it's no longer worth it, and we could clear all "tenured"
bits and continue without this optimization.

While my experiments suggest it's not relevant, this would reduce some
potential benefits of Helmut's approach: locality wouldn't be as good,
we'd need to check a bit rather than doing a bounds check to identify
skippable objects, and of course there's the extra overhead from another
bits table and the set implementation.  If anyone tries this, please
also try whether increasing the pdumper alignment to 16 or further
improves performance; that would mean some wasted space, but it would
also reduce the sheer number of bits to check.

If we want dynamic adjustment of the skippable/"tenured" set, we'd also
need to duplicate a lot of the GC code to trace references for analysis
rather than setting their mark bits.  We might be able to avoid the code
duplication with a trick, but realistically we'd be looking at two code
paths: the "fast" GC path which performs a hardcoded action, and the
"slow" analysis path which merely collects references and returns them
to the caller.

Sorry this is a bit long.

Pip