From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Pip Cet via "Emacs development discussions." Newsgroups: gmane.emacs.devel Subject: Re: Merging scratch/no-purespace to remove unexec and purespace Date: Sun, 22 Dec 2024 11:12:30 +0000 Message-ID: <87jzbsgefi.fsf@protonmail.com> References: <87zfku6ra9.fsf@gmail.com> <87seql7a3o.fsf@gmail.com> <87o71553yf.fsf@gmail.com> Reply-To: Pip Cet Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="4063"; mail-complaints-to="usenet@ciao.gmane.io" Cc: Helmut Eller , Stefan Monnier , emacs-devel@gnu.org To: Stefan Kangas Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Sun Dec 22 13:04:48 2024 Return-path: Envelope-to: ged-emacs-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1tPKhS-0000s7-Jz for ged-emacs-devel@m.gmane-mx.org; Sun, 22 Dec 2024 13:04:46 +0100 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1tPKgw-0003pR-EN; Sun, 22 Dec 2024 07:04:14 -0500 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tPJt4-0002cH-Ez for emacs-devel@gnu.org; Sun, 22 Dec 2024 06:12:42 -0500 Original-Received: from mail-4316.protonmail.ch ([185.70.43.16]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tPJt1-0004Mw-KQ for emacs-devel@gnu.org; Sun, 22 Dec 2024 06:12:42 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=protonmail.com; s=protonmail3; t=1734865955; x=1735125155; bh=7rRBctKlfhvepJ8lSA3na++bCu+sGo/WT3Icr+7dRYo=; h=Date:To:From:Cc:Subject:Message-ID:In-Reply-To:References: Feedback-ID:From:To:Cc:Date:Subject:Reply-To:Feedback-ID: Message-ID:BIMI-Selector:List-Unsubscribe:List-Unsubscribe-Post; b=g92dhXqk21SRnq0pH/ebwSyJkqne7vDb5UCXppMRBRr4NX34326xpeSXZFuo/myN8 A26OiK5mD9RNYLsZVYuWInllugaAcVlVTW4+vSnqw7nyYoE9DeQSL2lzKcBxYNz6u2 EhlZ/qFIt/2PIDvTK3t+243QcLl/9RDZD2bov8tYCh8wpAD6b46y852gvCt28lE/2W coMq4KtTUIEpLPL0hyjqKltFHMyN0rLMyo8dL9r7kqdWO8bR0PoQwY8N2W1V7GO38W y4gocF57zCWZswwqPTUkIRD9rbvQxYBBG9Iyf16v+dVSGStjQVkA7hGTnOEFc5oCAj PFdZw0gAbLS5w== In-Reply-To: Feedback-ID: 112775352:user:proton X-Pm-Message-ID: 2ee41b8343e272a1ab34a0bd484bc199b8ea63bd Received-SPF: pass client-ip=185.70.43.16; envelope-from=pipcet@protonmail.com; helo=mail-4316.protonmail.ch X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, RCVD_IN_VALIDITY_CERTIFIED_BLOCKED=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-Mailman-Approved-At: Sun, 22 Dec 2024 07:03:43 -0500 X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.devel:326848 Archived-At: "Stefan Kangas" writes: > Helmut Eller writes: > >> On Wed, Dec 18 2024, Helmut Eller wrote: >> >> [...] >>> So the pdumper copies objects from purespace to the dump like normal >>> objects; when loading the dump, purespace stays empty. >>> >>> I had (wrongly) assumed that the pdumper creates a separate section for >>> pure objects. Creating such a section sounds easy enough (hmm, maybe >>> not so easy because of hashtables). Still not sure if it would be wort= h >>> the effort. >> >> Out of curiosity, I implemented such a section with the attached patch. >> It seems that it would save ~2ms per collection cycle. >> >> To measure this, I compared the output of >> >> ./src/emacs -Q --batch --eval \ >> '(let* ((stats (benchmark-run 1000 (garbage-collect)))) >> (message "%s" (/ (elt stats 2) (elt stats 1))))' >> >> between the versions with and without the patch. The results was: >> >> without-pure-section: 0.006251480181 >> with-pure-section: 0.003986384857 > > This is interesting, thanks. Would this experiment easily transfer to > and be relevant to the MPS branch? TL;DR: let's add check_writable to replace CHECK_IMPURE rather than dropping the calls entirely. It'd be a nop in the initial merge, but it would facilitate experiments for regaining the advantages of purespace without the ugliness. If we want to perform an optimization such as this one, we should think of ways to do so without pure space and all its restrictions. I think I have one. My experiments suggest that the main performance impact is that when we reach a pure object, we stop scanning, in effect treating it as though it were already marked. Checking and setting the mark bit of the object doesn't affect the performance much, in my experiments. My idea is this: we add an extra mark bit area to the pdumper file for objects which we know to be "tenured": i.e. objects that we'll treat as immortal, but for which we also know that all referenced objects will also be "tenured", or static. Such "tenured" objects aren't scanned during GC: they're already marked when we start the GC cycle. If we write to such an object, we clear the bit, and put it on a special set to maintain its tenure (it'd be nicer to simply set another bit, but non-MPS pdumper cannot do so). This should happen rarely, but it's better than the current CHECK_IMPURE thing. Some housekeeping would be required to adjust the "tenured" set, but the important point is that "tenured" is no longer a user-visible property: correctness would be maintained even for dumb choices of the "tenured" set. The problems are: 1. choose a "tenured" set. I would hazard a guess that most objects in the pdumper file are usually not actually written to, ever, so maybe we could even get away with starting with a universally-filled "tenured" set. We could use, in essence, PGO to do so, by identifying call stacks for objects that are modified and excluding them from the "tenured" set based on the call stacks when the real Emacs is built. 2. keep the immortal set small. In particular, even if an object was written to, it's quite possible that it can be removed from the set again, because the new references are also "tenured". In extreme cases, we could get away with clearing the entire "pure" set and continuing without this optimization. However, if we find an object cannot have its tenured bit set anymore, we can recursively scan the remaining "tenured" objects which reference this object and revoke their tenure, and remove the problematic object from the immortal set. In essence, all this would turn the user-visible "pure" status, available only to explicitly-specified objects, into a user-invisible "tenured" status, awarded automatically to some objects without programmer interaction. The benefits may well be greater because the automatic detection of "tenured" objects might very easily be superior to the current "pure" selection. "Tenured" status would be available to all pdumped objects, but for MPS builds it would be possible to expand this so even objects created after loading the dump can become "tenured". (I realize "tenure" isn't a great term because a sensible implementation would probably revoke tenure in some cases. "GC-skippable" or "skippable" might be better.) The main implementation problem I see right now is that to correct our tenured set, and for the immortal set, we'd need an efficient "set" structure. In particular, while ordinary GC would only need to use the "immortal" set, analysis and fixing the "tenured" set would require knowing about all "tenured" objects. For the MPS build, this is comparatively easy as we can simply set a bit and linear-scan the pdumper area (all pdumped objects have igc headers). For non-MPS, a full hash table would probably be required, and it would have to be dumped. My important points are: 1. It's probably possible to do better performance-wise than Helmut's experiment. 2. This potential approach would not require the reintroduction of a user-visible "pure" status. 3. In particular, there's (almost) nothing to stop us from delaying this experiment until after pure space is removed. 4. The exception to the "almost" is: We should probably introduce dummy check_writable (Lisp_Object) calls rather than dropping CHECK_IMPURE () entirely. We'll need them for experiments, even if the standard implementation would be a nop. 5. While the details of the implementation would differ, this optimization could apply to the MPS build, too, and even be generalized (on no-purespace, we can't move an object into the pdumper are, but on scratch/igc, we may be able to hack MPS to move an object into the immovable "tenured" pool). However, I realize that (1) is currently a sheer guess. I haven't decided whether it's worth it to get an upper bound on the saved GC time by implementing a universal "tenured" set and performing a GC right after loading (which should be very fast, not marking any pdumped objects). The problem is that it's an upper bound: the effect will become less significant as the immortal set would grow over time; at some point, it's no longer worth it, and we could clear all "tenured" bits and continue without this optimization. While my experiments suggest it's not relevant, this would reduce some potential benefits of Helmut's approach: locality wouldn't be as good, we'd need to check a bit rather than doing a bounds check to identify skippable objects, and of course there's the extra overhead from another bits table and the set implementation. If anyone tries this, please also try whether increasing the pdumper alignment to 16 or further improves performance; that would mean some wasted space, but it would also reduce the sheer number of bits to check. If we want dynamic adjustment of the skippable/"tenured" set, we'd also need to duplicate a lot of the GC code to trace references for analysis rather than setting their mark bits. We might be able to avoid the code duplication with a trick, but realistically we'd be looking at two code paths: the "fast" GC path which performs a hardcoded action, and the "slow" analysis path which merely collects references and returns them to the caller. Sorry this is a bit long. Pip