From: Pip Cet via "Emacs development discussions."
Newsgroups: gmane.emacs.devel
Subject: Re: Merging scratch/no-purespace to remove unexec and purespace
Date: Sun, 22 Dec 2024 13:13:50 +0000
Message-ID: <87a5cnhndr.fsf@protonmail.com>
References: <87zfku6ra9.fsf@gmail.com> <87seql7a3o.fsf@gmail.com> <87o71553yf.fsf@gmail.com>
Reply-To: Pip Cet
To: Stefan Kangas
Cc: Helmut Eller, Stefan Monnier, emacs-devel@gnu.org

Pip Cet writes:

> However, I realize that (1) is currently a sheer guess.  I haven't
> decided whether it's worth it to get an upper bound on the saved GC time
> by implementing a universal "tenured" set and performing a GC right
> after loading (which should be very fast, not marking any pdumped
> objects).

I did.

This got long again.
That's because I wanted to be really sure that merging no-purespace isn't going to prevent worthwhile optimizations in the future, and I am now. Feel free to skip the rest :-)

My initial results are that simply "tenuring" the char tables in the pdump seems to have such a drastic effect that it's hard to perform a fair measurement: process_mark_stack is called (in emacs -Q, no --batch) 21384 times if we "tenure" the char tables, and 135345 times if we don't.

(This suggests that char tables may be worth optimizing for the "old" GC: simply keep a set of GC-relevant values in the char table, and scan that rather than scanning the entire char table. However, we can't do that with MPS, so I'm not overly interested in it. Also, I doubt the optimization decisions required for char tables would be made the same way if they were reimplemented today, so it may be more productive to start over from scratch, with a particular focus on reducing the time needed for GC rather than ordinary performance.)

Also, we need to add a few check_writable calls to avoid segfaults. I should have expected that, I guess. The good news is that few pdumped objects (256 once a non-batched Emacs is started) actually appear to be written to, so it's not entirely hopeless to identify those in one run and mark them non-tenured in the real Emacs.

IOW, my tentative conclusion is that it's possible to perform such optimizations after pure space is dropped, and there's no reason to delay the merge.

Optimizing based on a *hint* that an object probably won't be mutated is a potential way forward. Optimizing based on a hard promise that an object won't be mutated, as the old purespace code does, not so much. Even the old purespace code, with the years of development it's seen, ended up losing the optimization and causing preventable segfaults for valid-looking Elisp code.

I must confess I'm fundamentally opposed to having objects come in a "read-only" and a "read-write" flavor.
Either they should always be immutable, as bignums and floats are now, or we should go to the trouble of supporting the rare cases in which an object hinted or guessed to be read-only turned out not to be. (This is independent of the question of whether the characters in a string can be changed or not.)

It's very hard even to define what constitutes mutation of an object and what doesn't. Setting a symbol's global value is clearly a mutation in the current code, but what if we keep those global values in a hash table instead, and the struct Lisp_Symbol is never written to? Does lexically (or dynamically) binding a symbol mean the entire symbol is no longer read-only? If we ever implement hash-collision workarounds by randomizing hash seeds, would re-seeding count as a mutation of the hash table? What about (aset v 0 (aref v 0))? Hash table resizing? Removing dead keys from weak hash tables? Pinning a string to use it in a byte code object? Wouldn't it make sense to protect hash table (or obarray) keys from mutation if that may result in irretrievable entries?

Most of these questions have two good answers, one which aids in optimization, and one which Lisp programmers would expect. They're often different.

To get back to the no-purespace branch, I think we should consider reintroducing check_writable () calls (which would currently be no-ops on the master branch) after the merge, if we can agree on precisely when this macro should be called and how. The old locations of CHECK_IMPURE can serve as a hint, but no more, so let's drop CHECK_IMPURE first and start with a clean slate there.

Pip
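
For illustration, here is a toy C sketch of the "hint, not promise" idea: dumped objects start out with a "tenured" hint and are skipped by marking, and a write barrier withdraws the hint on the first write instead of signalling an error. This is not Emacs code. Every name in it (struct toy_object, toy_check_writable, toy_set_ref, toy_mark, toy_mark_roots, the fixed-size demoted array) is hypothetical, and what a real check_writable would do is exactly the open question above; the sketch assumes it demotes the object and remembers it as an extra root.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy object; NOT the layout of any Emacs Lisp object.  */
struct toy_object
{
  bool tenured;             /* hint: probably never written after the dump */
  bool marked;              /* set by the toy mark phase */
  struct toy_object *ref;   /* single outgoing reference, may be NULL */
};

/* Objects whose hint turned out to be wrong; scanned as extra roots.  */
enum { TOY_MAX_DEMOTED = 16 };
static struct toy_object *demoted[TOY_MAX_DEMOTED];
static size_t n_demoted;

/* Number of objects the mark phase actually visited.  */
static size_t visited;

/* Assumed behaviour of a check_writable-style barrier: withdraw the
   hint rather than fail, and remember the object for later GCs.  */
static void
toy_check_writable (struct toy_object *obj)
{
  if (obj->tenured)
    {
      assert (n_demoted < TOY_MAX_DEMOTED);
      obj->tenured = false;
      demoted[n_demoted++] = obj;
    }
}

/* Every write goes through the barrier first.  */
static void
toy_set_ref (struct toy_object *obj, struct toy_object *value)
{
  toy_check_writable (obj);
  obj->ref = value;
}

/* Mark OBJ and what it references, never descending into objects
   that still carry the tenured hint.  */
static void
toy_mark (struct toy_object *obj)
{
  while (obj && !obj->marked && !obj->tenured)
    {
      obj->marked = true;
      visited++;
      obj = obj->ref;
    }
}

/* Mark from the caller's roots, then from the demoted objects.  */
static void
toy_mark_roots (struct toy_object **roots, size_t n_roots)
{
  for (size_t i = 0; i < n_roots; i++)
    toy_mark (roots[i]);
  for (size_t i = 0; i < n_demoted; i++)
    toy_mark (demoted[i]);
}
```

In this toy model, marking a graph that consists only of tenured objects visits nothing, which is the cheap post-load GC described above. A single toy_set_ref on a dumped object makes that object, and whatever it now references, visible to the next mark phase. The model relies on tenured objects referencing only other tenured objects until they are written to.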