unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
* Merging scratch/no-purespace to remove unexec and purespace
@ 2024-12-17 10:47 Stefan Kangas
  2024-12-17 13:12 ` Gerd Möllmann
                   ` (2 more replies)
  0 siblings, 3 replies; 58+ messages in thread
From: Stefan Kangas @ 2024-12-17 10:47 UTC (permalink / raw)
  To: emacs-devel; +Cc: Pip Cet, Stefan Monnier

In August, we decided to remove the unexec build along with purespace.
The scratch/no-purespace branch removes both, and has been rebased on
top of a recent master.

We, the maintainers, believe that the scratch/no-purespace branch is
now ready to merge, and would appreciate any final feedback, testing,
and code reviews.  Specifically, the branch has been primarily tested
on GNU/Linux and macOS, so testing on other systems would be greatly
appreciated.

Unless we uncover any serious blocking issues, or significant or
unexpected objections from the community, we plan to merge the
branch on February 1, 2025.

During our last discussion, we identified some issues with using
pdumper on some legacy proprietary systems: MS-DOS, Windows 98, and
Solaris 10 Zone.  As we have explained previously, patches are welcome
for these issues, but we do not currently consider them as blockers
for this merge.

Thanks!

Stefan Kangas, on behalf of the Emacs maintainers



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-17 10:47 Merging scratch/no-purespace to remove unexec and purespace Stefan Kangas
@ 2024-12-17 13:12 ` Gerd Möllmann
  2024-12-17 14:20   ` Gerd Möllmann
  2024-12-17 19:30 ` Helmut Eller
  2024-12-18  0:50 ` Po Lu
  2 siblings, 1 reply; 58+ messages in thread
From: Gerd Möllmann @ 2024-12-17 13:12 UTC (permalink / raw)
  To: Stefan Kangas; +Cc: emacs-devel, Pip Cet, Stefan Monnier

Stefan Kangas <stefankangas@gmail.com> writes:

> In August, we decided to remove the unexec build along with purespace.
> The scratch/no-purespace branch removes both, and has been rebased on
> top of a recent master.
>
> We, the maintainers, believe that the scratch/no-purespace branch is
> now ready to merge, and would appreciate any final feedback, testing,
> and code reviews.  Specifically, the branch has been primarily tested
> on GNU/Linux and macOS, so testing on other systems would be greatly
> appreciated.
>
> Unless we uncover any serious blocking issues, or significant or
> unexpected objections from the community, we plan to merge the
> branch on February 1, 2025.
>
> During our last discussion, we identified some issues with using
> pdumper on some legacy proprietary systems: MS-DOS, Windows 98, and
> Solaris 10 Zone.  As we have explained previously, patches are welcome
> for these issues, but we do not currently consider them as blockers
> for this merge.
>
> Thanks!
>
> Stefan Kangas, on behalf of the Emacs maintainers

Building on macOS with --enable-checking=all --with-native-compilation
gives assertion violations. It builds without the --enable-checking.

gmake -C ../lisp compile-first EMACS="../src/bootstrap-emacs"
  ELC+ELN  emacs-lisp/comp-common.elc
  ELC+ELN  emacs-lisp/comp-run.elc
  ELC+ELN  emacs-lisp/radix-tree.elc
  ELC+ELN  emacs-lisp/loaddefs-gen.elc
  ELC+ELN  emacs-lisp/macroexp.elc
  ELC+ELN  emacs-lisp/cconv.elc
  ELC+ELN  emacs-lisp/comp-cstr.elc
  ELC+ELN  emacs-lisp/bytecomp.elc
  ELC+ELN  emacs-lisp/byte-opt.elc
  ELC+ELN  emacs-lisp/comp.elc
gmake -C ../lisp autoloads EMACS="../src/bootstrap-emacs"
  ELC+ELN  ../lisp/cus-face.elc

comp.c:5322: Emacs fatal error: assertion failed: check_comp_unit_relocs (comp_u)
gmake[3]: *** [Makefile:285: ../lisp/cus-face.elc] Abort trap: 6
  ELC+ELN  international/titdic-cnv.elc

comp.c:5322: Emacs fatal error: assertion failed: check_comp_unit_relocs (comp_u)
Fatal error 6: Aborted
gmake[3]: *** [Makefile:330: international/titdic-cnv.elc] Abort trap: 6
gmake[2]: *** [Makefile:901: ../lisp/cus-face.elc] Error 2
gmake[2]: *** Waiting for unfinished jobs....
gmake[2]: *** [Makefile:961: ../lisp/loaddefs.el] Error 2
  ELC+ELN  ../lisp/abbrev.elc

comp.c:5322: Emacs fatal error: assertion failed: check_comp_unit_relocs (comp_u)
gmake[3]: *** [Makefile:285: ../lisp/abbrev.elc] Abort trap: 6
  ELC+ELN  ../lisp/button.elc




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-17 13:12 ` Gerd Möllmann
@ 2024-12-17 14:20   ` Gerd Möllmann
  2024-12-17 14:30     ` Gerd Möllmann
  0 siblings, 1 reply; 58+ messages in thread
From: Gerd Möllmann @ 2024-12-17 14:20 UTC (permalink / raw)
  To: Stefan Kangas; +Cc: emacs-devel, Pip Cet, Stefan Monnier

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Stefan Kangas <stefankangas@gmail.com> writes:
>
>> In August, we decided to remove the unexec build along with purespace.
>> The scratch/no-purespace branch removes both, and has been rebased on
>> top of a recent master.
>>
>> We, the maintainers, believe that the scratch/no-purespace branch is
>> now ready to merge, and would appreciate any final feedback, testing,
>> and code reviews.  Specifically, the branch has been primarily tested
>> on GNU/Linux and macOS, so testing on other systems would be greatly
>> appreciated.
>>
>> Unless we uncover any serious blocking issues, or significant or
>> unexpected objections from the community, we plan to merge the
>> branch on February 1, 2025.
>>
>> During our last discussion, we identified some issues with using
>> pdumper on some legacy proprietary systems: MS-DOS, Windows 98, and
>> Solaris 10 Zone.  As we have explained previously, patches are welcome
>> for these issues, but we do not currently consider them as blockers
>> for this merge.
>>
>> Thanks!
>>
>> Stefan Kangas, on behalf of the Emacs maintainers
>
> Building on macOS with --enable-checking=all --with-native-compilation
> gives assertion violations. It builds without the --enable-checking.
>
> gmake -C ../lisp compile-first EMACS="../src/bootstrap-emacs"
>   ELC+ELN  emacs-lisp/comp-common.elc
>   ELC+ELN  emacs-lisp/comp-run.elc
>   ELC+ELN  emacs-lisp/radix-tree.elc
>   ELC+ELN  emacs-lisp/loaddefs-gen.elc
>   ELC+ELN  emacs-lisp/macroexp.elc
>   ELC+ELN  emacs-lisp/cconv.elc
>   ELC+ELN  emacs-lisp/comp-cstr.elc
>   ELC+ELN  emacs-lisp/bytecomp.elc
>   ELC+ELN  emacs-lisp/byte-opt.elc
>   ELC+ELN  emacs-lisp/comp.elc
> gmake -C ../lisp autoloads EMACS="../src/bootstrap-emacs"
>   ELC+ELN  ../lisp/cus-face.elc
>
> comp.c:5322: Emacs fatal error: assertion failed: check_comp_unit_relocs (comp_u)
> gmake[3]: *** [Makefile:285: ../lisp/cus-face.elc] Abort trap: 6
>   ELC+ELN  international/titdic-cnv.elc
>
> comp.c:5322: Emacs fatal error: assertion failed: check_comp_unit_relocs (comp_u)
> Fatal error 6: Aborted
> gmake[3]: *** [Makefile:330: international/titdic-cnv.elc] Abort trap: 6
> gmake[2]: *** [Makefile:901: ../lisp/cus-face.elc] Error 2
> gmake[2]: *** Waiting for unfinished jobs....
> gmake[2]: *** [Makefile:961: ../lisp/loaddefs.el] Error 2
>   ELC+ELN  ../lisp/abbrev.elc
>
> comp.c:5322: Emacs fatal error: assertion failed: check_comp_unit_relocs (comp_u)
> gmake[3]: *** [Makefile:285: ../lisp/abbrev.elc] Abort trap: 6
>   ELC+ELN  ../lisp/button.elc

I think check_comp_unit_relocs should be removed in the branch. What's
left of it the branch, checks in master if everything has been
put in purespace which should be there. IIUC correctly, of course.



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-17 14:20   ` Gerd Möllmann
@ 2024-12-17 14:30     ` Gerd Möllmann
  2024-12-17 17:56       ` Gerd Möllmann
  0 siblings, 1 reply; 58+ messages in thread
From: Gerd Möllmann @ 2024-12-17 14:30 UTC (permalink / raw)
  To: Stefan Kangas; +Cc: emacs-devel, Pip Cet, Stefan Monnier

[-- Attachment #1: Type: text/plain, Size: 284 bytes --]

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> I think check_comp_unit_relocs should be removed in the branch. What's
> left of it the branch, checks in master if everything has been
> put in purespace which should be there. IIUC correctly, of course.

Patch attached.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Remove-check_comp_unit_relocs.patch --]
[-- Type: text/x-patch, Size: 1722 bytes --]

From 66e73f65966e40af67338cfdf18f89549d687935 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gerd=20M=C3=B6llmann?= <gerd@gnu.org>
Date: Tue, 17 Dec 2024 15:28:14 +0100
Subject: [PATCH] Remove check_comp_unit_relocs

* src/comp.c (check_comp_unit_relocs): Removed.
(load_comp_unit): Remove use.
---
 src/comp.c | 27 ---------------------------
 1 file changed, 27 deletions(-)

diff --git a/src/comp.c b/src/comp.c
index f91f7c57f45..2c7f2b704f6 100644
--- a/src/comp.c
+++ b/src/comp.c
@@ -5164,32 +5164,6 @@ load_static_obj (struct Lisp_Native_Comp_Unit *comp_u, const char *name)
 
 }
 
-/* Return false when something is wrong or true otherwise.  */
-
-static bool
-check_comp_unit_relocs (struct Lisp_Native_Comp_Unit *comp_u)
-{
-  dynlib_handle_ptr handle = comp_u->handle;
-  Lisp_Object *data_relocs = dynlib_sym (handle, DATA_RELOC_SYM);
-
-  EMACS_INT d_vec_len = XFIXNUM (Flength (comp_u->data_vec));
-
-  for (ptrdiff_t i = 0; i < d_vec_len; i++)
-    {
-      Lisp_Object x = data_relocs[i];
-      if (EQ (x, Qlambda_fixup))
-	return false;
-      else if (NATIVE_COMP_FUNCTIONP (x))
-	{
-	  if (NILP (Fgethash (x, comp_u->lambda_gc_guard_h, Qnil)))
-	    return false;
-	}
-      else if (!EQ (x, AREF (comp_u->data_vec, i)))
-	return false;
-    }
-  return true;
-}
-
 static void
 unset_cu_load_ongoing (Lisp_Object comp_u)
 {
@@ -5319,7 +5293,6 @@ load_comp_unit (struct Lisp_Native_Comp_Unit *comp_u, bool loading_dump,
       /* Make sure data_ephemeral_vec still exists after top_level_run has run.
 	 Guard against sibling call optimization (or any other).  */
       data_ephemeral_vec = data_ephemeral_vec;
-      eassert (check_comp_unit_relocs (comp_u));
     }
 
   if (!recursive_load)
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-17 14:30     ` Gerd Möllmann
@ 2024-12-17 17:56       ` Gerd Möllmann
  2024-12-17 18:50         ` Eli Zaretskii
  2024-12-18  0:18         ` Stefan Kangas
  0 siblings, 2 replies; 58+ messages in thread
From: Gerd Möllmann @ 2024-12-17 17:56 UTC (permalink / raw)
  To: Stefan Kangas; +Cc: emacs-devel, Pip Cet, Stefan Monnier

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>
>> I think check_comp_unit_relocs should be removed in the branch. What's
>> left of it the branch, checks in master if everything has been
>> put in purespace which should be there. IIUC correctly, of course.
>
> Patch attached.

Pushed. Complaints to me please.



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-17 17:56       ` Gerd Möllmann
@ 2024-12-17 18:50         ` Eli Zaretskii
  2024-12-17 18:56           ` Gerd Möllmann
  2024-12-18  0:18         ` Stefan Kangas
  1 sibling, 1 reply; 58+ messages in thread
From: Eli Zaretskii @ 2024-12-17 18:50 UTC (permalink / raw)
  To: Gerd Möllmann, Andrea Corallo
  Cc: stefankangas, emacs-devel, pipcet, monnier

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: emacs-devel@gnu.org,  Pip Cet <pipcet@protonmail.com>,  Stefan Monnier
>  <monnier@iro.umontreal.ca>
> Date: Tue, 17 Dec 2024 18:56:05 +0100
> 
> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
> 
> > Gerd Möllmann <gerd.moellmann@gmail.com> writes:
> >
> >> I think check_comp_unit_relocs should be removed in the branch. What's
> >> left of it the branch, checks in master if everything has been
> >> put in purespace which should be there. IIUC correctly, of course.
> >
> > Patch attached.
> 
> Pushed. Complaints to me please.

I'd like at least Andrea to take a look and confirm.



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-17 18:50         ` Eli Zaretskii
@ 2024-12-17 18:56           ` Gerd Möllmann
  2024-12-18 12:55             ` Andrea Corallo
  0 siblings, 1 reply; 58+ messages in thread
From: Gerd Möllmann @ 2024-12-17 18:56 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Andrea Corallo, stefankangas, emacs-devel, pipcet, monnier

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: emacs-devel@gnu.org,  Pip Cet <pipcet@protonmail.com>,  Stefan Monnier
>>  <monnier@iro.umontreal.ca>
>> Date: Tue, 17 Dec 2024 18:56:05 +0100
>> 
>> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>> 
>> > Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>> >
>> >> I think check_comp_unit_relocs should be removed in the branch. What's
>> >> left of it the branch, checks in master if everything has been
>> >> put in purespace which should be there. IIUC correctly, of course.
>> >
>> > Patch attached.
>> 
>> Pushed. Complaints to me please.
>
> I'd like at least Andrea to take a look and confirm.

It's 81fc23b5d6a60ca4f3d269ab2c88eb9a850bac4c



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-17 10:47 Merging scratch/no-purespace to remove unexec and purespace Stefan Kangas
  2024-12-17 13:12 ` Gerd Möllmann
@ 2024-12-17 19:30 ` Helmut Eller
  2024-12-17 20:47   ` Stefan Monnier
  2024-12-18  9:30   ` Pip Cet via Emacs development discussions.
  2024-12-18  0:50 ` Po Lu
  2 siblings, 2 replies; 58+ messages in thread
From: Helmut Eller @ 2024-12-17 19:30 UTC (permalink / raw)
  To: Stefan Kangas; +Cc: emacs-devel, Pip Cet, Stefan Monnier

On Tue, Dec 17 2024, Stefan Kangas wrote:

[...]
> We, the maintainers, believe that the scratch/no-purespace branch is
> now ready to merge, and would appreciate any final feedback, testing,
> and code reviews.  Specifically, the branch has been primarily tested
> on GNU/Linux and macOS, so testing on other systems would be greatly
> appreciated.

Do you have an estimate what removing purespace will cost in terms of GC
time?  I mean something like "1ms per collection".  Or perhaps a
suggestion how I could measure it?



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-17 19:30 ` Helmut Eller
@ 2024-12-17 20:47   ` Stefan Monnier
  2024-12-18  2:15     ` Stefan Kangas
  2024-12-18  6:56     ` Helmut Eller
  2024-12-18  9:30   ` Pip Cet via Emacs development discussions.
  1 sibling, 2 replies; 58+ messages in thread
From: Stefan Monnier @ 2024-12-17 20:47 UTC (permalink / raw)
  To: Helmut Eller; +Cc: Stefan Kangas, emacs-devel, Pip Cet

>> We, the maintainers, believe that the scratch/no-purespace branch is
>> now ready to merge, and would appreciate any final feedback, testing,
>> and code reviews.  Specifically, the branch has been primarily tested
>> on GNU/Linux and macOS, so testing on other systems would be greatly
>> appreciated.
>
> Do you have an estimate what removing purespace will cost in terms of GC
> time?  I mean something like "1ms per collection".  Or perhaps a
> suggestion how I could measure it?

In the pdump case it should have no effect at all, or maybe even
a slight *speedup*.

That's because the pdump already fails to take advantage of the
purespace (i.e. the GC traces through the purespace like the rest of
the heap).


        Stefan




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-17 17:56       ` Gerd Möllmann
  2024-12-17 18:50         ` Eli Zaretskii
@ 2024-12-18  0:18         ` Stefan Kangas
  1 sibling, 0 replies; 58+ messages in thread
From: Stefan Kangas @ 2024-12-18  0:18 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: emacs-devel, Pip Cet, Stefan Monnier

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>
>> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>>
>>> I think check_comp_unit_relocs should be removed in the branch. What's
>>> left of it the branch, checks in master if everything has been
>>> put in purespace which should be there. IIUC correctly, of course.
>>
>> Patch attached.
>
> Pushed. Complaints to me please.

Thanks!



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-17 10:47 Merging scratch/no-purespace to remove unexec and purespace Stefan Kangas
  2024-12-17 13:12 ` Gerd Möllmann
  2024-12-17 19:30 ` Helmut Eller
@ 2024-12-18  0:50 ` Po Lu
  2024-12-18  2:12   ` Stefan Kangas
  2024-12-18 21:26   ` Stefan Monnier
  2 siblings, 2 replies; 58+ messages in thread
From: Po Lu @ 2024-12-18  0:50 UTC (permalink / raw)
  To: Stefan Kangas; +Cc: emacs-devel, Pip Cet, Stefan Monnier

Stefan Kangas <stefankangas@gmail.com> writes:

> In August, we decided to remove the unexec build along with purespace.
> The scratch/no-purespace branch removes both, and has been rebased on
> top of a recent master.
>
> We, the maintainers, believe that the scratch/no-purespace branch is
> now ready to merge, and would appreciate any final feedback, testing,
> and code reviews.  Specifically, the branch has been primarily tested
> on GNU/Linux and macOS, so testing on other systems would be greatly
> appreciated.
>
> Unless we uncover any serious blocking issues, or significant or
> unexpected objections from the community, we plan to merge the
> branch on February 1, 2025.
>
> During our last discussion, we identified some issues with using
> pdumper on some legacy proprietary systems: MS-DOS, Windows 98, and
> Solaris 10 Zone.  As we have explained previously, patches are welcome
> for these issues, but we do not currently consider them as blockers
> for this merge.

Please don't merge this until Emacs 30 is released, whether that be
before the 1st of February or after.



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-18  0:50 ` Po Lu
@ 2024-12-18  2:12   ` Stefan Kangas
  2024-12-18 21:26   ` Stefan Monnier
  1 sibling, 0 replies; 58+ messages in thread
From: Stefan Kangas @ 2024-12-18  2:12 UTC (permalink / raw)
  To: Po Lu; +Cc: emacs-devel, Pip Cet, Stefan Monnier

Po Lu <luangruo@yahoo.com> writes:

> Stefan Kangas <stefankangas@gmail.com> writes:
>
>> In August, we decided to remove the unexec build along with purespace.
>> The scratch/no-purespace branch removes both, and has been rebased on
>> top of a recent master.
>>
>> We, the maintainers, believe that the scratch/no-purespace branch is
>> now ready to merge, and would appreciate any final feedback, testing,
>> and code reviews.  Specifically, the branch has been primarily tested
>> on GNU/Linux and macOS, so testing on other systems would be greatly
>> appreciated.
>>
>> Unless we uncover any serious blocking issues, or significant or
>> unexpected objections from the community, we plan to merge the
>> branch on February 1, 2025.
>>
>> During our last discussion, we identified some issues with using
>> pdumper on some legacy proprietary systems: MS-DOS, Windows 98, and
>> Solaris 10 Zone.  As we have explained previously, patches are welcome
>> for these issues, but we do not currently consider them as blockers
>> for this merge.
>
> Please don't merge this until Emacs 30 is released, whether that be
> before the 1st of February or after.

I'm guessing here, but I take that to mean:
1. You intend to work on resolving the problems with pdumper on one or
   more legacy proprietary systems.
2. You will not have much time before Emacs 30 is out, as you have
   previously indicated.

Is that a correct understanding?  If not, may I ask that you elaborate?



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-17 20:47   ` Stefan Monnier
@ 2024-12-18  2:15     ` Stefan Kangas
  2024-12-18  7:11       ` Helmut Eller
  2024-12-18  6:56     ` Helmut Eller
  1 sibling, 1 reply; 58+ messages in thread
From: Stefan Kangas @ 2024-12-18  2:15 UTC (permalink / raw)
  To: Stefan Monnier, Helmut Eller; +Cc: emacs-devel, Pip Cet

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>>> We, the maintainers, believe that the scratch/no-purespace branch is
>>> now ready to merge, and would appreciate any final feedback, testing,
>>> and code reviews.  Specifically, the branch has been primarily tested
>>> on GNU/Linux and macOS, so testing on other systems would be greatly
>>> appreciated.
>>
>> Do you have an estimate what removing purespace will cost in terms of GC
>> time?  I mean something like "1ms per collection".  Or perhaps a
>> suggestion how I could measure it?
>
> In the pdump case it should have no effect at all, or maybe even
> a slight *speedup*.
>
> That's because the pdump already fails to take advantage of the
> purespace (i.e. the GC traces through the purespace like the rest of
> the heap).

I'll note that the best solution to that is to have a generational GC
instead.  Simple, right?

It's not entirely unrelated though: among other things, one reason why
merging this would be good is that it would reportedly simplify the work
on the igc branch.  And indeed any GC-related work now or in future.



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-17 20:47   ` Stefan Monnier
  2024-12-18  2:15     ` Stefan Kangas
@ 2024-12-18  6:56     ` Helmut Eller
  2024-12-21 17:41       ` Helmut Eller
  1 sibling, 1 reply; 58+ messages in thread
From: Helmut Eller @ 2024-12-18  6:56 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Stefan Kangas, emacs-devel, Pip Cet

On Tue, Dec 17 2024, Stefan Monnier wrote:

>> Do you have an estimate what removing purespace will cost in terms of GC
>> time?  I mean something like "1ms per collection".  Or perhaps a
>> suggestion how I could measure it?
>
> In the pdump case it should have no effect at all, or maybe even
> a slight *speedup*.
>
> That's because the pdump already fails to take advantage of the
> purespace (i.e. the GC traces through the purespace like the rest of
> the heap).

So the pdumper copies objects from purespace to the dump like normal
objects; when loading the dump, purespace stays empty.

I had (wrongly) assumed that the pdumper creates a separate section for
pure objects.  Creating such a section sounds easy enough (hmm, maybe
not so easy because of hashtables).  Still not sure if it would be worth
the effort.

Helmut



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-18  2:15     ` Stefan Kangas
@ 2024-12-18  7:11       ` Helmut Eller
  2024-12-18 13:35         ` Pip Cet via Emacs development discussions.
  0 siblings, 1 reply; 58+ messages in thread
From: Helmut Eller @ 2024-12-18  7:11 UTC (permalink / raw)
  To: Stefan Kangas; +Cc: Stefan Monnier, emacs-devel, Pip Cet

On Tue, Dec 17 2024, Stefan Kangas wrote:

> Stefan Monnier <monnier@iro.umontreal.ca> writes:
[...]
>> That's because the pdump already fails to take advantage of the
>> purespace (i.e. the GC traces through the purespace like the rest of
>> the heap).
>
> I'll note that the best solution to that is to have a generational GC
> instead.  Simple, right?

A generational GC is definitely simpler.  Whether it's the "best"
solution is not so clear: A copying GC, like MPS, still needs to trace
and copy pure objects whenever the oldest generation is in the condemned
set.  Moving pure objects to a non-moving pool might be better.

> It's not entirely unrelated though: among other things, one reason why
> merging this would be good is that it would reportedly simplify the work
> on the igc branch.  And indeed any GC-related work now or in future.

Objects in purespace are immutable and immortal.  That's potentially
useful information for the GC.  Removing purespace also removes that
information.  Of course, if the pdumper already throws away this
information, then purespace just adds useless complexity.

Helmut



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-17 19:30 ` Helmut Eller
  2024-12-17 20:47   ` Stefan Monnier
@ 2024-12-18  9:30   ` Pip Cet via Emacs development discussions.
  1 sibling, 0 replies; 58+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-18  9:30 UTC (permalink / raw)
  To: Helmut Eller; +Cc: Stefan Kangas, emacs-devel, Stefan Monnier

"Helmut Eller" <eller.helmut@gmail.com> writes:

> On Tue, Dec 17 2024, Stefan Kangas wrote:
>
> [...]
>> We, the maintainers, believe that the scratch/no-purespace branch is
>> now ready to merge, and would appreciate any final feedback, testing,
>> and code reviews.  Specifically, the branch has been primarily tested
>> on GNU/Linux and macOS, so testing on other systems would be greatly
>> appreciated.
>
> Do you have an estimate what removing purespace will cost in terms of GC
> time?  I mean something like "1ms per collection".  Or perhaps a
> suggestion how I could measure it?

"Close to zero" is the best I can do at this time.  In particular, that
implies "not catastrophically worse", which is all that is relevant
right now, IMHO.

> I mean something like "1ms per collection".  Or perhaps a
> suggestion how I could measure it?

I think that all the previously-pure data should be available right away
after starting an Emacs session, so maybe something like:

perf record -e cycles ./src/emacs -Q --batch --eval '(dotimes (i 1000)
(garbage-collect)))'

would be one data point.  Some effects would only be visible in GCs with
deep call stacks, or in large sessions, but that's hard to measure.

Pip




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-17 18:56           ` Gerd Möllmann
@ 2024-12-18 12:55             ` Andrea Corallo
  2024-12-18 14:03               ` Gerd Möllmann
  0 siblings, 1 reply; 58+ messages in thread
From: Andrea Corallo @ 2024-12-18 12:55 UTC (permalink / raw)
  To: Gerd Möllmann
  Cc: Eli Zaretskii, stefankangas, emacs-devel, pipcet, monnier

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Eli Zaretskii <eliz@gnu.org> writes:
>
>>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>>> Cc: emacs-devel@gnu.org,  Pip Cet <pipcet@protonmail.com>,  Stefan Monnier
>>>  <monnier@iro.umontreal.ca>
>>> Date: Tue, 17 Dec 2024 18:56:05 +0100
>>> 
>>> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>>> 
>>> > Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>>> >
>>> >> I think check_comp_unit_relocs should be removed in the branch. What's
>>> >> left of it the branch, checks in master if everything has been
>>> >> put in purespace which should be there. IIUC correctly, of course.
>>> >
>>> > Patch attached.
>>> 
>>> Pushed. Complaints to me please.
>>
>> I'd like at least Andrea to take a look and confirm.
>
> It's 81fc23b5d6a60ca4f3d269ab2c88eb9a850bac4c

Hi Gerd,

looking at the commit now, why do you think 'check_comp_unit_relocs'
should be removed?  

Even if now the situation is simpler 'check_comp_unit_relocs' is still
performing some sanity checks like if lambdas are all been fixed-up and
present in 'comp_u->lambda_gc_guard_h'.

  Andrea



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-18  7:11       ` Helmut Eller
@ 2024-12-18 13:35         ` Pip Cet via Emacs development discussions.
  0 siblings, 0 replies; 58+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-18 13:35 UTC (permalink / raw)
  To: Helmut Eller; +Cc: Stefan Kangas, Stefan Monnier, emacs-devel

"Helmut Eller" <eller.helmut@gmail.com> writes:

> On Tue, Dec 17 2024, Stefan Kangas wrote:
>
>> Stefan Monnier <monnier@iro.umontreal.ca> writes:
> [...]
>>> That's because the pdump already fails to take advantage of the
>>> purespace (i.e. the GC traces through the purespace like the rest of
>>> the heap).
>>
>> I'll note that the best solution to that is to have a generational GC
>> instead.  Simple, right?
>
> A generational GC is definitely simpler.  Whether it's the "best"
> solution is not so clear: A copying GC, like MPS, still needs to trace
> and copy pure objects whenever the oldest generation is in the condemned
> set.

>  Moving pure objects to a non-moving pool might be better.

I think we should rephrase that without presupposing the existence of
"pure" objects: Hinting to MPS that an object is expected to be
immutable and have a very long lifetime may have advantages, and that's
a potential reason for introducing (and maintaining) a hinting
mechanism.

On the other hand, it's very common for objects to have those properties
without us knowing in advance that they will, so it's important MPS
works well in the absence of such hints.

>> It's not entirely unrelated though: among other things, one reason why
>> merging this would be good is that it would reportedly simplify the work
>> on the igc branch.  And indeed any GC-related work now or in future.
>
> Objects in purespace are immutable and immortal.  That's potentially
> useful information for the GC.

I may be misunderstanding what you mean by "immutable", but the most
important property "pure" objects had was that they only referenced
other pure objects or static objects, so GC didn't need to look at them
(but, IIUC, this optimization never worked in pdumper builds).  This
required us to make such objects read-only (which caused problems) and
immortal.

Immortality was an undesirable, but necessary, side effect of the "pure"
optimization, not a feature.  IMHO, so was immutability, but some people
consider it a feature not to be able to modify certain objects.

For example, on the no-purespace branch, you can execute

(aset (symbol-name nil) 0 ?N)

and rename nil to Nil, which will make the rest of your Emacs session
unusable.

On master, this code should (and did at one point, IIRC) throw a
CHECK_IMPURE exception.  Right now, it segfaults, which demonstrates
that the current purespace code has suffered from some bit rot and
removing purespace will fix some bugs for that reason alone.

So the two major features of pure space are broken right now.  Fixing
that is an option, and it's the only way we'd ever see a fair comparison
of purespace and no-purespace performance, but I hope it's not going to
happen.

So what's left is a weak hint to the GC that this object is likely to
remain reachable, and not to be modified, but for MPS to set up a "pure"
space of such objects based on this hint seems to me to be an
unrealistic expectation.

> Removing purespace also removes that information.

Removing purespace does leave us with very few read-only objects.  We
can consider introducing a useful set of read-only objects (and agree on
what that even means) after purespace is removed.

However, I would be surprised if the lost information about whether an
object used to be pure turns out to be very useful.

> Of course, if the pdumper already throws away this
> information, then purespace just adds useless complexity.

I believe that's the case.  In fact, once we remove purespace, we can
look at improving how pdumper dumps are handled during GC.  Maybe we can
make it so the pdumper dump looks even more like an ordinary MPS segment
in memory, and then we don't need to treat pdumper objects specially.

Pip




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-18 12:55             ` Andrea Corallo
@ 2024-12-18 14:03               ` Gerd Möllmann
  2024-12-18 16:05                 ` Pip Cet via Emacs development discussions.
  2024-12-18 16:25                 ` Pip Cet via Emacs development discussions.
  0 siblings, 2 replies; 58+ messages in thread
From: Gerd Möllmann @ 2024-12-18 14:03 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: Eli Zaretskii, stefankangas, emacs-devel, pipcet, monnier

Andrea Corallo <acorallo@gnu.org> writes:

> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>
>> Eli Zaretskii <eliz@gnu.org> writes:
>>
>>>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>>>> Cc: emacs-devel@gnu.org,  Pip Cet <pipcet@protonmail.com>,  Stefan Monnier
>>>>  <monnier@iro.umontreal.ca>
>>>> Date: Tue, 17 Dec 2024 18:56:05 +0100
>>>> 
>>>> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>>>> 
>>>> > Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>>>> >
>>>> >> I think check_comp_unit_relocs should be removed in the branch. What's
>>>> >> left of it the branch, checks in master if everything has been
>>>> >> put in purespace which should be there. IIUC correctly, of course.
>>>> >
>>>> > Patch attached.
>>>> 
>>>> Pushed. Complaints to me please.
>>>
>>> I'd like at least Andrea to take a look and confirm.
>>
>> It's 81fc23b5d6a60ca4f3d269ab2c88eb9a850bac4c
>
> Hi Gerd,
>
> looking at the commit now, why do you think 'check_comp_unit_relocs'
> should be removed?  
>
> Even if now the situation is simpler 'check_comp_unit_relocs' is still
> performing some sanity checks like if lambdas are all been fixed-up and
> present in 'comp_u->lambda_gc_guard_h'.
>
>   Andrea

Hi Andrea.

The check you mention checks something that I don't see how it could
happen. (With the usual disclaimers, because it's been some time since I
was in that code for igc.)

When we comp--register-lambda the code putting the result of make_subr
into the hash table and putting it in data_relocs (in the branch) is
just a handful of lines apart. I thereforeo thought we could do without.



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-18 14:03               ` Gerd Möllmann
@ 2024-12-18 16:05                 ` Pip Cet via Emacs development discussions.
  2024-12-18 16:30                   ` Gerd Möllmann
  2024-12-18 16:25                 ` Pip Cet via Emacs development discussions.
  1 sibling, 1 reply; 58+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-18 16:05 UTC (permalink / raw)
  To: Gerd Möllmann
  Cc: Andrea Corallo, Eli Zaretskii, stefankangas, emacs-devel, monnier

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Andrea Corallo <acorallo@gnu.org> writes:
>
>> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>>
>>> Eli Zaretskii <eliz@gnu.org> writes:
>>>
>>>>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>>>>> Cc: emacs-devel@gnu.org,  Pip Cet <pipcet@protonmail.com>,  Stefan Monnier
>>>>>  <monnier@iro.umontreal.ca>
>>>>> Date: Tue, 17 Dec 2024 18:56:05 +0100
>>>>>
>>>>> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>>>>>
>>>>> > Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>>>>> >
>>>>> >> I think check_comp_unit_relocs should be removed in the branch. What's
>>>>> >> left of it the branch, checks in master if everything has been
>>>>> >> put in purespace which should be there. IIUC correctly, of course.
>>>>> >
>>>>> > Patch attached.
>>>>>
>>>>> Pushed. Complaints to me please.
>>>>
>>>> I'd like at least Andrea to take a look and confirm.
>>>
>>> It's 81fc23b5d6a60ca4f3d269ab2c88eb9a850bac4c
>>
>> Hi Gerd,
>>
>> looking at the commit now, why do you think 'check_comp_unit_relocs'
>> should be removed?
>>
>> Even if now the situation is simpler 'check_comp_unit_relocs' is still
>> performing some sanity checks like if lambdas are all been fixed-up and
>> present in 'comp_u->lambda_gc_guard_h'.
>>
>>   Andrea
>
> Hi Andrea.
>
> The check you mention checks something that I don't see how it could
> happen. (With the usual disclaimers, because it's been some time since I
> was in that code for igc.)

I'm confused, I thought you did hit the assertion?

AFAICT, the problem is simply that comp.el uses the symbol lambda-fixup.

So a relocation for that symbol is emitted.  But no fixup is, because
this isn't an actual lambda, it's merely the symbol.  The debug code
then sees "lambda-fixup", assumes it's a failed fixup, and asserts.

IOW, the old code happened not to run into this problem because
lambda-fixup was pure, and we never applied the sanity checks to the
pure section.

This "fix" appears to work:

diff --git a/lisp/emacs-lisp/comp.el b/lisp/emacs-lisp/comp.el
index dbd14b2740d..5d364e77e66 100644
--- a/lisp/emacs-lisp/comp.el
+++ b/lisp/emacs-lisp/comp.el
@@ -3254,7 +3254,7 @@ comp--finalize-container
                  ;; from the corresponding m-var.
                  collect (if (gethash obj
                                       (comp-ctxt-byte-func-to-func-h comp-ctxt))
-                             'lambda-fixup
+                             (intern (concat "lambda" (make-string 1 ?-) "fixup"))
                            obj))))
 
 (defun comp--finalize-relocs ()

My suggestion is to fix the "sanity check" on the master branch, change
it to apply to pure relocs there, and restore the fixed check on
scratch/no-purespace afterwards.

Pip




^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-18 14:03               ` Gerd Möllmann
  2024-12-18 16:05                 ` Pip Cet via Emacs development discussions.
@ 2024-12-18 16:25                 ` Pip Cet via Emacs development discussions.
  2024-12-18 22:27                   ` Andrea Corallo
  1 sibling, 1 reply; 58+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-18 16:25 UTC (permalink / raw)
  To: Gerd Möllmann
  Cc: Andrea Corallo, Eli Zaretskii, stefankangas, emacs-devel, monnier

Pip Cet <pipcet@protonmail.com> writes:
> IOW, the old code happened not to run into this problem because
> lambda-fixup was pure, and we never applied the sanity checks to the
> pure section.

Just to be clear: the code on master is fine.  I misunderstood it when
modifying it for purespace removal, resulting in my bug which Gerd
discovered and fixed.  The code on no-purespace is also fine now, but
it's Andrea's call whether he wants some of the checking code restored,
and how.

> My suggestion is to fix the "sanity check" on the master branch, change
> it to apply to pure relocs there, and restore the fixed check on
> scratch/no-purespace afterwards.

Please ignore that. My suggestion is to EXTEND the sanity check on the
master branch to cover pure and impure relocs, and restore the EXTENDED
check before merging scratch/no-purespace.

There is no bug to fix on master.

Pip




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-18 16:05                 ` Pip Cet via Emacs development discussions.
@ 2024-12-18 16:30                   ` Gerd Möllmann
  0 siblings, 0 replies; 58+ messages in thread
From: Gerd Möllmann @ 2024-12-18 16:30 UTC (permalink / raw)
  To: Pip Cet; +Cc: Andrea Corallo, Eli Zaretskii, stefankangas, emacs-devel, monnier

Pip Cet <pipcet@protonmail.com> writes:

> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>
>> Andrea Corallo <acorallo@gnu.org> writes:
>>
>>> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>>>
>>>> Eli Zaretskii <eliz@gnu.org> writes:
>>>>
>>>>>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>>>>>> Cc: emacs-devel@gnu.org,  Pip Cet <pipcet@protonmail.com>,  Stefan Monnier
>>>>>>  <monnier@iro.umontreal.ca>
>>>>>> Date: Tue, 17 Dec 2024 18:56:05 +0100
>>>>>>
>>>>>> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>>>>>>
>>>>>> > Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>>>>>> >
>>>>>> >> I think check_comp_unit_relocs should be removed in the branch. What's
>>>>>> >> left of it the branch, checks in master if everything has been
>>>>>> >> put in purespace which should be there. IIUC correctly, of course.
>>>>>> >
>>>>>> > Patch attached.
>>>>>>
>>>>>> Pushed. Complaints to me please.
>>>>>
>>>>> I'd like at least Andrea to take a look and confirm.
>>>>
>>>> It's 81fc23b5d6a60ca4f3d269ab2c88eb9a850bac4c
>>>
>>> Hi Gerd,
>>>
>>> looking at the commit now, why do you think 'check_comp_unit_relocs'
>>> should be removed?
>>>
>>> Even if now the situation is simpler 'check_comp_unit_relocs' is still
>>> performing some sanity checks like if lambdas are all been fixed-up and
>>> present in 'comp_u->lambda_gc_guard_h'.
>>>
>>>   Andrea
>>
>> Hi Andrea.
>>
>> The check you mention checks something that I don't see how it could
>> happen. (With the usual disclaimers, because it's been some time since I
>> was in that code for igc.)
>
> I'm confused, I thought you did hit the assertion?

Correct, building with --enable-checking failed. What confuses?



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-18  0:50 ` Po Lu
  2024-12-18  2:12   ` Stefan Kangas
@ 2024-12-18 21:26   ` Stefan Monnier
  1 sibling, 0 replies; 58+ messages in thread
From: Stefan Monnier @ 2024-12-18 21:26 UTC (permalink / raw)
  To: Po Lu; +Cc: Stefan Kangas, emacs-devel, Pip Cet

> Please don't merge this until Emacs 30 is released, whether that be
> before the 1st of February or after.

Why?


        Stefan




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-18 16:25                 ` Pip Cet via Emacs development discussions.
@ 2024-12-18 22:27                   ` Andrea Corallo
  2024-12-19  9:28                     ` Pip Cet via Emacs development discussions.
  0 siblings, 1 reply; 58+ messages in thread
From: Andrea Corallo @ 2024-12-18 22:27 UTC (permalink / raw)
  To: Pip Cet
  Cc: Gerd Möllmann, Eli Zaretskii, stefankangas, emacs-devel,
	monnier

Pip Cet <pipcet@protonmail.com> writes:

> Pip Cet <pipcet@protonmail.com> writes:
>> IOW, the old code happened not to run into this problem because
>> lambda-fixup was pure, and we never applied the sanity checks to the
>> pure section.
>
> Just to be clear: the code on master is fine.  I misunderstood it when
> modifying it for purespace removal, resulting in my bug which Gerd
> discovered and fixed.  The code on no-purespace is also fine now, but
> it's Andrea's call whether he wants some of the checking code restored,
> and how.
>
>> My suggestion is to fix the "sanity check" on the master branch, change
>> it to apply to pure relocs there, and restore the fixed check on
>> scratch/no-purespace afterwards.
>
> Please ignore that. My suggestion is to EXTEND the sanity check on the
> master branch to cover pure and impure relocs, and restore the EXTENDED
> check before merging scratch/no-purespace.
>
> There is no bug to fix on master.

Right your analysis is correct, the new code in the branch just made the
symbol 'fixup-lambda' not compilable.

I restored the check and applied a variant of your fix with a comment
around.  scratch/no-purespace work for me now.

On master I don't think I see what we should do and the motivation for.



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-18 22:27                   ` Andrea Corallo
@ 2024-12-19  9:28                     ` Pip Cet via Emacs development discussions.
  2024-12-19 10:38                       ` Andrea Corallo
  2024-12-19 10:50                       ` Stefan Kangas
  0 siblings, 2 replies; 58+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-19  9:28 UTC (permalink / raw)
  To: Andrea Corallo
  Cc: Gerd Möllmann, Eli Zaretskii, stefankangas, emacs-devel,
	monnier

"Andrea Corallo" <acorallo@gnu.org> writes:

> Pip Cet <pipcet@protonmail.com> writes:
>
>> Pip Cet <pipcet@protonmail.com> writes:
>>> IOW, the old code happened not to run into this problem because
>>> lambda-fixup was pure, and we never applied the sanity checks to the
>>> pure section.
>>
>> Just to be clear: the code on master is fine.  I misunderstood it when
>> modifying it for purespace removal, resulting in my bug which Gerd
>> discovered and fixed.  The code on no-purespace is also fine now, but
>> it's Andrea's call whether he wants some of the checking code restored,
>> and how.
>>
>>> My suggestion is to fix the "sanity check" on the master branch, change
>>> it to apply to pure relocs there, and restore the fixed check on
>>> scratch/no-purespace afterwards.
>>
>> Please ignore that. My suggestion is to EXTEND the sanity check on the
>> master branch to cover pure and impure relocs, and restore the EXTENDED
>> check before merging scratch/no-purespace.
>>
>> There is no bug to fix on master.
>
> Right your analysis is correct, the new code in the branch just made the
> symbol 'fixup-lambda' not compilable.
>
> I restored the check and applied a variant of your fix with a comment
> around.  scratch/no-purespace work for me now.

Just to summarize this:
There's now a forbidden symbol, --lambda-fixup.  If you use this symbol
in your code and compile the code with nativecomp, that may appear to work,
but loading the resulting object file into another Emacs will crash that
Emacs, if that Emacs was built with checks enabled.

> On master I don't think I see what we should do and the motivation for.

The scratch/no-purespace branch now tests things more rigorously than
the master branch does: master performs three checks on all impure
relocations and a single check on pure ones, but scratch/no-purespace
performs all three checks on all relocations.

That means when we merge scratch/no-purespace, and hit one of the new
assertions, it may be (and was, in Gerd's case) because the test would
have failed on the master branch but was never performed there, and that
would be unrelated to purespace removal.

What I think we should do doesn't really matter, but it seems quite
obvious to me that we should make the code on the master branch perform
all three checks on all relocations, as the code on no-purespace does.

Pip




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-19  9:28                     ` Pip Cet via Emacs development discussions.
@ 2024-12-19 10:38                       ` Andrea Corallo
  2024-12-19 10:50                       ` Stefan Kangas
  1 sibling, 0 replies; 58+ messages in thread
From: Andrea Corallo @ 2024-12-19 10:38 UTC (permalink / raw)
  To: Pip Cet
  Cc: Gerd Möllmann, Eli Zaretskii, stefankangas, emacs-devel,
	monnier

Pip Cet <pipcet@protonmail.com> writes:

> "Andrea Corallo" <acorallo@gnu.org> writes:
>
>> Pip Cet <pipcet@protonmail.com> writes:
>>
>>> Pip Cet <pipcet@protonmail.com> writes:
>>>> IOW, the old code happened not to run into this problem because
>>>> lambda-fixup was pure, and we never applied the sanity checks to the
>>>> pure section.
>>>
>>> Just to be clear: the code on master is fine.  I misunderstood it when
>>> modifying it for purespace removal, resulting in my bug which Gerd
>>> discovered and fixed.  The code on no-purespace is also fine now, but
>>> it's Andrea's call whether he wants some of the checking code restored,
>>> and how.
>>>
>>>> My suggestion is to fix the "sanity check" on the master branch, change
>>>> it to apply to pure relocs there, and restore the fixed check on
>>>> scratch/no-purespace afterwards.
>>>
>>> Please ignore that. My suggestion is to EXTEND the sanity check on the
>>> master branch to cover pure and impure relocs, and restore the EXTENDED
>>> check before merging scratch/no-purespace.
>>>
>>> There is no bug to fix on master.
>>
>> Right your analysis is correct, the new code in the branch just made the
>> symbol 'fixup-lambda' not compilable.
>>
>> I restored the check and applied a variant of your fix with a comment
>> around.  scratch/no-purespace work for me now.
>
> Just to summarize this:
> There's now a forbidden symbol, --lambda-fixup.  If you use this symbol
> in your code and compile the code with nativecomp, that may appear to work,
> but loading the resulting object file into another Emacs will crash that
> Emacs, if that Emacs was built with checks enabled.

Correct, maybe we should use 'comp--lambda-fixup' so it's even more
evidently private to the compiler, not sure if would be worth using a
different kind of placeholder object.

>> On master I don't think I see what we should do and the motivation for.
>
> The scratch/no-purespace branch now tests things more rigorously than
> the master branch does: master performs three checks on all impure
> relocations and a single check on pure ones, but scratch/no-purespace
> performs all three checks on all relocations.
>
> That means when we merge scratch/no-purespace, and hit one of the new
> assertions, it may be (and was, in Gerd's case) because the test would
> have failed on the master branch but was never performed there, and that
> would be unrelated to purespace removal.
>
> What I think we should do doesn't really matter, but it seems quite
> obvious to me that we should make the code on the master branch perform
> all three checks on all relocations, as the code on no-purespace does.

master doesn't have forbidden symbols that can't be stored in data_vec
so why should we test for that?



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-19  9:28                     ` Pip Cet via Emacs development discussions.
  2024-12-19 10:38                       ` Andrea Corallo
@ 2024-12-19 10:50                       ` Stefan Kangas
  2024-12-19 12:08                         ` Pip Cet via Emacs development discussions.
  1 sibling, 1 reply; 58+ messages in thread
From: Stefan Kangas @ 2024-12-19 10:50 UTC (permalink / raw)
  To: Pip Cet, Andrea Corallo
  Cc: Gerd Möllmann, Eli Zaretskii, emacs-devel, monnier

Pip Cet <pipcet@protonmail.com> writes:

> What I think we should do doesn't really matter, but it seems quite
> obvious to me that we should make the code on the master branch
> perform all three checks on all relocations, as the code on
> no-purespace does.

Maybe.  But won't we get those checks with no additional effort once we
merge no-purespace, and if so, can't it wait until then?



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-19 10:50                       ` Stefan Kangas
@ 2024-12-19 12:08                         ` Pip Cet via Emacs development discussions.
  2024-12-19 17:55                           ` Stefan Kangas
  0 siblings, 1 reply; 58+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-19 12:08 UTC (permalink / raw)
  To: Stefan Kangas
  Cc: Andrea Corallo, Gerd Möllmann, Eli Zaretskii, emacs-devel,
	monnier

"Stefan Kangas" <stefankangas@gmail.com> writes:

> Pip Cet <pipcet@protonmail.com> writes:
>
>> What I think we should do doesn't really matter, but it seems quite
>> obvious to me that we should make the code on the master branch
>> perform all three checks on all relocations, as the code on
>> no-purespace does.
>
> Maybe.  But won't we get those checks with no additional effort once we
> merge no-purespace,

Yes, we will.  (And the forbidden symbol; even if the forbidden symbol
doesn't cause trouble, which I think it will, it's simply very poor
programming practice to do things that way, particularly since the
crash may happen a long time after the compilation.  But, again, what I
think obviously doesn't matter here.  I'll just remember that
--enable-checking causes false positive crashes and shouldn't be used).

That's a problem, because if we run into problems there, we'll have no
way of knowing whether the problem was present on the pre-merge master
(where we didn't check) or not.  But, again, as it's specific to
--enable-checking, we can simply stop using that.

> and if so, can't it wait until then?

Of course, but changing two things at a time makes debugging harder.
(And IIUC, we won't rename the symbol on master until we merge, so
that's three changes which can cause trouble with the nativecomp code,
all introduced at the same time).

Pip




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-19 12:08                         ` Pip Cet via Emacs development discussions.
@ 2024-12-19 17:55                           ` Stefan Kangas
  2024-12-19 20:13                             ` Pip Cet via Emacs development discussions.
  2024-12-20  8:42                             ` Pip Cet via Emacs development discussions.
  0 siblings, 2 replies; 58+ messages in thread
From: Stefan Kangas @ 2024-12-19 17:55 UTC (permalink / raw)
  To: Pip Cet
  Cc: Andrea Corallo, Gerd Möllmann, Eli Zaretskii, emacs-devel,
	monnier

Pip Cet <pipcet@protonmail.com> writes:

> "Stefan Kangas" <stefankangas@gmail.com> writes:
>
>> Pip Cet <pipcet@protonmail.com> writes:
>>
>>> What I think we should do doesn't really matter, but it seems quite
>>> obvious to me that we should make the code on the master branch
>>> perform all three checks on all relocations, as the code on
>>> no-purespace does.
>>
>> Maybe.  But won't we get those checks with no additional effort once we
>> merge no-purespace,
>
> Yes, we will.  (And the forbidden symbol; even if the forbidden symbol
> doesn't cause trouble, which I think it will, it's simply very poor
> programming practice to do things that way, particularly since the
> crash may happen a long time after the compilation.  But, again, what I
> think obviously doesn't matter here.  I'll just remember that
> --enable-checking causes false positive crashes and shouldn't be used).

I don't think the existence of one symbol that will crash Emacs in some
situations means that --enable-checking should be completely avoided.
It's still quite useful, and we're fine as long as we avoid using that
one symbol, right?

OTOH and IMHO, it would be preferable if that symbol could not crash
Emacs.  Can we come up with a good way to fix that, while preserving the
check that Andrea wants to keep?

> That's a problem, because if we run into problems there, we'll have no
> way of knowing whether the problem was present on the pre-merge master
> (where we didn't check) or not.  But, again, as it's specific to
> --enable-checking, we can simply stop using that.
>
>> and if so, can't it wait until then?
>
> Of course, but changing two things at a time makes debugging harder.
> (And IIUC, we won't rename the symbol on master until we merge, so
> that's three changes which can cause trouble with the nativecomp code,
> all introduced at the same time).

I still don't think I understand your argument here, sorry.

The scratch/no-purespace branch contains several different changes, all
of which have had to pass through review, testing and verification.

Why is it particularly important to "backport" this change to master, in
advance of the merge, but not any of the other changes on that branch?
What am I missing?



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-19 17:55                           ` Stefan Kangas
@ 2024-12-19 20:13                             ` Pip Cet via Emacs development discussions.
  2024-12-20 15:59                               ` Stefan Monnier
  2025-01-03 22:36                               ` Pip Cet via Emacs development discussions.
  2024-12-20  8:42                             ` Pip Cet via Emacs development discussions.
  1 sibling, 2 replies; 58+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-19 20:13 UTC (permalink / raw)
  To: Stefan Kangas
  Cc: Andrea Corallo, Gerd Möllmann, Eli Zaretskii, emacs-devel,
	monnier

"Stefan Kangas" <stefankangas@gmail.com> writes:

> Pip Cet <pipcet@protonmail.com> writes:
>
>> "Stefan Kangas" <stefankangas@gmail.com> writes:
>>
>>> Pip Cet <pipcet@protonmail.com> writes:
>>>
>>>> What I think we should do doesn't really matter, but it seems quite
>>>> obvious to me that we should make the code on the master branch
>>>> perform all three checks on all relocations, as the code on
>>>> no-purespace does.
>>>
>>> Maybe.  But won't we get those checks with no additional effort once we
>>> merge no-purespace,
>>
>> Yes, we will.  (And the forbidden symbol; even if the forbidden symbol
>> doesn't cause trouble, which I think it will, it's simply very poor
>> programming practice to do things that way, particularly since the
>> crash may happen a long time after the compilation.  But, again, what I
>> think obviously doesn't matter here.  I'll just remember that
>> --enable-checking causes false positive crashes and shouldn't be used).
>
> I don't think the existence of one symbol that will crash Emacs in some
> situations means that --enable-checking should be completely avoided.
> It's still quite useful, and we're fine as long as we avoid using that
> one symbol, right?
>
> OTOH and IMHO, it would be preferable if that symbol could not crash
> Emacs.  Can we come up with a good way to fix that, while preserving the
> check that Andrea wants to keep?

That sounds like a good thing to focus on, yes.  We need to have a value
in a vector that we Fread that is distinguishable from all other values.

One cheap way of doing that is to make the structure circular, and use
references to the object being read as a placeholder value.

So instead of having

"[a b lambda-fixup c]"

in the blob, we'd have

"#1=[a b #1# c]"

Note that that makes element 2 of the vector EQ to the vector being
read, which only comes into existence during reading, so there's no way
to create such a reference in any other way.

We could also make this more explicit by using a more complex expression
involving #1# instead of #1# directly.  For example,

"#1=[a b (unresolved-lambda-fixup #1#) c]"

One side effect is that we need to make sure read-circle is t before
reading the string (which is, IIUC, already produced with print-circle
bound to t, so reading it with read-circle seems like the correct thing
to do).

For example, right now this code doesn't work:

(let ((print-circle t) (read-circle nil))
  (message "%S" (funcall (native-compile (lambda () #1=[#1#])))))

(read, of course, with read-circle bound to t)

but this does:

(let ((print-circle t) (read-circle nil))
  (message "%S" (funcall (lambda () #1=[#1#]))))

So this seems like a cheap drive-by fix.

A more complicated approach would be to add a special read syntax (if
there isn't one that I missed) producing references to objects defined
outside of the read string: in essence, we'd call Fread in a special way
that makes #s(magic-cookie x) evaluate to be eq to something passed in
as a "magic cookie".  This might be generally useful and avoids the
problems of circular data structures.

But I'll try to come up with a patch doing the simple #1# thing first.

>> That's a problem, because if we run into problems there, we'll have no
>> way of knowing whether the problem was present on the pre-merge master
>> (where we didn't check) or not.  But, again, as it's specific to
>> --enable-checking, we can simply stop using that.
>>
>>> and if so, can't it wait until then?
>>
>> Of course, but changing two things at a time makes debugging harder.
>> (And IIUC, we won't rename the symbol on master until we merge, so
>> that's three changes which can cause trouble with the nativecomp code,
>> all introduced at the same time).
>
> I still don't think I understand your argument here, sorry.
>
> The scratch/no-purespace branch contains several different changes, all
> of which have had to pass through review, testing and verification.

> Why is it particularly important to "backport" this change to master, in
> advance of the merge, but not any of the other changes on that branch?
> What am I missing?

On reflection, it was I who was missing something, that no-purespace
already has moved from "the minimum set of changes necessary for getting
rid of purespace and unexec" to "a coherent set of changes including
purespace and unexec removal and quite a few other things".  I think
that's a good change to have happened, it just took me a while to catch
up with it.

So no continued objections there.

Pip




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-19 17:55                           ` Stefan Kangas
  2024-12-19 20:13                             ` Pip Cet via Emacs development discussions.
@ 2024-12-20  8:42                             ` Pip Cet via Emacs development discussions.
  1 sibling, 0 replies; 58+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-20  8:42 UTC (permalink / raw)
  To: Stefan Kangas
  Cc: Andrea Corallo, Gerd Möllmann, Eli Zaretskii, emacs-devel,
	monnier

Pip Cet <pipcet@protonmail.com> writes:

> But I'll try to come up with a patch doing the simple #1# thing first.

Easy enough to do, but it doesn't solve the read-circle problem (which
appears to affect autoloaded .elc files as well as .eln files, so it's
probably best to discuss that in a separate bug report once this is
merged).

From 6f4133b54edc890993d6198940800cc9a19df85c Mon Sep 17 00:00:00 2001
From: Pip Cet <pipcet@protonmail.com>
Date: Fri, 20 Dec 2024 07:53:38 +0000
Subject: [PATCH] Use circular references as placeholders in nativecomp relocs

* lisp/emacs-lisp/comp.el (comp--finalize-container): Use temporary
placeholder rather than `--lambda-fixup'.
* src/comp.c (declare_imported_data_relocs): Replace temporary
placeholders by permanent placeholders.
(check_comp_unit_relocs): Check for circular reference instead of
`--lambda-fixup'.
(syms_of_comp): Remove `--lambda-fixup'.
* src/pdumper.c (dump_do_dump_relocation): Check for circular reference
instead of `--lambda-fixup'.
---
 lisp/emacs-lisp/comp.el |  8 ++++----
 src/comp.c              | 14 ++++++++++----
 src/pdumper.c           |  2 +-
 3 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/lisp/emacs-lisp/comp.el b/lisp/emacs-lisp/comp.el
index ab6fd77f11a..d5351fe440d 100644
--- a/lisp/emacs-lisp/comp.el
+++ b/lisp/emacs-lisp/comp.el
@@ -3254,10 +3254,10 @@ comp--finalize-container
                  ;; from the corresponding m-var.
                  collect (if (gethash obj
                                       (comp-ctxt-byte-func-to-func-h comp-ctxt))
-                             ;; Hack not to have `--lambda-fixup' in
-                             ;; data relocations as it would trigger the
-                             ;; check in 'check_comp_unit_relocs'.
-                             (intern (concat (make-string 1 ?-) "-lambda-fixup"))
+                             ;; Temporary placeholder, to be replaced by
+                             ;; a circular reference to the vector
+                             ;; containing the element
+                             (comp-data-container-idx cont)
                            obj))))
 
 (defun comp--finalize-relocs ()
diff --git a/src/comp.c b/src/comp.c
index 8b38adec252..3b3dfa58602 100644
--- a/src/comp.c
+++ b/src/comp.c
@@ -2854,13 +2854,20 @@ emit_static_object (const char *name, Lisp_Object obj)
 declare_imported_data_relocs (Lisp_Object container, const char *code_symbol,
 			      const char *text_symbol)
 {
+  Lisp_Object index_table =
+    CALL1I (comp-data-container-idx, container);
   /* Imported objects.  */
   reloc_array_t res;
   res.len =
-    XFIXNUM (CALL1I (hash-table-count,
-		     CALL1I (comp-data-container-idx, container)));
+    XFIXNUM (CALL1I (hash-table-count, index_table));
   Lisp_Object d_reloc = CALL1I (comp-data-container-l, container);
   d_reloc = Fvconcat (1, &d_reloc);
+  /* Replace temporary placeholders, turning them into circular
+   * references to the vector itself. */
+  EMACS_INT n = XFIXNUM (Flength (d_reloc));
+  for (EMACS_INT i = 0; i < n; i++)
+    if (EQ (AREF (d_reloc, i), index_table))
+      ASET (d_reloc, i, d_reloc);
 
   res.r_val =
     gcc_jit_lvalue_as_rvalue (
@@ -5173,7 +5180,7 @@ check_comp_unit_relocs (struct Lisp_Native_Comp_Unit *comp_u)
   for (ptrdiff_t i = 0; i < d_vec_len; i++)
     {
       Lisp_Object x = data_relocs[i];
-      if (EQ (x, Q__lambda_fixup))
+      if (EQ (x, comp_u->data_vec))
 	return false;
       else if (NATIVE_COMP_FUNCTIONP (x))
 	{
@@ -5622,7 +5629,6 @@ syms_of_comp (void)
   DEFSYM (Qfixnum, "fixnum");
   DEFSYM (Qscratch, "scratch");
   DEFSYM (Qlate, "late");
-  DEFSYM (Q__lambda_fixup, "--lambda-fixup");
   DEFSYM (Qgccjit, "gccjit");
   DEFSYM (Qcomp_subr_trampoline_install, "comp-subr-trampoline-install");
   DEFSYM (Qnative_comp_warning_on_missing_source,
diff --git a/src/pdumper.c b/src/pdumper.c
index d45bbc84bba..956da4d8c72 100644
--- a/src/pdumper.c
+++ b/src/pdumper.c
@@ -5504,7 +5504,7 @@ dump_do_dump_relocation (const uintptr_t dump_base,
 	    XSETSUBR (tem, subr);
 	    Lisp_Object *fixup =
 	      &(comp_u->data_relocs[XFIXNUM (lambda_data_idx)]);
-	    eassert (EQ (*fixup, Q__lambda_fixup));
+	    eassert (EQ (*fixup, comp_u->data_vec));
 	    *fixup = tem;
 	    Fputhash (tem, Qt, comp_u->lambda_gc_guard_h);
 	  }
-- 
2.47.0





^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-19 20:13                             ` Pip Cet via Emacs development discussions.
@ 2024-12-20 15:59                               ` Stefan Monnier
  2024-12-20 16:22                                 ` Pip Cet via Emacs development discussions.
  2025-01-03 22:36                               ` Pip Cet via Emacs development discussions.
  1 sibling, 1 reply; 58+ messages in thread
From: Stefan Monnier @ 2024-12-20 15:59 UTC (permalink / raw)
  To: Pip Cet
  Cc: Stefan Kangas, Andrea Corallo, Gerd Möllmann, Eli Zaretskii,
	emacs-devel

>> OTOH and IMHO, it would be preferable if that symbol could not crash
>> Emacs.  Can we come up with a good way to fix that, while preserving the
>> check that Andrea wants to keep?
>
> That sounds like a good thing to focus on, yes.  We need to have a value
> in a vector that we Fread that is distinguishable from all other values.

How 'bout an uninterned symbol `#:foo`?


        Stefan




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-20 15:59                               ` Stefan Monnier
@ 2024-12-20 16:22                                 ` Pip Cet via Emacs development discussions.
  2024-12-20 17:25                                   ` Gerd Möllmann
  0 siblings, 1 reply; 58+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-20 16:22 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: Stefan Kangas, Andrea Corallo, Gerd Möllmann, Eli Zaretskii,
	emacs-devel

"Stefan Monnier" <monnier@iro.umontreal.ca> writes:

>>> OTOH and IMHO, it would be preferable if that symbol could not crash
>>> Emacs.  Can we come up with a good way to fix that, while preserving the
>>> check that Andrea wants to keep?
>>
>> That sounds like a good thing to focus on, yes.  We need to have a value
>> in a vector that we Fread that is distinguishable from all other values.
>
> How 'bout an uninterned symbol `#:foo`?

I think those are legal for native-compiled subrs to use (there's a
comment about it, at least), so that wouldn't do us any good, would it?

Pip




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-20 16:22                                 ` Pip Cet via Emacs development discussions.
@ 2024-12-20 17:25                                   ` Gerd Möllmann
  2024-12-20 20:35                                     ` Andrea Corallo
  2024-12-20 20:38                                     ` Pip Cet via Emacs development discussions.
  0 siblings, 2 replies; 58+ messages in thread
From: Gerd Möllmann @ 2024-12-20 17:25 UTC (permalink / raw)
  To: Pip Cet via Emacs development discussions.
  Cc: Stefan Monnier, Pip Cet, Stefan Kangas, Andrea Corallo,
	Eli Zaretskii

Pip Cet via "Emacs development discussions." <emacs-devel@gnu.org>
writes:

> "Stefan Monnier" <monnier@iro.umontreal.ca> writes:
>
>>>> OTOH and IMHO, it would be preferable if that symbol could not crash
>>>> Emacs.  Can we come up with a good way to fix that, while preserving the
>>>> check that Andrea wants to keep?
>>>
>>> That sounds like a good thing to focus on, yes.  We need to have a value
>>> in a vector that we Fread that is distinguishable from all other values.
>>
>> How 'bout an uninterned symbol `#:foo`?
>
> I think those are legal for native-compiled subrs to use (there's a
> comment about it, at least), so that wouldn't do us any good, would it?
>
> Pip

Uninterned symbols are unique, i.e. (eq '#:a '#:a) => nil.
Wouldn't that help?



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-20 17:25                                   ` Gerd Möllmann
@ 2024-12-20 20:35                                     ` Andrea Corallo
  2024-12-20 20:39                                       ` Pip Cet via Emacs development discussions.
  2024-12-20 20:38                                     ` Pip Cet via Emacs development discussions.
  1 sibling, 1 reply; 58+ messages in thread
From: Andrea Corallo @ 2024-12-20 20:35 UTC (permalink / raw)
  To: Gerd Möllmann
  Cc: Pip Cet via Emacs development discussions., Stefan Monnier,
	Pip Cet, Stefan Kangas, Eli Zaretskii

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Pip Cet via "Emacs development discussions." <emacs-devel@gnu.org>
> writes:
>
>> "Stefan Monnier" <monnier@iro.umontreal.ca> writes:
>>
>>>>> OTOH and IMHO, it would be preferable if that symbol could not crash
>>>>> Emacs.  Can we come up with a good way to fix that, while preserving the
>>>>> check that Andrea wants to keep?
>>>>
>>>> That sounds like a good thing to focus on, yes.  We need to have a value
>>>> in a vector that we Fread that is distinguishable from all other values.
>>>
>>> How 'bout an uninterned symbol `#:foo`?
>>
>> I think those are legal for native-compiled subrs to use (there's a
>> comment about it, at least), so that wouldn't do us any good, would it?
>>
>> Pip
>
> Uninterned symbols are unique, i.e. (eq '#:a '#:a) => nil.
> Wouldn't that help?

Yes. we want and still can use eq in the test, I guess the best is to
create and store an uninterned symbol somewhere, so we can use it to
mark and test those locations in the relocation vector.



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-20 17:25                                   ` Gerd Möllmann
  2024-12-20 20:35                                     ` Andrea Corallo
@ 2024-12-20 20:38                                     ` Pip Cet via Emacs development discussions.
  2024-12-20 20:57                                       ` Gerd Möllmann
  1 sibling, 1 reply; 58+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-20 20:38 UTC (permalink / raw)
  To: Gerd Möllmann
  Cc: Pip Cet via "Emacs development discussions.",
	Stefan Monnier, Stefan Kangas, Andrea Corallo, Eli Zaretskii

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Pip Cet via "Emacs development discussions." <emacs-devel@gnu.org>
> writes:
>
>> "Stefan Monnier" <monnier@iro.umontreal.ca> writes:
>>
>>>>> OTOH and IMHO, it would be preferable if that symbol could not crash
>>>>> Emacs.  Can we come up with a good way to fix that, while preserving the
>>>>> check that Andrea wants to keep?
>>>>
>>>> That sounds like a good thing to focus on, yes.  We need to have a value
>>>> in a vector that we Fread that is distinguishable from all other values.
>>>
>>> How 'bout an uninterned symbol `#:foo`?
>>
>> I think those are legal for native-compiled subrs to use (there's a
>> comment about it, at least), so that wouldn't do us any good, would it?
>>
>> Pip
>
> Uninterned symbols are unique, i.e. (eq '#:a '#:a) => nil.
> Wouldn't that help?

I don't see how, sorry.  Nativecomp produces a string in the .eln file
that is passed to make_string(), then to Fread().  The string can
include source code for uninterned symbols, but there's no way for us to
tell whether a #:lambda-fixup in there was put in by us or by the user.

What we could do is expand read so this:

(let* ((cookie (cons nil nil))
       (read-table (list (cons "cookie" cookie)))
       (result (read "#s(cookie cookie)")))
   (eq result cookie))

to return t.  Then we'd have an expressive syntax for this kind of
thing (obviously, "cookie" should be something useful in the #s(...)
syntax.

A slightly different option is to do this:

(let* ((cookie (cons nil nil))
       (read-circle (list (cons 1 cookie)))
       (result (read "#1#")))
   (eq result cookie))

in effect allowing us to use prime the read-circle table with
externally-provided objects.

Pip




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-20 20:35                                     ` Andrea Corallo
@ 2024-12-20 20:39                                       ` Pip Cet via Emacs development discussions.
  2024-12-21  6:33                                         ` Gerd Möllmann
  2024-12-21  6:56                                         ` Andrea Corallo
  0 siblings, 2 replies; 58+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-20 20:39 UTC (permalink / raw)
  To: Andrea Corallo
  Cc: Gerd Möllmann,
	Pip Cet via "Emacs development discussions.",
	Stefan Monnier, Stefan Kangas, Eli Zaretskii

"Andrea Corallo" <acorallo@gnu.org> writes:

>> Uninterned symbols are unique, i.e. (eq '#:a '#:a) => nil.
>> Wouldn't that help?
>
> Yes. we want and still can use eq in the test, I guess the best is to
> create and store an uninterned symbol somewhere, so we can use it to
> mark and test those locations in the relocation vector.

I still don't see how, sorry.  We can put a new uninterned symbol into
the relocation vector, but there's no read syntax for "put this specific
uninterned symbol here", is there?

Pip




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-20 20:38                                     ` Pip Cet via Emacs development discussions.
@ 2024-12-20 20:57                                       ` Gerd Möllmann
  0 siblings, 0 replies; 58+ messages in thread
From: Gerd Möllmann @ 2024-12-20 20:57 UTC (permalink / raw)
  To: Pip Cet
  Cc: Pip Cet via "Emacs development discussions.",
	Stefan Monnier, Stefan Kangas, Andrea Corallo, Eli Zaretskii

Pip Cet <pipcet@protonmail.com> writes:

> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>
>> Pip Cet via "Emacs development discussions." <emacs-devel@gnu.org>
>> writes:
>>
>>> "Stefan Monnier" <monnier@iro.umontreal.ca> writes:
>>>
>>>>>> OTOH and IMHO, it would be preferable if that symbol could not crash
>>>>>> Emacs.  Can we come up with a good way to fix that, while preserving the
>>>>>> check that Andrea wants to keep?
>>>>>
>>>>> That sounds like a good thing to focus on, yes.  We need to have a value
>>>>> in a vector that we Fread that is distinguishable from all other values.
>>>>
>>>> How 'bout an uninterned symbol `#:foo`?
>>>
>>> I think those are legal for native-compiled subrs to use (there's a
>>> comment about it, at least), so that wouldn't do us any good, would it?
>>>
>>> Pip
>>
>> Uninterned symbols are unique, i.e. (eq '#:a '#:a) => nil.
>> Wouldn't that help?
>
> I don't see how, sorry.  Nativecomp produces a string in the .eln file
> that is passed to make_string(), then to Fread().  The string can
> include source code for uninterned symbols, but there's no way for us to
> tell whether a #:lambda-fixup in there was put in by us or by the user.
>
> What we could do is expand read so this:
>
> (let* ((cookie (cons nil nil))
>        (read-table (list (cons "cookie" cookie)))
>        (result (read "#s(cookie cookie)")))
>    (eq result cookie))
>
> to return t.  Then we'd have an expressive syntax for this kind of
> thing (obviously, "cookie" should be something useful in the #s(...)
> syntax.
>
> A slightly different option is to do this:
>
> (let* ((cookie (cons nil nil))
>        (read-circle (list (cons 1 cookie)))
>        (result (read "#1#")))
>    (eq result cookie))
>
> in effect allowing us to use prime the read-circle table with
> externally-provided objects.
>

I was thinking of using something like

  (read-from-string "[#1=#:a #1# #1#]")
  => ([a a a] . 16)

in which case the first element of the result vector is the uninterned
symbol that, only in this read result, would serve as a marker. The
marker would be different with each read.

Don't know if that's good, though. Just what I think Stefan's idea was.



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-20 20:39                                       ` Pip Cet via Emacs development discussions.
@ 2024-12-21  6:33                                         ` Gerd Möllmann
  2024-12-21  6:56                                         ` Andrea Corallo
  1 sibling, 0 replies; 58+ messages in thread
From: Gerd Möllmann @ 2024-12-21  6:33 UTC (permalink / raw)
  To: Pip Cet via Emacs development discussions.
  Cc: Andrea Corallo, Pip Cet, Stefan Monnier, Stefan Kangas,
	Eli Zaretskii

Pip Cet via "Emacs development discussions." <emacs-devel@gnu.org>
writes:

> "Andrea Corallo" <acorallo@gnu.org> writes:
>
>>> Uninterned symbols are unique, i.e. (eq '#:a '#:a) => nil.
>>> Wouldn't that help?
>>
>> Yes. we want and still can use eq in the test, I guess the best is to
>> create and store an uninterned symbol somewhere, so we can use it to
>> mark and test those locations in the relocation vector.
>
> I still don't see how, sorry.  We can put a new uninterned symbol into
> the relocation vector, but there's no read syntax for "put this specific
> uninterned symbol here", is there?

No, there isn't except for the #1=, #1# thing I mentioned.



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-20 20:39                                       ` Pip Cet via Emacs development discussions.
  2024-12-21  6:33                                         ` Gerd Möllmann
@ 2024-12-21  6:56                                         ` Andrea Corallo
  1 sibling, 0 replies; 58+ messages in thread
From: Andrea Corallo @ 2024-12-21  6:56 UTC (permalink / raw)
  To: Pip Cet
  Cc: Gerd Möllmann,
	Pip Cet via "Emacs development discussions.",
	Stefan Monnier, Stefan Kangas, Eli Zaretskii

Pip Cet <pipcet@protonmail.com> writes:

> "Andrea Corallo" <acorallo@gnu.org> writes:
>
>>> Uninterned symbols are unique, i.e. (eq '#:a '#:a) => nil.
>>> Wouldn't that help?
>>
>> Yes. we want and still can use eq in the test, I guess the best is to
>> create and store an uninterned symbol somewhere, so we can use it to
>> mark and test those locations in the relocation vector.
>
> I still don't see how, sorry.  We can put a new uninterned symbol into
> the relocation vector, but there's no read syntax for "put this specific
> uninterned symbol here", is there?

Ops you are totally right. Mmhhh...



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-18  6:56     ` Helmut Eller
@ 2024-12-21 17:41       ` Helmut Eller
  2024-12-21 18:32         ` Gerd Möllmann
                           ` (2 more replies)
  0 siblings, 3 replies; 58+ messages in thread
From: Helmut Eller @ 2024-12-21 17:41 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Stefan Kangas, emacs-devel, Pip Cet

[-- Attachment #1: Type: text/plain, Size: 889 bytes --]

On Wed, Dec 18 2024, Helmut Eller wrote:

[...]
> So the pdumper copies objects from purespace to the dump like normal
> objects; when loading the dump, purespace stays empty.
>
> I had (wrongly) assumed that the pdumper creates a separate section for
> pure objects.  Creating such a section sounds easy enough (hmm, maybe
> not so easy because of hashtables).  Still not sure if it would be worth
> the effort.

Out of curiosity, I implemented such a section with the attached patch.
It seems that it would save ~2ms per collection cycle.

To measure this, I compared the output of 

 ./src/emacs -Q --batch --eval \
 '(let* ((stats (benchmark-run 1000 (garbage-collect)))) 
     (message "%s" (/ (elt stats 2) (elt stats 1))))'

between the versions with and without the patch.  The results was:

   without-pure-section: 0.006251480181
      with-pure-section: 0.003986384857

Helmut


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-WIP-create-pure-space-in-dump.patch --]
[-- Type: text/x-diff, Size: 10755 bytes --]

From ad64bcdfa2d91e59e64adbcf896ddf66bba725cb Mon Sep 17 00:00:00 2001
From: Helmut Eller <eller.helmut@gmail.com>
Date: Sat, 21 Dec 2024 16:06:45 +0100
Subject: [PATCH] WIP create pure space in dump

---
 src/alloc.c    |   9 +++++
 src/frame.c    |   7 +++-
 src/pdumper.c  | 108 +++++++++++++++++++++++++++++++++++++++++++------
 src/puresize.h |   8 +++-
 4 files changed, 118 insertions(+), 14 deletions(-)

diff --git a/src/alloc.c b/src/alloc.c
index 4fab0d54248..6f57563d80c 100644
--- a/src/alloc.c
+++ b/src/alloc.c
@@ -386,7 +386,12 @@ static char *spare_memory[7];
    remapping on more recent systems because this is less important
    nowadays than in the days of small memories and timesharing.  */
 
+#if PDUMPER_PURE == 1
+EMACS_INT *pure;
+#else
 EMACS_INT pure[(PURESIZE + sizeof (EMACS_INT) - 1) / sizeof (EMACS_INT)] = {1,};
+#endif
+
 #define PUREBEG (char *) pure
 
 /* Pointer to the pure area, and its size.  */
@@ -8111,6 +8116,10 @@ init_alloc_once (void)
 static void
 init_alloc_once_for_pdumper (void)
 {
+#ifdef PDUMPER_PURE
+  pure = xzalloc ((PURESIZE + sizeof (EMACS_INT) - 1));
+  pure[0] = 2;
+#endif
   purebeg = PUREBEG;
   pure_size = PURESIZE;
   mem_init ();
diff --git a/src/frame.c b/src/frame.c
index f22bd501a8d..0bef8ba8661 100644
--- a/src/frame.c
+++ b/src/frame.c
@@ -53,6 +53,7 @@ along with GNU Emacs.  If not, see <https://www.gnu.org/licenses/>.  */
 #include "widget.h"
 #endif
 #include "pdumper.h"
+#include "puresize.h"
 
 /* The currently selected frame.  */
 Lisp_Object selected_frame;
@@ -1201,7 +1202,11 @@ make_initial_frame (void)
   Vframe_list = Fcons (frame, Vframe_list);
 
   tty_frame_count = 1;
-  fset_name (f, build_pure_c_string ("F1"));
+
+  if (PDUMPER_PURE)
+    fset_name (f, build_string ("F1"));
+  else
+    fset_name (f, build_pure_c_string ("F1"));
 
   SET_FRAME_VISIBLE (f, 1);
 
diff --git a/src/pdumper.c b/src/pdumper.c
index c8baa311854..47fa979da8c 100644
--- a/src/pdumper.c
+++ b/src/pdumper.c
@@ -45,6 +45,7 @@ along with GNU Emacs.  If not, see <https://www.gnu.org/licenses/>.  */
 #include "thread.h"
 #include "bignum.h"
 #include "treesit.h"
+#include "puresize.h"
 
 #ifdef CHECK_STRUCTS
 # include "dmpstruct.h"
@@ -399,6 +400,9 @@ struct dump_header
 
   /* Offset of a vector of the dumped hash tables.  */
   dump_off hash_list;
+
+  dump_off pure_start;
+  dump_off pure_end;
 };
 
 /* Double-ended singly linked list.  */
@@ -480,6 +484,7 @@ struct dump_flags
   bool_bf defer_cold_objects : 1;
   /* Punt on copied objects: defer them to ctx->copied_queue.  */
   bool_bf defer_copied_objects : 1;
+  bool_bf defer_pure_objects : 1;
 };
 
 /* Information we use while we dump.  Note that we're not the garbage
@@ -546,6 +551,7 @@ struct dump_context
   Lisp_Object copied_queue;
   /* Queue of cold objects to dump.  */
   Lisp_Object cold_queue;
+  Lisp_Object pure_queue;
 
   /* Relocations in the dump.  */
   Lisp_Object dump_relocs[RELOC_NUM_PHASES];
@@ -576,7 +582,8 @@ struct dump_context
    are physical dump offsets.  */
 enum dump_object_special_offset
   {
-   DUMP_OBJECT_IS_RUNTIME_MAGIC = -6,
+   DUMP_OBJECT_IS_RUNTIME_MAGIC = -7,
+   DUMP_OBJECT_ON_PURE_QUEUE = -6,
    DUMP_OBJECT_ON_COPIED_QUEUE = -5,
    DUMP_OBJECT_ON_HASH_TABLE_QUEUE = -4,
    DUMP_OBJECT_ON_SYMBOL_QUEUE = -3,
@@ -2620,15 +2627,18 @@ dump_vectorlike_generic (struct dump_context *ctx,
       skip = 0;
     }
 
-  /* We may have written a non-Lisp vector prefix above.  If we have,
-     pad to the lisp content start with zero, and make sure we didn't
-     scribble beyond that start.  */
-  dump_off prefix_size = ctx->offset - prefix_start_offset;
-  eassert (prefix_size > 0);
-  dump_off skip_start = ptrdiff_t_to_dump_off ((char *) &v->contents[skip]
-					       - (char *) v);
-  eassert (skip_start >= prefix_size);
-  dump_write_zero (ctx, skip_start - prefix_size);
+  if (ctx->flags.dump_object_contents)
+    {
+      /* We may have written a non-Lisp vector prefix above.  If we have,
+	 pad to the lisp content start with zero, and make sure we didn't
+	 scribble beyond that start.  */
+      dump_off prefix_size = ctx->offset - prefix_start_offset;
+      eassert (prefix_size > 0);
+      dump_off skip_start = ptrdiff_t_to_dump_off ((char *) &v->contents[skip]
+						   - (char *) v);
+      eassert (skip_start >= prefix_size);
+      dump_write_zero (ctx, skip_start - prefix_size);
+    }
 
   /* dump_object_start isn't what records conservative-GC object
      starts --- dump_object_1 does --- so the hack below of using
@@ -3148,6 +3158,24 @@ dump_vectorlike (struct dump_context *ctx,
   error_unsupported_dump_object (ctx, lv, msg);
 }
 
+static bool
+is_pure_object (Lisp_Object x)
+{
+  switch (XTYPE (x))
+    {
+    case Lisp_Symbol: return PURE_P (XSYMBOL (x));
+    case Lisp_String: return PURE_P (XSTRING (x));
+    case Lisp_Cons: return PURE_P (XCONS (x));
+    case Lisp_Vectorlike: return PURE_P (XVECTOR (x));
+      //case Lisp_Float: return PURE_P (XFLOAT (x));
+    case Lisp_Float: return false;
+    case Lisp_Int0: return false;
+    case Lisp_Int1: return false;
+    case Lisp_Type_Unused0: emacs_abort ();
+    }
+  emacs_abort ();
+}
+
 /* Add an object to the dump.
 
    CTX is the dump context; OBJECT is the object to add.  Normally,
@@ -3218,6 +3246,25 @@ dump_object (struct dump_context *ctx, Lisp_Object object)
       return offset;
     }
 
+  if (is_pure_object (object) && ctx->flags.defer_pure_objects)
+    {
+      if (offset != DUMP_OBJECT_ON_PURE_QUEUE)
+        {
+	  eassert (offset == DUMP_OBJECT_ON_NORMAL_QUEUE
+		   || offset == DUMP_OBJECT_NOT_SEEN);
+          dump_push (&ctx->pure_queue, object);
+	  offset = DUMP_OBJECT_ON_PURE_QUEUE;
+          dump_remember_object (ctx, object, offset);
+
+	  struct dump_flags old_flags = ctx->flags;
+	  ctx->flags.dump_object_contents = false;
+          ctx->flags.defer_pure_objects = false;
+          dump_object (ctx, object);
+          ctx->flags = old_flags;
+	}
+      return offset;
+    }
+
   /* Object needs to be dumped.  */
   if (dump_set_referrer (ctx))
     ctx->current_referrer = object;
@@ -3606,6 +3653,25 @@ dump_drain_cold_data (struct dump_context *ctx)
   ctx->flags = old_flags;
 }
 
+static void
+dump_drain_pure_data (struct dump_context *ctx)
+{
+  Lisp_Object pure_queue = Fnreverse (ctx->pure_queue);
+  ctx->pure_queue = Qnil;
+
+  struct dump_flags old_flags = ctx->flags;
+
+  /* Actually dump pure objects instead of deferring them.  */
+  ctx->flags.defer_pure_objects = false;
+
+  while (!NILP (pure_queue))
+    {
+      Lisp_Object item = dump_pop (&pure_queue);
+      dump_object (ctx, item);
+    }
+  ctx->flags = old_flags;
+}
+
 static void
 read_ptr_raw_and_lv (const void *mem,
                      enum Lisp_Type type,
@@ -4195,6 +4261,7 @@ types.  */)
   ctx->symbol_aux = Qnil;
   ctx->copied_queue = Qnil;
   ctx->cold_queue = Qnil;
+  ctx->pure_queue = Qnil;
   for (int i = 0; i < RELOC_NUM_PHASES; ++i)
     ctx->dump_relocs[i] = Qnil;
   ctx->object_starts = Qnil;
@@ -4216,6 +4283,7 @@ types.  */)
   /* These objects go into special sections.  */
   ctx->flags.defer_cold_objects = true;
   ctx->flags.defer_copied_objects = true;
+  ctx->flags.defer_pure_objects = true;
 
   ctx->current_referrer = Qnil;
   if (!NILP (track_referrers))
@@ -4297,6 +4365,11 @@ types.  */)
   ctx->header.hash_list = ctx->offset;
   dump_hash_table_list (ctx);
 
+  ctx->header.pure_start = ctx->offset;
+  dump_drain_pure_data (ctx);
+  ctx->header.pure_end = ctx->offset;
+  dump_write_zero (ctx, PURESIZE - (ctx->offset - ctx->header.pure_start));
+
   /* dump_hash_table_list just adds a new vector to the dump but all
      its content should already have been in the dump, so it doesn't
      add anything to any queue.  */
@@ -4378,6 +4451,7 @@ types.  */)
   eassert (dump_queue_empty_p (&ctx->dump_queue));
   eassert (NILP (ctx->copied_queue));
   eassert (NILP (ctx->cold_queue));
+  eassert (NILP (ctx->pure_queue));
   eassert (NILP (ctx->deferred_symbols));
   eassert (NILP (ctx->deferred_hash_tables));
   eassert (NILP (ctx->fixups));
@@ -4401,13 +4475,16 @@ types.  */)
     header_bytes = header_end - header_start,
     hot_bytes = hot_end - hot_start,
     discardable_bytes = discardable_end - ctx->header.discardable_start,
-    cold_bytes = cold_end - ctx->header.cold_start;
+    cold_bytes = cold_end - ctx->header.cold_start,
+    pure_bytes = ctx->header.pure_end - ctx->header.pure_start;
   fprintf (stderr,
 	   ("Dump complete\n"
 	    "Byte counts: header=%"PRIdDUMP_OFF" hot=%"PRIdDUMP_OFF
 	    " discardable=%"PRIdDUMP_OFF" cold=%"PRIdDUMP_OFF"\n"
+	    " pure=%"PRIdDUMP_OFF"\n"
 	    "Reloc counts: hot=%"PRIdDUMP_OFF" discardable=%"PRIdDUMP_OFF"\n"),
 	   header_bytes, hot_bytes, discardable_bytes, cold_bytes,
+	   pure_bytes,
            number_hot_relocations,
            number_discardable_relocations);
 
@@ -5270,7 +5347,8 @@ pdumper_find_object_type_impl (const void *obj)
     return PDUMPER_NO_OBJECT;
   ptrdiff_t bitno = offset / DUMP_ALIGNMENT;
   if (offset < dump_private.header.discardable_start
-      && !dump_bitset_bit_set_p (&dump_private.last_mark_bits, bitno))
+      && !dump_bitset_bit_set_p (&dump_private.last_mark_bits, bitno)
+      && !PURE_P (obj))
     return PDUMPER_NO_OBJECT;
   const struct dump_reloc *reloc =
     dump_find_relocation (&dump_private.header.object_starts, offset);
@@ -5814,6 +5892,12 @@ pdumper_load (const char *dump_filename, char *argv0)
   dump_do_all_dump_reloc_for_phase (header, dump_base, LATE_RELOCS);
   dump_do_all_dump_reloc_for_phase (header, dump_base, VERY_LATE_RELOCS);
 
+  if (PDUMPER_PURE)
+    {
+      xfree (pure);
+      pure = (EMACS_INT *)(dump_base + header->pure_start);
+    }
+
   /* Run the functions Emacs registered for doing post-dump-load
      initialization.  */
   for (int i = 0; i < nr_dump_late_hooks; ++i)
diff --git a/src/puresize.h b/src/puresize.h
index d7d8f0b4eec..1c906536eec 100644
--- a/src/puresize.h
+++ b/src/puresize.h
@@ -23,6 +23,8 @@ along with GNU Emacs.  If not, see <https://www.gnu.org/licenses/>.  */
 
 INLINE_HEADER_BEGIN
 
+#define PDUMPER_PURE 1
+
 /* Define PURESIZE, the number of bytes of pure Lisp code to leave space for.
 
    At one point, this was defined in config.h, meaning that changing
@@ -79,13 +81,17 @@ INLINE_HEADER_BEGIN
 
 extern AVOID pure_write_error (Lisp_Object);
 
+#if PDUMPER_PURE == 1
+extern EMACS_INT *pure;
+#else
 extern EMACS_INT pure[];
+#endif
 
 /* The puresize_h_* macros are private to this include file.  */
 
 /* True if PTR is pure.  */
 
-#define puresize_h_PURE_P(ptr) \
+#define puresize_h_PURE_P(ptr)				\
   ((uintptr_t) (ptr) - (uintptr_t) pure <= PURESIZE)
 
 INLINE bool
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-21 17:41       ` Helmut Eller
@ 2024-12-21 18:32         ` Gerd Möllmann
  2024-12-21 22:19         ` Pip Cet via Emacs development discussions.
  2024-12-22  1:28         ` Stefan Kangas
  2 siblings, 0 replies; 58+ messages in thread
From: Gerd Möllmann @ 2024-12-21 18:32 UTC (permalink / raw)
  To: Helmut Eller; +Cc: Stefan Monnier, Stefan Kangas, emacs-devel, Pip Cet

Helmut Eller <eller.helmut@gmail.com> writes:

> On Wed, Dec 18 2024, Helmut Eller wrote:
>
> [...]
>> So the pdumper copies objects from purespace to the dump like normal
>> objects; when loading the dump, purespace stays empty.
>>
>> I had (wrongly) assumed that the pdumper creates a separate section for
>> pure objects.  Creating such a section sounds easy enough (hmm, maybe
>> not so easy because of hashtables).  Still not sure if it would be worth
>> the effort.
>
> Out of curiosity, I implemented such a section with the attached patch.
> It seems that it would save ~2ms per collection cycle.
>
> To measure this, I compared the output of 
>
>  ./src/emacs -Q --batch --eval \
>  '(let* ((stats (benchmark-run 1000 (garbage-collect)))) 
>      (message "%s" (/ (elt stats 2) (elt stats 1))))'
>
> between the versions with and without the patch.  The results was:
>
>    without-pure-section: 0.006251480181
>       with-pure-section: 0.003986384857
>
> Helmut

Interesting! And impressive, that's ca. 1/3 less. 



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-21 17:41       ` Helmut Eller
  2024-12-21 18:32         ` Gerd Möllmann
@ 2024-12-21 22:19         ` Pip Cet via Emacs development discussions.
  2024-12-22  1:28         ` Stefan Kangas
  2 siblings, 0 replies; 58+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-21 22:19 UTC (permalink / raw)
  To: Helmut Eller; +Cc: Stefan Monnier, Stefan Kangas, emacs-devel

"Helmut Eller" <eller.helmut@gmail.com> writes:

> On Wed, Dec 18 2024, Helmut Eller wrote:
>
> [...]
>> So the pdumper copies objects from purespace to the dump like normal
>> objects; when loading the dump, purespace stays empty.
>>
>> I had (wrongly) assumed that the pdumper creates a separate section for
>> pure objects.  Creating such a section sounds easy enough (hmm, maybe
>> not so easy because of hashtables).  Still not sure if it would be worth
>> the effort.
>
> Out of curiosity, I implemented such a section with the attached patch.
> It seems that it would save ~2ms per collection cycle.

Very interesting, thank you for sharing!

I tried everything I could think of to disprove your findings, but while
I did find a minor optimization opportunity in the pdumper mark bits
code (a missing eassume (offset >= 0) / eassume (word_number >= 0)),
this affects both branches equally.  I can only conclude that your
observation is accurate, and we now have a number that is still close to
zero, but measurable.

The effect seems to be mostly constant (1-2 ms / GC cycle), but I did
see some slightly larger differences when more objects were on the heap.
Cache effects, probably (PURE_P doesn't have to wait for memory,
pdumper_marked_p_impl does).

(I did run into crashes when running emacs without --batch, but those
are easily fixed and did not affect the performance measurements at
all. I only mention this so that others running tests don't get their
hopes (or fears) up).

Pip




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-21 17:41       ` Helmut Eller
  2024-12-21 18:32         ` Gerd Möllmann
  2024-12-21 22:19         ` Pip Cet via Emacs development discussions.
@ 2024-12-22  1:28         ` Stefan Kangas
  2024-12-22 11:12           ` Pip Cet via Emacs development discussions.
                             ` (2 more replies)
  2 siblings, 3 replies; 58+ messages in thread
From: Stefan Kangas @ 2024-12-22  1:28 UTC (permalink / raw)
  To: Helmut Eller, Stefan Monnier; +Cc: emacs-devel, Pip Cet

Helmut Eller <eller.helmut@gmail.com> writes:

> On Wed, Dec 18 2024, Helmut Eller wrote:
>
> [...]
>> So the pdumper copies objects from purespace to the dump like normal
>> objects; when loading the dump, purespace stays empty.
>>
>> I had (wrongly) assumed that the pdumper creates a separate section for
>> pure objects.  Creating such a section sounds easy enough (hmm, maybe
>> not so easy because of hashtables).  Still not sure if it would be worth
>> the effort.
>
> Out of curiosity, I implemented such a section with the attached patch.
> It seems that it would save ~2ms per collection cycle.
>
> To measure this, I compared the output of
>
>  ./src/emacs -Q --batch --eval \
>  '(let* ((stats (benchmark-run 1000 (garbage-collect))))
>      (message "%s" (/ (elt stats 2) (elt stats 1))))'
>
> between the versions with and without the patch.  The results was:
>
>    without-pure-section: 0.006251480181
>       with-pure-section: 0.003986384857

This is interesting, thanks.  Would this experiment easily transfer to
and be relevant to the MPS branch?



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-22  1:28         ` Stefan Kangas
@ 2024-12-22 11:12           ` Pip Cet via Emacs development discussions.
  2024-12-22 13:07             ` Eli Zaretskii
  2024-12-22 15:51             ` Stefan Monnier
  2024-12-22 13:13           ` Pip Cet via Emacs development discussions.
  2024-12-22 14:16           ` Helmut Eller
  2 siblings, 2 replies; 58+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-22 11:12 UTC (permalink / raw)
  To: Stefan Kangas; +Cc: Helmut Eller, Stefan Monnier, emacs-devel

"Stefan Kangas" <stefankangas@gmail.com> writes:

> Helmut Eller <eller.helmut@gmail.com> writes:
>
>> On Wed, Dec 18 2024, Helmut Eller wrote:
>>
>> [...]
>>> So the pdumper copies objects from purespace to the dump like normal
>>> objects; when loading the dump, purespace stays empty.
>>>
>>> I had (wrongly) assumed that the pdumper creates a separate section for
>>> pure objects.  Creating such a section sounds easy enough (hmm, maybe
>>> not so easy because of hashtables).  Still not sure if it would be worth
>>> the effort.
>>
>> Out of curiosity, I implemented such a section with the attached patch.
>> It seems that it would save ~2ms per collection cycle.
>>
>> To measure this, I compared the output of
>>
>>  ./src/emacs -Q --batch --eval \
>>  '(let* ((stats (benchmark-run 1000 (garbage-collect))))
>>      (message "%s" (/ (elt stats 2) (elt stats 1))))'
>>
>> between the versions with and without the patch.  The results was:
>>
>>    without-pure-section: 0.006251480181
>>       with-pure-section: 0.003986384857
>
> This is interesting, thanks.  Would this experiment easily transfer to
> and be relevant to the MPS branch?

TL;DR: let's add check_writable to replace CHECK_IMPURE rather than
dropping the calls entirely.  It'd be a nop in the initial merge, but it
would facilitate experiments for regaining the advantages of purespace
without the ugliness.

If we want to perform an optimization such as this one, we should think
of ways to do so without pure space and all its restrictions.  I think I
have one.

My experiments suggest that the main performance impact is that when we
reach a pure object, we stop scanning, in effect treating it as
though it were already marked.  Checking and setting the mark bit of the
object doesn't affect the performance much, in my experiments.

My idea is this: we add an extra mark bit area to the pdumper file for
objects which we know to be "tenured": i.e. objects that we'll treat as
immortal, but for which we also know that all referenced objects will
also be "tenured", or static.

Such "tenured" objects aren't scanned during GC: they're already marked
when we start the GC cycle.

If we write to such an object, we clear the bit, and put it on a special
set to maintain its tenure (it'd be nicer to simply set another bit, but
non-MPS pdumper cannot do so).  This should happen rarely, but it's
better than the current CHECK_IMPURE thing.

Some housekeeping would be required to adjust the "tenured" set, but the
important point is that "tenured" is no longer a user-visible property:
correctness would be maintained even for dumb choices of the "tenured"
set.

The problems are:

1. choose a "tenured" set. I would hazard a guess that most objects in
the pdumper file are usually not actually written to, ever, so maybe we
could even get away with starting with a universally-filled "tenured"
set.  We could use, in essence, PGO to do so, by identifying call stacks
for objects that are modified and excluding them from the "tenured" set
based on the call stacks when the real Emacs is built.

2. keep the immortal set small.  In particular, even if an object was
written to, it's quite possible that it can be removed from the set
again, because the new references are also "tenured".  In extreme cases,
we could get away with clearing the entire "pure" set and continuing
without this optimization.  However, if we find an object cannot have
its tenured bit set anymore, we can recursively scan the remaining
"tenured" objects which reference this object and revoke their tenure,
and remove the problematic object from the immortal set.

In essence, all this would turn the user-visible "pure" status,
available only to explicitly-specified objects, into a user-invisible
"tenured" status, awarded automatically to some objects without
programmer interaction.  The benefits may well be greater because the
automatic detection of "tenured" objects might very easily be superior
to the current "pure" selection.  "Tenured" status would be available to
all pdumped objects, but for MPS builds it would be possible to expand
this so even objects created after loading the dump can become
"tenured".

(I realize "tenure" isn't a great term because a sensible implementation
would probably revoke tenure in some cases.  "GC-skippable" or
"skippable" might be better.)

The main implementation problem I see right now is that to correct our
tenured set, and for the immortal set, we'd need an efficient "set"
structure.  In particular, while ordinary GC would only need to use the
"immortal" set, analysis and fixing the "tenured" set would require
knowing about all "tenured" objects.  For the MPS build, this is
comparatively easy as we can simply set a bit and linear-scan the
pdumper area (all pdumped objects have igc headers).  For non-MPS, a
full hash table would probably be required, and it would have to be
dumped.

My important points are:

1. It's probably possible to do better performance-wise than Helmut's
experiment.
2. This potential approach would not require the reintroduction of a
user-visible "pure" status.
3. In particular, there's (almost) nothing to stop us from delaying
this experiment until after pure space is removed.
4. The exception to the "almost" is: We should probably introduce dummy
check_writable (Lisp_Object) calls rather than dropping CHECK_IMPURE ()
entirely.  We'll need them for experiments, even if the standard
implementation would be a nop.
5. While the details of the implementation would differ, this
optimization could apply to the MPS build, too, and even be generalized
(on no-purespace, we can't move an object into the pdumper are, but on
scratch/igc, we may be able to hack MPS to move an object into the
immovable "tenured" pool).

However, I realize that (1) is currently a sheer guess. I haven't
decided whether it's worth it to get an upper bound on the saved GC time
by implementing a universal "tenured" set and performing a GC right
after loading (which should be very fast, not marking any pdumped
objects).  The problem is that it's an upper bound: the effect will
become less significant as the immortal set would grow over time; at
some point, it's no longer worth it, and we could clear all "tenured"
bits and continue without this optimization.

While my experiments suggest it's not relevant, this would reduce some
potential benefits of Helmut's approach: locality wouldn't be as good,
we'd need to check a bit rather than doing a bounds check to identify
skippable objects, and of course there's the extra overhead from another
bits table and the set implementation.  If anyone tries this, please
also try whether increasing the pdumper alignment to 16 or further
improves performance; that would mean some wasted space, but it would
also reduce the sheer number of bits to check.

If we want dynamic adjustment of the skippable/"tenured" set, we'd also
need to duplicate a lot of the GC code to trace references for analysis
rather than setting their mark bits.  We might be able to avoid the code
duplication with a trick, but realistically we'd be looking at two code
paths: the "fast" GC path which performs a hardcoded action, and the
"slow" analysis path which merely collects references and returns them
to the caller.

Sorry this is a bit long.

Pip




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-22 11:12           ` Pip Cet via Emacs development discussions.
@ 2024-12-22 13:07             ` Eli Zaretskii
  2024-12-22 14:12               ` Pip Cet via Emacs development discussions.
  2024-12-22 15:51             ` Stefan Monnier
  1 sibling, 1 reply; 58+ messages in thread
From: Eli Zaretskii @ 2024-12-22 13:07 UTC (permalink / raw)
  To: Pip Cet; +Cc: stefankangas, eller.helmut, monnier, emacs-devel

> Date: Sun, 22 Dec 2024 11:12:30 +0000
> Cc: Helmut Eller <eller.helmut@gmail.com>,
>  Stefan Monnier <monnier@iro.umontreal.ca>, emacs-devel@gnu.org
> From:  Pip Cet via "Emacs development discussions." <emacs-devel@gnu.org>
> 
> 1. choose a "tenured" set. I would hazard a guess that most objects in
> the pdumper file are usually not actually written to, ever, so maybe we
> could even get away with starting with a universally-filled "tenured"
> set.  We could use, in essence, PGO to do so, by identifying call stacks
> for objects that are modified and excluding them from the "tenured" set
> based on the call stacks when the real Emacs is built.

I don't think your guess is correct.  There are definitely objects in
the pdumper file that are written to.  Example: the *scratch* buffer.

Historically, the purecopy calls that created pure objects have their
legacy from the unexec era, when any Lisp object generated in temacs
was dumped into the emacs binary, and originally ended up in the BSS
section (and was shared between different emacs instances running on
the same system).  It is thus possible that quite a few of pure
objects were there "by accident", and there's no need to keep them
from temacs stage.  We already started to get rid of some of those,
but meanwhile did that only where pdumper caused trouble.  IOW, we are
now in an interim state where some dumped objects don't need to be
dumped at all, but should be recreated anew when Emacs starts.

So going by dumped objects is IMO following the wrong lead.

It would make sense to have a special API for defining objects that
need not be scanned by GC, in effect reintroducing pure space (but
with a different name and only for objects that are truly immutable).
However, it sounds like this adds back one reason why we wanted to get
rid of purespace: the tedious and error-prone job of identifying such
objects and marking them in the sources.

OTOH, what other reliable ways do we have?  Whether a given object is
immutable is a programmer-level decision.  I really don't see how this
decision could be made automatically by Emacs.




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-22  1:28         ` Stefan Kangas
  2024-12-22 11:12           ` Pip Cet via Emacs development discussions.
@ 2024-12-22 13:13           ` Pip Cet via Emacs development discussions.
  2024-12-22 14:16           ` Helmut Eller
  2 siblings, 0 replies; 58+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-22 13:13 UTC (permalink / raw)
  To: Stefan Kangas; +Cc: Helmut Eller, Stefan Monnier, emacs-devel

Pip Cet <pipcet@protonmail.com> writes:

> However, I realize that (1) is currently a sheer guess. I haven't
> decided whether it's worth it to get an upper bound on the saved GC time
> by implementing a universal "tenured" set and performing a GC right
> after loading (which should be very fast, not marking any pdumped
> objects).

I did.  This got long again.  That's because I wanted to be really sure
that merging no-purespace isn't going to prevent worthwhile
optimizations in the future, and I am now.  Feel free to skip the rest
:-)

My initial results are that simply "tenuring" the char tables in the
pdump seems to have such a drastic effect that it's hard to perform a
fair measurement: process_mark_stack is called (in emacs -Q, no --batch)
21384 times if we "tenure" the char tables, and 135345 times if we
don't.

(This suggests that char tables may be worth optimizing for the "old"
GC: simply keep a set of GC-relevant values in the char table, and scan
that rather than scanning the entire char table.  However, we can't do
that with MPS, so I'm not overly interested in it.  Also, I doubt the
optimization decisions required for char tables would be made the same
way if they were reimplemented today, so it may be more productive to
start over from scratch, with a particular focus on reducing the time
needed for GC rather than ordinary performance)

Also, we need to add a few check_writable calls to avoid segfaults.  I
should have expected that, I guess.

The good news is that few pdumped objects (256 once a non-batched Emacs
is started) actually appear to be written to, so it's not entirely
hopeless to identify those in one run and mark them non-tenured in the
real Emacs.

IOW, my tentative conclusion is that it's possible to perform such
optimizations after pure space is dropped, and there's no reason to
delay the merge.

Optimizing based on a *hint* that an object probably won't be mutated is
a potential way forward.

Optimizing based on a hard promise that an object won't be mutated, as
the old purespace code does, not so much.  Even the old purespace code,
with the years of development it's seen, ended up losing the
optimization and causing preventable segfaults for valid-looking Elisp
code.

I must confess I'm fundamentally opposed to having objects come in a
"read-only" and a "read-write" flavor.  Either they should always be
immutable, such as bignums and floats are now, or we should go to the
trouble of supporting the rare cases in which an object hinted or
guessed to be read-only turned out not to be.  (This is independent of
the question of whether the characters in a string can be changed or
not.)

It's very hard even to define what constitutes mutation of an object and
what doesn't.  Setting a symbol's global value is clearly a mutation in
the current code, but what if we keep those global values in a hash
table instead, and the struct Lisp_Symbol is never written to?  Does
lexically (or dynamically) binding a symbol mean the entire symbol is no
longer read-only?  If we ever implement hash-collision workarounds by
randomizing hash seeds, would re-seeding count as a mutation of the hash
table?  What about (aset v 0 (aref v 0))? Hash table resizing? Removing
dead keys from Weak hash tables? Pinning a string to use it in a byte
code object?  Wouldn't it make sense to protect hash table (or obarray)
keys from mutation if that may result in irretrievable entries?

Most of these questions have two good answers, one which aids in
optimization, and one which Lisp programmers would expect.  They're
often different.

To get back to the no-purespace branch, I think we should consider
reintroducing check_writable () calls (which would currently be no-ops
on the master branch) after the merge, if we can agree on precisely when
this macro should be called and how.  The old locations of CHECK_IMPURE
can serve as a hint, but no more, so let's drop CHECK_IMPURE first and
start with a clean slate there.

Pip




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-22 13:07             ` Eli Zaretskii
@ 2024-12-22 14:12               ` Pip Cet via Emacs development discussions.
  0 siblings, 0 replies; 58+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-22 14:12 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: stefankangas, eller.helmut, monnier, emacs-devel

"Eli Zaretskii" <eliz@gnu.org> writes:

>> Date: Sun, 22 Dec 2024 11:12:30 +0000
>> Cc: Helmut Eller <eller.helmut@gmail.com>,
>>  Stefan Monnier <monnier@iro.umontreal.ca>, emacs-devel@gnu.org
>> From:  Pip Cet via "Emacs development discussions." <emacs-devel@gnu.org>
>>
>> 1. choose a "tenured" set. I would hazard a guess that most objects in
>> the pdumper file are usually not actually written to, ever, so maybe we
>> could even get away with starting with a universally-filled "tenured"
>> set.  We could use, in essence, PGO to do so, by identifying call stacks
>> for objects that are modified and excluding them from the "tenured" set
>> based on the call stacks when the real Emacs is built.
>
> I don't think your guess is correct.

My proposal was about hints (or guesses), not about hard promises.  I
see no reason to prefer the latter to the former: on the C code side of
things, it causes complexity, because we need to deal with broken
promises somehow (currently, we just crash, in some cases).  I'm
convinced it's worse for performance, because there are many
mostly-immutable-but-someone-might objects which are still worth
optimizing for.

As for my guess, it is correct.  I guessed that few objects are written
to, so we can give a blanket *hint* that an object in the pdump is
probably not going to be mutated.  It's obviously true that some objects
are, so inevitably some of the hints are going to be incorrect, but
that's why they're just hints.

> There are definitely objects in the pdumper file that are written to.

Of course, but that wasn't the question.

> Historically, the purecopy calls that created pure objects have their
> legacy from the unexec era, when any Lisp object generated in temacs
> was dumped into the emacs binary, and originally ended up in the BSS
> section (and was shared between different emacs instances running on
> the same system).  It is thus possible that quite a few of pure
> objects were there "by accident", and there's no need to keep them
> from temacs stage.  We already started to get rid of some of those,
> but meanwhile did that only where pdumper caused trouble.  IOW, we are
> now in an interim state where some dumped objects don't need to be
> dumped at all, but should be recreated anew when Emacs starts.
>
> So going by dumped objects is IMO following the wrong lead.

I agree.

> It would make sense to have a special API for defining objects that
> need not be scanned by GC, in effect reintroducing pure space (but
> with a different name and only for objects that are truly immutable).

I disagree.  I don't see the need for the "hard promise" character of
that API.  Hints seem to be sufficient, they cause much less trouble,
and we can generate them (semi-)automatically because an incorrect hint
would cost some performance but wouldn't crash Emacs.

Most importantly, whether an object needs to be scanned by GC or not is
very difficult to decide, and liable to change, and then we have to
check all the API calls, and if we make a mistake it'll result in
crashes.

We know that would happen because we already did it, with purespace, and
now master-branch Emacs is crashable.

> However, it sounds like this adds back one reason why we wanted to get
> rid of purespace: the tedious and error-prone job of identifying such
> objects and marking them in the sources.

That's why I think hints/guesses are the only option: then we can just
ignore them, or lose some performance because of wrong hints, or correct
them automatically, or whatever, without ever crashing Emacs.

> OTOH, what other reliable ways do we have?  Whether a given object is

With hints, we don't need to be reliable, we just need an "oops, turns
out not to be read-only" fire escape, which isn't that hard to do.

> immutable is a programmer-level decision.  I really don't see how this

Why encourage programmers to even consider the question, though?  I
think of it as analogous to deciding the right size of a C integer: Lisp
avoids forcing the programmer to make that decision, and it's a better
language because of that.  Guesses or hints are sufficient for
performance, and they maintain the flexibility of simply mutating an
object when we need to.  They're also better for performance.

(Char tables can be "sealed" so we don't have to scan the entire
table, just the values used in it (and each value only needs to be
scanned once).  Mutation would mean unsealing them, and then we could
re-seal them after another GC cycle or two without modfications.  That's
the kind of approach we should generalize, not "I promise not to modify
this object".)

> decision could be made automatically by Emacs.

Hints (or guesses) can be done automatically.  Hard promises, no.

I'm comfortable giving up on hard promises for good.

Pip




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-22  1:28         ` Stefan Kangas
  2024-12-22 11:12           ` Pip Cet via Emacs development discussions.
  2024-12-22 13:13           ` Pip Cet via Emacs development discussions.
@ 2024-12-22 14:16           ` Helmut Eller
  2 siblings, 0 replies; 58+ messages in thread
From: Helmut Eller @ 2024-12-22 14:16 UTC (permalink / raw)
  To: Stefan Kangas; +Cc: Stefan Monnier, emacs-devel, Pip Cet

On Sun, Dec 22 2024, Stefan Kangas wrote:

> Helmut Eller <eller.helmut@gmail.com> writes:
[...]
>> between the versions with and without the patch.  The results was:
>>
>>    without-pure-section: 0.006251480181
>>       with-pure-section: 0.003986384857
>
> This is interesting, thanks.  Would this experiment easily transfer to
> and be relevant to the MPS branch?

To be useful for MPS, we would probably need sections that can be loaded
to different/non-contiguous memory blocks that are allocated by MPS.
(One section per MPS pool).  That would require some work on the
pdumper.

It's less relevant for MPS because MPS is incremental and generational
and so pure objects contribute much less to the average GC pause time.

Helmut



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-22 11:12           ` Pip Cet via Emacs development discussions.
  2024-12-22 13:07             ` Eli Zaretskii
@ 2024-12-22 15:51             ` Stefan Monnier
  2024-12-22 17:09               ` Gerd Möllmann
  2024-12-22 17:10               ` Pip Cet via Emacs development discussions.
  1 sibling, 2 replies; 58+ messages in thread
From: Stefan Monnier @ 2024-12-22 15:51 UTC (permalink / raw)
  To: Pip Cet; +Cc: Stefan Kangas, Helmut Eller, emacs-devel

> My idea is this: we add an extra mark bit area to the pdumper file for
> objects which we know to be "tenured": i.e. objects that we'll treat as
> immortal, but for which we also know that all referenced objects will
> also be "tenured", or static.

IIUC this sounds like a kind of generational GC, except that promotion
to the "tenured" set is made somewhat visible instead of being 100% internal.

> If we write to such an object, we clear the bit, and put it on a special
> set to maintain its tenure (it'd be nicer to simply set another bit, but
> non-MPS pdumper cannot do so).  This should happen rarely, but it's
> better than the current CHECK_IMPURE thing.

If my understanding above is correct, then the
`CHECK_IMPURE/check_writable` is what we usually call "write barrier",
and the "special set" above is what we usually call the "remembered set".


        Stefan




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-22 15:51             ` Stefan Monnier
@ 2024-12-22 17:09               ` Gerd Möllmann
  2024-12-22 17:10               ` Pip Cet via Emacs development discussions.
  1 sibling, 0 replies; 58+ messages in thread
From: Gerd Möllmann @ 2024-12-22 17:09 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Pip Cet, Stefan Kangas, Helmut Eller, emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> My idea is this: we add an extra mark bit area to the pdumper file for
>> objects which we know to be "tenured": i.e. objects that we'll treat as
>> immortal, but for which we also know that all referenced objects will
>> also be "tenured", or static.
>
> IIUC this sounds like a kind of generational GC, except that promotion
> to the "tenured" set is made somewhat visible instead of being 100% internal.
>
>> If we write to such an object, we clear the bit, and put it on a special
>> set to maintain its tenure (it'd be nicer to simply set another bit, but
>> non-MPS pdumper cannot do so).  This should happen rarely, but it's
>> better than the current CHECK_IMPURE thing.
>
> If my understanding above is correct, then the
> `CHECK_IMPURE/check_writable` is what we usually call "write barrier",
> and the "special set" above is what we usually call the "remembered set".
>

Same understanding here.



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-22 15:51             ` Stefan Monnier
  2024-12-22 17:09               ` Gerd Möllmann
@ 2024-12-22 17:10               ` Pip Cet via Emacs development discussions.
  2025-01-04  0:09                 ` Stefan Monnier
  1 sibling, 1 reply; 58+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-22 17:10 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Stefan Kangas, Helmut Eller, emacs-devel

"Stefan Monnier" <monnier@iro.umontreal.ca> writes:

>> My idea is this: we add an extra mark bit area to the pdumper file for
>> objects which we know to be "tenured": i.e. objects that we'll treat as
>> immortal, but for which we also know that all referenced objects will
>> also be "tenured", or static.
>
> IIUC this sounds like a kind of generational GC, except that promotion
> to the "tenured" set is made somewhat visible instead of being 100% internal.

Kind of a very special case of it, but there's no major GC and we accept
that objects in the "old" generation are simply immortal.  I think it's
easier to get to the concept by starting from "here's a quick hack"
rather than "start with generational GC, then remove bits from it until
all that's left is a quick hack".

I was intending for the promotion to be invisible to "normal" Lisp
users, though.  Sorry if that didn't become clear.

>> If we write to such an object, we clear the bit, and put it on a special
>> set to maintain its tenure (it'd be nicer to simply set another bit, but
>> non-MPS pdumper cannot do so).  This should happen rarely, but it's
>> better than the current CHECK_IMPURE thing.
>
> If my understanding above is correct, then the
> `CHECK_IMPURE/check_writable` is what we usually call "write barrier",

Yes. I was trying to avoid the term because there are other uses for
check_writable than clearing a write barrier, particularly if we pair
each call to check_writable with one to
no_longer_care_whether_the_object_is_writable.

> and the "special set" above is what we usually call the "remembered set".

TBH, I'm not entirely sure about that one, and I see I misspoke there;
while the object remains immortal (and thus "tenured"), it's no longer
considered skippable; in effect, the object combines the disadvantages
of being young and being old, permanently, rather than clearly being one
of the two.  I guess you could say we turn it into an additional root?

It becomes much closer to generational GC if you reintroduce "major" GCs
which would recalculate the tenure of all objects, but that's where we
hit the limits of "quick hack" territory, and I don't see a way of
detecting when we would want to do so automatically.

Anyway, leaving those very general questions aside, char table GC is
inefficient and fixing it will speed up the first few GCs considerably,
and we can do so without making anything "pure".

Pip




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-19 20:13                             ` Pip Cet via Emacs development discussions.
  2024-12-20 15:59                               ` Stefan Monnier
@ 2025-01-03 22:36                               ` Pip Cet via Emacs development discussions.
  2025-01-03 23:50                                 ` Stefan Monnier
  2025-01-04  7:26                                 ` Eli Zaretskii
  1 sibling, 2 replies; 58+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2025-01-03 22:36 UTC (permalink / raw)
  To: Pip Cet via "Emacs development discussions."
  Cc: Stefan Kangas, Andrea Corallo, Gerd Möllmann, Eli Zaretskii,
	monnier

"Pip Cet via \"Emacs development discussions.\"" <emacs-devel@gnu.org> writes:

> "Stefan Kangas" <stefankangas@gmail.com> writes:
>
>> Pip Cet <pipcet@protonmail.com> writes:
>>
>>> "Stefan Kangas" <stefankangas@gmail.com> writes:
>>>
>>>> Pip Cet <pipcet@protonmail.com> writes:
>>>>
>>>>> What I think we should do doesn't really matter, but it seems quite
>>>>> obvious to me that we should make the code on the master branch
>>>>> perform all three checks on all relocations, as the code on
>>>>> no-purespace does.
>>>>
>>>> Maybe.  But won't we get those checks with no additional effort once we
>>>> merge no-purespace,
>>>
>>> Yes, we will.  (And the forbidden symbol; even if the forbidden symbol
>>> doesn't cause trouble, which I think it will, it's simply very poor
>>> programming practice to do things that way, particularly since the
>>> crash may happen a long time after the compilation.  But, again, what I
>>> think obviously doesn't matter here.  I'll just remember that
>>> --enable-checking causes false positive crashes and shouldn't be used).
>>
>> I don't think the existence of one symbol that will crash Emacs in some
>> situations means that --enable-checking should be completely avoided.
>> It's still quite useful, and we're fine as long as we avoid using that
>> one symbol, right?
>>
>> OTOH and IMHO, it would be preferable if that symbol could not crash
>> Emacs.  Can we come up with a good way to fix that, while preserving the
>> check that Andrea wants to keep?
>
> That sounds like a good thing to focus on, yes.  We need to have a value
> in a vector that we Fread that is distinguishable from all other values.

I just reread the code, and #$ may be what we're looking for.  It's a
unique value that we can pass in to Fread (let-binding load-file-name),
and it already exists.  OTOH, it's used for docstrings normally, so it
may be cleaner to invent new read syntax.

I'd really like to fix this.

> For example, right now this code doesn't work:
>
> (let ((print-circle t) (read-circle nil))
>   (message "%S" (funcall (native-compile (lambda () #1=[#1#])))))
>
> (read, of course, with read-circle bound to t)
>
> but this does:
>
> (let ((print-circle t) (read-circle nil))
>   (message "%S" (funcall (lambda () #1=[#1#]))))
>
> So this seems like a cheap drive-by fix.

The non-nativecomp code also breaks if read-circle is nil (I assume this
is related to autoload), so it isn't just the nativecomp code that
should be fixed.

Pip




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2025-01-03 22:36                               ` Pip Cet via Emacs development discussions.
@ 2025-01-03 23:50                                 ` Stefan Monnier
  2025-01-04  7:26                                 ` Eli Zaretskii
  1 sibling, 0 replies; 58+ messages in thread
From: Stefan Monnier @ 2025-01-03 23:50 UTC (permalink / raw)
  To: Pip Cet
  Cc: Pip Cet via "Emacs development discussions.",
	Stefan Kangas, Andrea Corallo, Gerd Möllmann, Eli Zaretskii

> I just reread the code, and #$ may be what we're looking for.  It's a
> unique value that we can pass in to Fread (let-binding load-file-name),
> and it already exists.  OTOH, it's used for docstrings normally, so it
> may be cleaner to invent new read syntax.

I still don't understand what this is going to be used for, so I'm not
sure how hackish this would be, but indeed #$ can be used to have the
reader return a known, unique object that we can check with `eq`.


        Stefan




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2024-12-22 17:10               ` Pip Cet via Emacs development discussions.
@ 2025-01-04  0:09                 ` Stefan Monnier
  0 siblings, 0 replies; 58+ messages in thread
From: Stefan Monnier @ 2025-01-04  0:09 UTC (permalink / raw)
  To: Pip Cet; +Cc: Stefan Kangas, Helmut Eller, emacs-devel

>> and the "special set" above is what we usually call the "remembered set".
>
> TBH, I'm not entirely sure about that one, and I see I misspoke there;
> while the object remains immortal (and thus "tenured"), it's no longer
> considered skippable; in effect, the object combines the disadvantages
> of being young and being old, permanently, rather than clearly being one
> of the two.  I guess you could say we turn it into an additional root?

AFAIK in the context of generational GC, when a write-barrier records
a mutation in the old/tenured object, different GCs make different
choices w.r.t what is recorded in the "remembered set": the set may
record the new target of the pointer that's assigned (assuming it's
a pointer), or it record the address of the field that's mutated, or it
can record the actual object that's mutated, or it records a "card"
(i.e. a somewhat arbitrary memory region that contains the field,
usually sized and aligned on a power of 2), ...

So IMO what you describe still sits comfortably within the scope of
"remembered set".

> It becomes much closer to generational GC if you reintroduce "major" GCs
> which would recalculate the tenure of all objects, but that's where we
> hit the limits of "quick hack" territory, and I don't see a way of
> detecting when we would want to do so automatically.

Generational GCs don't have "a way of detecting when we would want to do
[a major GC] automatically" either.  They just use heuristics.

Regarding "quick hack", I think the main issue is the write barrier.


        Stefan




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2025-01-03 22:36                               ` Pip Cet via Emacs development discussions.
  2025-01-03 23:50                                 ` Stefan Monnier
@ 2025-01-04  7:26                                 ` Eli Zaretskii
  2025-01-04 11:12                                   ` Pip Cet via Emacs development discussions.
  1 sibling, 1 reply; 58+ messages in thread
From: Eli Zaretskii @ 2025-01-04  7:26 UTC (permalink / raw)
  To: Pip Cet; +Cc: emacs-devel, stefankangas, acorallo, gerd.moellmann, monnier

> Date: Fri, 03 Jan 2025 22:36:42 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: Stefan Kangas <stefankangas@gmail.com>, Andrea Corallo <acorallo@gnu.org>, Gerd Möllmann <gerd.moellmann@gmail.com>, Eli Zaretskii <eliz@gnu.org>, monnier@iro.umontreal.ca
> 
> "Pip Cet via \"Emacs development discussions.\"" <emacs-devel@gnu.org> writes:
> 
> > "Stefan Kangas" <stefankangas@gmail.com> writes:
> >
> >> Pip Cet <pipcet@protonmail.com> writes:
> >>
> >>> "Stefan Kangas" <stefankangas@gmail.com> writes:
> >>>
> >>>> Pip Cet <pipcet@protonmail.com> writes:
> >>>>
> >>>>> What I think we should do doesn't really matter, but it seems quite
> >>>>> obvious to me that we should make the code on the master branch
> >>>>> perform all three checks on all relocations, as the code on
> >>>>> no-purespace does.
> >>>>
> >>>> Maybe.  But won't we get those checks with no additional effort once we
> >>>> merge no-purespace,
> >>>
> >>> Yes, we will.  (And the forbidden symbol; even if the forbidden symbol
> >>> doesn't cause trouble, which I think it will, it's simply very poor
> >>> programming practice to do things that way, particularly since the
> >>> crash may happen a long time after the compilation.  But, again, what I
> >>> think obviously doesn't matter here.  I'll just remember that
> >>> --enable-checking causes false positive crashes and shouldn't be used).
> >>
> >> I don't think the existence of one symbol that will crash Emacs in some
> >> situations means that --enable-checking should be completely avoided.
> >> It's still quite useful, and we're fine as long as we avoid using that
> >> one symbol, right?
> >>
> >> OTOH and IMHO, it would be preferable if that symbol could not crash
> >> Emacs.  Can we come up with a good way to fix that, while preserving the
> >> check that Andrea wants to keep?
> >
> > That sounds like a good thing to focus on, yes.  We need to have a value
> > in a vector that we Fread that is distinguishable from all other values.
> 
> I just reread the code, and #$ may be what we're looking for.  It's a
> unique value that we can pass in to Fread (let-binding load-file-name),
> and it already exists.  OTOH, it's used for docstrings normally, so it
> may be cleaner to invent new read syntax.
> 
> I'd really like to fix this.
> 
> > For example, right now this code doesn't work:
> >
> > (let ((print-circle t) (read-circle nil))
> >   (message "%S" (funcall (native-compile (lambda () #1=[#1#])))))
> >
> > (read, of course, with read-circle bound to t)
> >
> > but this does:
> >
> > (let ((print-circle t) (read-circle nil))
> >   (message "%S" (funcall (lambda () #1=[#1#]))))
> >
> > So this seems like a cheap drive-by fix.
> 
> The non-nativecomp code also breaks if read-circle is nil (I assume this
> is related to autoload), so it isn't just the nativecomp code that
> should be fixed.

Is this about bug#74966?  If so, let's not discuss the same issue
here. Andrea suggested a change that he prefers, and from where I
stand, we should wait for him to publish such a patch.  Then we can
discuss the patch if there are comments or objections.  But let's do
all that in the bug thread, not here.

Apologies if I misunderstood what the above is about.




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2025-01-04  7:26                                 ` Eli Zaretskii
@ 2025-01-04 11:12                                   ` Pip Cet via Emacs development discussions.
  2025-01-04 13:50                                     ` Eli Zaretskii
  0 siblings, 1 reply; 58+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2025-01-04 11:12 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: emacs-devel, stefankangas, acorallo, gerd.moellmann, monnier

"Eli Zaretskii" <eliz@gnu.org> writes:

>> Date: Fri, 03 Jan 2025 22:36:42 +0000
>> From: Pip Cet <pipcet@protonmail.com>
>> Cc: Stefan Kangas <stefankangas@gmail.com>, Andrea Corallo
>> <acorallo@gnu.org>, Gerd Möllmann <gerd.moellmann@gmail.com>, Eli
>> Zaretskii <eliz@gnu.org>, monnier@iro.umontreal.ca
>>
>> "Pip Cet via \"Emacs development discussions.\"" <emacs-devel@gnu.org> writes:
>>
>> > "Stefan Kangas" <stefankangas@gmail.com> writes:
>> >
>> >> Pip Cet <pipcet@protonmail.com> writes:
>> >>
>> >>> "Stefan Kangas" <stefankangas@gmail.com> writes:
>> >>>
>> >>>> Pip Cet <pipcet@protonmail.com> writes:
>> >>>>
>> >>>>> What I think we should do doesn't really matter, but it seems quite
>> >>>>> obvious to me that we should make the code on the master branch
>> >>>>> perform all three checks on all relocations, as the code on
>> >>>>> no-purespace does.
>> >>>>
>> >>>> Maybe.  But won't we get those checks with no additional effort once we
>> >>>> merge no-purespace,
>> >>>
>> >>> Yes, we will.  (And the forbidden symbol; even if the forbidden symbol
>> >>> doesn't cause trouble, which I think it will, it's simply very poor
>> >>> programming practice to do things that way, particularly since the
>> >>> crash may happen a long time after the compilation.  But, again, what I
>> >>> think obviously doesn't matter here.  I'll just remember that
>> >>> --enable-checking causes false positive crashes and shouldn't be used).
>> >>
>> >> I don't think the existence of one symbol that will crash Emacs in some
>> >> situations means that --enable-checking should be completely avoided.
>> >> It's still quite useful, and we're fine as long as we avoid using that
>> >> one symbol, right?
>> >>
>> >> OTOH and IMHO, it would be preferable if that symbol could not crash
>> >> Emacs.  Can we come up with a good way to fix that, while preserving the
>> >> check that Andrea wants to keep?
>> >
>> > That sounds like a good thing to focus on, yes.  We need to have a value
>> > in a vector that we Fread that is distinguishable from all other values.
>>
>> I just reread the code, and #$ may be what we're looking for.  It's a
>> unique value that we can pass in to Fread (let-binding load-file-name),
>> and it already exists.  OTOH, it's used for docstrings normally, so it
>> may be cleaner to invent new read syntax.
>>
>> I'd really like to fix this.
>>
>> > For example, right now this code doesn't work:
>> >
>> > (let ((print-circle t) (read-circle nil))
>> >   (message "%S" (funcall (native-compile (lambda () #1=[#1#])))))
>> >
>> > (read, of course, with read-circle bound to t)
>> >
>> > but this does:
>> >
>> > (let ((print-circle t) (read-circle nil))
>> >   (message "%S" (funcall (lambda () #1=[#1#]))))
>> >
>> > So this seems like a cheap drive-by fix.
>>
>> The non-nativecomp code also breaks if read-circle is nil (I assume this
>> is related to autoload), so it isn't just the nativecomp code that
>> should be fixed.
>
> Is this about bug#74966?  If so, let's not discuss the same issue

No.  This is about the --enable-checking code treating --lambda-fixup as
a forbidden symbol and crashing if it is used in the obvious fashion by
any code that happens to be native-compiled, which is the current status
on scratch/no-purespace.

Pip




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Merging scratch/no-purespace to remove unexec and purespace
  2025-01-04 11:12                                   ` Pip Cet via Emacs development discussions.
@ 2025-01-04 13:50                                     ` Eli Zaretskii
  0 siblings, 0 replies; 58+ messages in thread
From: Eli Zaretskii @ 2025-01-04 13:50 UTC (permalink / raw)
  To: Pip Cet, acorallo; +Cc: emacs-devel, stefankangas, gerd.moellmann, monnier

> Date: Sat, 04 Jan 2025 11:12:48 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: emacs-devel@gnu.org, stefankangas@gmail.com, acorallo@gnu.org, gerd.moellmann@gmail.com, monnier@iro.umontreal.ca
> 
> "Eli Zaretskii" <eliz@gnu.org> writes:
> 
> > Is this about bug#74966?  If so, let's not discuss the same issue
> 
> No.  This is about the --enable-checking code treating --lambda-fixup as
> a forbidden symbol and crashing if it is used in the obvious fashion by
> any code that happens to be native-compiled, which is the current status
> on scratch/no-purespace.

Then I'd like to hear Andrea's opinion on this.



^ permalink raw reply	[flat|nested] 58+ messages in thread

end of thread, other threads:[~2025-01-04 13:50 UTC | newest]

Thread overview: 58+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-17 10:47 Merging scratch/no-purespace to remove unexec and purespace Stefan Kangas
2024-12-17 13:12 ` Gerd Möllmann
2024-12-17 14:20   ` Gerd Möllmann
2024-12-17 14:30     ` Gerd Möllmann
2024-12-17 17:56       ` Gerd Möllmann
2024-12-17 18:50         ` Eli Zaretskii
2024-12-17 18:56           ` Gerd Möllmann
2024-12-18 12:55             ` Andrea Corallo
2024-12-18 14:03               ` Gerd Möllmann
2024-12-18 16:05                 ` Pip Cet via Emacs development discussions.
2024-12-18 16:30                   ` Gerd Möllmann
2024-12-18 16:25                 ` Pip Cet via Emacs development discussions.
2024-12-18 22:27                   ` Andrea Corallo
2024-12-19  9:28                     ` Pip Cet via Emacs development discussions.
2024-12-19 10:38                       ` Andrea Corallo
2024-12-19 10:50                       ` Stefan Kangas
2024-12-19 12:08                         ` Pip Cet via Emacs development discussions.
2024-12-19 17:55                           ` Stefan Kangas
2024-12-19 20:13                             ` Pip Cet via Emacs development discussions.
2024-12-20 15:59                               ` Stefan Monnier
2024-12-20 16:22                                 ` Pip Cet via Emacs development discussions.
2024-12-20 17:25                                   ` Gerd Möllmann
2024-12-20 20:35                                     ` Andrea Corallo
2024-12-20 20:39                                       ` Pip Cet via Emacs development discussions.
2024-12-21  6:33                                         ` Gerd Möllmann
2024-12-21  6:56                                         ` Andrea Corallo
2024-12-20 20:38                                     ` Pip Cet via Emacs development discussions.
2024-12-20 20:57                                       ` Gerd Möllmann
2025-01-03 22:36                               ` Pip Cet via Emacs development discussions.
2025-01-03 23:50                                 ` Stefan Monnier
2025-01-04  7:26                                 ` Eli Zaretskii
2025-01-04 11:12                                   ` Pip Cet via Emacs development discussions.
2025-01-04 13:50                                     ` Eli Zaretskii
2024-12-20  8:42                             ` Pip Cet via Emacs development discussions.
2024-12-18  0:18         ` Stefan Kangas
2024-12-17 19:30 ` Helmut Eller
2024-12-17 20:47   ` Stefan Monnier
2024-12-18  2:15     ` Stefan Kangas
2024-12-18  7:11       ` Helmut Eller
2024-12-18 13:35         ` Pip Cet via Emacs development discussions.
2024-12-18  6:56     ` Helmut Eller
2024-12-21 17:41       ` Helmut Eller
2024-12-21 18:32         ` Gerd Möllmann
2024-12-21 22:19         ` Pip Cet via Emacs development discussions.
2024-12-22  1:28         ` Stefan Kangas
2024-12-22 11:12           ` Pip Cet via Emacs development discussions.
2024-12-22 13:07             ` Eli Zaretskii
2024-12-22 14:12               ` Pip Cet via Emacs development discussions.
2024-12-22 15:51             ` Stefan Monnier
2024-12-22 17:09               ` Gerd Möllmann
2024-12-22 17:10               ` Pip Cet via Emacs development discussions.
2025-01-04  0:09                 ` Stefan Monnier
2024-12-22 13:13           ` Pip Cet via Emacs development discussions.
2024-12-22 14:16           ` Helmut Eller
2024-12-18  9:30   ` Pip Cet via Emacs development discussions.
2024-12-18  0:50 ` Po Lu
2024-12-18  2:12   ` Stefan Kangas
2024-12-18 21:26   ` Stefan Monnier

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).